TLDR:

Uses new libraries for improved content scraping:
- PyPDF2 for URLs that point to PDF files
- wikipediaapi for Wikipedia pages
- requests, readability-lxml and html2text for most other pages
- requests, BeautifulSoup and html2text for BBC pages

In [1]:
import json
import pickle
import PyPDF2
import html2text
import random
import requests
import wikipediaapi
import pandas as pd
from io import BytesIO
from tqdm import tqdm
import idna
from bs4 import BeautifulSoup, Comment
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Optional, List, Dict, Tuple, Union, Any, Callable
from urllib.parse import urlparse, quote
import socket
import concurrent
from readability import Document
# from newspaper import Article

from IPython.display import display, HTML, Markdown

pd.set_option('display.precision', 5)



In [2]:
# load all questions
path = "../benchmark/data/autocast/autocast_questions.json"
df = pd.read_json(path)
print(df.shape)

# filter out non-true/false questions
df = df[df["qtype"] == "t/f"].reset_index(drop=True)
print(df.shape)

# make sure answers is not None
df = df[df["answer"].notnull()].reset_index(drop=True)
print(df.shape)

# make sure source_links is not []
df = df[df["source_links"].map(len) > 0].reset_index(drop=True)
print(df.shape)

(6532, 14)
(3225, 14)
(2003, 14)
(1403, 14)


In [3]:
# number of links per question
df['num_links'] = df['source_links'].apply(len)

In [4]:
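# collect the non-empty source links across all questions
# (the list is named all_questions, but it holds URLs)
all_questions = [q for questions in df["source_links"] for q in questions if q not in ["", None]]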
all_questions = list(set(all_questions))
len(all_questions)

47357
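
Some source links carry HTML-escaped characters, for example `&amp;` in query strings (visible in the run log further down). A minimal sketch of a clean-up step, assuming the links were HTML-escaped exactly once when the dataset was built. `normalize_url` is a hypothetical helper, not part of this notebook:

```python
import html


def normalize_url(url: str) -> str:
    """Undo HTML escaping (e.g. '&amp;' -> '&') and trim surrounding whitespace."""
    return html.unescape(url).strip()


print(normalize_url("http://example.com/page?a=1&amp;b=2"))
```

Note that `html.unescape` also converts some legacy entity names that lack a trailing semicolon, so a query parameter literally named `copy` or `reg` could be mangled. It is worth checking a sample before applying this to the whole list.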

In [5]:
# remove URLs that end in image extensions (png, jpg, jpeg, gif, svg)
image_extensions = ["png", "jpg", "jpeg", "gif", "svg"]
all_questions = [q for q in all_questions if not q.endswith(tuple(image_extensions))]
len(all_questions)

47130
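
The `endswith` check above looks at the whole URL, so an image link followed by a query string or fragment would slip through. A possible variant, sketched under the assumption that only the URL path should decide; `is_image_url` and `IMAGE_EXTENSIONS` are hypothetical names introduced here for illustration:

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def is_image_url(url: str) -> bool:
    """Return True if the URL path (ignoring query and fragment) ends in an image extension."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


urls = [
    "http://example.com/a/photo.JPG?width=300",
    "http://example.com/story.html",
]
print([u for u in urls if not is_image_url(u)])
```

Unlike the cell above, this version requires the leading dot, so a path that merely ends in the letters `png` is kept.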

### Extractions

In [6]:
HTTP_TIMEOUT = 60

def get_hostname(url: str) -> bool:
    """Return True if the URL has a scheme and a well-formed hostname, else False."""
    if not url:
        return False

    try:
        parsed_url = urlparse(url)
        if not parsed_url.scheme:
            return False

        hostname = parsed_url.hostname
        if not hostname:
            return False

        # Check for consecutive periods in the hostname
        if '..' in hostname:
            return False

        # IDNA encoding for internationalized domain names
        hostname = idna.encode(hostname).decode('ascii')

        if len(hostname) > 255:
            return False

        return True

    except Exception:
        # Any parsing or IDNA encoding failure means the hostname is unusable
        return False
    

def extract_bbc_text(url: str, num_words: Optional[int] = None) -> Dict[str, Any]:
    """Extract text from a single HTML document using BeautifulSoup and html2text, removing scripts, nav, header, and footer."""
    try:
        response = requests.get(url, timeout=HTTP_TIMEOUT)
        response.raise_for_status()
        html = response.text
        soup = BeautifulSoup(html, "html.parser")

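        # Remove unwanted elements. The entries are CSS selectors (tag names and
        # ".class" names), so they go through select(); soup([...]) would treat
        # ".navbar" and similar entries as tag names and never match them.
        unwanted_selectors = [
            "script", "style", "nav",
            "header", "footer", "form",
            "iframe", ".navbar", ".menu",
            ".breadcrumb", ".pagination", ".nav",
            ".ad", ".sidebar", ".popup", ".modal",
            ".social-icons", ".hamburger-menu",
        ]
        for element in soup.select(", ".join(unwanted_selectors)):
            element.decompose()

        # Remove comments (string= replaces the deprecated text= argument)
        for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
            comment.extract()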

        # Convert to text using html2text
        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True
        h.ignore_emphasis = True
        text = h.handle(str(soup))

        # Trim to a specific number of words if required
        if num_words:
            words = text.split()[:num_words]
            words = ' '.join(words)

            return {"url": url, "error": False, "error_message": None, "text": words}

        return {"url": url, "error": False, "error_message": None, "text": text}
    
    except Exception as e:
        print(f"Error extracting text from {url}: {e}")
        return {"url": url, "error": True, "error_message": str(e), "text": None}


def extract_default_text(url: str, num_words: Optional[int] = None) -> Dict[str, Any]:
    """Extract the main article text from an HTML document using readability-lxml and html2text."""
    try:
        response = requests.get(url, timeout=HTTP_TIMEOUT)
        response.raise_for_status()
        doc = Document(response.text)
        readable_article = doc.summary()
        
        # use html2text to convert HTML to markdown
        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True
        h.ignore_emphasis = True
        text = h.handle(readable_article)

        # Trim to a specific number of words if required
        if num_words:
            words = text.split()[:num_words]
            text = ' '.join(words)

        return {"url": url, "error": False, "error_message": None, "text": text}

    except Exception as e:
        print(f"Error extracting text from {url}: {e}")
        return {"url": url, "error": True, "error_message": str(e), "text": None}


def extract_from_wikipedia(url: str, num_words: Optional[int] = None) -> Dict[str, Any]:
    """Extract full text from a Wikipedia page using the wikipedia-api wrapper."""

    wiki = wikipediaapi.Wikipedia(user_agent='MyWikiExtractor/1.0 (example@gmail.com)')

    try:
        # Extract the page title from the URL
        title = url.split("/")[-1]

        page = wiki.page(title)
        if page.exists():
            # Use 'text' property to get full content of the page
            text = page.text

            # Optionally, trim the text
            if num_words:
                words = text.split()[:num_words]
                text = ' '.join(words)

            return {"url": url, "error": False, "error_message": None, "text": text}
        else:
            return {"url": url, "error": True, "error_message": "Wikipedia page does not exist", "text": None}
    except Exception as e:
        return {"url": url, "error": True, "error_message": str(e), "text": None}


def extract_text_from_pdf(url: str, num_words: Optional[int] = None) -> Dict[str, Any]:
    """Extract text from a PDF document at the given URL."""
    try:
        response = requests.get(url, timeout=HTTP_TIMEOUT)
        response.raise_for_status()
        
        if 'application/pdf' not in response.headers.get('Content-Type', ''):
            # Raise so the except block below returns the standard error dict
            raise ValueError("URL does not point to a PDF document")
        
        with BytesIO(response.content) as pdf_file:
            reader = PyPDF2.PdfReader(pdf_file)
            text = ""
            for page in reader.pages:
                text += page.extract_text()
            # Optionally, trim the text to the specified number of words
            if num_words:
                words = text.split()[:num_words]
                text = ' '.join(words)

            return {"url": url, "error": False, "error_message": None, "text": text}
    
    except Exception as e:
        print(f"Error extracting text from {url}: {e}")
        return {"url": url, "error": True, "error_message": str(e), "text": None}


def get_text(url: str, num_words: Optional[int] = None) -> Dict[str, Any]:
    """Get the content of a URL and extract text. Handles both HTML and PDF."""
    hostname = get_hostname(url)
    if not hostname:
        return {"url": url, "error": True, "error_message": "Invalid hostname", "text": None}

    filter_words = [
        "facebook", "twitter", "youtube",
        "instagram", "pinterest", "linkedin", "bloomberg",
    ]
    if any(word in url.lower() for word in filter_words):
        print(f"URL filtered: {url}")
        return {"url": url, "error": True, "error_message": "URL filtered", "text": None}
    
    # Extract text from pdf
    if url.lower().endswith(".pdf"):
        return extract_text_from_pdf(url, num_words)
    
    # Extract text from wikipedia
    if "wikipedia" in url.lower():
        return extract_from_wikipedia(url, num_words)
    
    # if bbc in url, use custom function to extract text
    if "bbc" in url.lower():
        return extract_bbc_text(url, num_words)
    
    # Extract text from other websites
    return extract_default_text(url, num_words)
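
`get_text` routes on substrings of the whole URL, so any link that merely contains `bbc` or `wikipedia` somewhere in its path takes the special-case branch. A minimal sketch of routing on the parsed hostname instead. `classify_url` and `_host_matches` are hypothetical helpers, and the BBC domain list is an assumption:

```python
from urllib.parse import urlparse


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_url(url: str) -> str:
    """Return which extractor a URL should be routed to: pdf, wikipedia, bbc or default."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()

    if path.endswith(".pdf"):
        return "pdf"
    if _host_matches(host, "wikipedia.org"):
        return "wikipedia"
    # Assumed set of BBC domains; extend as needed
    if any(_host_matches(host, d) for d in ("bbc.com", "bbc.co.uk")):
        return "bbc"
    return "default"


print(classify_url("https://en.wikipedia.org/wiki/Brexit"))
```

The returned label could then be mapped to the matching `extract_*` function defined above.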

In [7]:
def process_urls_chunk(urls_chunk, max_workers=10):
    """Process a chunk of URLs in parallel threads and return their processed data."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order; tqdm tracks progress
        results = executor.map(get_text, urls_chunk)
        return list(tqdm(results, desc="Processing URLs", total=len(urls_chunk)))

def save_progress(data, filename):
    """Save the processed data to a file."""
    with open(filename, 'ab') as file:
        pickle.dump(data, file)

def process_urls_concurrently(urls, chunk_size=1000, max_workers=10):
    """Process URLs chunk by chunk, saving each chunk's results to its own pickle file."""
    for i in range(0, len(urls), chunk_size):
        urls_chunk = urls[i:i + chunk_size]
        # Fetch the URLs of this chunk with up to max_workers threads
        processed_data = process_urls_chunk(urls_chunk, max_workers=max_workers)
        save_progress(processed_data, f'processed_urls_{i // chunk_size}.pickle')
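
Because `save_progress` opens files in append mode, a chunk file may hold more than one pickled list if the cell is re-run. A minimal sketch for reading the results back, assuming the `processed_urls_*.pickle` naming used above; `load_processed` is a hypothetical helper:

```python
import glob
import pickle


def load_processed(pattern: str = "processed_urls_*.pickle") -> list:
    """Load every pickled list from the matching files and concatenate them."""
    records = []
    for filename in sorted(glob.glob(pattern)):
        with open(filename, "rb") as file:
            # A file written in append mode can contain several pickled objects
            while True:
                try:
                    records.extend(pickle.load(file))
                except EOFError:
                    break
    return records
```

The combined list of dicts can be passed to `pd.DataFrame(...)` for inspection. Duplicates from repeated runs are not removed here.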


In [60]:
process_urls_concurrently(all_questions, chunk_size=1000, max_workers=10)

Processing URLs:   0%|          | 3/1000 [00:00<04:38,  3.58it/s]

Error extracting text from http://researchbriefings.files.parliament.uk/documents/CBP-7886/CBP-7886.pdf: 403 Client Error: Forbidden for url: http://researchbriefings.files.parliament.uk/documents/CBP-7886/CBP-7886.pdf


Processing URLs:   0%|          | 4/1000 [00:01<07:55,  2.09it/s]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-islamicstate-comment-2bb6f8fc-b306-11e5-8abc-d09392edc612-20160104-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-islamicstate-comment-2bb6f8fc-b306-11e5-8abc-d09392edc612-20160104-story.html


Processing URLs:   1%|          | 8/1000 [00:06<16:49,  1.02s/it]

Error extracting text from https://www.rferl.org/a/trump-says-us-protective-baltic-region-declines-call-russia-threat/28702319.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/trump-says-us-protective-baltic-region-declines-call-russia-threat/28702319.html


Processing URLs:   1%|          | 9/1000 [00:15<1:00:57,  3.69s/it]

URL filtered: https://twitter.com/burgessev/status/1468310354015010822


Processing URLs:   1%|          | 11/1000 [00:16<35:50,  2.17s/it] 

Error extracting text from http://www.opec.org/opec_web/static_files_project/media/downloads/publications/MOMR%20May%202017.pdf: 404 Client Error: Not Found for url: https://www.opec.org/opec_web/static_files_project/media/downloads/publications/MOMR%20May%202017.pdf


Processing URLs:   1%|▏         | 13/1000 [00:18<25:03,  1.52s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-15/tesla-nears-record-high-as-musk-stays-mum-on-solarcity-s-impact


Processing URLs:   2%|▏         | 20/1000 [00:29<25:53,  1.59s/it]

URL filtered: https://twitter.com/wpjenna/status/684171208292712449/photo/1


Processing URLs:   2%|▏         | 22/1000 [01:30<3:52:55, 14.29s/it]

Error extracting text from http://www.charlotteobserver.com/news/politics-government/election/article101085787.html: HTTPConnectionPool(host='www.charlotteobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   3%|▎         | 26/1000 [01:32<1:14:42,  4.60s/it]

Error extracting text from http://www.nytimes.com/2016/05/13/world/europe/russia-nato-us-romania-missile-defense.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/13/world/europe/russia-nato-us-romania-missile-defense.html?_r=0


Processing URLs:   3%|▎         | 29/1000 [01:35<35:07,  2.17s/it]  

Error extracting text from http://www.nytimes.com/2014/07/22/nyregion/a-homeowner-who-refused-to-cash-out-in-a-gambling-town-may-have-missed-her-chance.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2014/07/22/nyregion/a-homeowner-who-refused-to-cash-out-in-a-gambling-town-may-have-missed-her-chance.html


Processing URLs:   3%|▎         | 33/1000 [01:37<13:25,  1.20it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-03/venezuela-restructuring-heightens-rosneft-s-6-billion-risk
Error extracting text from http://www.washingtontimes.com/news/2017/may/1/recep-tayyip-erdogan-turkey-hold-referendum-eu-mem/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/may/1/recep-tayyip-erdogan-turkey-hold-referendum-eu-mem/


Processing URLs:   3%|▎         | 34/1000 [01:37<11:28,  1.40it/s]

Error extracting text from http://www.wsj.com/articles/south-korea-warms-to-u-s-missile-shield-1453970662: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korea-warms-to-u-s-missile-shield-1453970662


Processing URLs:   4%|▎         | 36/1000 [01:39<11:12,  1.43it/s]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/trumps-nominee-for-state-clears-procedural-hurdle-in-senate/articleshow/56882784.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/trumps-nominee-for-state-clears-procedural-hurdle-in-senate/articleshow/56882784.cms


Processing URLs:   4%|▍         | 42/1000 [01:45<13:24,  1.19it/s]

Error extracting text from http://www.nytimes.com/2016/01/04/world/middleeast/iran-saudi-arabia-execution-sheikh-nimr.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/04/world/middleeast/iran-saudi-arabia-execution-sheikh-nimr.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0
Error extracting text from http://blogs.wsj.com/washwire/2016/11/04/erdogans-latest-crackdown-and-how-turkeys-chaos-plays-to-extremists-strengths/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2016/11/04/erdogans-latest-crackdown-and-how-turkeys-chaos-plays-to-extremists-strengths/


Processing URLs:   4%|▍         | 44/1000 [01:48<15:25,  1.03it/s]

Error extracting text from http://www.38north.org/2017/11/sinpo111617/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:   5%|▌         | 51/1000 [02:43<1:10:24,  4.45s/it]

URL filtered: https://twitter.com/PhysicsWorld/status/699252183791964160
Error extracting text from https://www.reuters.com/business/cop/early-signs-are-good-cop26-rulebook-negotiations-eu-says-2021-11-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/cop/early-signs-are-good-cop26-rulebook-negotiations-eu-says-2021-11-04/


Processing URLs:   5%|▌         | 53/1000 [02:43<40:24,  2.56s/it]  

Error extracting text from http://www.reuters.com/article/2015/09/10/us-saudi-oil-output-idUSKCN0R91GN20150910: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/10/us-saudi-oil-output-idUSKCN0R91GN20150910
Error extracting text from http://www.nytimes.com/2014/10/16/opinion/a-deadly-legacy-in-iraq.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2014/10/16/opinion/a-deadly-legacy-in-iraq.html


Processing URLs:   6%|▌         | 55/1000 [02:47<37:21,  2.37s/it]

Error extracting text from http://www.ibtimes.com/thailand-china-submarine-deal-suspended-following-concern-over-jeopardizing-ties-2012291: 403 Client Error: Forbidden for url: https://www.ibtimes.com/thailand-china-submarine-deal-suspended-following-concern-over-jeopardizing-ties-2012291


Processing URLs:   6%|▌         | 56/1000 [02:48<30:46,  1.96s/it]

Error extracting text from http://www.pravdareport.com/russia/politics/17-02-2015/129844-russia_returning_latin_america-0/#sthash.8o4UkHEo.dpuf: 404 Client Error: Not Found for url: https://www.pravda.ru/russia/politics/17-02-2015/129844-russia_returning_latin_america-0/#sthash.8o4UkHEo.dpuf


Processing URLs:   6%|▌         | 58/1000 [02:50<21:45,  1.39s/it]

Error extracting text from http://www.arabnews.com/news/nawaz-sharif-proposes-panama-papers-probe-parliament-opposition-walks-out: 403 Client Error: Forbidden for url: https://www.arabnews.com/news/nawaz-sharif-proposes-panama-papers-probe-parliament-opposition-walks-out


Processing URLs:   6%|▌         | 60/1000 [02:53<22:14,  1.42s/it]

Error extracting text from http://www.conservativehome.com/thecolumnists/2015/10/henry-hill-snp-face-investigation-into-referendum-spending.html?utm_medium=email&amp;utm_campaign=Wednesday+14th+October+2015&amp;utm_content=Wednesday+14th+October+2015+CID_58567e2fae60fbc8e42bf004ce288118&amp;utm_source=Daily%20Email&amp;utm_term=SNP%20face%20investigation%20into%20referendum%20spending: 403 Client Error: Forbidden for url: http://conservativehome.com/thecolumnists/2015/10/henry-hill-snp-face-investigation-into-referendum-spending.html?utm_medium=email&amp;utm_campaign=Wednesday+14th+October+2015&amp;utm_content=Wednesday+14th+October+2015+CID_58567e2fae60fbc8e42bf004ce288118&amp;utm_source=Daily%20Email&amp;utm_term=SNP%20face%20investigation%20into%20referendum%20spending


Processing URLs:   6%|▋         | 64/1000 [03:00<26:18,  1.69s/it]

Error extracting text from http://uk.reuters.com/article/us-saudi-aramco-breakingviews-idUKKBN14H1BM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   7%|▋         | 66/1000 [03:02<18:38,  1.20s/it]

Error extracting text from http://the-japan-news.com/news/article/0002836896: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002836896
Error extracting text from https://www.researchgate.net/publication/339013698_Determining_famine_Multi-dimensional_analysis_for_the_twenty-first_century: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/339013698_Determining_famine_Multi-dimensional_analysis_for_the_twenty-first_century


Processing URLs:   7%|▋         | 69/1000 [03:08<21:50,  1.41s/it]

Error extracting text from http://cnnphilippines.com/world/2016/03/18/North-Korea-launches-ballistic-missile.html: 503 Server Error: Unavailable for url: http://cnnphilippines.com/world/2016/03/18/North-Korea-launches-ballistic-missile.html


Processing URLs:   7%|▋         | 70/1000 [03:09<20:29,  1.32s/it]

Error extracting text from https://futurism.com/nasa-were-not-putting-science-instruments-on-spacexs-first-dragon-capsule/: 404 Client Error: Not Found for url: https://futurism.com/nasa-were-not-putting-science-instruments-on-spacexs-first-dragon-capsule


Processing URLs:   7%|▋         | 73/1000 [03:17<30:17,  1.96s/it]

Error extracting text from https://www.researchgate.net/publication/271898807_Terrorism_and_Voting_The_Effect_of_Rocket_Threat_on_Voting_in_Israeli_Elections: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/271898807_Terrorism_and_Voting_The_Effect_of_Rocket_Threat_on_Voting_in_Israeli_Elections


Processing URLs:   7%|▋         | 74/1000 [03:19<30:14,  1.96s/it]

Error extracting text from http://en.trend.az/world/turkey/2561165.html: 404 Client Error: Not Found for url: https://www.trend.az/world/turkey/2561165.html


Processing URLs:   8%|▊         | 75/1000 [03:19<22:11,  1.44s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-negotiations-idUSKBN1A42AD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-negotiations-idUSKBN1A42AD


Processing URLs:   8%|▊         | 76/1000 [03:20<17:03,  1.11s/it]

Error extracting text from https://www.nytimes.com/2017/03/17/business/dealbook/teslas-1-billion-cash-raise-may-fall-short.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/17/business/dealbook/teslas-1-billion-cash-raise-may-fall-short.html


Processing URLs:   8%|▊         | 77/1000 [03:21<19:16,  1.25s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096887/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096887/


Processing URLs:   8%|▊         | 79/1000 [03:23<14:52,  1.03it/s]

Error extracting text from https://www.nytimes.com/2017/02/11/world/asia/north-korea-missile-test-trump.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/11/world/asia/north-korea-missile-test-trump.html?_r=0


Processing URLs:   8%|▊         | 80/1000 [03:35<1:06:15,  4.32s/it]

Error extracting text from http://oracleherald.com/2015/11/27/will-crude-oil-prices-react-to-the-december-opec-meeting.html: HTTPConnectionPool(host='oracleherald.com', port=80): Max retries exceeded with url: /2015/11/27/will-crude-oil-prices-react-to-the-december-opec-meeting.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff1cdd00>: Failed to resolve 'oracleherald.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   8%|▊         | 81/1000 [03:35<48:51,  3.19s/it]  

Error extracting text from https://www.predictit.org/Market/1327/Who-will-win-the-2016-Iowa-Republican-caucuses: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1327/Who-will-win-the-2016-Iowa-Republican-caucuses


Processing URLs:   8%|▊         | 85/1000 [03:38<19:34,  1.28s/it]

Error extracting text from http://thehill.com/blogs/congress-blog/foreign-policy/259312-syrian-refugee-zone-think-beyond-military-issues: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/foreign-policy/259312-syrian-refugee-zone-think-beyond-military-issues/
Error extracting text from https://www.reuters.com/world/americas/bolsonaro-attacks-brazil-judges-warns-institutional-rupture-2021-08-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/bolsonaro-attacks-brazil-judges-warns-institutional-rupture-2021-08-14/


Processing URLs:   9%|▊         | 87/1000 [03:40<16:15,  1.07s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&amp;s=WCRFPUS2&amp;f=W: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:   9%|▉         | 89/1000 [03:44<21:21,  1.41s/it]

Error extracting text from https://apple.news/At9Y6QsjDMrmcGH45V1sHtw: 404 Client Error: Not Found for url: https://apple.news/At9Y6QsjDMrmcGH45V1sHtw


Processing URLs:   9%|▉         | 92/1000 [03:47<15:54,  1.05s/it]

Error extracting text from http://news.yahoo.com/south-china-sea-tensions-surge-china-lands-plane-021700533: 404 Client Error: Not Found for url: http://news.yahoo.com/south-china-sea-tensions-surge-china-lands-plane-021700533
Error extracting text from https://www.imf.org/en/Publications/WEO/Issues/2017/07/07/world-economic-outlook-update-july-2017: 403 Client Error: Forbidden for url: https://www.imf.org/en/Publications/WEO/Issues/2017/07/07/world-economic-outlook-update-july-2017


Processing URLs:  10%|▉         | 95/1000 [03:49<12:34,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0YI2IQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0YI2IQ


Processing URLs:  10%|▉         | 96/1000 [03:50<09:51,  1.53it/s]

Error extracting text from http://21stcenturywire.com/2016/04/10/burundi-geopolitical-jewel-in-the-cross-hairs-of-regime-change-and-hybrid-war-pundits/&quot: 403 Client Error: Forbidden for url: http://21stcenturywire.com/2016/04/10/burundi-geopolitical-jewel-in-the-cross-hairs-of-regime-change-and-hybrid-war-pundits/&quot


Processing URLs:  10%|▉         | 97/1000 [03:50<09:50,  1.53it/s]

Error extracting text from http://www.tradingeconomics.com/germany/car-registrations/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/germany/car-registrations/forecast
URL filtered: https://www.youtube.com/watch?v=smhi6jts97I


Processing URLs:  10%|█         | 102/1000 [03:55<12:36,  1.19it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-paracels-idUSKCN0VT0YA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-paracels-idUSKCN0VT0YA


Processing URLs:  10%|█         | 103/1000 [03:55<11:37,  1.29it/s]

Error extracting text from http://thehill.com/policy/energy-environment/253221-house-panel-votes-to-lift-oil-export-ban: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/253221-house-panel-votes-to-lift-oil-export-ban/


Processing URLs:  11%|█         | 106/1000 [04:02<20:07,  1.35s/it]

Error extracting text from https://www.google.com/amp/s/cointelegraph.com/news/many-pieces-of-the-diem-puzzle-still-missing-as-launch-gets-delayed/amp: 403 Client Error: Forbidden for url: https://cointelegraph.com/news/many-pieces-of-the-diem-puzzle-still-missing-as-launch-gets-delayed/amp
Error extracting text from http://ir.tesla.com/releasedetail.cfm?releaseid=991720: 403 Client Error: Forbidden for url: http://ir.tesla.com/releasedetail.cfm?releaseid=991720


Processing URLs:  11%|█         | 107/1000 [04:03<15:29,  1.04s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://bahiaempauta.com.br/%3Fp%3D129459&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://bahiaempauta.com.br/%3Fp%3D129459&amp;prev=search


Processing URLs:  11%|█         | 109/1000 [04:05<17:12,  1.16s/it]

Error extracting text from https://www.imdb.com/title/tt4773522/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt4773522/


Processing URLs:  11%|█         | 111/1000 [04:07<14:52,  1.00s/it]

Error extracting text from http://www.reuters.com/article/us-oil-demand-iea-exclusive-idUSKBN18809U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-demand-iea-exclusive-idUSKBN18809U


Processing URLs:  11%|█▏        | 113/1000 [04:09<12:28,  1.18it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-parties-agree-on-key-migrant-issue-in-coalition-talks-idUSKBN1FJ18F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-parties-agree-on-key-migrant-issue-in-coalition-talks-idUSKBN1FJ18F


Processing URLs:  11%|█▏        | 114/1000 [04:39<2:19:34,  9.45s/it]

Error extracting text from https://www.washingtonpost.com/news/checkpoint/wp/2016/01/15/wanted-foreign-special-operations-troops-to-join-the-u-s-in-targe: 404 Client Error: Not Found for url: https://www.washingtonpost.com/news/checkpoint/wp/2016/01/15/wanted-foreign-special-operations-troops-to-join-the-u-s-in-targe/


Processing URLs:  12%|█▏        | 117/1000 [04:43<59:40,  4.05s/it]  

Error extracting text from https://www.outlookindia.com/website/story/how-do-isro-keep-moon-mission-cost-cheaper-than-hollywood-hit-interstellar-scien/308595: 403 Client Error: Forbidden for url: https://www.outlookindia.com/website/story/how-do-isro-keep-moon-mission-cost-cheaper-than-hollywood-hit-interstellar-scien/308595


Processing URLs:  12%|█▏        | 118/1000 [04:44<47:42,  3.25s/it]

Error extracting text from http://thinkprogress.org/justice/2015/12/11/3731121/the-brutal-delegate-math-that-could-deny-donald-trump-the-nomination-at-a-brokered-convention/: 403 Client Error: Forbidden for url: https://thinkprogress.org/justice/2015/12/11/3731121/the-brutal-delegate-math-that-could-deny-donald-trump-the-nomination-at-a-brokered-convention/


Processing URLs:  12%|█▏        | 120/1000 [04:48<34:39,  2.36s/it]

Error extracting text from http://www.iol.co.za/news/politics/anc-in-zumas-clutches-2028852: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/anc-in-zumas-clutches-2028852


Processing URLs:  12%|█▎        | 125/1000 [04:52<13:12,  1.10it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-spd-membership-to-vote-on-any-deal-propping-up-merkel-idUSKBN1DO0NS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-spd-membership-to-vote-on-any-deal-propping-up-merkel-idUSKBN1DO0NS


Processing URLs:  13%|█▎        | 127/1000 [05:11<1:27:31,  6.02s/it]

Error extracting text from http://ew.com/tv/2017/08/15/cnn-trump-ad-criticizing-journalists/: 406 Client Error: Not Acceptable for url: https://ew.com/tv/2017/08/15/cnn-trump-ad-criticizing-journalists/
URL filtered: http://www.bloomberg.com/news/articles/2016-04-04/saudis-want-to-double-their-stock-market-for-a-post-oil-economy


Processing URLs:  13%|█▎        | 130/1000 [05:14<47:22,  3.27s/it]  

URL filtered: https://twitter.com/realdonaldtrump/status/828574430800539648


Processing URLs:  13%|█▎        | 133/1000 [05:18<29:51,  2.07s/it]

Error extracting text from http://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1414687415278&amp;uri=CELEX:02006R0562-20131126: 404 Client Error: Not Found for url: https://eur-lex.europa.eu/legal-content/EN/TXT/?qid=1414687415278&amp;uri=CELEX:02006R0562-20131126


Processing URLs:  14%|█▎        | 137/1000 [05:24<23:13,  1.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-09/state-prosecutor-charges-brazil-s-lula-with-hiding-assets


Processing URLs:  14%|█▍        | 141/1000 [05:24<09:21,  1.53it/s]

Error extracting text from http://www.wsj.com/articles/inside-the-battle-for-mosul-1481153476: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/inside-the-battle-for-mosul-1481153476
URL filtered: https://www.youtube.com/watch?v=thBQhCoJEFQ
Error extracting text from http://www.reuters.com/article/us-opec-meeting-nigeria-idUSKBN13H0SD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting-nigeria-idUSKBN13H0SD


Processing URLs:  14%|█▍        | 143/1000 [05:26<10:49,  1.32it/s]

Error extracting text from https://www.reuters.com/article/us-taiwan-china-security/taiwan-reports-large-incursion-by-chinese-air-force-idUSKBN29S0BK?feedType=mktg&amp;feedName=topNews&amp;WT.mc_id=Partner-Google: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-taiwan-china-security/taiwan-reports-large-incursion-by-chinese-air-force-idUSKBN29S0BK?feedType=mktg&amp;feedName=topNews&amp;WT.mc_id=Partner-Google


Processing URLs:  15%|█▍        | 147/1000 [05:29<09:33,  1.49it/s]

Error extracting text from http://www.swindonadvertiser.co.uk/news/national/14255373.Cameron_dismisses_claims_EU_reforms_package_could_be_reversed/: HTTPSConnectionPool(host='nationalfeeds.newsquest.co.uk', port=443): Max retries exceeded with url: /news/national/14255373.david-cameron-faces-grassroots-backlash-over-eu-referendum/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe55f3e0>: Failed to resolve 'nationalfeeds.newsquest.co.uk' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.hindustantimes.com/world-news/pompeo-amps-up-pitch-says-will-use-all-tools-to-support-countries-over-south-china-sea/story-I3nUnUD7Oks1dcikQ6zjPI.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/pompeo-amps-up-pitch-says-will-use-all-tools-to-support-countries-over-south-china-sea/story-I3nUnUD7Oks1dcikQ6zjPI.html


Processing URLs:  15%|█▍        | 149/1000 [05:29<05:59,  2.37it/s]

Error extracting text from https://www.predictit.org/Home/SingleOption?marketId=1273#openoffers1: 403 Client Error: Forbidden for url: https://www.predictit.org/Home/SingleOption?marketId=1273#openoffers1
Error extracting text from http://www.reuters.com/article/2015/11/10/volkswagen-emissions-investigation-idUSL8N1355WV20151110#Hfuuu0b9kC7z6vw7.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/10/volkswagen-emissions-investigation-idUSL8N1355WV20151110#Hfuuu0b9kC7z6vw7.97


Processing URLs:  15%|█▌        | 150/1000 [05:31<09:42,  1.46it/s]

Error extracting text from http://blogs.wsj.com/moneybeat/2016/06/06/sterling-slumps-as-support-for-brexit-grows-in-referendum-polls/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2016/06/06/sterling-slumps-as-support-for-brexit-grows-in-referendum-polls/


Processing URLs:  15%|█▌        | 153/1000 [05:38<23:13,  1.65s/it]

Error extracting text from http://www.hybridcars.com/nevada-grants-first-ever-autonomous-vehicle-drivers-license/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/nevada-grants-first-ever-autonomous-vehicle-drivers-license/


Processing URLs:  15%|█▌        | 154/1000 [05:39<22:27,  1.59s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-08/hsbc-cuts-treasury-yield-forecasts-on-shallow-fed-rate-path-bets


Processing URLs:  16%|█▌        | 159/1000 [06:40<2:08:21,  9.16s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2015/10/07/biden-backing-asia-trade-pact-despite-union-opposition: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
URL filtered: https://twitter.com/carlbildt/status/1426111994466377730
Error extracting text from http://news.siteintelgroup.com/blog/index.php/categories/jihad/entry/410-the-rising-tide-of-terror-in-bangladesh: 403 Client Error: Forbidden for url: http://news.siteintelgroup.com/blog/index.php/categories/jihad/entry/410-the-rising-tide-of-terror-in-bangladesh


Processing URLs:  16%|█▌        | 160/1000 [06:40<1:39:38,  7.12s/it]

Error extracting text from http://www.reuters.com/article/2015/10/16/us-brazil-rousseff-cunha-idUSKCN0SA2C920151016: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/16/us-brazil-rousseff-cunha-idUSKCN0SA2C920151016


Processing URLs:  16%|█▌        | 162/1000 [06:42<1:00:23,  4.32s/it]

Error extracting text from http://www.reuters.com/article/us-nuclear-security-usa-russia-idUSKBN0IP24K20141105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nuclear-security-usa-russia-idUSKBN0IP24K20141105


Processing URLs:  16%|█▋        | 163/1000 [06:42<44:47,  3.21s/it]  

Error extracting text from http://www.reuters.com/article/us-turkey-eu-immigration-idUSKCN10J0XC?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-immigration-idUSKCN10J0XC?il=0


Processing URLs:  17%|█▋        | 167/1000 [06:43<14:37,  1.05s/it]

Error extracting text from https://www.nytimes.com/2017/02/01/world/europe/theresa-may-expected-to-win-parliaments-ok-on-brexit-bill.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/01/world/europe/theresa-may-expected-to-win-parliaments-ok-on-brexit-bill.html
Error extracting text from http://www.nbcnews.com/politics/supreme-court/merrick-garland-supreme-court-nomination-six-10-americans-approve-n541681: 403 Client Error: Forbidden for url: http://www.nbcnews.com/politics/supreme-court/merrick-garland-supreme-court-nomination-six-10-americans-approve-n541681
Error extracting text from http://ir.teslamotors.com/releasedetail.cfm?ReleaseID=904880: HTTPConnectionPool(host='ir.teslamotors.com', port=80): Max retries exceeded with url: /releasedetail.cfm?ReleaseID=904880 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe74cc80>: Failed to resolve 'ir.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))

Processing URLs:  17%|█▋        | 169/1000 [06:51<28:04,  2.03s/it]

Error extracting text from https://www.reuters.com/business/finance/eu-excludes-seven-russian-banks-swift-official-journal-2022-03-02/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/finance/eu-excludes-seven-russian-banks-swift-official-journal-2022-03-02/


Processing URLs:  17%|█▋        | 171/1000 [06:51<19:31,  1.41s/it]

Error extracting text from http://www.preposterousuniverse.com/blog/2008/10/29/dark-photons/: 406 Client Error: Not Acceptable for url: http://www.preposterousuniverse.com/blog/2008/10/29/dark-photons/


Processing URLs:  18%|█▊        | 175/1000 [07:00<23:09,  1.68s/it]

Error extracting text from https://www.reuters.com/article/idUSKBN28B57D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN28B57D


Processing URLs:  18%|█▊        | 182/1000 [07:28<1:08:18,  5.01s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-03/greek-banks-may-need-only-1-3-billion-of-new-private-capital


Processing URLs:  19%|█▊        | 186/1000 [07:37<44:18,  3.27s/it]  

Error extracting text from https://www.tesla.com/support/autopilot: 403 Client Error: Forbidden for url: https://www.tesla.com/support/autopilot


Processing URLs:  19%|█▉        | 191/1000 [07:52<35:44,  2.65s/it]

Error extracting text from http://www.mti.gov.eg/English/MediaCenter/News/Pages/Kabil-discusses-mutual-cooperation-with-Greek-economy-minister.aspx: 403 Client Error: Forbidden for url: http://www.mti.gov.eg/English/MediaCenter/News/Pages/Kabil-discusses-mutual-cooperation-with-Greek-economy-minister.aspx


Processing URLs:  20%|█▉        | 195/1000 [08:13<1:27:32,  6.52s/it]

Error extracting text from http://psychology.about.com/od/cognitivepsychology/a/left-brain-right-brain.htm: 406 Client Error: Not Acceptable for url: https://www.verywellmind.com/left-brain-vs-right-brain-2795005


Processing URLs:  20%|█▉        | 197/1000 [08:16<54:11,  4.05s/it]  

Error extracting text from http://news.xinhuanet.com/english/2016-04/27/c_135314536.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-04/27/c_135314536.htm


Processing URLs:  20%|█▉        | 198/1000 [08:19<47:35,  3.56s/it]

Error extracting text from http://www.agathocledesyracuse.com/archives/691: HTTPConnectionPool(host='www.agathocledesyracuse.com', port=80): Max retries exceeded with url: /archives/691 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe656b40>: Failed to resolve 'www.agathocledesyracuse.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  20%|██        | 200/1000 [08:22<33:34,  2.52s/it]

Error extracting text from https://www.afghanistan-analysts.org/afghanistan-election-conundrum-1-political-pressure-on-commissioners-puts-2018-vote-in-doubt/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/afghanistan-election-conundrum-1-political-pressure-on-commissioners-puts-2018-vote-in-doubt/


Processing URLs:  20%|██        | 201/1000 [08:22<25:49,  1.94s/it]

Error extracting text from http://thehill.com/homenews/house/261581-ryan-eyes-strategy-for-shutdown-fight: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/261581-ryan-eyes-strategy-for-shutdown-fight/


Processing URLs:  20%|██        | 203/1000 [08:25<22:14,  1.67s/it]

Error extracting text from http://www.state.gov/t/avc/c42328.htm: 404 Client Error: Not Found for url: https://www.state.gov/t/avc/c42328.htm
Error extracting text from http://ir.teslamotors.com/releasedetail.cfm?releaseid=920434: HTTPConnectionPool(host='ir.teslamotors.com', port=80): Max retries exceeded with url: /releasedetail.cfm?releaseid=920434 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe449a30>: Failed to resolve 'ir.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  20%|██        | 205/1000 [08:27<16:32,  1.25s/it]

Error extracting text from https://www.whitehouse.gov/the-press-office/2017/01/24/presidential-memorandum-regarding-construction-dakota-access-pipeline: 404 Client Error: Not Found for url: https://www.whitehouse.gov/the-press-office/2017/01/24/presidential-memorandum-regarding-construction-dakota-access-pipeline


Processing URLs:  21%|██        | 206/1000 [08:27<14:31,  1.10s/it]

Error extracting text from http://www.businessinsider.com.au/pollsters-clash-over-brexit-predictions-2016-5?r=UK&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/pollsters-clash-over-brexit-predictions-2016-5?r=UK&amp;IR=T


Processing URLs:  21%|██        | 207/1000 [08:28<12:16,  1.08it/s]

Error extracting text from http://www.wsj.com/articles/nigerian-army-killed-347-shiite-muslims-during-clash-investigation-finds-1460493168: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nigerian-army-killed-347-shiite-muslims-during-clash-investigation-finds-1460493168


Processing URLs:  21%|██        | 211/1000 [08:41<31:53,  2.42s/it]

Error extracting text from http://nationalinterest.org/feature/tangled-web-corruption-strangling-moldova-17518: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/tangled-web-corruption-strangling-moldova-17518


Processing URLs:  21%|██▏       | 214/1000 [08:50<38:25,  2.93s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/russia-japanese-leader-abe-visit-meet-putin-38187549: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/russia-japanese-leader-abe-visit-meet-putin-38187549


Processing URLs:  22%|██▏       | 215/1000 [08:52<32:29,  2.48s/it]

Error extracting text from http://www.nytimes.com/2016/03/22/world/europe/deal-appears-to-curb-migrant-flow-but-greece-still-faces-uphill-effort.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/22/world/europe/deal-appears-to-curb-migrant-flow-but-greece-still-faces-uphill-effort.html


Processing URLs:  22%|██▏       | 216/1000 [08:52<25:14,  1.93s/it]

Error extracting text from http://www.nilc.org/issues/immigration-reform-and-executive-actions/united-states-v-state-of-texas/supreme-courts-tie-vote-means-dapa-daca/: 403 Client Error: Forbidden for url: http://www.nilc.org/issues/immigration-reform-and-executive-actions/united-states-v-state-of-texas/supreme-courts-tie-vote-means-dapa-daca/


Processing URLs:  22%|██▏       | 217/1000 [08:57<36:45,  2.82s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/atts-85-4-bn-deal-for-time-warner-wins-eu-thumbs-up/articleshow/57652505.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/atts-85-4-bn-deal-for-time-warner-wins-eu-thumbs-up/articleshow/57652505.cms


Processing URLs:  22%|██▏       | 220/1000 [09:15<52:13,  4.02s/it]  

Error extracting text from http://www.ibtimes.co.uk/assad-forces-make-crucial-gains-southern-aleppo-fighting-rages-1588017: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/assad-forces-make-crucial-gains-southern-aleppo-fighting-rages-1588017


Processing URLs:  22%|██▏       | 222/1000 [09:17<30:18,  2.34s/it]

Error extracting text from http://greece.greekreporter.com/2016/02/27/imf-predicts-difficulties-in-greeces-debt-management/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/02/27/imf-predicts-difficulties-in-greeces-debt-management/


Processing URLs:  22%|██▏       | 223/1000 [09:17<22:23,  1.73s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-idUSKBN15P12V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-idUSKBN15P12V


Processing URLs:  23%|██▎       | 226/1000 [09:27<40:55,  3.17s/it]

Error extracting text from http://www.asean.org/images/2015/August/47th-aem/10%20-%20JMS%20RCEP%203%20MM%20-%20Final%2020150824rev.pdf: 404 Client Error: Not Found for url: https://asean.org/images/2015/August/47th-aem/10%20-%20JMS%20RCEP%203%20MM%20-%20Final%2020150824rev.pdf


Processing URLs:  23%|██▎       | 229/1000 [09:33<27:51,  2.17s/it]

Error extracting text from https://www.nytimes.com/2020/12/23/business/china-european-union-united-states.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/23/business/china-european-union-united-states.html


Processing URLs:  23%|██▎       | 230/1000 [09:35<24:57,  1.94s/it]

Error extracting text from https://dl.dropboxusercontent.com/u/238511/papers/gov.uscourts.dcd.178502.19.0.pdf: 404 Client Error: Not Found for url: https://dl.dropboxusercontent.com/u/238511/papers/gov.uscourts.dcd.178502.19.0.pdf


Processing URLs:  23%|██▎       | 231/1000 [09:35<19:10,  1.50s/it]

Error extracting text from http://www.wsj.com/articles/dollar-strengthens-on-u-s-economic-data-1448475082: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/dollar-strengthens-on-u-s-economic-data-1448475082


Processing URLs:  23%|██▎       | 232/1000 [09:36<14:20,  1.12s/it]

Error extracting text from https://www.nytimes.com/reuters/2017/04/11/world/europe/11reuters-usa-nato-montenegro.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/04/11/world/europe/11reuters-usa-nato-montenegro.html?_r=0


Processing URLs:  23%|██▎       | 234/1000 [09:46<34:39,  2.71s/it]

Error extracting text from http://www.reuters.com/article/us-southkorea-politics-idUSKBN17Q0AK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-idUSKBN17Q0AK


Processing URLs:  24%|██▎       | 237/1000 [09:52<29:45,  2.34s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-10/26/c_134751534.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-10/26/c_134751534.htm


Processing URLs:  24%|██▍       | 238/1000 [09:53<22:23,  1.76s/it]

Error extracting text from http://gamapserver.who.int/gho/interactive_charts/ncd/risk_factors/overweight/atlas.html: 403 Client Error: Forbidden for url: http://gamapserver.who.int/gho/interactive_charts/ncd/risk_factors/overweight/atlas.html


Processing URLs:  24%|██▍       | 241/1000 [09:57<20:25,  1.62s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-01-23/china-adopts-law-letting-coast-guard-fire-on-foreign-vessels


Processing URLs:  24%|██▍       | 243/1000 [09:58<11:45,  1.07it/s]

Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL1N1CU0T5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL1N1CU0T5


Processing URLs:  24%|██▍       | 245/1000 [10:00<11:52,  1.06it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-15/tesla-drops-as-morgan-stanley-warns-of-cash-burn-richer-rivals


Processing URLs:  25%|██▌       | 251/1000 [10:10<24:06,  1.93s/it]

Error extracting text from http://predictwise.com/politics/2016-governors: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016-governors


Processing URLs:  25%|██▌       | 252/1000 [10:10<18:30,  1.48s/it]

Error extracting text from http://www.majorityleader.gov/wp-content/uploads/2014/11/114thCongressFirstSession.pdf: 403 Client Error: Forbidden for url: http://www.majorityleader.gov/wp-content/uploads/2014/11/114thCongressFirstSession.pdf


Processing URLs:  25%|██▌       | 253/1000 [10:11<17:14,  1.38s/it]

Error extracting text from http://www.presstv.com/Detail/2015/02/12/397301/Russia-Venezuela-to-hold-joint-drills: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2015/02/12/397301/Russia-Venezuela-to-hold-joint-drills


Processing URLs:  26%|██▌       | 255/1000 [10:13<15:33,  1.25s/it]

Error extracting text from http://www.shanghaidaily.com/article/article_xinhua.aspx?id=329777: 404 Client Error: Not Found for url: http://www.shanghaidaily.com/article/article_xinhua.aspx?id=329777


Processing URLs:  26%|██▌       | 258/1000 [10:20<20:52,  1.69s/it]

Error extracting text from http://zeenews.india.com/news/india/eye-on-china-india-to-build-satellite-tracking-and-imaging-centre-in-vietnam_1848904.html: 403 Client Error: Forbidden for url: https://zeenews.india.com/news/india/eye-on-china-india-to-build-satellite-tracking-and-imaging-centre-in-vietnam_1848904.html


Processing URLs:  26%|██▌       | 260/1000 [10:21<14:12,  1.15s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics-merkel/europe-global-environment-mean-germany-needs-stable-government-merkel-says-idUSKBN1DR1HK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-merkel/europe-global-environment-mean-germany-needs-stable-government-merkel-says-idUSKBN1DR1HK


Processing URLs:  26%|██▋       | 263/1000 [10:24<11:25,  1.08it/s]

Error extracting text from http://www.fox5ny.com/news/local-news/105064499-story: 403 Client Error: Forbidden for url: http://www.fox5ny.com/news/local-news/105064499-story


Processing URLs:  27%|██▋       | 269/1000 [10:35<18:25,  1.51s/it]

Error extracting text from http://tame.healthspanpolicy.org/: 503 Server Error: Service Unavailable for url: http://tame.healthspanpolicy.org/


Processing URLs:  27%|██▋       | 272/1000 [10:38<13:20,  1.10s/it]

Error extracting text from https://www.nytimes.com/2015/08/28/world/migrants-refugees-europe-syria.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/08/28/world/migrants-refugees-europe-syria.html?_r=0


Processing URLs:  28%|██▊       | 275/1000 [10:41<13:14,  1.10s/it]

Error extracting text from https://www.iqt.org/: 403 Client Error: Forbidden for url: https://www.iqt.org/


Processing URLs:  28%|██▊       | 282/1000 [10:48<09:40,  1.24it/s]

Error extracting text from https://www.justice.gov/usao-ndca/page/file/1135066/download: 403 Client Error: Forbidden for url: https://www.justice.gov/usao-ndca/page/file/1135066/download


Processing URLs:  28%|██▊       | 283/1000 [10:52<21:41,  1.82s/it]

Error extracting text from http://www.fool.com/investing/general/2015/12/27/instant-analysis-apple-inc-and-ibms-enterprise-par.aspx: 500 Server Error: Internal Server Error for url: https://www.fool.com/investing/general/2015/12/27/instant-analysis-apple-inc-and-ibms-enterprise-par.aspx


Processing URLs:  28%|██▊       | 284/1000 [10:53<17:25,  1.46s/it]

Error extracting text from https://www.eurosport.com/tennis/roland-garros/2022/zinedine-zidane-backs-rafael-nadal-to-win-it-all-at-the-french-open-despite-younger-talent-being-pre_sto8953589/story-amp.shtml: 403 Client Error: Forbidden for url: https://www.eurosport.com/tennis/roland-garros/2022/zinedine-zidane-backs-rafael-nadal-to-win-it-all-at-the-french-open-despite-younger-talent-being-pre_sto8953589/story-amp.shtml


Processing URLs:  28%|██▊       | 285/1000 [10:54<15:24,  1.29s/it]

Error extracting text from http://www.japantimes.co.jp/news/2017/02/19/national/politics-diplomacy/senior-ldp-executive-komura-suggests-study-acquiring-pre-emptive-strike-capability/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2017/02/19/national/politics-diplomacy/senior-ldp-executive-komura-suggests-study-acquiring-pre-emptive-strike-capability/


Processing URLs:  29%|██▉       | 288/1000 [11:01<21:58,  1.85s/it]

URL filtered: https://www.youtube.com/watch?v=0Uc4DI-BF28


Processing URLs:  29%|██▉       | 291/1000 [11:05<20:00,  1.69s/it]

Error extracting text from http://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/249a3e52-aa2b-43c7-a1f5-96e07f94480a.pdf: 404 Client Error: Not Found for url: https://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/249a3e52-aa2b-43c7-a1f5-96e07f94480a.pdf
URL filtered: https://www.wsj.com/articles/japan-wants-u-s-back-in-the-tpp-it-will-likely-have-to-wait-11618570801?reflink=desktopwebshare_twitter


Processing URLs:  29%|██▉       | 293/1000 [11:06<15:07,  1.28s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7537588/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7537588/


Processing URLs:  29%|██▉       | 294/1000 [11:07<13:39,  1.16s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2016/08/17/0401000000AEN20160817010400315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: http://www.bloomberg.com/news/videos/2016-01-21/soros-china-hard-landing-is-practically-unavoidable


Processing URLs:  30%|██▉       | 296/1000 [11:09<12:38,  1.08s/it]

Error extracting text from http://www.atimes.com/article/new-silk-road-will-go-syria//: 404 Client Error: Not Found for url: https://atimes.com/article/new-silk-road-will-go-syria/


Processing URLs:  30%|██▉       | 297/1000 [11:10<12:51,  1.10s/it]

URL filtered: https://www.theguardian.com/politics/2019/jan/12/labour-set-to-trigger-vote-to-topple-theresa-may-government?CMP=twt_gu&__twitter_impression=true


Processing URLs:  30%|███       | 303/1000 [11:17<12:49,  1.10s/it]

Error extracting text from https://ajmbroadcasteducator.wordpress.com/2015/05/07/nate-silver-the-world-may-have-a-polling-problem/: 410 Client Error: Gone for url: https://ajmbroadcasteducator.wordpress.com/2015/05/07/nate-silver-the-world-may-have-a-polling-problem/


Processing URLs:  31%|███       | 307/1000 [11:21<11:15,  1.03it/s]

Error extracting text from https://www.google.com/amp/www.telegraph.co.uk/news/2017/01/08/nicola-sturgeon-not-bluffing-second-scottish-independence-referendum/amp/?client=safari: 404 Client Error: Not Found for url: https://www.telegraph.co.uk/news/2017/01/08/nicola-sturgeon-not-bluffing-second-scottish-independence-referendum/amp/


Processing URLs:  31%|███       | 308/1000 [11:21<08:51,  1.30it/s]

Error extracting text from https://www.nytimes.com/2018/02/09/world/asia/kim-yo-jong-history-facts.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/09/world/asia/kim-yo-jong-history-facts.html


Processing URLs:  31%|███       | 309/1000 [11:23<10:31,  1.09it/s]

Error extracting text from http://www.theepochtimes.com/n3/2156735-mass-purge-in-chinas-liaoning-province-advances-xis-political-aims/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2156735-mass-purge-in-chinas-liaoning-province-advances-xis-political-aims/


Processing URLs:  31%|███       | 312/1000 [11:25<09:25,  1.22it/s]

Error extracting text from https://www.npd.com/wps/portal/npd/us/news/press-releases/2020/one-in-four-books-in-the-us-is-purchased-during-the-holidays/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /wps/portal/npd/us/news/press-releases/2020/one-in-four-books-in-the-us-is-purchased-during-the-holidays/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))
URL filtered: https://www.youtube.com/watch?v=PC7iGFQ9ICQ


Processing URLs:  32%|███▏      | 315/1000 [11:29<13:48,  1.21s/it]

Error extracting text from http://mobile.nytimes.com/2016/01/17/world/middleeast/iran-sanctions-lifted-nuclear-deal.html?referer=: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/01/17/world/middleeast/iran-sanctions-lifted-nuclear-deal.html?referer=


Processing URLs:  32%|███▏      | 316/1000 [11:30<13:59,  1.23s/it]

Error extracting text from https://www.fda.gov/media/145801/download: 403 Client Error: Forbidden for url: https://www.fda.gov/media/145801/download


Processing URLs:  32%|███▏      | 317/1000 [11:31<13:04,  1.15s/it]

Error extracting text from https://finance.yahoo.com/chart/LIT?ltr=1#eyJtdWx0aUNvbG9yTGluZSI6ZmFsc2UsImJvbGxpbmdlclVwcGVyQ29sb3IiOiIjZTIwMDgxIiwiYm9sbGluZ2VyTG93ZXJDb2xvciI6IiM5NTUyZmYiLCJtZmlMaW5lQ29sb3IiOiIjNDVlM2ZmIiwibWFjZERpdmVyZ2VuY2VDb2xvciI6IiNmZjdiMTIiLCJtYWNkTWFjZENvbG9yIjoiIzc4N2Q4MiIsIm1hY2RTaWduYWxDb2xvciI6IiMwMDAwMDAiLCJyc2lMaW5lQ29sb3IiOiIjZmZiNzAwIiwic3RvY2hLTGluZUNvbG9yIjoiI2ZmYjcwMCIsInN0b2NoRExpbmVDb2xvciI6IiM0NWUzZmYiLCJyYW5nZSI6IjV5IiwiYWxsb3dDaGFydFN0YWNraW5nIjp0cnVlfQ%3D%3D: 404 Client Error: Not Found for url: https://finance.yahoo.com/chart/LIT?ltr=1#eyJtdWx0aUNvbG9yTGluZSI6ZmFsc2UsImJvbGxpbmdlclVwcGVyQ29sb3IiOiIjZTIwMDgxIiwiYm9sbGluZ2VyTG93ZXJDb2xvciI6IiM5NTUyZmYiLCJtZmlMaW5lQ29sb3IiOiIjNDVlM2ZmIiwibWFjZERpdmVyZ2VuY2VDb2xvciI6IiNmZjdiMTIiLCJtYWNkTWFjZENvbG9yIjoiIzc4N2Q4MiIsIm1hY2RTaWduYWxDb2xvciI6IiMwMDAwMDAiLCJyc2lMaW5lQ29sb3IiOiIjZmZiNzAwIiwic3RvY2hLTGluZUNvbG9yIjoiI2ZmYjcwMCIsInN0b2NoRExpbmVDb2xvciI6IiM0NWUzZmYiLCJyYW5nZSI6IjV5IiwiYWxsb3dDaGFydFN0YWNraW5nIj

Processing URLs:  32%|███▏      | 322/1000 [11:40<19:06,  1.69s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-iran-idUSKBN15L15C?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-iran-idUSKBN15L15C?il=0


Processing URLs:  32%|███▏      | 323/1000 [11:41<17:10,  1.52s/it]

Error extracting text from http://news.sky.com/story/1646670/is-benefits-from-rise-in-russian-airstrikes: 404 Client Error: Not Found for url: https://news.sky.com/story/1646670/is-benefits-from-rise-in-russian-airstrikes


Processing URLs:  32%|███▏      | 324/1000 [11:42<13:26,  1.19s/it]

Error extracting text from https://beta.finance.yahoo.com/news/u-november-employment-report-seen-052252580.html?ltr=1: 400 Client Error: Invalid HTTP Request for url: https://beta.finance.yahoo.com/news/u-november-employment-report-seen-052252580.html?ltr=1


Processing URLs:  33%|███▎      | 327/1000 [11:45<11:00,  1.02it/s]

Error extracting text from https://www.nytimes.com/2021/04/13/us/politics/afghanistan-terrorism-threat.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/13/us/politics/afghanistan-terrorism-threat.html


Processing URLs:  33%|███▎      | 330/1000 [11:48<09:28,  1.18it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.bbc.com/portuguese/noticias/2016/03/160310_protestos_convencao_ms&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.bbc.com/portuguese/noticias/2016/03/160310_protestos_convencao_ms&amp;prev=search


Processing URLs:  33%|███▎      | 331/1000 [11:48<08:17,  1.34it/s]

Error extracting text from https://www.ipsos.com/en-za/south-african-youth-concerned-about-effects-downgrade-status: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-za/south-african-youth-concerned-about-effects-downgrade-status


Processing URLs:  33%|███▎      | 333/1000 [11:55<21:18,  1.92s/it]

Error extracting text from https://www.myanmar-now.org/en/news: 404 Client Error: Not Found for url: https://myanmar-now.org/en/news


Processing URLs:  33%|███▎      | 334/1000 [11:55<16:36,  1.50s/it]

Error extracting text from http://allafrica.com/stories/201706190040.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201706190040.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ff930440>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  34%|███▍      | 338/1000 [12:01<16:17,  1.48s/it]

Error extracting text from https://www.caracaschronicles.com/2017/11/02/debt-restructuring-thats-really-default-thats-really-giant-money-laundering-operation/: 403 Client Error: Forbidden for url: https://www.caracaschronicles.com/2017/11/02/debt-restructuring-thats-really-default-thats-really-giant-money-laundering-operation/


Processing URLs:  34%|███▍      | 340/1000 [12:04<16:06,  1.46s/it]

Error extracting text from https://reut.rs/3hONVLS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/rwanda-says-will-start-deploying-troops-mozambique-2021-07-09/


Processing URLs:  34%|███▍      | 341/1000 [12:05<15:35,  1.42s/it]

Error extracting text from http://inhomelandsecurity.com/january-terror-threat-snapshot-2016/: 403 Client Error: Forbidden for url: https://amuedge.com/january-terror-threat-snapshot-2016/


Processing URLs:  35%|███▌      | 350/1000 [12:14<07:32,  1.44it/s]

Error extracting text from http://www.wsj.com/articles/apple-earnings-lifted-by-iphone-sales-in-china-1445977904: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/apple-earnings-lifted-by-iphone-sales-in-china-1445977904
Error extracting text from http://www.business-standard.com/article/news-ians/seoul-warns-officials-of-cyber-attacks-from-pyongyang-116030200529_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/seoul-warns-officials-of-cyber-attacks-from-pyongyang-116030200529_1.html


Processing URLs:  35%|███▌      | 351/1000 [12:15<07:10,  1.51it/s]

Error extracting text from http://www.hybridcars.com/china-takes-lead-as-number-one-in-plug-in-vehicle-sales/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/china-takes-lead-as-number-one-in-plug-in-vehicle-sales/


Processing URLs:  35%|███▌      | 352/1000 [12:15<07:20,  1.47it/s]

Error extracting text from https://www.amazon.com/Superforecasting-Prediction-Philip-E-Tetlock/dp/0804136718/ref=sr_1_1?ie=UTF8&amp;qid=1477755123&amp;sr=8-1&amp;keywords=superforecasting: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Superforecasting-Prediction-Philip-E-Tetlock/dp/0804136718/ref=sr_1_1?ie=UTF8&amp;qid=1477755123&amp;sr=8-1&amp;keywords=superforecasting


Processing URLs:  35%|███▌      | 353/1000 [12:17<11:47,  1.09s/it]

Error extracting text from http://www.malaysia-chronicle.com/index.php?option=com_k2&amp;view=item&amp;id=610570:the-question-burning-malaysia-will-the-ringgit-have-to-hit-500-before-teflon-pm-najib-can-be-kicked-out&amp;Itemid=2: HTTPConnectionPool(host='www.malaysia-chronicle.com', port=80): Max retries exceeded with url: /index.php?option=com_k2&amp;view=item&amp;id=610570:the-question-burning-malaysia-will-the-ringgit-have-to-hit-500-before-teflon-pm-najib-can-be-kicked-out&amp;Itemid=2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffa51220>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  36%|███▌      | 355/1000 [12:19<10:49,  1.01s/it]

Error extracting text from http://www.nti.org/about/global-nuclear-policy/?subject=global-nuclear-policy: 403 Client Error: Forbidden for url: https://www.nti.org/about/global-nuclear-policy/?subject=global-nuclear-policy


Processing URLs:  36%|███▌      | 357/1000 [12:21<08:39,  1.24it/s]

Error extracting text from http://carnegieendowment.org/syriaincrisis/?fa=61941: 403 Client Error: Forbidden for url: http://carnegieendowment.org/syriaincrisis/?fa=61941


Processing URLs:  36%|███▌      | 361/1000 [12:25<10:33,  1.01it/s]

Error extracting text from http://www.newsweek.com/russia-vetoes-un-demand-end-aleppo-slaughter-507881?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/russia-vetoes-un-demand-end-aleppo-slaughter-507881?rx=us


Processing URLs:  36%|███▋      | 364/1000 [12:30<15:04,  1.42s/it]

Error extracting text from https://missilethreat.csis.org/missile/hwasong-15-kn-22/: 403 Client Error: Forbidden for url: https://missilethreat.csis.org/missile/hwasong-15-kn-22/


Processing URLs:  36%|███▋      | 365/1000 [12:31<14:48,  1.40s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/damascus-uneasy-stability-boosts-syrias-assad-38617948: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/damascus-uneasy-stability-boosts-syrias-assad-38617948


Processing URLs:  37%|███▋      | 366/1000 [12:33<14:07,  1.34s/it]

Error extracting text from https://www.vhha.com/communications/virginia-hospital-covid-19-data-dashboard/: 403 Client Error: Forbidden for url: https://www.vhha.com/communications/virginia-hospital-covid-19-data-dashboard/


Processing URLs:  37%|███▋      | 368/1000 [12:35<11:35,  1.10s/it]

Error extracting text from http://www.reuters.com/article/2015/11/25/us-opec-meeting-idUSKBN0TE0I020151125: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/25/us-opec-meeting-idUSKBN0TE0I020151125


Processing URLs:  37%|███▋      | 372/1000 [12:40<11:59,  1.15s/it]

Error extracting text from http://www.balkans.com/open-news.php?uniquenumber=210334: 404 Client Error: Not Found for url: http://www.balkans.com/open-news.php?uniquenumber=210334
URL filtered: http://www.bloomberg.com/politics/articles/2015-12-12/cruz-soars-to-front-of-the-pack-in-iowa-poll-trump-support-stays-flat-ii3p88rp
URL filtered: http://www.bloomberg.com/news/articles/2016-01-23/u-s-russia-said-to-near-compromise-to-unlock-syria-peace-talks


Processing URLs:  38%|███▊      | 376/1000 [12:41<06:13,  1.67it/s]

Error extracting text from http://www.nytimes.com/aponline/2016/03/09/world/asia/ap-as-myanmar-politics.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/03/09/world/asia/ap-as-myanmar-politics.html


Processing URLs:  38%|███▊      | 379/1000 [12:44<07:34,  1.37it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-exclu-idUSKBN17B124: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-exclu-idUSKBN17B124


Processing URLs:  38%|███▊      | 384/1000 [12:49<08:45,  1.17it/s]

Error extracting text from http://www.wsj.com/articles/panama-canal-administrator-expects-new-locks-to-open-on-time-1444653273: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/panama-canal-administrator-expects-new-locks-to-open-on-time-1444653273
URL filtered: https://www.bloomberg.com/news/articles/2017-08-25/trump-targets-new-venezuelan-debt-in-latest-round-of-sanctions


Processing URLs:  39%|███▉      | 388/1000 [12:55<14:21,  1.41s/it]

Error extracting text from https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/February/REINZ%20Monthly%20Property%20Report%20-%20February%202021.pdf: 404 Client Error: Not Found for url: https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/February/REINZ%20Monthly%20Property%20Report%20-%20February%202021.pdf


Processing URLs:  39%|███▉      | 390/1000 [12:58<13:47,  1.36s/it]

Error extracting text from http://english.alarabiya.net/en/views/news/middle-east/2016/05/15/Killing-of-Hezbollah-s-Badreddine-points-to-open-score-settling-in-Syria-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/views/news/middle-east/2016/05/15/Killing-of-Hezbollah-s-Badreddine-points-to-open-score-settling-in-Syria-.html


Processing URLs:  39%|███▉      | 391/1000 [13:00<15:15,  1.50s/it]

Error extracting text from http://politico.pro/1ZOXknD: HTTPConnectionPool(host='politico.pro', port=80): Max retries exceeded with url: /1ZOXknD (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff820080>: Failed to resolve 'politico.pro' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  39%|███▉      | 392/1000 [13:03<20:20,  2.01s/it]

Error extracting text from http://vestnikkavkaza.net/news/King-of-Bahrain-to-go-to-Moscow-Monday-for-talks-with-Putin.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/news/King-of-Bahrain-to-go-to-Moscow-Monday-for-talks-with-Putin.html


Processing URLs:  39%|███▉      | 393/1000 [13:04<17:38,  1.74s/it]

Error extracting text from http://www.atlanticcouncil.org/images/publications/Arming_for_Deterrence_web_0719.pdf: 404 Client Error: Not Found for url: https://www.atlanticcouncil.org/images/publications/Arming_for_Deterrence_web_0719.pdf


Processing URLs:  39%|███▉      | 394/1000 [13:05<14:11,  1.41s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/gop-primaries/271790-priebus-contested-convention-is-unlikely: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/gop-primaries/271790-priebus-contested-convention-is-unlikely/


Processing URLs:  40%|███▉      | 398/1000 [13:09<10:21,  1.03s/it]

Error extracting text from https://www.neweurope.eu/article/polish-coal-policy-unloved-in-warsaw-disliked-in-brussels/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/polish-coal-policy-unloved-in-warsaw-disliked-in-brussels/


Processing URLs:  40%|███▉      | 399/1000 [13:09<08:19,  1.20it/s]

Error extracting text from http://www.el-nacional.com/noticias/economia/venezuela-riesgo-caer-default_210446: 403 Client Error: Forbidden for url: https://www.elnacional.com/noticias/economia/venezuela-riesgo-caer-default_210446


Processing URLs:  40%|████      | 400/1000 [13:09<07:24,  1.35it/s]

Error extracting text from http://www.govtech.com/dc/articles/Even-Einstein-Couldnt-Fix-Cybersecurity.html: 403 Client Error: Forbidden for url: https://www.govtech.com/dc/articles/Even-Einstein-Couldnt-Fix-Cybersecurity.html


Processing URLs:  40%|████      | 401/1000 [13:10<07:13,  1.38it/s]

Error extracting text from https://www.defense.gov/Portals/1/Documents/pubs/June_2017_1225_Report_to_Congress.pdf: 403 Client Error: Forbidden for url: https://www.defense.gov/Portals/1/Documents/pubs/June_2017_1225_Report_to_Congress.pdf


Processing URLs:  40%|████      | 405/1000 [13:13<05:46,  1.72it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-nafta-idUSKBN15H24R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-nafta-idUSKBN15H24R
Error extracting text from http://www.realclearpolitics.com/articles/2013/01/24/ashley_judd_challenge_from_right_may_confront_mcconnell_116774.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2013/01/24/ashley_judd_challenge_from_right_may_confront_mcconnell_116774.html
Error extracting text from http://www.reuters.com/article/us-britain-eu-davis/eu-agreed-no-sum-needed-to-move-talks-forward-british-brexit-minister-idUSKBN1DC0EE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-davis/eu-agreed-no-sum-needed-to-move-talks-forward-british-brexit-minister-idUSKBN1DC0EE


Processing URLs:  41%|████      | 409/1000 [13:18<09:03,  1.09it/s]

Error extracting text from http://mobile.nytimes.com/2016/09/18/world/middleeast/his-position-still-secure-bashar-al-assad-smiles-as-syria-burns.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/09/18/world/middleeast/his-position-still-secure-bashar-al-assad-smiles-as-syria-burns.html
Error extracting text from http://www.nytimes.com/2016/10/06/world/americas/antonio-guterres-un-secretary-general-united-nations.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/06/world/americas/antonio-guterres-un-secretary-general-united-nations.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  41%|████      | 412/1000 [13:21<08:22,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-idUSKCN0YU077: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-idUSKCN0YU077


Processing URLs:  41%|████▏     | 414/1000 [13:22<07:57,  1.23it/s]

Error extracting text from http://thehill.com/homenews/senate/359381-senate-dems-introduce-bill-to-ban-assault-weapons-bump-stocks: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/359381-senate-dems-introduce-bill-to-ban-assault-weapons-bump-stocks/


Processing URLs:  42%|████▏     | 415/1000 [13:23<08:07,  1.20it/s]

Error extracting text from https://www.icrc.org/en/document/war-hunger-south-sudan-somalia-yemen-nigeria-conflict-food: 403 Client Error: Forbidden for url: https://www.icrc.org/en/document/war-hunger-south-sudan-somalia-yemen-nigeria-conflict-food
Error extracting text from http://espn.go.com/nfl/story/_/id/7502396/commissioner-roger-goodell-gets-extension-2018-season: 403 Client Error: Forbidden for url: http://espn.go.com/nfl/story/_/id/7502396/commissioner-roger-goodell-gets-extension-2018-season


Processing URLs:  42%|████▏     | 422/1000 [13:36<09:05,  1.06it/s]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/11/peaceful-pro-biafra-activists-killed-in-chilling-crackdown/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/11/peaceful-pro-biafra-activists-killed-in-chilling-crackdown/
URL filtered: https://www.youtube.com/watch?v=pLHZHBivIiA
Error extracting text from http://www.nytimes.com/reuters/2016/08/14/world/middleeast/14reuters-mideast-crisis-iraq-mosul.html?ref=world: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/08/14/world/middleeast/14reuters-mideast-crisis-iraq-mosul.html?ref=world


Processing URLs:  42%|████▏     | 423/1000 [13:37<10:47,  1.12s/it]

Error extracting text from http://www.lloydslist.com/ll/sector/article474922.ece: 404 Client Error: Page not found for url: https://www.lloydslist.com:443/ll/sector/article474922.ece


Processing URLs:  43%|████▎     | 426/1000 [13:41<10:12,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-taiwan-defence-idUSKBN1690H4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-taiwan-defence-idUSKBN1690H4


Processing URLs:  43%|████▎     | 430/1000 [13:45<10:14,  1.08s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-politics-idUSKCN1B60KA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKCN1B60KA


Processing URLs:  43%|████▎     | 432/1000 [13:46<06:59,  1.36it/s]

Error extracting text from http://www.japantimes.co.jp/news/2015/10/14/national/politics-diplomacy/rouhani-conveys-kishida-invite-abe-visit-iran/#.Vh7OFek0r04: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/10/14/national/politics-diplomacy/rouhani-conveys-kishida-invite-abe-visit-iran/#.Vh7OFek0r04
URL filtered: http://statusmind.com/clever-facebook-status-2570/


Processing URLs:  44%|████▎     | 437/1000 [13:53<09:58,  1.06s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/hundreds-myanmar-activists-hold-flash-mob-protest-against-military-rule-2021-06-03/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/hundreds-myanmar-activists-hold-flash-mob-protest-against-military-rule-2021-06-03/


Processing URLs:  44%|████▍     | 438/1000 [13:55<13:11,  1.41s/it]

Error extracting text from https://polcms.secure.europarl.europa.eu/cmsdata/115748/IPOL_IDA(2017)583130_EN.pdf: 403 Client Error: Forbidden for url: https://telacms.europarl.europa.eu/cmsdata/115748/IPOL_IDA(2017)583130_EN.pdf


Processing URLs:  44%|████▍     | 440/1000 [13:57<11:24,  1.22s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/12/31/0401000000AEN20151231000200315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  44%|████▍     | 441/1000 [13:58<09:45,  1.05s/it]

Error extracting text from http://www.nasdaq.com/article/asian-shares-lose-momentum3rd-update-20151007-00004#ixzz3o3PgdPRj: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/asian-shares-lose-momentum3rd-update-20151007-00004#ixzz3o3PgdPRj


Processing URLs:  45%|████▍     | 446/1000 [14:07<14:22,  1.56s/it]

Error extracting text from https://www.asil.org/insights/volume/11/issue/28/canadian-made-drugs-rwanda-first-application-wto-waiver-patents-and: 500 Server Error: Internal Server Error Suspected DOS Attack for url: https://www.asil.org/insights/volume/11/issue/28/canadian-made-drugs-rwanda-first-application-wto-waiver-patents-and


Processing URLs:  45%|████▍     | 447/1000 [14:08<12:29,  1.35s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-north-carolina-governor-mccrory-vs-cooper: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-north-carolina-governor-mccrory-vs-cooper


Processing URLs:  45%|████▌     | 452/1000 [14:13<10:31,  1.15s/it]

Error extracting text from https://www.courtlistener.com/docket/4609586/1/waymo-llc-v-uber-technologies-inc/: 403 Client Error: Forbidden for url: https://www.courtlistener.com/docket/4609586/1/waymo-llc-v-uber-technologies-inc/


Processing URLs:  45%|████▌     | 454/1000 [14:16<11:48,  1.30s/it]

Error extracting text from http://www.acwa.com/news/conservation/california%E2%80%99s-conservation-continues-meet-cumulative-target: 403 Client Error: Forbidden for url: http://www.acwa.com/news/conservation/california%E2%80%99s-conservation-continues-meet-cumulative-target


Processing URLs:  46%|████▌     | 457/1000 [14:21<11:00,  1.22s/it]

Error extracting text from http://sofrep.com/47122/watch-rough-urban-parachute-landing-by-french-foreign-legion/: 500 Server Error: Internal Server Error for url: https://sofrep.com/news/watch-rough-urban-parachute-landing-by-french-foreign-legion/
Error extracting text from http://www.reuters.com/article/us-burundi-violence-un-idUSKCN0XM273: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-violence-un-idUSKCN0XM273


Processing URLs:  46%|████▌     | 461/1000 [14:29<17:51,  1.99s/it]

Error extracting text from http://ca.reuters.com/article/topNews/idCAKCN0UT201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  46%|████▌     | 462/1000 [14:30<13:06,  1.46s/it]

Error extracting text from https://www.dbs.com.sg/treasures/aics/GenericArticle.page?dcrPath=templatedata/article/generic/data/en/GR/022016/160205_economics_currency_effect_masks_export_reality.xml: 403 Client Error: Forbidden for url: https://www.dbs.com.sg/treasures/aics/GenericArticle.page?dcrPath=templatedata/article/generic/data/en/GR/022016/160205_economics_currency_effect_masks_export_reality.xml


Processing URLs:  46%|████▋     | 463/1000 [14:31<14:05,  1.57s/it]

Error extracting text from http://www.ibtimes.com/syrian-refugees-heading-europe-are-big-business-lebanons-travel-agents-2126557: 403 Client Error: Forbidden for url: https://www.ibtimes.com/syrian-refugees-heading-europe-are-big-business-lebanons-travel-agents-2126557


Processing URLs:  46%|████▋     | 465/1000 [14:33<10:42,  1.20s/it]

Error extracting text from http://chronicle.gi/2016/05/congress-puts-focus-on-russian-warships/: 503 Server Error: Service Temporarily Unavailable for url: http://chronicle.gi/2016/05/congress-puts-focus-on-russian-warships/
URL filtered: https://twitter.com/billatnapier/status/1342877839008411648?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1342935639663259648%7Ctwgr%5E%7Ctwcon%5Es2_&amp;ref_url=https%3A%2F%2Fwww.bbc.co.uk%2Fnews%2Ftechnology-55475433


Processing URLs:  47%|████▋     | 467/1000 [14:34<06:38,  1.34it/s]

Error extracting text from http://www.laprensasa.com/309_america-in-english/3677438_brazil-s-gov-t-with-lula-s-help-looks-to-resist-impeachment-push.html: 404 Client Error: Not Found for url: http://www.laprensasa.com/309_america-in-english/3677438_brazil-s-gov-t-with-lula-s-help-looks-to-resist-impeachment-push.html


Processing URLs:  47%|████▋     | 468/1000 [14:35<07:50,  1.13it/s]

Error extracting text from http://www.ibtimes.co.uk/abu-dhabi-national-oil-company-may-beat-saudi-aramco-mega-stock-floatation-1630052: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/abu-dhabi-national-oil-company-may-beat-saudi-aramco-mega-stock-floatation-1630052


Processing URLs:  47%|████▋     | 471/1000 [14:40<11:42,  1.33s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds-idUSKBN18S3JY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds-idUSKBN18S3JY?il=0


Processing URLs:  48%|████▊     | 475/1000 [14:46<15:08,  1.73s/it]

URL filtered: https://cryptopotato.com/sec-could-approve-a-bitcoin-futures-etf-by-october-says-bloomberg-strategist/


Processing URLs:  48%|████▊     | 478/1000 [14:51<14:11,  1.63s/it]

Error extracting text from https://en.el-balad.com/2359519: 404 Client Error: Not Found for url: https://en.el-balad.com/2359519


Processing URLs:  49%|████▊     | 487/1000 [15:17<28:07,  3.29s/it]

Error extracting text from http://www.wsj.com/articles/syria-calm-as-cease-fire-holds-1456566198: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syria-calm-as-cease-fire-holds-1456566198


Processing URLs:  49%|████▉     | 491/1000 [15:27<20:04,  2.37s/it]

Error extracting text from http://thehill.com/policy/energy-environment/335898-dakota-access-pipeline-now-in-service: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/335898-dakota-access-pipeline-now-in-service/


Processing URLs:  49%|████▉     | 494/1000 [15:29<10:46,  1.28s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-27/brazil-police-widen-petrobras-probe-to-offshore-money-laundering


Processing URLs:  50%|████▉     | 498/1000 [15:29<04:04,  2.05it/s]

Error extracting text from http://www.reuters.com/article/us-yemen-security-un-idUSKBN18Q24V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-un-idUSKBN18Q24V
Error extracting text from https://www.citizensjournal.us/nih-admits-funding-wuhan-experiment-resulting-in-gain-of-function/: 403 Client Error: Forbidden for url: https://www.citizensjournal.us/nih-admits-funding-wuhan-experiment-resulting-in-gain-of-function/


Processing URLs:  50%|████▉     | 499/1000 [15:30<03:15,  2.56it/s]

Error extracting text from http://www.nytimes.com/2016/08/25/world/asia/american-university-attack-kabul-afghanistan.html?src=twr&amp;smid=tw-nytimes&amp;smtyp=cur: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/25/world/asia/american-university-attack-kabul-afghanistan.html?src=twr&amp;smid=tw-nytimes&amp;smtyp=cur


Processing URLs:  50%|█████     | 500/1000 [15:31<05:50,  1.43it/s]

URL filtered: https://twitter.com/BritainElects/status/1388131356870430723


Processing URLs:  50%|█████     | 502/1000 [15:36<11:19,  1.37s/it]

Error extracting text from https://www.reuters.com/world/middle-east/blinken-visit-israel-west-bank-may-26-27-source-says-2021-05-22/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/blinken-visit-israel-west-bank-may-26-27-source-says-2021-05-22/


Processing URLs:  50%|█████     | 505/1000 [15:45<17:34,  2.13s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-biden/biden-will-move-to-extend-pause-on-student-loan-payments-transition-official-idUSKBN29D2WM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-biden/biden-will-move-to-extend-pause-on-student-loan-payments-transition-official-idUSKBN29D2WM


Processing URLs:  51%|█████     | 506/1000 [15:46<14:23,  1.75s/it]

Error extracting text from http://greece.greekreporter.com/2015/12/02/the-forgotten-greek-economic-crisis/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2015/12/02/the-forgotten-greek-economic-crisis/


Processing URLs:  51%|█████     | 511/1000 [15:51<08:26,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-russia-economy-idUSKCN0IU0HV20141110: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-economy-idUSKCN0IU0HV20141110


Processing URLs:  52%|█████▏    | 521/1000 [16:11<15:12,  1.91s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-09/spain-s-socialists-won-t-back-rajoy-s-bid-to-form-government


Processing URLs:  52%|█████▎    | 525/1000 [16:18<13:11,  1.67s/it]

URL filtered: https://twitter.com/ChinaDailyUSA


Processing URLs:  53%|█████▎    | 530/1000 [16:22<08:57,  1.14s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-08-25/russia-more-prey-than-predator-to-cyber-firm-wary-of-china


Processing URLs:  53%|█████▎    | 533/1000 [16:28<12:06,  1.56s/it]

Error extracting text from http://eng.mod.gov.cn/HomePicture/: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/HomePicture/


Processing URLs:  54%|█████▎    | 535/1000 [16:30<10:57,  1.41s/it]

Error extracting text from http://www.boerse-berlin.com/index.php/Bonds?isin=XS0460546798: 403 Client Error: Forbidden for url: https://www.boerse-berlin.com/index.php/Bonds?isin=XS0460546798


Processing URLs:  54%|█████▎    | 536/1000 [16:30<08:26,  1.09s/it]

Error extracting text from http://www.vanguardngr.com/2016/03/imn-drags-nigeria-army-icc-genocide/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/03/imn-drags-nigeria-army-icc-genocide/


Processing URLs:  54%|█████▍    | 538/1000 [16:33<07:53,  1.03s/it]

Error extracting text from http://www.wsj.com/articles/u-k-talks-on-eu-reform-enter-second-day-with-issues-still-unresolved-1455869486: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-k-talks-on-eu-reform-enter-second-day-with-issues-still-unresolved-1455869486


Processing URLs:  54%|█████▍    | 539/1000 [16:35<09:47,  1.27s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/north-korea-fires-missiles-into-sea-as-nations-ramp-up-toughersanctions/article29007664/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/north-korea-fires-missiles-into-sea-as-nations-ramp-up-toughersanctions/article29007664/
URL filtered: http://www.bloomberg.com/news/articles/2015-11-03/rousseff-s-most-powerful-foe-weakened-on-ethics-committee-probe


Processing URLs:  55%|█████▍    | 545/1000 [16:47<11:36,  1.53s/it]

Error extracting text from http://wwwnc.cdc.gov/travel/notices: 403 Client Error: Forbidden for url: http://wwwnc.cdc.gov/travel/notices


Processing URLs:  55%|█████▍    | 547/1000 [16:47<06:30,  1.16it/s]

Error extracting text from https://www.el19digital.com/articulos/ver/titulo:111872-mas-de-115-millones-de-dolares-adicionales-para-la-produccion-en-nicaragua: 403 Client Error: Forbidden for url: https://www.el19digital.com/articulos/ver/titulo:111872-mas-de-115-millones-de-dolares-adicionales-para-la-produccion-en-nicaragua
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN19I2YM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN19I2YM


Processing URLs:  55%|█████▌    | 551/1000 [16:53<10:12,  1.36s/it]

Error extracting text from http://www.un.org/en/sc/programme/: 403 Client Error: Forbidden for url: https://www.un.org/en/sc/programme/


Processing URLs:  55%|█████▌    | 553/1000 [16:55<08:22,  1.12s/it]

Error extracting text from https://www.amazon.com/Structured-Analytic-Techniques-Intelligence-Analysis/dp/1452241511/ref=mt_spiral_bound?_encoding=UTF8&amp;me=: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Structured-Analytic-Techniques-Intelligence-Analysis/dp/1452241511/ref=mt_spiral_bound?_encoding=UTF8&amp;me=


Processing URLs:  56%|█████▌    | 556/1000 [17:07<26:58,  3.65s/it]

Error extracting text from http://www.moonexpress.com/expeditions/: 404 Client Error: Not Found for url: https://moonexpress.com/expeditions/


Processing URLs:  56%|█████▌    | 557/1000 [17:09<21:58,  2.98s/it]

URL filtered: https://www.rferl.org/a/facebook-ads-run-by-russian-operatives-st-petersburg-troll-factor-actively-promoted-trump-agenda/28732621.html


Processing URLs:  56%|█████▌    | 560/1000 [17:14<15:48,  2.16s/it]

Error extracting text from https://www.nytimes.com/2017/06/19/world/asia/afghanistan-taliban-faction-renouncers.html?rref=collection%2Ftimestopic%2FTaliban&amp;action=click&amp;contentCollection=timestopics&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=5&amp;pgtype=collection: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/19/world/asia/afghanistan-taliban-faction-renouncers.html?rref=collection%2Ftimestopic%2FTaliban&amp;action=click&amp;contentCollection=timestopics&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=5&amp;pgtype=collection


Processing URLs:  57%|█████▋    | 566/1000 [17:29<12:32,  1.73s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-attack-idUSKBN0TH03Q20151128: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-attack-idUSKBN0TH03Q20151128
Error extracting text from http://www.reuters.com/article/us-britain-eu-schaeuble-idUSKBN15921T?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-schaeuble-idUSKBN15921T?mod=related&amp;channelName=worldNews


Processing URLs:  57%|█████▋    | 568/1000 [17:32<10:46,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-energy-summit-turkey-russia-idUSKCN12A244: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-energy-summit-turkey-russia-idUSKCN12A244


Processing URLs:  57%|█████▋    | 570/1000 [17:32<06:55,  1.03it/s]

Error extracting text from http://www.hybridcars.com/video-highlights-hand-made-in-japan-toyota-mirai-fcv-production/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/video-highlights-hand-made-in-japan-toyota-mirai-fcv-production/


Processing URLs:  57%|█████▋    | 573/1000 [17:35<07:20,  1.03s/it]

Error extracting text from http://www.nytimes.com/2013/10/13/world/europe/behind-flurry-of-killing-potency-of-hate.html?emc=edit_tnt_20131012&amp;tntemail0=y&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2013/10/13/world/europe/behind-flurry-of-killing-potency-of-hate.html?emc=edit_tnt_20131012&amp;tntemail0=y&amp;_r=0


Processing URLs:  57%|█████▊    | 575/1000 [17:40<10:45,  1.52s/it]

Error extracting text from http://www.reuters.com/article/us-usa-tax/senate-republicans-tie-tax-plan-to-repeal-of-key-obamacare-mandate-idUSKBN1DE2OF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax/senate-republicans-tie-tax-plan-to-repeal-of-key-obamacare-mandate-idUSKBN1DE2OF


Processing URLs:  58%|█████▊    | 580/1000 [18:00<26:25,  3.78s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0YV2HA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0YV2HA
Error extracting text from http://freenews.xyz/2016/01/13/naryshkin-in-june-of-2016-planning-to-visit-japan/: HTTPConnectionPool(host='freenews.xyz', port=80): Max retries exceeded with url: /2016/01/13/naryshkin-in-june-of-2016-planning-to-visit-japan/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2febe0320>: Failed to resolve 'freenews.xyz' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 583/1000 [18:06<19:11,  2.76s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/beijing-rejects-vietnam/2394844.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/beijing-rejects-vietnam/2394844.html


Processing URLs:  58%|█████▊    | 585/1000 [18:10<15:50,  2.29s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-31/venezuela-bond-holders-in-the-dark-as-to-status-of-pdvsa-payment


Processing URLs:  59%|█████▊    | 587/1000 [18:12<12:17,  1.79s/it]

Error extracting text from https://alphaglider.com/blog/2015/01/14/oil-antibiotics/: 404 Client Error: Not Found for url: https://www.alphaglider.com/blog/2015/01/14/oil-antibiotics/


Processing URLs:  60%|█████▉    | 595/1000 [18:40<35:04,  5.20s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/the-latest-rebel-shelling-kills-3-in-syrias-aleppo/2016/04/25/09462f04-0adb-11e6-bc53-db634ca94a2a_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/the-latest-rebel-shelling-kills-3-in-syrias-aleppo/2016/04/25/09462f04-0adb-11e6-bc53-db634ca94a2a_story.html


Processing URLs:  60%|█████▉    | 597/1000 [18:43<22:58,  3.42s/it]

Error extracting text from http://www.ndb.int/New-Development-ICICI-ink-preferred-partner-pact.php: 403 Client Error: Forbidden for url: https://www.ndb.int/New-Development-ICICI-ink-preferred-partner-pact.php


Processing URLs:  60%|█████▉    | 598/1000 [18:44<17:59,  2.69s/it]

URL filtered: https://twitter.com/jeremybowers/status/748880748241752064


Processing URLs:  60%|██████    | 604/1000 [18:50<08:28,  1.28s/it]

URL filtered: https://twitter.com/sandyleevincent/status/685982027183558656


Processing URLs:  61%|██████    | 607/1000 [18:52<06:41,  1.02s/it]

Error extracting text from https://obroncology.com/article/car-t-cell-therapy-on-fast-track-for-commercialization-in-2017/: 403 Client Error: Forbidden for url: https://www.obroncology.com/article/car-t-cell-therapy-on-fast-track-for-commercialization-in-2017/


Processing URLs:  61%|██████    | 610/1000 [18:55<05:06,  1.27it/s]

Error extracting text from http://www.scotsman.com/news/indyref2-what-would-a-separate-scotland-s-deficit-really-be-1-4367408: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/indyref2-what-would-a-separate-scotland-s-deficit-really-be-1-4367408
Error extracting text from http://www.reuters.com/article/2015/10/09/us-usa-congress-eximbank-idUSKCN0S321X20151009: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/09/us-usa-congress-eximbank-idUSKCN0S321X20151009


Processing URLs:  61%|██████▏   | 614/1000 [18:59<05:37,  1.14it/s]

Error extracting text from http://www.worldtribune.com/saudi-king-salman-hospitalized-after-erratic-behavior/: 403 Client Error: Forbidden for url: http://www.worldtribune.com/saudi-king-salman-hospitalized-after-erratic-behavior/


Processing URLs:  62%|██████▏   | 616/1000 [19:05<12:01,  1.88s/it]

Error extracting text from http://www.politicoscope.com/syria-over-70-syrians-want-assad-to-remain-in-power-poll/: HTTPConnectionPool(host='www.politicoscope.com', port=80): Max retries exceeded with url: /syria-over-70-syrians-want-assad-to-remain-in-power-poll/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2febe2e40>: Failed to resolve 'www.politicoscope.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  62%|██████▏   | 617/1000 [19:07<11:46,  1.85s/it]

Error extracting text from http://www.businessgreen.com/bg/news/3007567/vatican-urges-trump-to-rethink-climate-policy-attack: 500 Server Error: Internal Server Error for url: https://www.businessgreen.com/news/3007567/vatican-urges-trump-to-rethink-climate-policy-attack


Processing URLs:  62%|██████▏   | 618/1000 [19:08<11:22,  1.79s/it]

Error extracting text from http://allthingsnuclear.org/dwright/north-korea-is-launching-a-rocket-soon-what-do-we-know-about-it: 403 Client Error: Forbidden for url: https://blog.ucsusa.org/dwright/north-korea-is-launching-a-rocket-soon-what-do-we-know-about-it


Processing URLs:  62%|██████▏   | 620/1000 [19:13<14:07,  2.23s/it]

Error extracting text from http://www.scmagazine.com/nation-state-sponsored-malware-believed-to-target-european-electric-company/article/509341/: 404 Client Error: Not Found for url: https://www.scmagazine.com/news/nation-state-sponsored-malware-believed-to-target-european-electric-company


Processing URLs:  62%|██████▏   | 624/1000 [19:17<06:49,  1.09s/it]

Error extracting text from http://www.wsj.com/articles/russian-u-s-troops-edge-closer-in-northeastern-syria-1453505033: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russian-u-s-troops-edge-closer-in-northeastern-syria-1453505033


Processing URLs:  63%|██████▎   | 627/1000 [19:22<07:56,  1.28s/it]

URL filtered: https://www.youtube.com/watch?v=xniXLGW80Pg
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O2N1QM6S973101-5DKGS2LIP24GAS5JOCHUO5DEQU
Error extracting text from https://www.reuters.com/lifestyle/sports/china-says-us-diplomatic-boycott-winter-olympics-could-harm-co-operation-2021-12-07/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/china-says-us-diplomatic-boycott-winter-olympics-could-harm-co-operation-2021-12-07/


Processing URLs:  63%|██████▎   | 634/1000 [19:30<06:14,  1.02s/it]

Error extracting text from http://www.macrotrends.net/1369/crude-oil-price-history-chart: 403 Client Error: Forbidden for url: http://www.macrotrends.net/1369/crude-oil-price-history-chart


Processing URLs:  64%|██████▍   | 639/1000 [19:39<09:55,  1.65s/it]

Error extracting text from http://www.newindianexpress.com/world/2017/oct/22/china-president-xi-jinpings-top-aide-may-not-make-it-to-top-body-set-to-retire-1679925.html: 404 Client Error: Not Found for url: https://www.newindianexpress.com/world/2017/oct/22/china-president-xi-jinpings-top-aide-may-not-make-it-to-top-body-set-to-retire-1679925.html


Processing URLs:  64%|██████▍   | 642/1000 [19:42<06:51,  1.15s/it]

Error extracting text from http://www.scotsman.com/lifestyle/insight-how-much-can-vladimir-putin-influence-what-we-think-and-do-1-4617294: 403 Client Error: Forbidden for url: https://www.scotsman.com/lifestyle/insight-how-much-can-vladimir-putin-influence-what-we-think-and-do-1-4617294


Processing URLs:  64%|██████▍   | 643/1000 [19:44<06:52,  1.15s/it]

Error extracting text from http://nationalinterest.org/feature/tough-tougher-clinton-vs-trump-north-korea-17726: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/tough-tougher-clinton-vs-trump-north-korea-17726


Processing URLs:  65%|██████▍   | 646/1000 [20:00<22:09,  3.76s/it]

Error extracting text from https://www.johnkasich.com/blog-posts/kasich-campaign-statement-upcoming-primaries/: 404 Client Error: Not Found for url: https://www.johnkasich.com/blog-posts/kasich-campaign-statement-upcoming-primaries/


Processing URLs:  65%|██████▍   | 648/1000 [20:02<13:52,  2.37s/it]

Error extracting text from http://abcnews.go.com/Technology/wireStory/school-websites-hacked-show-pro-islamic-state-message-50990276: 404 Client Error: Not Found for url: https://abcnews.go.com/Technology/wireStory/school-websites-hacked-show-pro-islamic-state-message-50990276


Processing URLs:  65%|██████▌   | 651/1000 [20:03<06:07,  1.05s/it]

Error extracting text from https://www.wsj.com/articles/myanmar-doctors-risk-arrest-to-treat-covid-19-patients-in-secret-11628601972: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/myanmar-doctors-risk-arrest-to-treat-covid-19-patients-in-secret-11628601972
Error extracting text from http://www.nytimes.com/2016/12/08/us/politics/political-divide-on-campuses-hardens-after-trumps-victory.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/08/us/politics/political-divide-on-campuses-hardens-after-trumps-victory.html


Processing URLs:  65%|██████▌   | 652/1000 [20:06<08:23,  1.45s/it]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/geos/my.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook/geos/my.html


Processing URLs:  66%|██████▌   | 657/1000 [20:15<07:15,  1.27s/it]

Error extracting text from http://www.nytimes.com/2015/10/08/upshot/joe-biden-no-money-weak-polls-but-still-clintons-toughest-rival.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/08/upshot/joe-biden-no-money-weak-polls-but-still-clintons-toughest-rival.html
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://fatoonline.com.br/conteudo/16386/disputa-pela-lideranca-do-pmdb-reflete-controle-do-impeachment%3For%3Dh-not%26p%3Dl%26i%3D3%26v%3D0&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://fatoonline.com.br/conteudo/16386/disputa-pela-lideranca-do-pmdb-reflete-controle-do-impeachment%3For%3Dh-not%26p%3Dl%26i%3D3%26v%3D0&amp;prev=search
Error extracting text from http://www.nytimes.com/2015/10/12/world/middleeast/iran-tests-long-range-missile-possibly-violating-nuclear-accord.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/1

Processing URLs:  66%|██████▌   | 661/1000 [20:22<08:45,  1.55s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKCN0VO0H2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKCN0VO0H2


Processing URLs:  66%|██████▋   | 665/1000 [20:25<05:38,  1.01s/it]

Error extracting text from https://www.wsj.com/articles/british-leader-theresa-may-talks-down-scotlands-exit-as-brexit-looms-1488552723: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/british-leader-theresa-may-talks-down-scotlands-exit-as-brexit-looms-1488552723


Processing URLs:  67%|██████▋   | 667/1000 [20:30<09:21,  1.69s/it]

Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/merck-says-us-govt-buy-about-17-mln-courses-cos-covid-19-drug-2021-06-09/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/merck-says-us-govt-buy-about-17-mln-courses-cos-covid-19-drug-2021-06-09/


Processing URLs:  67%|██████▋   | 669/1000 [20:31<06:36,  1.20s/it]

Error extracting text from https://advance.lexis.com/document/?pdmfid=1000516&amp;crid=8f828df5-f836-46bc-ba27-58da3a16e18e&amp;pddocfullpath=%2Fshared%2Fdocument%2Fnews%2Furn%3AcontentItem%3A5KRK-X471-JCMN-Y2X2-00000-00&amp;pddocid=urn%3AcontentItem%3A5KRK-X471: 403 Client Error: Forbidden for url: https://advance.lexis.com/document/?pdmfid=1000516&amp;crid=8f828df5-f836-46bc-ba27-58da3a16e18e&amp;pddocfullpath=%2Fshared%2Fdocument%2Fnews%2Furn%3AcontentItem%3A5KRK-X471-JCMN-Y2X2-00000-00&amp;pddocid=urn%3AcontentItem%3A5KRK-X471


Processing URLs:  67%|██████▋   | 670/1000 [20:33<06:52,  1.25s/it]

Error extracting text from https://shar.es/1IlqAx: 500 Server Error: Internal Server Error for url: https://shar.es/1IlqAx/


Processing URLs:  68%|██████▊   | 678/1000 [20:43<06:26,  1.20s/it]

Error extracting text from https://www.reuters.com/article/us-brazil-politics/bolsonaro-allies-set-to-win-control-of-brazils-congress-idUSKBN29N1V6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics/bolsonaro-allies-set-to-win-control-of-brazils-congress-idUSKBN29N1V6


Processing URLs:  68%|██████▊   | 681/1000 [20:46<05:05,  1.04it/s]

Error extracting text from https://www.yahoo.com/news/poland-pm-says-never-bow-eu-ultimatum-105140745.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/poland-pm-says-never-bow-eu-ultimatum-105140745.html


Processing URLs:  68%|██████▊   | 683/1000 [20:48<05:03,  1.04it/s]

Error extracting text from http://mobile.nytimes.com/2016/12/31/us/politics/donald-trump-russia-hacking.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/12/31/us/politics/donald-trump-russia-hacking.html


Processing URLs:  69%|██████▉   | 689/1000 [21:07<10:28,  2.02s/it]

Error extracting text from https://www.ajpmonline.org/article/S0749-3797(20)30284-1/fulltext: 403 Client Error: Forbidden for url: https://www.ajpmonline.org/article/S0749-3797(20)30284-1/fulltext


Processing URLs:  69%|██████▉   | 690/1000 [21:09<09:29,  1.84s/it]

Error extracting text from https://www.newspressnow.com/news/national_news/coronavirus/brazils-bolsonaro-recovers-in-hospital-no-surgery-planned/article_85097e19-cd5e-549b-bbd7-a8adf66bcd3e.html: 404 Client Error: Not Found for url: https://www.newspressnow.com/news/national_news/coronavirus/brazils-bolsonaro-recovers-in-hospital-no-surgery-planned/article_85097e19-cd5e-549b-bbd7-a8adf66bcd3e.html


Processing URLs:  69%|██████▉   | 692/1000 [21:11<08:13,  1.60s/it]

URL filtered: https://twitter.com/natesilver538/status/1105181997792747520
URL filtered: https://www.spytalk.co/p/high-level-chinese-defection-rumored?r=2hta&amp;utm_campaign=post&amp;utm_medium=web&amp;utm_source=linkedin


Processing URLs:  70%|██████▉   | 695/1000 [21:15<07:18,  1.44s/it]

Error extracting text from http://post.nyssa.org/nyssa-news/2010/05/whither-efficient-markets-efficient-market-theory-and-behavioral-finance.html: 404 Client Error: Not Found for url: https://cfany.org/nyssa-news/2010/05/whither-efficient-markets-efficient-market-theory-and-behavioral-finance.html


Processing URLs:  70%|██████▉   | 697/1000 [21:18<07:18,  1.45s/it]

Error extracting text from http://uk.reuters.com/article/uk-germany-election-poll-idUKKBN19J0EB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://www.youtube.com/watch?v=U46di6mKO4s


Processing URLs:  70%|███████   | 701/1000 [21:21<04:46,  1.04it/s]

Error extracting text from http://mashable.com/2017/11/09/ford-china-electric-cars/#0jeIqfosPOqw: 404 Client Error: Not Found for url: https://mashable.com/2017/11/09/ford-china-electric-cars/#0jeIqfosPOqw


Processing URLs:  70%|███████   | 705/1000 [21:32<08:47,  1.79s/it]

Error extracting text from https://in.rbth.com/tag/russian%20navy: HTTPSConnectionPool(host='in.rbth.com', port=443): Max retries exceeded with url: /tag/russian%20navy (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301cd18b0>: Failed to resolve 'in.rbth.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/EigenCori/status/1497312849298923527


Processing URLs:  71%|███████   | 709/1000 [21:37<07:11,  1.48s/it]

Error extracting text from http://reut.rs/1PD8R6E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/04/us-opec-meeting-idUSKBN0TM30B20151204


Processing URLs:  72%|███████▏  | 717/1000 [21:51<10:21,  2.20s/it]

Error extracting text from http://space.mit.edu/home/tegmark/dimensions.pdf: HTTPSConnectionPool(host='space.mit.edu', port=443): Max retries exceeded with url: /home/tegmark/dimensions.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  72%|███████▏  | 720/1000 [21:53<05:50,  1.25s/it]

Error extracting text from http://www.ibtimes.com/russia-sends-missile-defense-systems-syria-amid-fears-its-planes-could-be-hijacked-2170420: 403 Client Error: Forbidden for url: https://www.ibtimes.com/russia-sends-missile-defense-systems-syria-amid-fears-its-planes-could-be-hijacked-2170420
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-russell-idUSKBN16K2H3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-russell-idUSKBN16K2H3


Processing URLs:  72%|███████▏  | 724/1000 [22:03<08:24,  1.83s/it]

Error extracting text from http://www.topspeed.com/cars/car-news/study-shows-that-electric-cars-could-take-over-wealthy-cities-as-early-as-2030-ar174822.html: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  73%|███████▎  | 726/1000 [22:03<04:29,  1.02it/s]

Error extracting text from http://orient-news.net/en/news_show/119459/0/Assads-dream-will-not-come-true: 403 Client Error: Forbidden for url: https://orient-news.net/en/news_show/119459/0/Assads-dream-will-not-come-true
Error extracting text from http://www.reuters.com/article/eurozone-crisis-greece-idUSB4N13100W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/eurozone-crisis-greece-idUSB4N13100W


Processing URLs:  73%|███████▎  | 728/1000 [22:05<03:54,  1.16it/s]

Error extracting text from https://www.congress.gov/bill/114th-congress/house-bill/2029: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/house-bill/2029


Processing URLs:  73%|███████▎  | 731/1000 [22:07<03:32,  1.27it/s]

Error extracting text from https://www.fireeye.com/cyber-map/threat-map.html?utm_campaign=newHP: 530 Server Error:  for url: https://www.fireeye.com/cyber-map/threat-map.html?utm_campaign=newHP


Processing URLs:  73%|███████▎  | 732/1000 [22:09<05:11,  1.16s/it]

Error extracting text from http://www.fukuoka.unhabitat.org/projects/myanmar/detail10_en.html: 404 Client Error: Not Found for url: http://fukuoka.unhabitat.org/projects/myanmar/detail10_en.html


Processing URLs:  73%|███████▎  | 733/1000 [22:10<04:44,  1.07s/it]

Error extracting text from https://finance.yahoo.com/news/big-legal-battle-wont-go-away-anytime-soon-uber-202552556.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/big-legal-battle-wont-go-away-anytime-soon-uber-202552556.html
URL filtered: https://twitter.com/Conflicts/status/787383620440887296


Processing URLs:  74%|███████▎  | 735/1000 [22:11<02:48,  1.57it/s]

Error extracting text from http://www.wsj.com/articles/learning-from-vladimir-1458601927: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/learning-from-vladimir-1458601927


Processing URLs:  74%|███████▎  | 737/1000 [22:13<03:35,  1.22it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/pa/pennsylvania_senate_toomey_vs_mcginty-5074.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/pa/pennsylvania_senate_toomey_vs_mcginty-5074.html


Processing URLs:  74%|███████▍  | 739/1000 [22:14<03:09,  1.37it/s]

Error extracting text from https://www.nytimes.com/2018/01/29/us/politics/nafta-talks-conclude-in-montreal-with-signs-of-progress-and-risk.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/29/us/politics/nafta-talks-conclude-in-montreal-with-signs-of-progress-and-risk.html


Processing URLs:  74%|███████▍  | 741/1000 [22:17<03:53,  1.11it/s]

Error extracting text from http://mobile.nytimes.com/2016/03/10/opinion/to-stop-the-missiles-stop-north-korea-inc.html?_r=0&amp;referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/03/10/opinion/to-stop-the-missiles-stop-north-korea-inc.html?_r=0&amp;referer=https://www.google.com/


Processing URLs:  74%|███████▍  | 743/1000 [22:25<12:11,  2.85s/it]

Error extracting text from http://panamadisease.org/en/theproblem: 404 Client Error: Not Found for url: https://fusariumwilt.org/en/theproblem


Processing URLs:  75%|███████▍  | 746/1000 [22:39<20:00,  4.72s/it]

URL filtered: https://www.facebook.com/zuck/posts/10103269806149061


Processing URLs:  75%|███████▍  | 749/1000 [22:41<09:47,  2.34s/it]

Error extracting text from http://apps.dor.wa.gov/ResearchStats/Content/GrossBusinessIncome/Report.aspx: HTTPSConnectionPool(host='apps.dor.wa.gov', port=443): Max retries exceeded with url: /ResearchStats/Content/GrossBusinessIncome/Report.aspx (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  75%|███████▌  | 752/1000 [22:44<05:36,  1.36s/it]

Error extracting text from http://armscontrolcenter.org/fact-sheet-north-koreas-nuclear-and-ballistic-missile-programs/: 403 Client Error: Forbidden for url: http://armscontrolcenter.org/fact-sheet-north-koreas-nuclear-and-ballistic-missile-programs/
Error extracting text from http://www.reuters.com/article/us-saudiaramco-ipo-restructuring-idUSKBN15W1ZB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudiaramco-ipo-restructuring-idUSKBN15W1ZB


Processing URLs:  75%|███████▌  | 754/1000 [22:48<06:31,  1.59s/it]

Error extracting text from https://www.olg-duesseldorf.nrw.de/behoerde/presse/Presse_aktuell/20210825_PM_Nord-Stream-2/index.php: 404 Client Error: Not Found for url: https://www.olg-duesseldorf.nrw.de/behoerde/presse/Presse_aktuell/20210825_PM_Nord-Stream-2/index.php


Processing URLs:  76%|███████▌  | 759/1000 [22:59<06:27,  1.61s/it]

Error extracting text from http://www.nytimes.com/2016/03/13/opinion/sunday/how-saudi-arabia-turned-its-greatest-weapon-on-itself.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/13/opinion/sunday/how-saudi-arabia-turned-its-greatest-weapon-on-itself.html


Processing URLs:  76%|███████▌  | 762/1000 [23:03<05:40,  1.43s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-12/with-rousseff-s-survival-in-doubt-eyes-turn-to-brazil-protests


Processing URLs:  76%|███████▋  | 764/1000 [23:04<03:45,  1.05it/s]

URL filtered: https://www.youtube.com/watch?v=dX5QNZtQXX4


Processing URLs:  77%|███████▋  | 773/1000 [23:15<05:25,  1.43s/it]

Error extracting text from http://www.38north.org/2017/04/jschilling042117/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  78%|███████▊  | 779/1000 [23:26<05:58,  1.62s/it]

Error extracting text from https://larswericson.wordpress.com/2016/01/07/06jan16pm-sitrep/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/01/07/06jan16pm-sitrep/


Processing URLs:  78%|███████▊  | 780/1000 [23:27<05:29,  1.50s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-15/indonesia-to-destroy-71-boats-in-display-of-maritime-sovereignty


Processing URLs:  78%|███████▊  | 784/1000 [23:31<03:45,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16G0DP?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16G0DP?il=0


Processing URLs:  79%|███████▊  | 787/1000 [23:35<04:57,  1.40s/it]

Error extracting text from http://dfat.gov.au/trade/agreements/rcep/news/Pages/seventeenth-round-of-negotiations.aspx: 403 Client Error: Forbidden for url: https://www.dfat.gov.au/trade/agreements/rcep/news/Pages/seventeenth-round-of-negotiations.aspx


Processing URLs:  79%|███████▉  | 788/1000 [24:36<1:05:10, 18.44s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-07-10/russian-venezuelan-leaders-discuss-cooperation-energy-projects-kremlin: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  79%|███████▉  | 790/1000 [24:40<36:06, 10.32s/it]

URL filtered: https://www.youtube.com/watch?v=1fnzcAFy8d8


Processing URLs:  79%|███████▉  | 793/1000 [24:46<18:35,  5.39s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-31/goldman-sachs-gave-big-hand-to-venezuela-hunger-bonds-movement


Processing URLs:  80%|███████▉  | 795/1000 [24:47<11:40,  3.42s/it]

Error extracting text from http://www.cnbc.com/2015/09/30/reuters-america-update-1-iranian-condensate-exports-at-2015-high-as-china-resumes-buying-sources.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/09/30/reuters-america-update-1-iranian-condensate-exports-at-2015-high-as-china-resumes-buying-sources.html


Processing URLs:  80%|███████▉  | 797/1000 [24:49<08:17,  2.45s/it]

Error extracting text from https://the-world-is-watching.org/2021/09/08/legal-opinon-the-representation-of-the-state-of-the-union-of-myanmar-at-the-united-nations/: 406 Client Error: Not Acceptable for url: https://the-world-is-watching.org/2021/09/08/legal-opinon-the-representation-of-the-state-of-the-union-of-myanmar-at-the-united-nations/


Processing URLs:  80%|███████▉  | 799/1000 [24:54<07:27,  2.22s/it]

Error extracting text from http://triblive.com/mobile/10840941-96/isis-mosul-says: 403 Client Error: Forbidden for url: http://triblive.com/mobile/10840941-96/isis-mosul-says
Error extracting text from http://www.nasdaq.com/markets/crude-oil-brent.aspx: 403 Client Error: Forbidden for url: http://www.nasdaq.com/markets/crude-oil-brent.aspx


Processing URLs:  80%|████████  | 801/1000 [24:55<05:06,  1.54s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/hezbollah-vows-send-fighters-syrias-aleppo-40108655: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/hezbollah-vows-send-fighters-syrias-aleppo-40108655


Processing URLs:  80%|████████  | 802/1000 [24:56<04:21,  1.32s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-bookmakers-idUKKCN0WO20B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  80%|████████  | 804/1000 [24:57<02:56,  1.11it/s]

Error extracting text from http://postimg.org/image/kmvwx9pzj/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/kmvwx9pzj/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300725040>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.nytimes.com/2017/10/02/technology/facebook-russia-ads-.html


Processing URLs:  81%|████████  | 806/1000 [25:00<03:35,  1.11s/it]

Error extracting text from http://www.platts.com/latest-news/shipping/houston/repairs-to-new-panama-canal-locks-to-be-completed-21549411: 403 Client Error: Forbidden for url: https://www.spglobal.com/platts/en/market-insights/latest-news


Processing URLs:  81%|████████  | 810/1000 [25:07<04:09,  1.31s/it]

Error extracting text from https://www.axios.com/trump-econ-advisor-no-go-on-house-border-adjustment-tax-2282642439.html: 403 Client Error: Forbidden for url: https://www.axios.com/trump-econ-advisor-no-go-on-house-border-adjustment-tax-2282642439.html


Processing URLs:  81%|████████  | 811/1000 [25:09<05:30,  1.75s/it]

Error extracting text from http://www.governo.it/agenda/2016-04-12t000000-2016-04-13t000000/visita-di-renzi-iran/4424: 403 Client Error: Forbidden for url: https://www.governo.it/it/agenda/2016-04-12t000000-2016-04-13t000000/visita-di-renzi-iran/4424


Processing URLs:  81%|████████  | 812/1000 [25:11<05:41,  1.81s/it]

Error extracting text from http://www.ibtimes.com/isis-moving-libyas-oil-fields-recruiting-engineers-boost-revenue-2255879: 403 Client Error: Forbidden for url: https://www.ibtimes.com/isis-moving-libyas-oil-fields-recruiting-engineers-boost-revenue-2255879


Processing URLs:  81%|████████▏ | 814/1000 [25:12<03:31,  1.14s/it]

Error extracting text from http://www.businessinsider.com/r-spacex-could-be-grounded-for-9-12-months-ula-chief-2016-9: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-spacex-could-be-grounded-for-9-12-months-ula-chief-2016-9
Error extracting text from http://www.reuters.com/article/2015/11/03/us-mideast-crisis-syria-russia-idUSKCN0SS0TY20151103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/03/us-mideast-crisis-syria-russia-idUSKCN0SS0TY20151103


Processing URLs:  82%|████████▏ | 815/1000 [25:13<02:59,  1.03it/s]

Error extracting text from http://www.aisb.org.uk/events/loebner-prize: 406 Client Error: Not Acceptable for url: http://www.aisb.org.uk/events/loebner-prize


Processing URLs:  82%|████████▏ | 816/1000 [25:13<02:25,  1.26it/s]

Error extracting text from https://www.wsj.com/articles/teamsters-union-votes-to-help-organize-amazon-workers-11624558332: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/teamsters-union-votes-to-help-organize-amazon-workers-11624558332


Processing URLs:  82%|████████▏ | 818/1000 [25:16<03:22,  1.11s/it]

URL filtered: https://www.youtube.com/watch?v=NYJZj7qux_c


Processing URLs:  82%|████████▏ | 821/1000 [25:18<02:15,  1.32it/s]

Error extracting text from https://www.nytimes.com/2017/04/25/opinion/a-cruel-april-in-kashmir.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/25/opinion/a-cruel-april-in-kashmir.html?_r=0


Processing URLs:  82%|████████▏ | 824/1000 [25:25<05:06,  1.74s/it]

Error extracting text from http://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/pm17_2016_n_05_16_pm_komplett.html?nn=716864: 404 Client Error: Not Found for url: https://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/pm17_2016_n_05_16_pm_komplett.html?nn=716864


Processing URLs:  82%|████████▎ | 825/1000 [25:25<03:58,  1.36s/it]

Error extracting text from https://www.sciencedirect.com/science/article/pii/S1873506120304165: 403 Client Error: Forbidden for url: https://www.sciencedirect.com/science/article/pii/S1873506120304165


Processing URLs:  83%|████████▎ | 828/1000 [25:29<03:49,  1.33s/it]

Error extracting text from http://www.economist.com/blogs/graphicdetail/2016/01/graphics-britain-s-referendum-eu-membership: 404 Client Error: Not Found for url: https://www.economist.com/blogs/graphicdetail/2016/01/graphics-britain-s-referendum-eu-membership
URL filtered: https://www.youtube.com/watch?v=Tt-mpuR_QHQ


Processing URLs:  83%|████████▎ | 831/1000 [25:31<02:24,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-indonesia-japan-southchinasea-idUSKBN14Z0G1?feedType=RSS&amp;feedName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-indonesia-japan-southchinasea-idUSKBN14Z0G1?feedType=RSS&amp;feedName=worldNews


Processing URLs:  83%|████████▎ | 833/1000 [25:34<03:11,  1.15s/it]

Error extracting text from http://www.timesofisrael.com/hezbollah-says-israeli-warplanes-struck-arms-convoy-in-syria/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/hezbollah-says-israeli-warplanes-struck-arms-convoy-in-syria/


Processing URLs:  83%|████████▎ | 834/1000 [25:34<02:27,  1.13it/s]

Error extracting text from http://www.wsj.com/articles/u-s-accuses-russia-of-violating-missile-treaty-1476912606: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-accuses-russia-of-violating-missile-treaty-1476912606


Processing URLs:  84%|████████▍ | 840/1000 [25:45<04:07,  1.55s/it]

Error extracting text from https://www.wsj.com/articles/head-of-who-team-investigating-origins-of-covid-19-calls-for-closer-look-at-china-lab-11628807623: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/head-of-who-team-investigating-origins-of-covid-19-calls-for-closer-look-at-china-lab-11628807623


Processing URLs:  84%|████████▍ | 843/1000 [27:47<1:36:27, 36.86s/it]

Error extracting text from https://kaplanherald.com/2017/12/31/taliban-decreasing-ambitions-in-afghanistan-after-main-losses-says-us-normal-main-combat/: HTTPSConnectionPool(host='kaplanherald.com', port=443): Max retries exceeded with url: /2017/12/31/taliban-decreasing-ambitions-in-afghanistan-after-main-losses-says-us-normal-main-combat/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x301538b90>, 'Connection to kaplanherald.com timed out. (connect timeout=60)'))


Processing URLs:  84%|████████▍ | 844/1000 [28:47<1:53:55, 43.82s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/article141565204.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  84%|████████▍ | 845/1000 [28:51<1:22:23, 31.90s/it]

URL filtered: https://twitter.com/NCPoliticsEU/status/722919578129436673?s=09


Processing URLs:  85%|████████▍ | 847/1000 [28:52<44:26, 17.43s/it]

Error extracting text from http://www.businessinsider.com/r-update-1--venezuela-pdvsa-extends-early-bond-swap-deadline-2016-10?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-update-1--venezuela-pdvsa-extends-early-bond-swap-deadline-2016-10?IR=T


Processing URLs:  85%|████████▌ | 850/1000 [28:58<20:24,  8.16s/it]

Error extracting text from http://www.wsj.com/articles/camerons-eu-intentions-are-likely-too-ambitious-1431632667: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/camerons-eu-intentions-are-likely-too-ambitious-1431632667


Processing URLs:  85%|████████▌ | 852/1000 [29:00<11:26,  4.64s/it]

Error extracting text from http://stm.sciencemag.org/content/7/278/278ra33: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/scitranslmed.aaa2512


Processing URLs:  86%|████████▌ | 859/1000 [29:14<05:00,  2.13s/it]

Error extracting text from https://reut.rs/3s1WSXa: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/india/india-deploys-warships-south-china-sea-part-act-east-policy-2021-08-04/
URL filtered: https://www.youtube.com/watch?v=9JHhhg9cUjA


Processing URLs:  86%|████████▌ | 861/1000 [29:24<08:01,  3.47s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/un-asked-to-take-action-against-iran-for-recent-missile-test/2015/10/21/9ef966e8-7868-11e5-a5e2-40d6b2ad18dd_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/un-asked-to-take-action-against-iran-for-recent-missile-test/2015/10/21/9ef966e8-7868-11e5-a5e2-40d6b2ad18dd_story.html


Processing URLs:  86%|████████▌ | 862/1000 [29:25<06:18,  2.74s/it]

Error extracting text from https://www.afghanistan-analysts.org/another-hurdle-for-elections-in-2016-mps-reject-presidential-decree-on-electoral-commissions/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/another-hurdle-for-elections-in-2016-mps-reject-presidential-decree-on-electoral-commissions/


Processing URLs:  86%|████████▋ | 864/1000 [29:27<04:33,  2.01s/it]

Error extracting text from https://www.whitehouse.gov/briefing-room/speeches-remarks/2021/02/27/remarks-bypresident-biden-on-the-american-rescue-plan/: 404 Client Error: Not Found for url: https://www.whitehouse.gov/briefing-room/speeches-remarks/2021/02/27/remarks-by%02president-biden-on-the-american-rescue-plan/


Processing URLs:  87%|████████▋ | 867/1000 [29:31<03:07,  1.41s/it]

Error extracting text from http://dictionary.cambridge.org/us/dictionary/english/sanction: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  87%|████████▋ | 869/1000 [29:33<02:28,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-nafta-idUSKBN156128: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-nafta-idUSKBN156128


Processing URLs:  87%|████████▋ | 870/1000 [29:33<01:53,  1.15it/s]

Error extracting text from https://www.nytimes.com/2018/01/26/health/flu-rates-deaths.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/26/health/flu-rates-deaths.html


Processing URLs:  87%|████████▋ | 872/1000 [29:35<02:18,  1.08s/it]

Error extracting text from http://cnn.it/1PC2bC3&quot: 404 Client Error: Not Found for url: https://edition.cnn.com/error


Processing URLs:  87%|████████▋ | 873/1000 [29:37<02:25,  1.14s/it]

Error extracting text from https://www.google.ca/amp/www.iraqinews.com/iraq-war/airstrike-south-of-mosul-kill-28-isis-members/amp/?client=ms-android-rogers-ca#: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/airstrike-south-of-mosul-kill-28-isis-members/amp/


Processing URLs:  88%|████████▊ | 877/1000 [29:43<02:18,  1.12s/it]

Error extracting text from https://news.yahoo.com/email-suggests-biden-family-ties-012824340.html: 404 Client Error: Not Found for url: https://news.yahoo.com/email-suggests-biden-family-ties-012824340.html
URL filtered: https://www.youtube.com/watch?v=f_Ad0tHFqgY


Processing URLs:  88%|████████▊ | 882/1000 [29:45<01:12,  1.63it/s]

Error extracting text from https://www.nytimes.com/2017/04/21/business/energy-environment/treasury-exxon-mobil-sanctions-waiver.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/21/business/energy-environment/treasury-exxon-mobil-sanctions-waiver.html?_r=0


Processing URLs:  89%|████████▊ | 886/1000 [29:54<03:14,  1.70s/it]

URL filtered: https://www.youtube.com/results?search_query=trump+megyn+kelly+new+york+values


Processing URLs:  89%|████████▉ | 890/1000 [29:59<02:12,  1.20s/it]

Error extracting text from http://www.latimes.com/world/la-fg-russia-syria-20150905-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-russia-syria-20150905-story.html


Processing URLs:  89%|████████▉ | 893/1000 [30:03<02:14,  1.26s/it]

Error extracting text from https://www.un.org/en/sc/members/: 403 Client Error: Forbidden for url: https://www.un.org/en/sc/members/


Processing URLs:  89%|████████▉ | 894/1000 [30:03<01:48,  1.02s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-01-25/canada-mexico-describe-nafta-win-win-win-opportunity-at-davos


Processing URLs:  90%|████████▉ | 896/1000 [30:08<02:56,  1.70s/it]

Error extracting text from http://www.el-balad.com/2140407: 404 Client Error: Not Found for url: https://el-balad.com/2140407


Processing URLs:  90%|█████████ | 900/1000 [30:26<05:14,  3.14s/it]

Error extracting text from http://cdec.water.ca.gov/cdecapp/snowapp/sweq.action: 404 Client Error: Not Found for url: http://cdec.water.ca.gov/cdecapp/snowapp/sweq.action


Processing URLs:  90%|█████████ | 902/1000 [30:30<04:21,  2.67s/it]

Error extracting text from http://nnsa.energy.gov/sites/default/files/nnsa/pageinlineimages/Pantex%20-%20Weapons%20Surveillance.JPG: 404 Client Error: Not Found for url: https://www.energy.gov/nnsa/national-nuclear-security-administrationsites/default/files/nnsa/pageinlineimages/Pantex%20-%20Weapons%20Surveillance.JPG


Processing URLs:  91%|█████████ | 909/1000 [30:40<02:04,  1.37s/it]

Error extracting text from https://www.wsj.com/articles/opec-chief-sees-no-u-s-threat-as-shale-production-rises-1487689959: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-chief-sees-no-u-s-threat-as-shale-production-rises-1487689959


Processing URLs:  91%|█████████▏| 914/1000 [30:47<01:27,  1.01s/it]

Error extracting text from http://www.odbrana.gov.me/en/search/164774/Turkey-ratifies-Protocol-on-Montenegro-s-Accession-to-NATO.html: HTTPConnectionPool(host='www.odbrana.gov.me', port=80): Max retries exceeded with url: /en/search/164774/Turkey-ratifies-Protocol-on-Montenegro-s-Accession-to-NATO.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300eba270>: Failed to resolve 'www.odbrana.gov.me' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.nbcnews.com/news/world/iran-s-options-retaliation-against-u-s-americans-span-globe-n1109966: 403 Client Error: Forbidden for url: https://www.nbcnews.com/news/world/iran-s-options-retaliation-against-u-s-americans-span-globe-n1109966


Processing URLs:  92%|█████████▏| 915/1000 [30:50<01:58,  1.39s/it]

URL filtered: https://www.bostonglobe.com/2021/10/28/business/supply-chain-devastation-spreads-bookstores-big-holiday-season-draws-near/?s_campaign=bostonglobe%3Asocialflow%3Atwitter


Processing URLs:  92%|█████████▏| 917/1000 [31:06<05:41,  4.12s/it]

Error extracting text from https://www.almasdarnews.com/article/iraqi-fm-one-third-mosul-liberated-isis/: 522 Server Error:  for url: https://www.almasdarnews.com/article/iraqi-fm-one-third-mosul-liberated-isis/


Processing URLs:  92%|█████████▏| 918/1000 [31:06<04:26,  3.25s/it]

Error extracting text from https://www.fastcompany.com/40492335/well-resourced-cyber-spies-are-reportedly-targeting-south-america-asia: 403 Client Error: Forbidden for url: https://www.fastcompany.com/40492335/well-resourced-cyber-spies-are-reportedly-targeting-south-america-asia


Processing URLs:  92%|█████████▏| 922/1000 [31:19<03:06,  2.40s/it]

Error extracting text from http://www.comw.org/qdr/fulltext/0207thomason.pdf: 406 Client Error: Not Acceptable for url: http://www.comw.org/qdr/fulltext/0207thomason.pdf
Error extracting text from http://www.foxnews.com/politics/2016/10/15/cia-reportedly-preparing-major-cyber-assault-against-russia-in-wake-hack-attacks.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/politics/2016/10/15/cia-reportedly-preparing-major-cyber-assault-against-russia-in-wake-hack-attacks.html
Error extracting text from http://www.foxnews.com/world/2016/03/01/assad-proposes-full-amnesty-for-syrian-rebels-who-lay-down-their-arms.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/world/2016/03/01/assad-proposes-full-amnesty-for-syrian-rebels-who-lay-down-their-arms.html


Processing URLs:  92%|█████████▏| 923/1000 [31:19<02:37,  2.05s/it]

Error extracting text from https://www.espn.com/mlb/story/_/id/33362477/inside-self-inflicted-crisis-boiling-mlb-lockout-deadline-arrives: 403 Client Error: Forbidden for url: https://www.espn.com/mlb/story/_/id/33362477/inside-self-inflicted-crisis-boiling-mlb-lockout-deadline-arrives


Processing URLs:  92%|█████████▎| 925/1000 [31:22<02:17,  1.84s/it]

Error extracting text from http://www.turkishweekly.net/2015/10/27/news/serbia-s-vucic-visits-moscow-seeking-russian-economic-support-in-return-for-foreign-policy-backing/: 404 Client Error: Not Found for url: https://turkishweekly.net/2015/10/27/news/serbia-s-vucic-visits-moscow-seeking-russian-economic-support-in-return-for-foreign-policy-backing/


Processing URLs:  93%|█████████▎| 927/1000 [31:24<01:45,  1.45s/it]

Error extracting text from http://finance.yahoo.com/news/dakota-access-pipeline-may-not-163502398.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/dakota-access-pipeline-may-not-163502398.html


Processing URLs:  93%|█████████▎| 928/1000 [31:25<01:26,  1.20s/it]

Error extracting text from http://intelligencebriefs.com/uganda-promises-5000-troops-to-somalia-to-fight-al-shabaab-in-the-wake-of-amisom-withdrawal/: 406 Client Error: Not Acceptable for url: http://intelligencebriefs.com/uganda-promises-5000-troops-to-somalia-to-fight-al-shabaab-in-the-wake-of-amisom-withdrawal/


Processing URLs:  93%|█████████▎| 929/1000 [31:25<01:05,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-india-brics-bank-idUSKBN12G04P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-brics-bank-idUSKBN12G04P
URL filtered: https://www.bloomberg.com/news/articles/2016-10-06/saudi-aramco-ipo-will-offer-stake-in-all-of-company-s-operations
Error extracting text from http://blogs.wsj.com/moneybeat/2015/09/23/little-hope-for-china-ipo-market-in-2015-says-ey/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2015/09/23/little-hope-for-china-ipo-market-in-2015-says-ey/


Processing URLs:  93%|█████████▎| 934/1000 [31:30<01:09,  1.05s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/south-korea-warns-north-korea-launch-satellite-36682037: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/south-korea-warns-north-korea-launch-satellite-36682037


Processing URLs:  94%|█████████▎| 936/1000 [31:31<00:52,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-conservatives/uk-pms-deputy-urges-conservatives-to-back-eu-repeal-bill-the-sunday-telegraph-idUSKCN1BD0UK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-conservatives/uk-pms-deputy-urges-conservatives-to-back-eu-repeal-bill-the-sunday-telegraph-idUSKCN1BD0UK


Processing URLs:  94%|█████████▍| 938/1000 [31:33<00:54,  1.14it/s]

Error extracting text from http://www.latimes.com/business/autos/la-fi-hy-google-waymo-self-driving-20161213-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/autos/la-fi-hy-google-waymo-self-driving-20161213-story.html


Processing URLs:  94%|█████████▍| 940/1000 [31:36<00:56,  1.06it/s]

Error extracting text from http://www.wsj.com/articles/john-boehner-races-time-on-debt-limit-1445634628: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/john-boehner-races-time-on-debt-limit-1445634628


Processing URLs:  94%|█████████▍| 942/1000 [31:39<01:09,  1.19s/it]

Error extracting text from http://www.wsj.com/articles/u-n-atomic-inspectors-visit-irans-parchin-site-1442769344: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-n-atomic-inspectors-visit-irans-parchin-site-1442769344


Processing URLs:  95%|█████████▍| 949/1000 [31:54<01:47,  2.11s/it]

Error extracting text from http://www.geekwire.com/2016/alphago-lee-sedol-whos-underdog-in-google-ai-million-go-match/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2016/alphago-lee-sedol-whos-underdog-in-google-ai-million-go-match/


Processing URLs:  95%|█████████▌| 951/1000 [31:56<01:06,  1.35s/it]

Error extracting text from http://calhoun.nps.edu/bitstream/handle/10945/11515/SI_V9_I1_2010_Greenhill_116.pdf: 403 Client Error: Forbidden for url: https://calhoun.nps.edu/bitstream/handle/10945/11515/SI_V9_I1_2010_Greenhill_116.pdf
Error extracting text from https://www.predictit.org/Contract/1565/Will-Republicans-have-a-brokered-convention-in-2016#openoffers: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/1565/Will-Republicans-have-a-brokered-convention-in-2016#openoffers


Processing URLs:  95%|█████████▌| 953/1000 [31:56<00:35,  1.32it/s]

Error extracting text from https://www.scotsman.com/news/opinion/columnists/alex-salmond-has-plunged-a-stake-into-the-heart-of-scottish-democracy-brian-monteith-3181513: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/opinion/columnists/alex-salmond-has-plunged-a-stake-into-the-heart-of-scottish-democracy-brian-monteith-3181513
Error extracting text from https://www.geekwire.com/2018/financial-analyst-renews-call-amazon-spin-off-amazon-web-services/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2018/financial-analyst-renews-call-amazon-spin-off-amazon-web-services/
URL filtered: https://www.bnnbloomberg.ca/hungary-poland-thwart-eu-s-push-for-gender-equality-1.1600717


Processing URLs:  96%|█████████▌| 955/1000 [31:56<00:19,  2.28it/s]

Error extracting text from https://www.scotsman.com/news/politics/scottish-election-polls-what-are-the-latest-opinion-polls-for-the-2021-election-and-what-happened-in-2016-3217223: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-polls-what-are-the-latest-opinion-polls-for-the-2021-election-and-what-happened-in-2016-3217223


Processing URLs:  96%|█████████▌| 959/1000 [32:10<02:24,  3.53s/it]

Error extracting text from https://www.washingtonpost.com/politics/federal_government/us-sues-vw-over-emissions-cheating-software-in-diesel-cars/2016/01/04/b965cb68-b30d-11e5-8abc-d09392edc612_story.html?hpid=hp_no-name_no-name%3Apage%2Fbreaking-news-bar: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/federal_government/us-sues-vw-over-emissions-cheating-software-in-diesel-cars/2016/01/04/b965cb68-b30d-11e5-8abc-d09392edc612_story.html?hpid=hp_no-name_no-name%3Apage%2Fbreaking-news-bar


Processing URLs:  96%|█████████▋| 963/1000 [32:18<01:26,  2.35s/it]

Error extracting text from http://nationalinterest.org/feature/adiz-the-south-china-sea-nine-dash-line-20-17121: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/adiz-the-south-china-sea-nine-dash-line-20-17121


Processing URLs:  96%|█████████▋| 965/1000 [32:20<00:51,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-renault-emissions-idUSKCN0UX12K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-renault-emissions-idUSKCN0UX12K


Processing URLs:  97%|█████████▋| 967/1000 [32:22<00:42,  1.28s/it]

Error extracting text from https://scontent-mia1-1.xx.fbcdn.net/v/t1.0-9/13339695_10153876438179635_6309036767169764845_n.jpg?oh=72a77ac7618c4728cfb8cc31de231473&amp;oe=57C4AD48: HTTPSConnectionPool(host='scontent-mia1-1.xx.fbcdn.net', port=443): Max retries exceeded with url: /v/t1.0-9/13339695_10153876438179635_6309036767169764845_n.jpg?oh=72a77ac7618c4728cfb8cc31de231473&amp;oe=57C4AD48 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3011f1e80>: Failed to resolve 'scontent-mia1-1.xx.fbcdn.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  97%|█████████▋| 972/1000 [32:28<00:37,  1.33s/it]

Error extracting text from http://www.ibtimes.com/ukraine-launches-investigation-power-grid-cyberattack-blamed-russia-2246206: 403 Client Error: Forbidden for url: https://www.ibtimes.com/ukraine-launches-investigation-power-grid-cyberattack-blamed-russia-2246206


Processing URLs:  97%|█████████▋| 974/1000 [32:28<00:22,  1.15it/s]

Error extracting text from http://www.cdc.gov/zika/transmission/index.html: 404 Client Error: Not Found for url: https://www.cdc.gov/zika/transmission/index.html
Error extracting text from http://www.reuters.com/article/us-venezuela-pdvsa-debt-idUSKCN11X043: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-pdvsa-debt-idUSKCN11X043


Processing URLs:  98%|█████████▊| 977/1000 [32:31<00:20,  1.14it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-20/venezuela-2016-default-likely-pdvsa-may-go-first-moody-s-says


Processing URLs:  98%|█████████▊| 981/1000 [32:35<00:15,  1.20it/s]

Error extracting text from http://www.pcmag.com/article2/0,2817,2496828,00.asp: 403 Client Error: Forbidden for url: http://www.pcmag.com/article2/0,2817,2496828,00.asp


Processing URLs:  98%|█████████▊| 982/1000 [32:37<00:24,  1.36s/it]

Error extracting text from https://koreaexpose.com/in-depth/feudal-presidency-ban-ki-moon-korea/: 404 Client Error: Not Found for url: https://koreaexpose.com/in-depth/feudal-presidency-ban-ki-moon-korea/


Processing URLs:  98%|█████████▊| 983/1000 [32:39<00:26,  1.54s/it]

URL filtered: https://twitter.com/QuicoToro/status/928256258217570305


Processing URLs:  99%|█████████▊| 987/1000 [32:45<00:20,  1.55s/it]

Error extracting text from https://www.scientificamerican.com/article/the-nail-biting-journey-of-nasas-james-webb-space-telescope-is-about-to-begin/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/the-nail-biting-journey-of-nasas-james-webb-space-telescope-is-about-to-begin/


Processing URLs:  99%|█████████▉| 988/1000 [32:49<00:26,  2.18s/it]

Error extracting text from http://www.scmagazine.com/us-officials-look-to-sanctions-to-curb-cyberattacks-by-foreign-hackers/article/436092/: 404 Client Error: Not Found for url: https://www.scmagazine.com/news/us-officials-look-to-sanctions-to-curb-cyberattacks-by-foreign-hackers


Processing URLs:  99%|█████████▉| 991/1000 [32:54<00:14,  1.58s/it]

Error extracting text from https://www.reuters.com/business/energy/oil-strikes-2018-highs-demand-recovery-iran-nuclear-talks-2021-06-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-strikes-2018-highs-demand-recovery-iran-nuclear-talks-2021-06-28/


Processing URLs:  99%|█████████▉| 993/1000 [32:56<00:08,  1.22s/it]

Error extracting text from https://nationalinterest.org/blog/reboot/massive-russian-icbm-can-nuke-any-place-earth-maybe-179256: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/reboot/massive-russian-icbm-can-nuke-any-place-earth-maybe-179256


Processing URLs:  99%|█████████▉| 994/1000 [33:07<00:25,  4.30s/it]

Error extracting text from https://www.washingtonpost.com/news/powerpost/paloma/the-finance-202/2017/07/06/the-finance-202-how-committed-is-the-gop-to-tax-cuts-for-the-rich-we-re-about-to-find-out/595d2570e9b69b7071abca80/: 404 Client Error: Not Found for url: https://www.washingtonpost.com/news/powerpost/paloma/the-finance-202/2017/07/06/the-finance-202-how-committed-is-the-gop-to-tax-cuts-for-the-rich-we-re-about-to-find-out/595d2570e9b69b7071abca80/
URL filtered: https://www.youtube.com/watch?v=tsuUHpX7hHA


Processing URLs: 100%|█████████▉| 998/1000 [33:10<00:03,  1.78s/it]

Error extracting text from https://www.nytimes.com/2021/03/15/business/economy/labor-force-dropouts.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/15/business/economy/labor-force-dropouts.html


Processing URLs: 100%|██████████| 1000/1000 [33:13<00:00,  1.99s/it]
Processing URLs:   0%|          | 1/1000 [00:01<17:13,  1.03s/it]

Error extracting text from https://www.10.tv/news/156046: HTTPSConnectionPool(host='www.10.tv', port=443): Max retries exceeded with url: /news/156046 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300c920c0>: Failed to resolve 'www.10.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   0%|          | 4/1000 [08:04<52:42:56, 190.54s/it]

Error extracting text from https://www.thespainreport.com/articles/878-160829135138-sanchez-confirms-psoe-no-after-meeting-rajoy: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/878-160829135138-sanchez-confirms-psoe-no-after-meeting-rajoy (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x300c93dd0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))
URL filtered: https://www.youtube.com/watch?v=_AXyeKbw3tU


Processing URLs:   1%|          | 7/1000 [08:07<18:06:47, 65.67s/it] 

URL filtered: https://twitter.com/arpitrage/status/1277677320401235968
Error extracting text from https://www.imf.org/external/pubs/ft/wp/2012/wp12282.pdf: 403 Client Error: Forbidden for url: https://www.imf.org/external/pubs/ft/wp/2012/wp12282.pdf


Processing URLs:   1%|          | 12/1000 [08:11<5:17:11, 19.26s/it]

Error extracting text from https://home.nra.org/joint-statement: 404 Client Error: Not Found for url: https://home.nra.org/joint-statement


Processing URLs:   2%|▏         | 15/1000 [08:15<2:23:56,  8.77s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-aramco-ipo-idUSKBN14W15N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-aramco-ipo-idUSKBN14W15N


Processing URLs:   2%|▏         | 17/1000 [08:17<1:27:19,  5.33s/it]

Error extracting text from http://www.idea.int/vt/countryview.cfm?CountryCode=PE: 404 Client Error: Not Found for url: https://www.idea.int/vt/countryview.cfm?CountryCode=PE


Processing URLs:   2%|▏         | 19/1000 [08:19<54:00,  3.30s/it]  

Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-production-idUSKBN1710F1?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-production-idUSKBN1710F1?il=0


Processing URLs:   2%|▏         | 22/1000 [08:22<29:59,  1.84s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-fuel-idUSKCN11K07Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-fuel-idUSKCN11K07Y


Processing URLs:   2%|▏         | 23/1000 [08:22<22:28,  1.38s/it]

Error extracting text from https://www.nytimes.com/2017/05/13/world/asia/north-korea-missile-test-kim-jong-un-moon-jae-in.html?emc=edit_na_20170513&amp;nl=breaking-news&amp;nlid=52725637&amp;ref=headline&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/13/world/asia/north-korea-missile-test-kim-jong-un-moon-jae-in.html?emc=edit_na_20170513&amp;nl=breaking-news&amp;nlid=52725637&amp;ref=headline&amp;_r=0


Processing URLs:   3%|▎         | 28/1000 [08:29<20:29,  1.26s/it]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0VB1ES: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0VB1ES


Processing URLs:   3%|▎         | 29/1000 [08:31<24:40,  1.53s/it]

Error extracting text from http://www.caam.org.cn/Views/search.aspx?key=%25E6%2596%25B0%25E8%2583%25BD&amp;type=0&amp;pageindex=2: 404 Client Error: Not Found for url: http://www.caam.org.cn/Views/search.aspx?key=%25E6%2596%25B0%25E8%2583%25BD&amp;type=0&amp;pageindex=2


Processing URLs:   3%|▎         | 32/1000 [08:31<10:23,  1.55it/s]

Error extracting text from http://www.wsj.com/articles/impeachment-proceedings-opened-against-brazilian-president-1449092751?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/impeachment-proceedings-opened-against-brazilian-president-1449092751?mod=e2fb
Error extracting text from https://www.reuters.com/business/feds-kaplan-sees-financial-market-excesses-eyes-qe-taper-2021-04-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/feds-kaplan-sees-financial-market-excesses-eyes-qe-taper-2021-04-30/
Error extracting text from http://www.reuters.com/article/us-india-budget-privatisation-idUSKBN15L06M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-budget-privatisation-idUSKBN15L06M
Error extracting text from https://www.reuters.com/article/us-asia-oil-appec-iran/low-stocks-growth-in-domestic-demand-to-cap-iran-oil-exports-nioc-idUSKCN1C00JV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.

Processing URLs:   4%|▎         | 36/1000 [08:33<08:35,  1.87it/s]

Error extracting text from https://www.nytimes.com/2018/02/16/world/americas/brazil-rio-military-security.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/16/world/americas/brazil-rio-military-security.html


Processing URLs:   4%|▍         | 38/1000 [08:37<18:25,  1.15s/it]

Error extracting text from http://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/pm25_2016_n_08_16_pm_komplett.html?nn=716864: 404 Client Error: Not Found for url: https://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/pm25_2016_n_08_16_pm_komplett.html?nn=716864


Processing URLs:   4%|▍         | 44/1000 [08:46<18:42,  1.17s/it]

Error extracting text from http://www.samaa.tv/pakistan/2016/01/major-power-breakdown-hits-parts-of-pakistan/: 403 Client Error: Forbidden for url: https://www.samaa.tv/pakistan/2016/01/major-power-breakdown-hits-parts-of-pakistan/


Processing URLs:   5%|▍         | 49/1000 [09:34<1:54:54,  7.25s/it]

Error extracting text from https://www.newsweek.com/hamas-asks-palestinians-resist-israeli-nationalist-parade-through-jerusalems-old-city-1600556: 403 Client Error: Forbidden for url: https://www.newsweek.com/hamas-asks-palestinians-resist-israeli-nationalist-parade-through-jerusalems-old-city-1600556
Error extracting text from http://www.nytimes.com/2016/01/07/world/asia/north-korea-hydrogen-bomb-q-a.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/07/world/asia/north-korea-hydrogen-bomb-q-a.html?_r=0


Processing URLs:   5%|▌         | 50/1000 [09:34<1:20:58,  5.11s/it]

Error extracting text from http://www.nytimes.com/2015/10/29/business/economy/fed-interest-rates.html?emc=edit_th_20151029&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/29/business/economy/fed-interest-rates.html?emc=edit_th_20151029&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from http://www.nato.int/docu/comm/2002/0211-prague/more_info/membership.htm: 403 Client Error: Forbidden for url: http://www.nato.int/docu/comm/2002/0211-prague/more_info/membership.htm


Processing URLs:   6%|▌         | 57/1000 [09:51<40:11,  2.56s/it]  

URL filtered: https://twitter.com/nntaleb


Processing URLs:   6%|▌         | 61/1000 [09:56<22:48,  1.46s/it]

Error extracting text from https://www.neweurope.eu/article/ttip-negotiations-restart-brussels-clouded-brexit/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/ttip-negotiations-restart-brussels-clouded-brexit/


Processing URLs:   6%|▌         | 62/1000 [09:56<18:26,  1.18s/it]

URL filtered: https://twitter.com/Archer83Able/status/1370418913641701379


Processing URLs:   6%|▋         | 64/1000 [09:56<11:50,  1.32it/s]

Error extracting text from http://www.cdm.me/english/posa-montenegro-in-nato-by-may-2017: 403 Client Error: Forbidden for url: https://www.cdm.me/english/posa-montenegro-in-nato-by-may-2017


Processing URLs:   7%|▋         | 66/1000 [09:58<10:31,  1.48it/s]

Error extracting text from http://www.pravdareport.com/russia/politics/08-07-2014/127997-nato_ships_russia-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/russia/politics/08-07-2014/127997-nato_ships_russia-0/
Error extracting text from http://www.reuters.com/article/us-myanmar-japan-idUSKBN12W3JO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-japan-idUSKBN12W3JO?il=0


Processing URLs:   7%|▋         | 67/1000 [09:58<10:52,  1.43it/s]

Error extracting text from http://aranews.net/2016/02/dozens-of-isis-militants-escape-iraqs-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/02/dozens-of-isis-militants-escape-iraqs-mosul/


Processing URLs:   7%|▋         | 69/1000 [10:01<13:30,  1.15it/s]

Error extracting text from https://www.axios.com/republican-party-culture-war-tucker-carlson-67e47559-6782-4e8a-97b3-d660f3cb21df.html: 403 Client Error: Forbidden for url: https://www.axios.com/republican-party-culture-war-tucker-carlson-67e47559-6782-4e8a-97b3-d660f3cb21df.html


Processing URLs:   8%|▊         | 81/1000 [10:22<28:19,  1.85s/it]  

Error extracting text from https://www.google.com/amp/s/mobile.reuters.com/article/amp/idUSKBN29I2XZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKBN29I2XZ
URL filtered: https://www.bloomberg.com/news/articles/2016-07-06/putin-s-military-buildup-in-the-baltic-stokes-invasion-fears
Error extracting text from https://www.oecd.org/newsroom/gdp-growth-second-quarter-2016-oecd.htm: 403 Client Error: Forbidden for url: https://www.oecd.org/newsroom/gdp-growth-second-quarter-2016-oecd.htm


Processing URLs:   9%|▊         | 87/1000 [10:34<31:31,  2.07s/it]

Error extracting text from https://www.moh.gov.et/ejcc/sites/default/files/2020-04/negarit.pdf: 404 Client Error: Not Found for url: https://www.moh.gov.et/ejcc/sites/default/files/2020-04/negarit.pdf


Processing URLs:   9%|▉         | 88/1000 [10:36<30:46,  2.02s/it]

Error extracting text from https://www.reuters.com/article/us-italy-election-5star/italys-5-star-presents-election-program-with-no-euro-referendum-idUSKBN1FB1LM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-election-5star/italys-5-star-presents-election-program-with-no-euro-referendum-idUSKBN1FB1LM


Processing URLs:   9%|▉         | 91/1000 [10:38<20:54,  1.38s/it]

Error extracting text from http://minpromtorg.gov.ru/press-centre/news/#!26_noyabrya_denis_manturov_sovershit_rabochuyu_poezdku_v_obedinennye_arabskie_emiraty: HTTPSConnectionPool(host='minpromtorg.gov.ru', port=443): Max retries exceeded with url: /press-centre/news/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1000)')))


Processing URLs:   9%|▉         | 92/1000 [10:39<16:27,  1.09s/it]

Error extracting text from http://www.wsj.com/articles/joe-biden-edges-closer-to-joining-presidential-race-1442617215: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/joe-biden-edges-closer-to-joining-presidential-race-1442617215


Processing URLs:  10%|▉         | 96/1000 [12:44<3:14:25, 12.90s/it]

Error extracting text from http://www.nytimes.com/2015/09/15/opinion/david-brooks-the-biden-formation-story.html?src=me: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/15/opinion/david-brooks-the-biden-formation-story.html?src=me


Processing URLs:  10%|▉         | 98/1000 [12:44<1:37:58,  6.52s/it]

Error extracting text from https://help.cbp.gov/app/answers/detail/a_id/447/kw/declare%20money/suggested/1: 403 Client Error: Forbidden for url: https://help.cbp.gov/app/answers/detail/a_id/447/kw/declare%20money/suggested/1
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://flaviochaves.com.br/2016/02/10/o-impeachment-e-o-caminho-para-a-recuperacao/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://flaviochaves.com.br/2016/02/10/o-impeachment-e-o-caminho-para-a-recuperacao/&amp;prev=search


Processing URLs:  10%|█         | 100/1000 [12:46<54:51,  3.66s/it] 

Error extracting text from https://greekreporter.com/2021/08/03/new-york-city-face-masks-indoors/: 403 Client Error: Forbidden for url: https://greekreporter.com/2021/08/03/new-york-city-face-masks-indoors/


Processing URLs:  10%|█         | 101/1000 [12:48<44:05,  2.94s/it]

Error extracting text from http://www.cncda.org/Auto_Outlook.asp: 403 Client Error: Forbidden for url: http://www.cncda.org/Auto_Outlook.asp


Processing URLs:  10%|█         | 102/1000 [12:48<34:19,  2.29s/it]

URL filtered: https://www.youtube.com/watch?v=OWwOJlOI1nU


Processing URLs:  11%|█         | 111/1000 [12:57<12:52,  1.15it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/14/heres-where-i-should-be-tonight-if-i-want-to-follow-the-smart-crowd/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/14/heres-where-i-should-be-tonight-if-i-want-to-follow-the-smart-crowd/
Error extracting text from https://www.rand.org/pubs/research_reports/RR2055.html: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/research_reports/RR2055.html


Processing URLs:  12%|█▏        | 117/1000 [13:04<18:01,  1.22s/it]

URL filtered: https://www.youtube.com/watch?v=y63ncbpBDLI


Processing URLs:  12%|█▏        | 119/1000 [13:06<16:11,  1.10s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/dutch-investigators-give-results-of-mh17-probe-to-families/article32098645/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/dutch-investigators-give-results-of-mh17-probe-to-families/article32098645/


Processing URLs:  12%|█▏        | 121/1000 [13:07<10:38,  1.38it/s]

Error extracting text from http://ndb.int/BRICS-nations-led-New-Development-Bank-to-raise-up-to-3-billion-in-next-3-years.php: HTTPConnectionPool(host='ndb.int', port=80): Max retries exceeded with url: /BRICS-nations-led-New-Development-Bank-to-raise-up-to-3-billion-in-next-3-years.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301bc98b0>: Failed to resolve 'ndb.int' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.yahoo.com/news/trapped-civilians-stall-iraqi-forces-battling-anbar-090136711.html?nhp=1: 404 Client Error: Not Found for url: https://www.yahoo.com/news/trapped-civilians-stall-iraqi-forces-battling-anbar-090136711.html?nhp=1


Processing URLs:  12%|█▏        | 122/1000 [13:08<11:16,  1.30it/s]

Error extracting text from http://www.france24.com/en/20160223-france-burundi-un-visit-ban-ki-moon: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160223-france-burundi-un-visit-ban-ki-moon


Processing URLs:  13%|█▎        | 127/1000 [13:15<20:39,  1.42s/it]

Error extracting text from http://www.english.rfi.fr/france/20151202-hollande-s-popularity-bounces-50-approval-ratings: 403 Client Error: Forbidden for url: https://www.rfi.fr/en/france/20151202-hollande-s-popularity-bounces-50-approval-ratings
URL filtered: https://www.bloomberg.com/news/articles/2017-04-14/apple-gets-dmv-approval-to-test-self-driving-cars-in-california


Processing URLs:  13%|█▎        | 130/1000 [13:16<12:36,  1.15it/s]

Error extracting text from http://www.ohchr.org/EN/Countries/ENACARegion/Pages/UAReports.aspx: 403 Client Error: Forbidden for url: https://www.ohchr.org/EN/Countries/ENACARegion/Pages/UAReports.aspx


Processing URLs:  13%|█▎        | 132/1000 [13:21<26:46,  1.85s/it]

Error extracting text from http://theiranproject.com/blog/2016/01/18/german-chancellor-to-pay-upcoming-visit-to-iran/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=german-chancellor-to-pay-upcoming-visit-to-iran


Processing URLs:  13%|█▎        | 134/1000 [13:29<38:11,  2.65s/it]

Error extracting text from http://theiowarepublican.com/2015/state-senator-brand-zaun-endorses-trump/: 404 Client Error: Not Found for url: http://theiowarepublican.com/2015/state-senator-brand-zaun-endorses-trump/


Processing URLs:  14%|█▎        | 136/1000 [13:33<30:01,  2.09s/it]

Error extracting text from http://cleantechnica.com/2016/01/15/1-large-luxury-car-in-us-tesla-model-s-2015-sales-comparison/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2016/01/15/1-large-luxury-car-in-us-tesla-model-s-2015-sales-comparison/


Processing URLs:  14%|█▎        | 137/1000 [13:35<31:34,  2.19s/it]

Error extracting text from http://www.cedem.me/images/jDownloads_new/Izdavastvo/Publikacije/Citizens_attitudes_on_NATO_integrations_July_2015.pdf: 404 Client Error: Not Found for url: http://www.cedem.me/images/jDownloads_new/Izdavastvo/Publikacije/Citizens_attitudes_on_NATO_integrations_July_2015.pdf


Processing URLs:  14%|█▍        | 141/1000 [13:39<16:18,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-southkorea-politics-moon-idUSKBN14B2I0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-moon-idUSKBN14B2I0


Processing URLs:  14%|█▍        | 143/1000 [13:42<17:18,  1.21s/it]

Error extracting text from http://m.en.rfi.fr/africa/20160303-opposition-has-high-hopes-new-facilitator-burundi-mediation: 403 Client Error: Forbidden for url: https://www.rfi.fr/en/africa/20160303-opposition-has-high-hopes-new-facilitator-burundi-mediation
Error extracting text from http://www.macrotrends.net/stocks/charts/AAPL/pe-ratio/apple-inc-pe-ratiohistory: 403 Client Error: Forbidden for url: http://www.macrotrends.net/stocks/charts/AAPL/pe-ratio/apple-inc-pe-ratiohistory


Processing URLs:  14%|█▍        | 145/1000 [14:42<3:26:46, 14.51s/it]

Error extracting text from https://www.gambling.com/news/how-long-will-boris-johnson-last-as-uk-prime-minister-2024000: HTTPSConnectionPool(host='www.gambling.com', port=443): Max retries exceeded with url: /news/how-long-will-boris-johnson-last-as-uk-prime-minister-2024000 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x302d11e50>, 'Connection to www.gambling.com timed out. (connect timeout=60)'))


Processing URLs:  15%|█▍        | 149/1000 [14:46<1:08:50,  4.85s/it]

Error extracting text from http://amti.csis.org/chinas-airfield-construction-at-fiery-cross-reef-in-context-catch-up-or-coercion/: 403 Client Error: Forbidden for url: http://amti.csis.org/chinas-airfield-construction-at-fiery-cross-reef-in-context-catch-up-or-coercion/


Processing URLs:  15%|█▌        | 153/1000 [14:49<26:28,  1.88s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-trump-putin-syria-idUSKBN17Y2BX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-putin-syria-idUSKBN17Y2BX


Processing URLs:  16%|█▌        | 155/1000 [14:50<18:44,  1.33s/it]

Error extracting text from http://www.thepeninsulaqatar.com/news/middle-east/372184/iran-reformists-dominate-tehran-voting-initial-results: 404 Client Error: Not Found for url: https://thepeninsulaqatar.com/news/middle-east/372184/iran-reformists-dominate-tehran-voting-initial-results


Processing URLs:  16%|█▌        | 156/1000 [14:50<15:19,  1.09s/it]

Error extracting text from https://www.nytimes.com/2017/02/27/us/politics/trump-concedes-health-law-overhaul-is-unbelievably-complex.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=a-lede-package-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/27/us/politics/trump-concedes-health-law-overhaul-is-unbelievably-complex.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=a-lede-package-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  16%|█▌        | 158/1000 [14:55<19:39,  1.40s/it]

Error extracting text from http://www.wsj.com/articles/iran-vows-to-respond-to-any-new-u-s-sanctions-1451567167: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-vows-to-respond-to-any-new-u-s-sanctions-1451567167
Error extracting text from https://www.reuters.com/article/us-venezuela-debt-isda/venezuela-ruled-in-default-by-trade-group-after-bond-payment-delays-idUSKBN1DG37J?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-debt-isda/venezuela-ruled-in-default-by-trade-group-after-bond-payment-delays-idUSKBN1DG37J?il=0


Processing URLs:  16%|█▌        | 160/1000 [14:55<12:02,  1.16it/s]

Error extracting text from https://google-self-driving-car-incidents.silk.co/: HTTPSConnectionPool(host='google-self-driving-car-incidents.silk.co', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301cd1280>: Failed to resolve 'google-self-driving-car-incidents.silk.co' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  16%|█▌        | 162/1000 [14:56<10:40,  1.31it/s]

Error extracting text from https://www.forbeschina.com/business/%E5%95%86%E4%B8%9A/55153: HTTPSConnectionPool(host='www.forbeschina.com', port=443): Max retries exceeded with url: /business/%E5%95%86%E4%B8%9A/55153 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  16%|█▋        | 163/1000 [14:58<13:18,  1.05it/s]

URL filtered: https://twitter.com/NASAWebb/status/1463664164660994048
Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-presses-cautious-spd-over-joining-new-german-coalition-idUSKBN1DR0YJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-presses-cautious-spd-over-joining-new-german-coalition-idUSKBN1DR0YJ


Processing URLs:  17%|█▋        | 168/1000 [15:01<10:06,  1.37it/s]

Error extracting text from http://www.nytimes.com/2016/07/08/world/americas/eduardo-cunha-an-architect-of-presidents-ouster-resigns-as-house-speaker.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/08/world/americas/eduardo-cunha-an-architect-of-presidents-ouster-resigns-as-house-speaker.html


Processing URLs:  17%|█▋        | 169/1000 [15:01<08:21,  1.66it/s]

Error extracting text from https://www.nytimes.com/2017/08/04/us/politics/robert-mueller-michael-flynn-turkey.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/04/us/politics/robert-mueller-michael-flynn-turkey.html?_r=0


Processing URLs:  17%|█▋        | 171/1000 [15:05<14:55,  1.08s/it]

Error extracting text from https://www.fda.gov/media/143891/download: 403 Client Error: Forbidden for url: https://www.fda.gov/media/143891/download


Processing URLs:  17%|█▋        | 173/1000 [15:06<14:16,  1.04s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-16/oil-freeze-iraq-ready-to-cap-output-iran-to-maintain-share


Processing URLs:  18%|█▊        | 175/1000 [15:09<15:22,  1.12s/it]

Error extracting text from http://aseanregionalforum.asean.org/files/Archive/19th/15th%20ARF%20HDUCIM,%20Bali,%2029November-2December2011/Appendix%20B%20-%20List%20of%20Participants.pdf: 404 Client Error: Not Found for url: https://aseanregionalforum.asean.org/files/Archive/19th/15th%20ARF%20HDUCIM,%20Bali,%2029November-2December2011/Appendix%20B%20-%20List%20of%20Participants.pdf


Processing URLs:  18%|█▊        | 176/1000 [15:11<19:20,  1.41s/it]

Error extracting text from http://www3.navy.mi.th/index.php/main/detail/content_id/10654&amp;usg=ALkJrhgV9vITM-tjZ2zFDlBjqOZyR2xx7w: HTTPConnectionPool(host='www3.navy.mi.th', port=80): Max retries exceeded with url: /index.php/main/detail/content_id/10654&amp;usg=ALkJrhgV9vITM-tjZ2zFDlBjqOZyR2xx7w (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301661e80>: Failed to resolve 'www3.navy.mi.th' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  18%|█▊        | 178/1000 [15:12<13:02,  1.05it/s]

Error extracting text from https://www.un.org/press/en/2017/sc12748.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2017/sc12748.doc.htm


Processing URLs:  18%|█▊        | 179/1000 [15:14<15:55,  1.16s/it]

URL filtered: https://twitter.com/CNN/status/966707132832903168


Processing URLs:  18%|█▊        | 182/1000 [15:15<11:26,  1.19it/s]

Error extracting text from http://www.businessinsider.com/r-israels-religious-parties-get-more-control-over-saturday-trading-2018-1?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-israels-religious-parties-get-more-control-over-saturday-trading-2018-1?IR=T


Processing URLs:  18%|█▊        | 183/1000 [15:16<12:14,  1.11it/s]

Error extracting text from http://uk.reuters.com/article/uk-peru-election-idUKKCN0WQ1HP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  18%|█▊        | 184/1000 [15:18<14:33,  1.07s/it]

Error extracting text from http://atimes.com/2016/07/omar-mansoors-killing-a-big-blow-to-pakistan-taliban/: 404 Client Error: Not Found for url: https://atimes.com/2016/07/omar-mansoors-killing-a-big-blow-to-pakistan-taliban/


Processing URLs:  19%|█▉        | 188/1000 [15:23<15:16,  1.13s/it]

Error extracting text from http://www.mb.com.ph/beijing-condemns-manila-for-upgrading-work-in-s-china-sea/: 403 Client Error: Forbidden for url: https://mb.com.ph/beijing-condemns-manila-for-upgrading-work-in-s-china-sea/


Processing URLs:  19%|█▉        | 192/1000 [15:28<16:52,  1.25s/it]

Error extracting text from http://www.wsj.com/articles/obamas-russia-epiphany-1475881969: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/obamas-russia-epiphany-1475881969
Error extracting text from http://www.reuters.com/article/2015/11/27/china-bonds-idUSL3N13M2LN20151127#oxOOjdjL7TSZUsjI.99&amp;quot: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/27/china-bonds-idUSL3N13M2LN20151127#oxOOjdjL7TSZUsjI.99&amp;quot


Processing URLs:  20%|█▉        | 198/1000 [15:34<13:59,  1.05s/it]

Error extracting text from https://blog.openai.com/openai-five-benchmark-results/: HTTPSConnectionPool(host='blog.openai.com', port=443): Max retries exceeded with url: /openai-five-benchmark-results/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301ecc260>: Failed to resolve 'blog.openai.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  21%|██        | 206/1000 [15:50<23:58,  1.81s/it]

Error extracting text from https://www.nytimes.com/2021/08/06/nyregion/myanmar-ambassador-plot.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/06/nyregion/myanmar-ambassador-plot.html


Processing URLs:  21%|██        | 207/1000 [15:51<19:27,  1.47s/it]

URL filtered: https://twitter.com/NateSilver538/status/692888697700634625
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;nv=1&amp;rurl=translate.google.com&amp;sl=fa&amp;tl=en&amp;u=http://www.khabaronline.ir/detail/509323/root/Politics&amp;usg=ALkJrhgsAdouOUxjUdkaAS-HxLD10WhDEA: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;nv=1&amp;rurl=translate.google.com&amp;sl=fa&amp;tl=en&amp;u=http://www.khabaronline.ir/detail/509323/root/Politics&amp;usg=ALkJrhgsAdouOUxjUdkaAS-HxLD10WhDEA


Processing URLs:  22%|██▏       | 216/1000 [15:59<10:57,  1.19it/s]

Error extracting text from https://www.bls.gov/charts/employment-situation/civilian-unemployment-rate.htm: 403 Client Error: Forbidden for url: https://www.bls.gov/charts/employment-situation/civilian-unemployment-rate.htm
Error extracting text from https://www.reuters.com/world/americas/canada-pm-trudeau-is-planning-call-snap-election-sept-20-sources-2021-08-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/canada-pm-trudeau-is-planning-call-snap-election-sept-20-sources-2021-08-12/


Processing URLs:  22%|██▏       | 219/1000 [16:02<14:37,  1.12s/it]

Error extracting text from http://predictwise.com/politics/trump-specials: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/trump-specials


Processing URLs:  22%|██▏       | 221/1000 [16:04<12:52,  1.01it/s]

Error extracting text from http://uk.reuters.com/article/asia-oil-appec-iran/iran-says-to-keep-crude-condensate-exports-at-around-2-6-mln-bpd-in-2017-idUKL4N1M62IQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  22%|██▏       | 223/1000 [16:04<07:35,  1.71it/s]

Error extracting text from https://www.nytimes.com/2017/07/19/us/politics/paul-manafort-russia-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/19/us/politics/paul-manafort-russia-trump.html
Error extracting text from http://www.autonews.com/article/20170122/NADA100/301239911/1725: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20170122/NADA100/301239911/1725


Processing URLs:  22%|██▏       | 224/1000 [16:05<07:21,  1.76it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/overnights/360578-feds-unveil-process-hacking-toolkit-uk-blames-russia-for-hacks-state-department-cyber-office: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/overnights/360578-feds-unveil-process-hacking-toolkit-uk-blames-russia-for-hacks-state-department-cyber-office/


Processing URLs:  23%|██▎       | 231/1000 [16:25<42:51,  3.34s/it]

Error extracting text from http://www.ibtimes.co.uk/isis-libya-french-special-forces-operating-secret-ground-war-against-daesh-1545759: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/isis-libya-french-special-forces-operating-secret-ground-war-against-daesh-1545759


Processing URLs:  23%|██▎       | 232/1000 [16:26<31:18,  2.45s/it]

Error extracting text from http://www.todayonline.com/world/asia/another-us-patrol-south-china-sea-unlikely-year-officials: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/asia/another-us-patrol-south-china-sea-unlikely-year-officials


Processing URLs:  24%|██▎       | 235/1000 [16:32<31:04,  2.44s/it]

Error extracting text from http://www.projectblue.org/the-mission/: 404 Client Error: Not Found for url: https://www.boldlygo.org/project-blue


Processing URLs:  24%|██▎       | 237/1000 [16:36<27:51,  2.19s/it]

Error extracting text from http://www.who.int/blindness/causes/en/: 404 Client Error: Not Found for url: https://www.who.int/blindness/causes/en/


Processing URLs:  24%|██▍       | 242/1000 [16:47<28:09,  2.23s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2000_11_11Jan.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2000_11_11Jan.pdf


Processing URLs:  24%|██▍       | 245/1000 [16:48<14:06,  1.12s/it]

Error extracting text from http://www.vauxhalladvance.com/news/2017/05/18/questions-abound-with-upcoming-cannabis-act/: 404 Client Error: Not Found for url: http://www.vauxhalladvance.com/news/2017/05/18/questions-abound-with-upcoming-cannabis-act/
Error extracting text from http://www.opec.org/opec_web/en/press_room/4371.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/4371.htm


Processing URLs:  25%|██▌       | 252/1000 [18:01<3:42:42, 17.86s/it]

Error extracting text from http://aa.com.tr/en/info/infographic/6953: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  26%|██▌       | 256/1000 [18:06<1:11:02,  5.73s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/simulacro-ipsos-keiko-526-y-ppk-474-segunda-vuelta-noticia-1903505: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/simulacro-ipsos-keiko-526-y-ppk-474-segunda-vuelta-noticia-1903505/


Processing URLs:  26%|██▋       | 264/1000 [18:20<21:18,  1.74s/it]  

Error extracting text from http://www.debka.com/newsupdates/: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /newsupdates/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  27%|██▋       | 266/1000 [18:21<12:42,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-cyber-marketplace-idUSKCN0Z117R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-marketplace-idUSKCN0Z117R
URL filtered: https://www.bloomberg.com/news/articles/2017-12-21/house-approves-temporary-funding-to-avert-government-shutdown


Processing URLs:  27%|██▋       | 268/1000 [19:21<2:56:08, 14.44s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-07-24/rex-tillerson-frustrated-with-white-house-prompts-rexit-rumors: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 270/1000 [19:23<1:41:40,  8.36s/it]

Error extracting text from http://www.wsj.com/articles/iranians-vote-in-first-election-since-nuclear-deal-1456478933: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iranians-vote-in-first-election-since-nuclear-deal-1456478933


Processing URLs:  28%|██▊       | 275/1000 [19:40<39:35,  3.28s/it]  

Error extracting text from http://www.vocativ.com/news/263774/isis-is-still-winning-hearts-and-minds-in-iraq/: 404 Client Error: Not Found for url: http://www.vocativ.com/news/263774/isis-is-still-winning-hearts-and-minds-in-iraq/


Processing URLs:  28%|██▊       | 276/1000 [20:40<4:02:08, 20.07s/it]

Error extracting text from https://www.seattletimes.com/seattle-news/times-watchdog/a-new-neo-nazi-group-in-spokane-harkens-back-to-era-of-virulent-extremism-in-the-northwest/: HTTPSConnectionPool(host='www.seattletimes.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  28%|██▊       | 277/1000 [20:43<2:59:19, 14.88s/it]

Error extracting text from http://www.np-cpp.ru/rus/news/branch_news/document6204.phtml: 404 Client Error: Not Found for url: https://www.np-cpp.ru:443/rus/news/branch_news/document6204.phtml


Processing URLs:  28%|██▊       | 279/1000 [20:45<1:33:17,  7.76s/it]

Error extracting text from http://www.meim.gov.sa/arabic/mediacenter/press-releases/Pages/default.aspx: HTTPSConnectionPool(host='www.meim.gov.sa', port=443): Max retries exceeded with url: /arabic/mediacenter/press-releases/Pages/default.aspx (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.meim.gov.sa'. (_ssl.c:1000)")))


Processing URLs:  28%|██▊       | 281/1000 [20:45<47:16,  3.94s/it]  

Error extracting text from http://www.yenisafak.com/en/news/more-than-100-pkk-terrorists-killed-in-turkey-in-one-week-2798070: 422 Client Error:  for url: http://www.yenisafak.com/en/news/more-than-100-pkk-terrorists-killed-in-turkey-in-one-week-2798070
Error extracting text from https://medium.com/machine-learning-in-practice/nips-accepted-papers-stats-26f124843aa0: 403 Client Error: Forbidden for url: https://medium.com/machine-learning-in-practice/nips-accepted-papers-stats-26f124843aa0


Processing URLs:  28%|██▊       | 282/1000 [21:03<1:35:57,  8.02s/it]

Error extracting text from https://www.foodandwine.com/news/british-pubs-run-out-of-beer-brexit: 406 Client Error: Not Acceptable for url: https://www.foodandwine.com/news/british-pubs-run-out-of-beer-brexit


Processing URLs:  29%|██▉       | 288/1000 [21:12<28:05,  2.37s/it]  

Error extracting text from http://www.nbcnews.com/politics/first-read/poll-after-trump-tape-revelation-clinton-s-lead-double-digits-n663691?cid=sm_tw: 403 Client Error: Forbidden for url: http://www.nbcnews.com/politics/first-read/poll-after-trump-tape-revelation-clinton-s-lead-double-digits-n663691?cid=sm_tw


Processing URLs:  30%|███       | 300/1000 [21:33<17:26,  1.49s/it]

Error extracting text from http://thehill.com/homenews/administration/334388-tillerson-unaware-of-wh-aide-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/334388-tillerson-unaware-of-wh-aide-report/


Processing URLs:  31%|███       | 309/1000 [22:44<3:41:51, 19.26s/it]

Error extracting text from http://aa.com.tr/en/world/us-sizable-iraqi-force-to-attack-about-10k-mosul-militants/531840: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  31%|███       | 310/1000 [22:47<2:44:13, 14.28s/it]

URL filtered: https://www.youtube.com/watch?v=IJHcD0kHTGk


Processing URLs:  31%|███▏      | 314/1000 [22:49<51:39,  4.52s/it]  

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-intelligence-idUSKBN17S0RY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-intelligence-idUSKBN17S0RY


Processing URLs:  32%|███▏      | 317/1000 [22:54<31:28,  2.76s/it]

Error extracting text from https://www.cnbc.com/2017/10/21/the-associated-press-morocco-recalls-envoy-over-algeria-officials-hashish-claims.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/10/21/the-associated-press-morocco-recalls-envoy-over-algeria-officials-hashish-claims.html


Processing URLs:  32%|███▏      | 319/1000 [22:55<19:04,  1.68s/it]

Error extracting text from http://www.nytimes.com/2015/09/26/us/john-boehner-to-resign-from-congress.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/26/us/john-boehner-to-resign-from-congress.html


Processing URLs:  32%|███▏      | 324/1000 [23:02<14:02,  1.25s/it]

URL filtered: https://www.youtube.com/watch?v=eV3TsEsfvl4
URL filtered: https://www.rferl.org/a/us-russia-facebook-manipulation-echoes-troll-factory-accounts/28722595.html


Processing URLs:  33%|███▎      | 333/1000 [23:13<20:04,  1.81s/it]

URL filtered: https://twitter.com/JeremyCliffe/status/858810953353367552


Processing URLs:  35%|███▍      | 347/1000 [23:31<11:28,  1.06s/it]

Error extracting text from http://aranews.net/2016/01/17458/: 404 Client Error: Not Found for url: http://aranews.net/2016/01/17458/


Processing URLs:  35%|███▍      | 349/1000 [23:33<11:25,  1.05s/it]

Error extracting text from http://uk.reuters.com/article/venezuela-ongc-idUKL2N1622IN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  35%|███▌      | 350/1000 [23:34<08:42,  1.24it/s]

Error extracting text from https://www.nytimes.com/2017/06/15/us/politics/trump-obstruction-of-justice-reports.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/15/us/politics/trump-obstruction-of-justice-reports.html?_r=0


Processing URLs:  35%|███▌      | 351/1000 [23:36<13:04,  1.21s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=weekend&amp;id=marvel14b.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=weekend&amp;id=marvel14b.htm


Processing URLs:  35%|███▌      | 352/1000 [23:37<11:54,  1.10s/it]

Error extracting text from http://www.businessinsider.com/r-key-republicans-open-to-handling-garland-nomination-after-us-election-2016-3: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-key-republicans-open-to-handling-garland-nomination-after-us-election-2016-3


Processing URLs:  36%|███▌      | 360/1000 [23:57<32:47,  3.07s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-07/trudeau-s-pot-czar-says-canada-won-t-rush-marijuana-legalization


Processing URLs:  36%|███▌      | 362/1000 [23:57<18:10,  1.71s/it]

Error extracting text from https://www.nytimes.com/2021/01/15/world/europe/russia-open-skies-treaty-biden.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/15/world/europe/russia-open-skies-treaty-biden.html


Processing URLs:  37%|███▋      | 367/1000 [27:16<2:40:59, 15.26s/it]

Error extracting text from https://www.france24.com/en/live-news/20210921-amid-official-denials-nicaraguans-battle-covid-surge: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210921-amid-official-denials-nicaraguans-battle-covid-surge


Processing URLs:  37%|███▋      | 373/1000 [27:22<35:25,  3.39s/it]  

Error extracting text from http://news.mb.com.ph/2017/03/21/have-we-lost-scarborough-shoal-for-good/: HTTPConnectionPool(host='news.mb.com.ph', port=80): Max retries exceeded with url: /2017/03/21/have-we-lost-scarborough-shoal-for-good/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff4f1f40>: Failed to resolve 'news.mb.com.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 375/1000 [27:54<1:32:31,  8.88s/it]

Error extracting text from http://www.weaselzippers.us/285630-report-british-intercepted-intel-reveals-erdogan-behind-coup-to-conduct-purges/: 522 Server Error:  for url: https://www.weaselzippers.us/285630-report-british-intercepted-intel-reveals-erdogan-behind-coup-to-conduct-purges/


Processing URLs:  38%|███▊      | 376/1000 [27:56<1:15:24,  7.25s/it]

Error extracting text from https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/October/REINZ%20Monthly%20Property%20Report%20-%20October%202021.pdf: 404 Client Error: Not Found for url: https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/October/REINZ%20Monthly%20Property%20Report%20-%20October%202021.pdf


Processing URLs:  38%|███▊      | 377/1000 [27:56<58:06,  5.60s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2015-11-22/masters-of-universe-scared-of-china-risks-see-yuan-devaluation


Processing URLs:  38%|███▊      | 382/1000 [28:05<30:39,  2.98s/it]

Error extracting text from http://oklo.org/2006/07/03/a-million-year-picnic/: 406 Client Error: Not Acceptable for url: http://oklo.org/2006/07/03/a-million-year-picnic/


Processing URLs:  39%|███▉      | 388/1000 [28:14<13:26,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-usa-court-election/fight-over-electoral-district-boundaries-heads-to-supreme-court-idUSKCN1BS0FR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-election/fight-over-electoral-district-boundaries-heads-to-supreme-court-idUSKCN1BS0FR


Processing URLs:  39%|███▉      | 393/1000 [28:19<09:15,  1.09it/s]

Error extracting text from http://www.thebitbag.com/winds-of-winter-release-date-delayed/124494: 403 Client Error: Forbidden for url: http://www.thebitbag.com/winds-of-winter-release-date-delayed/124494
URL filtered: http://www.bloomberg.com/news/articles/2016-06-15/pakistani-stocks-jump-most-this-year-after-upgrade-from-msci


Processing URLs:  40%|███▉      | 396/1000 [28:19<04:26,  2.27it/s]

Error extracting text from http://www.nationmultimedia.com/business/Asean-to-force-the-pace-in-RCEP-negotiations-30288000.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/business/Asean-to-force-the-pace-in-RCEP-negotiations-30288000.html
Error extracting text from https://medium.com/dfrlab: 403 Client Error: Forbidden for url: https://medium.com/dfrlab
Error extracting text from http://www.motortrend.com/news/maserati-going-electric-ram-coming-naias-fca-ceo-says/: 403 Client Error: Forbidden for url: http://www.motortrend.com/news/maserati-going-electric-ram-coming-naias-fca-ceo-says/


Processing URLs:  40%|███▉      | 398/1000 [28:19<02:50,  3.53it/s]

Error extracting text from http://blogs.wsj.com/moneybeat/2015/08/05/tesla-teeters-after-lowering-guidance/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2015/08/05/tesla-teeters-after-lowering-guidance/


Processing URLs:  40%|████      | 400/1000 [28:20<04:01,  2.49it/s]

Error extracting text from https://www.latimes.com/opinion/story/2021-01-21/insurrection-capitol-attack-patriotism-1776: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/story/2021-01-21/insurrection-capitol-attack-patriotism-1776
URL filtered: https://www.bloomberg.com/news/articles/2021-07-01/oil-holds-gain-near-73-as-market-waits-for-opec-output-meeting


Processing URLs:  40%|████      | 405/1000 [28:25<08:47,  1.13it/s]

Error extracting text from http://www.businessinsider.com/r-iraqi-commander-sees-islamic-state-retrenching-before-mosul-battle-2016-3: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-iraqi-commander-sees-islamic-state-retrenching-before-mosul-battle-2016-3
Error extracting text from https://www.reuters.com/article/us-space-exploration-boeing/boeings-botched-starliner-test-flirted-with-catastrophic-failure-nasa-panel-idUSKBN20106A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-space-exploration-boeing/boeings-botched-starliner-test-flirted-with-catastrophic-failure-nasa-panel-idUSKBN20106A


Processing URLs:  41%|████      | 410/1000 [28:33<13:32,  1.38s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics-scenarios/scenarios-how-south-africas-zuma-could-leave-office-idUSKBN1FX1BU?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics-scenarios/scenarios-how-south-africas-zuma-could-leave-office-idUSKBN1FX1BU?il=0


Processing URLs:  41%|████      | 412/1000 [28:36<15:35,  1.59s/it]

Error extracting text from http://www.ezv.admin.ch/zollinfo_privat/04414/04415/index.html?lang=en: 404 Client Error: Not Found for url: https://www.bazg.admin.ch/bazg/de/404.html


Processing URLs:  42%|████▏     | 419/1000 [28:47<19:29,  2.01s/it]

Error extracting text from http://nigerianews.co/news/58167/what-buhari-must-do-to-stop-fulani-herdsmen-killings-group: HTTPConnectionPool(host='nigerianews.co', port=80): Max retries exceeded with url: /news/58167/what-buhari-must-do-to-stop-fulani-herdsmen-killings-group (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303988230>: Failed to resolve 'nigerianews.co' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 422/1000 [28:51<15:43,  1.63s/it]

Error extracting text from http://blogs.denverpost.com/thespot/2015/10/09/top-clinton-supporters-in-colorado-now-backing-biden-too/123285/: 500 Server Error: Internal Server Error for url: https://blogs.denverpost.com/thespot/2015/10/09/top-clinton-supporters-in-colorado-now-backing-biden-too/123285/


Processing URLs:  42%|████▏     | 423/1000 [28:53<17:11,  1.79s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-07/fujimori-leads-peru-presidential-poll-five-weeks-before-vote


Processing URLs:  43%|████▎     | 433/1000 [29:01<08:23,  1.13it/s]

Error extracting text from https://www.weforum.org/agenda/2019/08/countries-are-the-worlds-oldest-democracies: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2019/08/countries-are-the-worlds-oldest-democracies


Processing URLs:  44%|████▎     | 437/1000 [29:08<10:07,  1.08s/it]

Error extracting text from http://www.nytimes.com/2016/12/27/world/asia/south-china-sea-trump.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/27/world/asia/south-china-sea-trump.html?_r=0
Error extracting text from https://www.reuters.com/article/us-northkorea-missiles-idUSKBN18Y383: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN18Y383
URL filtered: http://www.bloomberg.com/news/articles/2016-08-11/ubs-china-s-already-started-bailing-out-its-banks


Processing URLs:  44%|████▍     | 440/1000 [29:28<34:51,  3.73s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-musk-masterplan-idUSKCN0ZQ111?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-musk-masterplan-idUSKCN0ZQ111?il=0


Processing URLs:  44%|████▍     | 443/1000 [29:33<22:47,  2.46s/it]

URL filtered: https://twitter.com/Snowden/status/765513662597623808


Processing URLs:  45%|████▍     | 448/1000 [29:44<24:30,  2.66s/it]

Error extracting text from http://www.nasa.gov/mp3/687631main_687014main_emfisis_chorus_1.mp3: 404 Client Error: Not Found for url: https://www.nasa.gov/mp3/687631main_687014main_emfisis_chorus_1.mp3


Processing URLs:  45%|████▌     | 451/1000 [29:47<15:11,  1.66s/it]

Error extracting text from http://adage.com/article/media/time-people-magazines-trim-print-circulation-guarantees/296266/: 403 Client Error: Forbidden for url: https://adage.com/article/media/time-people-magazines-trim-print-circulation-guarantees/296266/


Processing URLs:  46%|████▌     | 456/1000 [29:59<19:23,  2.14s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-kurds-idUSKCN0Z20RY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-kurds-idUSKCN0Z20RY
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-airbase-idUSKBN17A0SO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-airbase-idUSKBN17A0SO


Processing URLs:  46%|████▌     | 460/1000 [30:04<13:48,  1.53s/it]

Error extracting text from http://investmentresearchdynamics.com/the-feds-expedited-closed-meeting-is-pure-kabuki-theatre/: 404 Client Error: Not Found for url: https://investmentresearchdynamics.com/the-feds-expedited-closed-meeting-is-pure-kabuki-theatre/


Processing URLs:  46%|████▋     | 464/1000 [30:30<38:33,  4.32s/it]  

Error extracting text from https://bit.ly/3qN5ZIY: 403 Client Error: Forbidden for url: https://bfpg.co.uk/2021/03/integrated-review-10-things/


Processing URLs:  47%|████▋     | 469/1000 [31:03<30:33,  3.45s/it]  

Error extracting text from http://www.cell.com/cell-reports/fulltext/S2211-1247(16)30687-8: 403 Client Error: Forbidden for url: https://www.cell.com/cell-reports/fulltext/S2211-1247(16)30687-8
Error extracting text from https://www.yahoo.com/news/vote-whether-remove-president-nears-brazils-senate-041403303.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/vote-whether-remove-president-nears-brazils-senate-041403303.html


Processing URLs:  47%|████▋     | 473/1000 [31:05<09:28,  1.08s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/10/22/Putin-Syrian-government-Kurdish-forces-should-be-united.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/10/22/Putin-Syrian-government-Kurdish-forces-should-be-united.html
Error extracting text from https://www.yahoo.com/news/trump-eager-big-meeting-putin-073543897.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/trump-eager-big-meeting-putin-073543897.html


Processing URLs:  48%|████▊     | 478/1000 [31:14<14:04,  1.62s/it]

Error extracting text from http://www.peruthisweek.com/news-keiko-fujimori-jee-gift-giving-109111: HTTPConnectionPool(host='www.peruthisweek.com', port=80): Max retries exceeded with url: /news-keiko-fujimori-jee-gift-giving-109111 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303a74530>: Failed to resolve 'www.peruthisweek.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 482/1000 [31:20<12:07,  1.40s/it]

Error extracting text from http://www.reuters.com/article/2015/10/17/us-china-southchinasea-idUSKCN0SB00Z20151017: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/17/us-china-southchinasea-idUSKCN0SB00Z20151017


Processing URLs:  48%|████▊     | 483/1000 [31:22<11:54,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=nor3JNEoL-M


Processing URLs:  48%|████▊     | 485/1000 [31:24<11:42,  1.36s/it]

Error extracting text from http://www.reuters.com/article/us-spain-politics-idUSKCN10T1A4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-idUSKCN10T1A4


Processing URLs:  49%|████▉     | 490/1000 [31:29<09:12,  1.08s/it]

Error extracting text from http://www.tradingeconomics.com/canada/inflation-cpi/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/canada/inflation-cpi/forecast


Processing URLs:  49%|████▉     | 492/1000 [31:31<07:48,  1.08it/s]

Error extracting text from http://www.swissinfo.ch/eng/african-union-backs-away-from-imposing-peacekeepers-on-burundi/41930946: 404 Client Error: Not Found for url: https://www.swissinfo.ch/eng/african-union-backs-away-from-imposing-peacekeepers-on-burundi/41930946


Processing URLs:  50%|████▉     | 495/1000 [31:35<09:26,  1.12s/it]

Error extracting text from http://tass.ru/en/politics/887810: 404 Client Error: Not Found for url: https://tass.ru/en/politics/887810
Error extracting text from https://au.news.yahoo.com/world/a/30080410/syria-army-takes-key-rebel-town-south-of-aleppo-military-source/: 404 Client Error: Not Found for url: https://au.news.yahoo.com/syria-army-takes-key-rebel-town-south-of-aleppo-military-source-30080410.html


Processing URLs:  50%|████▉     | 499/1000 [31:40<09:23,  1.12s/it]

Error extracting text from https://www.debka.com/200-russian-advisers-killed-last-weeks-clash-us-forces-syria/: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /200-russian-advisers-killed-last-weeks-clash-us-forces-syria/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  50%|█████     | 501/1000 [31:42<09:57,  1.20s/it]

Error extracting text from http://www.rferl.org/content/us-ready-invite-montenegro-join-nato/27248715.html: 403 Client Error: Forbidden for url: http://www.rferl.org/content/us-ready-invite-montenegro-join-nato/27248715.html


Processing URLs:  51%|█████     | 507/1000 [31:54<18:52,  2.30s/it]

Error extracting text from http://38north.org/2015/12/sohae120915/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  51%|█████     | 510/1000 [31:58<13:38,  1.67s/it]

Error extracting text from https://docs.google.com/presentation/d/1u7resy2bGA1_HIgj6Nc7ahzeS7DrpOtkiK5ywhQhmpk/edit#slide=id.gaeebd14cc9_0_79: 401 Client Error: Unauthorized for url: https://docs.google.com/presentation/d/1u7resy2bGA1_HIgj6Nc7ahzeS7DrpOtkiK5ywhQhmpk/edit#slide=id.gaeebd14cc9_0_79


Processing URLs:  52%|█████▏    | 517/1000 [32:09<12:25,  1.54s/it]

Error extracting text from http://www.siliconbeat.com/2016/05/31/google-chat-bot-coming-year-renowned-inventor-says/: 404 Client Error: Not Found for url: https://www.mercurynews.com/tag/siliconbeat/2016/05/31/google-chat-bot-coming-year-renowned-inventor-says/


Processing URLs:  52%|█████▏    | 518/1000 [33:10<2:34:12, 19.20s/it]

Error extracting text from http://www.spaceflightinsider.com/organizations/space-exploration-technologies/first-flight-spacex-falcon-heavy-slips-net-november-2016/: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /organizations/space-exploration-technologies/first-flight-spacex-falcon-heavy-slips-net-november-2016/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303988620>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  52%|█████▏    | 519/1000 [33:10<1:48:36, 13.55s/it]

Error extracting text from http://www.raps.org/Regulatory-Focus/News/2016/02/16/24342/FDA-Sees-Spike-in-Gene-and-Cell-Therapy-Applications/: 403 Client Error: Forbidden for url: https://www.raps.org/Regulatory-Focus/News/2016/02/16/24342/FDA-Sees-Spike-in-Gene-and-Cell-Therapy-Applications/


Processing URLs:  53%|█████▎    | 527/1000 [33:20<19:37,  2.49s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-09-16/pdvsa-s-late-friday-swap-surprise-no-pick-up-for-longer-debt


Processing URLs:  53%|█████▎    | 531/1000 [33:23<09:03,  1.16s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKCN0ZG0LO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKCN0ZG0LO


Processing URLs:  53%|█████▎    | 532/1000 [33:23<07:10,  1.09it/s]

Error extracting text from https://balkaninsight.com/2021/03/02/birn-fact-check-will-north-macedonias-census-result-affect-albanian-rights/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/03/02/birn-fact-check-will-north-macedonias-census-result-affect-albanian-rights/


Processing URLs:  53%|█████▎    | 533/1000 [33:25<08:59,  1.16s/it]

Error extracting text from https://www.google.com/amp/s/mobile.reuters.com/article/amp/idUSKBN2B71V7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKBN2B71V7


Processing URLs:  54%|█████▍    | 540/1000 [34:39<2:20:22, 18.31s/it]

Error extracting text from http://www.washingtonpost.com/blogs/the-fix/post/the: 404 Client Error: Not Found for url: https://www.washingtonpost.com/news/the-fix/post/the/


Processing URLs:  54%|█████▍    | 544/1000 [34:43<37:45,  4.97s/it]  

Error extracting text from http://www.el-nacional.com/economia/Gobierno-estudia-realizar-declarar-default_0_854314837.html: 403 Client Error: Forbidden for url: https://www.elnacional.com/economia/Gobierno-estudia-realizar-declarar-default_0_854314837.html
Error extracting text from http://www.nytimes.com/2016/07/19/world/asia/china-sea-air-patrols.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/19/world/asia/china-sea-air-patrols.html


Processing URLs:  55%|█████▍    | 545/1000 [34:44<27:50,  3.67s/it]

Error extracting text from http://m.yna.co.kr/mob2/en/contents_en.jsp?cid=AEN20160919000451315&amp;site=0400000000&amp;mobile: HTTPSConnectionPool(host='m.yna.co.kr', port=443): Max retries exceeded with url: /mob2/en/contents_en.jsp?cid=AEN20160919000451315&amp;site=0400000000&amp;mobile (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  55%|█████▌    | 552/1000 [34:51<07:41,  1.03s/it]

Error extracting text from http://www.wsj.com/articles/iranian-general-aided-a-u-s-political-aim-in-iraq-in-2006-envoy-reveals-1458334962: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iranian-general-aided-a-u-s-political-aim-in-iraq-in-2006-envoy-reveals-1458334962


Processing URLs:  55%|█████▌    | 554/1000 [34:54<09:50,  1.32s/it]

Error extracting text from http://ceip.org/1UBAuhs: 403 Client Error: Forbidden for url: https://carnegieendowment.org/1UBAuhs


Processing URLs:  56%|█████▌    | 558/1000 [34:58<08:28,  1.15s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/latest-hit-back-walk-choice-trump-rivals-37424291: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/latest-hit-back-walk-choice-trump-rivals-37424291


Processing URLs:  56%|█████▋    | 564/1000 [35:10<09:58,  1.37s/it]

Error extracting text from https://news.mongabay.com/2021/04/bolsonaro-abandons-enhanced-amazon-commitment-same-day-he-makes-it/: 403 Client Error: Forbidden for url: https://news.mongabay.com/2021/04/bolsonaro-abandons-enhanced-amazon-commitment-same-day-he-makes-it/
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN15N2VF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN15N2VF


Processing URLs:  57%|█████▋    | 566/1000 [36:11<2:16:29, 18.87s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-06-02/putin-russia-struck-no-secret-agreements-with-trump-team: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  57%|█████▋    | 568/1000 [36:13<1:10:49,  9.84s/it]

Error extracting text from http://missilethreat.com/defense-systems/s-400-sa-21-growler/: 404 Client Error: Not Found for url: http://missilethreat.com/defense-systems/s-400-sa-21-growler/


Processing URLs:  57%|█████▋    | 570/1000 [36:16<38:43,  5.40s/it]  

URL filtered: https://twitter.com/DeutscheBank


Processing URLs:  57%|█████▋    | 572/1000 [36:17<23:16,  3.26s/it]

URL filtered: http://www.bloombergview.com/articles/2015-06-09/iran-spends-billions-to-prop-up-assad


Processing URLs:  57%|█████▊    | 575/1000 [36:23<17:27,  2.47s/it]

Error extracting text from http://usacac.army.mil/sites/default/files/misc/doctrine/CDG/cdg_resources/manuals/adrp/adrp6_0_new.pdf: 404 Client Error: Not Found for url: https://usacac.army.mil/sites/default/files/misc/doctrine/CDG/cdg_resources/manuals/adrp/adrp6_0_new.pdf
Error extracting text from https://www.el19digital.com/articulos/ver/titulo:111302-ejercito-de-nicaragua-recibe-donacion-de-taiwan: 403 Client Error: Forbidden for url: https://www.el19digital.com/articulos/ver/titulo:111302-ejercito-de-nicaragua-recibe-donacion-de-taiwan


Processing URLs:  58%|█████▊    | 576/1000 [36:24<14:21,  2.03s/it]

Error extracting text from http://thehill.com/homenews/administration/361801-trump-frustrated-with-ivankas-condemnation-of-roy-moore-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/361801-trump-frustrated-with-ivankas-condemnation-of-roy-moore-report/


Processing URLs:  58%|█████▊    | 581/1000 [36:29<08:13,  1.18s/it]

Error extracting text from https://bit.ly/2P0FPp2: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/opinion/why-things-could-be-about-to-get-worse-for-nicola-sturgeon-brian-monteith-3165553


Processing URLs:  58%|█████▊    | 584/1000 [36:31<04:59,  1.39it/s]

Error extracting text from https://www.nytimes.com/2020/06/16/world/asia/indian-china-border-clash.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/06/16/world/asia/indian-china-border-clash.html


Processing URLs:  59%|█████▉    | 589/1000 [36:38<07:22,  1.08s/it]

Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/last-few-tweaks-being-made-covid-ip-waiver-deal-wto-chief-2022-04-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/last-few-tweaks-being-made-covid-ip-waiver-deal-wto-chief-2022-04-14/


Processing URLs:  59%|█████▉    | 591/1000 [37:41<1:59:24, 17.52s/it]

Error extracting text from https://europeanwesternbalkans.com/2017/04/10/ewb-interview-genoveva-ruiz-calavera-eu-accession-process-is-strategic-investment-in-peace/: HTTPSConnectionPool(host='europeanwesternbalkans.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://www.youtube.com/watch?v=JzSUgOmP66Q


Processing URLs:  59%|█████▉    | 593/1000 [37:42<1:08:49, 10.15s/it]

Error extracting text from https://mobile.nytimes.com/2017/11/14/world/europe/britain-russia-cybersecurity-hacking.html?referer=https://www.google.ca/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/14/world/europe/britain-russia-cybersecurity-hacking.html?referer=https://www.google.ca/


Processing URLs:  59%|█████▉    | 594/1000 [37:43<53:29,  7.90s/it]  

URL filtered: https://twitter.com/theeconomist/status/774590704974721024


Processing URLs:  60%|█████▉    | 596/1000 [37:44<33:19,  4.95s/it]

Error extracting text from https://www.humboldtforum.org/en/on-location/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/en/on-location/


Processing URLs:  60%|█████▉    | 599/1000 [37:46<16:05,  2.41s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-islamic-state-media-idUSKBN13Q58P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-islamic-state-media-idUSKBN13Q58P
URL filtered: http://www.bloomberg.com/news/articles/2015-08-20/malaysia-rules-out-capital-controls-as-investors-exit-markets


Processing URLs:  60%|██████    | 602/1000 [37:47<08:39,  1.31s/it]

Error extracting text from http://www.bls.gov/news.release/pdf/cpi.pdf: 403 Client Error: Forbidden for url: http://www.bls.gov/news.release/pdf/cpi.pdf


Processing URLs:  60%|██████    | 604/1000 [37:49<07:45,  1.18s/it]

Error extracting text from http://www.wsj.com/articles/iraqi-kurds-seize-islamic-state-held-land-bolstering-leverage-for-future-1473903162: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-kurds-seize-islamic-state-held-land-bolstering-leverage-for-future-1473903162


Processing URLs:  61%|██████    | 610/1000 [38:00<11:02,  1.70s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-oil-shell-idUSKBN13V1TY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-shell-idUSKBN13V1TY


Processing URLs:  61%|██████▏   | 613/1000 [38:01<06:03,  1.06it/s]

URL filtered: https://twitter.com/BBCBreaking/status/941025535643344896?ref_src=twsrc%5Etfw&quot;&gt;December


Processing URLs:  62%|██████▏   | 617/1000 [38:06<05:59,  1.06it/s]

Error extracting text from http://www.wsj.com/articles/for-syria-rebels-an-agonizing-choice-1462479800: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-syria-rebels-an-agonizing-choice-1462479800


Processing URLs:  62%|██████▏   | 619/1000 [38:09<07:52,  1.24s/it]

URL filtered: https://www.bloomberg.com/gadfly/articles/2016-12-23/blackstone-energy-transfer-deal-could-warm-hearts-balance-sheets


Processing URLs:  62%|██████▏   | 624/1000 [38:14<06:32,  1.04s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/fed-s-lockhart-says-interest-rate-liftoff-case-is-compelling-


Processing URLs:  63%|██████▎   | 631/1000 [38:37<12:49,  2.09s/it]

Error extracting text from https://dfer.org/ny/erna-ny-poll-shows-andrew-yang-leading-in-nyc-mayoral-race/: 403 Client Error: Forbidden for url: https://dfer.org/ny/erna-ny-poll-shows-andrew-yang-leading-in-nyc-mayoral-race/


Processing URLs:  63%|██████▎   | 634/1000 [39:40<1:22:26, 13.51s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-08-06/south-africas-ruling-party-faces-biggest-election-challenge: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13C1LI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13C1LI


Processing URLs:  64%|██████▎   | 636/1000 [39:42<44:30,  7.34s/it]  

Error extracting text from http://presstv.com/Detail/2016/04/14/460735/OIC-summit-Turkey-Iran-President-Rouhani-Zarif-OIC-Secretary-General-Madani/: 403 Client Error: Forbidden for url: https://presstv.com/Detail/2016/04/14/460735/OIC-summit-Turkey-Iran-President-Rouhani-Zarif-OIC-Secretary-General-Madani/


Processing URLs:  64%|██████▍   | 639/1000 [39:43<16:50,  2.80s/it]

Error extracting text from http://www.arirang.co.kr/News/News_View.asp?nseq=183631: 404 Client Error:  for url: http://www.arirang.co.kr/News/News_View.asp?nseq=183631
Error extracting text from https://trends.google.nl/trends/explore?date=2020-01-01%202021-06-22&amp;geo=US&amp;q=buy%20book: 429 Client Error: unknown for url: https://trends.google.nl/trends/explore?date=2020-01-01%202021-06-22&amp;geo=US&amp;q=buy%20book
Error extracting text from https://www.reuters.com/article/us-germany-turkey-idUSKCN1AX0RG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-turkey-idUSKCN1AX0RG


Processing URLs:  64%|██████▍   | 643/1000 [39:48<10:23,  1.75s/it]

Error extracting text from http://www.latimes.com/politics/la-na-obama-court-nominee-20160310-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-obama-court-nominee-20160310-story.html


Processing URLs:  65%|██████▍   | 646/1000 [39:54<09:26,  1.60s/it]

Error extracting text from http://taskandpurpose.com/us-marine-proved-russia-hacked-dncs-emails-now-hes-talking/: 404 Client Error: Not Found for url: https://taskandpurpose.com/us-marine-proved-russia-hacked-dncs-emails-now-hes-talking/
Error extracting text from http://www.nytimes.com/2016/07/14/us/politics/senate-opioid-addiction-bill.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/14/us/politics/senate-opioid-addiction-bill.html


Processing URLs:  65%|██████▌   | 651/1000 [39:59<05:33,  1.05it/s]

Error extracting text from https://www.congress.gov/bill/106th-congress/senate-bill/269/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/106th-congress/senate-bill/269/text


Processing URLs:  65%|██████▌   | 653/1000 [40:00<03:40,  1.57it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-egypt-idUSKBN15N27X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-egypt-idUSKBN15N27X


Processing URLs:  66%|██████▌   | 655/1000 [40:00<02:18,  2.50it/s]

Error extracting text from http://www.wsj.com/articles/for-many-life-after-surgery-is-surprisingly-hard-1466443585: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-many-life-after-surgery-is-surprisingly-hard-1466443585
Error extracting text from http://www.reuters.com/article/us-usa-trump-mexico-idUSKBN15A1VF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-mexico-idUSKBN15A1VF?il=0


Processing URLs:  66%|██████▌   | 658/1000 [40:03<03:42,  1.54it/s]

Error extracting text from http://clerk.house.gov/member_info/electionInfo/2014/114: 403 Client Error: Forbidden for url: http://clerk.house.gov/member_info/electionInfo/2014/114


Processing URLs:  66%|██████▌   | 661/1000 [41:20<1:26:26, 15.30s/it]

Error extracting text from http://www.bradenton.com/news/politics-government/article38618253.html: HTTPConnectionPool(host='www.bradenton.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0VV0RD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0VV0RD


Processing URLs:  66%|██████▌   | 662/1000 [41:21<1:01:37, 10.94s/it]

Error extracting text from http://www.dod.mil/dodgc/olc/docs/PL114-113.pdf: 403 Client Error: Forbidden for url: http://www.dod.mil/dodgc/olc/docs/PL114-113.pdf


Processing URLs:  66%|██████▋   | 664/1000 [41:24<35:13,  6.29s/it]  

Error extracting text from http://result.Wild: HTTPConnectionPool(host='result.wild', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30300a180>: Failed to resolve 'result.wild' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  67%|██████▋   | 666/1000 [41:25<20:12,  3.63s/it]

Error extracting text from http://www.ntiindex.org/wp-content/uploads/2013/12/NTI_2016-Index_FINAL.pdf: 404 Client Error: Not Found for url: https://www.ntiindex.org/wp-content/uploads/2013/12/NTI_2016-Index_FINAL.pdf


Processing URLs:  67%|██████▋   | 671/1000 [41:33<09:40,  1.76s/it]

Error extracting text from https://thehill.com/homenews/state-watch/554252-new-york-prosecutors-investigating-trump-organization-in-a-criminal: 403 Client Error: Forbidden for url: https://thehill.com/homenews/state-watch/554252-new-york-prosecutors-investigating-trump-organization-in-a-criminal/


Processing URLs:  67%|██████▋   | 673/1000 [41:35<06:37,  1.21s/it]

Error extracting text from http://thehill.com/blogs/congress-blog/economy-budget/251088-the-export-import-bank-is-dead-and-should-stay-that-way: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/economy-budget/251088-the-export-import-bank-is-dead-and-should-stay-that-way/


Processing URLs:  68%|██████▊   | 675/1000 [41:36<05:11,  1.04it/s]

Error extracting text from https://www.theguardian.com/music/2017/oct/02/rock-star-tom-petty-dies-heart-attack&gt: 404 Client Error: Not Found for url: https://www.theguardian.com/music/2017/oct/02/rock-star-tom-petty-dies-heart-attack&gt


Processing URLs:  68%|██████▊   | 678/1000 [41:40<06:23,  1.19s/it]

Error extracting text from http://www.electoralcommission.org.uk/find-information-by-subject/elections-and-referendums/upcoming-elections-and-referendums/eu-referendum: 404 Client Error: Not Found for url: https://www.electoralcommission.org.uk/find-information-by-subject/elections-and-referendums/upcoming-elections-and-referendums/eu-referendum


Processing URLs:  68%|██████▊   | 679/1000 [41:42<07:14,  1.35s/it]

Error extracting text from http://www.iol.co.za/news/politics/anc-faces-real-prospect-of-losing-2019-poll-8752385: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/anc-faces-real-prospect-of-losing-2019-poll-8752385


Processing URLs:  68%|██████▊   | 684/1000 [41:47<05:09,  1.02it/s]

Error extracting text from http://www.investors.com/news/who-needs-iran-russia-saudi-arabia-come-to-terms-for-freeze/: 403 Client Error: Forbidden for url: https://www.investors.com/news/who-needs-iran-russia-saudi-arabia-come-to-terms-for-freeze/
URL filtered: https://www.youtube.com/watch?v=gJ-JN1Sgt38


Processing URLs:  69%|██████▊   | 686/1000 [41:47<03:10,  1.65it/s]

Error extracting text from http://allafrica.com/stories/201711130036.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201711130036.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303a757f0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  69%|██████▉   | 692/1000 [41:59<10:40,  2.08s/it]

Error extracting text from http://www.imemc.org/article/74475: 403 Client Error: Forbidden for url: https://imemc.org/article/74475


Processing URLs:  69%|██████▉   | 694/1000 [42:04<11:53,  2.33s/it]

Error extracting text from http://vestnikkavkaza.net/news/Russia-and-Turkey-making-up-for-lost-economic-opportunities.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/news/Russia-and-Turkey-making-up-for-lost-economic-opportunities.html


Processing URLs:  70%|██████▉   | 698/1000 [43:37<1:11:45, 14.26s/it]

Error extracting text from https://constitutioncenter.org/media/files/constitution.pdf: 403 Client Error: Forbidden for url: https://constitutioncenter.org/media/files/constitution.pdf


Processing URLs:  70%|██████▉   | 699/1000 [43:38<51:38, 10.29s/it]  

Error extracting text from http://latino.foxnews.com/latino/news/2015/12/23/official-panama-canal-expansion-could-be-delayed-6-months/: 403 Client Error: Forbidden for url: https://www.foxnews.com/category/latino


Processing URLs:  70%|███████   | 702/1000 [43:50<29:49,  6.01s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2017/03/10/26/0401000000AEN20170310010200315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  70%|███████   | 704/1000 [43:51<15:35,  3.16s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-special-forces-idUSKCN0Z10QX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-special-forces-idUSKCN0Z10QX


Processing URLs:  71%|███████   | 706/1000 [43:52<08:35,  1.76s/it]

Error extracting text from https://www.nytimes.com/2018/01/06/world/asia/north-korea-nuclear-missile-intelligence.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/06/world/asia/north-korea-nuclear-missile-intelligence.html


Processing URLs:  71%|███████   | 710/1000 [43:57<06:47,  1.40s/it]

Error extracting text from http://cyberlaw.stanford.edu/wiki/index.php/Automated_Driving:_Legislative_and_Regulatory_Action#State_Bills: 404 Client Error: Not Found for url: https://cyberlaw.stanford.edu/wiki/index.php/Automated_Driving:_Legislative_and_Regulatory_Action#State_Bills


Processing URLs:  71%|███████▏  | 713/1000 [44:02<07:23,  1.55s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKCN12Q0GV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKCN12Q0GV


Processing URLs:  72%|███████▏  | 717/1000 [44:07<05:34,  1.18s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-27/oil-spurs-ringgit-s-best-emerging-market-gain-aided-by-1mdb-sale


Processing URLs:  72%|███████▏  | 719/1000 [44:08<04:07,  1.14it/s]

Error extracting text from https://www.reuters.com/business/finance/with-focus-taper-five-questions-fed-2021-09-20/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/finance/with-focus-taper-five-questions-fed-2021-09-20/


Processing URLs:  72%|███████▏  | 722/1000 [44:08<02:26,  1.90it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-climate-power-idUSKBN1770D8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-climate-power-idUSKBN1770D8
URL filtered: https://www.bloomberg.com/opinion/articles/2020-10-16/u-s-pressure-campaign-against-venezuela-s-maduro-isn-t-working


Processing URLs:  72%|███████▏  | 724/1000 [44:24<15:12,  3.31s/it]

Error extracting text from https://www.almasdarnews.com/article/government-frontline-collapses-southern-aleppo-blitz-offensive-rebels/: 522 Server Error:  for url: https://www.almasdarnews.com/article/government-frontline-collapses-southern-aleppo-blitz-offensive-rebels/
URL filtered: https://twitter.com/Gibberman10/status/709564769448042496


Processing URLs:  73%|███████▎  | 726/1000 [44:25<10:22,  2.27s/it]

Error extracting text from https://www.maritime-executive.com/article/nord-stream-2-pipeline-set-to-start-up-before-the-end-of-2021: 403 Client Error: Forbidden for url: https://www.maritime-executive.com/article/nord-stream-2-pipeline-set-to-start-up-before-the-end-of-2021


Processing URLs:  73%|███████▎  | 730/1000 [44:29<06:35,  1.47s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-manufacturing-idUSKCN0XX2FP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-manufacturing-idUSKCN0XX2FP


Processing URLs:  73%|███████▎  | 733/1000 [44:32<04:21,  1.02it/s]

Error extracting text from https://seekingalpha.com/news/3712915-oil-reverses-early-gains-wti-dips-below-75: 403 Client Error: Forbidden for url: https://seekingalpha.com/news/3712915-oil-reverses-early-gains-wti-dips-below-75
Error extracting text from http://www.nytimes.com/2015/09/29/technology/personaltech/apple-iphone-6s-breaks-first-weekend-sales-record.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/29/technology/personaltech/apple-iphone-6s-breaks-first-weekend-sales-record.html


Processing URLs:  74%|███████▎  | 735/1000 [44:34<04:27,  1.01s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/peru-votes-tightening-race-shaped-leaders-legacy-39616950: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/peru-votes-tightening-race-shaped-leaders-legacy-39616950


Processing URLs:  74%|███████▎  | 737/1000 [44:35<03:12,  1.37it/s]

Error extracting text from http://arynews.tv/en/portugals-antonio-guterres-takes-lead-in-first-round-vote-for-un-chief/: 403 Client Error: Forbidden for url: http://arynews.tv/en/portugals-antonio-guterres-takes-lead-in-first-round-vote-for-un-chief/


Processing URLs:  74%|███████▍  | 740/1000 [44:40<06:10,  1.43s/it]

Error extracting text from https://www.cnn.com/2021/11/04/europe/russia-ukraine-military-buildup-intl-cmd/index.html.: 503 Server Error: Max restarts limit reached for url: https://edition.cnn.com/2021/11/04/europe/russia-ukraine-military-buildup-intl-cmd/index.html.


Processing URLs:  74%|███████▍  | 742/1000 [44:42<04:33,  1.06s/it]

Error extracting text from http://www.hybridcars.com/china-issues-optimistic-forecast-for-new-energy-vehicle-sales-in-2017/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/china-issues-optimistic-forecast-for-new-energy-vehicle-sales-in-2017/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.infomoney.com.br/mercados/politica/noticia/4599495/impeachment-eventos-que-definirao-sobrevivencia-dilma-poder&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.infomoney.com.br/mercados/politica/noticia/4599495/impeachment-eventos-que-definirao-sobrevivencia-dilma-poder&amp;prev=search


Processing URLs:  74%|███████▍  | 745/1000 [44:56<16:03,  3.78s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/diplomats-portugals-guterres-tops-poll-for-next-un-chief/2016/09/09/7c9e64a2-76a7-11e6-9781-49e591781754_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/diplomats-portugals-guterres-tops-poll-for-next-un-chief/2016/09/09/7c9e64a2-76a7-11e6-9781-49e591781754_story.html
URL filtered: https://www.bloomberg.com/news/articles/2017-08-14/iran-military-boost-signals-resolve-to-withstand-u-s-pressure


Processing URLs:  75%|███████▍  | 748/1000 [44:58<08:28,  2.02s/it]

Error extracting text from https://www.reuters.com/world/americas/haiti-elections-replace-slain-president-postponed-nov-7-media-2021-08-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/haiti-elections-replace-slain-president-postponed-nov-7-media-2021-08-12/


Processing URLs:  75%|███████▌  | 750/1000 [45:01<07:36,  1.83s/it]

Error extracting text from http://www.ultimasnoticias.com.ve/noticias/economia/torino-capital-iniciaron-transferencia-pagar-842-millones-bono-pdvsa-2020/: 403 Client Error: Forbidden for url: http://www.ultimasnoticias.com.ve/noticias/economia/torino-capital-iniciaron-transferencia-pagar-842-millones-bono-pdvsa-2020/


Processing URLs:  76%|███████▌  | 756/1000 [45:10<05:22,  1.32s/it]

URL filtered: http://www.catchnews.com/world-news/operation-spider-2-iranian-models-find-their-instagram-accounts-hacked-by-revolutionary-guards-1458311887.html


Processing URLs:  76%|███████▌  | 758/1000 [45:12<04:52,  1.21s/it]

Error extracting text from http://www.bea.gov/iTable/iTable.cfm?reqid=9&amp;step=1&amp;acrdn=2#reqid=9&amp;step=3&amp;isuri=1&amp;903=65: 404 Client Error: Not Found for url: https://www.bea.gov/iTable/iTable.cfm?reqid=9&amp;step=1&amp;acrdn=2#reqid=9&amp;step=3&amp;isuri=1&amp;903=65


Processing URLs:  76%|███████▌  | 759/1000 [45:12<04:08,  1.03s/it]

Error extracting text from https://www.intelligencecareers.gov/icintelligence.html?platform=hootsuite: 403 Client Error: Forbidden for url: https://www.intelligencecareers.gov/icintelligence.html?platform=hootsuite


Processing URLs:  76%|███████▌  | 762/1000 [45:14<02:30,  1.59it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14I0BV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14I0BV


Processing URLs:  76%|███████▋  | 763/1000 [45:15<03:42,  1.07it/s]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/02/28/737657/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/02/28/737657/story.html


Processing URLs:  76%|███████▋  | 764/1000 [45:17<04:19,  1.10s/it]

Error extracting text from https://reut.rs/3rA4qzb: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSL1N2L70KS


Processing URLs:  77%|███████▋  | 766/1000 [45:19<04:00,  1.03s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-rights-group-claims-have-discovered-least-14-mass-graves-across-country-1577853: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-rights-group-claims-have-discovered-least-14-mass-graves-across-country-1577853


Processing URLs:  77%|███████▋  | 771/1000 [45:23<03:26,  1.11it/s]

Error extracting text from http://election.princeton.edu/2012/08/06/a-nonpartisan-statistical-approach-to-rasmussen-data/: HTTPSConnectionPool(host='election.princeton.edu2012', port=443): Max retries exceeded with url: /08/06/a-nonpartisan-statistical-approach-to-rasmussen-data/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff3db5c0>: Failed to resolve 'election.princeton.edu2012' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/politics/articles/2017-02-27/how-brexit-means-eu-loses-money-influence-and-military-strength


Processing URLs:  77%|███████▋  | 774/1000 [45:25<02:42,  1.39it/s]

Error extracting text from http://www.tradearabia.com/news/OGN_325567.html: 400 Client Error: Bad Request for url: http://www.tradearabia.com/news/OGN_325567.html


Processing URLs:  78%|███████▊  | 775/1000 [45:41<17:37,  4.70s/it]

Error extracting text from https://www.almasdarnews.com/article/russian-warships-launch-3-sub-caliber-rockets-jihadist-defenses-aleppo/: 522 Server Error:  for url: https://www.almasdarnews.com/article/russian-warships-launch-3-sub-caliber-rockets-jihadist-defenses-aleppo/


Processing URLs:  78%|███████▊  | 776/1000 [45:42<14:02,  3.76s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8095201/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8095201/


Processing URLs:  78%|███████▊  | 778/1000 [45:46<11:01,  2.98s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/poll-suggests-vote-solve-spains-problem-42036292: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/poll-suggests-vote-solve-spains-problem-42036292


Processing URLs:  78%|███████▊  | 780/1000 [45:48<07:19,  2.00s/it]

URL filtered: https://twitter.com/PippaCrerar/status/1472933272598499331


Processing URLs:  78%|███████▊  | 783/1000 [45:49<03:30,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-usa-cyber-energy/u-s-warns-public-about-attacks-on-energy-industrial-firms-idUSKBN1CQ0IN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-energy/u-s-warns-public-about-attacks-on-energy-industrial-firms-idUSKBN1CQ0IN


Processing URLs:  78%|███████▊  | 784/1000 [45:49<03:08,  1.15it/s]

Error extracting text from http://www.tradingeconomics.com/japan/inflation-cpi: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/japan/inflation-cpi


Processing URLs:  78%|███████▊  | 785/1000 [46:50<1:01:09, 17.07s/it]

Error extracting text from http://en.kremlin.ru/events/president/news/53255: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  79%|███████▊  | 787/1000 [46:56<35:29, 10.00s/it]

Error extracting text from http://wpo.st/om2d1: 503 Server Error: Service Unavailable: Back-end server is at capacity for url: http://wpo.st/om2d1


Processing URLs:  79%|███████▉  | 789/1000 [46:58<19:23,  5.51s/it]

Error extracting text from http://www.foreignpolicyi.org/content/fpi-bulletin-preventing-ethnic-violence-burundi: 403 Client Error: Forbidden for url: http://www.foreignpolicyi.org/content/fpi-bulletin-preventing-ethnic-violence-burundi


Processing URLs:  79%|███████▉  | 790/1000 [46:59<15:17,  4.37s/it]

URL filtered: https://www.bbc.co.uk/news/amp/world-europe-42582704?__twitter_impression=true
Error extracting text from https://www.reuters.com/article/us-amazon-com-labor-yearend-focus/amazon-to-face-u-s-union-push-in-year-ahead-idUSKBN28X1BA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-amazon-com-labor-yearend-focus/amazon-to-face-u-s-union-push-in-year-ahead-idUSKBN28X1BA


Processing URLs:  79%|███████▉  | 793/1000 [47:00<07:03,  2.04s/it]

Error extracting text from http://www.rostelecom.ru/press/news/d438345/?backurl=/press/: HTTPConnectionPool(host='www.rostelecom.ru', port=80): Max retries exceeded with url: /press/news/d438345/?backurl=/press/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300727a40>: Failed to resolve 'www.rostelecom.ru' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▉  | 794/1000 [47:00<05:51,  1.71s/it]

Error extracting text from http://www.cdm.me/english/poland-to-continue-supporting-montenegro-in-its-eu-and-nato-integration: 403 Client Error: Forbidden for url: https://www.cdm.me/english/poland-to-continue-supporting-montenegro-in-its-eu-and-nato-integration


Processing URLs:  80%|███████▉  | 795/1000 [47:01<04:39,  1.36s/it]

Error extracting text from https://trends.google.com/trends/explore?date=all&q=%2Fm%2F07_hy: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?date=all&q=%2Fm%2F07_hy
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.cartapiaui.com.br/noticias/17019/silas-diz-que-pedido-de-impeachment-contra-dilma-perdeu-for-a&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.cartapiaui.com.br/noticias/17019/silas-diz-que-pedido-de-impeachment-contra-dilma-perdeu-for-a&amp;prev=search


Processing URLs:  80%|████████  | 802/1000 [47:11<05:11,  1.57s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-23/the-case-for-how-the-fed-has-already-made-its-policy-mistake


Processing URLs:  81%|████████  | 807/1000 [47:16<03:04,  1.05it/s]

URL filtered: https://twitter.com/charlesmurray
Error extracting text from https://news.bitcoin.com/chinese-cryptocurrency-exchanges-delist-ico-markets/: 403 Client Error: Forbidden for url: https://news.bitcoin.com/chinese-cryptocurrency-exchanges-delist-ico-markets/


Processing URLs:  81%|████████  | 809/1000 [47:18<03:39,  1.15s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/02/04/national/missile-launcher-move-launchpad-activity-north-korea-nhk/?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2ASituation%20Report#.VrM-zzYrKRs: Exceeded 30 redirects.
Error extracting text from http://www.reuters.com/article/us-iran-missiles-usa-idUSKBN0UJ1Z620160105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-usa-idUSKBN0UJ1Z620160105


Processing URLs:  81%|████████  | 812/1000 [47:21<03:01,  1.03it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/01/04/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/#.VpFpp1V1-uY: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/01/04/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/#.VpFpp1V1-uY


Processing URLs:  82%|████████▏ | 815/1000 [47:23<02:04,  1.49it/s]

Error extracting text from http://www.sessions.senate.gov/public/index.cfm/news-releases?ID=3c6f87d6-e3ab-6d16-bac1-91748b78476f: HTTPConnectionPool(host='www.sessions.senate.gov', port=80): Max retries exceeded with url: /public/index.cfm/news-releases?ID=3c6f87d6-e3ab-6d16-bac1-91748b78476f (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff714200>: Failed to resolve 'www.sessions.senate.gov' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.cdm.me/english/ipsos-for-joining-the-nato-is-56-percent-of-citizens: 403 Client Error: Forbidden for url: https://www.cdm.me/english/ipsos-for-joining-the-nato-is-56-percent-of-citizens


Processing URLs:  82%|████████▏ | 817/1000 [47:23<01:22,  2.21it/s]

Error extracting text from https://www.timesofisrael.com/israel-said-to-warn-cia-chief-that-new-iranian-president-is-mentally-disturbed/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/israel-said-to-warn-cia-chief-that-new-iranian-president-is-mentally-disturbed/


Processing URLs:  82%|████████▏ | 820/1000 [47:26<02:08,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-analysis-idUSKBN16A04E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-analysis-idUSKBN16A04E


Processing URLs:  82%|████████▏ | 822/1000 [47:27<01:45,  1.68it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-referendum-erdogan-europe-idUSKBN16S10D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-erdogan-europe-idUSKBN16S10D
URL filtered: https://www.youtube.com/watch?v=_aJ69YgvpDo
URL filtered: https://www.bloomberg.com/news/articles/2017-07-11/venezuela-cut-deeper-into-junk-by-s-p-as-default-risk-surges


Processing URLs:  83%|████████▎ | 827/1000 [47:28<00:58,  2.97it/s]

Error extracting text from http://www.nytimes.com/2015/10/05/opinion/the-troubles-are-back.html?emc=edit_th_20151005&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/05/opinion/the-troubles-are-back.html?emc=edit_th_20151005&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  83%|████████▎ | 829/1000 [47:29<01:14,  2.31it/s]

Error extracting text from http://www.todayonline.com/business/weak-us-jobs-report-lowers-chance-rate-hike: 403 Client Error: Forbidden for url: https://www.todayonline.com/business/weak-us-jobs-report-lowers-chance-rate-hike


Processing URLs:  84%|████████▎ | 836/1000 [47:42<02:41,  1.02it/s]

Error extracting text from https://www.middleeastmonitor.com/20201208-sudan-refuses-to-internationalise-renaissance-dam-crisis/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20201208-sudan-refuses-to-internationalise-renaissance-dam-crisis/
Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-sanctions-idUSKBN1AI2N0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-sanctions-idUSKBN1AI2N0
Error extracting text from http://blogs.wsj.com/washwire/2016/03/05/hassan-rouhani-and-irans-post-election-path/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2016/03/05/hassan-rouhani-and-irans-post-election-path/


Processing URLs:  84%|████████▎ | 837/1000 [47:44<03:39,  1.34s/it]

Error extracting text from http://www.europeaninstitute.org/index.php/251-european-affairs/ea-april-2015/2023-nato-enlargement-the-case-of-montenegro: HTTPSConnectionPool(host='www.europeaninstitute.org', port=443): Max retries exceeded with url: /index.php/251-european-affairs/ea-april-2015/2023-nato-enlargement-the-case-of-montenegro (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  84%|████████▍ | 840/1000 [47:45<01:58,  1.35it/s]

Error extracting text from https://www.cnbc.com/2017/10/03/russian-pentagon-code-review-is-problematic-cyber-official-says.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/10/03/russian-pentagon-code-review-is-problematic-cyber-official-says.html
URL filtered: https://www.bloomberg.com/news/articles/2017-03-07/dakota-access-triumphs-over-tribe-s-challenge-to-pipeline
Error extracting text from http://www.straitstimes.com/asia/se-asia/indonesia-says-chinese-coast-guard-infringed-indonesian-waters: 403 Client Error: Forbidden for url: https://www.straitstimes.com/asia/se-asia/indonesia-says-chinese-coast-guard-infringed-indonesian-waters


Processing URLs:  84%|████████▍ | 843/1000 [48:14<17:28,  6.68s/it]

Error extracting text from https://mobile.almasdarnews.com/article/breaking-syrian-army-deploys-massive-reinforcements-deir-ezzor-offensive-us-backed-forces/: 522 Server Error:  for url: https://www.almasdarnews.com/article/breaking-syrian-army-deploys-massive-reinforcements-deir-ezzor-offensive-us-backed-forces/


Processing URLs:  84%|████████▍ | 845/1000 [48:16<10:08,  3.93s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-idUKKCN12J2S8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.nytimes.com/2015/10/15/us/politics/democratic-debate-hillary-clinton-bernie-sanders-biden.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/15/us/politics/democratic-debate-hillary-clinton-bernie-sanders-biden.html


Processing URLs:  85%|████████▍ | 848/1000 [48:19<04:28,  1.77s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/355466-gop-democratic-state-officials-push-to-secure-election-systems-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/355466-gop-democratic-state-officials-push-to-secure-election-systems-report/
Error extracting text from http://world.kbs.co.kr/english/news/news_In_detail.htm?No=118770: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_In_detail.htm?No=118770
URL filtered: http://www.thedailybeast.com/exclusive-russia-used-facebook-events-to-organize-anti-immigrant-rallies-on-us-soil


Processing URLs:  85%|████████▌ | 851/1000 [48:20<02:24,  1.03it/s]

Error extracting text from http://www.rferl.org/a/montenegrin-leader-djukanovic-suggests-russia-behind-alleged-coup-plot-serbia/28075458.html: 403 Client Error: Forbidden for url: http://www.rferl.org/a/montenegrin-leader-djukanovic-suggests-russia-behind-alleged-coup-plot-serbia/28075458.html
Error extracting text from http://www.reuters.com/article/us-turkey-referendum-idUSKBN17H0CU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-idUSKBN17H0CU


Processing URLs:  85%|████████▌ | 852/1000 [48:20<01:53,  1.30it/s]

Error extracting text from http://www.oecd.org/spain/governmentofspainusefullinks.htm: 403 Client Error: Forbidden for url: https://www.oecd.org/spain/governmentofspainusefullinks.htm


Processing URLs:  86%|████████▌ | 856/1000 [48:25<02:10,  1.10it/s]

Error extracting text from http://www.wsj.com/articles/a-moderate-iranian-purge-1453855466: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-moderate-iranian-purge-1453855466


Processing URLs:  86%|████████▌ | 858/1000 [48:27<02:05,  1.13it/s]

Error extracting text from https://www.wsj.com/articles/afghan-president-ashraf-ghani-says-he-wants-to-avoid-bloodshed-as-taliban-encircle-kabul-11628939781: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/afghan-president-ashraf-ghani-says-he-wants-to-avoid-bloodshed-as-taliban-encircle-kabul-11628939781


Processing URLs:  86%|████████▌ | 860/1000 [48:28<01:25,  1.64it/s]

Error extracting text from http://www.straitstimes.com/asia/east-asia/former-united-nations-chief-ban-ki-moon-says-not-running-for-president-of-south-korea: 403 Client Error: Forbidden for url: https://www.straitstimes.com/asia/east-asia/former-united-nations-chief-ban-ki-moon-says-not-running-for-president-of-south-korea


Processing URLs:  86%|████████▌ | 861/1000 [48:29<01:43,  1.34it/s]

Error extracting text from https://tradingeconomics.com/commodity/corn: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/corn


Processing URLs:  86%|████████▋ | 864/1000 [48:29<00:48,  2.78it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0WC2YQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0WC2YQ
Error extracting text from http://www.reuters.com/article/us-usa-trump-putin-idUSKBN15O2A5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-putin-idUSKBN15O2A5


Processing URLs:  87%|████████▋ | 866/1000 [48:35<03:09,  1.42s/it]

Error extracting text from http://www.ansamed.info/ansamed/en/news/sections/politics/2016/02/17/montenegro-to-join-nato-in-2017_eee4d6f9-19c8-46a9-8c22-1d9a6d155ef2.ht: 404 Client Error: Not Found for url: https://www.ansa.it/ansamed/en/news/sections/politics/2016/02/17/montenegro-to-join-nato-in-2017_eee4d6f9-19c8-46a9-8c22-1d9a6d155ef2.ht


Processing URLs:  87%|████████▋ | 867/1000 [48:36<02:59,  1.35s/it]

Error extracting text from http://www.nytimes.com/2015/09/03/world/asia/beijing-turns-into-ghost-town-as-it-gears-up-for-military-parade.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/03/world/asia/beijing-turns-into-ghost-town-as-it-gears-up-for-military-parade.html?_r=0


Processing URLs:  87%|████████▋ | 870/1000 [48:42<03:01,  1.40s/it]

Error extracting text from http://www.oddschecker.com.au/cricket/t20-world-cup: HTTPSConnectionPool(host='www.oddschecker.com.au', port=443): Max retries exceeded with url: /cricket/t20-world-cup (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.oddschecker.com.au'. (_ssl.c:1000)")))
Error extracting text from http://www.nytimes.com/2016/02/07/world/asia/north-korea-moves-up-rocket-launching-plan.html?smid=pl-share: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/world/asia/north-korea-moves-up-rocket-launching-plan.html?smid=pl-share


Processing URLs:  87%|████████▋ | 872/1000 [48:44<02:33,  1.20s/it]

Error extracting text from https://tradingeconomics.com/commodity/lithium: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/lithium


Processing URLs:  87%|████████▋ | 874/1000 [48:46<02:13,  1.06s/it]

Error extracting text from https://publications.parliament.uk/pa/cm5802/cmselect/cmfaff/203/20302.htm: 403 Client Error: Forbidden for url: https://publications.parliament.uk/pa/cm5802/cmselect/cmfaff/203/20302.htm


Processing URLs:  88%|████████▊ | 876/1000 [48:51<03:14,  1.57s/it]

Error extracting text from https://www.reuters.com/article/us-trade-nafta-canada-exclusive/exclusive-canada-increasingly-convinced-of-trump-nafta-pullout-sources-idUSKBN1EZ2K4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-nafta-canada-exclusive/exclusive-canada-increasingly-convinced-of-trump-nafta-pullout-sources-idUSKBN1EZ2K4


Processing URLs:  88%|████████▊ | 878/1000 [48:55<03:12,  1.58s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-anc-says-deciding-whether-to-remove-zuma-as-president-idUSKBN1FB1S4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-anc-says-deciding-whether-to-remove-zuma-as-president-idUSKBN1FB1S4


Processing URLs:  88%|████████▊ | 880/1000 [48:55<02:00,  1.00s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/268438-week-ahead-senate-to-sanction-north-korea: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/268438-week-ahead-senate-to-sanction-north-korea/


Processing URLs:  88%|████████▊ | 883/1000 [48:59<01:56,  1.01it/s]

Error extracting text from https://medicalxpress.com/news/2021-06-zealand-vaccination-isnt-hesitancyit-equal.html: 400 Client Error: Bad request for url: https://medicalxpress.com/news/2021-06-zealand-vaccination-isnt-hesitancyit-equal.html


Processing URLs:  89%|████████▉ | 888/1000 [50:05<35:04, 18.79s/it]

Error extracting text from http://www.einstein.yu.edu/: HTTPConnectionPool(host='www.einstein.yu.edu', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x117cdb500>, 'Connection to www.einstein.yu.edu timed out. (connect timeout=60)'))


Processing URLs:  89%|████████▉ | 891/1000 [50:08<12:51,  7.08s/it]

Error extracting text from http://www.wsj.com/articles/u-n-adopts-new-sanctions-against-north-korea-1456934616?mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-n-adopts-new-sanctions-against-north-korea-1456934616?mg=id-wsj
Error extracting text from http://www.reuters.com/article/2015/09/27/us-northkorea-missile-idUSKCN0RR00R20150927: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/27/us-northkorea-missile-idUSKCN0RR00R20150927


Processing URLs:  89%|████████▉ | 893/1000 [50:12<08:09,  4.58s/it]



Processing URLs:  90%|████████▉ | 896/1000 [50:14<03:49,  2.20s/it]

Error extracting text from https://medium.com/dfrlab/russias-fake-electronic-bomb-4ce9dbbc57f8: 403 Client Error: Forbidden for url: https://medium.com/dfrlab/russias-fake-electronic-bomb-4ce9dbbc57f8


Processing URLs:  90%|█████████ | 900/1000 [50:18<02:02,  1.23s/it]

Error extracting text from http://www.reuters.com/article/us-israel-netanyahu/time-is-netanyahus-ally-in-weathering-israel-legal-storm-idUSKBN1AP1NN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu/time-is-netanyahus-ally-in-weathering-israel-legal-storm-idUSKBN1AP1NN


Processing URLs:  90%|█████████ | 902/1000 [50:18<01:16,  1.28it/s]

Error extracting text from http://www.reuters.com/article/us-burundi-security-eu-idUSKCN0WV0BD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-security-eu-idUSKCN0WV0BD


Processing URLs:  90%|█████████ | 903/1000 [50:19<01:05,  1.48it/s]

Error extracting text from https://www.reuters.tv/v/STc/2017/05/21/n-korea-missile-test-dashes-hopes-for-peace: HTTPSConnectionPool(host='www.reuters.tv', port=443): Max retries exceeded with url: /v/STc/2017/05/21/n-korea-missile-test-dashes-hopes-for-peace (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fea0f290>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|█████████ | 905/1000 [50:22<02:01,  1.28s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-16/china-analysts-see-ipo-resumption-by-year-end-as-stocks-recover
Error extracting text from http://www.nbcnews.com/nightly-news/vinyl-records-see-comeback-during-musics-digital-age-n435806: 403 Client Error: Forbidden for url: http://www.nbcnews.com/nightly-news/vinyl-records-see-comeback-during-musics-digital-age-n435806
URL filtered: https://www.youtube.com/watch?v=8AAW3puYl6o


Processing URLs:  91%|█████████ | 910/1000 [50:23<00:40,  2.24it/s]

Error extracting text from https://www.nytimes.com/2021/08/28/world/asia/myanmar-monks-coup.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/28/world/asia/myanmar-monks-coup.html
Error extracting text from http://thenextweb.com/creativity/2016/04/06/a-computer-has-made-a-rembrandt-painting-and-its-perfect/: 403 Client Error: Forbidden for url: http://thenextweb.com/creativity/2016/04/06/a-computer-has-made-a-rembrandt-painting-and-its-perfect/


Processing URLs:  92%|█████████▏| 915/1000 [50:26<00:38,  2.21it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/dec/28/clinton-cautions-followers-possible-iowa-nh-losses/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/dec/28/clinton-cautions-followers-possible-iowa-nh-losses/
URL filtered: http://bloombergtv.ca/2016-07-13/news/mulroney-anti-trade-talk-is-damaging-for-americans/


Processing URLs:  92%|█████████▏| 916/1000 [50:31<02:02,  1.45s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/indias-ceasefire-violations-pose-potential-threat-pakistan-army/articleshow/57044395.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/indias-ceasefire-violations-pose-potential-threat-pakistan-army/articleshow/57044395.cms


Processing URLs:  92%|█████████▏| 918/1000 [50:32<01:21,  1.00it/s]

Error extracting text from http://www.cdm.me/english/pazin-at-nato-info-day-there-is-no-democracy-without-stability: 403 Client Error: Forbidden for url: https://www.cdm.me/english/pazin-at-nato-info-day-there-is-no-democracy-without-stability


Processing URLs:  92%|█████████▏| 921/1000 [50:37<01:37,  1.24s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-06-09/trump-s-economic-agenda-is-almost-dead


Processing URLs:  93%|█████████▎| 928/1000 [50:59<03:11,  2.66s/it]

Error extracting text from http://news.antiwar.com/2016/11/06/iraq-isis-defenses-slowing-mosul-offensive/: 403 Client Error: Forbidden for url: https://news.antiwar.com/2016/11/06/iraq-isis-defenses-slowing-mosul-offensive/


Processing URLs:  93%|█████████▎| 931/1000 [51:02<01:32,  1.34s/it]

Error extracting text from http://www.nytimes.com/2016/06/15/health/brazil-olympic-games-zika.html?emc=edit_th_20160615&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/15/health/brazil-olympic-games-zika.html?emc=edit_th_20160615&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from https://www.opendemocracy.net/en/democraciaabierta/jair-bolsonaro-betting-on-chaos-in-brazil/: 403 Client Error: Forbidden for url: https://www.opendemocracy.net/en/democraciaabierta/jair-bolsonaro-betting-on-chaos-in-brazil/


Processing URLs:  93%|█████████▎| 933/1000 [51:04<01:21,  1.21s/it]

URL filtered: https://twitter.com/davidwhogg/status/933096319127732225


Processing URLs:  94%|█████████▎| 936/1000 [51:08<01:12,  1.14s/it]

Error extracting text from https://www.amnesty.org/en/press-releases/2016/09/electric-cars-running-on-child-labour/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/press-release/2016/09/electric-cars-running-on-child-labour/


Processing URLs:  94%|█████████▎| 937/1000 [51:10<01:26,  1.37s/it]

Error extracting text from http://en.dailypakistan.com.pk/pakistan/what-really-happened-to-the-prime-minister-daily-pakistan-reveals-pms-complete-medical-history-for-the-first-time/: 503 Server Error: Backend fetch failed for url: https://en.dailypakistan.com.pk/pakistan/what-really-happened-to-the-prime-minister-daily-pakistan-reveals-pms-complete-medical-history-for-the-first-time/
URL filtered: https://www.youtube.com/watch?v=eK0rvReE-4c


Processing URLs:  94%|█████████▍| 941/1000 [51:15<01:16,  1.30s/it]

Error extracting text from https://gcn.com/articles/2017/11/29/top500-supercomputers.aspx: 404 Client Error: NOT FOUND for url: https://www.route-fifty.com/articles/2017/11/29/top500-supercomputers.aspx/
Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-idUSKBN18V0Y5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-idUSKBN18V0Y5


Processing URLs:  94%|█████████▍| 943/1000 [51:18<01:18,  1.37s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-idUSKBN16O2RL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-idUSKBN16O2RL


Processing URLs:  95%|█████████▍| 948/1000 [51:25<01:12,  1.40s/it]

Error extracting text from http://www.autonews.com/article/20160501/OEM11/160509998/why-michigan-wants-to-stay-in-the-drivers-seat-on-autonomous-vehicle: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20160501/OEM11/160509998/why-michigan-wants-to-stay-in-the-drivers-seat-on-autonomous-vehicle


Processing URLs:  95%|█████████▍| 949/1000 [51:25<00:57,  1.12s/it]

Error extracting text from https://covid.cdc.gov/covid-data-tracker/?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fcases-updates%2Fvariant-surveillance%2Fgenomic-surveillance-dashboard.html#variant-proportions: 403 Client Error: Forbidden for url: https://covid.cdc.gov/covid-data-tracker/?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fcases-updates%2Fvariant-surveillance%2Fgenomic-surveillance-dashboard.html#variant-proportions


Processing URLs:  95%|█████████▌| 951/1000 [51:28<01:00,  1.23s/it]

Error extracting text from http://www.wetalkuav.com/amazon-prime-air-drone-delivery-hits-u-s/: HTTPConnectionPool(host='www.wetalkuav.com', port=80): Max retries exceeded with url: /amazon-prime-air-drone-delivery-hits-u-s/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302d117f0>: Failed to resolve 'www.wetalkuav.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  95%|█████████▌| 953/1000 [51:31<00:59,  1.27s/it]

Error extracting text from http://trueafrica.co/article/international-war-crimes-court-opens-preliminary-investigation-burundi/: 403 Client Error: Forbidden for url: http://trueafrica.co/article/international-war-crimes-court-opens-preliminary-investigation-burundi/


Processing URLs:  96%|█████████▌| 955/1000 [51:33<00:50,  1.12s/it]

Error extracting text from http://www.plugincars.com/carmakers-commitment-electric-cars-brand-brand-review-130155.html: 403 Client Error: Forbidden for url: https://www.plugshare.com


Processing URLs:  96%|█████████▌| 958/1000 [51:36<00:38,  1.08it/s]

Error extracting text from http://news.yahoo.com/iaea-chief-visits-controversial-iran-officials-172223567.html: 404 Client Error: Not Found for url: http://news.yahoo.com/iaea-chief-visits-controversial-iran-officials-172223567.html


Processing URLs:  96%|█████████▌| 961/1000 [51:40<00:50,  1.29s/it]

Error extracting text from https://ec.europa.eu/eurostat/documents/272892/10984304/HICP-release-schedule.pdf/).: 404 Client Error:  for url: https://ec.europa.eu/eurostat/documents/272892/10984304/HICP-release-schedule.pdf/).


Processing URLs:  96%|█████████▋| 963/1000 [51:41<00:36,  1.01it/s]

Error extracting text from https://www.nba.com/article/2020/08/26/nba-playoff-games-postponed: 403 Client Error: Forbidden for url: https://www.nba.com/article/2020/08/26/nba-playoff-games-postponed


Processing URLs:  97%|█████████▋| 966/1000 [51:43<00:19,  1.70it/s]

Error extracting text from https://abcnews.go.com/International/wireStory/italys-draghi-wins-support-rival-parties-govt-75726674: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/italys-draghi-wins-support-rival-parties-govt-75726674
Error extracting text from http://www.nato.int/cps/en/natohq/news_128829.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_128829.htm


Processing URLs:  97%|█████████▋| 967/1000 [51:43<00:15,  2.19it/s]

Error extracting text from http://www.nytimes.com/2016/07/20/business/dealbook/monsanto-rejects-bayers-64-billion-takeover-bid-as-too-low.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/20/business/dealbook/monsanto-rejects-bayers-64-billion-takeover-bid-as-too-low.html


Processing URLs:  97%|█████████▋| 969/1000 [51:47<00:36,  1.18s/it]

Error extracting text from http://www.wsj.com/articles/south-korea-worries-over-rising-threat-from-north-1450174803: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korea-worries-over-rising-threat-from-north-1450174803


Processing URLs:  97%|█████████▋| 974/1000 [51:55<00:39,  1.53s/it]

Error extracting text from https://www.ndtv.com/india-news/work-on-jammu-and-kashmir-mega-hydel-project-to-start-in-2-months-official-1757369: 403 Client Error: Forbidden for url: https://www.ndtv.com/india-news/work-on-jammu-and-kashmir-mega-hydel-project-to-start-in-2-months-official-1757369


Processing URLs:  98%|█████████▊| 975/1000 [51:56<00:33,  1.35s/it]

Error extracting text from https://www.reuters.com/article/us-mexico-election/put-trump-in-his-place-nationalism-awakens-in-mexican-presidential-race-idUSKBN1FB2Q3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mexico-election/put-trump-in-his-place-nationalism-awakens-in-mexican-presidential-race-idUSKBN1FB2Q3


Processing URLs:  98%|█████████▊| 977/1000 [52:10<01:33,  4.07s/it]

Error extracting text from http://tinyurl.com/zjnhbwl: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  98%|█████████▊| 980/1000 [52:14<00:50,  2.51s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1351JS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1351JS


Processing URLs:  98%|█████████▊| 982/1000 [52:18<00:39,  2.18s/it]

Error extracting text from http://buenosairesherald.com/article/214879/peru%E2%80%99s-fujimori-widens-lead-over-kuczynski-despite-scandal: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/214879/peru%E2%80%99s-fujimori-widens-lead-over-kuczynski-despite-scandal


Processing URLs:  98%|█████████▊| 985/1000 [53:23<04:28, 17.87s/it]

Error extracting text from https://www.cmegroup.com/trading/energy/crude-oil/brent-crude-oil.html: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://www.cnbc.com/2018/05/23/trump-cant-block-twitter-followers-federal-judge-says.html


Processing URLs:  99%|█████████▉| 990/1000 [53:29<00:48,  4.89s/it]

Error extracting text from http://www.al-monitor.com/pulse/afp/2016/08/iraq-conflict.html: 404 Client Error: Not Found for url: https://www.al-monitor.com/afp/2016/08/iraq-conflict.html
Error extracting text from http://www.reuters.com/article/us-usa-autos-idUSKCN0ZH504: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-autos-idUSKCN0ZH504


Processing URLs:  99%|█████████▉| 993/1000 [53:31<00:16,  2.35s/it]

Error extracting text from http://www.timesofisrael.com/abbas-says-he-agreed-to-moscow-meeting-netanyahu-postponed/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/abbas-says-he-agreed-to-moscow-meeting-netanyahu-postponed/


Processing URLs: 100%|█████████▉| 995/1000 [53:33<00:07,  1.53s/it]

Error extracting text from https://www.si.com/college/2021/05/27/legislation-introduced-collective-bargaining-rights-college-athletes-bernie-sanders: 403 Client Error: Forbidden for url: https://www.si.com/college/2021/05/27/legislation-introduced-collective-bargaining-rights-college-athletes-bernie-sanders


Processing URLs: 100%|██████████| 1000/1000 [53:47<00:00,  3.23s/it]
Processing URLs:   0%|          | 3/1000 [00:03<16:45,  1.01s/it]

Error extracting text from http://autoweek.com/article/green-cars/all-electric-mini-cooper-coming-2019: 403 Client Error: Forbidden for url: http://autoweek.com/article/green-cars/all-electric-mini-cooper-coming-2019


Processing URLs:   1%|          | 6/1000 [00:08<25:37,  1.55s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5904276/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5904276/


Processing URLs:   1%|          | 9/1000 [00:11<21:44,  1.32s/it]

Error extracting text from http://www.cnbc.com/2017/03/30/reuters-america-update-1-brics-development-bank-to-issue-up-to-500-million-in-masala-bonds.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/03/30/reuters-america-update-1-brics-development-bank-to-issue-up-to-500-million-in-masala-bonds.html


Processing URLs:   1%|          | 10/1000 [00:12<16:13,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-oil-preview-idUSKBN1A61UT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-preview-idUSKBN1A61UT


Processing URLs:   1%|          | 11/1000 [00:13<16:17,  1.01it/s]

URL filtered: https://twitter.com/TerrorEvents/status/787398323749679105
URL filtered: http://www.bloomberg.com/news/articles/2015-09-29/the-contrarian-venezuela-bond-trade-that-s-delivering-37-return


Processing URLs:   2%|▏         | 18/1000 [00:29<24:10,  1.48s/it]

Error extracting text from https://apple.news/AqGVOdcoQP4idf73ktf8_Xw: 404 Client Error: Not Found for url: https://apple.news/AqGVOdcoQP4idf73ktf8_Xw


Processing URLs:   2%|▏         | 19/1000 [00:30<21:25,  1.31s/it]

URL filtered: https://www.youtube.com/watch?v=A90Km_PzAsA


Processing URLs:   2%|▎         | 25/1000 [00:38<20:08,  1.24s/it]

Error extracting text from https://www.fbi.gov/news/testimony/cyber-roles-and-responsibilities: 403 Client Error: Forbidden for url: https://www.fbi.gov/news/testimony/cyber-roles-and-responsibilities


Processing URLs:   3%|▎         | 27/1000 [00:43<29:03,  1.79s/it]

Error extracting text from http://www.barrons.com/articles/better-late-than-never-fitch-downgrades-venezuela-1509748480: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/better-late-than-never-fitch-downgrades-venezuela-1509748480


Processing URLs:   3%|▎         | 28/1000 [00:43<22:34,  1.39s/it]

Error extracting text from https://news.usni.org/2016/05/09/opinion-russian-tank-deal-with-nicaragua-back-to-the-future-moment-for-u-s: 403 Client Error: Forbidden for url: https://news.usni.org/2016/05/09/opinion-russian-tank-deal-with-nicaragua-back-to-the-future-moment-for-u-s


Processing URLs:   3%|▎         | 31/1000 [00:46<13:56,  1.16it/s]

Error extracting text from http://news.markets/bonds/bank-japan-surprises-markets-modest-increase-stimulus-7039/: HTTPConnectionPool(host='news.markets', port=80): Max retries exceeded with url: /bonds/bank-japan-surprises-markets-modest-increase-stimulus-7039/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30018e750>: Failed to resolve 'news.markets' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://medium.com/@roberthatem/isf-in-greater-anbar-march-12-1219c27c5e2c: 403 Client Error: Forbidden for url: https://medium.com/@roberthatem/isf-in-greater-anbar-march-12-1219c27c5e2c


Processing URLs:   3%|▎         | 32/1000 [00:46<10:44,  1.50it/s]

Error extracting text from http://www.reuters.com/article/us-steel-m-a-global-idUSKBN17Z0VR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-steel-m-a-global-idUSKBN17Z0VR


Processing URLs:   4%|▎         | 36/1000 [00:52<18:13,  1.13s/it]

URL filtered: https://www.youtube.com/watch?v=cFKSgJph4-4


Processing URLs:   4%|▍         | 39/1000 [00:53<11:05,  1.44it/s]

Error extracting text from http://www.securityweek.com/north-koreas-elite-more-connected-previously-thought: 403 Client Error: Forbidden for url: https://www.securityweek.com/north-koreas-elite-more-connected-previously-thought


Processing URLs:   4%|▍         | 42/1000 [00:56<14:36,  1.09it/s]

Error extracting text from http://www.kitco.com/news/2017-12-20/Saudi-Arabia-slows-phasing-out-energy-subsidies-under-new-long-term-budget-plan.html: 404 Client Error: Not Found for url: https://frontend.prod.kitco.com/news/2017-12-20/Saudi-Arabia-slows-phasing-out-energy-subsidies-under-new-long-term-budget-plan.html


Processing URLs:   4%|▍         | 43/1000 [00:57<14:39,  1.09it/s]

Error extracting text from http://www.lisbon-treaty.org/wcm/the-lisbon-treaty/treaty-on-European-union-and-comments/title-6-final-provisions/137-article-50.html: 404 Client Error: Not Found for url: https://www.lisbon-treaty.org/wcm/the-lisbon-treaty/treaty-on-European-union-and-comments/title-6-final-provisions/137-article-50.html


Processing URLs:   5%|▍         | 49/1000 [01:04<12:46,  1.24it/s]

Error extracting text from https://www.predictit.org/Contract/784/Will-the-UK-vote-to-leave-the-EU-by-year-end-2016#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/784/Will-the-UK-vote-to-leave-the-EU-by-year-end-2016#data
Error extracting text from https://www.sott.net/article/318265-Why-future-of-War-in-Syria-depends-on-battle-for-Aleppo: 403 Client Error: Forbidden for url: https://www.sott.net/article/318265-Why-future-of-War-in-Syria-depends-on-battle-for-Aleppo


Processing URLs:   5%|▌         | 52/1000 [01:06<09:55,  1.59it/s]

Error extracting text from https://www.nytimes.com/2017/07/04/world/asia/north-korea-missile-test-icbm.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/04/world/asia/north-korea-missile-test-icbm.html
Error extracting text from https://www.yahoo.com/news/eu-mulls-change-tack-us-080209587.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/eu-mulls-change-tack-us-080209587.html


Processing URLs:   5%|▌         | 53/1000 [01:10<26:50,  1.70s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-ODRPAW6TTDS301-32CU4DBO9KM1B6PJ3JB3UN4COU


Processing URLs:   6%|▌         | 56/1000 [01:12<14:45,  1.07it/s]

Error extracting text from https://www.thelancet.com/journals/lanwpc/article/PIIS2666-6065(21)00150-4/fulltext#seccesectitle0012: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lanwpc/article/PIIS2666-6065(21)00150-4/fulltext#seccesectitle0012


Processing URLs:   6%|▌         | 59/1000 [01:16<21:55,  1.40s/it]

Error extracting text from http://tass.ru/en/politics/830734: 404 Client Error: Not Found for url: https://tass.ru/en/politics/830734


Processing URLs:   6%|▌         | 62/1000 [01:18<11:57,  1.31it/s]

URL filtered: https://www.npr.org/2021/06/04/1003284948/trump-suspended-from-facebook-for-2-years


Processing URLs:   7%|▋         | 71/1000 [01:29<08:21,  1.85it/s]

Error extracting text from http://www.barrons.com/articles/emerging-markets-debt-5-returns-and-little-risk-1460174675: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/emerging-markets-debt-5-returns-and-little-risk-1460174675
URL filtered: https://www.linkedin.com/pulse/new-thinking-proposed-understand-insider-spying-human-steve-hammons/
URL filtered: http://www.bloomberg.com/news/videos/2016-02-10/shipping-industry-sounding-the-alarm-on-global-growth
Error extracting text from http://www.nytimes.com/2016/01/23/world/middleeast/us-may-put-forces-at-iraqi-bases-in-effort-to-retake-mosul.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/23/world/middleeast/us-may-put-forces-at-iraqi-bases-in-effort-to-retake-mosul.html?_r=0


Processing URLs:   8%|▊         | 80/1000 [01:40<14:21,  1.07it/s]

Error extracting text from https://www.reuters.com/article/us-olympics-2018-northkorea-southkorea/kim-jong-un-invites-south-korean-president-for-summit-south-korea-idUSKBN1FU05F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-olympics-2018-northkorea-southkorea/kim-jong-un-invites-south-korean-president-for-summit-south-korea-idUSKBN1FU05F


Processing URLs:   9%|▊         | 86/1000 [01:46<11:10,  1.36it/s]

Error extracting text from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1043854/S1463_Warwick_Omicron_Modelling.pdf),: 404 Client Error: Not Found for url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1043854/S1463_Warwick_Omicron_Modelling.pdf),


Processing URLs:   9%|▉         | 88/1000 [01:48<12:59,  1.17it/s]

Error extracting text from https://www.wsj.com/articles/riot-police-on-venezuelas-front-lines-seek-a-way-out-1495013403: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/riot-police-on-venezuelas-front-lines-seek-a-way-out-1495013403
URL filtered: https://www.youtube.com/watch?v=JOoNOs8Ql28


Processing URLs:   9%|▉         | 94/1000 [01:54<13:57,  1.08it/s]

Error extracting text from http://www.tradingeconomics.com/united-kingdom/gdp-per-capita-ppp: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-kingdom/gdp-per-capita-ppp
URL filtered: https://www.youtube.com/watch?v=-jPzAakHPpk


Processing URLs:  10%|█         | 104/1000 [02:09<16:47,  1.12s/it]

Error extracting text from http://www.vanguardngr.com/2016/11/senate-rejects-buharis-plan-take-30bn-loan/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/11/senate-rejects-buharis-plan-take-30bn-loan/


Processing URLs:  11%|█         | 106/1000 [02:11<15:33,  1.04s/it]

Error extracting text from http://www3.nhk.or.jp/nhkworld/english/news/20160112_38.html: 404 Client Error: Not Found for url: http://www3.nhk.or.jp/nhkworld/english/news/20160112_38.html


Processing URLs:  11%|█         | 108/1000 [02:14<18:30,  1.24s/it]

Error extracting text from http://www.counterpunch.org/2015/11/30/japans-5th-recession-in-7-years/: 403 Client Error: Forbidden for url: http://www.counterpunch.org/2015/11/30/japans-5th-recession-in-7-years/


Processing URLs:  11%|█         | 110/1000 [02:16<15:31,  1.05s/it]

Error extracting text from https://simpleflying.com/airbus-boeing-artificial-intelligence-flight/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Error extracting text from http://www.wsj.com/articles/fighting-rages-in-syria-on-eve-of-partial-cease-fire-1456485584: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fighting-rages-in-syria-on-eve-of-partial-cease-fire-1456485584


Processing URLs:  11%|█▏        | 113/1000 [02:20<20:59,  1.42s/it]

Error extracting text from https://pages.marketintelligence.spglobal.com/LCD-2021-Loan-Market-Survey-Infographic-Pr2.html: 403 Client Error: Forbidden for url: https://www.spglobal.com/marketintelligence/en/index


Processing URLs:  12%|█▏        | 117/1000 [02:32<38:11,  2.60s/it]

Error extracting text from http://www.legalreader.com/vw-u-s-ceo-michael-horn-resigns-amid-emissions-scandal/: 404 Client Error: Not Found for url: https://www.legalreader.com/vw-u-s-ceo-michael-horn-resigns-amid-emissions-scandal/


Processing URLs:  12%|█▏        | 121/1000 [02:36<20:05,  1.37s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-07-15/bolsonaro-evolving-well-will-remain-hospitalized-for-treatment


Processing URLs:  13%|█▎        | 127/1000 [10:41<28:09:21, 116.11s/it]

Error extracting text from https://www.thespainreport.com/articles/767-160620124533-spain-general-election-brief-20-06-2016: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/767-160620124533-spain-general-election-brief-20-06-2016 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x301865df0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  13%|█▎        | 132/1000 [10:51<6:00:46, 24.94s/it]  

Error extracting text from http://cherna.gora.me/news/biden-to-kosovo-if-you-do-not-reach-the-agreement-with-montenegro-you-will-lose-american-support/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/biden-to-kosovo-if-you-do-not-reach-the-agreement-with-montenegro-you-will-lose-american-support/


Processing URLs:  14%|█▎        | 135/1000 [10:55<2:17:01,  9.50s/it]

Error extracting text from https://finance.yahoo.com/q/ao?s=TSLA+Analyst+Opinion: 404 Client Error: Not Found for url: https://finance.yahoo.com/q/ao?s=TSLA+Analyst+Opinion


Processing URLs:  14%|█▍        | 142/1000 [11:05<28:09,  1.97s/it]  

Error extracting text from http://www.nytimes.com/2016/04/17/movies/deadpool-isnt-the-only-solution-but-batman-v-superman-is-the-problem.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/17/movies/deadpool-isnt-the-only-solution-but-batman-v-superman-is-the-problem.html


Processing URLs:  14%|█▍        | 143/1000 [11:06<27:34,  1.93s/it]

Error extracting text from http://www.ibtimes.com/chinas-naval-base-djibouti-could-become-africas-singapore-2292581: 403 Client Error: Forbidden for url: https://www.ibtimes.com/chinas-naval-base-djibouti-could-become-africas-singapore-2292581


Processing URLs:  14%|█▍        | 144/1000 [11:07<22:01,  1.54s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-05-04/tillerson-emphatic-about-need-to-halt-south-china-sea-buildup


Processing URLs:  15%|█▍        | 146/1000 [11:09<18:05,  1.27s/it]

Error extracting text from http://www.ncr-iran.org/en/news/iran-world/20075-no-such-thing-as-reformers-in-iran-regime-ny-daily-news-op-ed: 403 Client Error: Forbidden for url: https://www.ncr-iran.org/en/news/iran-world/20075-no-such-thing-as-reformers-in-iran-regime-ny-daily-news-op-ed


Processing URLs:  15%|█▍        | 148/1000 [11:12<19:14,  1.36s/it]

Error extracting text from http://parstoday.com/en/news/middle_east-i6607-iraqi_izadi_kurds_severe_key_isil_supply_line_in_sinjar: HTTPConnectionPool(host='parstoday.com', port=80): Max retries exceeded with url: /en/news/middle_east-i6607-iraqi_izadi_kurds_severe_key_isil_supply_line_in_sinjar (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301864dd0>: Failed to resolve 'parstoday.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  16%|█▌        | 156/1000 [11:31<57:20,  4.08s/it]

Error extracting text from http://www.morningnewsusa.com/south-china-sea-war-china-perfecting-cyber-attack-amassing-allies-2386279.html: HTTPConnectionPool(host='www.morningnewsusa.com', port=80): Max retries exceeded with url: /south-china-sea-war-china-perfecting-cyber-attack-amassing-allies-2386279.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300db1be0>: Failed to resolve 'www.morningnewsusa.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.cnbc.com/2017/09/12/watchdog-group-wants-facebook-to-release-russian-sponsored-ads.html


Processing URLs:  16%|█▌        | 159/1000 [11:49<1:08:15,  4.87s/it]

Error extracting text from http://thehill.com/policy/energy-environment/254033-house-committee-passes-crude-oil-export-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/254033-house-committee-passes-crude-oil-export-bill/


Processing URLs:  16%|█▌        | 161/1000 [11:51<43:20,  3.10s/it]  

Error extracting text from http://www.brookings.edu/research/articles/2003/06/summer-elections-mann: 404 Client Error: Not Found for url: https://www.brookings.edu/articles/articles/2003/06/summer-elections-mann


Processing URLs:  16%|█▌        | 162/1000 [11:51<32:10,  2.30s/it]

Error extracting text from https://www.nytimes.com/2017/03/09/opinion/connecting-trumps-dots-to-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/09/opinion/connecting-trumps-dots-to-russia.html


Processing URLs:  16%|█▋        | 164/1000 [11:54<25:38,  1.84s/it]

Error extracting text from https://www.corelogic.co.nz/news/property-market-slowdown-continued-august-covid-resurgence-could-cause-temporary-reversal-0#.YS7KS44zaUk: 404 Client Error: Not Found for url: https://www.corelogic.co.nz/news/property-market-slowdown-continued-august-covid-resurgence-could-cause-temporary-reversal-0#.YS7KS44zaUk


Processing URLs:  16%|█▋        | 165/1000 [11:55<21:03,  1.51s/it]

Error extracting text from https://www.defense.gov/News/Article/Article/1401287/flying-high-afghan-tactical-air-controllers-strengthen-capabilities/: 403 Client Error: Forbidden for url: https://www.defense.gov/News/Article/Article/1401287/flying-high-afghan-tactical-air-controllers-strengthen-capabilities/


Processing URLs:  17%|█▋        | 169/1000 [12:05<37:55,  2.74s/it]

Error extracting text from https://vimeo.com/516977972: 404 Client Error: Not Found for url: https://vimeo.com/516977972


Processing URLs:  17%|█▋        | 172/1000 [12:27<1:02:00,  4.49s/it]

URL filtered: https://m.youtube.com/watch?v=R2e2yHjc_mc#action=share


Processing URLs:  17%|█▋        | 174/1000 [12:29<39:54,  2.90s/it]  

Error extracting text from http://www.tv360nigeria.com/us-donates-30-million-ease-humanitarian-crisis-northeast-nigeria/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  18%|█▊        | 175/1000 [12:30<32:47,  2.38s/it]

URL filtered: https://twitter.com/mhmck/status/1497390863323996163


Processing URLs:  18%|█▊        | 177/1000 [12:31<20:24,  1.49s/it]

Error extracting text from http://ottawacitizen.com/news/politics/if-canadians-went-to-the-polls-today-the-liberals-would-get-a-supermajority-says-poll: 403 Client Error: Forbidden for url: https://ottawacitizen.com:443/news/politics/if-canadians-went-to-the-polls-today-the-liberals-would-get-a-supermajority-says-poll


Processing URLs:  18%|█▊        | 178/1000 [12:46<1:04:50,  4.73s/it]

Error extracting text from http://www.investopedia.com/investing/how-interest-rates-affect-stock-market/: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/investing/how-interest-rates-affect-stock-market/


Processing URLs:  18%|█▊        | 180/1000 [12:49<42:57,  3.14s/it]  

Error extracting text from https://www.icrc.org/customary-ihl/eng/docs/v1_rul_rule156: 403 Client Error: Forbidden for url: https://www.icrc.org/customary-ihl/eng/docs/v1_rul_rule156


Processing URLs:  18%|█▊        | 181/1000 [12:49<33:45,  2.47s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-27/poland-plots-endgame-in-democracy-row-as-eu-obsesses-over-brexit


Processing URLs:  18%|█▊        | 183/1000 [12:51<22:53,  1.68s/it]

Error extracting text from http://in.reuters.com/article/india-china-coal-idINKBN0O51CD20150520: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  18%|█▊        | 185/1000 [12:58<32:44,  2.41s/it]

Error extracting text from http://www.aina.org/news/20160809122436.htm: 404 Client Error:  for url: http://www.aina.org/news/20160809122436.htm


Processing URLs:  19%|█▊        | 187/1000 [13:00<21:35,  1.59s/it]

Error extracting text from https://www.wsj.com/articles/yellen-irs-push-democrats-to-require-banks-to-report-annual-account-flows-11631727020: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/yellen-irs-push-democrats-to-require-banks-to-report-annual-account-flows-11631727020


Processing URLs:  19%|█▉        | 189/1000 [13:02<16:23,  1.21s/it]

Error extracting text from https://www.nytimes.com/2021/01/21/us/politics/biden-russia-cyber-hack-nuclear.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/21/us/politics/biden-russia-cyber-hack-nuclear.html


Processing URLs:  19%|█▉        | 193/1000 [13:08<17:23,  1.29s/it]

Error extracting text from https://www.nasdaq.com/articles/emerging-markets-turkeys-lira-jumps-as-c.bank-delivers-another-sharp-rate-hike-2020-12-24: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/emerging-markets-turkeys-lira-jumps-as-c.bank-delivers-another-sharp-rate-hike-2020-12-24


Processing URLs:  20%|█▉        | 195/1000 [13:10<13:09,  1.02it/s]

Error extracting text from http://www.nytimes.com/2008/12/22/washington/22combat.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2008/12/22/washington/22combat.html


Processing URLs:  20%|█▉        | 198/1000 [13:21<37:28,  2.80s/it]

Error extracting text from http://www.defensenews.com/story/defense/international/europe/2015/11/30/sources-nato-set-invite-montenegro-join-alliance/76567750/: 404 Client Error: Not Found for url: https://www.defensenews.com/story/defense/international/europe/2015/11/30/sources-nato-set-invite-montenegro-join-alliance/76567750/


Processing URLs:  20%|██        | 205/1000 [13:32<16:10,  1.22s/it]

Error extracting text from https://www.congress.gov/bill/114th-congress/house-bill/702/all-actions-without-amendments?q={%22search%22%3A[%22oil+export%22]}&amp;resultIndex=3: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/house-bill/702/all-actions-without-amendments?q=%7B%22search%22%3A%5B%22oil+export%22%5D%7D&amp;resultIndex=3


Processing URLs:  21%|██        | 209/1000 [13:46<54:37,  4.14s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-election-results/u-s-eu-say-they-do-not-recognize-venezuela-parliamentary-vote-idUSKBN28H0L3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-election-results/u-s-eu-say-they-do-not-recognize-venezuela-parliamentary-vote-idUSKBN28H0L3


Processing URLs:  21%|██        | 212/1000 [13:50<30:03,  2.29s/it]

Error extracting text from https://www.amazon.com/b?ie=UTF8&node=17044620011: 503 Server Error: Service Unavailable for url: https://www.amazon.com/b?ie=UTF8&node=17044620011
URL filtered: https://www.youtube.com/watch?v=efHCdKb5UWc


Processing URLs:  22%|██▏       | 215/1000 [13:54<22:44,  1.74s/it]

Error extracting text from https://www.justice.gov/archive/ag/speeches/2002/natlsecentryexittrackingsys.htm: 403 Client Error: Forbidden for url: https://www.justice.gov/archive/ag/speeches/2002/natlsecentryexittrackingsys.htm


Processing URLs:  22%|██▏       | 217/1000 [13:56<19:25,  1.49s/it]

URL filtered: https://www.youtube.com/watch?v=UPw-3e_pzqU
URL filtered: http://www.bloomberg.com/news/articles/2015-11-29/kuwait-oil-minister-leaves-post-days-before-opec-meeting


Processing URLs:  23%|██▎       | 228/1000 [14:12<17:01,  1.32s/it]

Error extracting text from https://thehill.com/opinion/white-house/495580-a-hillary-clinton-barack-obama-ticket-to-replace-joe-biden: 403 Client Error: Forbidden for url: https://thehill.com/opinion/white-house/495580-a-hillary-clinton-barack-obama-ticket-to-replace-joe-biden/
URL filtered: https://twitter.com/KyivIndependent/status/1500765748364521474


Processing URLs:  24%|██▍       | 241/1000 [14:28<17:41,  1.40s/it]

Error extracting text from http://www.newsweek.com/china-warns-us-south-china-sea-patrols-557431: 403 Client Error: Forbidden for url: https://www.newsweek.com/china-warns-us-south-china-sea-patrols-557431


Processing URLs:  24%|██▍       | 243/1000 [14:29<10:58,  1.15it/s]

Error extracting text from http://www.strategicstudiesinstitute.army.mil/index.cfm/articles/Russian-Engagement-in-Latin-America/2015/04/24: HTTPConnectionPool(host='www.strategicstudiesinstitute.army.mil', port=80): Max retries exceeded with url: /index.cfm/articles/Russian-Engagement-in-Latin-America/2015/04/24 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303a768a0>: Failed to resolve 'www.strategicstudiesinstitute.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.fbi.gov/news/stories/national-cyber-security-awareness-month-2017: 403 Client Error: Forbidden for url: https://www.fbi.gov/news/stories/national-cyber-security-awareness-month-2017


Processing URLs:  25%|██▍       | 247/1000 [14:37<24:07,  1.92s/it]

Error extracting text from http://www.hellenicshippingnews.com/iran-wants-to-return-its-pre-sanctions-share-of-14-5-in-opec: 404 Client Error: Not Found for url: https://www.hellenicshippingnews.com/iran-wants-to-return-its-pre-sanctions-share-of-14-5-in-opec


Processing URLs:  25%|██▍       | 249/1000 [14:38<15:25,  1.23s/it]

Error extracting text from https://www.google.com/trends/explore#geo=GB: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#geo=GB
Error extracting text from http://ir.tesla.com/releasedetail.cfm?ReleaseID=991720: 403 Client Error: Forbidden for url: http://ir.tesla.com/releasedetail.cfm?ReleaseID=991720


Processing URLs:  25%|██▌       | 251/1000 [14:39<09:14,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/iranian-hackers-infiltrated-new-york-dam-in-2013-1450662559: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iranian-hackers-infiltrated-new-york-dam-in-2013-1450662559


Processing URLs:  25%|██▌       | 252/1000 [14:39<07:40,  1.63it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-johnson-idUSKBN1770MA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-johnson-idUSKBN1770MA


Processing URLs:  25%|██▌       | 253/1000 [14:40<08:18,  1.50it/s]

Error extracting text from http://www.indiandefensenews.in/2016/02/message-of-cooperation-in-fleet-review.html: 404 Client Error: Not Found for url: https://www.indiandefensenews.in/2016/02/message-of-cooperation-in-fleet-review.html


Processing URLs:  26%|██▌       | 256/1000 [14:43<10:09,  1.22it/s]

Error extracting text from http://www.khaosodenglish.com/opinion/2017/01/07/rule-iron-fist-lie-without-shame/: 403 Client Error: Forbidden for url: https://www.khaosodenglish.com/opinion/2017/01/07/rule-iron-fist-lie-without-shame/
Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-idUSKBN1AP1OR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-idUSKBN1AP1OR


Processing URLs:  26%|██▌       | 260/1000 [14:47<13:17,  1.08s/it]

URL filtered: https://www.facebook.com/PrairieRoseRanch


Processing URLs:  26%|██▌       | 262/1000 [14:47<08:17,  1.48it/s]

Error extracting text from https://www.thestreet.com/story/14301838/1/att-time-warner-deal.html: 403 Client Error: Forbidden for url: https://www.thestreet.com/story/14301838/1/att-time-warner-deal.html


Processing URLs:  26%|██▋       | 263/1000 [14:48<08:19,  1.48it/s]

URL filtered: http://noduslabs.com/cases/russian-protest-network-analysis-facebook-gephi-netvizz/


Processing URLs:  27%|██▋       | 266/1000 [14:49<05:51,  2.09it/s]

Error extracting text from https://trends.google.com/trends/explore: 429 Client Error: unknown for url: https://trends.google.com/trends/explore
Error extracting text from https://www.reuters.com/business/energy/uniper-ceo-does-not-expect-nord-stream-2-relief-winter-gas-squeeze-2021-10-01/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/uniper-ceo-does-not-expect-nord-stream-2-relief-winter-gas-squeeze-2021-10-01/


Processing URLs:  27%|██▋       | 270/1000 [14:51<06:42,  1.81it/s]

Error extracting text from http://www.straitstimes.com/world/middle-east/israeli-pms-choice-for-a-g-to-decide-on-filing-charges: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  27%|██▋       | 272/1000 [15:53<3:20:37, 16.54s/it]

Error extracting text from http://ir.gazprom-neft.com/fileadmin/user_upload/documents/shareholders_meetings/info-22-06-2007-eng-4.pdf: HTTPConnectionPool(host='ir.gazprom-neft.com', port=80): Max retries exceeded with url: /fileadmin/user_upload/documents/shareholders_meetings/info-22-06-2007-eng-4.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe913260>, 'Connection to ir.gazprom-neft.com timed out. (connect timeout=60)'))


Processing URLs:  27%|██▋       | 273/1000 [15:55<2:30:56, 12.46s/it]

Error extracting text from https://www.google.com/amp/s/mobile.reuters.com/article/amp/idUSKBN2BU39F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKBN2BU39F


Processing URLs:  27%|██▋       | 274/1000 [15:55<1:50:16,  9.11s/it]

Error extracting text from http://www.peruviantimes.com/30/colourless-campaign-heads-into-final-stretch/26278/: 406 Client Error: Not Acceptable for url: http://www.peruviantimes.com/30/colourless-campaign-heads-into-final-stretch/26278/


Processing URLs:  28%|██▊       | 277/1000 [16:00<49:11,  4.08s/it]  

Error extracting text from http://news.softpedia.com/news/on-chernobyl-s-30th-anniversary-malware-shuts-down-german-nuclear-power-plant-503429.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/on-chernobyl-s-30th-anniversary-malware-shuts-down-german-nuclear-power-plant-503429.shtml


Processing URLs:  28%|██▊       | 281/1000 [16:05<21:02,  1.76s/it]

Error extracting text from http://www.defense.gov/News/Special-Reports/0814_Inherent-Resolve: 403 Client Error: Forbidden for url: http://www.defense.gov/News/Special-Reports/0814_Inherent-Resolve


Processing URLs:  29%|██▊       | 286/1000 [16:14<22:58,  1.93s/it]

URL filtered: https://www.youtube.com/embed/4XDmiU41wtE&quot


Processing URLs:  29%|██▉       | 289/1000 [16:16<12:14,  1.03s/it]

URL filtered: https://twitter.com/MaryamNSharif


Processing URLs:  29%|██▉       | 291/1000 [16:17<10:35,  1.12it/s]

Error extracting text from http://www.groundreport.com/irans-syria-policy-no-shift-nuclear-deal-elections/: 403 Client Error: Forbidden for url: http://www.groundreport.com/irans-syria-policy-no-shift-nuclear-deal-elections/


Processing URLs:  30%|██▉       | 297/1000 [16:33<41:46,  3.57s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/un-agency-to-release-report-on-progress-of-iran-nuke-deal/2015/11/18/f93644e0-8def-11e5-934c-a369c80822c2_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/un-agency-to-release-report-on-progress-of-iran-nuke-deal/2015/11/18/f93644e0-8def-11e5-934c-a369c80822c2_story.html


Processing URLs:  30%|██▉       | 298/1000 [16:35<34:38,  2.96s/it]

Error extracting text from https://www.thelifeyoucansave.org/Blog/ID/235/The-Life-You-Can-Saves-2015-Year-in-Review: 403 Client Error: Forbidden for url: https://www.thelifeyoucansave.org/Blog/ID/235/The-Life-You-Can-Saves-2015-Year-in-Review


Processing URLs:  30%|██▉       | 299/1000 [16:36<28:15,  2.42s/it]

Error extracting text from http://www.fayobserver.com/military/soldiers-from-fort-bragg-deploying-to-fight-isis/article_33c6b7cf-b4cd-5258-8bc4-6146817e0bc2.html: 404 Client Error: OK for url: https://www.fayobserver.com/military/soldiers-from-fort-bragg-deploying-to-fight-isis/article_33c6b7cf-b4cd-5258-8bc4-6146817e0bc2.html/


Processing URLs:  30%|███       | 304/1000 [16:47<21:31,  1.86s/it]

Error extracting text from https://www.cnet.com/roadshow/news/chinese-automaker-baic-to-eliminate-fossil-fuel-vehicles-by-2025/: 410 Client Error: Gone for url: https://www.cnet.com/roadshow/news/chinese-automaker-baic-to-eliminate-fossil-fuel-vehicles-by-2025/


Processing URLs:  31%|███       | 306/1000 [16:54<29:30,  2.55s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/603860/obama-hagel-mark-end-of-operation-enduring-freedom: 403 Client Error: Forbidden for url: http://www.defense.gov/News-Article-View/Article/603860/obama-hagel-mark-end-of-operation-enduring-freedom


Processing URLs:  31%|███       | 310/1000 [16:56<10:33,  1.09it/s]

Error extracting text from http://m.state.gov/md257513.htm: HTTPConnectionPool(host='m.state.gov', port=80): Max retries exceeded with url: /md257513.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fea0c230>: Failed to resolve 'm.state.gov' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN15U16Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN15U16Y


Processing URLs:  31%|███       | 312/1000 [16:57<09:27,  1.21it/s]

Error extracting text from http://thehill.com/homenews/sunday-talk-shows/361842-gop-senator-moore-would-be-a-distraction-to-gop-agenda-if-elected: 403 Client Error: Forbidden for url: https://thehill.com/homenews/sunday-talk-shows/361842-gop-senator-moore-would-be-a-distraction-to-gop-agenda-if-elected/


Processing URLs:  32%|███▏      | 316/1000 [17:02<11:53,  1.04s/it]

URL filtered: http://www.bbc.com/news/world-europe-35449152?ns_mchannel=social&amp;ns_campaign=bbc_breaking&amp;ns_source=twitter&amp;ns_linkname=news_central


Processing URLs:  32%|███▏      | 319/1000 [17:04<09:16,  1.22it/s]

Error extracting text from http://www.transparency.org/cpi2014/results#myAnchor1: 404 Client Error: Not Found for url: https://www.transparency.org/en/cpi2014/results#myAnchor1


Processing URLs:  32%|███▏      | 320/1000 [17:05<10:49,  1.05it/s]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7447761/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7447761/


Processing URLs:  32%|███▏      | 323/1000 [17:08<10:46,  1.05it/s]

Error extracting text from http://en.dailypakistan.com.pk/pakistan/pm-still-needs-to-recover-from-operation-daily-pakistan-editor-mujib-shami/: 503 Server Error: Backend fetch failed for url: https://en.dailypakistan.com.pk/pakistan/pm-still-needs-to-recover-from-operation-daily-pakistan-editor-mujib-shami/


Processing URLs:  32%|███▏      | 324/1000 [17:10<12:10,  1.08s/it]

Error extracting text from https://www.australianclinicaltrials.gov.au/anzctr_feed/result?searchText=ultrasound&purposeOfStudy=&recruitmentStatus=Recruiting&phase=&ethicsApproval=Yes&gender=&healthyVolunteers=&recruitmentSites=&healthConditions=&ageGroup=&recruitmentCountries=Australia&studyType=&conditionCategory=Neurological&conditionCode=Alzheimer%2527s%2520disease&paging=20&pageNumber=1: 403 Client Error: Forbidden for url: https://www.australianclinicaltrials.gov.au/anzctr_feed/result?searchText=ultrasound&purposeOfStudy=&recruitmentStatus=Recruiting&phase=&ethicsApproval=Yes&gender=&healthyVolunteers=&recruitmentSites=&healthConditions=&ageGroup=&recruitmentCountries=Australia&studyType=&conditionCategory=Neurological&conditionCode=Alzheimer%2527s%2520disease&paging=20&pageNumber=1


Processing URLs:  32%|███▎      | 325/1000 [17:10<10:04,  1.12it/s]

Error extracting text from https://www.chicagofed.org/publications/agletter/index: 403 Client Error: Forbidden for url: https://www.chicagofed.org/publications/agletter/index
Error extracting text from http://www.reuters.com/article/us-iran-oil-production-idUSKCN0XG0X5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-production-idUSKCN0XG0X5
URL filtered: http://www.bloomberg.com/politics/articles/2015-11-02/clinton-and-sanders-enter-new-phase-in-battle-for-new-hampshire


Processing URLs:  33%|███▎      | 328/1000 [17:10<05:14,  2.14it/s]

Error extracting text from http://www.nytimes.com/2001/04/02/world/us-plane-in-china-after-it-collides-with-chinese-jet.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2001/04/02/world/us-plane-in-china-after-it-collides-with-chinese-jet.html


Processing URLs:  33%|███▎      | 332/1000 [17:25<28:15,  2.54s/it]

Error extracting text from http://www.reuters.com/article/us-usa-mexico-trade-videgaray-idUSKBN16G38U?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-mexico-trade-videgaray-idUSKBN16G38U?il=0


Processing URLs:  34%|███▎      | 335/1000 [17:26<13:39,  1.23s/it]

Error extracting text from http://jakartaglobe.beritasatu.com/business/indonesia-net-oil-importer-hopes-rejoin-exporters-cartel-opec/: HTTPConnectionPool(host='jakartaglobe.beritasatu.com', port=80): Max retries exceeded with url: /business/indonesia-net-oil-importer-hopes-rejoin-exporters-cartel-opec/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x117cdb500>: Failed to resolve 'jakartaglobe.beritasatu.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.reuters.com/article/us-hongkong-protests-china-office/china-warns-hong-kong-protesters-not-to-play-with-fire-idUSKCN1UW0L3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-protests-china-office/china-warns-hong-kong-protesters-not-to-play-with-fire-idUSKCN1UW0L3


Processing URLs:  34%|███▍      | 338/1000 [17:41<35:34,  3.22s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-03-16/stop-trump-groups-make-one-last-bet-on-rubio-and-lose?cmpid=BBD031616_POL


Processing URLs:  34%|███▍      | 340/1000 [17:43<24:48,  2.26s/it]

Error extracting text from https://uk.reuters.com/article/uk-germany-politics/german-spd-considers-propping-up-merkel-but-only-if-members-agree-idUKKBN1DO0NQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  34%|███▍      | 343/1000 [17:53<26:55,  2.46s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-02-19/le-pen-party-taps-russian-banks-to-fund-2017-election-campaign
Error extracting text from https://www.reuters.com/article/indonesia-china/update-1-china-to-import-more-indonesian-products-to-balanced-trade-idUSL1N2JO0SZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/indonesia-china/update-1-china-to-import-more-indonesian-products-to-balanced-trade-idUSL1N2JO0SZ


Processing URLs:  35%|███▌      | 352/1000 [18:08<20:08,  1.86s/it]

Error extracting text from https://defesa.com.br/vice-premie-russo-cuba-e-nosso-principal-parceiro-e-aliado-na-america-latina/: 406 Client Error: Not Acceptable for url: https://defesa.com.br/vice-premie-russo-cuba-e-nosso-principal-parceiro-e-aliado-na-america-latina/


Processing URLs:  35%|███▌      | 353/1000 [18:09<16:52,  1.57s/it]

Error extracting text from https://www.patmccrory.com/2016/06/02/attorney-general-reverses-course-federal-litigation/: 403 Client Error: Forbidden for url: https://www.patmccrory.com/2016/06/02/attorney-general-reverses-course-federal-litigation/


Processing URLs:  36%|███▌      | 357/1000 [18:15<15:33,  1.45s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/335398-ex-cia-director-if-kushner-set-up-secure-line-with-russia-cia: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/335398-ex-cia-director-if-kushner-set-up-secure-line-with-russia-cia/


Processing URLs:  36%|███▌      | 359/1000 [18:16<11:31,  1.08s/it]

Error extracting text from http://intelligencebriefs.com/dozens-of-boko-haram-terrorists-killed-scores-injured-by-nigerian-army-in-an-operation-at-the-sambisa-forest/: 406 Client Error: Not Acceptable for url: http://intelligencebriefs.com/dozens-of-boko-haram-terrorists-killed-scores-injured-by-nigerian-army-in-an-operation-at-the-sambisa-forest/
Error extracting text from http://news.usni.org/2016/02/22/new-possible-chinese-radar-installation-on-south-china-sea-artificial-island-could-put-u-s-allied-stealth-aircraft-at-risk: 403 Client Error: Forbidden for url: http://news.usni.org/2016/02/22/new-possible-chinese-radar-installation-on-south-china-sea-artificial-island-could-put-u-s-allied-stealth-aircraft-at-risk


Processing URLs:  36%|███▋      | 365/1000 [18:22<11:30,  1.09s/it]

Error extracting text from http://uk.reuters.com/article/us-ecb-eurozone-idUKKBN1542KL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  37%|███▋      | 367/1000 [18:23<07:52,  1.34it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://zh.clicrbs.com.br/rs/noticias/noticia/2016/03/oposicao-vai-obstruir-votacoes-ate-cunha-instalar-comissao-do-impeachment-4990540.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://zh.clicrbs.com.br/rs/noticias/noticia/2016/03/oposicao-vai-obstruir-votacoes-ate-cunha-instalar-comissao-do-impeachment-4990540.html&amp;prev=search


Processing URLs:  37%|███▋      | 373/1000 [18:36<23:37,  2.26s/it]

Error extracting text from http://www.iowafarmertoday.com/news/regional/farmers-lenders-keep-focus-on-managing-costs/article_0376fdda-f94e-11e6-80fe-87c4b330c9c3.html: 404 Client Error: Not Found for url: https://agupdate.com/iowafarmertoday/news/regional/farmers-lenders-keep-focus-on-managing-costs/article_0376fdda-f94e-11e6-80fe-87c4b330c9c3.html


Processing URLs:  38%|███▊      | 376/1000 [18:38<11:07,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-iran-turkey-visit-idUSKCN0W70DB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-turkey-visit-idUSKCN0W70DB
Error extracting text from http://ahtribune.com/world/north-africa-south-west-asia/843-isis-freezes-people.html: HTTPConnectionPool(host='ahtribune.com', port=80): Max retries exceeded with url: /world/north-africa-south-west-asia/843-isis-freezes-people.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x301cd2e70>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  38%|███▊      | 378/1000 [26:38<19:13:45, 111.29s/it]

Error extracting text from https://www.thespainreport.com/articles/755-160605132019-unidos-podemos-can-win-seats-in-almost-every-province-says-spain-s-nate-silver-kiko-llaneras: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/755-160605132019-unidos-podemos-can-win-seats-in-almost-every-province-says-spain-s-nate-silver-kiko-llaneras (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x301cd1c40>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  38%|███▊      | 379/1000 [26:39<14:27:41, 83.83s/it] 

Error extracting text from http://www.nwo.usace.army.mil/Missions/Civil-Works/Planning/Project-Reports/Article/633496/dakota-access-pipeline-environmental-assessment/: 403 Client Error: Forbidden for url: http://www.nwo.usace.army.mil/Missions/Civil-Works/Planning/Project-Reports/Article/633496/dakota-access-pipeline-environmental-assessment/


Processing URLs:  38%|███▊      | 385/1000 [26:49<2:10:24, 12.72s/it] 

Error extracting text from https://www.tsa.gov/coronavirus/passenger-throughput).: 403 Client Error: Forbidden for url: https://www.tsa.gov/coronavirus/passenger-throughput).


Processing URLs:  39%|███▊      | 387/1000 [26:51<1:12:05,  7.06s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/japan-reacts-strongly-to/2483058.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/japan-reacts-strongly-to/2483058.html


Processing URLs:  39%|███▉      | 390/1000 [26:55<31:30,  3.10s/it]  

Error extracting text from https://bit.ly/3yQwFyh: 400 Client Error: Bad Request for url: https://twitter.com/arianespaceceo/status/1471989676349267969
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O5G3IF6JTSE901-17BNH346UBIK4GQ7V0S0BG8CGF


Processing URLs:  39%|███▉      | 393/1000 [26:56<15:57,  1.58s/it]

Error extracting text from https://www.g20.org/: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
Error extracting text from https://www.straitstimes.com/opinion/politics-and-covid-19-vaccine-fears: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  39%|███▉      | 394/1000 [26:57<12:12,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN0TX2MJ20151215: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN0TX2MJ20151215


Processing URLs:  40%|███▉      | 395/1000 [26:58<12:00,  1.19s/it]

Error extracting text from http://www.eli-alps.hu/: HTTPSConnectionPool(host='www.eli-alps.hu', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  40%|███▉      | 396/1000 [27:02<19:34,  1.94s/it]

Error extracting text from http://izvestia.ru/news/656514: 403 Client Error: Forbidden for url: https://iz.ru/news/656514
Error extracting text from http://wj.parliament.af/english.aspx: HTTPConnectionPool(host='wj.parliament.af', port=80): Max retries exceeded with url: /english.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302e0be90>: Failed to resolve 'wj.parliament.af' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|███▉      | 399/1000 [27:05<14:50,  1.48s/it]

Error extracting text from https://www.nytimes.com/2021/04/29/nyregion/nyc-reopening-de-blasio.html?searchResultPosition=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/29/nyregion/nyc-reopening-de-blasio.html?searchResultPosition=1
Error extracting text from http://www.foxnews.com/politics/2017/03/15/house-health-bill-doa-unless-major-changes-gop-senators-say.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/politics/2017/03/15/house-health-bill-doa-unless-major-changes-gop-senators-say.html


Processing URLs:  41%|████      | 410/1000 [27:25<19:56,  2.03s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-15/russians-hit-u-k-tech-firms-with-cyber-attacks-top-spy-says


Processing URLs:  41%|████      | 412/1000 [27:26<13:13,  1.35s/it]

Error extracting text from http://www.fayobserver.com/news/local/n-c-polls-show-clinton-cooper-and-burr-leading-their/article_8aef64ae-26b2-57c3-af70-c4301b1a6751.html: 404 Client Error: OK for url: https://www.fayobserver.com/news/local/n-c-polls-show-clinton-cooper-and-burr-leading-their/article_8aef64ae-26b2-57c3-af70-c4301b1a6751.html/


Processing URLs:  41%|████▏     | 414/1000 [27:31<16:41,  1.71s/it]

Error extracting text from http://ac.els-cdn.com/S0168165615300419/1-s2.0-S0168165615300419-main.pdf?_tid=2558e32a-e8b3-11e5-b9ad-00000aab0f6b&acdnat=1457829363_306840f5b1f8d467cd2e907776bea27b: HTTPConnectionPool(host='ac.els-cdn.com', port=80): Max retries exceeded with url: /S0168165615300419/1-s2.0-S0168165615300419-main.pdf?_tid=2558e32a-e8b3-11e5-b9ad-00000aab0f6b&acdnat=1457829363_306840f5b1f8d467cd2e907776bea27b (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302e0b230>: Failed to resolve 'ac.els-cdn.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 416/1000 [28:31<2:14:19, 13.80s/it]

Error extracting text from http://www.usnews.com/opinion/blogs/opinion-blog/2015/07/17/donald-trumps-poll-bump-will-fade: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  42%|████▏     | 417/1000 [28:31<1:43:23, 10.64s/it]

Error extracting text from http://www.wsj.com/articles/swift-banking-network-struggles-with-wave-of-cyberattacks-1463786328: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/swift-banking-network-struggles-with-wave-of-cyberattacks-1463786328


Processing URLs:  42%|████▏     | 421/1000 [28:42<48:43,  5.05s/it]  

Error extracting text from https://www.nytimes.com/2017/10/30/us/politics/paul-manafort-indicted.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/30/us/politics/paul-manafort-indicted.html?_r=0


Processing URLs:  42%|████▏     | 422/1000 [28:43<36:52,  3.83s/it]

Error extracting text from https://www.defense.gov/Portals/1/Documents/pubs/2018-National-Defense-Strategy-Summary.pdf: 403 Client Error: Forbidden for url: https://www.defense.gov/Portals/1/Documents/pubs/2018-National-Defense-Strategy-Summary.pdf


Processing URLs:  43%|████▎     | 427/1000 [28:49<16:36,  1.74s/it]

Error extracting text from http://www.theepochtimes.com/n3/2138603-what-chinas-sdr-bond-issue-really-means/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2138603-what-chinas-sdr-bond-issue-really-means/


Processing URLs:  44%|████▍     | 439/1000 [29:15<14:08,  1.51s/it]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-italy-iom-idUSKCN0XC1WL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-italy-iom-idUSKCN0XC1WL
Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)00562-6/fulltext: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)00562-6/fulltext
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-davos-idUSKCN0US0QO20160114: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-davos-idUSKCN0US0QO20160114


Processing URLs:  44%|████▍     | 444/1000 [29:20<09:59,  1.08s/it]

Error extracting text from https://www.scientificamerican.com/article/nasas-james-webb-space-telescope-is-fueled-for-late-december-launch/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/nasas-james-webb-space-telescope-is-fueled-for-late-december-launch/


Processing URLs:  44%|████▍     | 445/1000 [29:25<20:18,  2.20s/it]

URL filtered: https://twitter.com/BloombergUK/status/1473346503422971908


Processing URLs:  45%|████▌     | 453/1000 [29:31<06:48,  1.34it/s]

Error extracting text from http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=114733: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=114733


Processing URLs:  46%|████▌     | 457/1000 [29:43<23:42,  2.62s/it]

Error extracting text from https://www.reuters.com/article/us-canada-politics-nafta/canadas-pm-talks-tough-on-nafta-repeats-he-could-walk-away-idUSKBN1FM2VL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-canada-politics-nafta/canadas-pm-talks-tough-on-nafta-repeats-he-could-walk-away-idUSKBN1FM2VL


Processing URLs:  46%|████▌     | 459/1000 [29:45<16:08,  1.79s/it]

Error extracting text from http://allafrica.com/stories/201512030244.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201512030244.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x301cd0d40>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  46%|████▌     | 461/1000 [29:49<15:37,  1.74s/it]

Error extracting text from http://www.wsj.com/articles/effects-of-brexit-vote-to-span-markets-politics-1466630203: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/effects-of-brexit-vote-to-span-markets-politics-1466630203


Processing URLs:  47%|████▋     | 469/1000 [30:01<11:21,  1.28s/it]

Error extracting text from https://www.reuters.com/article/us-northkorea-missiles/north-korea-fires-submarine-launched-ballistic-missile-towards-japan-idUSKCN10Y2B0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles/north-korea-fires-submarine-launched-ballistic-missile-towards-japan-idUSKCN10Y2B0


Processing URLs:  47%|████▋     | 472/1000 [30:02<05:34,  1.58it/s]

Error extracting text from http://www.caracaschronicles.com/2016/03/31/fun-numbers-datanalisis-style/: 403 Client Error: Forbidden for url: http://www.caracaschronicles.com/2016/03/31/fun-numbers-datanalisis-style/
Error extracting text from http://www.reuters.com/article/us-usa-election-syria-idUSKBN1342D6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-syria-idUSKBN1342D6


Processing URLs:  48%|████▊     | 477/1000 [30:13<11:57,  1.37s/it]

Error extracting text from https://ia601500.us.archive.org/33/items/ftrbso/ftrbso.mp3: 403 Client Error: Forbidden for url: https://ia801200.us.archive.org/21/items/ftrbso/ftrbso.mp3
Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/us/2016_democratic_presidential_nomination-3824.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/us/2016_democratic_presidential_nomination-3824.html


Processing URLs:  48%|████▊     | 483/1000 [30:19<08:33,  1.01it/s]

Error extracting text from https://www.wsj.com/articles/u-s-and-taiwan-set-date-to-revive-trade-and-investment-talks-11624626153: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-and-taiwan-set-date-to-revive-trade-and-investment-talks-11624626153


Processing URLs:  48%|████▊     | 485/1000 [30:23<11:38,  1.36s/it]

Error extracting text from https://science.nasa.gov/science-news/science-at-nasa/2003/12nov_haywire: 404 Client Error: Page not found: /science-news/science-at-nasa/2003/12nov_haywire for url: https://science.nasa.gov/science-news/science-at-nasa/2003/12nov_haywire
Error extracting text from http://www.reuters.com/article/us-usa-economy-idUSKCN0YM1HC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-economy-idUSKCN0YM1HC


Processing URLs:  49%|████▊     | 487/1000 [30:26<10:48,  1.26s/it]

Error extracting text from http://www.shanghaidaily.com/article/article_xinhua.aspx?id=321883: 404 Client Error: Not Found for url: http://www.shanghaidaily.com/article/article_xinhua.aspx?id=321883


Processing URLs:  49%|████▉     | 489/1000 [30:27<07:44,  1.10it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trade-ip-idUSKCN0XO1IT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trade-ip-idUSKCN0XO1IT


Processing URLs:  49%|████▉     | 490/1000 [30:28<07:30,  1.13it/s]

Error extracting text from http://beta.latimes.com/world/europe/la-fg-internet-freedom-20171113-story.html#nws=mcnewsletter: 400 Client Error: Bad Request for url: http://beta.latimes.com/world/europe/la-fg-internet-freedom-20171113-story.html#nws=mcnewsletter


Processing URLs:  49%|████▉     | 492/1000 [30:30<06:39,  1.27it/s]

Error extracting text from https://theconversation.com/thirty-years-on-as-new-cold-war-looms-us-and-russia-should-remember-the-rekyjavik-summit-67084: 403 Client Error: Forbidden for url: https://theconversation.com/thirty-years-on-as-new-cold-war-looms-us-and-russia-should-remember-the-rekyjavik-summit-67084
Error extracting text from https://www.reuters.com/article/us-brazil-security/brazil-army-ordered-to-take-over-security-in-violent-rio-de-janeiro-idUSKCN1G00WP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-security/brazil-army-ordered-to-take-over-security-in-violent-rio-de-janeiro-idUSKCN1G00WP


Processing URLs:  50%|████▉     | 498/1000 [30:35<07:57,  1.05it/s]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/federal-reserve-expected-to-hold-interest-rate-at-zero/articleshow/49547280.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/federal-reserve-expected-to-hold-interest-rate-at-zero/articleshow/49547280.cms


Processing URLs:  50%|█████     | 502/1000 [30:38<07:21,  1.13it/s]

Error extracting text from http://thehill.com/policy/finance/255715-conservatives-hold-fire-on-mccarthy-scalise: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/255715-conservatives-hold-fire-on-mccarthy-scalise/


Processing URLs:  50%|█████     | 505/1000 [30:40<04:33,  1.81it/s]

Error extracting text from http://www.msnbc.com/msnbc/fbi-formally-confirms-its-investigation-hillary-clintons-email-server: 403 Client Error: Forbidden for url: http://www.msnbc.com/msnbc/fbi-formally-confirms-its-investigation-hillary-clintons-email-server
Error extracting text from http://www.reuters.com/article/us-turkey-referendum-eu-idUSKBN16W0R2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-eu-idUSKBN16W0R2


Processing URLs:  51%|█████     | 508/1000 [30:44<07:19,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-yemen-security-hodeidah-idUSKBN16I072: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-hodeidah-idUSKBN16I072


Processing URLs:  51%|█████     | 509/1000 [30:54<30:27,  3.72s/it]

Error extracting text from https://doc.research-and-analytics.csfb.com/docView?sourceid=em&amp;document_id=x667647&amp;serialid=%2FhZ2GzmSCIsZ5cahU%2BWpB9YAdrHbJv9sL7k0xOA4ACY%3D: HTTPSConnectionPool(host='plus2.credit-suisse.com', port=443): Max retries exceeded with url: /docView?sourceid=em&amp;document_id=x667647&amp;serialid=%2FhZ2GzmSCIsZ5cahU%2BWpB9YAdrHbJv9sL7k0xOA4ACY%3D (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30451cc20>: Failed to resolve 'plus2.credit-suisse.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  51%|█████     | 510/1000 [30:56<27:36,  3.38s/it]

Error extracting text from http://www.populus.co.uk/wp-content/uploads/2016/03/Polls-Apart-29-March-2016.pdf: 404 Client Error: Not Found for url: https://yonderconsulting.com/wp-content/uploads/2016/03/Polls-Apart-29-March-2016.pdf


Processing URLs:  51%|█████     | 512/1000 [30:58<15:52,  1.95s/it]

Error extracting text from https://www.blog.reinz.co.nz/s/REINZ-Monthly-Property-Report-June-2021-v2.pdf: HTTPSConnectionPool(host='www.blog.reinz.co.nz', port=443): Max retries exceeded with url: /s/REINZ-Monthly-Property-Report-June-2021-v2.pdf (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.blog.reinz.co.nz'. (_ssl.c:1000)")))


Processing URLs:  52%|█████▏    | 516/1000 [31:02<12:08,  1.51s/it]

URL filtered: http://www.bloomberg.com/quote/CO1:COM


Processing URLs:  52%|█████▏    | 520/1000 [31:06<08:00,  1.00s/it]

URL filtered: https://www.youtube.com/watch?v=6FsH7RK1S2E&amp;feature=youtu.be


Processing URLs:  52%|█████▏    | 523/1000 [31:09<08:44,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-health-zika-usa-olympics-exlusive-idUSKCN0VH0BJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-zika-usa-olympics-exlusive-idUSKCN0VH0BJ


Processing URLs:  52%|█████▏    | 524/1000 [31:10<08:31,  1.07s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-05/oil-teeters-near-45-as-shale-driven-rout-erases-opec-cut-gain


Processing URLs:  53%|█████▎    | 526/1000 [31:13<09:59,  1.27s/it]

Error extracting text from http://blogs.denverpost.com/thespot/2016/01/28/senate-panel-agrees-to-cory-gardner-bill-on-north-korea/124815/: 500 Server Error: Internal Server Error for url: https://blogs.denverpost.com/thespot/2016/01/28/senate-panel-agrees-to-cory-gardner-bill-on-north-korea/124815/


Processing URLs:  53%|█████▎    | 528/1000 [32:16<2:07:47, 16.25s/it]

Error extracting text from https://www.mitrade.com/forex/crypto/bitcoin-trader/bitcoin-leverage-margin-trading#:~:text=Bitcoin%20leverage%20trading%20allows%20you,trading%20is%20known%20as%20margin: HTTPSConnectionPool(host='www.mitrade.com', port=443): Max retries exceeded with url: /forex/crypto/bitcoin-trader/bitcoin-leverage-margin-trading (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x300509160>, 'Connection to www.mitrade.com timed out. (connect timeout=60)'))


Processing URLs:  53%|█████▎    | 529/1000 [32:16<1:34:26, 12.03s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/10/11/Saudi-king-expresses-Turkey-solidarity-after-attacks.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/10/11/Saudi-king-expresses-Turkey-solidarity-after-attacks.html


Processing URLs:  53%|█████▎    | 530/1000 [32:17<1:09:45,  8.91s/it]

Error extracting text from http://www.tradingeconomics.com/united-states/car-production: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-states/car-production


Processing URLs:  53%|█████▎    | 533/1000 [32:21<33:32,  4.31s/it]  

Error extracting text from http://www.trtworld.com/mea/au-calls-for-intl-police-to-be-deployed-into-burundi-109870: 404 Client Error: Not Found for url: https://www.trtworld.com:443/mea/au-calls-for-intl-police-to-be-deployed-into-burundi-109870


Processing URLs:  54%|█████▍    | 539/1000 [32:30<13:25,  1.75s/it]

Error extracting text from http://www.fews.net/sites/default/files/documents/reports/YE_OL_2017_06_final.pdf_: 404 Client Error: Not Found for url: https://fews.net:443/sites/default/files/documents/reports/YE_OL_2017_06_final.pdf_
Error extracting text from http://www.reuters.com/article/us-usa-afghanistan-kerry-idUSKCN0X60A1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-afghanistan-kerry-idUSKCN0X60A1


Processing URLs:  54%|█████▍    | 540/1000 [32:32<12:26,  1.62s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848974/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848974/


Processing URLs:  54%|█████▍    | 542/1000 [32:36<14:30,  1.90s/it]

Error extracting text from https://www.dailystar.com.lb/News/Middle-East/2015/Nov-19/323774-saudis-to-host-mid-dec-conference-in-bid-to-unify-syria-opposition-al-arabiya-al-hadath-tv.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2015/Nov-19/323774-saudis-to-host-mid-dec-conference-in-bid-to-unify-syria-opposition-al-arabiya-al-hadath-tv.ashx


Processing URLs:  55%|█████▍    | 547/1000 [32:43<14:05,  1.87s/it]

Error extracting text from http://www.theweek.co.uk/eu-referendum/65461/eu-referendum-polls-remarkable-anti-brexit-consensus-among-economists: 404 Client Error: Not Found for url: https://theweek.com/eu-referendum/65461/eu-referendum-polls-remarkable-anti-brexit-consensus-among-economists


Processing URLs:  55%|█████▍    | 549/1000 [32:46<10:13,  1.36s/it]

Error extracting text from http://wthitv.com/2017/01/20/illinois-state-senator-files-sugar-sweetened-beverage-tax-act/: 404 Client Error: Not Found for url: https://www.wthitv.com/2017/01/20/illinois-state-senator-files-sugar-sweetened-beverage-tax-act/
Error extracting text from http://www.reuters.com/article/us-iran-election-energy-analysis-idUSKBN1872FJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-energy-analysis-idUSKBN1872FJ


Processing URLs:  55%|█████▌    | 550/1000 [33:00<39:51,  5.31s/it]

Error extracting text from http://sundiatapost.com/2017/08/04/army-say-20-gunmen-attack-cote-divoire-police-station-steal-weapons/: 404 Client Error: Not Found for url: http://sundiatapost.com/2017/08/04/army-say-20-gunmen-attack-cote-divoire-police-station-steal-weapons/


Processing URLs:  55%|█████▌    | 554/1000 [33:02<11:32,  1.55s/it]

Error extracting text from http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&amp;field-keywords=Andrew+Karam: 503 Server Error: Service Unavailable for url: https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&amp;field-keywords=Andrew+Karam
Error extracting text from https://www.reuters.com/article/us-afghanistan-election/afghanistan-parliament-elections-likely-delayed-until-october-idUSKBN1FO0BZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-election/afghanistan-parliament-elections-likely-delayed-until-october-idUSKBN1FO0BZ


Processing URLs:  56%|█████▌    | 555/1000 [33:03<09:28,  1.28s/it]

Error extracting text from http://thehill.com/policy/finance/261728-gop-leaders-eye-new-playbook-in-shutdown-fight: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/261728-gop-leaders-eye-new-playbook-in-shutdown-fight/


Processing URLs:  56%|█████▌    | 556/1000 [33:03<07:12,  1.03it/s]

Error extracting text from https://www.nytimes.com/2017/02/01/world/asia/ban-ki-moon-president-south-korea.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/01/world/asia/ban-ki-moon-president-south-korea.html


Processing URLs:  56%|█████▌    | 559/1000 [33:07<08:09,  1.11s/it]

Error extracting text from https://www.wsj.com/articles/theranos-founder-elizabeth-holmes-has-over-100-questions-for-jurors-11622217048: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/theranos-founder-elizabeth-holmes-has-over-100-questions-for-jurors-11622217048


Processing URLs:  57%|█████▋    | 569/1000 [33:22<06:57,  1.03it/s]

Error extracting text from http://www.nytimes.com/2016/05/07/world/asia/north-korea-nuclear-us-strategy.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/07/world/asia/north-korea-nuclear-us-strategy.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  57%|█████▋    | 572/1000 [33:26<08:03,  1.13s/it]

Error extracting text from http://sustainablemobility.ei.columbia.edu/files/2012/12/Transforming-Personal-Mobility-Jan-27-20132.pdf: HTTPConnectionPool(host='sustainablemobility.ei.columbia.edu', port=80): Max retries exceeded with url: /files/2012/12/Transforming-Personal-Mobility-Jan-27-20132.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff4f20c0>: Failed to resolve 'sustainablemobility.ei.columbia.edu' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  57%|█████▋    | 573/1000 [33:27<07:35,  1.07s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57478#.Wbg3aTMfk_U: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57478#.Wbg3aTMfk_U
URL filtered: https://www.reuters.com/article/us-israel-netanyahu-police/israeli-police-to-recommend-indicting-netanyahu-over-alleged-bribery-in-two-cases-media-idUSKCN1FX2JD?feedType=RSS&amp;feedName=worldNews&amp;utm_source=Twitter&amp;utm_medium=Social&amp;utm_campaign=Feed%3A+Reuters%2FworldNews+%28Reuters+World+News%29


Processing URLs:  57%|█████▊    | 575/1000 [33:31<10:31,  1.49s/it]

URL filtered: https://www.youtube.com/watch?v=DpZQa1lI0rM


Processing URLs:  58%|█████▊    | 578/1000 [33:35<09:27,  1.35s/it]

Error extracting text from http://www.cubiclane.com/al-nusra-offers-3-5-million-dollars-to-kill-bashar-al-assad-36745/: 403 Client Error: Forbidden for url: https://www.cubiclane.com/al-nusra-offers-3-5-million-dollars-to-kill-bashar-al-assad-36745/


Processing URLs:  58%|█████▊    | 579/1000 [33:36<09:22,  1.34s/it]

Error extracting text from https://www.rferl.org/a/manafort-deripaska-kilimnik-russia-trump-investigation/28751541.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/manafort-deripaska-kilimnik-russia-trump-investigation/28751541.html


Processing URLs:  58%|█████▊    | 581/1000 [33:39<08:43,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-turkey-idUSKBN15B2G1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-turkey-idUSKBN15B2G1


Processing URLs:  59%|█████▉    | 590/1000 [33:51<07:09,  1.05s/it]

Error extracting text from http://www.nytimes.com/2016/05/25/world/iran-assembly-experts-ahmad-jannati.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/25/world/iran-assembly-experts-ahmad-jannati.html


Processing URLs:  59%|█████▉    | 593/1000 [33:58<11:30,  1.70s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-07/streets-of-brazil-may-hold-key-to-rousseff-s-political-future


Processing URLs:  60%|█████▉    | 596/1000 [34:01<08:26,  1.25s/it]

Error extracting text from http://www.ibtimes.com/south-china-sea-conflict-indonesia-new-zealand-australia-malaysia-singapore-britain-2425483: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-conflict-indonesia-new-zealand-australia-malaysia-singapore-britain-2425483


Processing URLs:  60%|██████    | 600/1000 [34:07<08:55,  1.34s/it]

Error extracting text from http://www.imf.org/external/pubs/ft/weo/2013/01/pdf/text.pdf: 403 Client Error: Forbidden for url: http://www.imf.org/external/pubs/ft/weo/2013/01/pdf/text.pdf


Processing URLs:  60%|██████    | 604/1000 [34:13<08:24,  1.27s/it]

Error extracting text from https://www.nytimes.com/2021/07/22/us/la-county-mask-mandate.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/22/us/la-county-mask-mandate.html


Processing URLs:  61%|██████    | 607/1000 [34:19<10:14,  1.56s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-china-idUSKBN16T32E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-china-idUSKBN16T32E


Processing URLs:  61%|██████    | 609/1000 [34:24<13:20,  2.05s/it]

Error extracting text from http://www.nesdis.noaa.gov/DSCOVR/mission.html: 404 Client Error: Not Found for url: https://www.nesdis.noaa.gov/DSCOVR/mission.html


Processing URLs:  61%|██████    | 612/1000 [34:26<06:53,  1.07s/it]

Error extracting text from http://www.aflcio.org/content/download/174946/4160141/file/Colombia+Final_May2016.pdf: 404 Client Error: Not Found for url: https://aflcio.org/content/download/174946/4160141/file/Colombia+Final_May2016.pdf
Error extracting text from http://www.reuters.com/article/brazil-corruption-odebrecht-idUSL1N1AM0XG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/brazil-corruption-odebrecht-idUSL1N1AM0XG


Processing URLs:  61%|██████▏   | 614/1000 [34:27<05:08,  1.25it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/overnights/359465-overnight-cybersecurity-cyber-figures-high-at-hearing-for: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/overnights/359465-overnight-cybersecurity-cyber-figures-high-at-hearing-for/


Processing URLs:  62%|██████▏   | 615/1000 [34:44<37:02,  5.77s/it]

Error extracting text from https://www.thoughtco.com/why-iran-supports-the-syrian-regime-2353082: 406 Client Error: Not Acceptable for url: https://www.thoughtco.com/why-iran-supports-the-syrian-regime-2353082


Processing URLs:  62%|██████▏   | 620/1000 [34:52<11:55,  1.88s/it]

Error extracting text from https://www.nbcnews.com/politics/2016-election/donald-trump-s-rigged-election-claims-raise-historical-alarms-n667831: 403 Client Error: Forbidden for url: https://www.nbcnews.com/politics/2016-election/donald-trump-s-rigged-election-claims-raise-historical-alarms-n667831


Processing URLs:  63%|██████▎   | 627/1000 [35:00<05:44,  1.08it/s]

Error extracting text from http://www.nytimes.com/2009/05/24/weekinreview/24bowley.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2009/05/24/weekinreview/24bowley.html
Error extracting text from http://legalinsurrection.com/2016/08/erdogans-turkey-is-a-major-islamist-hub-says-leaked-german-intel-report/: 403 Client Error: Forbidden for url: http://legalinsurrection.com/2016/08/erdogans-turkey-is-a-major-islamist-hub-says-leaked-german-intel-report/


Processing URLs:  63%|██████▎   | 631/1000 [35:06<07:44,  1.26s/it]

Error extracting text from http://www.nytimes.com/reuters/2016/05/08/world/americas/08reuters-peru-election-poll.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/05/08/world/americas/08reuters-peru-election-poll.html


Processing URLs:  64%|██████▍   | 638/1000 [35:14<05:35,  1.08it/s]

Error extracting text from https://www.france24.com/en/live-news/20210716-france-says-military-intervention-in-haiti-not-on-agenda: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210716-france-says-military-intervention-in-haiti-not-on-agenda
Error extracting text from https://www.reuters.com/article/us-somalia-defections-exclusive/exclusive-somalia-lures-defectors-in-new-push-against-insurgents-idUSKBN1FD0KO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-defections-exclusive/exclusive-somalia-lures-defectors-in-new-push-against-insurgents-idUSKBN1FD0KO


Processing URLs:  64%|██████▍   | 639/1000 [35:15<05:24,  1.11it/s]

Error extracting text from http://www.nbcnews.com/news/us-news/u-s-deploy-special-operations-forces-syria-official-n454506: 403 Client Error: Forbidden for url: http://www.nbcnews.com/news/us-news/u-s-deploy-special-operations-forces-syria-official-n454506


Processing URLs:  64%|██████▍   | 642/1000 [35:20<09:00,  1.51s/it]

Error extracting text from https://www.bostonglobe.com/news/politics/2017/08/30/kremlin-says-president-trump-lawyer-reached-out-about-deal/ZimGWVMkEu1b31UwskH4NL/story.html: 404 Client Error: Not Found for url: https://www.bostonglobe.com/news/politics/2017/08/30/kremlin-says-president-trump-lawyer-reached-out-about-deal/ZimGWVMkEu1b31UwskH4NL/story.html


Processing URLs:  65%|██████▍   | 646/1000 [35:24<06:20,  1.07s/it]

Error extracting text from https://www.canada.ca/content/dam/hc-sc/documents/services/campaigns/27-16-1808-Factsheet-The-Facts-eng-03.pdf: 403 Client Error: Forbidden for url: https://www.canada.ca/content/dam/hc-sc/documents/services/campaigns/27-16-1808-Factsheet-The-Facts-eng-03.pdf


Processing URLs:  66%|██████▌   | 658/1000 [35:44<04:33,  1.25it/s]

Error extracting text from http://www.nbcnews.com/politics/supreme-court/trump-s-supreme-court-pick-how-does-nominee-get-confirmed-n714286: 403 Client Error: Forbidden for url: http://www.nbcnews.com/politics/supreme-court/trump-s-supreme-court-pick-how-does-nominee-get-confirmed-n714286
Error extracting text from http://bigstory.ap.org/article/8f5fa6edd1bc4ffdbc3b30b4b5963f4d/blow-iran-hard-liners-moderates-win-clerical-assembly: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/8f5fa6edd1bc4ffdbc3b30b4b5963f4d/blow-iran-hard-liners-moderates-win-clerical-assembly (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fece3d70>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  66%|██████▋   | 664/1000 [35:50<06:07,  1.09s/it]

Error extracting text from http://amti.csis.org/the-legal-rationale-for-going-inside-12/: 403 Client Error: Forbidden for url: http://amti.csis.org/the-legal-rationale-for-going-inside-12/


Processing URLs:  67%|██████▋   | 671/1000 [36:00<05:46,  1.05s/it]

Error extracting text from http://blogs.barrons.com/techtraderdaily/2015/10/06/apple-pac-crest-still-sees-a-miss-on-iphone-units-in-fyq1/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/techtraderdaily/2015/10/06/apple-pac-crest-still-sees-a-miss-on-iphone-units-in-fyq1/


Processing URLs:  67%|██████▋   | 673/1000 [36:00<03:09,  1.72it/s]

Error extracting text from http://www.sandiegouniontribune.com/news/2016/jan/06/water-use-conservation-drought-jerry-brown/: 403 Client Error: Forbidden for url: https://www.sandiegouniontribune.com/news/2016/jan/06/water-use-conservation-drought-jerry-brown/
Error extracting text from https://www.imf.org/external/np/tr/2014/tr072414.htm: 403 Client Error: Forbidden for url: https://www.imf.org/external/np/tr/2014/tr072414.htm


Processing URLs:  68%|██████▊   | 677/1000 [36:08<07:06,  1.32s/it]

Error extracting text from http://thehill.com/homenews/news/325350-watergate-reporter-ive-been-saying-for-a-while-now-this-is-a-coverup: 403 Client Error: Forbidden for url: https://thehill.com/homenews/news/325350-watergate-reporter-ive-been-saying-for-a-while-now-this-is-a-coverup/


Processing URLs:  68%|██████▊   | 678/1000 [36:09<07:03,  1.31s/it]

Error extracting text from http://www.ibtimes.co.uk/gundremmingen-nuclear-power-plant-bavaria-shut-due-computer-malware-1556893: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/gundremmingen-nuclear-power-plant-bavaria-shut-due-computer-malware-1556893


Processing URLs:  68%|██████▊   | 680/1000 [36:13<08:10,  1.53s/it]

URL filtered: https://twitter.com/RealDonalDrumpf?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  68%|██████▊   | 683/1000 [36:17<08:36,  1.63s/it]

Error extracting text from http://38north.org/2015/08/sohae081915/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  68%|██████▊   | 685/1000 [37:18<1:31:42, 17.47s/it]

Error extracting text from http://ewp.dali.dartmouth.edu/questions/102: HTTPConnectionPool(host='ewp.dali.dartmouth.edu', port=80): Max retries exceeded with url: /questions/102 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303777710>, 'Connection to ewp.dali.dartmouth.edu timed out. (connect timeout=60)'))


Processing URLs:  69%|██████▉   | 692/1000 [37:27<12:44,  2.48s/it]  

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN10P070: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN10P070


Processing URLs:  69%|██████▉   | 693/1000 [37:30<13:13,  2.59s/it]

Processing URLs:  70%|██████▉   | 695/1000 [37:30<07:20,  1.44s/it]

Error extracting text from https://www.wsj.com/articles/chopper-flight-leaves-venezuelans-mystified-1498673746: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chopper-flight-leaves-venezuelans-mystified-1498673746


Processing URLs:  70%|██████▉   | 697/1000 [37:34<08:51,  1.75s/it]

Error extracting text from https://www.reuters.com/article/us-russia-election-navalny/kremlin-critic-navalny-held-then-released-ahead-of-russian-election-idUSKCN1G618K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-election-navalny/kremlin-critic-navalny-held-then-released-ahead-of-russian-election-idUSKCN1G618K


Processing URLs:  70%|███████   | 700/1000 [37:36<06:41,  1.34s/it]

Error extracting text from http://www.themoscowtimes.com/opinion/article/poroshenko-losing-time-as-discontent-grows/529481.html: 500 Server Error: Internal Server Error for url: https://www.themoscowtimes.com/opinion/article/poroshenko-losing-time-as-discontent-grows/529481.html


Processing URLs:  70%|███████   | 705/1000 [37:44<05:50,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-review-leak-analysis-idUSKCN0X708F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-review-leak-analysis-idUSKCN0X708F
URL filtered: https://twitter.com/mkraju/status/1352637296420524032


Processing URLs:  71%|███████   | 707/1000 [37:44<03:47,  1.29it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/356269-frustrated-senators-demand-cyber-war-strategy-from-trump: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/356269-frustrated-senators-demand-cyber-war-strategy-from-trump/


Processing URLs:  71%|███████   | 709/1000 [37:47<04:41,  1.03it/s]

Error extracting text from https://www.reuters.com/business/energy/nord-stream-2-says-fortuna-vessel-working-final-stage-project-2021-08-18/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/nord-stream-2-says-fortuna-vessel-working-final-stage-project-2021-08-18/


Processing URLs:  71%|███████▏  | 713/1000 [37:50<02:41,  1.77it/s]

Error extracting text from https://www.yahoo.com/news/congo-leader-claims-interference-amid-vote-delay-tension-163807584.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/congo-leader-claims-interference-amid-vote-delay-tension-163807584.html
Error extracting text from https://www.reuters.com/article/us-iran-nuclear-usa-idUSKCN1B22JZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-idUSKCN1B22JZ


Processing URLs:  72%|███████▏  | 717/1000 [37:55<05:17,  1.12s/it]

URL filtered: https://twitter.com/KatiaPorzo/status/928042863841173504


Processing URLs:  72%|███████▏  | 722/1000 [38:10<14:59,  3.24s/it]

Error extracting text from http://www.moonexpress.com/about-us/: 404 Client Error: Not Found for url: https://moonexpress.com/about-us/


Processing URLs:  73%|███████▎  | 728/1000 [38:27<09:57,  2.20s/it]

Error extracting text from http://knowledgecenter.csg.org/kc/content/soda-taxes-2014: HTTPConnectionPool(host='knowledgecenter.csg.org', port=80): Max retries exceeded with url: /kc/content/soda-taxes-2014 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30082f8c0>: Failed to resolve 'knowledgecenter.csg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  73%|███████▎  | 729/1000 [39:27<1:28:00, 19.49s/it]

Error extracting text from http://www.newsobserver.com/news/local/coal-ash-issue/article79925257.html: HTTPConnectionPool(host='www.newsobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  73%|███████▎  | 730/1000 [39:33<1:08:42, 15.27s/it]

Error extracting text from http://greatsouthernroute.com/weather-routing/east-asian-weather-conditions/: 404 Client Error: Not Found for url: https://greatsouthernroute.com/weather-routing/east-asian-weather-conditions/


Processing URLs:  73%|███████▎  | 732/1000 [39:36<36:57,  8.27s/it]  

Error extracting text from http://www.defensenews.com/story/defense/international/asia-pacific/2016/08/02/south-china-sea-japan-islands/87949994/: 404 Client Error: Not Found for url: https://www.defensenews.com/story/defense/international/asia-pacific/2016/08/02/south-china-sea-japan-islands/87949994/


Processing URLs:  73%|███████▎  | 734/1000 [39:40<22:37,  5.10s/it]

Error extracting text from http://www.pressofatlanticcity.com/news/ap/kerry-to-meet-iranian-fm-ahead-of-imminent-sanctions-relief/article_9a603dd2-640a-53c0-9337-fce25c6aa7fd.html: 404 Client Error: Not Found for url: https://pressofatlanticcity.com/news/ap/kerry-to-meet-iranian-fm-ahead-of-imminent-sanctions-relief/article_9a603dd2-640a-53c0-9337-fce25c6aa7fd.html


Processing URLs:  74%|███████▎  | 735/1000 [39:41<17:29,  3.96s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/brazils-rousseff-tells-senate-impeachment-farce-40388206: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/brazils-rousseff-tells-senate-impeachment-farce-40388206


Processing URLs:  74%|███████▎  | 737/1000 [39:42<09:54,  2.26s/it]

Error extracting text from http://www.wsj.com/articles/talks-on-overhauling-u-k-s-relationship-with-eu-could-extend-beyond-february-1454332467: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/talks-on-overhauling-u-k-s-relationship-with-eu-could-extend-beyond-february-1454332467


Processing URLs:  74%|███████▍  | 740/1000 [39:45<05:25,  1.25s/it]

URL filtered: https://twitter.com/AP/status/1508613375105933320


Processing URLs:  74%|███████▍  | 743/1000 [39:47<04:19,  1.01s/it]

Error extracting text from http://www.reuters.com/article/2015/11/19/us-russia-china-jets-idUSKCN0T80K220151119: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/19/us-russia-china-jets-idUSKCN0T80K220151119


Processing URLs:  74%|███████▍  | 745/1000 [40:51<1:14:40, 17.57s/it]

Error extracting text from http://www.airbusgroup.com/int/en/news-media/corporate-magazine/Forum-88/My-Kind-Of-Flyover.html: HTTPConnectionPool(host='www.airbusgroup.com', port=80): Max retries exceeded with url: /int/en/news-media/corporate-magazine/Forum-88/My-Kind-Of-Flyover.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30082f890>, 'Connection to www.airbusgroup.com timed out. (connect timeout=60)'))


Processing URLs:  75%|███████▍  | 749/1000 [40:55<23:02,  5.51s/it]  

Error extracting text from http://fuelfix.com/blog/2016/11/14/iran-claims-oil-production-surged-month-before-key-opec-decision/: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2016/11/14/iran-claims-oil-production-surged-month-before-key-opec-decision/


Processing URLs:  75%|███████▌  | 751/1000 [40:58<13:31,  3.26s/it]

Error extracting text from http://www.wsj.com/articles/janet-yellen-says-fed-interest-rate-increase-still-likely-this-year-1443128438: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/janet-yellen-says-fed-interest-rate-increase-still-likely-this-year-1443128438


Processing URLs:  75%|███████▌  | 754/1000 [41:02<08:02,  1.96s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-01/11/c_134998808.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-01/11/c_134998808.htm


Processing URLs:  76%|███████▌  | 756/1000 [41:03<04:59,  1.23s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-chemical-weapons-idUSKBN14X1XY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-chemical-weapons-idUSKBN14X1XY


Processing URLs:  76%|███████▌  | 758/1000 [41:04<03:13,  1.25it/s]

Error extracting text from http://www.nytimes.com/2015/10/13/world/middleeast/jason-rezaian-washington-post-conviction-iran.html?emc=edit_th_20151013&amp;nl=todaysheadlines&amp;nlid=45205797: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/13/world/middleeast/jason-rezaian-washington-post-conviction-iran.html?emc=edit_th_20151013&amp;nl=todaysheadlines&amp;nlid=45205797


Processing URLs:  76%|███████▌  | 760/1000 [41:11<09:28,  2.37s/it]

URL filtered: https://www.youtube.com/watch?v=LtHzyDZAp7Y


Processing URLs:  76%|███████▋  | 764/1000 [41:15<05:45,  1.47s/it]

Error extracting text from http://www.newsweek.com/where-do-clinton-and-trump-stand-russia-487777: 403 Client Error: Forbidden for url: https://www.newsweek.com/where-do-clinton-and-trump-stand-russia-487777


Processing URLs:  77%|███████▋  | 767/1000 [41:20<05:51,  1.51s/it]

Error extracting text from http://www.reuters.com/article/safrica-budget-moodys-idUSL8N163505: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/safrica-budget-moodys-idUSL8N163505


Processing URLs:  77%|███████▋  | 768/1000 [41:22<06:03,  1.57s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/hong-kong-leader-cy-leung-says-will-not-run-again/3355058.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/hong-kong-leader-cy-leung-says-will-not-run-again/3355058.html


Processing URLs:  77%|███████▋  | 769/1000 [41:23<05:03,  1.31s/it]

Error extracting text from http://aranews.net/2016/02/islamic-state-militants-shoot-down-french-drone-aircraft-north-syria/: 404 Client Error: Not Found for url: http://aranews.net/2016/02/islamic-state-militants-shoot-down-french-drone-aircraft-north-syria/


Processing URLs:  77%|███████▋  | 771/1000 [41:24<03:56,  1.03s/it]

Error extracting text from http://gcaptain.com/malaysia-100-chinese-boats-in-malaysian-waters-in-south-china-sea/: 403 Client Error: Forbidden for url: http://gcaptain.com/malaysia-100-chinese-boats-in-malaysian-waters-in-south-china-sea/


Processing URLs:  78%|███████▊  | 779/1000 [41:38<05:08,  1.39s/it]

Error extracting text from http://news.usni.org/2015/08/07/report-chinese-navy-warship-likely-rammed-two-vietnamese-fishing-vessels: 403 Client Error: Forbidden for url: http://news.usni.org/2015/08/07/report-chinese-navy-warship-likely-rammed-two-vietnamese-fishing-vessels


Processing URLs:  78%|███████▊  | 780/1000 [41:39<05:31,  1.51s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-proportions.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-proportions.html
URL filtered: http://www.bloomberg.com/news/articles/2016-07-04/china-sentences-ex-presidential-aide-to-life-for-corruption


Processing URLs:  79%|███████▊  | 786/1000 [42:55<1:08:06, 19.10s/it]

Error extracting text from http://m.landlinemag.com/Story.aspx?StoryID=31957#.V-BPHpA8KrU: HTTPConnectionPool(host='m.landlinemag.com', port=80): Max retries exceeded with url: /Story.aspx?StoryID=31957 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303776c90>, 'Connection to m.landlinemag.com timed out. (connect timeout=60)'))


Processing URLs:  79%|███████▉  | 788/1000 [42:58<36:48, 10.42s/it]  

Error extracting text from https://shar.es/1YoYPd: 404 Client Error: Not Found for url: https://shar.es/1YoYPd/


Processing URLs:  79%|███████▉  | 789/1000 [42:59<26:11,  7.45s/it]

URL filtered: http://thehill.com/policy/cybersecurity/356066-popular-twitter-account-claiming-to-belong-to-tennessee-gop-was-run-by


Processing URLs:  79%|███████▉  | 791/1000 [43:02<16:59,  4.88s/it]

Error extracting text from http://38north.org/2017/05/jschilling052417-2/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  79%|███████▉  | 792/1000 [43:03<13:32,  3.91s/it]

Error extracting text from https://finance.yahoo.com/news/japanese-yen-safe-haven-currency-174531285.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/japanese-yen-safe-haven-currency-174531285.html


Processing URLs:  79%|███████▉  | 794/1000 [43:04<08:03,  2.35s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN16R1H4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN16R1H4


Processing URLs:  80%|████████  | 802/1000 [43:27<09:13,  2.80s/it]

Error extracting text from http://in.reuters.com/article/ukraine-imf-idINKCN0Z12LQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  80%|████████  | 803/1000 [43:29<08:12,  2.50s/it]

Error extracting text from https://www.reuters.com/article/mideast-stocks/mideast-stocks-saudi-shares-fall-after-gasoline-price-hike-idUSL8N1OW0C8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-stocks/mideast-stocks-saudi-shares-fall-after-gasoline-price-hike-idUSL8N1OW0C8


Processing URLs:  81%|████████  | 806/1000 [43:30<04:02,  1.25s/it]

Error extracting text from http://seekingalpha.com/article/3995425-tesla-plays-car-owners-shareholders-suckers-cap-ex-slash?lift_email_rec=true: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3995425-tesla-plays-car-owners-shareholders-suckers-cap-ex-slash?lift_email_rec=true


Processing URLs:  81%|████████▏ | 814/1000 [43:39<03:20,  1.08s/it]

Error extracting text from http://www.wsj.com/articles/reports-of-offshore-companies-raise-questions-about-beijings-graft-fight-1459869258: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/reports-of-offshore-companies-raise-questions-about-beijings-graft-fight-1459869258


Processing URLs:  82%|████████▏ | 817/1000 [43:47<05:56,  1.95s/it]

Error extracting text from http://www.nytimes.com/1999/11/16/us/first-direct-observation-of-an-extrasolar-planet.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1999/11/16/us/first-direct-observation-of-an-extrasolar-planet.html


Processing URLs:  82%|████████▏ | 820/1000 [43:51<05:05,  1.70s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2017/01/19/0301000000AEN20170119008651315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  82%|████████▏ | 821/1000 [43:52<03:50,  1.29s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2553.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2553.htm


Processing URLs:  83%|████████▎ | 827/1000 [44:01<04:58,  1.72s/it]

Error extracting text from https://tradingeconomics.com/commodity/steel?user=Analyst42718: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/steel?user=Analyst42718


Processing URLs:  83%|████████▎ | 828/1000 [44:02<03:50,  1.34s/it]

URL filtered: https://www.youtube.com/watch?v=JwhyoUyJNoY


Processing URLs:  83%|████████▎ | 832/1000 [44:05<02:44,  1.02it/s]



Processing URLs:  84%|████████▍ | 838/1000 [44:23<04:04,  1.51s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN17Q023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN17Q023
Error extracting text from http://www.basnews.com/index.php/en/news/iraq/310981: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/310981
URL filtered: https://twitter.com/INechepurenko/status/796321692863909888?ref_src=twsrc%5Etfw


Processing URLs:  84%|████████▍ | 842/1000 [44:27<03:20,  1.27s/it]

Error extracting text from http://icasualties.org/IRAQ/index.aspx: 404 Client Error: Not Found for url: http://icasualties.org/IRAQ/index.aspx


Processing URLs:  84%|████████▍ | 845/1000 [44:33<04:46,  1.85s/it]

Error extracting text from http://www.cesim.fr/observatoire/eng/86/article/182: HTTPSConnectionPool(host='www.cesim.fr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  85%|████████▍ | 846/1000 [44:34<04:20,  1.69s/it]

Error extracting text from https://www.daserste.de/information/reportage-dokumentation/dokus/sendung/das-humboldt-forum-ein-schloss-fuer-berlin-und-die-welt-100.html: 404 Client Error: Not Found for url: https://www.daserste.de/information/reportage-dokumentation/dokus/sendung/das-humboldt-forum-ein-schloss-fuer-berlin-und-die-welt-100.html


Processing URLs:  85%|████████▍ | 847/1000 [44:37<04:53,  1.92s/it]

Error extracting text from http://www.naturalgaseurope.com/iran-says-will-reclaim-full-oil-market-share-post-sanctions-29706: HTTPSConnectionPool(host='www.naturalgaseurope.com', port=443): Max retries exceeded with url: /iran-says-will-reclaim-full-oil-market-share-post-sanctions-29706 (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.naturalgaseurope.com'. (_ssl.c:1000)")))


Processing URLs:  85%|████████▌ | 850/1000 [44:40<03:17,  1.31s/it]

Error extracting text from http://www.nytimes.com/2016/08/20/world/middleeast/russia-syria-mediterranean-missiles.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/20/world/middleeast/russia-syria-mediterranean-missiles.html
URL filtered: https://en.wikipedia.org/wiki/Facebook


Processing URLs:  86%|████████▌ | 861/1000 [44:54<02:08,  1.08it/s]

Error extracting text from https://www.nytimes.com/2021/12/03/podcasts/transcript-ezra-klein-podcast-philip-tetlock.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/12/03/podcasts/transcript-ezra-klein-podcast-philip-tetlock.html
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-spratlys-idUSKBN16Z005: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-spratlys-idUSKBN16Z005


Processing URLs:  86%|████████▌ | 862/1000 [44:54<01:34,  1.46it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-mosul-exclusive-idUSKCN12314Z?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-mosul-exclusive-idUSKCN12314Z?il=0


Processing URLs:  87%|████████▋ | 866/1000 [45:09<06:46,  3.04s/it]

Error extracting text from http://www.nytimes.com/2016/03/09/world/asia/south-china-sea-militarization.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/09/world/asia/south-china-sea-militarization.html


Processing URLs:  87%|████████▋ | 870/1000 [45:27<08:06,  3.74s/it]

URL filtered: https://twitter.com/navalny/status/966291236620562434


Processing URLs:  87%|████████▋ | 872/1000 [45:28<04:37,  2.17s/it]

Error extracting text from http://www.panorama.com.ve/politicayeconomia/Presidente-Maduro-decreto-el-refinanciamiento-de-la-deuda-externa-20171102-0081.html: HTTPConnectionPool(host='www.panorama.com.ve', port=80): Max retries exceeded with url: /politicayeconomia/Presidente-Maduro-decreto-el-refinanciamiento-de-la-deuda-externa-20171102-0081.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303774e30>: Failed to resolve 'www.panorama.com.ve' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  87%|████████▋ | 873/1000 [45:29<03:56,  1.86s/it]

Error extracting text from https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=en&amp;reference=2020/0381(NLE): 404 Client Error: Not Found for url: https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=en&amp;reference=2020/0381(NLE)


Processing URLs:  88%|████████▊ | 878/1000 [45:41<05:26,  2.68s/it]

Error extracting text from https://www.cms.gov/files/document/cms-omnibus-covid-19-health-care-staff-vaccination-requirements-2021.pdf: 403 Client Error: Forbidden for url: https://www.cms.gov/files/document/cms-omnibus-covid-19-health-care-staff-vaccination-requirements-2021.pdf


Processing URLs:  88%|████████▊ | 879/1000 [45:43<04:45,  2.36s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-05-16/china-s-land-grab-in-bhutan-is-the-new-face-of-war


Processing URLs:  88%|████████▊ | 882/1000 [45:47<03:53,  1.98s/it]

Error extracting text from http://www.monster.com/jobs/c-atieva.aspx: 404 Client Error: Not Found for url: https://www.monster.com/jobs/c-atieva


Processing URLs:  88%|████████▊ | 883/1000 [45:48<03:02,  1.56s/it]

Error extracting text from https://www.rand.org/blog/2020/12/taiwan-may-be-more-at-ease-with-the-biden-administration.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2020/12/taiwan-may-be-more-at-ease-with-the-biden-administration.html


Processing URLs:  89%|████████▊ | 886/1000 [45:51<02:29,  1.32s/it]

Error extracting text from http://www.ibtimes.co.uk/eu-referendum-lib-dem-leader-tim-farron-warns-pressures-nhs-after-brexit-1543112: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/eu-referendum-lib-dem-leader-tim-farron-warns-pressures-nhs-after-brexit-1543112


Processing URLs:  89%|████████▊ | 887/1000 [45:52<02:33,  1.36s/it]

Error extracting text from https://www.enca.com/south-africa/anc-nec-called-to-resign: 404 Client Error: Not Found for url: https://www.enca.com/south-africa/anc-nec-called-to-resign
URL filtered: https://twitter.com/c_drosten/status/1380058204927885314


Processing URLs:  89%|████████▉ | 890/1000 [46:03<04:38,  2.53s/it]

Error extracting text from http://hprc-online.org/mind-body/files/chandler-motion-sickness: 404 Client Error: Not Found for url: https://www.hprc-online.org/mind-body/files/chandler-motion-sickness


Processing URLs:  89%|████████▉ | 893/1000 [46:06<03:05,  1.73s/it]

Error extracting text from http://www.universetoday.com/128815/spacex-maiden-falcon-heavy-launch-may-carry-satellite-in-november/: 503 Server Error: Service Unavailable for url: https://www.universetoday.com/128815/spacex-maiden-falcon-heavy-launch-may-carry-satellite-in-november/


Processing URLs:  89%|████████▉ | 894/1000 [46:07<02:31,  1.43s/it]

Error extracting text from http://www.france24.com/en/20170123-syria-rebel-groups-refuse-face-face-meeting-astana-peace-talks: 403 Client Error: Forbidden for url: http://www.france24.com/en/20170123-syria-rebel-groups-refuse-face-face-meeting-astana-peace-talks
URL filtered: http://www.bloomberg.com/news/articles/2015-10-13/fed-s-tarullo-says-he-doesn-t-currently-back-a-2015-rate-rise


Processing URLs:  90%|████████▉ | 897/1000 [46:08<01:23,  1.24it/s]

URL filtered: https://twitter.com/schoeneggerphil


Processing URLs:  90%|████████▉ | 899/1000 [47:08<18:16, 10.85s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-08-02/south-china-sea-trespassers-beware-china-warns: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  90%|█████████ | 904/1000 [47:12<05:35,  3.49s/it]

Error extracting text from https://www.nbcnews.com/politics/white-house/joe-manchin-comes-out-against-neera-tanden-biden-s-omb-n1258387: 403 Client Error: Forbidden for url: https://www.nbcnews.com/politics/white-house/joe-manchin-comes-out-against-neera-tanden-biden-s-omb-n1258387
Error extracting text from https://www.reuters.com/article/us-olympics-2022-canada-idUSKBN2AH02U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-olympics-2022-canada-idUSKBN2AH02U


Processing URLs:  91%|█████████ | 906/1000 [47:13<03:41,  2.35s/it]

Error extracting text from https://t.co/XiV49uKhXz: 400 Client Error: Bad Request for url: https://twitter.com/JulianRoepcke/status/736098029476798465/photo/1


Processing URLs:  91%|█████████ | 907/1000 [47:15<03:24,  2.20s/it]

Error extracting text from http://www.reuters.com/article/us-belgium-blast-nuclear-idUSKCN0WS09E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-belgium-blast-nuclear-idUSKCN0WS09E


Processing URLs:  91%|█████████ | 909/1000 [47:22<03:56,  2.60s/it]

URL filtered: http://nymag.com/selectall/2017/10/did-russias-facebook-ads-actually-swing-the-election.html


Processing URLs:  91%|█████████▏| 914/1000 [47:30<02:59,  2.09s/it]

URL filtered: https://en.wikipedia.org/wiki/Censorship_of_YouTube#.C2.A0Iran


Processing URLs:  92%|█████████▏| 917/1000 [47:37<03:13,  2.33s/it]

Error extracting text from http://www.newstrib.com/free/senate-to-start-voting-on-package-to-help-end-budget/article_ddd1c6d6-e30a-11e6-b343-97bdfa6c4e5d.html: 404 Client Error: Not Found for url: https://www.shawlocal.com/news-tribune/free/senate-to-start-voting-on-package-to-help-end-budget/article_ddd1c6d6-e30a-11e6-b343-97bdfa6c4e5d.html
Error extracting text from http://bigstory.ap.org/article/cc6c0d7c92f8438698246f30bc61c3fa/kerry-says-russia-syria-should-face-war-crimes-probe: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/cc6c0d7c92f8438698246f30bc61c3fa/kerry-says-russia-syria-should-face-war-crimes-probe (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301ecc500>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  92%|█████████▏| 921/1000 [47:42<02:03,  1.56s/it]

Error extracting text from http://www.rokdrop.net/2015/09/north-korea-has-not-notified-international-authorities-about-rocket-launch/: 406 Client Error: Not Acceptable for url: http://www.rokdrop.net/2015/09/north-korea-has-not-notified-international-authorities-about-rocket-launch/


Processing URLs:  92%|█████████▏| 922/1000 [47:42<01:35,  1.23s/it]

Error extracting text from https://www.nytimes.com/2021/03/01/world/europe/navalny-prison-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/01/world/europe/navalny-prison-russia.html


Processing URLs:  93%|█████████▎| 926/1000 [47:50<02:15,  1.83s/it]

Error extracting text from http://www.nationalreview.com/article/436223/kelly-ayotte-nh-senate-race-donald-trump-complicates-reelection-chances: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/436223/kelly-ayotte-nh-senate-race-donald-trump-complicates-reelection-chances/


Processing URLs:  93%|█████████▎| 929/1000 [47:54<01:38,  1.38s/it]

Error extracting text from http://thehill.com/homenews/administration/366148-trump-signs-tax-bill-into-law: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/366148-trump-signs-tax-bill-into-law/
Error extracting text from https://www.reuters.com/world/europe/wife-kremlin-critic-navalny-delivers-borsch-cherries-3-day-prison-visit-2021-08-05/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/wife-kremlin-critic-navalny-delivers-borsch-cherries-3-day-prison-visit-2021-08-05/


Processing URLs:  93%|█████████▎| 933/1000 [48:03<01:49,  1.64s/it]

Error extracting text from https://www.confidencial.com.ni/english/daniel-ortega-breaches-transparency-agreements-on-covid-19-with-imf-wb-and-idb/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/english/daniel-ortega-breaches-transparency-agreements-on-covid-19-with-imf-wb-and-idb/


Processing URLs:  94%|█████████▎| 936/1000 [48:06<01:15,  1.18s/it]

Error extracting text from http://www.reuters.com/article/2015/10/02/brazil-rousseff-idUSL1N12216720151002: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/02/brazil-rousseff-idUSL1N12216720151002


Processing URLs:  94%|█████████▍| 938/1000 [48:10<01:29,  1.44s/it]

URL filtered: http://www.reuters.com/article/us-iran-nuclear-usa-sanctions-idUSKBN13A2WZ?feedType=RSS&amp;feedName=Iran&amp;virtualBrandChannel=10209&amp;utm_source=dlvr.it&amp;utm_medium=twitter


Processing URLs:  94%|█████████▍| 940/1000 [48:10<00:57,  1.04it/s]

Error extracting text from https://www.nytimes.com/2017/07/23/world/europe/trump-putin-sanctions-hacking.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/23/world/europe/trump-putin-sanctions-hacking.html
Error extracting text from http://www.nytimes.com/aponline/2016/05/06/world/europe/ap-eu-montenegro-nato.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/05/06/world/europe/ap-eu-montenegro-nato.html


Processing URLs:  95%|█████████▌| 952/1000 [48:27<01:10,  1.46s/it]

Error extracting text from https://cvppindia.com/tendersoft/: 404 Client Error: Not Found for url: https://www.cvppindia.com/tendersoft/


Processing URLs:  96%|█████████▌| 956/1000 [48:37<01:23,  1.89s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-02-19/u-s-said-to-anticipate-mosul-offensive-with-20-000-iraqi-troops


Processing URLs:  96%|█████████▌| 958/1000 [48:37<00:45,  1.07s/it]

Error extracting text from https://www.opensecrets.org/races/summary.php?cycle=2016&amp;id=PAS1: 403 Client Error: Forbidden for url: https://www.opensecrets.org/races/summary.php?cycle=2016&amp;id=PAS1


Processing URLs:  96%|█████████▌| 960/1000 [48:43<01:17,  1.93s/it]

Error extracting text from http://www.americansunitedforchange.org/page/-/WisconsinResults616.pdf: 404 Client Error: Not Found for url: http://www.americansunitedforchange.org/page/-/WisconsinResults616.pdf


Processing URLs:  96%|█████████▌| 961/1000 [48:46<01:23,  2.14s/it]

Error extracting text from http://38north.org/category/01-wmd/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  96%|█████████▌| 962/1000 [48:47<01:09,  1.83s/it]

Error extracting text from http://www.hitc.com/en-gb/2016/07/18/nicola-sturgeon-says-she-would-consider-2017-scottish-referendum/: 410 Client Error: Gone for url: https://www.hitc.com/en-gb/business/


Processing URLs:  96%|█████████▋| 965/1000 [48:51<00:50,  1.46s/it]

Error extracting text from http://www.newsweek.com/2016/09/02/saudia-arabia-iran-cold-war-492880.html: 403 Client Error: Forbidden for url: https://www.newsweek.com/2016/09/02/saudia-arabia-iran-cold-war-492880.html


Processing URLs:  97%|█████████▋| 966/1000 [48:51<00:37,  1.11s/it]

Error extracting text from https://www.scientificamerican.com/article/new-hominin-species/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/new-hominin-species/


Processing URLs:  97%|█████████▋| 968/1000 [48:52<00:25,  1.25it/s]

Error extracting text from http://www.fao.org/news/story/en/item/1325054/icode/: 404 Client Error: Not Found for url: https://www.fao.org/news/story/en/item/1325054/icode/
Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement


Processing URLs:  97%|█████████▋| 969/1000 [48:52<00:18,  1.65it/s]

Error extracting text from https://www.congress.gov/search?pageSort=latestAction:asc&amp;q={%22source%22:%22legislation%22,%22type%22:%22bills%22,%22bill-status%22:%22law%22,%22congress%22:[%22107%22]}&amp;searchResultViewType=expanded: 403 Client Error: Forbidden for url: https://www.congress.gov/search?pageSort=latestAction:asc&amp;q=%7B%22source%22:%22legislation%22,%22type%22:%22bills%22,%22bill-status%22:%22law%22,%22congress%22:%5B%22107%22%5D%7D&amp;searchResultViewType=expanded


Processing URLs:  98%|█████████▊| 975/1000 [49:00<00:22,  1.09it/s]

Error extracting text from https://www.thelocal.de/20220401/german-consumers-to-be-hit-by-further-price-hikes-in-supermarkets/: 403 Client Error: Forbidden for url: https://www.thelocal.de/20220401/german-consumers-to-be-hit-by-further-price-hikes-in-supermarkets


Processing URLs:  98%|█████████▊| 978/1000 [49:05<00:29,  1.35s/it]

Error extracting text from https://www.reuters.com/world/uk/uk-seems-set-invoke-emergency-measures-nireland-trade-irish-minister-2021-11-07/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/uk-seems-set-invoke-emergency-measures-nireland-trade-irish-minister-2021-11-07/


Processing URLs:  99%|█████████▊| 986/1000 [49:14<00:16,  1.21s/it]

Error extracting text from https://www.fcc.gov/document/fcc-proposes-ending-utility-style-regulation-internet/clyburn-statement: 403 Client Error: Forbidden for url: https://www.fcc.gov/document/fcc-proposes-ending-utility-style-regulation-internet/clyburn-statement


Processing URLs:  99%|█████████▉| 988/1000 [49:20<00:25,  2.11s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-23/saudi-arabia-2-trillion-aramco-vision-runs-into-market-reality


Processing URLs:  99%|█████████▉| 994/1000 [49:26<00:07,  1.19s/it]

Error extracting text from http://thehill.com/blogs/floor-action/senate/258629-senate-approves-budget-deal-in-overnight-vote: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/258629-senate-approves-budget-deal-in-overnight-vote/
Error extracting text from http://www.reuters.com/article/us-usa-cyber-obama-idUSKCN0XA2O1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-obama-idUSKCN0XA2O1


Processing URLs: 100%|█████████▉| 996/1000 [49:27<00:03,  1.00it/s]

Error extracting text from http://thehill.com/homenews/campaign/362111-doug-jones-running-10-times-as-many-tv-ads-as-roy-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/362111-doug-jones-running-10-times-as-many-tv-ads-as-roy-moore/


Processing URLs: 100%|█████████▉| 997/1000 [49:28<00:02,  1.26it/s]

Error extracting text from http://www.wsj.com/articles/an-old-leader-faces-new-threats-in-zimbabwe-1471820903: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/an-old-leader-faces-new-threats-in-zimbabwe-1471820903


Processing URLs: 100%|██████████| 1000/1000 [49:35<00:00,  2.98s/it]
Processing URLs:   1%|          | 6/1000 [00:11<36:07,  2.18s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-02/traders-don-t-see-fed-moving-until-at-least-march-futures-show


Processing URLs:   1%|          | 10/1000 [00:18<28:11,  1.71s/it]

Error extracting text from http://www.nycaribnews.com/latest-news/us-legislators-want-elections-be-held-haiti-scheduled: 404 Client Error: Not Found for url: https://nycaribnews.com/latest-news/us-legislators-want-elections-be-held-haiti-scheduled
Error extracting text from https://www.rottentomatoes.com/m/belfast: 403 Client Error: Forbidden for url: https://www.rottentomatoes.com/m/belfast


Processing URLs:   2%|▏         | 15/1000 [00:22<14:36,  1.12it/s]

Error extracting text from https://apple.news/A5WznCDEiNOKFFXceVSsz7A: 404 Client Error: Not Found for url: https://apple.news/A5WznCDEiNOKFFXceVSsz7A


Processing URLs:   2%|▏         | 18/1000 [00:27<24:10,  1.48s/it]

Error extracting text from http://soufangroup.com/tsg-intelbrief-irans-expanding-strategic-reach/: 404 Client Error: Not Found for url: https://www.soufangroup.com/tsg-intelbrief-irans-expanding-strategic-reach/


Processing URLs:   2%|▏         | 20/1000 [00:30<23:04,  1.41s/it]

Error extracting text from https://journals.sagepub.com/doi/abs/10.1177/0959683614557574: 403 Client Error: Forbidden for url: https://journals.sagepub.com/doi/abs/10.1177/0959683614557574
URL filtered: http://www.bloomberg.com/news/articles/2016-02-08/emissions-lies-haunt-vw-as-it-faces-tougher-justice-department


Processing URLs:   2%|▏         | 24/1000 [00:37<31:00,  1.91s/it]

Error extracting text from http://atimes.com/2016/08/turkey-russia-take-more-steps-for-reconciliation-ahead-of-erdogan-putin-meeting/: 404 Client Error: Not Found for url: https://atimes.com/2016/08/turkey-russia-take-more-steps-for-reconciliation-ahead-of-erdogan-putin-meeting/


Processing URLs:   3%|▎         | 27/1000 [00:40<21:33,  1.33s/it]

Error extracting text from http://uk.reuters.com/article/uk-poland-politics-eu-idUKKCN105152: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   3%|▎         | 33/1000 [00:49<21:50,  1.36s/it]

Error extracting text from http://www.gov.me/en/News/159368/reception-to-mark-the-beginning-of-the-process-of-Montenegro-s-accession-to-NATO.html: 404 Client Error: not found for url: https://www.gov.me/en/News/159368/reception-to-mark-the-beginning-of-the-process-of-Montenegro-s-accession-to-NATO.html


Processing URLs:   4%|▎         | 35/1000 [00:55<33:57,  2.11s/it]

Error extracting text from http://en.trend.az/iran/politics/2438747.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2438747.html


Processing URLs:   4%|▎         | 36/1000 [00:58<34:13,  2.13s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-september-14-2015: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-september-14-2015


Processing URLs:   4%|▍         | 38/1000 [00:59<21:01,  1.31s/it]

Error extracting text from https://www.middleeastmonitor.com/20160903-turkey-to-normalise-ties-with-syria-and-egypt/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20160903-turkey-to-normalise-ties-with-syria-and-egypt/


Processing URLs:   4%|▍         | 39/1000 [00:59<15:49,  1.01it/s]

Error extracting text from https://www.nytimes.com/2017/03/21/climate/trump-climate-change.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/21/climate/trump-climate-change.html?_r=0
URL filtered: https://www.bloomberg.com/news/articles/2018-01-18/u-s-said-to-be-losing-patience-as-key-nafta-issues-unresolved


Processing URLs:   5%|▍         | 48/1000 [02:06<3:14:49, 12.28s/it]

Error extracting text from http://www.miamiherald.com/news/local/news-columns-blogs/andres-oppenheimer/article79861802.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.foxnews.com/politics/2017/01/02/spicer-trump-has-own-intel-obama-administration-reports-not-gospel.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/politics/2017/01/02/spicer-trump-has-own-intel-obama-administration-reports-not-gospel.html
Error extracting text from http://iraq.usembassy.gov/022816.htm: HTTPConnectionPool(host='iraq.usembassy.gov', port=80): Max retries exceeded with url: /022816.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300c90b00>: Failed to resolve 'iraq.usembassy.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 50/1000 [02:08<1:56:42,  7.37s/it]

Error extracting text from http://www.nato.int/cps/ar/natolive/topics_49212.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/ar/natolive/topics_49212.htm


Processing URLs:   5%|▌         | 52/1000 [02:09<1:13:27,  4.65s/it]

Error extracting text from http://aranews.net/2016/03/war-isis-intensifies-kurdish-peshmerga-forces-receive-first-shipment-u-s-weapons/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/war-isis-intensifies-kurdish-peshmerga-forces-receive-first-shipment-u-s-weapons/


Processing URLs:   5%|▌         | 53/1000 [02:10<1:01:59,  3.93s/it]

Error extracting text from http://vision2030.gov.sa/sites/default/files/NTP_En.pdf: 404 Client Error: Not Found for url: https://www.vision2030.gov.sa/sites/default/files/ntp_en.pdf


Processing URLs:   6%|▌         | 55/1000 [02:12<40:35,  2.58s/it]  

Error extracting text from http://www.reuters.com/article/us-europe-migrants-schengen-idUSKCN0XV174: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-schengen-idUSKCN0XV174


Processing URLs:   6%|▋         | 64/1000 [02:27<17:37,  1.13s/it]

Error extracting text from http://debate-central.ncpa.org/wp-content/uploads/2016/01/January-2016-PF-topic-analysis.pdf: HTTPConnectionPool(host='debate-central.ncpa.org', port=80): Max retries exceeded with url: /wp-content/uploads/2016/01/January-2016-PF-topic-analysis.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffa505f0>: Failed to resolve 'debate-central.ncpa.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-usa-election-saudi-iran-idUSKBN1361SS?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-saudi-iran-idUSKBN1361SS?il=0
URL filtered: https://www.youtube.com/watch?v=Ednq_vKPdQE


Processing URLs:   7%|▋         | 71/1000 [02:33<14:01,  1.10it/s]

Error extracting text from http://townhall.com/tipsheet/katiepavlich/2016/04/13/scott-walker-literally-laughs-out-loud-over-trump-vp-suggestion-n2147620: 403 Client Error: Forbidden for url: https://townhall.com/tipsheet/katiepavlich/2016/04/13/scott-walker-literally-laughs-out-loud-over-trump-vp-suggestion-n2147620


Processing URLs:   7%|▋         | 72/1000 [02:33<11:21,  1.36it/s]

Error extracting text from http://budget.house.gov/budgetprocess/budgettimetable.htm: 403 Client Error: Forbidden for url: http://budget.house.gov/budgetprocess/budgettimetable.htm


Processing URLs:   8%|▊         | 76/1000 [02:40<20:43,  1.35s/it]

Error extracting text from http://www.citizen.co.za/910743/moodys-downgrades-outlook-sa-credit-rating/: 404 Client Error: Not Found for url: https://www.citizen.co.za/moodys-downgrades-outlook-sa-credit-rating/


Processing URLs:   8%|▊         | 79/1000 [02:44<17:50,  1.16s/it]

Error extracting text from https://www.newsweek.com/eu-lawmakers-ignore-china-protests-seek-stronger-taiwan-ties-1625318: 403 Client Error: Forbidden for url: https://www.newsweek.com/eu-lawmakers-ignore-china-protests-seek-stronger-taiwan-ties-1625318
Error extracting text from http://www.reuters.com/article/us-philippines-usa-defence-idUSKBN15A18Z?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-usa-defence-idUSKBN15A18Z?il=0


Processing URLs:   8%|▊         | 83/1000 [02:51<25:37,  1.68s/it]

URL filtered: https://www.bbc.co.uk/news/live/uk-59716357?ns_mchannel=social&amp;ns_source=twitter&amp;ns_campaign=bbc_live&amp;ns_linkname=61bee35c99991e76caa3060a%26%27It%20may%20be%20too%20late%20to%20react%27%20-%20Sajid%20Javid%262021-12-19T08%3A19%3A30.182Z&amp;ns_fee=0&amp;pinned_post_locator=urn:asset:a1d7b3fa-14ee-43dc-a3a1-fa7c95668082&amp;pinned_post_asset_id=61bee35c99991e76caa3060a&amp;pinned_post_type=share.


Processing URLs:   9%|▉         | 90/1000 [03:01<22:42,  1.50s/it]

Error extracting text from https://thinkprogress.org/trump-paris-agreement-speed-up-cancel-ceb106ff9661: 403 Client Error: Forbidden for url: https://thinkprogress.org/trump-paris-agreement-speed-up-cancel-ceb106ff9661


Processing URLs:   9%|▉         | 91/1000 [03:02<19:21,  1.28s/it]

URL filtered: https://twitter.com/RichardHanania/status/1475545822808870912


Processing URLs:  10%|▉         | 96/1000 [03:12<25:28,  1.69s/it]

Error extracting text from http://ukraine.csis.org/#494: HTTPConnectionPool(host='ukraine.csis.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3024e55e0>: Failed to resolve 'ukraine.csis.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  10%|█         | 100/1000 [03:14<13:19,  1.13it/s]

Error extracting text from http://mobile.nytimes.com/aponline/2015/08/20/world/middleeast/ap-iran-nuclear-qa.html?_r=0&amp;referrer=: 403 Client Error: Forbidden for url: https://www.nytimes.com/aponline/2015/08/20/world/middleeast/ap-iran-nuclear-qa.html?_r=0&amp;referrer=
Error extracting text from http://www.balkaninsight.com/en/article/veteran-leader-eyes-fresh-mandate-in-montenegro-12-31-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/veteran-leader-eyes-fresh-mandate-in-montenegro-12-31-2015


Processing URLs:  10%|█         | 101/1000 [04:15<4:36:17, 18.44s/it]

Error extracting text from http://www.debka.com/article/24902/Russian-marines-join-Hizballah-in-first-Syrian-battle-%E2%80%93-a-danger-signal-for-US-Israel: HTTPConnectionPool(host='www.debka.com', port=80): Max retries exceeded with url: /article/24902/Russian-marines-join-Hizballah-in-first-Syrian-battle-%E2%80%93-a-danger-signal-for-US-Israel (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3024e53a0>, 'Connection to www.debka.com timed out. (connect timeout=60)'))


Processing URLs:  11%|█         | 107/1000 [04:26<55:17,  3.72s/it]  

Error extracting text from http://www.nytimes.com/2016/01/19/us/politics/hillary-clinton-readies-for-a-long-slog-against-bernie-sanders.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/19/us/politics/hillary-clinton-readies-for-a-long-slog-against-bernie-sanders.html


Processing URLs:  11%|█         | 109/1000 [04:44<1:44:28,  7.04s/it]

Error extracting text from http://www.who.int/classifications/icd/revision/en/: 404 Client Error: Not Found for url: https://www.who.int/classifications/classification-of-diseases/groups-that-were-involved-in-icd-11-revision-process


Processing URLs:  11%|█▏        | 113/1000 [04:49<40:11,  2.72s/it]  

Error extracting text from https://www.reuters.com/article/kenya-tourism/kenyas-tourism-revenues-jump-20-pct-in-2017-idUSL8N1PY5PP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/kenya-tourism/kenyas-tourism-revenues-jump-20-pct-in-2017-idUSL8N1PY5PP


Processing URLs:  12%|█▏        | 117/1000 [04:55<24:02,  1.63s/it]

Error extracting text from http://www.tradingeconomics.com/egypt/gdp-growth/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/egypt/gdp-growth/forecast


Processing URLs:  12%|█▏        | 119/1000 [04:56<15:57,  1.09s/it]

Error extracting text from http://www.wsj.com/articles/terrorists-north-korea-to-dominate-nuclear-summit-agenda-1459418403: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/terrorists-north-korea-to-dominate-nuclear-summit-agenda-1459418403


Processing URLs:  12%|█▏        | 120/1000 [04:57<15:37,  1.06s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-idUKKBN16H0O6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  12%|█▏        | 122/1000 [04:59<14:08,  1.03it/s]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S016920701400106X: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S016920701400106X


Processing URLs:  12%|█▎        | 125/1000 [05:02<12:32,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-china-philippines-court-idUSKCN0ZG05S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-philippines-court-idUSKCN0ZG05S
URL filtered: http://www.bloomberg.com/news/articles/2016-07-04/venezuela-refuses-to-default-few-people-seem-to-understand-why


Processing URLs:  13%|█▎        | 128/1000 [05:05<14:11,  1.02it/s]

Error extracting text from http://industrialcontrolsecuritynuclear.com/: HTTPConnectionPool(host='industrialcontrolsecuritynuclear.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304402d80>: Failed to resolve 'industrialcontrolsecuritynuclear.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  13%|█▎        | 131/1000 [05:06<09:32,  1.52it/s]

Error extracting text from https://onlinelibrary.wiley.com/doi/pdf/10.1111/jofi.12282: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/pdf/10.1111/jofi.12282
Error extracting text from http://www.nbcnews.com/news/us-news/trump-campaign-associate-carter-page-revealed-target-russian-spies-n742356: 403 Client Error: Forbidden for url: http://www.nbcnews.com/news/us-news/trump-campaign-associate-carter-page-revealed-target-russian-spies-n742356


Processing URLs:  14%|█▎        | 135/1000 [05:10<11:29,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-afghanistan-parliament-idUSKBN13806Z?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-parliament-idUSKBN13806Z?il=0


Processing URLs:  14%|█▎        | 137/1000 [05:15<20:44,  1.44s/it]

Error extracting text from http://europe.newsweek.com/will-there-be-coup-against-erdogan-turkey-439181?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/will-there-be-coup-against-erdogan-turkey-439181


Processing URLs:  14%|█▍        | 141/1000 [05:23<21:19,  1.49s/it]

Error extracting text from http://www.wsj.com/articles/former-member-of-suspended-brazilian-president-dilma-rousseffs-administration-arrested-1466692045: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/former-member-of-suspended-brazilian-president-dilma-rousseffs-administration-arrested-1466692045


Processing URLs:  14%|█▍        | 142/1000 [05:23<15:56,  1.12s/it]

Error extracting text from http://www.wsj.com/articles/pdvsa-offers-debt-exchange-to-service-providers-1464051479: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pdvsa-offers-debt-exchange-to-service-providers-1464051479


Processing URLs:  14%|█▍        | 144/1000 [05:26<22:09,  1.55s/it]

URL filtered: http://greece.greekreporter.com/2016/06/08/bloomberg-greek-finmin-admits-grexit-still-on-table/


Processing URLs:  15%|█▍        | 148/1000 [05:31<16:43,  1.18s/it]

Error extracting text from http://auto.economictimes.indiatimes.com/news/auto-technology/toyota-finally-tames-lithium-ion-battery-technology/55153819: 404 Client Error: Not Found for url: https://auto.economictimes.indiatimes.com/news/auto-technology/toyota-finally-tames-lithium-ion-battery-technology/55153819


Processing URLs:  15%|█▍        | 149/1000 [05:31<15:15,  1.08s/it]

Error extracting text from http://fn.dealogic.com/fn/MARank.htm: 404 Client Error: Not Found for url: http://fn.dealogic.com/fn/MARank.htm


Processing URLs:  15%|█▌        | 151/1000 [05:43<41:59,  2.97s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-nato-idUSKCN0W212Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-nato-idUSKCN0W212Y


Processing URLs:  15%|█▌        | 153/1000 [05:46<28:44,  2.04s/it]

Error extracting text from http://allafrica.com/stories/201602031652.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201602031652.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffa50530>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  16%|█▌        | 155/1000 [05:52<33:43,  2.39s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-08-18/rousseff-offers-rare-mea-culpa-on-eve-of-brazil-impeachment-vote


Processing URLs:  16%|█▌        | 158/1000 [05:53<17:53,  1.27s/it]

URL filtered: https://twitter.com/elonmusk/status/752182992982843392?ref_src=twsrc%5Etfw


Processing URLs:  16%|█▌        | 160/1000 [05:54<12:07,  1.16it/s]

Error extracting text from http://www.nbcnews.com/news/world/china-raise-defense-spending-trump-seeks-boost-u-s-s-n729051: 403 Client Error: Forbidden for url: http://www.nbcnews.com/news/world/china-raise-defense-spending-trump-seeks-boost-u-s-s-n729051


Processing URLs:  16%|█▌        | 162/1000 [05:55<09:17,  1.50it/s]

Error extracting text from http://www.politico.com/tipsheets/morning-trade/2015/09/whats-coming-: 404 Client Error: Not Found for url: https://www.politico.com/tipsheets/morning-trade/2015/09/whats-coming-


Processing URLs:  16%|█▋        | 164/1000 [05:56<09:58,  1.40it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/267275-senate-dem-urges-action-on-north-korea-cyber-sanctions-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/267275-senate-dem-urges-action-on-north-korea-cyber-sanctions-bill/


Processing URLs:  17%|█▋        | 166/1000 [05:59<13:09,  1.06it/s]

Error extracting text from http://www.bfna.org/sites/default/files/publications/BBrief-TTIP%20Round%2011%20%28Nov%2012%202015%29_0.pdf: 404 Client Error: Not Found for url: https://www.bfna.org/sites/default/files/publications/BBrief-TTIP%20Round%2011%20%28Nov%2012%202015%29_0.pdf


Processing URLs:  17%|█▋        | 171/1000 [06:04<12:08,  1.14it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-fourth-term-in-doubt-as-german-coalition-talks-fail-idUSKBN1DJ0I3?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-fourth-term-in-doubt-as-german-coalition-talks-fail-idUSKBN1DJ0I3?il=0
URL filtered: https://www.protocol.com/facebook-crypto-diem-novi


Processing URLs:  17%|█▋        | 174/1000 [06:07<10:46,  1.28it/s]

URL filtered: https://www.nytimes.com/2021/06/22/technology/amazon-apple-google-facebook-antitrust-bills.html


Processing URLs:  18%|█▊        | 178/1000 [06:41<1:04:44,  4.73s/it]

Error extracting text from https://assets.kpmg.com/content/dam/kpmg/xx/pdf/2017/01/global-automotive-executive-survey-2017.pdf: PyCryptodome is required for AES algorithm


Processing URLs:  18%|█▊        | 184/1000 [06:51<31:36,  2.32s/it]  

Error extracting text from http://www.infosoc.iis.ru/content/2016/201603_abstracts_eng.html: 404 Client Error: Not Found for url: http://infosoc.iis.ru/content/2016/201603_abstracts_eng.html


Processing URLs:  19%|█▉        | 188/1000 [07:57<4:19:37, 19.18s/it]

Error extracting text from http://www.itv.com/news/2016-06-30/theresa-may-launches-tory-leadership-bid/: HTTPConnectionPool(host='www.itv.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  19%|█▉        | 190/1000 [08:00<2:15:43, 10.05s/it]

Error extracting text from http://www.latimes.com/opinion/editorials/la-ed-0221-myanmar-20160220-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/editorials/la-ed-0221-myanmar-20160220-story.html


Processing URLs:  20%|█▉        | 195/1000 [08:16<43:29,  3.24s/it]  

Error extracting text from http://www.reuters.com/article/us-autos-leeco-idUSKCN12J2MC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-autos-leeco-idUSKCN12J2MC


Processing URLs:  20%|█▉        | 199/1000 [08:17<14:02,  1.05s/it]

Error extracting text from https://www.tandfonline.com/doi/abs/10.1080/19434472.2020.1762108?journalCode=rirt20: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/abs/10.1080/19434472.2020.1762108?journalCode=rirt20
URL filtered: https://twitter.com/BernieSanders/status/687366750933889024
Error extracting text from http://www.nytimes.com/2016/02/11/world/middleeast/iran-khomeini-elections.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/11/world/middleeast/iran-khomeini-elections.html?_r=0


Processing URLs:  20%|██        | 200/1000 [08:18<12:21,  1.08it/s]

Error extracting text from http://www.newsletter.co.uk/news/stormont-must-back-workers-and-urgently-reform-welfare-1-7004372: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/stormont-must-back-workers-and-urgently-reform-welfare-1-7004372


Processing URLs:  20%|██        | 201/1000 [08:19<12:34,  1.06it/s]

Error extracting text from http://news.xinhuanet.com/english/2016-04/20/c_135297993.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-04/20/c_135297993.htm


Processing URLs:  20%|██        | 202/1000 [08:21<15:00,  1.13s/it]

Error extracting text from http://nationalinterest.org/blog/russias-nuclear-missile-death-train-arriving-2019-19581: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/russias-nuclear-missile-death-train-arriving-2019-19581
URL filtered: https://www.youtube.com/watch?v=nVQFHy6mM5A


Processing URLs:  21%|██        | 207/1000 [08:28<17:51,  1.35s/it]

Error extracting text from http://www.iran-daily.com/News/156110.html?catid=3&amp;title=Iran-back-in-energy-market-with-new-contracts: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  21%|██        | 208/1000 [08:28<14:12,  1.08s/it]

Error extracting text from http://theiowarepublican.com/2015/carson-in-command-of-iowa/: 404 Client Error: Not Found for url: http://theiowarepublican.com/2015/carson-in-command-of-iowa/


Processing URLs:  21%|██        | 210/1000 [08:32<17:52,  1.36s/it]

Error extracting text from http://www.fda.gov/drugs/developmentapprovalprocess/howdrugsaredevelopedandapproved/approvalapplications/therapeuticbiologicapplications/: 403 Client Error: Forbidden for url: http://www.fda.gov/drugs/developmentapprovalprocess/howdrugsaredevelopedandapproved/approvalapplications/therapeuticbiologicapplications/


Processing URLs:  21%|██        | 211/1000 [08:32<13:28,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/11/15/upshot/could-a-democrat-actually-win-a-senate-seat-in-alabama-precedents-are-few-but-telling.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/15/upshot/could-a-democrat-actually-win-a-senate-seat-in-alabama-precedents-are-few-but-telling.html?_r=0


Processing URLs:  22%|██▏       | 216/1000 [08:48<32:07,  2.46s/it]

Error extracting text from https://crackdnews.blogspot.com/2016/12/ban-ki-moon-might-be-south-koreas-next.html: 404 Client Error: Not Found for url: https://crackdnews.blogspot.com/2016/12/ban-ki-moon-might-be-south-koreas-next.html


Processing URLs:  22%|██▏       | 217/1000 [08:48<23:45,  1.82s/it]

Error extracting text from http://www.levantinegroup.com/#!Iraq-Kurdish-force-and-Shiite-militia-reach-agreement-on-upcoming-offensive-near-Kirkuk/c21xo/56e9adab0cf221f9af2f566a: 404 Client Error: Not Found for url: http://www.levantinegroup.com/#!Iraq-Kurdish-force-and-Shiite-militia-reach-agreement-on-upcoming-offensive-near-Kirkuk/c21xo/56e9adab0cf221f9af2f566a


Processing URLs:  22%|██▏       | 220/1000 [08:53<20:14,  1.56s/it]

Error extracting text from http://thehill.com/homenews/campaign/361976-former-kelly-aide-to-launch-write-in-campaign-for-alabama-senate-race: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361976-former-kelly-aide-to-launch-write-in-campaign-for-alabama-senate-race/


Processing URLs:  22%|██▏       | 222/1000 [08:56<21:25,  1.65s/it]

Error extracting text from https://www.reuters.com/article/russia-military-iran-china/russia-china-and-iran-to-hold-joint-naval-drills-in-indian-ocean-soon-ria-idUSKBN2A81Q8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/russia-military-iran-china/russia-china-and-iran-to-hold-joint-naval-drills-in-indian-ocean-soon-ria-idUSKBN2A81Q8


Processing URLs:  23%|██▎       | 226/1000 [08:58<12:07,  1.06it/s]

Error extracting text from https://www.nysenate.gov/how-bill-becomes-law-1: 404 Client Error: Not Found for url: https://www.nysenate.gov/how-bill-becomes-law-1


Processing URLs:  23%|██▎       | 233/1000 [09:09<17:51,  1.40s/it]

Error extracting text from http://www.newsweek.com/government-cyber-attacks-increase-2015-439206: 403 Client Error: Forbidden for url: https://www.newsweek.com/government-cyber-attacks-increase-2015-439206


Processing URLs:  24%|██▎       | 236/1000 [09:13<14:53,  1.17s/it]

Error extracting text from http://www.oddschecker.com/politics/us-politics/us-democrat-primaries/iowa-democrat-caucus: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/us-politics/us-democrat-primaries/iowa-democrat-caucus


Processing URLs:  24%|██▍       | 239/1000 [09:16<11:34,  1.10it/s]

Error extracting text from http://www.nytimes.com/2016/05/30/world/middleeast/iran-saudi-arabia-mecca-hajj.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/30/world/middleeast/iran-saudi-arabia-mecca-hajj.html?_r=0


Processing URLs:  24%|██▍       | 241/1000 [09:18<12:11,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-germany-election-spd-idUSKBN15824X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-spd-idUSKBN15824X


Processing URLs:  24%|██▍       | 242/1000 [09:19<10:52,  1.16it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/341505-mcconnell-ideally-debt-ceiling-vote-is-before-august-recess: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/341505-mcconnell-ideally-debt-ceiling-vote-is-before-august-recess/


Processing URLs:  25%|██▌       | 250/1000 [09:30<13:53,  1.11s/it]

Error extracting text from https://www.nytimes.com/2022/01/21/world/europe/ukraine-russia-us-blinken-lavrov.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/21/world/europe/ukraine-russia-us-blinken-lavrov.html


Processing URLs:  26%|██▌       | 255/1000 [09:53<36:15,  2.92s/it]  

Error extracting text from https://www.faasafety.gov/files/gslac/library/documents/2011/Jan/49877/ADIZ%20TFR%20Intercepts%20w%20answers.pdf: 403 Client Error: Forbidden for url: https://www.faasafety.gov/files/gslac/library/documents/2011/Jan/49877/ADIZ%20TFR%20Intercepts%20w%20answers.pdf


Processing URLs:  26%|██▌       | 256/1000 [09:54<29:07,  2.35s/it]

Error extracting text from http://time.com/4656226/ban-ki-moon-south-korea-presidency/: 404 Client Error: Not Found for url: https://time.com/4656226/ban-ki-moon-south-korea-presidency/
URL filtered: https://www.bloomberg.com/news/articles/2017-07-24/at-t-said-to-be-in-early-u-s-talks-for-time-warner-approval


Processing URLs:  26%|██▌       | 258/1000 [09:55<17:03,  1.38s/it]

Error extracting text from http://www.wsj.com/articles/turkeys-demands-complicate-battle-plan-to-retake-mosul-from-islamic-state-1476305951: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkeys-demands-complicate-battle-plan-to-retake-mosul-from-islamic-state-1476305951


Processing URLs:  26%|██▌       | 262/1000 [09:58<12:42,  1.03s/it]

Error extracting text from https://news.yahoo.com/panama-aims-end-june-much-delayed-canal-expansion-001713762.html: 404 Client Error: Not Found for url: https://news.yahoo.com/panama-aims-end-june-much-delayed-canal-expansion-001713762.html
URL filtered: http://www.bloomberg.com/news/articles/2016-06-14/why-bookies-are-still-pretty-sure-brexit-isn-t-going-to-happen


Processing URLs:  26%|██▋       | 264/1000 [09:59<09:48,  1.25it/s]

Error extracting text from https://www.reuters.com/article/us-volvocars-geely-electric/geelys-volvo-to-go-all-electric-with-new-models-from-2019-idUSKBN19Q0BJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-volvocars-geely-electric/geelys-volvo-to-go-all-electric-with-new-models-from-2019-idUSKBN19Q0BJ


Processing URLs:  27%|██▋       | 267/1000 [10:02<09:18,  1.31it/s]

Error extracting text from https://www.reuters.com/article/us-asean-philippines-usa-russia-idUSKBN1AN09B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-philippines-usa-russia-idUSKBN1AN09B


Processing URLs:  27%|██▋       | 268/1000 [10:03<09:21,  1.30it/s]

Error extracting text from https://www.dni.gov/index.php/newsroom/reports-publications/reports-publications-2021/item/2236-unclassified-summary-of-assessment-on-covid-19-origins: 403 Client Error: Forbidden for url: https://www.dni.gov/index.php/newsroom/reports-publications/reports-publications-2021/item/2236-unclassified-summary-of-assessment-on-covid-19-origins
URL filtered: https://www.youtube.com/watch?v=0Jnqz62d9oM&amp;list=PLROeMLXl5EWBLSz3bfYmrKWtJ3outdB4A&amp;index=1


Processing URLs:  27%|██▋       | 270/1000 [10:14<33:09,  2.73s/it]

Error extracting text from http://ec.europa.eu/priorities/economic-monetary-union/docs/5-presidents-report_en.pdf: 404 Client Error: (Not Found) for url: https://ec.europa.eu/commission/priorities/economic-monetary-union/docs/5-presidents-report_en.pdf


Processing URLs:  27%|██▋       | 273/1000 [10:17<22:37,  1.87s/it]

Error extracting text from https://www.reuters.com/article/us-asml-holding-usa-china-insight-idUSKBN1Z50HN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asml-holding-usa-china-insight-idUSKBN1Z50HN
URL filtered: https://www.bloomberg.com/graphics/2017-nkorea-missiles/


Processing URLs:  28%|██▊       | 276/1000 [10:18<13:45,  1.14s/it]

Error extracting text from https://www.reuters.com/article/us-eu-trade-mercosur/eu-mercosur-strike-trade-pact-defying-protectionist-wave-idUSKCN1TT2KD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-trade-mercosur/eu-mercosur-strike-trade-pact-defying-protectionist-wave-idUSKCN1TT2KD


Processing URLs:  28%|██▊       | 282/1000 [10:28<14:13,  1.19s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/russia-building-new/2310302.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/russia-building-new/2310302.html
Error extracting text from http://www.reuters.com/article/2015/10/26/us-china-japan-southkorea-idUSKCN0SK0L620151026: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/26/us-china-japan-southkorea-idUSKCN0SK0L620151026


Processing URLs:  28%|██▊       | 283/1000 [10:28<10:57,  1.09it/s]

Error extracting text from https://www.axios.com/white-supremacy-far-right-politics-spike-pacific-northwest-87217eb0-2028-4254-acef-2a3fe946cc8b.html: 403 Client Error: Forbidden for url: https://www.axios.com/white-supremacy-far-right-politics-spike-pacific-northwest-87217eb0-2028-4254-acef-2a3fe946cc8b.html
URL filtered: https://www.bloomberg.com/news/articles/2019-11-29/what-seattle-s-wto-protests-mean-20-years-later


Processing URLs:  29%|██▉       | 288/1000 [10:33<10:26,  1.14it/s]

Error extracting text from http://www.reuters.com/article/2015/10/23/malaysia-budget-highlights-idUSL3N12G1DM20151023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/malaysia-budget-highlights-idUSL3N12G1DM20151023


Processing URLs:  29%|██▉       | 289/1000 [10:33<08:48,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/a-genocide-conviction-1458945830: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-genocide-conviction-1458945830


Processing URLs:  29%|██▉       | 292/1000 [10:38<17:33,  1.49s/it]

Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-14-august-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-14-august-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff930290>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 296/1000 [10:41<09:48,  1.20it/s]

Error extracting text from http://www.espn.com/nba/bracket: 403 Client Error: Forbidden for url: http://www.espn.com/nba/bracket


Processing URLs:  30%|██▉       | 297/1000 [10:41<08:31,  1.37it/s]

Error extracting text from http://www.chinapost.com.tw/international/europe/2015/10/20/448730/Protesters-police.htm: HTTPConnectionPool(host='www.chinapost.com.tw', port=80): Max retries exceeded with url: /international/europe/2015/10/20/448730/Protesters-police.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300fd7e90>: Failed to resolve 'www.chinapost.com.tw' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 298/1000 [10:42<09:09,  1.28it/s]

Error extracting text from http://www.businessinsider.com/burundis-government-might-scrap-presidential-term-limits-2016-8: 404 Client Error: Not Found for url: https://www.businessinsider.com/burundis-government-might-scrap-presidential-term-limits-2016-8


Processing URLs:  30%|███       | 300/1000 [10:44<09:14,  1.26it/s]

Error extracting text from http://thehill.com/homenews/campaign/364236-ivankas-criticism-of-roy-moore-pushed-trump-in-opposite-direction-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/364236-ivankas-criticism-of-roy-moore-pushed-trump-in-opposite-direction-report/


Processing URLs:  30%|███       | 303/1000 [11:04<1:08:20,  5.88s/it]

Error extracting text from http://www.investopedia.com/ask/answers/041715/how-important-are-seasonal-trends-automotive-sector.asp#ixzz47yKE1GOf: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/ask/answers/041715/how-important-are-seasonal-trends-automotive-sector.asp#ixzz47yKE1GOf


Processing URLs:  30%|███       | 304/1000 [11:06<53:37,  4.62s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2017-05-03/davis-says-u-k-won-t-pay-reported-100-billion-euro-brexit-bill


Processing URLs:  31%|███       | 307/1000 [11:07<24:21,  2.11s/it]

Error extracting text from http://www.trackingterrorism.org/group/boko-haram: 403 Client Error: Forbidden for url: https://trackingterrorism.org/group/boko-haram
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-un-idUSKCN0VX2T5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-un-idUSKCN0VX2T5


Processing URLs:  31%|███▏      | 313/1000 [11:14<14:12,  1.24s/it]

Error extracting text from https://www.defenceaviationpost.com/2021/08/india-to-send-4-warships-to-south-china-sea-for-drill-with-nations-having-maritime-disputes-with-china/: HTTPSConnectionPool(host='www.defenceaviationpost.com', port=443): Max retries exceeded with url: /2021/08/india-to-send-4-warships-to-south-china-sea-for-drill-with-nations-having-maritime-disputes-with-china/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.defenceaviationpost.com'. (_ssl.c:1000)")))


Processing URLs:  31%|███▏      | 314/1000 [11:15<14:17,  1.25s/it]

Error extracting text from https://tradingeconomics.com/united-states/labor-force-participation-rate: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/united-states/labor-force-participation-rate


Processing URLs:  32%|███▏      | 316/1000 [11:16<09:18,  1.22it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/12/19/France-hopes-to-reschedule-Iran-president-s-visit-for-January.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/12/19/France-hopes-to-reschedule-Iran-president-s-visit-for-January.html


Processing URLs:  32%|███▏      | 318/1000 [11:18<09:41,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-monsanto-m-a-bayer-exclusive-idUSKCN0YF1ZG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-monsanto-m-a-bayer-exclusive-idUSKCN0YF1ZG


Processing URLs:  32%|███▏      | 320/1000 [11:18<05:44,  1.97it/s]

Error extracting text from https://www.nytimes.com/2019/04/02/world/asia/nasa-india-space-debris.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2019/04/02/world/asia/nasa-india-space-debris.html
Error extracting text from http://www.nytimes.com/2015/09/25/business/dealbook/thepotential-criminal-consequences-for-volkswagen.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/25/business/dealbook/thepotential-criminal-consequences-for-volkswagen.html


Processing URLs:  32%|███▏      | 323/1000 [11:24<14:33,  1.29s/it]

Error extracting text from http://www.latimes.com/business/la-fi-wilbur-ross-confirmed-20170227-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-wilbur-ross-confirmed-20170227-story.html
URL filtered: https://www.bloomberg.com/news/articles/2017-11-10/venezuelan-state-electric-company-s-trustee-declares-default


Processing URLs:  32%|███▎      | 325/1000 [11:26<12:06,  1.08s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/commentary-wang-qishan-enforcer-and-best-premier-china-never-had-9074082: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/commentary-wang-qishan-enforcer-and-best-premier-china-never-had-9074082


Processing URLs:  33%|███▎      | 327/1000 [11:27<08:58,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-musk-neuralink-idUSKBN17N0CU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-musk-neuralink-idUSKBN17N0CU


Processing URLs:  33%|███▎      | 330/1000 [11:29<07:22,  1.51it/s]

Error extracting text from http://www.barrons.com/articles/venezuela-pdvsa-default-risk-real-outlook-negative-s-p-says-1488324833: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-pdvsa-default-risk-real-outlook-negative-s-p-says-1488324833
Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-idUSKBN14W1LV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-idUSKBN14W1LV


Processing URLs:  33%|███▎      | 331/1000 [11:29<05:40,  1.96it/s]

Error extracting text from https://news.usni.org/2016/09/06/iranian-boats-harass-another-u-s-navy-patrol-coastal-ship-persian-gulf: 403 Client Error: Forbidden for url: https://news.usni.org/2016/09/06/iranian-boats-harass-another-u-s-navy-patrol-coastal-ship-persian-gulf


Processing URLs:  33%|███▎      | 334/1000 [11:31<05:41,  1.95it/s]

Error extracting text from http://www.nationalreview.com/article/454491/gop-tax-plan-hasty-changes-uncertain-future: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/454491/gop-tax-plan-hasty-changes-uncertain-future/
URL filtered: https://www.youtube.com/watch?v=V4YzWYf0PtM
Error extracting text from http://www.straitstimes.com/asia/east-asia/china-malaysia-start-joint-military-exercise: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  34%|███▎      | 335/1000 [12:03<1:29:20,  8.06s/it]

Error extracting text from http://www.todayszaman.com/diplomacy_turkey-wont-allow-syrian-kurdish-pyd-to-cross-west-of-euphrates_410331.html: 522 Server Error:  for url: http://www.todayszaman.com/diplomacy_turkey-wont-allow-syrian-kurdish-pyd-to-cross-west-of-euphrates_410331.html


Processing URLs:  34%|███▎      | 337/1000 [12:05<55:16,  5.00s/it]  

Error extracting text from https://www.rferl.org/a/us-lawmakers-probing-whether-trump-aide-promoted-us-russian-nuclear-project-middle-east/28734682.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/us-lawmakers-probing-whether-trump-aide-promoted-us-russian-nuclear-project-middle-east/28734682.html


Processing URLs:  34%|███▍      | 338/1000 [12:08<47:06,  4.27s/it]

URL filtered: https://twitter.com/JHahnEU/status/1397977369651073038


Processing URLs:  34%|███▍      | 342/1000 [12:13<25:12,  2.30s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/268983: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/268983


Processing URLs:  34%|███▍      | 344/1000 [12:15<17:19,  1.59s/it]

Error extracting text from https://www.straitstimes.com/politics/singapore-and-australia-confident-that-rcep-negotiations-can-conclude-by-end-2019: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  34%|███▍      | 345/1000 [12:16<13:46,  1.26s/it]

Error extracting text from http://www.wsj.com/articles/trump-and-the-goodfellas-1449877685: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-and-the-goodfellas-1449877685


Processing URLs:  35%|███▍      | 346/1000 [12:16<11:21,  1.04s/it]

Error extracting text from http://wedrawthelines.ca.gov/regulation_archive.html: 403 Client Error: Forbidden for url: http://wedrawthelines.ca.gov/regulation_archive.html


Processing URLs:  35%|███▍      | 347/1000 [12:18<14:46,  1.36s/it]

Error extracting text from http://myjournalcourier.com/news/104902/commentary-as-budgets-go-this-is-a-dud: 403 Client Error: Forbidden for url: https://www.myjournalcourier.com/news/104902/commentary-as-budgets-go-this-is-a-dud


Processing URLs:  35%|███▍      | 349/1000 [12:21<14:12,  1.31s/it]

Error extracting text from http://www.comres.co.uk/bojos-go-vote-makes-moderate-tories-key-battleground-for-eu-referendum/: 403 Client Error: Forbidden for url: http://comresglobal.com/bojos-go-vote-makes-moderate-tories-key-battleground-for-eu-referendum/


Processing URLs:  35%|███▌      | 351/1000 [12:23<12:22,  1.14s/it]

URL filtered: https://www.youtube.com/watch?v=0UZgENcr3Q0


Processing URLs:  35%|███▌      | 354/1000 [12:26<10:31,  1.02it/s]

Error extracting text from http://thehill.com/policy/international/306908-bolton-obama-should-not-do-anything-to-harm-israel-at-un: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/306908-bolton-obama-should-not-do-anything-to-harm-israel-at-un/


Processing URLs:  36%|███▌      | 357/1000 [12:30<13:07,  1.22s/it]

URL filtered: https://twitter.com/michaelmina_lab?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  36%|███▋      | 365/1000 [12:40<16:49,  1.59s/it]

Error extracting text from https://nationalinterest.org/feature/after-afghanistan-crux-biden%E2%80%99s-mideast-challenge-lies-tehran-191932: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/after-afghanistan-crux-biden%E2%80%99s-mideast-challenge-lies-tehran-191932


Processing URLs:  37%|███▋      | 367/1000 [12:42<11:27,  1.09s/it]

Error extracting text from http://www.nytimes.com/2016/06/08/world/middleeast/defiant-assad-vows-to-retake-every-inch-of-syria-from-his-foes.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/08/world/middleeast/defiant-assad-vows-to-retake-every-inch-of-syria-from-his-foes.html?_r=0


Processing URLs:  37%|███▋      | 369/1000 [13:14<1:45:05,  9.99s/it]

Error extracting text from http://left-lane.com/us-car-sales-data/toyota/toyota-mirai/: 522 Server Error:  for url: http://left-lane.com/us-car-sales-data/toyota/toyota-mirai/


Processing URLs:  37%|███▋      | 371/1000 [13:15<53:29,  5.10s/it]  

Error extracting text from http://www.reuters.com/article/us-japan-china-eastchinasea-idUSKCN0WT0QZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-japan-china-eastchinasea-idUSKCN0WT0QZ


Processing URLs:  37%|███▋      | 373/1000 [13:28<58:14,  5.57s/it]  

Error extracting text from http://www.ibtimes.com/north-korea-calls-us-icbm-interceptor-test-military-provocation-2547409: 403 Client Error: Forbidden for url: https://www.ibtimes.com/north-korea-calls-us-icbm-interceptor-test-military-provocation-2547409


Processing URLs:  38%|███▊      | 375/1000 [13:29<31:06,  2.99s/it]

Error extracting text from http://thespun.com/college-football/fansided-fires-editor-mia-khalifa: 403 Client Error: Forbidden for url: http://thespun.com/college-football/fansided-fires-editor-mia-khalifa
URL filtered: https://www.youtube.com/watch?v=IVs0Yr3GbRk


Processing URLs:  38%|███▊      | 378/1000 [13:30<13:14,  1.28s/it]

Error extracting text from https://www.nytimes.com/2017/07/14/world/asia/back-in-afghan-hot-spot-us-marines-chase-diminished-goals.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/14/world/asia/back-in-afghan-hot-spot-us-marines-chase-diminished-goals.html
Error extracting text from https://www.nato.int/cps/en/natohq/topics_37356.htm: 403 Client Error: Forbidden for url: https://www.nato.int/cps/en/natohq/topics_37356.htm


Processing URLs:  38%|███▊      | 381/1000 [13:33<11:23,  1.10s/it]

Error extracting text from http://www.genengnews.com/gen-articles/crispr-gene-editing-it-isnt-quite-as-easy-as-it-looks/5200/: 403 Client Error: Forbidden for url: http://www.genengnews.com/gen-articles/crispr-gene-editing-it-isnt-quite-as-easy-as-it-looks/5200/


Processing URLs:  38%|███▊      | 382/1000 [13:34<09:47,  1.05it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/256043-club-for-growth-poll-shows-trump-drop-in-iowa: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/256043-club-for-growth-poll-shows-trump-drop-in-iowa/


Processing URLs:  39%|███▊      | 386/1000 [13:40<14:00,  1.37s/it]

Error extracting text from https://ctovision.com/subscribe/: 403 Client Error: Forbidden for url: https://ctovision.com/subscribe/


Processing URLs:  39%|███▊      | 387/1000 [13:40<10:35,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-resist-idUSKBN15A0DI?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-resist-idUSKBN15A0DI?il=0
URL filtered: https://www.bloomberg.com/news/articles/2017-02-27/mexico-warns-u-s-it-ll-cut-off-nafta-talks-if-tariffs-proposed


Processing URLs:  39%|███▉      | 391/1000 [13:44<11:09,  1.10s/it]

Error extracting text from http://www.nytimes.com/2016/04/29/world/middleeast/aleppo-syria-strikes.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/29/world/middleeast/aleppo-syria-strikes.html?_r=0


Processing URLs:  39%|███▉      | 394/1000 [13:47<10:04,  1.00it/s]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Kim-says-ready-to-improve-ties-with-S.-Korea-in-New-Year-s-address: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Kim-says-ready-to-improve-ties-with-S.-Korea-in-New-Year-s-address


Processing URLs:  40%|███▉      | 398/1000 [13:50<07:43,  1.30it/s]

Error extracting text from http://www.scotsman.com/news/politics/eu-exit-would-lead-to-scottish-independence-says-poll-1-4003304: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/eu-exit-would-lead-to-scottish-independence-says-poll-1-4003304


Processing URLs:  41%|████      | 407/1000 [14:05<15:48,  1.60s/it]

Error extracting text from http://www.ibtimes.co.uk/military-scale-islamic-state-weapon-manufacturing-east-mosul-revealed-1596477: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/military-scale-islamic-state-weapon-manufacturing-east-mosul-revealed-1596477


Processing URLs:  41%|████      | 410/1000 [15:05<2:07:11, 12.94s/it]

Error extracting text from http://www.nytimes.com/2016/02/11/world/middleeast/russian-intervention-in-syrian-war-has-sharply-reduced-us-options.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/11/world/middleeast/russian-intervention-in-syrian-war-has-sharply-reduced-us-options.html?_r=0


Processing URLs:  41%|████▏     | 413/1000 [15:09<51:58,  5.31s/it]  

Error extracting text from https://www.newsweek.com/barcelona-antoine-griezmann-cuts-ties-china-huawei-over-treatment-uyghurs-1553877: 403 Client Error: Forbidden for url: https://www.newsweek.com/barcelona-antoine-griezmann-cuts-ties-china-huawei-over-treatment-uyghurs-1553877


Processing URLs:  42%|████▏     | 417/1000 [15:25<37:37,  3.87s/it]  

Error extracting text from https://www.nbcnews.com/news/olympics/ahead-tokyo-olympics-japan-vaccination-rate-far-mark-n1272523: 403 Client Error: Forbidden for url: https://www.nbcnews.com/news/olympics/ahead-tokyo-olympics-japan-vaccination-rate-far-mark-n1272523


Processing URLs:  42%|████▏     | 418/1000 [15:26<27:29,  2.83s/it]

Error extracting text from http://www.rand.org/blog/2017/03/china-tolerating-vietnams-south-china-sea-activities.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2017/03/china-tolerating-vietnams-south-china-sea-activities.html


Processing URLs:  42%|████▏     | 421/1000 [15:28<14:29,  1.50s/it]

Error extracting text from https://www.senate.gov/legislative/resources/pdf/2017_calendar.pdf: 403 Client Error: Forbidden for url: https://www.senate.gov/legislative/resources/pdf/2017_calendar.pdf


Processing URLs:  42%|████▏     | 423/1000 [15:31<16:27,  1.71s/it]

Error extracting text from http://www.genome.gov/pages/der/sequencing_costs_oct2015.xlsx: 404 Client Error: Not Found for url: https://www.genome.gov/pages/der/sequencing_costs_oct2015.xlsx


Processing URLs:  43%|████▎     | 427/1000 [15:37<12:24,  1.30s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-25/iran-says-ballyhoo-over-parchin-won-t-hinder-iaea-inspection


Processing URLs:  43%|████▎     | 429/1000 [15:37<07:52,  1.21it/s]

Error extracting text from http://www.tradingeconomics.com/china/corruption-rank: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/china/corruption-rank


Processing URLs:  43%|████▎     | 431/1000 [15:40<10:47,  1.14s/it]

Error extracting text from http://in.reuters.com/article/britain-eu-opinium-idINKCN0Z40P0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  43%|████▎     | 433/1000 [15:44<13:43,  1.45s/it]

Error extracting text from http://www.politicususa.com/2016/04/19/republicans-admit-obama-scotus-obstruction-vote-tactic.html: 403 Client Error: Forbidden for url: http://www.politicususa.com/2016/04/19/republicans-admit-obama-scotus-obstruction-vote-tactic.html


Processing URLs:  43%|████▎     | 434/1000 [15:44<10:22,  1.10s/it]

Error extracting text from https://thefreethoughtproject.com/longer-conspiracy-theory-state-legalizes-weaponized-drones-cops/: 403 Client Error: Forbidden for url: https://thefreethoughtproject.com/longer-conspiracy-theory-state-legalizes-weaponized-drones-cops/


Processing URLs:  44%|████▎     | 436/1000 [15:47<11:16,  1.20s/it]

Error extracting text from http://tass.ru/en/defense/867350: 404 Client Error: Not Found for url: https://tass.ru/en/defense/867350


Processing URLs:  44%|████▍     | 438/1000 [15:49<09:51,  1.05s/it]

Error extracting text from https://seekingalpha.com/article/4001655-tesla-just-said: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4001655-tesla-just-said


Processing URLs:  44%|████▍     | 442/1000 [15:57<11:47,  1.27s/it]

Error extracting text from http://english.aawsat.com/2016/06/article55351894/saudi-crown-prince-marine-vessels-project-will-enhance-security-control-borders: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/06/article55351894/saudi-crown-prince-marine-vessels-project-will-enhance-security-control-borders


Processing URLs:  44%|████▍     | 444/1000 [15:58<08:03,  1.15it/s]

Error extracting text from https://www.nytimes.com/2018/01/21/world/asia/afghanistan-hotel-attack.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/21/world/asia/afghanistan-hotel-attack.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp


Processing URLs:  45%|████▍     | 446/1000 [16:02<11:51,  1.28s/it]

Error extracting text from http://www.wsj.com/articles/house-passes-short-term-spending-bill-to-avoid-government-shutdown-1449856732: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-passes-short-term-spending-bill-to-avoid-government-shutdown-1449856732


Processing URLs:  45%|████▍     | 448/1000 [16:06<14:00,  1.52s/it]

Error extracting text from http://www.reuters.com/article/global-markets-idUSL5N18M22Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/global-markets-idUSL5N18M22Q


Processing URLs:  45%|████▌     | 451/1000 [16:11<14:11,  1.55s/it]

Error extracting text from https://www.reuters.com/world/china/china-q3-gdp-growth-hits-1-year-low-raising-heat-policymakers-2021-10-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/china/china-q3-gdp-growth-hits-1-year-low-raising-heat-policymakers-2021-10-17/


Processing URLs:  46%|████▌     | 455/1000 [16:14<07:55,  1.15it/s]

Error extracting text from http://www.nytimes.com/2016/08/10/world/europe/putin-erdogan-russia-turkey.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/10/world/europe/putin-erdogan-russia-turkey.html?_r=0
Error extracting text from http://www.reuters.com/article/us-usa-election-southkorea-idUSKBN13508O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-southkorea-idUSKBN13508O


Processing URLs:  46%|████▌     | 459/1000 [16:21<12:54,  1.43s/it]

Error extracting text from http://www.reuters.com/article/us-colombia-peace-idUSKCN11W0BR?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-colombia-peace-idUSKCN11W0BR?il=0


Processing URLs:  46%|████▋     | 464/1000 [16:28<16:02,  1.80s/it]

Error extracting text from http://news.mb.com.ph/2016/12/17/duterte-to-us-prepare-for-departure-vfa-repeal/: HTTPConnectionPool(host='news.mb.com.ph', port=80): Max retries exceeded with url: /2016/12/17/duterte-to-us-prepare-for-departure-vfa-repeal/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303fdd520>: Failed to resolve 'news.mb.com.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  47%|████▋     | 466/1000 [16:28<09:14,  1.04s/it]

Error extracting text from https://wallethub.com/edu/states-most-least-dependent-on-the-federal-government/2700/: 403 Client Error: Forbidden for url: https://wallethub.com/edu/states-most-least-dependent-on-the-federal-government/2700/


Processing URLs:  47%|████▋     | 467/1000 [16:29<08:30,  1.04it/s]

Error extracting text from https://www.marketscreener.com/news/latest/German-regulator-s-Nord-Stream-2-move-may-delay-commissioning-to-March-sources--37053137/: 403 Client Error: Forbidden. for url: https://www.marketscreener.com/news/latest/German-regulator-s-Nord-Stream-2-move-may-delay-commissioning-to-March-sources--37053137/


Processing URLs:  47%|████▋     | 470/1000 [16:35<13:09,  1.49s/it]

Error extracting text from http://www.balkaninsight.com/en/article/us-nato-praise-montenegro-reforms-11-26-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/us-nato-praise-montenegro-reforms-11-26-2015


Processing URLs:  47%|████▋     | 474/1000 [16:42<13:44,  1.57s/it]

Error extracting text from https://www.fiettalaw.com/wp-content/uploads/2021/02/Brexit-BulletinEU-backtracks-on-invoking-Article-16-of-the-Northern-Ireland-Protocol-as-part-of-coronavirus-COVID-19-vaccine-export-authorisation-scheme.pdf: 403 Client Error: Forbidden for url: https://www.fiettalaw.com/wp-content/uploads/2021/02/Brexit-BulletinEU-backtracks-on-invoking-Article-16-of-the-Northern-Ireland-Protocol-as-part-of-coronavirus-COVID-19-vaccine-export-authorisation-scheme.pdf


Processing URLs:  48%|████▊     | 475/1000 [16:43<13:43,  1.57s/it]

Error extracting text from http://www.nbcnews.com/storyline/dakota-pipeline-protests/appeals-court-refuses-stop-oil-dakota-access-pipeline-n735336: 403 Client Error: Forbidden for url: http://www.nbcnews.com/storyline/dakota-pipeline-protests/appeals-court-refuses-stop-oil-dakota-access-pipeline-n735336


Processing URLs:  48%|████▊     | 477/1000 [16:44<07:58,  1.09it/s]

Error extracting text from https://www.yahoo.com/news/ex-un-chief-ban-ki-moon-says-not-070439393.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/ex-un-chief-ban-ki-moon-says-not-070439393.html


Processing URLs:  48%|████▊     | 479/1000 [16:46<08:57,  1.03s/it]

URL filtered: https://www.instagram.com/itsdougthepug/?hl=en


Processing URLs:  49%|████▉     | 488/1000 [16:54<07:30,  1.14it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/361560-roy-moore-strategist-fox-news-putting-out-fake-polls-showing-us: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/361560-roy-moore-strategist-fox-news-putting-out-fake-polls-showing-us/


Processing URLs:  49%|████▉     | 493/1000 [17:01<10:10,  1.21s/it]

Error extracting text from http://www.swissinfo.ch/eng/brazil-court-sends-lula-charges-to-petrobras-judge/42021574: 404 Client Error: Not Found for url: https://www.swissinfo.ch/eng/brazil-court-sends-lula-charges-to-petrobras-judge/42021574


Processing URLs:  49%|████▉     | 494/1000 [17:03<12:43,  1.51s/it]

Error extracting text from http://theweek.com/articles/593126/russia-bringing-big-guns-syria-literally: 404 Client Error: Not Found for url: https://theweek.com/articles/593126/russia-bringing-big-guns-syria-literally


Processing URLs:  50%|████▉     | 496/1000 [17:06<13:42,  1.63s/it]

Error extracting text from https://www.tvnz.co.nz/one-news/world/helen-clark-fails-make-ground-in-latest-un-top-job-straw-poll: 404 Client Error: Not Found for url: https://www.1news.co.nz/one-news/world/helen-clark-fails-make-ground-in-latest-un-top-job-straw-poll/


Processing URLs:  50%|████▉     | 498/1000 [17:07<08:33,  1.02s/it]

Error extracting text from https://www.conservativereview.com/articles/how-fixing-the-filibuster-could-save-countless-unborn-lives: 404 Client Error: Not Found for url: https://www.conservativereview.com/articles/how-fixing-the-filibuster-could-save-countless-unborn-lives
Error extracting text from https://www.reuters.com/article/us-southchinasea-china/as-u-s-goes-quiet-on-close-naval-patrols-china-speaks-out-idUSKBN1FC0JK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china/as-u-s-goes-quiet-on-close-naval-patrols-china-speaks-out-idUSKBN1FC0JK


Processing URLs:  50%|█████     | 500/1000 [17:07<05:05,  1.64it/s]

Error extracting text from http://www.wsj.com/articles/electric-car-maker-faraday-breaks-ground-in-nevada-1460589107: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/electric-car-maker-faraday-breaks-ground-in-nevada-1460589107


Processing URLs:  50%|█████     | 503/1000 [17:10<06:59,  1.18it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/08/updated-super-picker/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/08/updated-super-picker/


Processing URLs:  51%|█████     | 506/1000 [17:12<05:11,  1.58it/s]

Error extracting text from https://www.nytimes.com/2021/05/25/us/politics/biden-putin-meeting.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/05/25/us/politics/biden-putin-meeting.html


Processing URLs:  51%|█████     | 510/1000 [17:29<21:24,  2.62s/it]

Error extracting text from http://politico.pro/1NJ4VAZ: HTTPConnectionPool(host='politico.pro', port=80): Max retries exceeded with url: /1NJ4VAZ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300db14c0>: Failed to resolve 'politico.pro' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/10/28/venezuela-bonds-idUSL1N12S04920151028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/venezuela-bonds-idUSL1N12S04920151028


Processing URLs:  51%|█████     | 512/1000 [17:31<13:55,  1.71s/it]

Error extracting text from http://warontherocks.com/2016/06/open-letter-to-president-obama-and-the-u-s-congress-urging-quick-action-on-montenegros-entry-into-nato/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/06/open-letter-to-president-obama-and-the-u-s-congress-urging-quick-action-on-montenegros-entry-into-nato/


Processing URLs:  52%|█████▏    | 517/1000 [17:36<07:46,  1.03it/s]

Error extracting text from https://www.travelpulse.com/news/airlines/is-the-future-of-air-travel-supersonic.html: 405 Client Error: Not Allowed for url: https://www.travelpulse.com/news/airlines/is-the-future-of-air-travel-supersonic.html
Error extracting text from http://www.reuters.com/video/2016/11/08/us-elections-and-the-trouble-with-north?videoId=370394866&amp;utm_medium=referral&amp;utm_source=morefromreuters: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2016/11/08/us-elections-and-the-trouble-with-north?videoId=370394866&amp;utm_medium=referral&amp;utm_source=morefromreuters


Processing URLs:  52%|█████▏    | 521/1000 [17:46<15:23,  1.93s/it]

Error extracting text from http://cleantechnica.com/2016/06/01/volkswagen-considering-11-billion-battery-factory-germany/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2016/06/01/volkswagen-considering-11-billion-battery-factory-germany/


Processing URLs:  52%|█████▏    | 522/1000 [17:46<11:22,  1.43s/it]

Error extracting text from http://english.alarabiya.net/en/business/energy/2015/10/11/Saudi-Arabia-oil-output-up-by-7-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/energy/2015/10/11/Saudi-Arabia-oil-output-up-by-7-.html


Processing URLs:  52%|█████▏    | 523/1000 [17:47<09:09,  1.15s/it]

Error extracting text from http://www.realcleardefense.com/articles/2016/02/05/down_the_rabbit_hole__108983.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/02/05/down_the_rabbit_hole__108983.html


Processing URLs:  53%|█████▎    | 526/1000 [17:51<08:40,  1.10s/it]

Error extracting text from https://www.amazon.co.uk/Best-Sellers-Books/zgbs/books: 503 Server Error: Service Unavailable for url: https://www.amazon.co.uk/Best-Sellers-Books/zgbs/books


Processing URLs:  53%|█████▎    | 527/1000 [17:51<06:34,  1.20it/s]

Error extracting text from http://www.wsj.com/articles/behind-the-fall-of-marcelo-odebrecht-brazils-construction-prince-1450406284: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/behind-the-fall-of-marcelo-odebrecht-brazils-construction-prince-1450406284


Processing URLs:  53%|█████▎    | 528/1000 [17:52<05:28,  1.44it/s]

Error extracting text from https://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html


Processing URLs:  53%|█████▎    | 529/1000 [17:53<06:50,  1.15it/s]

Error extracting text from http://www.payvand.com/news/16/mar/1140.html: 404 Client Error: Not Found for url: http://www.payvand.com/news/16/mar/1140.html
Error extracting text from https://www.reuters.com/article/us-saudi-energy-prices/saudi-arabia-to-raise-energy-prices-pay-cash-to-poorer-citizens-idUSKBN1E62GY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-energy-prices/saudi-arabia-to-raise-energy-prices-pay-cash-to-poorer-citizens-idUSKBN1E62GY


Processing URLs:  53%|█████▎    | 531/1000 [17:53<04:04,  1.92it/s]

Error extracting text from http://www.business-standard.com/article/news-ani/military-boys-set-timer-device-on-nawaz-sharif-govt-116101500461_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ani/military-boys-set-timer-device-on-nawaz-sharif-govt-116101500461_1.html


Processing URLs:  53%|█████▎    | 533/1000 [17:54<02:48,  2.78it/s]

Error extracting text from https://www.nytimes.com/2021/12/22/nyregion/omicron-nyc-spread.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/12/22/nyregion/omicron-nyc-spread.html
Error extracting text from https://www.spglobal.com/platts/en/market-insights/latest-news/oil/061521-emirates-expects-patchy-recovery-after-passenger-numbers-decline-88-in-2020-21: 403 Client Error: Forbidden for url: https://www.spglobal.com/platts/en/market-insights/latest-news/oil/061521-emirates-expects-patchy-recovery-after-passenger-numbers-decline-88-in-2020-21


Processing URLs:  54%|█████▎    | 536/1000 [17:56<03:56,  1.96it/s]

Error extracting text from https://www.nytimes.com/2017/01/25/world/asia/ban-ki-moon-south-korea-president.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/25/world/asia/ban-ki-moon-south-korea-president.html


Processing URLs:  54%|█████▎    | 537/1000 [17:56<04:38,  1.66it/s]

Error extracting text from https://finance.yahoo.com/quote/FB/history?period1=1489363200&amp;period2=1647129600&amp;interval=1d&amp;filter=history&amp;frequency=1d&amp;includeAdjustedClose=true: 404 Client Error: Not Found for url: https://finance.yahoo.com/quote/FB/history?period1=1489363200&amp;period2=1647129600&amp;interval=1d&amp;filter=history&amp;frequency=1d&amp;includeAdjustedClose=true


Processing URLs:  54%|█████▍    | 538/1000 [17:57<04:13,  1.82it/s]

Error extracting text from http://dronelife.com/2017/01/03/faa-authorizations-waivers-3-big-questions-answers/: 403 Client Error: Forbidden for url: http://dronelife.com/2017/01/03/faa-authorizations-waivers-3-big-questions-answers/
URL filtered: https://twitter.com/HQNigerianArmy/status/722347240476512256


Processing URLs:  54%|█████▍    | 542/1000 [18:16<36:20,  4.76s/it]

Error extracting text from https://www.recode.net/2017/3/24/15054884/amazon-prime-air-public-us-drone-delivery: Exceeded 30 redirects.


Processing URLs:  55%|█████▌    | 550/1000 [19:32<1:26:59, 11.60s/it]

Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:bd88bf382b7946ffb9787580e0eb6327: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:bd88bf382b7946ffb9787580e0eb6327 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fcdb50>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nbcnews.com/storyline/isis-uncovered/trump-gave-russians-secrets-news-orgs-are-being-asked-withhold-n760811: 403 Client Error: Forbidden for url: http://www.nbcnews.com/storyline/isis-uncovered/trump-gave-russians-secrets-news-orgs-are-being-asked-withhold-n760811


Processing URLs:  55%|█████▌    | 551/1000 [19:34<1:10:29,  9.42s/it]

Error extracting text from http://www.theweek.co.uk/scottish-independence/55716/indyref2-sturgeon-announces-second-independence-referendum-bill: 404 Client Error: Not Found for url: https://theweek.com/scottish-independence/55716/indyref2-sturgeon-announces-second-independence-referendum-bill
URL filtered: http://www.bloomberg.com/news/articles/2015-12-08/brazil-real-rises-on-bet-waning-rousseff-support-to-speed-ouster


Processing URLs:  55%|█████▌    | 554/1000 [19:37<35:41,  4.80s/it]  

Error extracting text from http://thehill.com/homenews/campaign/363230-as-election-nears-gop-braces-for-moore-victory: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/363230-as-election-nears-gop-braces-for-moore-victory/


Processing URLs:  56%|█████▌    | 555/1000 [19:41<32:31,  4.39s/it]

Error extracting text from http://ec.europa.eu/dgs/home-affairs/what-we-do/policies/borders-and-visas/schengen/index_en.htm: 404 Client Error: Not Found for url: https://home-affairs.ec.europa.eu/what-we-do/policies/borders-and-visas/schengen/index_en.htm
Error extracting text from http://www.balkaninsight.com/en/article/eu-conservatives-slam-brussels-double-standard-in-montenegro-03-02-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/eu-conservatives-slam-brussels-double-standard-in-montenegro-03-02-2016


Processing URLs:  56%|█████▌    | 557/1000 [19:41<19:58,  2.71s/it]

Error extracting text from http://warisboring.com/articles/air-power-remains-americas-best-weapon-for-halting-islamic-state/?mc_cid=0072a868fb&amp;mc_eid=0467f21653: 403 Client Error: Forbidden for url: http://warisboring.com/articles/air-power-remains-americas-best-weapon-for-halting-islamic-state/?mc_cid=0072a868fb&amp;mc_eid=0467f21653


Processing URLs:  56%|█████▌    | 561/1000 [19:44<08:55,  1.22s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-21/protests-rocking-south-african-capital-highlight-anc-divide
Error extracting text from http://www.reuters.com/article/us-china-military-commentary-idUSKBN1684OW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-military-commentary-idUSKBN1684OW


Processing URLs:  56%|█████▌    | 562/1000 [20:44<1:46:48, 14.63s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2018-02-15/two-thirds-of-spd-supporters-back-german-grand-coalition-poll: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  56%|█████▋    | 565/1000 [20:47<46:49,  6.46s/it]  

Error extracting text from http://www.reuters.com/article/2015/12/01/us-mideast-crisis-germany-idUSKBN0TK3ON20151201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/us-mideast-crisis-germany-idUSKBN0TK3ON20151201
Error extracting text from http://www.nytimes.com/aponline/2016/11/28/world/asia/ap-as-thailand-king.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/11/28/world/asia/ap-as-thailand-king.html?_r=0


Processing URLs:  57%|█████▋    | 566/1000 [20:49<37:36,  5.20s/it]

Error extracting text from https://uk.reuters.com/article/opec-oil/table-opec-oil-output-falls-by-300000-bpd-in-november-reuters-survey-idUKL8N1O433T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  57%|█████▋    | 572/1000 [20:55<10:24,  1.46s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.brasil.gov.br/governo/2016/02/jose-eduardo-cardozo-deixa-o-ministerio-da-justica-e-comandara-a-cgu&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.brasil.gov.br/governo/2016/02/jose-eduardo-cardozo-deixa-o-ministerio-da-justica-e-comandara-a-cgu&amp;prev=search


Processing URLs:  57%|█████▋    | 573/1000 [20:57<10:32,  1.48s/it]

Error extracting text from http://www.foxnews.com/politics/interactive/2016/01/22/fox-news-poll-iowa-presidential-primary/: 403 Client Error: Forbidden for url: http://www.foxnews.com/politics/interactive/2016/01/22/fox-news-poll-iowa-presidential-primary/


Processing URLs:  58%|█████▊    | 577/1000 [21:00<07:34,  1.07s/it]

Error extracting text from https://www.wsj.com/articles/german-government-hinges-on-party-tally-1519345257: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/german-government-hinges-on-party-tally-1519345257


Processing URLs:  58%|█████▊    | 580/1000 [21:04<09:11,  1.31s/it]

Error extracting text from https://gcaptain.com/uncertainty-surrounds-panama-canal-expansion-delivery-timeline/#.VmMhKd-rRR0: 403 Client Error: Forbidden for url: https://gcaptain.com/uncertainty-surrounds-panama-canal-expansion-delivery-timeline/#.VmMhKd-rRR0


Processing URLs:  58%|█████▊    | 582/1000 [21:09<11:41,  1.68s/it]

Error extracting text from https://uslumbercoalition.org/press-release/u-s-lumber-coalition-u-s-department-of-commerces-continued-trade-enforcement-leads-to-robust-domestic-lumber-industry-capacity-expansion/: 403 Client Error: Forbidden for url: https://uslumbercoalition.org/press-release/u-s-lumber-coalition-u-s-department-of-commerces-continued-trade-enforcement-leads-to-robust-domestic-lumber-industry-capacity-expansion/


Processing URLs:  58%|█████▊    | 584/1000 [21:10<08:35,  1.24s/it]

URL filtered: https://www.newsweek.com/aclu-counsel-warns-unchecked-power-twitter-facebook-after-trump-suspension-1560248


Processing URLs:  59%|█████▉    | 588/1000 [22:16<1:55:48, 16.86s/it]

Error extracting text from https://www.betfair.com/exchange/plus/#/politics/market/1.120637079: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/plus/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303caeb10>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))
URL filtered: https://twitter.com/hashtag/UKinEU?src=hash
URL filtered: http://www.bloomberg.com/news/articles/2015-09-15/obama-opposes-bill-to-lift-oil-export-ban-set-for-vote-in-house


Processing URLs:  60%|█████▉    | 598/1000 [22:22<10:30,  1.57s/it]

Error extracting text from http://www.nytimes.com/2006/04/28/nyregion/that-wild-taxi-ride-is-safer-than-you-think-a-study-says.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2006/04/28/nyregion/that-wild-taxi-ride-is-safer-than-you-think-a-study-says.html
Error extracting text from http://www.reuters.com/article/us-china-defence-idUSKBN13S0DA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-defence-idUSKBN13S0DA


Processing URLs:  60%|██████    | 600/1000 [22:24<08:17,  1.24s/it]

Error extracting text from http://www.reuters.com/article/us-yahoo-m-a-timeinc-exclusive-idUSKCN0WY5T8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yahoo-m-a-timeinc-exclusive-idUSKCN0WY5T8


Processing URLs:  60%|██████    | 602/1000 [22:25<04:39,  1.43it/s]

Error extracting text from https://www.nytimes.com/2017/08/27/world/middleeast/if-report-says-iran-is-abiding-by-nuclear-deal-will-trump-heed-it.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/27/world/middleeast/if-report-says-iran-is-abiding-by-nuclear-deal-will-trump-heed-it.html
Error extracting text from http://www.nytimes.com/2016/06/03/health/zika-oral-sex-kissing-transmission.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/03/health/zika-oral-sex-kissing-transmission.html?_r=0


Processing URLs:  61%|██████    | 607/1000 [22:27<03:10,  2.06it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56205: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56205
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YA0QX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YA0QX
Error extracting text from http://www.nytimes.com/2016/03/05/upshot/if-super-tuesday-voting-pattern-continues-donald-trump-will-reach-delegate-target.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/05/upshot/if-super-tuesday-voting-pattern-continues-donald-trump-will-reach-delegate-target.html?_r=0


Processing URLs:  61%|██████    | 611/1000 [22:33<07:23,  1.14s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-06-16/treasury-yields-jump-as-fed-officials-signal-speedier-hikes?sref=x7nYEkiY


Processing URLs:  61%|██████▏   | 613/1000 [22:34<05:17,  1.22it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-27/markets-may-have-nothing-left-to-fear-but-fearlessness-itself


Processing URLs:  62%|██████▏   | 616/1000 [22:36<04:54,  1.31it/s]

Error extracting text from http://egc2015.cz/sites/default/files/Results_European_Championship.pdf: 404 Client Error: Not Found for url: http://egc2015.cz/sites/default/files/Results_European_Championship.pdf


Processing URLs:  62%|██████▏   | 617/1000 [22:36<04:21,  1.46it/s]

Error extracting text from http://www.amazon.com/the-winds-of-winter/cp/ezyv3bakaop9eah: 404 Client Error: Not Found for url: https://www.amazon.com/the-winds-of-winter/cp/ezyv3bakaop9eah
Error extracting text from https://www.reuters.com/article/us-safrica-politics-zuma/anc-leader-says-south-africa-in-new-era-as-talk-of-zuma-exit-grows-idUSKBN1FE2ID: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics-zuma/anc-leader-says-south-africa-in-new-era-as-talk-of-zuma-exit-grows-idUSKBN1FE2ID


Processing URLs:  62%|██████▎   | 625/1000 [22:45<06:38,  1.06s/it]

Error extracting text from https://www.humboldtforum.org/en/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/en/


Processing URLs:  63%|██████▎   | 626/1000 [22:46<06:43,  1.08s/it]

Error extracting text from https://www.nbcnews.com/politics/elections/why-alabama-young-republicans-deserted-roy-moore-n822731: 403 Client Error: Forbidden for url: https://www.nbcnews.com/politics/elections/why-alabama-young-republicans-deserted-roy-moore-n822731


Processing URLs:  63%|██████▎   | 628/1000 [22:49<06:23,  1.03s/it]

Error extracting text from http://www.politics.co.uk/blogs/2016/07/14/everything-you-need-to-know-about-theresa-may-s-brexit: 403 Client Error: Forbidden for url: http://www.politics.co.uk/blogs/2016/07/14/everything-you-need-to-know-about-theresa-may-s-brexit


Processing URLs:  63%|██████▎   | 632/1000 [22:54<07:15,  1.18s/it]

URL filtered: https://twitter.com/abasinfo


Processing URLs:  64%|██████▎   | 636/1000 [23:02<09:22,  1.55s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-satellite-fueling-idUSKCN0VE2C4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-satellite-fueling-idUSKCN0VE2C4


Processing URLs:  64%|██████▎   | 637/1000 [23:03<07:43,  1.28s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/06/nigeria-killing-of-unarmed-pro-biafra-supporters-by-military-must-be-urgently-investigated/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/06/nigeria-killing-of-unarmed-pro-biafra-supporters-by-military-must-be-urgently-investigated/


Processing URLs:  64%|██████▍   | 640/1000 [23:05<05:41,  1.05it/s]

Error extracting text from https://www.nytimes.com/2020/12/17/arts/design/humboldt-forum-berlin.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/17/arts/design/humboldt-forum-berlin.html


Processing URLs:  64%|██████▍   | 644/1000 [23:11<08:53,  1.50s/it]

Error extracting text from https://ec.europa.eu/food/sites/food/files/animals/docs/aw_platform_20190617_pres-12.pdf: 404 Client Error: Not Found for url: https://food.ec.europa.eu/sites/default/files/animals/docs/aw_platform_20190617_pres-12.pdf


Processing URLs:  65%|██████▍   | 647/1000 [23:14<06:25,  1.09s/it]

Error extracting text from http://www.reuters.com/article/2015/10/15/us-montenegro-nato-idUSKCN0S911720151015: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/15/us-montenegro-nato-idUSKCN0S911720151015


Processing URLs:  65%|██████▌   | 652/1000 [24:19<1:40:16, 17.29s/it]

Error extracting text from http://en.xinfinance.com/html/Industries/Finance/2015/168077.shtml: HTTPConnectionPool(host='en.xinfinance.com', port=80): Max retries exceeded with url: /html/Industries/Finance/2015/168077.shtml (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30364c8f0>, 'Connection to en.xinfinance.com timed out. (connect timeout=60)'))


Processing URLs:  66%|██████▌   | 655/1000 [24:31<47:14,  8.22s/it]

Error extracting text from https://www.yahoo.com/news/north-korea-says-us-sanctions-leader-declaration-war-170305297.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/north-korea-says-us-sanctions-leader-declaration-war-170305297.html


Processing URLs:  67%|██████▋   | 669/1000 [25:04<07:16,  1.32s/it]

Error extracting text from http://www.caam.org.cn/shichang/20160930/0905199204.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/shichang/20160930/0905199204.html
URL filtered: https://www.youtube.com/watch?v=EPmRBkwwBBo&amp;list=PLPnZfvKID1Sje5jWxt-4CSZD7bUI4gSPS&amp;index=49


Processing URLs:  67%|██████▋   | 672/1000 [25:04<03:40,  1.49it/s]

Error extracting text from http://www.baltimoresun.com/entertainment/tv/z-on-tv-blog/bal-book-adnan-syed-rabia-chaudry-20151124-story.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/entertainment/tv/z-on-tv-blog/bal-book-adnan-syed-rabia-chaudry-20151124-story.html
Error extracting text from http://www.reuters.com/article/us-india-iran-idUSKBN16K0OA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-iran-idUSKBN16K0OA


Processing URLs:  68%|██████▊   | 676/1000 [25:10<05:34,  1.03s/it]

Error extracting text from https://www.wsj.com/articles/china-wants-a-chip-machine-from-the-dutch-the-u-s-said-no-11626514513: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-wants-a-chip-machine-from-the-dutch-the-u-s-said-no-11626514513
URL filtered: http://www.bloomberg.com/news/articles/2015-12-18/greece-s-world-beating-21-bond-return-no-cure-for-pariah-status


Processing URLs:  68%|██████▊   | 679/1000 [25:14<07:30,  1.40s/it]

URL filtered: https://www.youtube.com/watch?v=c45IfGRkJ_w


Processing URLs:  68%|██████▊   | 683/1000 [25:20<07:14,  1.37s/it]

Error extracting text from https://www.goodjudgmentproject.com/questions/30-before-1-january-2017-will-it-be-officially-announced-that-greece-is-leaving-the-eurozone: 404 Client Error: Not Found for url: https://www.goodjudgmentproject.com/questions/30-before-1-january-2017-will-it-be-officially-announced-that-greece-is-leaving-the-eurozone


Processing URLs:  68%|██████▊   | 684/1000 [25:23<08:59,  1.71s/it]

Error extracting text from http://nationalinterest.org/feature/why-iran-fears-independent-kurdistan-10950: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/why-iran-fears-independent-kurdistan-10950


Processing URLs:  69%|██████▉   | 688/1000 [25:29<08:07,  1.56s/it]

Error extracting text from http://www.foxnews.com/politics/2016/01/27/trump-campaign-says-candidate-won-t-participate-in-fox-newsgoogle-debate.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/politics/2016/01/27/trump-campaign-says-candidate-won-t-participate-in-fox-newsgoogle-debate.html


Processing URLs:  69%|██████▉   | 690/1000 [25:30<05:20,  1.03s/it]

URL filtered: https://www.youtube.com/watch?v=1SmgLtg1Izw


Processing URLs:  69%|██████▉   | 693/1000 [25:31<03:20,  1.53it/s]

Error extracting text from https://www.nytimes.com/2017/08/18/world/asia/in-afghanistan-a-destructive-game-of-thrones.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/18/world/asia/in-afghanistan-a-destructive-game-of-thrones.html?_r=0


Processing URLs:  69%|██████▉   | 694/1000 [25:33<04:35,  1.11it/s]

Error extracting text from http://www.aas.org/: 403 Client Error: Forbidden for url: http://www.aas.org/


Processing URLs:  70%|██████▉   | 695/1000 [25:36<07:22,  1.45s/it]

URL filtered: https://www.vox.com/recode/22221135/capitol-riot-section-230-twitter-hawley-democrats


Processing URLs:  70%|██████▉   | 699/1000 [25:39<05:14,  1.04s/it]

Error extracting text from http://allafrica.com/view/group/main/main/id/00058322.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /view/group/main/main/id/00058322.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffc6fb60>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  70%|███████   | 702/1000 [25:43<05:13,  1.05s/it]

Error extracting text from http://news.riskadvisory.net/2015/16/iran-summary-of-nuclear-deal-timeline-and-sanctions-relief/: 409 Client Error: Conflict for url: http://news.riskadvisory.net/2015/16/iran-summary-of-nuclear-deal-timeline-and-sanctions-relief/


Processing URLs:  70%|███████   | 703/1000 [25:43<04:03,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-migrants-idUSKBN0UM22520160108: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-migrants-idUSKBN0UM22520160108


Processing URLs:  71%|███████   | 706/1000 [25:49<07:10,  1.47s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN16V03M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN16V03M


Processing URLs:  71%|███████   | 708/1000 [25:51<05:42,  1.17s/it]

Error extracting text from https://www.reuters.com/business/cop/paris-rulebook-could-still-be-completed-cop26-eus-timmermans-2021-11-11/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/cop/paris-rulebook-could-still-be-completed-cop26-eus-timmermans-2021-11-11/


Processing URLs:  71%|███████   | 709/1000 [25:52<05:55,  1.22s/it]

URL filtered: https://www.bnnbloomberg.ca/us-china-fight-may-spoil-global-deal-for-a-covid-vaccine-patent-waiver-1.1766590


Processing URLs:  71%|███████   | 711/1000 [25:54<05:02,  1.05s/it]

Error extracting text from http://tass.ru/en/politics/837305: 404 Client Error: Not Found for url: https://tass.ru/en/politics/837305


Processing URLs:  71%|███████▏  | 713/1000 [25:57<05:43,  1.20s/it]

Error extracting text from https://africa.cgtn.com/2020/11/05/egypt-ethiopia-sudan-disagree-on-methodology-for-nile-dam-talks/: 404 Client Error: Not Found for url: https://africa.cgtn.com/2020/11/05/egypt-ethiopia-sudan-disagree-on-methodology-for-nile-dam-talks/


Processing URLs:  72%|███████▏  | 717/1000 [26:04<08:07,  1.72s/it]

Error extracting text from http://en.apa.az/world-news/america-news/us-set-to-restrict-russian-flights-over-us-violating-open-skies-treaty.html: 404 Client Error: Not Found for url: https://en.apa.az/world-news/america-news/us-set-to-restrict-russian-flights-over-us-violating-open-skies-treaty.html


Processing URLs:  72%|███████▏  | 718/1000 [26:04<05:59,  1.28s/it]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2016/10/13-tusk-speech-epc/ : 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/10/13-tusk-speech-epc/%20


Processing URLs:  72%|███████▏  | 721/1000 [26:10<07:26,  1.60s/it]

Error extracting text from https://reut.rs/3oPnUOl: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics/italys-renzi-would-like-ex-ecbs-draghi-to-head-italy-government-source-idUSKBN2A00B0


Processing URLs:  72%|███████▏  | 723/1000 [26:11<04:52,  1.06s/it]

Error extracting text from https://au.news.yahoo.com/world/a/32040926/key-lobbies-for-clark-as-un-chief/#page1: 404 Client Error: Not Found for url: https://au.news.yahoo.com/key-lobbies-for-clark-as-un-chief-32040926.html#page1


Processing URLs:  72%|███████▏  | 724/1000 [26:12<05:24,  1.17s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/the-latest-senate-gop-leader-backs-house-health-care-bill/articleshow/57521534.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/the-latest-senate-gop-leader-backs-house-health-care-bill/articleshow/57521534.cms
Error extracting text from https://www.reuters.com/world/europe/germany-worried-about-covid-19-vaccination-no-shows-2021-07-05/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/germany-worried-about-covid-19-vaccination-no-shows-2021-07-05/
Error extracting text from http://bigstory.ap.org/article/c968b2cebcef43f7ad7892cccb2f8d55/skorea-urges-extraordinary-new-sanctions-nkorea: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/c968b2cebcef43f7ad7892cccb2f8d55/skorea-urges-extraordinary-new-sanctions-nkorea (Caused by NameResolutionError("<urllib3.connection.HTT

Processing URLs:  73%|███████▎  | 728/1000 [27:15<1:01:30, 13.57s/it]

Error extracting text from http://tass.ru/ekonomika/2489088: HTTPSConnectionPool(host='tass.ru', port=443): Read timed out. (read timeout=60)


Processing URLs:  73%|███████▎  | 729/1000 [27:16<47:17, 10.47s/it]

Error extracting text from http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?_r=0


Processing URLs:  74%|███████▎  | 736/1000 [27:32<12:36,  2.87s/it]

Error extracting text from http://news.yahoo.com/nato-chief-russia-interference-boosts-montenegro-chances-130233555.html: 404 Client Error: Not Found for url: http://news.yahoo.com/nato-chief-russia-interference-boosts-montenegro-chances-130233555.html


Processing URLs:  74%|███████▎  | 737/1000 [27:32<09:39,  2.20s/it]

Error extracting text from http://tech.163.com/15/0209/17/AI1FOM6Q00094ODU.html: 403 Client Error: Forbidden for url: https://www.163.com/tech/article/AI1FOM6Q00094ODU.html


Processing URLs:  74%|███████▍  | 739/1000 [27:36<08:17,  1.91s/it]

Error extracting text from https://www.nato.int/cps/en/natohq/news_149233.htm: 403 Client Error: Forbidden for url: https://www.nato.int/cps/en/natohq/news_149233.htm


Processing URLs:  74%|███████▍  | 742/1000 [27:40<05:44,  1.34s/it]

Error extracting text from http://www.nytimes.com/2015/10/22/us/politics/joe-biden-will-not-run-for-president.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/22/us/politics/joe-biden-will-not-run-for-president.html?_r=0


Processing URLs:  74%|███████▍  | 743/1000 [27:40<04:58,  1.16s/it]

URL filtered: https://www.youtube.com/watch?v=DON-aM2tze4
URL filtered: https://twitter.com/diemassociation


Processing URLs:  75%|███████▍  | 748/1000 [27:43<02:49,  1.48it/s]

Error extracting text from https://theconversation.com/turkeys-constitutional-referendum-experts-express-fear-for-a-divided-country-76289: 403 Client Error: Forbidden for url: https://theconversation.com/turkeys-constitutional-referendum-experts-express-fear-for-a-divided-country-76289


Processing URLs:  75%|███████▍  | 749/1000 [27:43<02:23,  1.75it/s]

Error extracting text from https://www.nytimes.com/2017/08/18/business/media/bannon-said-to-be-planning-his-return-to-breitbart-news.html?emc=edit_th_20170819&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/18/business/media/bannon-said-to-be-planning-his-return-to-breitbart-news.html?emc=edit_th_20170819&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  75%|███████▌  | 750/1000 [27:44<02:20,  1.78it/s]

Error extracting text from http://laht.com/article.asp?ArticleId=2399114&amp;CategoryId=10717: 404 Client Error: Not Found for url: http://laht.com/article.asp?ArticleId=2399114&amp;CategoryId=10717


Processing URLs:  75%|███████▌  | 751/1000 [27:45<02:52,  1.44it/s]

Error extracting text from http://www.france24.com/en/20160920-risk-genocide-crimes-against-humanity-burundi-un-investigators-warn: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160920-risk-genocide-crimes-against-humanity-burundi-un-investigators-warn


Processing URLs:  76%|███████▌  | 760/1000 [27:56<05:10,  1.29s/it]

Error extracting text from https://www.jcs.mil/Media/News/: 403 Client Error: Forbidden for url: https://www.jcs.mil/Media/News/


Processing URLs:  76%|███████▌  | 762/1000 [27:58<04:19,  1.09s/it]

Error extracting text from http://thecipherbrief.com/article/isis%E2%80%99-illicit-networks: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/isis%E2%80%99-illicit-networks
Error extracting text from http://globalnation.inquirer.net/130215/south-china-sea-arbitration-philippines-china-spratly-islands-west-philippine-sea#ixzz47RFlEcps: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/130215/south-china-sea-arbitration-philippines-china-spratly-islands-west-philippine-sea#ixzz47RFlEcps


Processing URLs:  76%|███████▋  | 765/1000 [28:02<04:07,  1.05s/it]

Error extracting text from https://www.hindustantimes.com/india-news/india-china-agree-to-work-towards-speedy-resolution-of-lac-row-101631901927179.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/india-news/india-china-agree-to-work-towards-speedy-resolution-of-lac-row-101631901927179.html


Processing URLs:  77%|███████▋  | 767/1000 [28:03<02:49,  1.37it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/may/19/nato-formally-invites-montenegro-as-29th-member/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/may/19/nato-formally-invites-montenegro-as-29th-member/


Processing URLs:  77%|███████▋  | 768/1000 [28:03<02:39,  1.45it/s]

Error extracting text from https://www.weforum.org/agenda/2016/03/is-there-a-link-between-free-trade-and-war?utm_content=buffer30e31&amp;utm_medium=social&amp;utm_source=plus.google.com&amp;utm_campaign=buffer: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2016/03/is-there-a-link-between-free-trade-and-war?utm_content=buffer30e31&amp;utm_medium=social&amp;utm_source=plus.google.com&amp;utm_campaign=buffer


Processing URLs:  77%|███████▋  | 770/1000 [28:05<03:20,  1.15it/s]

Error extracting text from http://www.wsj.com/articles/donald-trump-jr-held-talks-on-syria-with-russia-supporters-1479920753: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/donald-trump-jr-held-talks-on-syria-with-russia-supporters-1479920753


Processing URLs:  77%|███████▋  | 772/1000 [28:07<02:55,  1.30it/s]

Error extracting text from http://www.defense.gov/Video?videoid=447759: 403 Client Error: Forbidden for url: http://www.defense.gov/Video?videoid=447759


Processing URLs:  77%|███████▋  | 773/1000 [28:08<03:05,  1.22it/s]

Error extracting text from https://www.cnbc.com/2017/10/20/pdvsa-blocked-from-using-nustar-terminal-over-unpaid-bills.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/10/20/pdvsa-blocked-from-using-nustar-terminal-over-unpaid-bills.html


Processing URLs:  78%|███████▊  | 778/1000 [28:15<05:23,  1.46s/it]

URL filtered: https://www.youtube.com/watch?v=ZhX4__wwTWk


Processing URLs:  78%|███████▊  | 780/1000 [29:16<54:13, 14.79s/it]

Error extracting text from http://aa.com.tr/en/todays-headlines/rating-agys-should-note-turkeys-potential-in-rankings/613390: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  78%|███████▊  | 781/1000 [29:18<42:38, 11.68s/it]

URL filtered: https://www.youtube.com/watch?v=RU636IQMD1E
URL filtered: https://twitter.com/Zurich/statuses/725728128656613377


Processing URLs:  78%|███████▊  | 785/1000 [29:19<16:41,  4.66s/it]

Error extracting text from https://www.wsj.com/articles/covid-origins-china-wild-animal-farms-pandemic-source-11625060088?mod=djemHL_t: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/covid-origins-china-wild-animal-farms-pandemic-source-11625060088?mod=djemHL_t


Processing URLs:  79%|███████▉  | 790/1000 [29:31<09:40,  2.76s/it]

Error extracting text from http://www.usa-anti-communist.com/ard/US_PatentedMindControl.php: 403 Client Error: Forbidden for url: https://usa-anti-communist.com/ard/US_PatentedMindControl.php


Processing URLs:  79%|███████▉  | 794/1000 [29:39<06:36,  1.93s/it]

Error extracting text from https://www.moodys.com/sites/products/ProductAttachments/Moody&#39;s%20Rating%20System.pdf: 400 Client Error: Bad Request for url: https://www.moodys.com/sites/products/ProductAttachments/Moody&#39;s%20Rating%20System.pdf


Processing URLs:  80%|███████▉  | 796/1000 [29:42<06:07,  1.80s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-22/as-opec-tries-to-squeeze-rivals-one-of-its-own-feels-the-pinch


Processing URLs:  80%|████████  | 800/1000 [29:43<02:42,  1.23it/s]

Error extracting text from http://in.reuters.com/article/myanmar-politics-president-idINKCN0QI1KE20150813: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
Error extracting text from https://www.reuters.com/article/us-usa-cyber-northkorea/u-s-government-shares-technical-details-on-north-korean-hacking-campaign-idUSKBN1DE2V4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-northkorea/u-s-government-shares-technical-details-on-north-korean-hacking-campaign-idUSKBN1DE2V4
Error extracting text from http://www.reuters.com/article/2015/09/19/us-northkorea-southkorea-idUSKCN0RJ01C20150919: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/19/us-northkorea-southkorea-idUSKCN0RJ01C20150919


Processing URLs:  80%|████████  | 805/1000 [29:48<02:39,  1.23it/s]

Error extracting text from https://www.confidencial.com.ni/politica/las-razones-de-los-paises-caribenos-de-la-oea-para-dar-la-espalda-a-ortega/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/politica/las-razones-de-los-paises-caribenos-de-la-oea-para-dar-la-espalda-a-ortega/
Error extracting text from http://www.visualcapitalist.com/14-companies-control-entire-auto-industry/: 403 Client Error: Forbidden for url: http://www.visualcapitalist.com/14-companies-control-entire-auto-industry/


Processing URLs:  81%|████████  | 808/1000 [29:49<01:30,  2.12it/s]

Error extracting text from http://www.barrons.com/articles/iphone-supplier-slump-set-to-end-1444675807?tesla=y&amp;mod=djemb_dr_h: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/iphone-supplier-slump-set-to-end-1444675807?tesla=y&amp;mod=djemb_dr_h


Processing URLs:  81%|████████  | 810/1000 [29:50<01:27,  2.16it/s]

Error extracting text from https://superforecasting.squarespace.com/: 404 Client Error: Not Found for url: https://superforecasting.squarespace.com/
Error extracting text from http://www.reuters.com/article/us-brazil-corruption-odebrecht-idUSKCN0Z71SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-odebrecht-idUSKCN0Z71SB


Processing URLs:  81%|████████▏ | 814/1000 [29:53<01:28,  2.10it/s]

URL filtered: http://www.bloomberg.com/news/articles/2014-05-19/u-s-said-to-charge-chinese-military-officers-with-online-spying
Error extracting text from http://m.ndtv.com/world-news/nawaz-sharif-to-decide-on-successor-to-pak-army-chief-raheel-sharif-report-1444188: 403 Client Error: Forbidden for url: https://www.ndtv.com/world-news/nawaz-sharif-to-decide-on-successor-to-pak-army-chief-raheel-sharif-report-1444188


Processing URLs:  82%|████████▏ | 816/1000 [29:55<02:11,  1.40it/s]

Error extracting text from http://www.iran-daily.com/News/130402.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  82%|████████▏ | 820/1000 [30:17<12:16,  4.09s/it]

Error extracting text from http://www.insightonconflict.org/2016/04/tackling-election-violence-and-hate-speech-in-hate-speech-in-burundi/: HTTPConnectionPool(host='www.insightonconflict.org', port=80): Max retries exceeded with url: /2016/04/tackling-election-violence-and-hate-speech-in-hate-speech-in-burundi/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302ae4a10>: Failed to resolve 'www.insightonconflict.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  82%|████████▏ | 823/1000 [30:20<05:52,  1.99s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-syria-vote-idUSKCN0XA0ZD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-syria-vote-idUSKCN0XA0ZD


Processing URLs:  82%|████████▎ | 825/1000 [30:20<03:38,  1.25s/it]

Error extracting text from https://www.amazon.com/Not-Afraid-Have-Sons-America/dp/0312285582#productDescription_secondary_view_div_1484793711526: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Not-Afraid-Have-Sons-America/dp/0312285582#productDescription_secondary_view_div_1484793711526


Processing URLs:  83%|████████▎ | 827/1000 [30:23<03:30,  1.22s/it]

Error extracting text from http://paktribune.com/news/Pakistan-and-Afghanistan-will-synchronise-their-polio-campaigns-together-275448.html: HTTPSConnectionPool(host='paktribune.com', port=443): Max retries exceeded with url: /news/Pakistan-and-Afghanistan-will-synchronise-their-polio-campaigns-together-275448.html (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'paktribune.com'. (_ssl.c:1000)")))


Processing URLs:  83%|████████▎ | 829/1000 [30:24<02:51,  1.01s/it]

Error extracting text from http://thehill.com/homenews/campaign/364172-moore-i-did-not-date-under-aged-women: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/364172-moore-i-did-not-date-under-aged-women/


Processing URLs:  83%|████████▎ | 833/1000 [30:30<03:42,  1.33s/it]

Error extracting text from http://www.38north.org/2017/07/jschilling070517/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  84%|████████▎ | 835/1000 [30:33<03:38,  1.33s/it]

Error extracting text from http://www.agweb.com/crops/corn/: 403 Client Error: Forbidden for url: http://www.agweb.com/crops/corn/


Processing URLs:  84%|████████▎ | 837/1000 [30:35<02:43,  1.00s/it]

Error extracting text from http://www.presidency.ucsb.edu/ws/?pid=65388: 404 Client Error: Not Found for url: https://www.presidency.ucsb.edu/ws?pid=65388
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://zh.clicrbs.com.br/rs/noticias/noticia/2016/02/presidente-dilma-aceita-pedido-de-demissao-de-jose-eduardo-cardozo-4986338.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://zh.clicrbs.com.br/rs/noticias/noticia/2016/02/presidente-dilma-aceita-pedido-de-demissao-de-jose-eduardo-cardozo-4986338.html&amp;prev=search
URL filtered: https://www.bloomberg.com/news/articles/2017-08-30/cancer-breakthrough-heralds-new-era-of-cures-costs-and-choices


Processing URLs:  84%|████████▍ | 839/1000 [30:37<02:35,  1.04it/s]

Error extracting text from http://www.cse.wustl.edu/~jain/cse571-14/ftp/cyber_espionage/#recent_attacks: HTTPSConnectionPool(host='www.cse.wustl.edu', port=443): Max retries exceeded with url: /~jain/cse571-14/ftp/cyber_espionage/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  84%|████████▍ | 843/1000 [30:45<04:15,  1.63s/it]

Error extracting text from http://postimg.org/image/8jql9pexb/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/8jql9pexb/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023dfa10>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  84%|████████▍ | 844/1000 [30:46<03:44,  1.44s/it]

Error extracting text from https://news.sky.com/story/covid-19-big-numbers-of-coronavirus-patients-in-hospital-after-christmas-due-to-phenomenal-omicron-spread-warns-whitty-12496558).: 404 Client Error: Not Found for url: https://news.sky.com/story/covid-19-big-numbers-of-coronavirus-patients-in-hospital-after-christmas-due-to-phenomenal-omicron-spread-warns-whitty-12496558).


Processing URLs:  85%|████████▍ | 847/1000 [31:49<47:44, 18.72s/it]

Error extracting text from http://www.newsobserver.com/news/local/coal-ash-issue/article79925257.html#storylink=cpy: HTTPConnectionPool(host='www.newsobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  85%|████████▍ | 848/1000 [31:51<35:03, 13.84s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-01-08/tillerson-ethics-plan-foreshadows-knotty-trump-confirmations


Processing URLs:  85%|████████▌ | 851/1000 [32:52<51:20, 20.67s/it]

Error extracting text from http://aa.com.tr/en/politics/iraqi-sunnis-reject-shia-role-in-planned-mosul-offensive/524995: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  85%|████████▌ | 853/1000 [32:54<28:13, 11.52s/it]

Error extracting text from http://www.reuters.com/article/2015/09/03/iran-usa-journalist-idUSL1N1192IW20150903: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/03/iran-usa-journalist-idUSL1N1192IW20150903


Processing URLs:  85%|████████▌ | 854/1000 [32:55<20:35,  8.46s/it]

Error extracting text from http://www.yourmiddleeast.com/news/many-months-before-start-of-battle-for-mosul-coalition_38488: 500 Server Error: Internal Server Error for url: http://www.yourmiddleeast.com/news/many-months-before-start-of-battle-for-mosul-coalition_38488


Processing URLs:  86%|████████▌ | 860/1000 [33:01<04:01,  1.73s/it]

Error extracting text from https://www.yahoo.com/news/french-fighter-jets-mission-against-mosul-063239733.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/french-fighter-jets-mission-against-mosul-063239733.html
Error extracting text from https://www.congress.gov/bill/117th-congress/senate-bill/610/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/117th-congress/senate-bill/610/text


Processing URLs:  86%|████████▌ | 862/1000 [33:07<05:47,  2.52s/it]

URL filtered: https://www.cbsnews.com/news/facebook-oversight-board-first-decisions/


Processing URLs:  87%|████████▋ | 868/1000 [33:14<02:52,  1.31s/it]

Error extracting text from http://www.nytimes.com/2017/07/05/opinion/kim-jong-un-north-korea-sanctions.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/07/05/opinion/kim-jong-un-north-korea-sanctions.html


Processing URLs:  87%|████████▋ | 869/1000 [33:15<02:19,  1.07s/it]

Error extracting text from https://www.predictit.org/: 403 Client Error: Forbidden for url: https://www.predictit.org/


Processing URLs:  87%|████████▋ | 870/1000 [33:15<01:48,  1.20it/s]

Error extracting text from https://fr.news.yahoo.com/russie-d%C3%A9ploy%C3%A9-28-avions-combat-syrie-172632243.html: 404 Client Error: Not Found for url: https://fr.news.yahoo.com/russie-d%C3%A9ploy%C3%A9-28-avions-combat-syrie-172632243.html


Processing URLs:  87%|████████▋ | 872/1000 [33:16<01:10,  1.81it/s]

Error extracting text from http://business.financialpost.com/news/mining/why-canadas-junior-mining-sector-is-going-to-pot-literally: 403 Client Error: Forbidden for url: https://financialpost.com/news/mining/why-canadas-junior-mining-sector-is-going-to-pot-literally
Error extracting text from http://www.nytimes.com/2016/01/05/nyregion/manhattan-apartment-prices-reached-1-15-million-mark-in-2015-reports-say.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/05/nyregion/manhattan-apartment-prices-reached-1-15-million-mark-in-2015-reports-say.html


Processing URLs:  87%|████████▋ | 873/1000 [33:17<01:22,  1.54it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53711#.VxWJK5MrIdU: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53711#.VxWJK5MrIdU


Processing URLs:  88%|████████▊ | 878/1000 [33:21<01:27,  1.40it/s]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/3944184296x0x929284/22C29259-6C19-41AC-9CAB-899D148F323D/TSLA_Update_Letter_2016_4Q.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/3944184296x0x929284/22C29259-6C19-41AC-9CAB-899D148F323D/TSLA_Update_Letter_2016_4Q.pdf


Processing URLs:  88%|████████▊ | 880/1000 [33:25<02:41,  1.34s/it]

Error extracting text from http://www.stripes.com/news/middle-east/un-syria-envoy-only-plan-b-to-talks-is-return-to-war-1.399141: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/middle_east/un-syria-envoy-only-plan-b-to-talks-is-return-to-war-1.399141


Processing URLs:  88%|████████▊ | 882/1000 [33:28<02:39,  1.35s/it]

Error extracting text from http://uk.reuters.com/article/2015/09/02/us-health-polio-ukraine-idUKKCN0R21FJ20150902: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  88%|████████▊ | 885/1000 [33:33<03:00,  1.57s/it]

Error extracting text from http://toyotanews.pressroom.toyota.com/releases/toyota+mirai+owners+jump+future.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/toyota+mirai+owners+jump+future/


Processing URLs:  89%|████████▉ | 890/1000 [33:59<07:59,  4.36s/it]

URL filtered: https://www.youtube.com/watch?v=DHkkR_wM--c&amp;feature=youtu.be


Processing URLs:  89%|████████▉ | 893/1000 [33:59<03:16,  1.84s/it]

Error extracting text from https://www.wsj.com/articles/oil-price-forecasts-again-cut-by-banks-1501826400: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-price-forecasts-again-cut-by-banks-1501826400
Error extracting text from http://www.reuters.com/article/2015/10/30/us-northkorea-nuclear-idUSKCN0SO09N20151030: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/30/us-northkorea-nuclear-idUSKCN0SO09N20151030


Processing URLs:  89%|████████▉ | 894/1000 [33:59<02:27,  1.39s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/12/13/world/ap-cn-canada-marijuana-legalization.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/12/13/world/ap-cn-canada-marijuana-legalization.html?_r=0


Processing URLs:  90%|████████▉ | 896/1000 [34:01<01:51,  1.07s/it]

Error extracting text from http://europe.newsweek.com/nicola-sturgeon-scottish-independence-brexit-indyref-2-theresa-may-567103?utm_source=email&amp;utm_medium=newsletter&amp;utm_campaign=newsletter&amp;utm_content=headline&amp;spMailingID=1462592&amp;spUserID=MTI0NzM2MjYzMDMS1&amp;spJobID=750399981&amp;spReportId=NzUwMzk5OTgxS0: 403 Client Error: Forbidden for url: https://www.newsweek.com/nicola-sturgeon-scottish-independence-brexit-indyref-2-theresa-may-567103


Processing URLs:  90%|████████▉ | 899/1000 [34:03<01:14,  1.36it/s]

Error extracting text from http://m.thenational.ae/world/middle-east/tentative-push-for-mosul-reveals-iraq-armys-failings: 400 Client Error: Bad Request for url: http://m.thenational.ae/world/middle-east/tentative-push-for-mosul-reveals-iraq-armys-failings
Error extracting text from http://www.latimes.com/world/la-fg-north-korea-un-speech-20160923-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-north-korea-un-speech-20160923-snap-story.html


Processing URLs:  90%|█████████ | 900/1000 [34:04<01:10,  1.42it/s]

Error extracting text from https://www.debka.com/us-strike-russians-laid-euphrates-bridge-among-targets/ : HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /us-strike-russians-laid-euphrates-bridge-among-targets/%20 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  90%|█████████ | 902/1000 [34:07<01:47,  1.09s/it]

Error extracting text from http://thehill.com/policy/technology/366733-uber-closes-multi-billion-dollar-deal-with-softbankl: 403 Client Error: Forbidden for url: https://thehill.com/policy/technology/366733-uber-closes-multi-billion-dollar-deal-with-softbankl/


Processing URLs:  90%|█████████ | 903/1000 [34:09<02:14,  1.38s/it]

Error extracting text from http://www.superforecasting.com/phil-tetlock-and-dan-gardner-on-better-learning-through-better-betting/: HTTPSConnectionPool(host='www.superforecasting.com', port=443): Max retries exceeded with url: /phil-tetlock-and-dan-gardner-on-better-learning-through-better-betting/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  90%|█████████ | 905/1000 [34:11<01:27,  1.08it/s]

Error extracting text from http://www.wsj.com/articles/aid-deliveries-in-syria-to-begin-relief-agencies-say-1455715626: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/aid-deliveries-in-syria-to-begin-relief-agencies-say-1455715626


Processing URLs:  91%|█████████ | 906/1000 [34:11<01:07,  1.39it/s]

Error extracting text from https://www.nytimes.com/2017/02/20/world/middleeast/russia-syria-war.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/20/world/middleeast/russia-syria-war.html?_r=0


Processing URLs:  91%|█████████ | 907/1000 [34:12<01:29,  1.04it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-04/north-korea-claims-successful-intercontinental-missile-launch


Processing URLs:  91%|█████████ | 911/1000 [34:19<02:19,  1.57s/it]

Error extracting text from http://www.stripes.com/news/pacific/storm-clouds-gather-over-south-china-sea-ahead-of-key-un-ruling-1.406604: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/asia_pacific/storm-clouds-gather-over-south-china-sea-ahead-of-key-un-ruling-1.406604


Processing URLs:  91%|█████████ | 912/1000 [34:19<01:53,  1.29s/it]

Error extracting text from https://www.un.org/en/ga/about/ropga/credent.shtml: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/about/ropga/credent.shtml


Processing URLs:  92%|█████████▏| 915/1000 [34:25<02:06,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-spain-politics-idUSKCN113064?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-idUSKCN113064?il=0


Processing URLs:  92%|█████████▏| 917/1000 [34:30<02:59,  2.17s/it]

Error extracting text from http://www.opec.org/opec_web/en/1952.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/1952.htm


Processing URLs:  92%|█████████▏| 924/1000 [34:41<02:59,  2.37s/it]

URL filtered: https://twitter.com/Angry_Staffer/status/1503051965944766471


Processing URLs:  93%|█████████▎| 927/1000 [34:41<01:21,  1.11s/it]

Error extracting text from http://www.wsj.com/articles/pope-francis-welcomes-irans-president-to-the-vatican-1453808307: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pope-francis-welcomes-irans-president-to-the-vatican-1453808307


Processing URLs:  93%|█████████▎| 929/1000 [34:44<01:33,  1.31s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3736576/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3736576/


Processing URLs:  93%|█████████▎| 931/1000 [34:46<01:08,  1.00it/s]

Error extracting text from http://www.goalsys.com/books/documents/DESTRUCTION_AND_CREATION.pdf: 404 Client Error: Not Found for url: http://www.goalsys.com/books/documents/DESTRUCTION_AND_CREATION.pdf


Processing URLs:  93%|█████████▎| 932/1000 [34:46<01:00,  1.12it/s]

Error extracting text from https://thehill.com/policy/healthcare/528247-fauci-says-us-could-have-heard-immunity-by-end-of-summer-2021: 403 Client Error: Forbidden for url: https://thehill.com/policy/healthcare/528247-fauci-says-us-could-have-heard-immunity-by-end-of-summer-2021/


Processing URLs:  94%|█████████▎| 936/1000 [34:53<01:19,  1.24s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-tillerson-idUSKBN15E2SW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-tillerson-idUSKBN15E2SW?il=0


Processing URLs:  94%|█████████▎| 937/1000 [34:56<01:45,  1.67s/it]

Error extracting text from http://timesofindia.indiatimes.com/world/us/UN-council-endorses-Syria-peace-plan-no-agreement-on-Assads-fate/articleshow/50240665.cms: 410 Client Error: Gone for url: https://timesofindia.indiatimes.com/world/us/UN-council-endorses-Syria-peace-plan-no-agreement-on-Assads-fate/articleshow/50240665.cms


Processing URLs:  94%|█████████▍| 938/1000 [34:56<01:16,  1.24s/it]

Error extracting text from http://www.efe.com/efe/english/world/brazilian-gov-t-says-no-chance-rousseff-will-resign/50000262-2791107: 404 Client Error: Not Found for url: https://efe.com/efe/english/world/brazilian-gov-t-says-no-chance-rousseff-will-resign/50000262-2791107


Processing URLs:  94%|█████████▍| 943/1000 [35:06<01:45,  1.85s/it]

Error extracting text from http://www.al-monitor.com/pulse/security/2015/11/syria-army-aleppo-offensive-damascus-road.html: 404 Client Error: Not Found for url: https://www.al-monitor.com/security/2015/11/syria-army-aleppo-offensive-damascus-road.html


Processing URLs:  94%|█████████▍| 944/1000 [35:06<01:17,  1.38s/it]

Error extracting text from http://www.wsj.com/articles/iraqi-forces-pause-mosul-advance-1476783097: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-forces-pause-mosul-advance-1476783097


Processing URLs:  94%|█████████▍| 945/1000 [35:09<01:36,  1.76s/it]

Error extracting text from http://elcomercio.pe/politica/gobierno/ipsos-keiko-primera-321-y-ppk-segundo-160-noticia-1889659: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/gobierno/ipsos-keiko-primera-321-y-ppk-segundo-160-noticia-1889659/


Processing URLs:  95%|█████████▍| 948/1000 [35:13<01:14,  1.43s/it]

Error extracting text from https://cleantechnica.com/2020/11/30/renewables-70-of-new-us-power-capacity-in-2020-solar-43/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2020/11/30/renewables-70-of-new-us-power-capacity-in-2020-solar-43/


Processing URLs:  95%|█████████▌| 951/1000 [35:16<00:49,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/04/07/us/politics/neil-gorsuch-supreme-court.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/07/us/politics/neil-gorsuch-supreme-court.html?_r=0


Processing URLs:  95%|█████████▌| 952/1000 [35:17<00:55,  1.15s/it]

Error extracting text from http://inhomelandsecurity.com/navy-warships-get-new-heavy-missile-2500-lb-lrasm/: 403 Client Error: Forbidden for url: https://amuedge.com/navy-warships-get-new-heavy-missile-2500-lb-lrasm/


Processing URLs:  95%|█████████▌| 954/1000 [35:19<00:42,  1.08it/s]

Error extracting text from https://majorriskfactors.substack.com/p/trouble-in-transnistria-04may22?s=w: 404 Client Error: Not Found for url: https://majorriskfactors.substack.com/p/trouble-in-transnistria-04may22?s=w


Processing URLs:  96%|█████████▌| 955/1000 [35:20<00:45,  1.02s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-genocide-being-prepared-un-will-be-too-late-1552054: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-genocide-being-prepared-un-will-be-too-late-1552054


Processing URLs:  96%|█████████▌| 957/1000 [35:24<01:05,  1.53s/it]

URL filtered: https://twitter.com/earlywarnproj
Error extracting text from http://www.portman.senate.gov/public/index.cfm/press-releases?ID=1382DD19-175A-42A8-B7B4-4AA58B036628: HTTPConnectionPool(host='www.portman.senate.gov', port=80): Max retries exceeded with url: /public/index.cfm/press-releases?ID=1382DD19-175A-42A8-B7B4-4AA58B036628 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff3d8a10>: Failed to resolve 'www.portman.senate.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  96%|█████████▌| 961/1000 [35:26<00:31,  1.22it/s]

Error extracting text from https://www.senate.gov/legislative/nominations/SupremeCourtNominations1789present.htm).: 403 Client Error: Forbidden for url: https://www.senate.gov/legislative/nominations/SupremeCourtNominations1789present.htm).


Processing URLs:  96%|█████████▋| 964/1000 [35:29<00:42,  1.18s/it]

Error extracting text from https://af.reuters.com/article/worldNews/idAFKBN1FD20S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  97%|█████████▋| 966/1000 [35:31<00:33,  1.01it/s]

Error extracting text from https://sinocism.com/wang-qishan-reappears-xi-purging-generals-north-korea-crisis-to-worsen-xi-to-be-supreme-leader-最高领袖-fake-china-news-sinocism-09-05-17/: 404 Client Error: Not Found for url: https://sinocism.com/wang-qishan-reappears-xi-purging-generals-north-korea-crisis-to-worsen-xi-to-be-supreme-leader-%E6%9C%80%E9%AB%98%E9%A2%86%E8%A2%96-fake-china-news-sinocism-09-05-17
Error extracting text from https://www.timesofisrael.com/netanyahu-any-gaza-rocket-fire-will-be-met-with-whole-new-level-of-force/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/netanyahu-any-gaza-rocket-fire-will-be-met-with-whole-new-level-of-force/


Processing URLs:  97%|█████████▋| 967/1000 [35:33<00:36,  1.12s/it]

Error extracting text from http://europe.newsweek.com/brexit-second-referendum-nigel-farage-460781: 403 Client Error: Forbidden for url: https://www.newsweek.com/brexit-second-referendum-nigel-farage-460781


Processing URLs:  97%|█████████▋| 974/1000 [35:48<00:55,  2.15s/it]

Error extracting text from http://www.hybridcars.com/toyota-mirai-goes-on-sale-with-2000-preorders/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/toyota-mirai-goes-on-sale-with-2000-preorders/


Processing URLs:  98%|█████████▊| 976/1000 [35:49<00:29,  1.25s/it]

Error extracting text from http://www.nextev.com/en/node/39: HTTPConnectionPool(host='www.nextev.com', port=80): Max retries exceeded with url: /en/node/39 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30451c9e0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.reuters.com/article/us-usa-stocks-infrastructure-idUSKBN16T38F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks-infrastructure-idUSKBN16T38F


Processing URLs:  98%|█████████▊| 978/1000 [35:51<00:23,  1.09s/it]

Error extracting text from http://thehill.com/homenews/senate/363724-gop-senator-on-franken-and-moore-theres-a-difference-between-a-14-year-old: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/363724-gop-senator-on-franken-and-moore-theres-a-difference-between-a-14-year-old/


Processing URLs:  98%|█████████▊| 982/1000 [36:00<00:26,  1.48s/it]

Error extracting text from https://www.reuters.com/article/us-usa-hyperloop-musk/elon-musk-to-compete-to-fund-high-speed-loop-in-chicago-idUSKBN1DU0CF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-hyperloop-musk/elon-musk-to-compete-to-fund-high-speed-loop-in-chicago-idUSKBN1DU0CF


Processing URLs:  99%|█████████▉| 989/1000 [36:32<00:48,  4.43s/it]

Error extracting text from https://beincrypto.com/libra-diem-is-getting-back-on-track-for-2021/: 403 Client Error: Forbidden for url: https://beincrypto.com/libra-diem-is-getting-back-on-track-for-2021/
Error extracting text from http://news.yahoo.com/panama-aims-end-june-much-delayed-canal-expansion-001713762.html: 404 Client Error: Not Found for url: http://news.yahoo.com/panama-aims-end-june-much-delayed-canal-expansion-001713762.html


Processing URLs:  99%|█████████▉| 991/1000 [36:34<00:22,  2.45s/it]

Error extracting text from https://www.nytimes.com/2021/08/12/us/politics/taliban-afghanistan-us-embassy.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/12/us/politics/taliban-afghanistan-us-embassy.html


Processing URLs:  99%|█████████▉| 994/1000 [36:37<00:09,  1.59s/it]

Error extracting text from http://www.tradingeconomics.com/saudi-arabia/foreign-exchange-reserves: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/saudi-arabia/foreign-exchange-reserves


Processing URLs: 100%|█████████▉| 996/1000 [36:42<00:07,  2.00s/it]

Error extracting text from http://www.cnbc.com/2015/10/23/trump-blames-young-intern-for-retweet-that-riled-up-iowans.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/10/23/trump-blames-young-intern-for-retweet-that-riled-up-iowans.html


Processing URLs: 100%|██████████| 1000/1000 [36:47<00:00,  2.21s/it]
Processing URLs:   0%|          | 1/1000 [00:03<1:01:01,  3.67s/it]

URL filtered: https://www.youtube.com/watch?v=DhOvYLOkw4c


Processing URLs:   0%|          | 5/1000 [00:10<31:50,  1.92s/it]  

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20161103/0905200296.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20161103/0905200296.html


Processing URLs:   1%|          | 8/1000 [00:15<29:53,  1.81s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/25358-fifth-quadrilateral-meeting-held-in-islamabad: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/25358-fifth-quadrilateral-meeting-held-in-islamabad


Processing URLs:   1%|          | 10/1000 [00:16<19:26,  1.18s/it]

Error extracting text from http://theresurgent.com/trump-and-cruz-are-neck-and-neck-no-one-else-has-a-chance/: 530 Server Error:  for url: https://theresurgent.com/trump-and-cruz-are-neck-and-neck-no-one-else-has-a-chance/


Processing URLs:   2%|▏         | 17/1000 [00:25<23:44,  1.45s/it]

Error extracting text from http://europe.newsweek.com/election-turnout-great-uncertainty-brexit-461987?spMailingID=426318&amp;spUserID=MTI0NzI1NTA5ODES1&amp;spJobID=550132621&amp;spReportId=NTUwMTMyNjIxS0: 403 Client Error: Forbidden for url: https://www.newsweek.com/election-turnout-great-uncertainty-brexit-461987


Processing URLs:   2%|▏         | 20/1000 [00:27<13:48,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-gulf-qatar-idUSKBN18O0HA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-gulf-qatar-idUSKBN18O0HA
Error extracting text from https://www.nbcnews.com/science/science-news/lab-leak-theory-science-scientists-rcna1191: 403 Client Error: Forbidden for url: https://www.nbcnews.com/science/science-news/lab-leak-theory-science-scientists-rcna1191


Processing URLs:   2%|▏         | 23/1000 [00:32<20:01,  1.23s/it]

Error extracting text from http://www.adweek.com/news/press/time-inc-reorganizes-4-distinct-brand-groups-172639: 403 Client Error: Forbidden for url: https://www.adweek.com/news/press/time-inc-reorganizes-4-distinct-brand-groups-172639


Processing URLs:   2%|▏         | 24/1000 [00:32<16:38,  1.02s/it]

Error extracting text from http://www.arirang.co.kr/News/News_View.asp?nseq=186514: 404 Client Error:  for url: http://www.arirang.co.kr/News/News_View.asp?nseq=186514
URL filtered: https://twitter.com/elonmusk/status/1345208391958888448?s=20


Processing URLs:   3%|▎         | 28/1000 [00:36<15:25,  1.05it/s]

Error extracting text from http://www.nasdaq.com/article/brexit-poll-conspiracy-cm626239: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/brexit-poll-conspiracy-cm626239


Processing URLs:   3%|▎         | 31/1000 [00:39<16:49,  1.04s/it]

Error extracting text from http://www.ibtimes.com/amid-chinese-military-aggression-russia-delivers-submarine-vietnam-defense-over-south-1989783: 403 Client Error: Forbidden for url: https://www.ibtimes.com/amid-chinese-military-aggression-russia-delivers-submarine-vietnam-defense-over-south-1989783


Processing URLs:   3%|▎         | 32/1000 [00:40<15:03,  1.07it/s]

Error extracting text from http://www.iran-daily.com/News/133536.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:   3%|▎         | 33/1000 [00:41<15:40,  1.03it/s]

Error extracting text from http://www.nhregister.com/article/NH/20160204/NEWS/160209810: 403 Client Error: Forbidden for url: https://www.nhregister.com/article/NH/20160204/NEWS/160209810


Processing URLs:   3%|▎         | 34/1000 [00:42<18:19,  1.14s/it]

Error extracting text from http://www.aol.com/article/2015/12/10/donald-trump-just-took-his-most-commanding-lead-yet-in-a-new-pol/21281559/?icid=maing-fluid%7Camp-bon%7Cdl1%7Csec1_lnk2%26pLid%3D-1620624431: 404 Client Error: Not Found for url: https://www.aol.com/article/2015/12/10/donald-trump-just-took-his-most-commanding-lead-yet-in-a-new-pol/21281559/?icid=maing-fluid%7Camp-bon%7Cdl1%7Csec1_lnk2%26pLid%3D-1620624431


Processing URLs:   4%|▎         | 36/1000 [00:43<10:43,  1.50it/s]

Error extracting text from https://www.nytimes.com/2021/07/04/technology/tech-cold-war-chips.html?referringSource=articleShare: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/04/technology/tech-cold-war-chips.html?referringSource=articleShare
Error extracting text from https://medium.com/waymo/michigan-is-waymos-winter-wonderland-9b3cffbb9bab: 403 Client Error: Forbidden for url: https://medium.com/waymo/michigan-is-waymos-winter-wonderland-9b3cffbb9bab


Processing URLs:   4%|▍         | 43/1000 [02:51<6:52:05, 25.84s/it]

Error extracting text from https://www.sensecy.com/: HTTPSConnectionPool(host='www.sensecy.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x302f18290>, 'Connection to www.sensecy.com timed out. (connect timeout=60)'))
Error extracting text from https://trends.google.com/trends/explore?date=all&q=Effective%20Altruism: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?date=all&q=Effective%20Altruism


Processing URLs:   5%|▍         | 46/1000 [02:55<2:34:50,  9.74s/it]

Error extracting text from http://emarketalerts.forecast1.com/mic/eabstract.cfm: HTTPConnectionPool(host='emarketalerts.forecast1.com', port=80): Max retries exceeded with url: /mic/eabstract.cfm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302f18f80>: Failed to resolve 'emarketalerts.forecast1.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▍         | 47/1000 [03:56<6:40:23, 25.21s/it]

Error extracting text from http://english.irib.ir/news/iran1/item/220825-iran-draws-red-line-on-missile-program: HTTPConnectionPool(host='english.irib.ir', port=80): Max retries exceeded with url: /news/iran1/item/220825-iran-draws-red-line-on-missile-program (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x302f18860>, 'Connection to english.irib.ir timed out. (connect timeout=60)'))


Processing URLs:   5%|▍         | 48/1000 [03:58<4:46:58, 18.09s/it]

Error extracting text from http://www.newsweek.com/iranian-president-rouhani-visit-france-november-report-372684: 403 Client Error: Forbidden for url: https://www.newsweek.com/iranian-president-rouhani-visit-france-november-report-372684


Processing URLs:   5%|▌         | 51/1000 [04:01<1:48:19,  6.85s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-goldmansachs-congress-idUSKBN18Q2ND: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-goldmansachs-congress-idUSKBN18Q2ND
Error extracting text from https://www.reuters.com/article/us-usa-trump-dhs/trump-to-nominate-nielsen-as-homeland-security-secretary-official-idUSKBN1CG2EH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-dhs/trump-to-nominate-nielsen-as-homeland-security-secretary-official-idUSKBN1CG2EH


Processing URLs:   6%|▌         | 55/1000 [04:04<39:26,  2.50s/it]  

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0XQ1QW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0XQ1QW


Processing URLs:   6%|▌         | 56/1000 [04:04<29:50,  1.90s/it]

Error extracting text from https://www.nytimes.com/2015/03/18/world/asia/afghan-militia-leaders-empowered-by-us-to-fight-taliban-inspire-fear-in-villages.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/03/18/world/asia/afghan-militia-leaders-empowered-by-us-to-fight-taliban-inspire-fear-in-villages.html


Processing URLs:   6%|▌         | 59/1000 [04:10<28:33,  1.82s/it]

URL filtered: https://twitter.com/esa_webb/status/1472197587985911809
URL filtered: https://twitter.com/ESA_Webb/status/1469970396015476738


Processing URLs:   7%|▋         | 67/1000 [04:17<15:39,  1.01s/it]

Error extracting text from http://www.rferl.org/content/russia-rt-tv-cluster-bombs-syria/27809812.html: 403 Client Error: Forbidden for url: http://www.rferl.org/content/russia-rt-tv-cluster-bombs-syria/27809812.html


Processing URLs:   7%|▋         | 68/1000 [04:20<20:45,  1.34s/it]

Error extracting text from http://uk.reuters.com/article/venezuela-pdvsa-debt-idUKL1N1CS00F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   7%|▋         | 74/1000 [04:35<31:10,  2.02s/it]

Error extracting text from https://thehill.com/homenews/campaign/529476-fewer-than-one-quarter-of-republicans-trust-election-results-poll: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/529476-fewer-than-one-quarter-of-republicans-trust-election-results-poll/


Processing URLs:   8%|▊         | 78/1000 [04:40<21:52,  1.42s/it]

Error extracting text from https://www.nytimes.com/2018/01/26/opinion/sunday/united-states-afghanistan-win.html?rref=collection%2Fsectioncollection%2Fopinion-contributors&amp;action=click&amp;contentCo: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/26/opinion/sunday/united-states-afghanistan-win.html?rref=collection%2Fsectioncollection%2Fopinion-contributors&amp;action=click&amp;contentCo


Processing URLs:   8%|▊         | 79/1000 [04:42<24:04,  1.57s/it]

Error extracting text from http://afkinsider.com/116544/opinion-rising-instability-calls-question-ethiopia-rising-status/: 403 Client Error: Forbidden for url: http://afkinsider.com/116544/opinion-rising-instability-calls-question-ethiopia-rising-status/


Processing URLs:   8%|▊         | 84/1000 [04:51<25:52,  1.69s/it]



Processing URLs:   9%|▊         | 86/1000 [04:54<25:06,  1.65s/it]

Error extracting text from https://news.antiwar.com/2022/04/01/un-says-two-month-yemen-ceasefire-agree-upon-will-ease-blockade/: 403 Client Error: Forbidden for url: https://news.antiwar.com/2022/04/01/un-says-two-month-yemen-ceasefire-agree-upon-will-ease-blockade/


Processing URLs:   9%|▉         | 89/1000 [05:00<28:55,  1.91s/it]

Error extracting text from http://www.cfr.org/africa-sub-saharan/sub-saharan-security-tracker/p37884: 403 Client Error: Forbidden for url: https://www.cfr.org/africa-sub-saharan/sub-saharan-security-tracker/p37884


Processing URLs:   9%|▉         | 92/1000 [05:05<23:38,  1.56s/it]



Processing URLs:  10%|▉         | 95/1000 [05:10<22:41,  1.50s/it]

Error extracting text from https://taas.technology/abstracts: 500 Server Error: Internal Server Error for url: https://taas.technology/abstracts
Error extracting text from http://www.reuters.com/article/us-tesla-assemblyline-idUSKBN17Q0DE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-assemblyline-idUSKBN17Q0DE
Error extracting text from http://www.rferl.org/content/russia-successful-icbm-test/27422697.html: 403 Client Error: Forbidden for url: http://www.rferl.org/content/russia-successful-icbm-test/27422697.html


Processing URLs:  10%|█         | 104/1000 [05:30<22:16,  1.49s/it]

Error extracting text from https://www.wsj.com/articles/scientists-discover-new-state-of-matter-dubbed-time-crystals-1488998924?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/scientists-discover-new-state-of-matter-dubbed-time-crystals-1488998924?mod=e2fb


Processing URLs:  11%|█         | 106/1000 [05:49<1:07:37,  4.54s/it]

Error extracting text from http://www.consilium.europa.eu/en/policies/sanctions/different-types/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/policies/sanctions/different-types/


Processing URLs:  11%|█         | 109/1000 [05:51<29:41,  2.00s/it]  

Error extracting text from http://thehill.com/homenews/administration/363333-trump-plugs-roy-moore-in-front-of-jeff-flake: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/363333-trump-plugs-roy-moore-in-front-of-jeff-flake/


Processing URLs:  11%|█         | 112/1000 [05:54<19:21,  1.31s/it]

Error extracting text from https://www.google.com/amp/s/www.wionews.com/world/erdogans-turkey-spends-beyond-its-means-as-lira-loses-value-351892/amp: 403 Client Error: Forbidden for url: https://www.wionews.com/world/erdogans-turkey-spends-beyond-its-means-as-lira-loses-value-351892/amp


Processing URLs:  11%|█▏        | 113/1000 [05:58<31:57,  2.16s/it]

Error extracting text from http://www.rferl.org/content/afghanistan-new-interior-minister-attorney-general/27664022.html: 403 Client Error: Forbidden for url: http://www.rferl.org/content/afghanistan-new-interior-minister-attorney-general/27664022.html


Processing URLs:  12%|█▏        | 115/1000 [05:58<18:39,  1.27s/it]

Error extracting text from https://www.newsletter.co.uk/news/politics/sam-mcbride-deep-instability-now-looms-in-northern-ireland-and-the-dup-civil-war-is-central-to-it-3269687: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/politics/sam-mcbride-deep-instability-now-looms-in-northern-ireland-and-the-dup-civil-war-is-central-to-it-3269687


Processing URLs:  12%|█▏        | 116/1000 [05:59<17:16,  1.17s/it]

URL filtered: https://twitter.com/realdonaldtrump/status/948194400114487296


Processing URLs:  12%|█▏        | 119/1000 [06:01<12:05,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN1AE0RW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN1AE0RW


Processing URLs:  12%|█▏        | 122/1000 [06:15<34:53,  2.38s/it]

Error extracting text from http://www.timesofisrael.com/iran-reformists-hold-rally-as-election-campaigns-kick-off/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/iran-reformists-hold-rally-as-election-campaigns-kick-off/


Processing URLs:  13%|█▎        | 127/1000 [06:21<21:11,  1.46s/it]

Error extracting text from https://regfollower.com/2016/12/27/the-9th-round-of-china-gcc-fta-negotiation-concluded/: 406 Client Error: Not Acceptable for url: https://regfollower.com/2016/12/27/the-9th-round-of-china-gcc-fta-negotiation-concluded/
URL filtered: https://www.bloomberg.com/politics/articles/2017-04-18/theresa-may-seeks-snap-u-k-elections-on-june-8


Processing URLs:  13%|█▎        | 131/1000 [06:24<13:38,  1.06it/s]

Error extracting text from http://gazettereview.com/2016/01/volkswagen-ag-adr-otcmktsvlkay-sees-sales-decline-in-u-s-for-december/: 403 Client Error: Forbidden for url: http://gazettereview.com/2016/01/volkswagen-ag-adr-otcmktsvlkay-sees-sales-decline-in-u-s-for-december/
URL filtered: https://www.youtube.com/watch?v=4ECvRtSEUEs


Processing URLs:  13%|█▎        | 134/1000 [06:27<11:23,  1.27it/s]

Error extracting text from http://www.nytimes.com/2015/11/18/world/europe/russia-plane-crash-bomb.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/18/world/europe/russia-plane-crash-bomb.html


Processing URLs:  14%|█▎        | 137/1000 [06:29<12:11,  1.18it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53091#.Vr7CYJMrIdU: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53091#.Vr7CYJMrIdU


Processing URLs:  14%|█▍        | 140/1000 [06:34<17:16,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN17F2TO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN17F2TO


Processing URLs:  14%|█▍        | 142/1000 [06:38<24:48,  1.73s/it]

Error extracting text from http://nation.com.pk/international/24-Jun-2016/kim-says-new-missile-can-strike-us-pacific-bases: 503 Server Error: Backend fetch failed for url: https://www.nation.com.pk/international/24-Jun-2016/kim-says-new-missile-can-strike-us-pacific-bases


Processing URLs:  15%|█▍        | 149/1000 [06:53<41:08,  2.90s/it]

Error extracting text from http://www.ntnews.com.au/news/breaking-news/pm-at-white-house-for-talks-with-obama/news-story/14849e236d312f9fd5965e6b757eef03: 404 Client Error: Not Found for url: https://www.ntnews.com.au/404.php


Processing URLs:  15%|█▌        | 150/1000 [06:55<36:44,  2.59s/it]

URL filtered: https://www.youtube.com/watch?v=ETN9eNOA6vw


Processing URLs:  16%|█▌        | 155/1000 [06:58<14:34,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-indonesia-idUSKCN0Z60LU?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-indonesia-idUSKCN0Z60LU?il=0


Processing URLs:  16%|█▌        | 156/1000 [07:01<19:47,  1.41s/it]

Error extracting text from http://tass.ru/en/politics/845931: 404 Client Error: Not Found for url: https://tass.ru/en/politics/845931


Processing URLs:  16%|█▌        | 158/1000 [07:02<14:34,  1.04s/it]

Error extracting text from http://www.newsweek.com/us-losing-so-badly-afghanistan-trump-administration-hid-figures-796466: 403 Client Error: Forbidden for url: https://www.newsweek.com/us-losing-so-badly-afghanistan-trump-administration-hid-figures-796466
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN0TS0H420151209: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN0TS0H420151209


Processing URLs:  16%|█▌        | 159/1000 [07:04<16:23,  1.17s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-nireland-idUKKBN16K28E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  16%|█▌        | 161/1000 [07:09<23:05,  1.65s/it]

Error extracting text from http://theiranproject.com/blog/2016/04/28/anger-rises-ruling-iran-assets/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=anger-rises-ruling-iran-assets
Error extracting text from http://www.reuters.com/article/2015/08/12/us-iea-oil-idUSKCN0QH0VB20150812: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/08/12/us-iea-oil-idUSKCN0QH0VB20150812


Processing URLs:  16%|█▋        | 165/1000 [07:23<40:22,  2.90s/it]

Error extracting text from http://www.transparency.org/news/pressrelease/panama_papers_expose_uk_role_in_global_corruption: 404 Client Error: Not Found for url: https://www.transparency.org/en/en/press/panama-papers-expose-uk-role-in-global-corruption


Processing URLs:  17%|█▋        | 166/1000 [07:24<29:31,  2.12s/it]

Error extracting text from http://www.wsj.com/articles/will-trump-drop-out-1459964740: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/will-trump-drop-out-1459964740


Processing URLs:  17%|█▋        | 170/1000 [07:27<17:00,  1.23s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-22/russia-retailers-run-out-of-iphones-as-buyers-eye-ruble-plunge?cmpid=wsdemand


Processing URLs:  18%|█▊        | 176/1000 [07:37<23:10,  1.69s/it]

Error extracting text from http://www.holyblasphemy.net/church-files-lawsuit-against-jk-rowling-harry-potter-plagiarizes-bible/: 403 Client Error: Forbidden for url: http://www.holyblasphemy.net/church-files-lawsuit-against-jk-rowling-harry-potter-plagiarizes-bible/
URL filtered: https://www.youtube.com/watch?v=iDYpRhoZqBY


Processing URLs:  18%|█▊        | 178/1000 [07:37<13:48,  1.01s/it]

Error extracting text from http://www.opec.org/opec_web/en/publications/338.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/publications/338.htm


Processing URLs:  18%|█▊        | 179/1000 [07:39<14:19,  1.05s/it]

Error extracting text from https://www.gisreportsonline.com/the-risks-of-german-unilateralism-on-nord-stream-2,energy,2213.html: 404 Client Error: Not Found for url: https://www.gisreportsonline.com/the-risks-of-german-unilateralism-on-nord-stream-2,energy,2213.html


Processing URLs:  18%|█▊        | 182/1000 [07:50<35:09,  2.58s/it]

Error extracting text from http://www.reuters.com/article/us-usa-economy-nyfed-idUSKCN0ZO1UJ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-economy-nyfed-idUSKCN0ZO1UJ?il=0


Processing URLs:  18%|█▊        | 183/1000 [07:52<34:51,  2.56s/it]

URL filtered: https://twitter.com/realDonaldTrump


Processing URLs:  18%|█▊        | 185/1000 [08:02<47:46,  3.52s/it]

URL filtered: https://www.facebook.com/zuck/posts/10112681480907401


Processing URLs:  19%|█▉        | 188/1000 [08:03<24:52,  1.84s/it]

Error extracting text from http://www.washingtontimes.com/news/2016/feb/24/us-volunteers-set-up-field-hospitals-to-treat-kurd/?page=all: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/feb/24/us-volunteers-set-up-field-hospitals-to-treat-kurd/?page=all


Processing URLs:  19%|█▉        | 190/1000 [08:06<24:08,  1.79s/it]

Error extracting text from https://www.reuters.com/article/saudi-finances/update-1-saudi-arabia-to-push-back-balanced-budget-goal-to-2023-sources-say-idUSL8N1N84M7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/saudi-finances/update-1-saudi-arabia-to-push-back-balanced-budget-goal-to-2023-sources-say-idUSL8N1N84M7


Processing URLs:  19%|█▉        | 192/1000 [08:07<17:25,  1.29s/it]

Error extracting text from http://asia.nikkei.com/Japan-Update/Abe-agrees-with-Putin-to-arrange-unofficial-visit-to-Russia: 404 Client Error: Not Found for url: https://asia.nikkei.com/Japan-Update/Abe-agrees-with-Putin-to-arrange-unofficial-visit-to-Russia


Processing URLs:  19%|█▉        | 194/1000 [08:10<17:08,  1.28s/it]

Error extracting text from http://thehill.com/policy/energy-environment/249804-senators-vote-to-lift-oil-export-ban: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/249804-senators-vote-to-lift-oil-export-ban/


Processing URLs:  20%|█▉        | 199/1000 [08:30<37:34,  2.82s/it]

Error extracting text from https://www.theregister.co.uk/2017/09/25/showtime_hit_with_coinmining_script/: 403 Client Error: Forbidden for url: https://www.theregister.com/2017/09/25/showtime_hit_with_coinmining_script/


Processing URLs:  20%|██        | 200/1000 [08:34<40:51,  3.06s/it]

URL filtered: https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw/playlists


Processing URLs:  21%|██        | 206/1000 [08:42<22:07,  1.67s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-un-idUSKBN17Y2AG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-un-idUSKBN17Y2AG


Processing URLs:  21%|██        | 211/1000 [09:06<42:57,  3.27s/it]  

Error extracting text from https://www.yardeni.com/pub/sp500corrbeartables.pdf: 403 Client Error: Forbidden for url: https://yardeni.com/our-charts/


Processing URLs:  21%|██▏       | 213/1000 [09:08<29:57,  2.28s/it]

Error extracting text from http://www.timesofisrael.com/paris-peace-parley-proceeding-as-planned-france-insists/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/paris-peace-parley-proceeding-as-planned-france-insists/


Processing URLs:  22%|██▏       | 217/1000 [09:32<1:23:30,  6.40s/it]

Error extracting text from https://flipboard.com/article/putin-s-probably-not-going-in-january-but-that-won-t-stop-the-rumour-mill/a-KqKa64UETfSCqWkqVJfybA%3Aa%3A3199678-b9862c9866%2Fco.uk: 404 Client Error: Not Found for url: https://flipboard.com/article/putin-s-probably-not-going-in-january-but-that-won-t-stop-the-rumour-mill/a-KqKa64UETfSCqWkqVJfybA%3Aa%3A3199678-b9862c9866%2Fco.uk


Processing URLs:  22%|██▏       | 219/1000 [09:35<51:40,  3.97s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/draghi-says-ecb-to-extend-qe-by-six-months-with-broader-buying


Processing URLs:  22%|██▏       | 222/1000 [09:36<23:59,  1.85s/it]

Error extracting text from http://mobile.reuters.com/article/newsOne/idUSKCN11T0XI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/newsOne/idUSKCN11T0XI
Error extracting text from http://www.oneindia.com/india/38-attacks-on-indian-army-is-india-doing-enough-to-stop-pak-offensive-2419266.html: 403 Client Error: Forbidden for url: http://www.oneindia.com/india/38-attacks-on-indian-army-is-india-doing-enough-to-stop-pak-offensive-2419266.html


Processing URLs:  23%|██▎       | 229/1000 [09:48<22:20,  1.74s/it]

Error extracting text from https://www.rferl.org/a/us-military-chief-dunford-says-recommends-providing-ukraine-lethal-defensive-aid/28759423.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/us-military-chief-dunford-says-recommends-providing-ukraine-lethal-defensive-aid/28759423.html


Processing URLs:  23%|██▎       | 232/1000 [09:50<13:35,  1.06s/it]

Error extracting text from https://www.reuters.com/business/energy/us-shale-restraint-pushes-oil-prices-multi-year-high-kemp-2021-06-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/us-shale-restraint-pushes-oil-prices-multi-year-high-kemp-2021-06-04/
Error extracting text from http://www.nbcnews.com/politics/first-read/gop-health-care-effort-unraveling-n733761: 403 Client Error: Forbidden for url: http://www.nbcnews.com/politics/first-read/gop-health-care-effort-unraveling-n733761


Processing URLs:  23%|██▎       | 234/1000 [09:54<16:50,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-model-idUSKCN0WY3JK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-model-idUSKCN0WY3JK


Processing URLs:  24%|██▎       | 235/1000 [09:55<14:42,  1.15s/it]

Error extracting text from http://www.baltimoresun.com/news/maryland/crime/bs-md-ci-syed-hearing-day-three-20160205-story.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/news/maryland/crime/bs-md-ci-syed-hearing-day-three-20160205-story.html
URL filtered: https://twitter.com/CJTFOIR/status/804354586995806208


Processing URLs:  24%|██▍       | 238/1000 [09:57<12:31,  1.01it/s]

Error extracting text from http://af.reuters.com/article/drcNews/idAFL2N15J24G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  24%|██▍       | 239/1000 [09:58<10:21,  1.23it/s]

Error extracting text from http://www.wsj.com/articles/u-s-sanctions-iranian-defense-firms-revolutionary-guard-units-for-missile-tests-1458837204: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-sanctions-iranian-defense-firms-revolutionary-guard-units-for-missile-tests-1458837204
Error extracting text from http://www.nasdaq.com/markets/crude-oil.aspx?timeframe=3y: 403 Client Error: Forbidden for url: http://www.nasdaq.com/markets/crude-oil.aspx?timeframe=3y


Processing URLs:  24%|██▍       | 242/1000 [10:00<10:02,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-juncker-idUSKBN15Q0J4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-juncker-idUSKBN15Q0J4


Processing URLs:  24%|██▍       | 244/1000 [10:01<08:49,  1.43it/s]

Error extracting text from https://www.google.ch/amp/s/www.wsj.com/amp/articles/gm-and-zoox-robot-cars-battle-it-out-in-san-francisco-1511960403: 403 Client Error: Forbidden for url: https://www.wsj.com/amp/articles/gm-and-zoox-robot-cars-battle-it-out-in-san-francisco-1511960403


Processing URLs:  25%|██▍       | 247/1000 [10:02<04:58,  2.52it/s]

Error extracting text from https://www.barrons.com/articles/amazon-is-a-free-cash-rocket-ship-time-to-jump-on-board-51578099601: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/amazon-is-a-free-cash-rocket-ship-time-to-jump-on-board-51578099601
Error extracting text from http://www.vanguardngr.com/2016/03/imn-drags-nigeria-army-icc-genocide/He: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/03/imn-drags-nigeria-army-icc-genocide/He


Processing URLs:  25%|██▌       | 251/1000 [10:08<12:45,  1.02s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-bill-idUSKBN19M3UZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-bill-idUSKBN19M3UZ
URL filtered: https://www.bloomberg.com/quote/CO1:COM?sref=i2Bc5OtW


Processing URLs:  25%|██▌       | 254/1000 [10:09<07:19,  1.70it/s]

Error extracting text from http://www.nytimes.com/2015/10/06/business/energy-environment/oil-industry-gaining-in-push-for-repeal-of-us-ban-on-petroleum-exports.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/06/business/energy-environment/oil-industry-gaining-in-push-for-repeal-of-us-ban-on-petroleum-exports.html


Processing URLs:  26%|██▌       | 256/1000 [10:12<12:00,  1.03it/s]

Error extracting text from http://www.reuters.com/article/2015/11/27/panama-canal-idUSL1N13M01V20151127#zoSUVLy7p0iGmYTK.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/27/panama-canal-idUSL1N13M01V20151127#zoSUVLy7p0iGmYTK.99


Processing URLs:  26%|██▌       | 257/1000 [10:15<18:20,  1.48s/it]

Error extracting text from http://www.foxnews.com/world/2016/04/07/mosul-siege-stalled-as-iraqi-army-once-again-flees-when-bullets-fly-say-sources.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/world/2016/04/07/mosul-siege-stalled-as-iraqi-army-once-again-flees-when-bullets-fly-say-sources.html
URL filtered: https://www.youtube.com/watch?v=Dc4Qbp65hlc#t=1m13s


Processing URLs:  26%|██▌       | 260/1000 [10:16<09:40,  1.27it/s]

URL filtered: http://nymag.com/daily/intelligencer/2017/09/mueller-investigation-into-facebook-ads-may-be-a-big-deal.html
URL filtered: http://www.bloomberg.com/politics/articles/2016-10-17/trump-asserts-wide-voter-fraud-that-experts-say-is-unfounded


Processing URLs:  27%|██▋       | 266/1000 [10:24<15:03,  1.23s/it]

Error extracting text from https://www.nytimes.com/2021/08/20/world/asia/korea-china-election-young-voters.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/20/world/asia/korea-china-election-young-voters.html


Processing URLs:  27%|██▋       | 268/1000 [10:26<13:05,  1.07s/it]

Error extracting text from https://www.reuters.com/business/environment/worst-brazil-drought-20-years-up-pressure-power-grid-official-2021-05-10/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/environment/worst-brazil-drought-20-years-up-pressure-power-grid-official-2021-05-10/


Processing URLs:  27%|██▋       | 269/1000 [11:26<3:24:55, 16.82s/it]

Error extracting text from http://www.miamiherald.com/news/local/news-columns-blogs/andres-oppenheimer/article33568107.html&gt: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 271/1000 [11:28<1:51:10,  9.15s/it]

Error extracting text from http://www.who.int/emergencies/crises/cod/drc-donor-alert-ebola-10may2018.pdf?ua=1%3E: 404 Client Error: Not Found for url: https://www.who.int/emergencies/crises/cod/drc-donor-alert-ebola-10may2018.pdf?ua=1%3E


Processing URLs:  28%|██▊       | 276/1000 [12:35<2:59:18, 14.86s/it]

Error extracting text from https://dc.isda.org/credit-default-swaps-management/: HTTPSConnectionPool(host='dc.isda.org', port=443): Max retries exceeded with url: /credit-default-swaps-management/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fefc6630>, 'Connection to dc.isda.org timed out. (connect timeout=60)'))
Error extracting text from http://www.nytimes.com/2016/05/18/world/europe/syria-truce-kerry-assad-airdrop.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/18/world/europe/syria-truce-kerry-assad-airdrop.html


Processing URLs:  28%|██▊       | 277/1000 [12:37<2:12:29, 11.00s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2016/Sep-09/371373-al-qaeda-chief-threatens-thousands-of-911-attacks.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Sep-09/371373-al-qaeda-chief-threatens-thousands-of-911-attacks.ashx


Processing URLs:  28%|██▊       | 280/1000 [12:41<53:18,  4.44s/it]  

Error extracting text from http://inhabitat.com/hyundai-hydrogen-powered-car-sets-new-eco-friendly-vehicle-record/: 403 Client Error: Forbidden for url: https://inhabitat.com/hyundai-hydrogen-powered-car-sets-new-eco-friendly-vehicle-record/


Processing URLs:  29%|██▊       | 287/1000 [13:00<22:35,  1.90s/it]  

Error extracting text from https://bit.ly/3xT3Ujm: 403 Client Error: Forbidden for url: https://www.conservativewoman.co.uk/the-stooge-in-number-ten/


Processing URLs:  29%|██▉       | 289/1000 [13:02<16:01,  1.35s/it]

Error extracting text from https://news.yahoo.com/d-day-italian-government-pm-conte-resignation-mooted-023109857.html: 404 Client Error: Not Found for url: https://news.yahoo.com/d-day-italian-government-pm-conte-resignation-mooted-023109857.html


Processing URLs:  29%|██▉       | 290/1000 [13:03<17:27,  1.48s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html


Processing URLs:  29%|██▉       | 291/1000 [13:06<21:22,  1.81s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-13/vix-could-double-on-geopolitics-and-central-banks-natixis-says


Processing URLs:  29%|██▉       | 294/1000 [13:08<12:31,  1.06s/it]

Error extracting text from https://goo.gl/hfmGQV: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#q=Uk%20Should%20leave%20EU%2C%20UK%20Stay%20in%20EU%2C%20UK%20Stay%20EU&date=today%2012-m&cmpt=q&tz=Etc%2FGMT%2B5


Processing URLs:  30%|██▉       | 295/1000 [13:09<12:21,  1.05s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/company-dakota-access-pipeline-track-start-week-46265658: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/company-dakota-access-pipeline-track-start-week-46265658


Processing URLs:  30%|██▉       | 296/1000 [13:09<09:48,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-usa-pentagon-budget-islamic-state-idUSKCN0VA3JShttps://www.gjopen.com/questions/101-will-anti-islamic-state-forces-retake-mosul-before-1-january-2017#: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-pentagon-budget-islamic-state-idUSKCN0VA3JShttps://www.gjopen.com/questions/101-will-anti-islamic-state-forces-retake-mosul-before-1-january-2017


Processing URLs:  30%|██▉       | 299/1000 [13:14<16:20,  1.40s/it]

Error extracting text from http://www.gov.me/en/News/156306/DPM-Luksic-receives-invitation-letter.html: 404 Client Error: not found for url: https://www.gov.me/en/News/156306/DPM-Luksic-receives-invitation-letter.html


Processing URLs:  30%|███       | 300/1000 [13:45<1:57:25, 10.07s/it]

Error extracting text from http://www.futurecar.com/article-366-1.html: 522 Server Error:  for url: https://www.futurecar.com/article-366-1.html


Processing URLs:  30%|███       | 302/1000 [14:46<4:28:45, 23.10s/it]

Error extracting text from https://archive.is/uYxjq: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  30%|███       | 303/1000 [14:47<3:11:39, 16.50s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0WZ0J6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0WZ0J6


Processing URLs:  31%|███       | 306/1000 [14:51<1:14:23,  6.43s/it]

Error extracting text from https://www.espn.com/olympics/story/_/id/28941868/line-major-announcements-leading-postponement-2020-summer-olympics: 403 Client Error: Forbidden for url: https://www.espn.com/olympics/story/_/id/28941868/line-major-announcements-leading-postponement-2020-summer-olympics


Processing URLs:  31%|███       | 308/1000 [14:53<43:48,  3.80s/it]  

Error extracting text from http://www.comres.co.uk/polls/the-sun-eu-referendum-poll/: 403 Client Error: Forbidden for url: http://comresglobal.com/polls/the-sun-eu-referendum-poll/


Processing URLs:  31%|███       | 311/1000 [14:56<22:13,  1.94s/it]

Error extracting text from https://news.lift.co/liberals-seeking-limit-debate-cannabis-act/: HTTPSConnectionPool(host='news.lift.co', port=443): Max retries exceeded with url: /liberals-seeking-limit-debate-cannabis-act/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  31%|███▏      | 314/1000 [14:59<15:54,  1.39s/it]

Error extracting text from http://www.ibtimes.com/brexit-polls-2016-leave-eu-support-drops-ahead-referendum-vote-citizens-debate-pros-2321077: 403 Client Error: Forbidden for url: https://www.ibtimes.com/brexit-polls-2016-leave-eu-support-drops-ahead-referendum-vote-citizens-debate-pros-2321077


Processing URLs:  32%|███▏      | 316/1000 [15:02<14:27,  1.27s/it]

Error extracting text from http://www.breakingnews.com/topic/iraq-political-turmoil-2016/: 403 Client Error: Forbidden for url: https://www.nbcnews.com


Processing URLs:  32%|███▏      | 318/1000 [15:04<14:19,  1.26s/it]

Error extracting text from http://www.alternet.org/rss/breaking_news/242905/china_taking_&#39;more_aggressive&#39;_stance_at_sea%3A_us_admiral: 404 Client Error: Not Found for url: https://www.alternet.org/rss/breaking_news/242905/china_taking_&#39;more_aggressive&%2339;_stance_at_sea%3A_us_admiral


Processing URLs:  32%|███▏      | 319/1000 [15:04<10:45,  1.05it/s]

Error extracting text from https://www.nytimes.com/2017/02/22/us/a-deadline-looms-for-dakota-protesters-to-leave-campsite.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/22/us/a-deadline-looms-for-dakota-protesters-to-leave-campsite.html


Processing URLs:  32%|███▏      | 320/1000 [15:04<08:23,  1.35it/s]

Error extracting text from https://www.nytimes.com/2017/11/21/technology/fcc-repeal-net-neutrality.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/21/technology/fcc-repeal-net-neutrality.html


Processing URLs:  32%|███▎      | 325/1000 [15:08<05:52,  1.91it/s]

Error extracting text from http://www.reuters.com/article/us-nato-russia-germany-idUSKCN0Z40LE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-russia-germany-idUSKCN0Z40LE


Processing URLs:  33%|███▎      | 326/1000 [15:10<11:50,  1.05s/it]

Error extracting text from http://www.dallasnews.com/news/local-news/20151126-russia-turkey-rift-over-downed-russian-warplane-escalates.ece: 404 Client Error: Not Found for url: https://www.dallasnews.com/news/local-news/20151126-russia-turkey-rift-over-downed-russian-warplane-escalates.ece
URL filtered: https://www.bloomberg.com/news/articles/2016-12-19/russian-ambassador-shot-in-turkish-capital-amid-syria-tensions


Processing URLs:  33%|███▎      | 329/1000 [15:11<07:48,  1.43it/s]

Error extracting text from http://www.wsj.com/articles/islamic-state-braces-as-iraq-prepares-mosul-offensive-1476642193: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/islamic-state-braces-as-iraq-prepares-mosul-offensive-1476642193


Processing URLs:  33%|███▎      | 331/1000 [15:23<28:39,  2.57s/it]

Error extracting text from http://www.opel.com/company/locations.html: 403 Client Error: Forbidden for url: http://www.opel.com/company/locations.html
Error extracting text from http://www.reuters.com/article/us-tillerson-asia-china-visit-idUSKBN16P07O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tillerson-asia-china-visit-idUSKBN16P07O


Processing URLs:  34%|███▎      | 336/1000 [15:30<18:28,  1.67s/it]

URL filtered: https://twitter.com/i/status/1417397544196399127


Processing URLs:  34%|███▍      | 341/1000 [15:35<14:05,  1.28s/it]

Error extracting text from https://www.asco.org/sites/new-www.asco.org/files/content-files/advocacy-and-policy/documents/FAQs-Right-to-Try-Expanded-Access-to-Investigational-Therapies.pdf: 403 Client Error: Forbidden for url: https://www.asco.org/sites/new-www.asco.org/files/content-files/advocacy-and-policy/documents/FAQs-Right-to-Try-Expanded-Access-to-Investigational-Therapies.pdf


Processing URLs:  34%|███▍      | 342/1000 [15:38<18:59,  1.73s/it]

Error extracting text from http://38north.org/2017/05/jschilling052417/?utm_source=38+North+Bulletin+052417&amp;utm_campaign=38+North&amp;utm_medium=email: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  34%|███▍      | 345/1000 [15:42<14:12,  1.30s/it]

Error extracting text from https://www.usni.org/magazines/proceedings/2017-02/escalate-de-escalate: 403 Client Error: Forbidden for url: https://www.usni.org/magazines/proceedings/2017-02/escalate-de-escalate
URL filtered: https://www.youtube.com/watch?v=cawk2cMTnGo


Processing URLs:  35%|███▍      | 348/1000 [15:43<08:09,  1.33it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKBN16O132?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKBN16O132?il=0
Error extracting text from https://www.reuters.com/world/us-assesses-russia-completes-withdrawal-around-kyiv-us-defense-official-2022-04-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us-assesses-russia-completes-withdrawal-around-kyiv-us-defense-official-2022-04-06/


Processing URLs:  35%|███▌      | 350/1000 [15:45<08:12,  1.32it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-10/oil-trades-near-44-as-u-s-stockpiles-seen-increasing-7th-week


Processing URLs:  36%|███▌      | 355/1000 [15:52<13:51,  1.29s/it]

Error extracting text from http://finviz.com/futures_charts.ashx?p=d1&amp;t=QA: 403 Client Error: Forbidden for url: https://finviz.com/futures_charts.ashx?p=d1&amp;t=QA


Processing URLs:  36%|███▌      | 356/1000 [15:54<15:49,  1.47s/it]

Error extracting text from http://www.huffingtonpost.in/2016/10/03/how-indias-diplomatic-isolation-of-pakistan-goes-beyond-saarc/: 404 Client Error: Not Found for url: https://www.huffpost.com/archive/in/entry/2016/10/03/how-indias-diplomatic-isolation-of-pakistan-goes-beyond-saarc/


Processing URLs:  36%|███▌      | 358/1000 [15:59<19:28,  1.82s/it]

Error extracting text from https://tradingeconomics.com/china/gdp-growth-annual: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/china/gdp-growth-annual


Processing URLs:  37%|███▋      | 366/1000 [21:14<2:57:57, 16.84s/it] 

URL filtered: https://twitter.com/JuliaDavisNews/status/1497235340276297729
URL filtered: http://www.bloomberg.com/news/articles/2015-11-17/opec-said-to-delay-long-term-strategy-amid-rift-over-production


Processing URLs:  37%|███▋      | 371/1000 [21:17<51:07,  4.88s/it]  

Error extracting text from http://blogs.reuters.com/breakingviews/2016/04/04/dixon-greeces-problems-could-become-britains/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /breakingviews/2016/04/04/dixon-greeces-problems-could-become-britains/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fefc52e0>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  37%|███▋      | 374/1000 [21:17<23:34,  2.26s/it]

Error extracting text from http://www.rs.nato.int/news-center/feature-stories/2018-feature-stories/what-to-expect-in-2018.aspx: HTTPConnectionPool(host='www.rs.nato.int', port=80): Max retries exceeded with url: /news-center/feature-stories/2018-feature-stories/what-to-expect-in-2018.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff3d8c80>: Failed to resolve 'www.rs.nato.int' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: http://www.bloomberg.com/politics/articles/2016-01-16/iran-has-met-terms-of-international-nuclear-agreement-iaea-says
Error extracting text from http://www.infotep.gov.do/art.php?id=1175: 403 Client Error: Forbidden for url: http://www.infotep.gov.do/art.php?id=1175


Processing URLs:  38%|███▊      | 376/1000 [21:19<16:04,  1.55s/it]

Error extracting text from https://www.reuters.com/article/us-usa-databreaches/former-yahoo-ceo-apologizes-for-data-breaches-blames-russians-idUSKBN1D825V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-databreaches/former-yahoo-ceo-apologizes-for-data-breaches-blames-russians-idUSKBN1D825V


Processing URLs:  38%|███▊      | 381/1000 [21:24<11:50,  1.15s/it]

Error extracting text from https://reut.rs/3oXKtQK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-coastguard-law-idUSKBN29R1ER


Processing URLs:  38%|███▊      | 383/1000 [21:25<09:10,  1.12it/s]

Error extracting text from http://abcnews.go.com/US/wireStory/georgieva-inclusive-east-europe-woman-head-42512392: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/georgieva-inclusive-east-europe-woman-head-42512392
Error extracting text from http://www.nytimes.com/2016/04/04/us/politics/supreme-court-nominee-pushes-ahead-despite-fracas.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/04/us/politics/supreme-court-nominee-pushes-ahead-despite-fracas.html


Processing URLs:  38%|███▊      | 384/1000 [21:27<09:53,  1.04it/s]

Error extracting text from http://www.ngrguardiannews.com/2016/01/abe-putin-agree-to-meet-as-island-row-rumbles-on/: 403 Client Error: Forbidden for url: http://guardian.ng/2016/01/abe-putin-agree-to-meet-as-island-row-rumbles-on/
Error extracting text from http://www.nasdaq.com/article/oil-etfs-in-focus-on-oil-output-freeze-talks-cm580839#ixzz40YaZofcW: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/oil-etfs-in-focus-on-oil-output-freeze-talks-cm580839#ixzz40YaZofcW


Processing URLs:  39%|███▊      | 387/1000 [21:31<12:39,  1.24s/it]

Error extracting text from http://colombiareports.com/one-farcs-feared-commanders-joins-peace-talks-havana/: 404 Client Error: Not Found for url: http://colombiareports.com/one-farcs-feared-commanders-joins-peace-talks-havana/


Processing URLs:  39%|███▉      | 389/1000 [21:32<08:39,  1.18it/s]

Error extracting text from https://www.nytimes.com/2021/06/14/world/asia/china-covid-wuhan-lab-leak.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/14/world/asia/china-covid-wuhan-lab-leak.html


Processing URLs:  40%|███▉      | 397/1000 [21:47<14:05,  1.40s/it]

Error extracting text from https://boingboing.net/2016/08/16/hackers-claim-to-have-stolen-n.html: 403 Client Error: Forbidden for url: https://boingboing.net/2016/08/16/hackers-claim-to-have-stolen-n.html


Processing URLs:  40%|████      | 405/1000 [21:56<07:42,  1.29it/s]

Error extracting text from http://nyti.ms/1lvDLAK: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/world/europe/kerry-nato-syria-russia.html?smid=pl-share
Error extracting text from http://www.nytimes.com/2016/01/19/world/middleeast/dispute-over-oppositions-seat-at-table-threatens-to-push-back-syria-peace-talks.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/19/world/middleeast/dispute-over-oppositions-seat-at-table-threatens-to-push-back-syria-peace-talks.html?_r=0


Processing URLs:  41%|████▏     | 413/1000 [22:21<39:01,  3.99s/it]

Error extracting text from http://www.bna.com/coalition-pushes-ttip-n57982073862/: 403 Client Error: Forbidden for url: https://www.bloombergindustry.com/


Processing URLs:  42%|████▏     | 416/1000 [22:23<16:47,  1.73s/it]

Error extracting text from http://aminewswire.com/stories/510702715-coalition-airstrikes-hit-major-isis-site-in-mosul: 404 Client Error: Not Found for url: https://aminewswire.com/stories/510702715-coalition-airstrikes-hit-major-isis-site-in-mosul


Processing URLs:  42%|████▏     | 423/1000 [22:33<09:34,  1.01it/s]

Error extracting text from https://pythagorassite.files.wordpress.com/2016/04/screenshot_4_1_16__6_37_pm.png?w=920: 404 Client Error: Not Found for url: https://pythagorassite.files.wordpress.com/2016/04/screenshot_4_1_16__6_37_pm.png?w=920
Error extracting text from http://www.reuters.com/article/safrica-budget-idUSL8N16331Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/safrica-budget-idUSL8N16331Z


Processing URLs:  43%|████▎     | 429/1000 [22:40<09:26,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-ramadi-idUSKCN0VI0OU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-ramadi-idUSKCN0VI0OU


Processing URLs:  43%|████▎     | 430/1000 [22:42<12:36,  1.33s/it]

Error extracting text from http://www.stripes.com/news/despite-sanctions-north-korea-continues-provocations-1.400276: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/despite-sanctions-north-korea-continues-provocations-1.400276


Processing URLs:  43%|████▎     | 432/1000 [22:45<12:06,  1.28s/it]

Error extracting text from https://www.newsweek.com/fact-check-would-repealing-section-230-promote-free-speech-trump-says-1557886: 403 Client Error: Forbidden for url: https://www.newsweek.com/fact-check-would-repealing-section-230-promote-free-speech-trump-says-1557886


Processing URLs:  43%|████▎     | 434/1000 [22:47<10:14,  1.09s/it]

Error extracting text from http://www.sec.gov/Archives/edgar/data/1418091/000156459016017462/twtr-10q_20160331.htm: 403 Client Error: Forbidden for url: http://www.sec.gov/Archives/edgar/data/1418091/000156459016017462/twtr-10q_20160331.htm


Processing URLs:  44%|████▎     | 437/1000 [22:50<08:15,  1.14it/s]

Error extracting text from http://www.newsworks.org/index.php/local/down-the-shore/102931-cape-may-county-to-test-how-drones-can-aid-post-disaster-communications-: HTTPConnectionPool(host='www.newsworks.org', port=80): Max retries exceeded with url: /index.php/local/down-the-shore/102931-cape-may-county-to-test-how-drones-can-aid-post-disaster-communications- (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f05c0>: Failed to resolve 'www.newsworks.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▍     | 440/1000 [22:53<09:27,  1.01s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/rights-group-accuses-iraqi-kurds-destroying-arab-homes-43500493: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/rights-group-accuses-iraqi-kurds-destroying-arab-homes-43500493


Processing URLs:  44%|████▍     | 445/1000 [22:57<08:56,  1.03it/s]

URL filtered: https://twitter.com/NASAWebb/status/1473762199617388551


Processing URLs:  45%|████▍     | 449/1000 [23:05<17:00,  1.85s/it]

URL filtered: https://www.bloomberg.com/news/videos/2020-12-28/eu-lawmaker-gozi-on-brexit-deal-european-parliament-ratification-video


Processing URLs:  45%|████▌     | 453/1000 [23:07<09:05,  1.00it/s]

Error extracting text from http://www.thesun.co.uk/sol/homepage/news/politics/6923338/EU-mess-Boris-Johnsons-top-lawyer-wife-launches-devastating-attack-on-Camerons-referendum-deal.html: 404 Client Error: Not Found for url: https://www.thesun.co.uk/sol/homepage/news/politics/6923338/EU-mess-Boris-Johnsons-top-lawyer-wife-launches-devastating-attack-on-Camerons-referendum-deal.html
Error extracting text from http://www.nbcnews.com/politics/white-house/trump-charges-iran-violating-spirit-nuclear-deal-n749131: 403 Client Error: Forbidden for url: http://www.nbcnews.com/politics/white-house/trump-charges-iran-violating-spirit-nuclear-deal-n749131


Processing URLs:  46%|████▌     | 456/1000 [24:12<2:18:27, 15.27s/it]

Error extracting text from https://www.swp-berlin.org/fileadmin/contents/products/comments/2019C26_pau.pdf: HTTPSConnectionPool(host='www.swp-berlin.org', port=443): Read timed out.


Processing URLs:  46%|████▌     | 458/1000 [24:31<2:01:02, 13.40s/it]

Error extracting text from http://www.tehrantimes.com/news/402756/Ayatollah-Amini-to-run-for-Assembly-of-Experts-chair: 504 Server Error: Gateway Time-out for url: https://www.tehrantimes.com/news/402756/Ayatollah-Amini-to-run-for-Assembly-of-Experts-chair


Processing URLs:  46%|████▌     | 460/1000 [24:33<1:05:27,  7.27s/it]

Error extracting text from https://www.reuters.com/business/us-job-openings-jump-11-million-october-2021-12-08/): 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/us-job-openings-jump-11-million-october-2021-12-08/)


Processing URLs:  46%|████▋     | 463/1000 [24:35<27:49,  3.11s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/02/09/0301000000AEN20160209002051320.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  46%|████▋     | 464/1000 [24:36<22:46,  2.55s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/full-public-fbi-reveal-rare-trump-russia-type-47159987: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/full-public-fbi-reveal-rare-trump-russia-type-47159987


Processing URLs:  47%|████▋     | 468/1000 [24:50<19:51,  2.24s/it]

Error extracting text from http://www.reuters.com/article/us-burundi-politics-idUSKCN0YP1XW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-politics-idUSKCN0YP1XW


Processing URLs:  47%|████▋     | 469/1000 [24:51<18:33,  2.10s/it]

Error extracting text from http://www.ibtimes.com/cerns-cast-experiment-limits-dark-matter-possibilities-not-finding-solar-axions-2533242: 403 Client Error: Forbidden for url: https://www.ibtimes.com/cerns-cast-experiment-limits-dark-matter-possibilities-not-finding-solar-axions-2533242


Processing URLs:  48%|████▊     | 477/1000 [25:06<16:26,  1.89s/it]

Error extracting text from https://wwwnc.cdc.gov/travel/notices: 403 Client Error: Forbidden for url: https://wwwnc.cdc.gov/travel/notices
URL filtered: https://www.recode.net/2017/11/5/16609678/facebook-twitter-russia-investment-paradise-papers-apple


Processing URLs:  48%|████▊     | 482/1000 [25:10<07:35,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-usa-idUSKBN0TX23O20151214: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-usa-idUSKBN0TX23O20151214


Processing URLs:  48%|████▊     | 483/1000 [25:13<12:21,  1.43s/it]

URL filtered: https://www.youtube.com/watch?v=8SbJIlEd6jA


Processing URLs:  49%|████▉     | 493/1000 [25:25<11:55,  1.41s/it]

Error extracting text from http://thehill.com/policy/finance/258303-house-approves-ex-im-renewal: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/258303-house-approves-ex-im-renewal/


Processing URLs:  50%|████▉     | 495/1000 [25:27<10:35,  1.26s/it]

Error extracting text from https://www.reuters.com/article/us-ethiopia-usa-conflict-idUSKCN2AT1LB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ethiopia-usa-conflict-idUSKCN2AT1LB


Processing URLs:  50%|████▉     | 499/1000 [26:31<2:19:41, 16.73s/it]

Error extracting text from http://www.atmb.net.cn/Adout_Us.aspx: HTTPConnectionPool(host='www.atmb.net.cn', port=80): Read timed out. (read timeout=60)


Processing URLs:  50%|█████     | 505/1000 [26:42<27:56,  3.39s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-idUSKCN0XJ0KI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-idUSKCN0XJ0KI


Processing URLs:  51%|█████     | 511/1000 [27:01<32:44,  4.02s/it]

Error extracting text from http://dx.doi.org/10.1051/shsconf/20162801147: 403 Client Error: Forbidden for url: https://www.shs-conferences.org/10.1051/shsconf/20162801147


Processing URLs:  51%|█████     | 512/1000 [27:04<30:06,  3.70s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-11-09/what-to-watch-in-oil-when-donald-trump-moves-into-white-house-ivb8djgu


Processing URLs:  52%|█████▏    | 517/1000 [27:19<22:38,  2.81s/it]

Error extracting text from http://dailykanban.com/2016/06/tesla-suspension-breakage-not-crime-coverup/: 403 Client Error: Forbidden for url: https://dailykanban.com/2016/06/tesla-suspension-breakage-not-crime-coverup/


Processing URLs:  52%|█████▏    | 518/1000 [27:20<20:07,  2.51s/it]

Error extracting text from http://www.phnompenhpost.com/national/pm-hun-sen-urges-north-korea-resume-talks: 403 Client Error: Forbidden for url: http://www.phnompenhpost.com/national/pm-hun-sen-urges-north-korea-resume-talks


Processing URLs:  53%|█████▎    | 526/1000 [27:30<09:34,  1.21s/it]

Error extracting text from http://m.news24.com.ng/Nigeria/National/News/suspected-fulani-herdsmen-hack-pastor-to-death-20160706: HTTPConnectionPool(host='m.news24.com.ng', port=80): Max retries exceeded with url: /Nigeria/National/News/suspected-fulani-herdsmen-hack-pastor-to-death-20160706 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302e0a3f0>: Failed to resolve 'm.news24.com.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  53%|█████▎    | 531/1000 [27:38<09:32,  1.22s/it]

Error extracting text from http://www.fao.org/emergencies/crisis/fightingfamine/en/: 404 Client Error: Not Found for url: https://www.fao.org/emergencies/crisis/fightingfamine/en/
Error extracting text from https://www.axios.com/zarif-report-iran-deal-sanctions-relief-5b6b15c7-c9ca-414f-967e-365b43025e8c.html: 403 Client Error: Forbidden for url: https://www.axios.com/zarif-report-iran-deal-sanctions-relief-5b6b15c7-c9ca-414f-967e-365b43025e8c.html


Processing URLs:  53%|█████▎    | 532/1000 [27:39<06:59,  1.12it/s]

Error extracting text from http://www.nytimes.com/2015/07/22/world/middleeast/iran-nuclear-deal-vote.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/07/22/world/middleeast/iran-nuclear-deal-vote.html


Processing URLs:  53%|█████▎    | 534/1000 [27:40<07:12,  1.08it/s]

Error extracting text from https://opil.ouplaw.com/view/10.1093/law:epil/9780199231690/law-9780199231690-e1486?rskey=FWD5v7&amp;result=1&amp;prd=OPIL: 403 Client Error: Forbidden for url: https://opil.ouplaw.com/view/10.1093/law:epil/9780199231690/law-9780199231690-e1486?rskey=FWD5v7&amp;result=1&amp;prd=OPIL


Processing URLs:  54%|█████▎    | 535/1000 [27:42<08:51,  1.14s/it]

Error extracting text from http://www.timesofisrael.com/as-terrorism-dips-idf-arrests-on-the-wane-in-the-west-bank/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/as-terrorism-dips-idf-arrests-on-the-wane-in-the-west-bank/


Processing URLs:  54%|█████▍    | 539/1000 [27:47<07:41,  1.00s/it]

Error extracting text from http://www.reuters.com/article/us-pdvsa-nustar-ener-storage-exclusive/exclusive-pdvsa-blocked-from-using-nustar-terminal-over-unpaid-bills-idUSKBN1CP23L?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-pdvsa-nustar-ener-storage-exclusive/exclusive-pdvsa-blocked-from-using-nustar-terminal-over-unpaid-bills-idUSKBN1CP23L?il=0


Processing URLs:  54%|█████▍    | 541/1000 [27:48<05:47,  1.32it/s]

Error extracting text from http://www.reuters.com/article/saudi-aramco-ipo-idUSL8N1BC2W3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/saudi-aramco-ipo-idUSL8N1BC2W3


Processing URLs:  54%|█████▍    | 543/1000 [27:50<06:50,  1.11it/s]

Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-protests/tens-of-thousands-of-israelis-protest-against-netanyahu-corruption-idUSKBN1DW0Q8?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-protests/tens-of-thousands-of-israelis-protest-against-netanyahu-corruption-idUSKBN1DW0Q8?il=0


Processing URLs:  55%|█████▍    | 545/1000 [27:56<12:40,  1.67s/it]

Error extracting text from http://community.uavcoach.com/topic/1529-faa-airspace-authorization-waiver-experiences-notes/: HTTPConnectionPool(host='community.uavcoach.com', port=80): Max retries exceeded with url: /topic/1529-faa-airspace-authorization-waiver-experiences-notes/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300db1550>: Failed to resolve 'community.uavcoach.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  55%|█████▍    | 546/1000 [27:57<11:05,  1.47s/it]

Error extracting text from https://www.afghanistan-analysts.org/pushing-the-parliament-to-accept-a-decree-another-election-without-reform/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/pushing-the-parliament-to-accept-a-decree-another-election-without-reform/


Processing URLs:  55%|█████▍    | 549/1000 [28:03<16:51,  2.24s/it]

Error extracting text from http://www.unian.info/economics/1413933-finance-ministry-says-when-imfs-board-of-directors-to-convene-on-ukraine.html&quot: 404 Client Error: Not Found for url: https://www.unian.info/economics/1413933-finance-ministry-says-when-imfs-board-of-directors-to-convene-on-ukraine.html&quot


Processing URLs:  55%|█████▌    | 553/1000 [28:11<15:35,  2.09s/it]

Error extracting text from http://www.shanghaidaily.com/article/article_xinhua.aspx?id=308322: 404 Client Error: Not Found for url: http://www.shanghaidaily.com/article/article_xinhua.aspx?id=308322


Processing URLs:  56%|█████▌    | 558/1000 [28:20<11:44,  1.59s/it]

Error extracting text from http://www.nytimes.com/2015/10/13/world/middleeast/jason-rezaian-washington-post-conviction-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/13/world/middleeast/jason-rezaian-washington-post-conviction-iran.html


Processing URLs:  56%|█████▌    | 562/1000 [28:39<20:27,  2.80s/it]

Error extracting text from http://www.reuters.com/article/us-usa-china-drone-idUSKBN14526J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-china-drone-idUSKBN14526J


Processing URLs:  57%|█████▋    | 567/1000 [28:48<12:21,  1.71s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/zuma-should-resign-says-senior-official-in-south-africas-ruling-party-idUSKBN1FN0CL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/zuma-should-resign-says-senior-official-in-south-africas-ruling-party-idUSKBN1FN0CL


Processing URLs:  57%|█████▋    | 571/1000 [28:52<08:09,  1.14s/it]

Error extracting text from http://digitalcommons.ilr.cornell.edu/cgi/viewcontent.cgi?article=1671&amp;context=key_workplace: 404 Client Error: Not Found for url: http://digitalcommons.ilr.cornell.edu/cgi/viewcontent.cgi?article=1671&amp;context=key_workplace


Processing URLs:  57%|█████▋    | 574/1000 [28:54<05:31,  1.29it/s]

Error extracting text from http://www.hindustantimes.com/opinion/what-indians-should-know-before-demanding-military-crackdown-in-kashmir-chhattisgarh/story-l65D1BM3Z4IVMJ2dtQ802O.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/opinion/what-indians-should-know-before-demanding-military-crackdown-in-kashmir-chhattisgarh/story-l65D1BM3Z4IVMJ2dtQ802O.html


Processing URLs:  58%|█████▊    | 578/1000 [28:55<02:26,  2.88it/s]

URL filtered: https://www.bloomberg.com/news/articles/2021-08-11/senate-democrats-tee-up-post-recess-clash-on-voting-rights-bill
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-park-idUSKCN0V009D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-park-idUSKCN0V009D


Processing URLs:  58%|█████▊    | 583/1000 [29:00<06:09,  1.13it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-impeachment-idUSKCN10D297: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-impeachment-idUSKCN10D297


Processing URLs:  59%|█████▉    | 588/1000 [29:10<11:35,  1.69s/it]

Error extracting text from http://www.nytimes.com/2016/08/04/world/europe/spains-political-parties-fail-to-agree-on-coalition-government.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/04/world/europe/spains-political-parties-fail-to-agree-on-coalition-government.html
Error extracting text from http://elections.af/?p=2175: HTTPConnectionPool(host='elections.af', port=80): Max retries exceeded with url: /?p=2175 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffd5f2c0>: Failed to resolve 'elections.af' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  59%|█████▉    | 594/1000 [29:15<07:29,  1.11s/it]

Error extracting text from https://www.sciencemag.org/news/2020/09/can-china-worlds-bigger-coal-consumer-become-carbon-neutral-2060: 403 Client Error: Forbidden for url: https://www.science.org/news/2020/09/can-china-worlds-bigger-coal-consumer-become-carbon-neutral-2060


Processing URLs:  60%|█████▉    | 597/1000 [29:22<11:30,  1.71s/it]

URL filtered: https://techcrunch.com/2017/09/08/bitcoin-price-drops-following-report-that-china-is-going-to-shut-down-local-exchanges/?utm_source=tcfbpage&sr_share=facebook


Processing URLs:  60%|█████▉    | 599/1000 [29:23<08:18,  1.24s/it]

Error extracting text from http://www.debka.com/article/25560/Erdogan-locks-US-airmen-nuclear-arms-in-Incirlik: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25560/Erdogan-locks-US-airmen-nuclear-arms-in-Incirlik (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  60%|██████    | 602/1000 [29:29<09:46,  1.47s/it]

Error extracting text from http://fuelfix.com/blog/2015/09/17/oil-producers-eyeing-options-for-getting-crude-export-bill-to-presidents-desk/: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2015/09/17/oil-producers-eyeing-options-for-getting-crude-export-bill-to-presidents-desk/
Error extracting text from http://www.reuters.com/.../us-trade-eu-usa-idUSKBN0NJ1VB20150428: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/.../us-trade-eu-usa-idUSKBN0NJ1VB20150428


Processing URLs:  60%|██████    | 605/1000 [29:30<04:56,  1.33it/s]

Error extracting text from https://news.google.com/articles/CAIiEFlaxPDhZdkTz9b5ImohwnoqMwgEKioIACIQpzoRSNLEm6QR--MasMLSAioUCAoiEKc6EUjSxJukEfvjGrDC0gIwpfTQBg?hl=en-US&amp;gl=US&amp;ceid=US%3Aen: 500 Server Error: Internal Server Error for url: https://news.google.com/articles/CAIiEFlaxPDhZdkTz9b5ImohwnoqMwgEKioIACIQpzoRSNLEm6QR--MasMLSAioUCAoiEKc6EUjSxJukEfvjGrDC0gIwpfTQBg?hl=en-US&amp;gl=US&amp;ceid=US:en&gl=US&ceid=US:en


Processing URLs:  61%|██████    | 606/1000 [29:31<05:30,  1.19it/s]

URL filtered: https://m.youtube.com/watch?v=1rsIek9AFug


Processing URLs:  61%|██████    | 610/1000 [29:34<06:09,  1.06it/s]

Error extracting text from https://science.sciencemag.org/content/355/6324/481.abstract: 403 Client Error: Forbidden for url: https://www.science.org/doi/abs/10.1126/science.aal3147


Processing URLs:  61%|██████▏   | 614/1000 [29:39<06:21,  1.01it/s]

Error extracting text from http://eng.mod.gov.cn/MilitaryExercises/index_2.htm: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/MilitaryExercises/index_2.htm


Processing URLs:  62%|██████▏   | 617/1000 [29:41<05:22,  1.19it/s]

Error extracting text from https://english.alarabiya.net/en/views/news/middle-east/2018/02/13/The-regional-dimensions-of-Syrian-conflict.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/views/news/middle-east/2018/02/13/The-regional-dimensions-of-Syrian-conflict.html


Processing URLs:  62%|██████▏   | 620/1000 [29:45<05:26,  1.17it/s]

Error extracting text from https://www.nasdaq.com/articles/could-stripe-be-the-biggest-ipo-in-2021-2021-01-10: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/could-stripe-be-the-biggest-ipo-in-2021-2021-01-10


Processing URLs:  62%|██████▏   | 621/1000 [29:48<10:28,  1.66s/it]

Error extracting text from http://www.ansamed.info/ansamed/en/news/sections/politics/2017/05/09/turkey-desires-to-continue-process-of-eu-accession-erdog_c6a290d6-617b-477f-b019-7939715b091c.html: 404 Client Error: Not Found for url: https://www.ansa.it/ansamed/en/news/sections/politics/2017/05/09/turkey-desires-to-continue-process-of-eu-accession-erdog_c6a290d6-617b-477f-b019-7939715b091c.html


Processing URLs:  62%|██████▏   | 622/1000 [29:48<07:50,  1.25s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-05-25/new-york-london-on-notice-as-china-targets-commodities-pricing


Processing URLs:  62%|██████▎   | 625/1000 [29:50<05:17,  1.18it/s]

Error extracting text from http://www.nytimes.com/2015/09/15/opinion/david-brooks-the-biden-formation-story.html?emc=edit_th_20150915&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/15/opinion/david-brooks-the-biden-formation-story.html?emc=edit_th_20150915&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  63%|██████▎   | 632/1000 [30:09<14:22,  2.34s/it]

Error extracting text from http://www.c-span.org/video/?407164-1/hillary-clinton-remarks-counterterrorism: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?407164-1/hillary-clinton-remarks-counterterrorism
Error extracting text from http://www.ajot.com/blogs/full/panama-canal-administrator-says-inauguration-on-schedule-for-june: 403 Client Error: Forbidden for url: http://www.ajot.com/blogs/full/panama-canal-administrator-says-inauguration-on-schedule-for-june


Processing URLs:  64%|██████▎   | 635/1000 [30:16<13:38,  2.24s/it]

Error extracting text from http://www.wsj.com/articles/little-known-beijing-stock-market-sees-surge-of-ipos-1445283001: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/little-known-beijing-stock-market-sees-surge-of-ipos-1445283001


Processing URLs:  64%|██████▍   | 638/1000 [30:29<20:40,  3.43s/it]

Error extracting text from http://www.acleddata.com/data/realtime-data/: 404 Client Error: Not Found for url: https://acleddata.com/data/realtime-data/
Error extracting text from http://www.reuters.com/article/us-health-zika-idUSKCN0W62HI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-zika-idUSKCN0W62HI


Processing URLs:  64%|██████▍   | 639/1000 [30:31<17:09,  2.85s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/21505-russia-deploys-28-combat-planes-in-syria-us-officials: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/21505-russia-deploys-28-combat-planes-in-syria-us-officials


Processing URLs:  64%|██████▍   | 640/1000 [30:33<15:55,  2.65s/it]

Error extracting text from http://www.tv360nigeria.com/agatu-massacre-hurting-nigerias-economic-interests-ogbeh/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  64%|██████▍   | 642/1000 [30:34<09:36,  1.61s/it]

Error extracting text from http://thelongestswim.com/logbook/: 404 Client Error: Not Found for url: http://thelongestswim.com/logbook/


Processing URLs:  65%|██████▍   | 646/1000 [30:43<14:38,  2.48s/it]

Error extracting text from http://www.iranpolitik.com/2016/01/19/analysis/rouhani-disheartened-mass-disqualification-aspiring-2016-iran-election-candidates/: 404 Client Error: Not Found for url: http://www.iranpolitik.com/2016/01/19/analysis/rouhani-disheartened-mass-disqualification-aspiring-2016-iran-election-candidates/


Processing URLs:  65%|██████▍   | 647/1000 [30:44<12:45,  2.17s/it]

Error extracting text from http://ca.reuters.com/article/businessNews/idCAKCN0X81YI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  65%|██████▍   | 649/1000 [30:47<09:48,  1.68s/it]

Error extracting text from http://www.latimes.com/nation/la-na-brennan-russia-20170523-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-na-brennan-russia-20170523-story.html


Processing URLs:  65%|██████▌   | 650/1000 [30:48<08:04,  1.38s/it]

Error extracting text from http://oklo.org/2016/01/10/autoregression/: 406 Client Error: Not Acceptable for url: http://oklo.org/2016/01/10/autoregression/


Processing URLs:  65%|██████▌   | 651/1000 [30:51<10:22,  1.78s/it]

Error extracting text from http://www.propublica.org/article/how-dark-money-helped-republicans-hold-the: 404 Client Error: Not Found for url: http://www.propublica.org/article/how-dark-money-helped-republicans-hold-the


Processing URLs:  65%|██████▌   | 652/1000 [30:51<08:21,  1.44s/it]

Error extracting text from http://whatreallyhappened.com/RANCHO/POLITICS/BODIES.php: 403 Client Error: Forbidden for url: http://whatreallyhappened.com/RANCHO/POLITICS/BODIES.php


Processing URLs:  65%|██████▌   | 653/1000 [30:53<08:59,  1.55s/it]

Error extracting text from http://www.ibtimes.com/cure-cancer-found-new-leukemia-treatment-provides-hope-2566137: 403 Client Error: Forbidden for url: https://www.ibtimes.com/cure-cancer-found-new-leukemia-treatment-provides-hope-2566137


Processing URLs:  66%|██████▌   | 656/1000 [31:05<14:02,  2.45s/it]

Error extracting text from https://www.nytimes.com/2017/11/20/world/europe/germany-merkel-coalition.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/20/world/europe/germany-merkel-coalition.html


Processing URLs:  66%|██████▌   | 658/1000 [31:07<09:58,  1.75s/it]

Error extracting text from http://www.ndb.int/medias/brics-development-bank-issue-500-million-masala-bonds-reuters/: 403 Client Error: Forbidden for url: https://www.ndb.int/medias/brics-development-bank-issue-500-million-masala-bonds-reuters/


Processing URLs:  66%|██████▌   | 659/1000 [31:09<09:49,  1.73s/it]

Error extracting text from https://www.latimes.com/world-nation/story/2021-02-09/idaho-ammon-bundy: 403 Client Error: Forbidden for url: https://www.latimes.com/world-nation/story/2021-02-09/idaho-ammon-bundy


Processing URLs:  66%|██████▋   | 665/1000 [31:15<05:16,  1.06it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-climatechange-idUSKBN16M2CB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-climatechange-idUSKBN16M2CB


Processing URLs:  67%|██████▋   | 668/1000 [31:18<04:32,  1.22it/s]

Error extracting text from http://www.nytimes.com/2016/01/03/world/middleeast/saudi-arabia-executes-47-sheikh-nimr-shiite-cleric.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/03/world/middleeast/saudi-arabia-executes-47-sheikh-nimr-shiite-cleric.html


Processing URLs:  67%|██████▋   | 672/1000 [31:23<06:08,  1.12s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/iran-isnt-sweating-saudi-intervention-syria-15262: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/iran-isnt-sweating-saudi-intervention-syria-15262


Processing URLs:  68%|██████▊   | 675/1000 [31:26<05:07,  1.06it/s]

Error extracting text from http://www.wsj.com/articles/france-conducts-airstrikes-against-islamic-state-in-syria-1443337569: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/france-conducts-airstrikes-against-islamic-state-in-syria-1443337569


Processing URLs:  68%|██████▊   | 677/1000 [31:29<05:55,  1.10s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/25/gitrep-24mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/25/gitrep-24mar16pm/


Processing URLs:  68%|██████▊   | 678/1000 [31:30<05:47,  1.08s/it]

URL filtered: https://www.youtube.com/watch?v=7WEd34oW9BI


Processing URLs:  68%|██████▊   | 680/1000 [31:39<14:47,  2.77s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/recent-developments-surrounding-the-south-china-sea/2017/07/03/356622ac-5fc0-11e7-80a2-8c226031ac3f_story.html?utm_term=.b0fea950b04b: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/recent-developments-surrounding-the-south-china-sea/2017/07/03/356622ac-5fc0-11e7-80a2-8c226031ac3f_story.html?utm_term=.b0fea950b04b


Processing URLs:  68%|██████▊   | 681/1000 [31:41<12:39,  2.38s/it]

Error extracting text from https://www.stripes.com/news/a-string-of-deadly-attacks-in-afghanistan-exposes-government-weakness-limits-of-us-training-effort-1.509078: 404 Client Error: Not Found for url: https://www.stripes.com/news/a-string-of-deadly-attacks-in-afghanistan-exposes-government-weakness-limits-of-us-training-effort-1.509078


Processing URLs:  68%|██████▊   | 684/1000 [31:44<07:34,  1.44s/it]

Error extracting text from https://www.lesswrong.com/: 403 Client Error: Forbidden for url: https://www.lesswrong.com/
Error extracting text from http://www.nytimes.com/2016/01/19/world/asia/afghan-panel-sets-election-date-drawing-government-criticism.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/19/world/asia/afghan-panel-sets-election-date-drawing-government-criticism.html


Processing URLs:  68%|██████▊   | 685/1000 [31:44<06:16,  1.20s/it]

URL filtered: https://twitter.com/SpeakerRyan
Error extracting text from https://www.reuters.com/article/us-germany-politics/split-social-democrats-could-sink-merkels-coalition-plans-idUSKBN1F51VN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/split-social-democrats-could-sink-merkels-coalition-plans-idUSKBN1F51VN


Processing URLs:  69%|██████▉   | 690/1000 [31:53<07:49,  1.51s/it]

Error extracting text from https://www.emirates.com/us/english/discover-dubai/expo-2020/: 404 Client Error: Not Found for url: https://www.emirates.com/us/english/discover-dubai/expo-2020/


Processing URLs:  69%|██████▉   | 692/1000 [32:11<22:35,  4.40s/it]

Error extracting text from http://www.investopedia.com/terms/c/coreinflation.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/c/coreinflation.asp


Processing URLs:  70%|██████▉   | 696/1000 [32:19<13:44,  2.71s/it]

Error extracting text from https://uk.news.yahoo.com/construction-workers-strike-stalls-panama-183601407.html#78Opyku: 404 Client Error: Not Found for url: https://uk.news.yahoo.com/construction-workers-strike-stalls-panama-183601407.html#78Opyku


Processing URLs:  70%|███████   | 701/1000 [32:28<08:14,  1.65s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-safezones-idUSKBN15B0E5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-safezones-idUSKBN15B0E5


Processing URLs:  70%|███████   | 703/1000 [32:30<05:53,  1.19s/it]

Error extracting text from http://www.aljazeera.com/news/2016/10/battle-mosul-peshmerga-seizes-bashiqa-isil-161023103005292.html: 404 Client Error: Not Found for url: https://www.aljazeera.com/news/2016/10/battle-mosul-peshmerga-seizes-bashiqa-isil-161023103005292.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-mood-idUSKBN13B18V?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-mood-idUSKBN13B18V?il=0


Processing URLs:  71%|███████   | 707/1000 [32:35<05:36,  1.15s/it]

Error extracting text from http://moslereconomics.com/wp-content/powerpoints/7DIF.pdf: 403 Client Error: Forbidden for url: http://moslereconomics.com/wp-content/powerpoints/7DIF.pdf
URL filtered: https://www.bloomberg.com/news/articles/2017-08-20/buhari-faces-key-challenges-in-nigeria-as-he-ends-medical-leave


Processing URLs:  71%|███████   | 710/1000 [33:40<56:38, 11.72s/it]

Error extracting text from https://www.rferl.org/a/tillerson-european-security-russian-threat/28884642.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/tillerson-european-security-russian-threat/28884642.html


Processing URLs:  71%|███████   | 712/1000 [33:49<41:41,  8.69s/it]

Error extracting text from http://tmsnrt.rs/2bGrvdS: 404 Client Error:  for url: https://news.trust.org:443/item/20160830160716-605e8/


Processing URLs:  71%|███████▏  | 713/1000 [33:50<30:49,  6.44s/it]

Error extracting text from https://theconversation.com/lawmakers-keen-to-break-up-big-tech-like-amazon-and-google-need-to-realize-the-world-has-changed-a-lot-since-microsoft-and-standard-oil-143517: 403 Client Error: Forbidden for url: https://theconversation.com/lawmakers-keen-to-break-up-big-tech-like-amazon-and-google-need-to-realize-the-world-has-changed-a-lot-since-microsoft-and-standard-oil-143517


Processing URLs:  72%|███████▏  | 715/1000 [33:51<16:56,  3.57s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-usa-idUSKCN0XG2B5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-usa-idUSKCN0XG2B5


Processing URLs:  72%|███████▏  | 718/1000 [33:56<10:43,  2.28s/it]

Error extracting text from https://www.thelocal.se/20210628/swedish-prime-minister-stefan-lofven-resigns/: 403 Client Error: Forbidden for url: https://www.thelocal.se/20210628/swedish-prime-minister-stefan-lofven-resigns


Processing URLs:  72%|███████▏  | 721/1000 [34:15<17:05,  3.68s/it]

Error extracting text from https://www.nytimes.com/2017/02/01/world/asia/ban-ki-moon-president-south-korea.html?pagewanted=all: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/01/world/asia/ban-ki-moon-president-south-korea.html?pagewanted=all


Processing URLs:  72%|███████▏  | 723/1000 [34:16<09:03,  1.96s/it]

Error extracting text from https://www.nytimes.com/2017/10/15/world/asia/north-korea-hacking-cyber-sony.html?emc=edit_nn_20171016&amp;nl=morning-briefing&amp;nlid=52725637&amp;te=1&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/15/world/asia/north-korea-hacking-cyber-sony.html?emc=edit_nn_20171016&amp;nl=morning-briefing&amp;nlid=52725637&amp;te=1&amp;_r=0


Processing URLs:  73%|███████▎  | 726/1000 [34:28<13:44,  3.01s/it]

Error extracting text from http://wakeywakeynews.com/32896/us-deploys-spy-plane-in-singapore-amid-south-china-sea: 404 Client Error: Not Found for url: https://www.wakeywakeynews.com/32896/us-deploys-spy-plane-in-singapore-amid-south-china-sea


Processing URLs:  73%|███████▎  | 728/1000 [34:31<09:59,  2.20s/it]

Error extracting text from http://www.ibtimes.com/who-xu-lin-great-firewall-china-just-got-new-top-censor-2388101: 403 Client Error: Forbidden for url: https://www.ibtimes.com/who-xu-lin-great-firewall-china-just-got-new-top-censor-2388101


Processing URLs:  73%|███████▎  | 732/1000 [35:37<1:28:06, 19.73s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2021-01-25/biden-administration-suspends-some-sanctions-on-yemen-rebels: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  74%|███████▎  | 736/1000 [35:42<25:07,  5.71s/it]

Error extracting text from https://www.desmogblog.com/sites/beta.desmogblog.com/files/Final%20report%20of%20Gulf%20Coast%20pipeline.pdf: 404 Client Error: Not Found for url: https://www.desmog.com/wp-content/uploads/files/Final%20report%20of%20Gulf%20Coast%20pipeline.pdf/


Processing URLs:  74%|███████▍  | 739/1000 [35:46<12:13,  2.81s/it]

Error extracting text from http://www.wsj.com/articles/brazil-court-suspends-impeachment-process-against-president-rousseff-1449660334: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-court-suspends-impeachment-process-against-president-rousseff-1449660334


Processing URLs:  74%|███████▍  | 745/1000 [36:57<1:23:19, 19.60s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/congress/article180707721.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  75%|███████▌  | 751/1000 [37:02<12:07,  2.92s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/technology/316549-chao-at-transportation-can-bring-much-needed-change-on-drone: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/technology/316549-chao-at-transportation-can-bring-much-needed-change-on-drone/
URL filtered: https://www.youtube.com/watch?v=UZ9ac7vb8-E


Processing URLs:  76%|███████▌  | 756/1000 [37:08<06:42,  1.65s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/17373-attorney-general-retakes-office-soon-after-farewell: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/17373-attorney-general-retakes-office-soon-after-farewell


Processing URLs:  76%|███████▌  | 757/1000 [37:12<09:51,  2.43s/it]

Error extracting text from http://theiranproject.com/blog/2016/04/25/fm-macedonia-open-embassy-iran-soon/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=fm-macedonia-open-embassy-iran-soon


Processing URLs:  76%|███████▋  | 763/1000 [37:20<04:15,  1.08s/it]

Error extracting text from http://www.latimes.com/business/hiltzik/la-fi-hiltzik-debt-ceiling-20170808-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/hiltzik/la-fi-hiltzik-debt-ceiling-20170808-story.html
Error extracting text from http://www.nytimes.com/2016/04/19/world/middleeast/obama-isis-iraq.html?ref=todayspaper&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/19/world/middleeast/obama-isis-iraq.html?ref=todayspaper&amp;_r=0


Processing URLs:  76%|███████▋  | 764/1000 [37:21<03:22,  1.17it/s]

Error extracting text from http://www.ndb.int/NPA-not-an-issue-rate-cuts-to-fetch-Rs-2-5-trillion-gains-K-V-Kamath.php: 403 Client Error: Forbidden for url: https://www.ndb.int/NPA-not-an-issue-rate-cuts-to-fetch-Rs-2-5-trillion-gains-K-V-Kamath.php


Processing URLs:  77%|███████▋  | 766/1000 [37:22<02:44,  1.42it/s]

Error extracting text from http://www.wsj.com/articles/amid-encryption-fight-attorney-general-urges-cooperation-1456830001: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/amid-encryption-fight-attorney-general-urges-cooperation-1456830001


Processing URLs:  77%|███████▋  | 768/1000 [37:34<11:06,  2.87s/it]

Error extracting text from http://www.reuters.com/article/us-cyber-spying-idUSKCN10J18I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-spying-idUSKCN10J18I


Processing URLs:  77%|███████▋  | 769/1000 [37:35<09:18,  2.42s/it]

URL filtered: https://www.youtube.com/watch?v=sPMJ6b-xhtk&feature=youtu.be


Processing URLs:  77%|███████▋  | 771/1000 [37:37<06:01,  1.58s/it]

Error extracting text from http://www.undispatch.com/burundi-in-a-tailspin/: 403 Client Error: Forbidden for url: http://undispatch.com/burundi-in-a-tailspin/


Processing URLs:  78%|███████▊  | 777/1000 [38:46<1:09:34, 18.72s/it]

Error extracting text from http://www.charlotteobserver.com/news/nation-world/world/article197334714.html: HTTPConnectionPool(host='www.charlotteobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  78%|███████▊  | 778/1000 [38:48<50:49, 13.74s/it]

Error extracting text from http://atimes.com/2016/01/japan-warns-china-over-naval-incursions-near-disputed-isles/: 404 Client Error: Not Found for url: https://atimes.com/2016/01/japan-warns-china-over-naval-incursions-near-disputed-isles/


Processing URLs:  78%|███████▊  | 780/1000 [38:54<30:21,  8.28s/it]

Error extracting text from http://www.38north.org: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  78%|███████▊  | 784/1000 [38:59<10:21,  2.88s/it]

Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)30890-X/fulltext?elsca1=etoc: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)30890-X/fulltext?elsca1=etoc


Processing URLs:  79%|███████▉  | 788/1000 [39:01<04:09,  1.17s/it]

Error extracting text from https://www.yahoo.com/news/us-challenges-chinas-100-bn-rice-cereal-subsidies-160011658.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/us-challenges-chinas-100-bn-rice-cereal-subsidies-160011658.html


Processing URLs:  79%|███████▉  | 789/1000 [40:01<1:06:18, 18.85s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-09-15/us-president-biden-to-host-uk-pm-boris-johnson-at-the-white-house-next-week-axios: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  79%|███████▉  | 790/1000 [40:03<48:09, 13.76s/it]

Error extracting text from https://www.reuters.com/world/europe/navalny-mocks-putin-over-accusation-he-consciously-left-coma-2021-06-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/navalny-mocks-putin-over-accusation-he-consciously-left-coma-2021-06-17/


Processing URLs:  79%|███████▉  | 792/1000 [40:06<27:40,  7.98s/it]

Error extracting text from http://en.dailypakistan.com.pk/pakistan/hrcp-report-2015-violence-related-deaths-decreases-upto-40-in-pakistan/: 503 Server Error: Backend fetch failed for url: https://en.dailypakistan.com.pk/pakistan/hrcp-report-2015-violence-related-deaths-decreases-upto-40-in-pakistan/


Processing URLs:  79%|███████▉  | 794/1000 [41:09<1:11:23, 20.80s/it]

Error extracting text from http://aa.com.tr/en/world/us-concerned-by-russia-moving-forces-north-of-aleppo/559296: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  80%|███████▉  | 796/1000 [41:11<38:20, 11.28s/it]

Error extracting text from https://www.wsj.com/articles/gop-may-tie-debt-limit-increase-to-veterans-bill-1500050835: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gop-may-tie-debt-limit-increase-to-veterans-bill-1500050835


Processing URLs:  80%|███████▉  | 799/1000 [41:17<16:43,  4.99s/it]

Error extracting text from https://reut.rs/3jN7erN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/adviser-iraqi-pm-says-lack-opec-coordination-will-lead-price-war-ina-2021-07-05/
Error extracting text from http://www.reuters.com/article/us-iraq-oil-attack-idUSKCN10B05F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iraq-oil-attack-idUSKCN10B05F


Processing URLs:  80%|████████  | 801/1000 [41:28<15:57,  4.81s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-swiss/swiss-ready-to-mediate-in-north-korea-crisis-idUSKCN1BF157: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-swiss/swiss-ready-to-mediate-in-north-korea-crisis-idUSKCN1BF157


Processing URLs:  81%|████████  | 807/1000 [41:45<08:04,  2.51s/it]

Error extracting text from https://www.reuters.com/world/china/exclusive-us-set-add-more-chinese-companies-blacklist-over-xinjiang-2021-07-09/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/china/exclusive-us-set-add-more-chinese-companies-blacklist-over-xinjiang-2021-07-09/


Processing URLs:  81%|████████  | 809/1000 [41:46<05:21,  1.69s/it]

Error extracting text from http://uk.reuters.com/article/uk-southchinasea-china-japan-idUKKCN11Z10O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  81%|████████  | 812/1000 [41:52<06:24,  2.05s/it]

Error extracting text from http://bulgariaanalytica.org/cbbss/?p=8797: 403 Client Error: Forbidden for url: http://cbbss.org/?p=8797


Processing URLs:  82%|████████▏ | 815/1000 [41:54<03:31,  1.14s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Asian-trade-pact-talks-to-miss-2016-deadline: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Asian-trade-pact-talks-to-miss-2016-deadline
Error extracting text from http://www.nytimes.com/2015/10/11/world/asia/at-north-korean-military-parade-a-rare-public-speech-by-kim.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/11/world/asia/at-north-korean-military-parade-a-rare-public-speech-by-kim.html


Processing URLs:  82%|████████▏ | 817/1000 [41:55<02:14,  1.36it/s]

Error extracting text from http://www.hybridcars.com/chinese-analysts-concerned-electric-car-sales-need-more-than-subsidies-to-succeed/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/chinese-analysts-concerned-electric-car-sales-need-more-than-subsidies-to-succeed/
Error extracting text from https://www.reuters.com/article/us-usa-trade-china-idUSKBN29Y2DV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trade-china-idUSKBN29Y2DV


Processing URLs:  82%|████████▏ | 818/1000 [41:56<02:20,  1.29it/s]

Error extracting text from http://nationalinterest.org/blog/the-buzz/nukes-subs-missiles-how-russia-plans-challenge-americas-20622: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/nukes-subs-missiles-how-russia-plans-challenge-americas-20622


Processing URLs:  82%|████████▏ | 822/1000 [42:06<05:31,  1.86s/it]

Error extracting text from http://uk.reuters.com/article/uk-usa-trump-tax-idUKKBN16Q0AD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  82%|████████▏ | 823/1000 [42:07<04:30,  1.53s/it]

Error extracting text from http://www.amazon.com/Jurassic-World-Limited-Edition-Packaging/dp/B00NYC5TVG: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Jurassic-World-Limited-Edition-Packaging/dp/B00NYC5TVG


Processing URLs:  83%|████████▎ | 829/1000 [42:14<02:59,  1.05s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-08/-1-3-trillion-housing-boom-set-to-be-india-s-next-growth-driver
Error extracting text from https://www.imf.org/en/Countries/CHN: 403 Client Error: Forbidden for url: https://www.imf.org/en/Countries/CHN


Processing URLs:  83%|████████▎ | 832/1000 [42:15<01:37,  1.72it/s]

Error extracting text from http://cdn.aiindex.org/2018/AI%20Index%202018%20Annual%20Report.pdf: HTTPConnectionPool(host='cdn.aiindex.org', port=80): Max retries exceeded with url: /2018/AI%20Index%202018%20Annual%20Report.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30451c110>: Failed to resolve 'cdn.aiindex.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.metaculus.com/questions/1613/will-facebooks-share-price-be-lower-in-3-months-than-it-is-today/
Error extracting text from https://www.iol.co.za/capetimes/opinion/how-exactly-will-our-president-depart-from-office-apart-from-reluctantly-12743497: 403 Client Error: Forbidden for url: https://www.iol.co.za/capetimes/opinion/how-exactly-will-our-president-depart-from-office-apart-from-reluctantly-12743497


Processing URLs:  84%|████████▎ | 836/1000 [42:19<02:45,  1.01s/it]

URL filtered: https://www.youtube.com/watch?v=i8O0PwGSoO0


Processing URLs:  84%|████████▍ | 842/1000 [42:25<02:52,  1.09s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-07/maduro-chooses-food-over-bondholders-as-venezuela-goes-hungry


Processing URLs:  84%|████████▍ | 845/1000 [42:27<02:13,  1.17it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-25/saudi-aramco-said-to-appoint-jpmorgan-hsbc-for-debut-bond-sale


Processing URLs:  85%|████████▍ | 848/1000 [42:29<01:35,  1.59it/s]

Error extracting text from http://www.scientificamerican.com/article/many-prisoners-on-death-row-are-wrongfully-convicted/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/many-prisoners-on-death-row-are-wrongfully-convicted/


Processing URLs:  85%|████████▍ | 849/1000 [42:30<01:41,  1.49it/s]

URL filtered: https://www.youtube.com/watch?v=TP3mXVRd89Y


Processing URLs:  86%|████████▌ | 855/1000 [42:34<01:55,  1.25it/s]

Error extracting text from https://2019impactreport.teamusa.org/USOPC-2019-Consolidated-Financial-Statement.pdf: PyCryptodome is required for AES algorithm


Processing URLs:  86%|████████▌ | 857/1000 [42:37<02:40,  1.12s/it]

Error extracting text from https://www.reuters.com/article/us-northkorea-missiles-southkorea-idUSKBN18Z075: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-southkorea-idUSKBN18Z075


Processing URLs:  86%|████████▌ | 860/1000 [42:43<03:19,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-ukraine-crisis-nato-idUSKCN0RK14M20150921#jDJAL1TxXHicPiVT.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-crisis-nato-idUSKCN0RK14M20150921#jDJAL1TxXHicPiVT.99


Processing URLs:  86%|████████▌ | 862/1000 [42:45<02:50,  1.23s/it]

Error extracting text from https://www.un.org/sg/en/content/sg/statement/2021-01-31/statement-attributable-the-spokesperson-for-the-secretary-general-myanmar: 403 Client Error: Forbidden for url: https://www.un.org/sg/en/content/sg/statement/2021-01-31/statement-attributable-the-spokesperson-for-the-secretary-general-myanmar


Processing URLs:  86%|████████▋ | 865/1000 [42:47<01:37,  1.38it/s]

Error extracting text from https://www.ipsos.com/ipsos-mori/en-uk/snp-retains-strong-lead-independence-dominates-voters-concerns: 403 Client Error: Forbidden for url: https://www.ipsos.com/ipsos-mori/en-uk/snp-retains-strong-lead-independence-dominates-voters-concerns
Error extracting text from http://www.nytimes.com/2016/03/06/opinion/sunday/tricked-into-cheating-and-sentenced-to-death.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/06/opinion/sunday/tricked-into-cheating-and-sentenced-to-death.html?_r=0


Processing URLs:  87%|████████▋ | 866/1000 [42:48<01:58,  1.13it/s]

Error extracting text from http://www.ibtimes.co.uk/china-will-be-only-winner-donald-trumps-war-clean-energy-epa-1603038: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/china-will-be-only-winner-donald-trumps-war-clean-energy-epa-1603038


Processing URLs:  87%|████████▋ | 869/1000 [43:51<40:57, 18.76s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2015/11/16/confusing-candidates-on-ballot-vex-venezuela-opposition: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  87%|████████▋ | 872/1000 [43:54<14:45,  6.92s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-tillerson-idUSKBN1560X8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-tillerson-idUSKBN1560X8


Processing URLs:  87%|████████▋ | 874/1000 [43:56<08:05,  3.85s/it]

Error extracting text from http://thehill.com/homenews/administration/343489-trump-doj-nominee-used-to-represent-russian-bank-with-ties-to-putin: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/343489-trump-doj-nominee-used-to-represent-russian-bank-with-ties-to-putin/


Processing URLs:  88%|████████▊ | 876/1000 [43:57<04:43,  2.28s/it]



Processing URLs:  88%|████████▊ | 879/1000 [43:59<02:21,  1.17s/it]

Error extracting text from http://www.reuters.com/article/2015/09/16/us-usa-fed-meeting-idUSKCN0RG2CT20150916: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/16/us-usa-fed-meeting-idUSKCN0RG2CT20150916


Processing URLs:  88%|████████▊ | 880/1000 [44:00<01:47,  1.12it/s]

Error extracting text from https://www.nytimes.com/2017/10/06/world/asia/mattis-afghanistan-rules-of-engagement.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/06/world/asia/mattis-afghanistan-rules-of-engagement.html?_r=0


Processing URLs:  88%|████████▊ | 883/1000 [44:05<02:39,  1.37s/it]

Error extracting text from http://www.reuters.com/article/us-egypt-cenbank-rates-idUSKBN18H113: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-egypt-cenbank-rates-idUSKBN18H113


Processing URLs:  89%|████████▉ | 888/1000 [44:20<06:37,  3.55s/it]

Error extracting text from http://www.didier-bertin.org/pages/genocides-humankind/rwanda-genocides-english.html: HTTPConnectionPool(host='www.didier-bertin.org', port=80): Max retries exceeded with url: /pages/genocides-humankind/rwanda-genocides-english.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff3db770>: Failed to resolve 'www.didier-bertin.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  89%|████████▉ | 892/1000 [44:26<03:54,  2.17s/it]

Error extracting text from https://www.four-paws.org/our-stories/press-releases/eu-agriculture-ministers-discuss-covid-19-and-mink-farms: 404 Client Error: Not Found for url: https://www.four-paws.org/eu-agriculture-ministers-discuss-covid-19-and-mink-farms


Processing URLs:  90%|████████▉ | 896/1000 [44:32<03:01,  1.75s/it]

URL filtered: https://www.youtube.com/watch?v=GiPe1OiKQuk


Processing URLs:  90%|█████████ | 900/1000 [44:38<02:15,  1.35s/it]

Error extracting text from http://www.dinargururv.com/2016/06/washington-hails-abadi-to-its-agreement-with-the-imf-for-a-loan-of-five-billion-iraq/: 404 Client Error: Not Found for url: https://www.dinargururv.com/2016/06/washington-hails-abadi-to-its-agreement-with-the-imf-for-a-loan-of-five-billion-iraq/
Error extracting text from http://www.reuters.com/article/us-venezuela-maduro-idUSKCN0QI25Y20150813: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-maduro-idUSKCN0QI25Y20150813


Processing URLs:  90%|█████████ | 901/1000 [44:38<01:41,  1.03s/it]

Error extracting text from http://translate.google.com/translate?hl=en&amp;sl=fa&amp;tl=en&amp;u=http%3A%2F%2Fisna.ir%2Ffa%2Fnews%2F94121006540%2Fنتایج-انتخابات-مجلس-جدول-گرایش-ها&amp;sandbox=1: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=fa&amp;tl=en&amp;u=http://isna.ir/fa/news/94121006540/%D9%86%D8%AA%D8%A7%DB%8C%D8%AC-%D8%A7%D9%86%D8%AA%D8%AE%D8%A7%D8%A8%D8%A7%D8%AA-%D9%85%D8%AC%D9%84%D8%B3-%D8%AC%D8%AF%D9%88%D9%84-%DA%AF%D8%B1%D8%A7%DB%8C%D8%B4-%D9%87%D8%A7&amp;sandbox=1


Processing URLs:  90%|█████████ | 904/1000 [44:41<01:50,  1.15s/it]

Error extracting text from https://en-press.ens.dk/pressreleases/nord-stream-2-pipeline-b-can-be-put-into-operation-3133391: HTTPSConnectionPool(host='en-press.ens.dk', port=443): Max retries exceeded with url: /pressreleases/nord-stream-2-pipeline-b-can-be-put-into-operation-3133391 (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)')))


Processing URLs:  91%|█████████ | 906/1000 [44:49<04:03,  2.59s/it]

URL filtered: https://www.youtube.com/watch?v=hNBgEirKxq8


Processing URLs:  91%|█████████ | 909/1000 [44:52<02:31,  1.66s/it]

URL filtered: https://twitter.com/DaniloOnorino/status/854959028455313408
URL filtered: http://www.bloomberg.com/news/articles/2016-07-20/u-s-maps-1mdb-fraud-trail-from-kuala-lumpur-to-hollywood


Processing URLs:  92%|█████████▏| 920/1000 [45:06<01:54,  1.43s/it]

Error extracting text from http://www.newsweek.com/ramadi-mosul-iraq-isis-shia-millitas-sunni-baghdad-iran-iraqi-security-forces-407085: 403 Client Error: Forbidden for url: https://www.newsweek.com/ramadi-mosul-iraq-isis-shia-millitas-sunni-baghdad-iran-iraqi-security-forces-407085


Processing URLs:  93%|█████████▎| 931/1000 [45:20<01:38,  1.43s/it]

Error extracting text from https://in.reuters.com/article/usa-trump-russia-idINKBN1AA0C9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  93%|█████████▎| 933/1000 [45:21<01:12,  1.08s/it]

Error extracting text from https://itunes.apple.com/gb/app/venezuela-econ/id1112743973?mt=8: 404 Client Error: Not Found for url: https://apps.apple.com/gb/app/venezuela-econ/id1112743973


Processing URLs:  94%|█████████▍| 941/1000 [45:39<01:55,  1.95s/it]

Error extracting text from http://www.reuters.com/article/spain-politics-idUSL8N1AR3LL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/spain-politics-idUSL8N1AR3LL
Error extracting text from https://blog.openai.com/openai-five/: HTTPSConnectionPool(host='blog.openai.com', port=443): Max retries exceeded with url: /openai-five/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff1ccce0>: Failed to resolve 'blog.openai.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  94%|█████████▍| 943/1000 [45:40<01:04,  1.13s/it]

Error extracting text from https://www.nytimes.com/2018/01/03/world/asia/north-korea-hotline-south.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/03/world/asia/north-korea-hotline-south.html


Processing URLs:  95%|█████████▍| 949/1000 [45:53<01:31,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-iran-oil-storage-idUSKBN1780H6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-storage-idUSKBN1780H6


Processing URLs:  95%|█████████▌| 952/1000 [45:56<00:55,  1.16s/it]

Error extracting text from http://autoweek.com/article/green-cars/infiniti-will-go-mostly-electric-2021: 403 Client Error: Forbidden for url: http://autoweek.com/article/green-cars/infiniti-will-go-mostly-electric-2021


Processing URLs:  95%|█████████▌| 953/1000 [45:57<00:56,  1.21s/it]

Error extracting text from https://www.reuters.com/article/us-cyber-hotels-idUSKBN1AR1IZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-hotels-idUSKBN1AR1IZ


Processing URLs:  96%|█████████▌| 955/1000 [45:58<00:36,  1.25it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-27/deadly-clashes-erupt-between-yemen-s-shiite-rebels-saleh-forces


Processing URLs:  96%|█████████▌| 959/1000 [46:02<00:40,  1.01it/s]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-22/main-features-of-saudi-arabia-s-2017-budget-2016-performance


Processing URLs:  96%|█████████▋| 963/1000 [46:05<00:32,  1.13it/s]

URL filtered: https://www.youtube.com/watch?v=El11u9uN4wo


Processing URLs:  97%|█████████▋| 967/1000 [46:06<00:16,  2.01it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/feb/2/us-ready-to-boost-colombian-aid-in-hopes-of-farc-p/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/feb/2/us-ready-to-boost-colombian-aid-in-hopes-of-farc-p/


Processing URLs:  97%|█████████▋| 970/1000 [46:10<00:26,  1.15it/s]

URL filtered: https://www.google.com/amp/s/www.businessinsider.com/facebook-twitter-sue-florida-censorship-deplatforming-law-ron-desantis-2021-6%3famp


Processing URLs:  97%|█████████▋| 972/1000 [46:13<00:31,  1.13s/it]

Error extracting text from http://38north.org/2016/02/sohae020316/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml
Error extracting text from https://forums.teslamotors.com/forum/forums/gm-successful-blocking-tesla-sales-connecticut: HTTPSConnectionPool(host='forums.teslamotors.com', port=443): Max retries exceeded with url: /forum/forums/gm-successful-blocking-tesla-sales-connecticut (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300c93710>: Failed to resolve 'forums.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  98%|█████████▊| 975/1000 [46:19<00:39,  1.58s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-zones-russia-idUSKBN15A0WK?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-zones-russia-idUSKBN15A0WK?il=0


Processing URLs:  98%|█████████▊| 978/1000 [46:19<00:20,  1.09it/s]

Error extracting text from https://www.justice.gov/usao-sdny/press-release/file/1306611/download: 403 Client Error: Forbidden for url: https://www.justice.gov/usao-sdny/press-release/file/1306611/download


Processing URLs:  98%|█████████▊| 981/1000 [47:23<05:20, 16.86s/it]

Error extracting text from https://archive.is/qNeEE#selection-859.150-859.288: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  99%|█████████▊| 986/1000 [47:30<00:59,  4.24s/it]

Error extracting text from http://ancestry.com/: 403 Client Error: Forbidden for url: http://ancestry.com/


Processing URLs:  99%|█████████▉| 988/1000 [48:30<02:55, 14.62s/it]

Error extracting text from https://dc.isda.org/cds/petroleos-de-venezuela-s-a/sor: HTTPSConnectionPool(host='dc.isda.org', port=443): Max retries exceeded with url: /cds/petroleos-de-venezuela-s-a/sor (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3051d8b30>, 'Connection to dc.isda.org timed out. (connect timeout=60)'))
Error extracting text from https://transportation.house.gov/imo/media/doc/Scholl%20Testimony1.pdf: 403 Client Error: Forbidden for url: https://transportation.house.gov/imo/media/doc/Scholl%20Testimony1.pdf


Processing URLs:  99%|█████████▉| 991/1000 [48:33<00:49,  5.49s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN16D240: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN16D240


Processing URLs: 100%|█████████▉| 995/1000 [48:38<00:10,  2.10s/it]

Error extracting text from http://english.alarabiya.net/en/media/print/2017/01/28/News-of-Bashar-al-Assad-suffering-a-stroke-has-gone-viral.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/media/print/2017/01/28/News-of-Bashar-al-Assad-suffering-a-stroke-has-gone-viral.html


Processing URLs: 100%|█████████▉| 998/1000 [48:41<00:02,  1.41s/it]

Error extracting text from https://www.nytimes.com/2018/03/23/opinion/john-bolton-trump-national-security-adviser.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/03/23/opinion/john-bolton-trump-national-security-adviser.html


Processing URLs: 100%|██████████| 1000/1000 [48:43<00:00,  2.92s/it]
Processing URLs:   0%|          | 1/1000 [00:08<2:26:02,  8.77s/it]

Error extracting text from http://news.trust.org/item/20160212192011-yq3dd: 404 Client Error:  for url: https://news.trust.org:443/item/20160212192011-yq3dd


Processing URLs:   0%|          | 2/1000 [00:10<1:15:01,  4.51s/it]

Error extracting text from http://europe.newsweek.com/uk-snap-election-date-theresa-may-call-early-election-fix-term-parliaments-act-570808?utm_source=email&amp;utm_medium=newsletter&amp;utm_campaign=newsletter&amp;utm_content=read_more&amp;spMailingID=1507589&amp;spUserID=MTI0NzM2MjM0NzYS1&amp;spJobID=750591156&amp;spReportId=NzUwNTkxMTU2S0: 403 Client Error: Forbidden for url: https://www.newsweek.com/uk-snap-election-date-theresa-may-call-early-election-fix-term-parliaments-act-570808


Processing URLs:   0%|          | 4/1000 [00:13<43:21,  2.61s/it]  

Error extracting text from http://www.tribune.net.ph/headlines/8-of-10-pinoys-want-rp-to-assert-rights-on-s-china-sea-pulse-poll: 404 Client Error: Not Found for url: https://tribune.net.ph/headlines/8-of-10-pinoys-want-rp-to-assert-rights-on-s-china-sea-pulse-poll


Processing URLs:   1%|          | 8/1000 [00:18<24:18,  1.47s/it]

Error extracting text from http://www.reuters.com/article/china-economy-qatar-idUSL4N0ST3LY20141103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/china-economy-qatar-idUSL4N0ST3LY20141103


Processing URLs:   1%|          | 9/1000 [00:19<19:48,  1.20s/it]

Error extracting text from http://thehill.com/policy/defense/279774-ex-obama-official-says-us-should-drop-demand-assad-must-go: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/279774-ex-obama-official-says-us-should-drop-demand-assad-must-go/


Processing URLs:   1%|          | 10/1000 [00:19<14:44,  1.12it/s]

Error extracting text from https://www.nytimes.com/2017/08/08/world/americas/nicolas-maduro-venezuela-military.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/08/world/americas/nicolas-maduro-venezuela-military.html


Processing URLs:   1%|          | 11/1000 [00:20<13:17,  1.24it/s]

Error extracting text from http://www.turkeyvisa.info/turkey-diplomats-says-may-be-it-will-be-visa-free-regime-with-russian-in-future/: 404 Client Error: Not Found for url: https://turkeyvisa.info/turkey-diplomats-says-may-be-it-will-be-visa-free-regime-with-russian-in-future/


Processing URLs:   2%|▏         | 15/1000 [00:27<26:34,  1.62s/it]

Error extracting text from https://www.fda.gov/medical-devices/neurological-devices/regulatory-overview-neurological-devices: 403 Client Error: Forbidden for url: https://www.fda.gov/medical-devices/neurological-devices/regulatory-overview-neurological-devices


Processing URLs:   2%|▏         | 18/1000 [01:31<5:16:25, 19.33s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-08-29/donald-trump-jr-will-speak-to-investigators-about-meeting-with-russian-lawyer: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   2%|▏         | 20/1000 [01:33<2:42:46,  9.97s/it]

Error extracting text from http://thehill.com/opinion/cybersecurity/361121-the-feds-not-companies-are-most-liable-to-mishandle-our-personal-data: 403 Client Error: Forbidden for url: https://thehill.com/opinion/cybersecurity/361121-the-feds-not-companies-are-most-liable-to-mishandle-our-personal-data/


Processing URLs:   2%|▏         | 24/1000 [01:45<1:09:37,  4.28s/it]

Error extracting text from http://thehill.com/policy/energy-environment/256967-gop-shifts-strategy-on-oil-export-ban: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/256967-gop-shifts-strategy-on-oil-export-ban/


Processing URLs:   3%|▎         | 26/1000 [01:48<44:50,  2.76s/it]  

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-iraq-mosul-insight-idUKKCN11E2DC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.reuters.com/article/us-usa-trump-coal-idUSKBN1762YY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-coal-idUSKBN1762YY


Processing URLs:   3%|▎         | 29/1000 [01:52<32:48,  2.03s/it]

Error extracting text from http://www.inquisitr.com/2833569/zika-advice-for-olympics-2016-cdc-forewarns-pregnant-women-to-stay-away-from-olympics-for-zika-virus-fears/: 404 Client Error: Not Found for url: https://www.inquisitr.com:443/2833569/zika-advice-for-olympics-2016-cdc-forewarns-pregnant-women-to-stay-away-from-olympics-for-zika-virus-fears


Processing URLs:   3%|▎         | 32/1000 [01:57<27:13,  1.69s/it]

Error extracting text from https://www.niaid.nih.gov/research/emerging-infectious-diseases-pathogens: 403 Client Error: Forbidden for url: https://www.niaid.nih.gov/research/emerging-infectious-diseases-pathogens


Processing URLs:   4%|▎         | 36/1000 [02:03<22:33,  1.40s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-mosul-village-idUSKCN12R0QN?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-mosul-village-idUSKCN12R0QN?mod=related&amp;channelName=worldNews


Processing URLs:   4%|▍         | 41/1000 [02:11<23:18,  1.46s/it]

Error extracting text from https://in.reuters.com/article/venezuela-politics-idINKBN1AE04Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:   4%|▍         | 43/1000 [02:11<14:03,  1.14it/s]

Error extracting text from https://www.latimes.com/entertainment-arts/business/story/2021-04-26/oscars-telecast-draws-record-low-viewers-abc: 403 Client Error: Forbidden for url: https://www.latimes.com/entertainment-arts/business/story/2021-04-26/oscars-telecast-draws-record-low-viewers-abc


Processing URLs:   4%|▍         | 44/1000 [02:12<11:05,  1.44it/s]

Error extracting text from https://www.nytimes.com/2021/07/01/world/asia/covid-myanmar-coup.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/01/world/asia/covid-myanmar-coup.html


Processing URLs:   5%|▍         | 49/1000 [02:19<19:11,  1.21s/it]

Error extracting text from http://www.militarytimes.com/story/military/pentagon/2016/02/29/us-expand-improve-support-iraqs-mosul-operation/81116020/: 404 Client Error: Not Found for url: https://www.militarytimes.com/story/military/pentagon/2016/02/29/us-expand-improve-support-iraqs-mosul-operation/81116020/


Processing URLs:   5%|▌         | 54/1000 [02:27<19:42,  1.25s/it]

Error extracting text from https://www.rferl.org/a/pentagon-lethal-aid-ukraine-russia-crimea/28660304.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/pentagon-lethal-aid-ukraine-russia-crimea/28660304.html
Error extracting text from http://www.nytimes.com/2015/10/12/business/dealbook/bankers-grapple-with-how-to-help-emerging-markets.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/12/business/dealbook/bankers-grapple-with-how-to-help-emerging-markets.html


Processing URLs:   6%|▌         | 55/1000 [02:27<14:17,  1.10it/s]

Error extracting text from https://www.reuters.com/article/us-britain-eu/brexit-trade-deal-may-be-imminent-senior-eu-source-says-idUSKBN28X0TW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/brexit-trade-deal-may-be-imminent-senior-eu-source-says-idUSKBN28X0TW


Processing URLs:   6%|▌         | 58/1000 [02:33<24:12,  1.54s/it]

Error extracting text from https://covid19tracker.health.ny.gov/views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Map?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n: HTTPSConnectionPool(host='covid19tracker.health.ny.gov', port=443): Max retries exceeded with url: /views/NYS-COVID19-Tracker/NYSDOHCOVID-19Tracker-Map?%3Aembed=yes&%3Atoolbar=no&%3Atabs=n (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'covid19tracker.health.ny.gov'. (_ssl.c:1000)")))


Processing URLs:   6%|▌         | 61/1000 [02:36<16:29,  1.05s/it]

Error extracting text from http://www.ohchr.org/en/NewsEvents/Pages/DisplayNews.aspx?NewsID=20534&amp;LangID=E: 403 Client Error: Forbidden for url: https://www.ohchr.org/en/NewsEvents/Pages/DisplayNews.aspx?NewsID=20534&amp;LangID=E


Processing URLs:   6%|▋         | 63/1000 [02:38<14:45,  1.06it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/257162-key-goper-pushes-obama-on-cuban-involvement-in-syria: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/257162-key-goper-pushes-obama-on-cuban-involvement-in-syria/


Processing URLs:   6%|▋         | 64/1000 [02:38<13:52,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-raqqa-idUSKBN15J0CZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-raqqa-idUSKBN15J0CZ


Processing URLs:   7%|▋         | 66/1000 [02:40<12:46,  1.22it/s]

Error extracting text from http://www.newsweek.com/iran-military-us-get-out-persian-gulf-577231: 403 Client Error: Forbidden for url: https://www.newsweek.com/iran-military-us-get-out-persian-gulf-577231


Processing URLs:   7%|▋         | 68/1000 [02:43<17:15,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-usa-palestinian-idUSKBN17Z06S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-palestinian-idUSKBN17Z06S


Processing URLs:   8%|▊         | 75/1000 [02:55<29:00,  1.88s/it]

Error extracting text from http://tass.ru/en/politics/835802: 404 Client Error: Not Found for url: https://tass.ru/en/politics/835802


Processing URLs:   8%|▊         | 78/1000 [02:58<20:29,  1.33s/it]

Error extracting text from http://www.caam.org.cn/zhengceyanjiu/20161121/1005201237.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/zhengceyanjiu/20161121/1005201237.html


Processing URLs:   8%|▊         | 80/1000 [03:00<15:36,  1.02s/it]

Error extracting text from https://english.ahram.org.eg/NewsContent/6/56/411496/Sports/Omni-Sports/IOC-chiefs-delayed-preOlympics-Japan-trip-to-happe.aspx: 403 Client Error: Forbidden for url: https://english.ahram.org.eg/NewsContent/6/56/411496/Sports/Omni-Sports/IOC-chiefs-delayed-preOlympics-Japan-trip-to-happe.aspx


Processing URLs:   8%|▊         | 81/1000 [03:01<17:22,  1.13s/it]

Error extracting text from http://www.debka.com/article/25900/Trump-Putin-safe-zones-deal-ousts-Iran-from-Syria: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25900/Trump-Putin-safe-zones-deal-ousts-Iran-from-Syria (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:   8%|▊         | 84/1000 [03:06<20:50,  1.37s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://www.shs-conferences.org/articles/shsconf/pdf/2016/06/shsconf_rptss2016_01048.pdf&amp;usg=ALkJrhhW7MRIds7ztgsPvT51qEVX5fA7Lg: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://www.shs-conferences.org/articles/shsconf/pdf/2016/06/shsconf_rptss2016_01048.pdf&amp;usg=ALkJrhhW7MRIds7ztgsPvT51qEVX5fA7Lg


Processing URLs:   9%|▊         | 86/1000 [03:08<17:33,  1.15s/it]

Error extracting text from http://carnegieeurope.eu/strategiceurope/64235: 403 Client Error: Forbidden for url: http://carnegieeurope.eu/strategiceurope/64235


Processing URLs:   9%|▉         | 91/1000 [03:18<27:13,  1.80s/it]

Error extracting text from http://www.wsj.com/articles/china-flew-fighter-jets-to-disputed-south-china-sea-island-u-s-officials-say-1456292008: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-flew-fighter-jets-to-disputed-south-china-sea-island-u-s-officials-say-1456292008


Processing URLs:  10%|▉         | 98/1000 [04:25<4:46:26, 19.05s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-06-24/israel-to-ease-more-gaza-restrictions-as-truce-holds: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  10%|█         | 101/1000 [05:13<3:43:03, 14.89s/it]

Error extracting text from http://nyti.ms/1Lb0CIi: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/10/world/middleeast/hussein-hamedani-iran-general-killed-in-syria.html?smid=pl-share
URL filtered: https://twitter.com/i/status/1337502815464263681


Processing URLs:  11%|█         | 107/1000 [05:32<1:07:58,  4.57s/it]

Error extracting text from https://www.nsa.gov/: 403 Client Error: Forbidden for url: https://www.nsa.gov/
Error extracting text from https://www.reuters.com/article/us-usa-congress-moore/white-house-wants-republican-in-alabama-senate-seat-for-tax-bill-vote-adviser-idUSKBN1DK1T0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-moore/white-house-wants-republican-in-alabama-senate-seat-for-tax-bill-vote-adviser-idUSKBN1DK1T0


Processing URLs:  11%|█         | 108/1000 [05:32<52:04,  3.50s/it]  

URL filtered: https://twitter.com/mod_russia/status/1360252983678795787?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1360252983678795787%7Ctwgr%5E%7Ctwcon%5Es1_c10&amp;ref_url=https%3A%2F%2Fwww.islamabadscene.com%2Fpakistan-navys-biggest-exercise-aman-21-begins-45-countries-participating%2F


Processing URLs:  11%|█         | 111/1000 [10:39<13:02:54, 52.84s/it]

Error extracting text from http://www.nytimes.com/2016/07/19/world/asia/china-sea-air-patrols.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/19/world/asia/china-sea-air-patrols.html?_r=0


Processing URLs:  11%|█         | 112/1000 [10:40<9:48:12, 39.74s/it] 

Error extracting text from https://reut.rs/3p8U8nR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics-idUSKBN2A92TM
Error extracting text from https://www.reuters.com/world/asia-pacific/half-all-afghan-district-centers-under-taliban-control-us-general-2021-07-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/half-all-afghan-district-centers-under-taliban-control-us-general-2021-07-21/


Processing URLs:  11%|█▏        | 114/1000 [10:40<5:40:28, 23.06s/it]

Error extracting text from http://www.thelocal.es/20160609/podemos-coalition-set-to-beat-spains-socialist: 403 Client Error: Forbidden for url: https://www.thelocal.es/20160609/podemos-coalition-set-to-beat-spains-socialist


Processing URLs:  12%|█▏        | 115/1000 [10:41<4:23:11, 17.84s/it]

Error extracting text from http://ukpollingreport.co.uk/voting-intention-2005-2010: 404 Client Error: Not Found for url: http://ukpollingreport.co.uk/voting-intention-2005-2010


Processing URLs:  12%|█▏        | 118/1000 [10:45<1:56:50,  7.95s/it]

Error extracting text from http://www.techworld.com/picture-gallery/big-data/9-tech-giants-investing-in-artificial-intelligence-3629737/: 404 Client Error: Not Found for url: https://www.computerworld.com/article/3547108/how-tech-giants-are-investing-in-artificial-intelligence.html


Processing URLs:  12%|█▏        | 120/1000 [10:53<1:25:07,  5.80s/it]

Error extracting text from http://www.amazon.com/Best-Sellers-Books-Business-Money/zgbs/books/3/ref=zg_bs_nav_b_1_b: 503 Server Error: Service Unavailable for url: https://www.amazon.com/Best-Sellers-Books-Business-Money/zgbs/books/3/ref=zg_bs_nav_b_1_b


Processing URLs:  12%|█▏        | 121/1000 [10:54<1:03:00,  4.30s/it]

Error extracting text from https://www.maritime-executive.com/article/indonesia-deploys-anti-air-system-in-s-china-sea: 403 Client Error: Forbidden for url: https://www.maritime-executive.com/article/indonesia-deploys-anti-air-system-in-s-china-sea


Processing URLs:  12%|█▏        | 123/1000 [10:55<35:01,  2.40s/it]  

Error extracting text from http://energypolicy.columbia.edu/blog/predicting-when-new-iranian-oil-will-begin-flow: 403 Client Error: Forbidden for url: http://www.energypolicy.columbia.edu/blog/predicting-when-new-iranian-oil-will-begin-flow
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-ceasefire-idUSKBN0UE0B620151231: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-ceasefire-idUSKBN0UE0B620151231
Error extracting text from http://training.goodjudgment.com/keepingscore/index.html: HTTPConnectionPool(host='training.goodjudgment.com', port=80): Max retries exceeded with url: /keepingscore/index.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300c93f50>: Failed to resolve 'training.goodjudgment.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  13%|█▎        | 126/1000 [10:56<18:10,  1.25s/it]

Error extracting text from https://www.nytimes.com/2021/06/01/world/middleeast/palestinians-netanyahu-government.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/01/world/middleeast/palestinians-netanyahu-government.html


Processing URLs:  13%|█▎        | 128/1000 [10:59<20:01,  1.38s/it]

URL filtered: https://www.poynter.org/2016/facebook-rolls-out-plan-to-fight-fake-news/442987/


Processing URLs:  13%|█▎        | 130/1000 [12:00<3:22:16, 13.95s/it]

Error extracting text from http://www.spaceflightinsider.com/organizations/space-exploration-technologies/spacex-still-expects-resume-launches-end-year/: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /organizations/space-exploration-technologies/spacex-still-expects-resume-launches-end-year/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x300c91370>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  13%|█▎        | 131/1000 [12:01<2:37:21, 10.87s/it]

Error extracting text from http://time.com/4893218/paul-manafort-fbi-search-warrant/: 404 Client Error: Not Found for url: https://time.com/4893218/paul-manafort-fbi-search-warrant/


Processing URLs:  14%|█▍        | 141/1000 [12:17<27:08,  1.90s/it]  

Error extracting text from http://iranmatters.belfercenter.org/blog/explainer-iranian-review-nuclear-deal: 500 Server Error: Domain Not Found for url: http://iranmatters.belfercenter.org/blog/explainer-iranian-review-nuclear-deal


Processing URLs:  14%|█▍        | 143/1000 [12:19<21:16,  1.49s/it]

Error extracting text from http://post.understandingwar.org/report/al-qaeda-and-isis-existential-threats-us-and-europe: HTTPConnectionPool(host='post.understandingwar.org', port=80): Max retries exceeded with url: /report/al-qaeda-and-isis-existential-threats-us-and-europe (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3009379b0>: Failed to resolve 'post.understandingwar.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▍        | 147/1000 [12:23<13:56,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN16H0SP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN16H0SP


Processing URLs:  15%|█▌        | 150/1000 [12:29<23:27,  1.66s/it]

Error extracting text from http://www.kashmirmonitor.in/news-breakaway-taliban-faction-%E2%80%98expresses-support-for-talks-with-afghan-govt%E2%80%99-104609.aspx: 404 Client Error: Not Found for url: http://www.kashmirmonitor.in/news-breakaway-taliban-faction-%E2%80%98expresses-support-for-talks-with-afghan-govt%E2%80%99-104609.aspx


Processing URLs:  15%|█▌        | 152/1000 [12:30<17:00,  1.20s/it]

Error extracting text from http://atimes.com/2015/11/china-will-allow-suspended-ipos-to-launch/: 404 Client Error: Not Found for url: https://atimes.com/2015/11/china-will-allow-suspended-ipos-to-launch/
Error extracting text from http://news.yahoo.com/colombia-peace-talks-near-finale-155921061.html: 404 Client Error: Not Found for url: http://news.yahoo.com/colombia-peace-talks-near-finale-155921061.html


Processing URLs:  16%|█▌        | 156/1000 [12:34<11:44,  1.20it/s]

Error extracting text from http://www.nytimes.com/2016/01/14/opinion/iraq-and-the-kurds-are-going-broke.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/opinion/iraq-and-the-kurds-are-going-broke.html?_r=0


Processing URLs:  16%|█▌        | 157/1000 [12:36<14:55,  1.06s/it]

Error extracting text from http://nsnbc.me/2016/02/20/visit-of-farc-peace-delegates-in-colombia-caused-row-with-government/: 403 Client Error: Forbidden for url: https://nsnbc.me/2016/02/20/visit-of-farc-peace-delegates-in-colombia-caused-row-with-government/


Processing URLs:  16%|█▋        | 165/1000 [13:06<25:34,  1.84s/it]  

Error extracting text from http://www.citizen.co.za/1291471/tsunami-about-to-hit-jacob-zuma/: 404 Client Error: Not Found for url: https://www.citizen.co.za/tsunami-about-to-hit-jacob-zuma/


Processing URLs:  17%|█▋        | 168/1000 [13:07<11:32,  1.20it/s]

URL filtered: https://www.instagram.com/instagram/?hl=en
Error extracting text from http://www.reuters.com/article/us-trade-ttip-merkel-idUSKCN0XV2GC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-ttip-merkel-idUSKCN0XV2GC
URL filtered: https://www.youtube.com/watch?v=caqRJUFOMlA


Processing URLs:  17%|█▋        | 174/1000 [13:11<08:34,  1.60it/s]

Error extracting text from https://www.nytimes.com/2015/02/21/opinion/when-the-government-tells-you-what-to-eat.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/02/21/opinion/when-the-government-tells-you-what-to-eat.html


Processing URLs:  18%|█▊        | 175/1000 [13:13<12:25,  1.11it/s]

URL filtered: https://www.youtube.com/watch?v=sW_3nLCPK0E#t=2m07s


Processing URLs:  18%|█▊        | 177/1000 [13:13<08:08,  1.69it/s]

Error extracting text from https://www.nytimes.com/2017/06/03/world/asia/afghanistan-war-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/03/world/asia/afghanistan-war-trump.html


Processing URLs:  18%|█▊        | 179/1000 [13:14<07:19,  1.87it/s]

Error extracting text from https://www.congress.gov/bill/116th-congress/house-bill/2144: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/116th-congress/house-bill/2144
URL filtered: http://www.bloomberg.com/news/articles/2016-09-01/asia-is-a-growth-market-for-military-aircraft


Processing URLs:  18%|█▊        | 182/1000 [13:15<04:39,  2.92it/s]

Error extracting text from http://www.mfa.gov.pl/en/news/first_meeting_of_inter_ministerial_group_for_preparing_nato_summit_in_warsaw: HTTPConnectionPool(host='www.mfa.gov.pl', port=80): Max retries exceeded with url: /en/news/first_meeting_of_inter_ministerial_group_for_preparing_nato_summit_in_warsaw (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff6014f0>: Failed to resolve 'www.mfa.gov.pl' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2002/09/01/opinion/iraq-without-saddam.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2002/09/01/opinion/iraq-without-saddam.html


Processing URLs:  18%|█▊        | 183/1000 [13:16<05:40,  2.40it/s]

Error extracting text from http://www.imf.org/external/NP/SEC/bc/eng/index.aspx: 403 Client Error: Forbidden for url: http://www.imf.org/external/NP/SEC/bc/eng/index.aspx
URL filtered: https://www.bloomberg.com/news/articles/2021-04-21/-diplomatic-boycott-of-beijing-olympics-added-to-china-bill


Processing URLs:  19%|█▊        | 186/1000 [14:16<2:21:35, 10.44s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-04-20/chinas-xi-kicks-off-congress-with-formal-election-in-allys-province: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  19%|█▉        | 189/1000 [14:18<1:13:01,  5.40s/it]

Error extracting text from https://www.scottaaronson.com/blog/?p=5088: 406 Client Error: Not Acceptable for url: https://www.scottaaronson.com/blog/?p=5088


Processing URLs:  19%|█▉        | 191/1000 [14:19<41:45,  3.10s/it]  

Error extracting text from http://www.komodoexercise.org/: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300937b30>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.reuters.com/article/us-amazon-com-labor/amazon-union-election-to-start-in-february-u-s-labor-board-says-idUSKBN29K2BV?utm_source=morning_brew: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-amazon-com-labor/amazon-union-election-to-start-in-february-u-s-labor-board-says-idUSKBN29K2BV?utm_source=morning_brew


Processing URLs:  19%|█▉        | 193/1000 [14:20<26:01,  1.94s/it]

Error extracting text from http://www.praguepost.com/czech-news/praguepostnews/czech-news/eu-and-us-agree-on-data-protection-framework-for-ttip: 403 Client Error: Forbidden for url: http://www.praguepost.com/czech-news/praguepostnews/czech-news/eu-and-us-agree-on-data-protection-framework-for-ttip


Processing URLs:  20%|█▉        | 197/1000 [14:37<41:23,  3.09s/it]  

Error extracting text from http://www.imf.org/external/np/sec/pr/2016/pr16227.htm: 403 Client Error: Forbidden for url: http://www.imf.org/external/np/sec/pr/2016/pr16227.htm


Processing URLs:  20%|█▉        | 199/1000 [14:39<30:00,  2.25s/it]

Error extracting text from https://www.chicago.gov/city/en/sites/covid-19/home/emergency-travel-order.html: 404 Client Error: Not Found for url: https://www.chicago.gov/city/en/sites/covid-19/home/emergency-travel-order.html


Processing URLs:  20%|██        | 202/1000 [14:44<22:41,  1.71s/it]

URL filtered: https://www.youtube.com/watch?v=OewBDIwy-O4


Processing URLs:  20%|██        | 205/1000 [14:45<13:01,  1.02it/s]

Error extracting text from http://www.huewire.com/headlines/indian/isis-weakening-inside-iraqi-city-of-mosul-pentagon/197078/: 404 Client Error: Not Found for url: http://www.huewire.com/headlines/indian/isis-weakening-inside-iraqi-city-of-mosul-pentagon/197078/
Error extracting text from https://au.news.yahoo.com/world/a/30771112/santos-says-referendum-will-be-held-on-peace-deal-regardless-of-farc/: 404 Client Error: Not Found for url: https://au.news.yahoo.com/santos-says-referendum-will-be-held-on-peace-deal-regardless-of-farc-30771112.html


Processing URLs:  21%|██        | 206/1000 [14:46<14:17,  1.08s/it]

Error extracting text from http://www.defenddemocracy.org/media-hit/bill-roggio-taliban-overruns-2-districts-in-southern-afghanistan/: 403 Client Error: Forbidden for url: http://www.fdd.org/media-hit/bill-roggio-taliban-overruns-2-districts-in-southern-afghanistan/


Processing URLs:  21%|██        | 207/1000 [14:48<15:03,  1.14s/it]

Error extracting text from http://news.xinhuanet.com/english/2007-07/26/content_6434933.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2007-07/26/content_6434933.htm


Processing URLs:  21%|██▏       | 213/1000 [14:56<21:00,  1.60s/it]

Error extracting text from https://www.thecipherbrief.com/column/expert-view/comey-dismissal-us-entering-treacherous-waters-1093: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/column/expert-view/comey-dismissal-us-entering-treacherous-waters-1093


Processing URLs:  21%|██▏       | 214/1000 [14:57<18:22,  1.40s/it]

Error extracting text from https://www.bia.gov/cs/groups/public/documents/text/idc-002000.pdf: 404 Client Error: Not Found for url: https://www.bia.gov/cs/groups/public/documents/text/idc-002000.pdf


Processing URLs:  22%|██▏       | 216/1000 [15:00<18:06,  1.39s/it]

Error extracting text from http://www.reuters.com/article/2015/11/10/us-iran-nuclear-deal-idUSKCN0SZ1Z720151110#7x7fZxtdZ3mUEkzh.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/10/us-iran-nuclear-deal-idUSKCN0SZ1Z720151110#7x7fZxtdZ3mUEkzh.97
Error extracting text from http://www.reuters.com/article/us-usa-russia-missiles-idUSKBN15U10Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-missiles-idUSKBN15U10Q


Processing URLs:  22%|██▏       | 218/1000 [15:02<14:45,  1.13s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-02/maduro-says-venezuela-to-restructure-all-debt-after-tomorrow-j9j1jj2m


Processing URLs:  22%|██▏       | 220/1000 [15:19<52:21,  4.03s/it]

Error extracting text from http://www.investopedia.com/terms/c/capital_conrol.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/c/capital_conrol.asp


Processing URLs:  22%|██▏       | 223/1000 [15:23<33:03,  2.55s/it]

URL filtered: http://www.reuters.com/article/us-facebook-internet-idUSBREA2Q27420140327


Processing URLs:  23%|██▎       | 229/1000 [15:28<11:58,  1.07it/s]

Error extracting text from https://www.imf.org/external/np/sta/ir/IRProcessWeb/data/sau/eng/hstSAU.pdf: 403 Client Error: Forbidden for url: https://www.imf.org/external/np/sta/ir/IRProcessWeb/data/sau/eng/hstSAU.pdf
Error extracting text from http://www.business-standard.com/article/news-ians/aleppo-milestone-syria-to-limp-along-until-us-elections-comment-special-to-ians-116050601178_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/aleppo-milestone-syria-to-limp-along-until-us-elections-comment-special-to-ians-116050601178_1.html


Processing URLs:  23%|██▎       | 230/1000 [15:29<13:52,  1.08s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0YN458: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0YN458


Processing URLs:  23%|██▎       | 231/1000 [15:32<19:06,  1.49s/it]

Error extracting text from http://tpu.ru/en/structure/institutes/pe/: 404 Client Error: Not Found for url: https://tpu.ru:443/en/structure/institutes/pe/


Processing URLs:  24%|██▎       | 237/1000 [15:43<19:15,  1.51s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-odds-idUKKCN0Y80ID: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://www.bloomberg.com/news/articles/2016-09-23/erdogan-doesn-t-care-at-all-if-turkey-gets-downgraded-to-junk


Processing URLs:  24%|██▍       | 241/1000 [15:45<12:01,  1.05it/s]

Error extracting text from https://abcnews.go.com/International/wireStory/haiti-pm-elections-referendum-held-year-80289104: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/haiti-pm-elections-referendum-held-year-80289104


Processing URLs:  24%|██▍       | 243/1000 [15:47<12:21,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-kurds-idUSKBN18525V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-kurds-idUSKBN18525V


Processing URLs:  25%|██▍       | 246/1000 [15:49<09:53,  1.27it/s]

Error extracting text from http://m.thenational.ae/world/middle-east/iraqs-shiite-militias-say-they-will-be-part-of-mosul-battle-despite-war-crimes-claims: 400 Client Error: Bad Request for url: http://m.thenational.ae/world/middle-east/iraqs-shiite-militias-say-they-will-be-part-of-mosul-battle-despite-war-crimes-claims


Processing URLs:  25%|██▍       | 249/1000 [15:53<11:19,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKCN0XQ134: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKCN0XQ134


Processing URLs:  25%|██▌       | 250/1000 [15:54<14:07,  1.13s/it]

Error extracting text from http://thecipherbrief.com/article/asia/effect-south-koreas-neighbors-1093: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/asia/effect-south-koreas-neighbors-1093
URL filtered: http://www.bloomberg.com/news/articles/2015-11-02/opec-seen-staying-pat-on-output-by-oil-guru-who-called-2014-rout?cmpid=wsdemand


Processing URLs:  25%|██▌       | 253/1000 [15:58<14:39,  1.18s/it]

Error extracting text from https://www.nytimes.com/2021/06/20/science/covid-lab-leak-wuhan.html?searchResultPosition=2: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/20/science/covid-lab-leak-wuhan.html?searchResultPosition=2


Processing URLs:  26%|██▌       | 255/1000 [16:01<15:18,  1.23s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950222001323: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950222001323 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30364c470>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  26%|██▌       | 258/1000 [16:05<16:23,  1.33s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-01/china-electric-car-boom-driven-by-state-buying-bernstein-says


Processing URLs:  26%|██▋       | 264/1000 [16:14<16:28,  1.34s/it]

Error extracting text from https://www.todayonline.com/singapore/singapore-improves-marginally-eius-democracy-index: 403 Client Error: Forbidden for url: https://www.todayonline.com/singapore/singapore-improves-marginally-eius-democracy-index


Processing URLs:  26%|██▋       | 265/1000 [16:15<18:12,  1.49s/it]

Error extracting text from http://tass.ru/en/politics/875809: 404 Client Error: Not Found for url: https://tass.ru/en/politics/875809
URL filtered: http://www.bloomberg.com/politics/articles/2016-01-05/ted-cruz-travels-iowa-s-backroads


Processing URLs:  27%|██▋       | 271/1000 [17:24<3:39:52, 18.10s/it]

Error extracting text from https://www.meritalk.com/articles/finland-debuts-driverless-bus-as-u-s-proceeds-with-caution/: HTTPSConnectionPool(host='www.21centurystate.com', port=443): Max retries exceeded with url: /articles/finland-debuts-driverless-bus-as-u-s-proceeds-with-caution/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30364e660>, 'Connection to www.21centurystate.com timed out. (connect timeout=60)'))


Processing URLs:  27%|██▋       | 274/1000 [17:28<1:27:29,  7.23s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-russia-idUSKBN13O133: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-russia-idUSKBN13O133


Processing URLs:  28%|██▊       | 275/1000 [17:29<1:07:08,  5.56s/it]

Error extracting text from https://diplomacy.state.gov/discoverdiplomacy/explorer/#places: 404 Client Error: Not Found for url: https://diplomacy.state.gov/discoverdiplomacy/explorer/#places


Processing URLs:  28%|██▊       | 276/1000 [17:31<54:01,  4.48s/it]  

Error extracting text from https://www.cnbc.com/2022/03/24/supreme-court-pick-ketanji-brown-jackson-confirmation-what-happens-next.html),: 404 Client Error: Not Found for url: https://www.cnbc.com/2022/03/24/supreme-court-pick-ketanji-brown-jackson-confirmation-what-happens-next.html),


Processing URLs:  28%|██▊       | 278/1000 [25:33<29:12:52, 145.67s/it]

Error extracting text from https://www.thespainreport.com/articles/860-160822112212-polls-probe-voters-on-psoe-abstention-socialists-split: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/860-160822112212-polls-probe-voters-on-psoe-abstention-socialists-split (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x302ae5cd0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  28%|██▊       | 279/1000 [25:38<20:45:19, 103.63s/it]

Error extracting text from http://www.rollcall.com/news/reid-democrats-considering-forcing-votes-supreme-court-nominee: 404 Client Error: Not Found for url: https://rollcall.com/news/reid-democrats-considering-forcing-votes-supreme-court-nominee


Processing URLs:  28%|██▊       | 280/1000 [25:38<14:32:47, 72.73s/it] 

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=138144: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=138144


Processing URLs:  28%|██▊       | 284/1000 [25:44<3:37:16, 18.21s/it] 

Error extracting text from http://www.reuters.com/article/usa-cyber-russia-senate-idUSL2N1CD26N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/usa-cyber-russia-senate-idUSL2N1CD26N


Processing URLs:  29%|██▊       | 286/1000 [25:46<1:51:17,  9.35s/it]

Error extracting text from http://www.reuters.com/article/2015/09/21/us-mideast-crisis-syria-drones-idUSKCN0RL1CI20150921: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/21/us-mideast-crisis-syria-drones-idUSKCN0RL1CI20150921
Error extracting text from http://www.reuters.com/article/idUSKCN0X50O0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X50O0


Processing URLs:  29%|██▉       | 288/1000 [25:48<1:05:20,  5.51s/it]

Error extracting text from http://www.nileinternational.net/en/?p=84823: HTTPSConnectionPool(host='www.interfacesymposia.org', port=443): Max retries exceeded with url: /en/?p=84823 (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.interfacesymposia.org'. (_ssl.c:1000)")))
URL filtered: https://twitter.com/USLaunchReport/status/810596374718939136


Processing URLs:  29%|██▉       | 290/1000 [25:49<42:36,  3.60s/it]  

Error extracting text from http://leadercall.com/2016/02/hundreds-of-thousands-at-risk-if-troops-surround-aleppo/: 404 Client Error: Not Found for url: http://leadercall.com/2016/02/hundreds-of-thousands-at-risk-if-troops-surround-aleppo/


Processing URLs:  29%|██▉       | 291/1000 [25:50<35:15,  2.98s/it]

Error extracting text from http://www.businessinsider.com/ap-despite-tail-winds-eurozone-economy-loses-momentum-2015-11: 404 Client Error: Not Found for url: https://www.businessinsider.com/ap-despite-tail-winds-eurozone-economy-loses-momentum-2015-11


Processing URLs:  29%|██▉       | 294/1000 [25:53<22:54,  1.95s/it]

Error extracting text from http://www.newsweek.com/after-ramadi-anti-isis-coalition-shifts-focus-liberating-mosul-409676: 403 Client Error: Forbidden for url: https://www.newsweek.com/after-ramadi-anti-isis-coalition-shifts-focus-liberating-mosul-409676


Processing URLs:  30%|██▉       | 296/1000 [25:56<20:12,  1.72s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/s-korea-probes-death-of-chinese-fishermen-in-boat-seizure/3170230.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/s-korea-probes-death-of-chinese-fishermen-in-boat-seizure/3170230.html


Processing URLs:  30%|██▉       | 298/1000 [26:56<2:34:16, 13.19s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/article169310942.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://live.belfercenter.org/files/scientificamerican0185-32.pdf: HTTPSConnectionPool(host='live.belfercenter.org', port=443): Max retries exceeded with url: /files/scientificamerican0185-32.pdf (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'live.belfercenter.org'. (_ssl.c:1000)")))


Processing URLs:  30%|██▉       | 299/1000 [26:57<1:49:43,  9.39s/it]

Error extracting text from https://www.nytimes.com/2017/05/04/opinion/too-many-donald-trumps.html?emc=edit_th_20170504&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/04/opinion/too-many-donald-trumps.html?emc=edit_th_20170504&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  30%|███       | 303/1000 [27:02<38:20,  3.30s/it]  

Error extracting text from https://detroit.craigslist.org/mcb/rvs/d/sterling-heights-1989-cruisemaster-rv/7250158739.html: 404 Client Error: Not Found for url: https://detroit.craigslist.org/mcb/rvs/d/sterling-heights-1989-cruisemaster-rv/7250158739.html
Error extracting text from http://www.reuters.com/article/us-china-usa-diplomacy-idUSKCN0WQ08A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-usa-diplomacy-idUSKCN0WQ08A


Processing URLs:  31%|███       | 306/1000 [27:05<18:49,  1.63s/it]

Error extracting text from https://www.nytimes.com/2018/01/12/business/gm-driverless-car.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/12/business/gm-driverless-car.html


Processing URLs:  31%|███       | 308/1000 [27:07<15:06,  1.31s/it]

Error extracting text from https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-report/2020/november/mpr-press-conference-transcript-november-2020.pdf: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-report/2020/november/mpr-press-conference-transcript-november-2020.pdf


Processing URLs:  31%|███       | 310/1000 [27:09<12:23,  1.08s/it]

Error extracting text from https://www.predictit.org/Contract/983/Will-the-DOJ-open-a-criminal-investigation-into-Clinton&#39;s-email-this-year#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/983/Will-the-DOJ-open-a-criminal-investigation-into-Clinton&#39;s-email-this-year%23data


Processing URLs:  31%|███▏      | 314/1000 [27:14<14:17,  1.25s/it]

Error extracting text from http://carnegieeurope.eu/2016/01/13/is-this-end-of-moscow-ankara-nuclear-cooperation/it80: 403 Client Error: Forbidden for url: http://carnegieeurope.eu/2016/01/13/is-this-end-of-moscow-ankara-nuclear-cooperation/it80


Processing URLs:  32%|███▏      | 315/1000 [27:18<24:04,  2.11s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-21/malaysia-cuts-bank-reserve-ratio-keeps-benchmark-rate-steady-ijo3dgir


Processing URLs:  32%|███▏      | 319/1000 [27:20<10:04,  1.13it/s]

Error extracting text from https://www.rbnz.govt.nz/news/2021/10/monetary-stimulus-further-reduced-official-cash-rate-raised-to-050-percent#:~:text=Monetary%20Stimulus%20Further%20Reduced%20%2D%20Official,Reserve%20Bank%20of%20New%20Zealand: 403 Client Error: Forbidden for url: https://www.rbnz.govt.nz/news/2021/10/monetary-stimulus-further-reduced-official-cash-rate-raised-to-050-percent#:~:text=Monetary%20Stimulus%20Further%20Reduced%20-%20Official,Reserve%20Bank%20of%20New%20Zealand
Error extracting text from https://www.unep.org/emissions-gap-report-2020: 403 Client Error: Forbidden for url: https://www.unep.org/emissions-gap-report-2020


Processing URLs:  32%|███▏      | 323/1000 [27:24<09:54,  1.14it/s]

Error extracting text from https://greekreporter.com/2021/06/18/joe-biden-cancels-3-billion-worth-of-student-debt/: 403 Client Error: Forbidden for url: https://greekreporter.com/2021/06/18/joe-biden-cancels-3-billion-worth-of-student-debt/


Processing URLs:  33%|███▎      | 327/1000 [27:29<13:06,  1.17s/it]

Error extracting text from http://www.newsweek.com/new-evidence-shows-duke-windsor-plotted-hitler-341850: 403 Client Error: Forbidden for url: https://www.newsweek.com/new-evidence-shows-duke-windsor-plotted-hitler-341850


Processing URLs:  33%|███▎      | 334/1000 [27:53<34:38,  3.12s/it]

URL filtered: https://www.youtube.com/watch?v=vUiL0CF6lq4


Processing URLs:  34%|███▎      | 336/1000 [27:56<27:30,  2.49s/it]

Error extracting text from http://www.thenational.ae/world/middle-east/candidates-pull-out-ahead-of-irans-parliamentary-election: 404 Client Error: Not Found for url: https://www.thenationalnews.com/mena/candidates-pull-out-ahead-of-irans-parliamentary-election/


Processing URLs:  34%|███▍      | 338/1000 [27:59<22:19,  2.02s/it]

Error extracting text from https://sports.yahoo.com/usopc-opposes-ineffective-2022-games-011310879.html: 404 Client Error: Not Found for url: https://sports.yahoo.com/usopc-opposes-ineffective-2022-games-011310879.html


Processing URLs:  34%|███▍      | 340/1000 [28:02<18:09,  1.65s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/52895671.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/52895671.cms


Processing URLs:  34%|███▍      | 341/1000 [28:03<17:51,  1.63s/it]

Error extracting text from http://www.mondaq.com/article.asp?article_id=640904&amp;signup=true: 404 Client Error: Not Found for url: https://www.mondaq.com:443/article.asp?article_id=640904&amp;signup=true


Processing URLs:  34%|███▍      | 342/1000 [28:05<18:25,  1.68s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/beijing-building-radar-in/2543032.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/beijing-building-radar-in/2543032.html


Processing URLs:  34%|███▍      | 345/1000 [28:11<19:05,  1.75s/it]

Error extracting text from http://www.nytimes.com/2016/03/01/world/middleeast/after-gains-against-isis-american-focus-is-turning-to-mosul.html?nytmobile=0&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/01/world/middleeast/after-gains-against-isis-american-focus-is-turning-to-mosul.html?nytmobile=0&amp;_r=0


Processing URLs:  35%|███▍      | 347/1000 [28:13<13:07,  1.21s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/wi/wisconsin_trump_vs_clinton-5659.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/wi/wisconsin_trump_vs_clinton-5659.html#polls


Processing URLs:  35%|███▍      | 348/1000 [29:13<3:24:17, 18.80s/it]

Error extracting text from http://www.seattletimes.com/nation-world/kabul-fortification-signals-long-u-s-stay-in-afghanistan/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  35%|███▍      | 349/1000 [29:13<2:23:43, 13.25s/it]

Error extracting text from http://www.businessinsider.com.au/poor-countries-outperform-america-pisa-exam-2016-12?r=US&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/poor-countries-outperform-america-pisa-exam-2016-12?r=US&amp;IR=T


Processing URLs:  35%|███▌      | 350/1000 [29:13<1:41:12,  9.34s/it]

Error extracting text from http://www.autonews.com/article/20161002/RETAIL01/161009998/tesla-sets-quarterly-sales-record-as-shipments-move-closer-to-2016: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20161002/RETAIL01/161009998/tesla-sets-quarterly-sales-record-as-shipments-move-closer-to-2016


Processing URLs:  35%|███▌      | 352/1000 [30:15<4:08:42, 23.03s/it]

Error extracting text from http://www.spaceflightinsider.com/missions/defense/north-korea-launches-long-range-rocket-kwangmyongsong-4-satellite/: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /missions/defense/north-korea-launches-long-range-rocket-kwangmyongsong-4-satellite/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ffc6ffb0>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  36%|███▌      | 359/1000 [30:22<29:51,  2.79s/it]  

Error extracting text from https://www.nytimes.com/2017/11/19/us/jones-alabama-democrats.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/19/us/jones-alabama-democrats.html


Processing URLs:  36%|███▌      | 361/1000 [30:25<20:33,  1.93s/it]

Error extracting text from http://www.straitstimes.com/opinion/transnational-links-political-vacuum-fuel-bangladeshis-turn-to-terrorism: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  36%|███▌      | 362/1000 [30:27<20:57,  1.97s/it]

URL filtered: https://www.bloomberg.com/news/videos/2016-09-20/cfr-s-haass-on-the-dawn-of-the-u-s-global-illiterate


Processing URLs:  36%|███▋      | 365/1000 [30:30<14:16,  1.35s/it]

URL filtered: https://www.youtube.com/watch?v=0BVRhsqHE5M


Processing URLs:  37%|███▋      | 369/1000 [30:35<13:27,  1.28s/it]

Error extracting text from http://developer.nrel.gov/api/alt-fuel-stations/v1.csv?access=all&amp;api_key=FGa7rvCzh6qxJvcov0DrYpIOoGzBP9yGMIxtnQ5i&amp;download=true&amp;fuel_type=HY&amp;status=all: 403 Client Error: Forbidden for url: https://developer.nrel.gov/api/alt-fuel-stations/v1.csv?access=all&amp;api_key=FGa7rvCzh6qxJvcov0DrYpIOoGzBP9yGMIxtnQ5i&amp;download=true&amp;fuel_type=HY&amp;status=all
Error extracting text from https://www.wsj.com/amp/articles/u-s-moves-to-tighten-iran-sanctions-enforcement-as-nuclear-talks-stall-11639039567: 403 Client Error: Forbidden for url: https://www.wsj.com/amp/articles/u-s-moves-to-tighten-iran-sanctions-enforcement-as-nuclear-talks-stall-11639039567


Processing URLs:  37%|███▋      | 371/1000 [30:40<19:56,  1.90s/it]

Error extracting text from http://www.tol.org/client/article/25127-montenegro-podgorica-djukanovic-osce-mijatovic-zivkovic-raicevic.html: 404 Client Error: Not Found for url: https://tol.org/client/article/25127-montenegro-podgorica-djukanovic-osce-mijatovic-zivkovic-raicevic.html


Processing URLs:  37%|███▋      | 374/1000 [30:44<16:27,  1.58s/it]

Error extracting text from https://www.deepcapture.com/2009/02/bernard-madoff-the-mafia-and-the-friends-of-michael-milken/: 403 Client Error: Forbidden for url: https://www.deepcapture.com/2009/02/bernard-madoff-the-mafia-and-the-friends-of-michael-milken/


Processing URLs:  38%|███▊      | 375/1000 [30:46<16:18,  1.57s/it]

Error extracting text from http://middle-east-online.com/english/?id=75750: 404 Client Error: Not Found for url: https://middle-east-online.com/english/?id=75750


Processing URLs:  38%|███▊      | 380/1000 [30:52<14:31,  1.40s/it]

Error extracting text from https://www.ipsos-mori.com/researchpublications/researcharchive/3714/Half-think-David-Cameron-should-resign-as-Prime-Minister-if-Britain-votes-to-leave-the-EU.aspx: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-uk/researchpublications/researcharchive/3714/Half-think-David-Cameron-should-resign-as-Prime-Minister-if-Britain-votes-to-leave-the-EU.aspx
URL filtered: https://www.youtube.com/watch?v=3QS4rfPE2k0&amp;list=RDMM&amp;index=18
URL filtered: http://www.bloomberg.com/news/articles/2015-09-13/putin-said-to-explore-sidelining-assad-even-as-russia-arms-him


Processing URLs:  39%|███▊      | 386/1000 [30:57<10:03,  1.02it/s]

Error extracting text from http://www.nytimes.com/2016/11/17/world/middleeast/assad-donald-trump-syria-natural-ally.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/17/world/middleeast/assad-donald-trump-syria-natural-ally.html?_r=0


Processing URLs:  39%|███▉      | 388/1000 [30:59<09:45,  1.05it/s]

Error extracting text from http://www.sciencemag.org/content/344/6190/1330: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.344.6190.1330


Processing URLs:  39%|███▉      | 389/1000 [30:59<07:51,  1.29it/s]

Error extracting text from https://www.dhs.gov/fusion-centers: 403 Client Error: Forbidden for url: https://www.dhs.gov/fusion-centers


Processing URLs:  39%|███▉      | 390/1000 [30:59<06:14,  1.63it/s]

Error extracting text from https://www.wsj.com/articles/oil-prices-edge-lower-on-chinas-weaker-growth-prospects-1488801463: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-prices-edge-lower-on-chinas-weaker-growth-prospects-1488801463


Processing URLs:  39%|███▉      | 393/1000 [31:03<09:47,  1.03it/s]

Error extracting text from https://www.amazon.com/Code-Trust-American-Counterintelligence-Experts/dp/1250093465: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Code-Trust-American-Counterintelligence-Experts/dp/1250093465


Processing URLs:  40%|███▉      | 395/1000 [31:09<17:20,  1.72s/it]

Error extracting text from http://www.tradingeconomics.com/venezuela/foreign-exchange-reserves: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/venezuela/foreign-exchange-reserves


Processing URLs:  40%|███▉      | 396/1000 [31:09<13:43,  1.36s/it]

Error extracting text from http://www.consilium.europa.eu/en/council-eu/voting-system/unanimity/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/council-eu/voting-system/unanimity/


Processing URLs:  40%|████      | 400/1000 [31:15<10:49,  1.08s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-bonds-oilservices/u-s-oil-service-firms-face-hit-from-venezuela-debt-restructuring-idUSKBN1D72RC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds-oilservices/u-s-oil-service-firms-face-hit-from-venezuela-debt-restructuring-idUSKBN1D72RC
Error extracting text from http://www.komodoexercise.org/#!medcap--encap/cl5r: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3025ecce0>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|████      | 402/1000 [31:15<06:29,  1.54it/s]

Error extracting text from http://nriworld.net/2016/05/russian-federation-says-its-ready-to-ramp-up-oil-output/: 404 Client Error: Not Found for url: http://nriworld.net/2016/05/russian-federation-says-its-ready-to-ramp-up-oil-output/


Processing URLs:  40%|████      | 404/1000 [36:21<13:19:02, 80.44s/it]

Error extracting text from http://www.japantoday.com/category/politics/view/abe-putin-agree-to-advance-japan-russia-territorial-talks: 404 Client Error: Not Found for url: https://japantoday.com/category/politics/abe-putin-agree-to-advance-japan-russia-territorial-talks


Processing URLs:  41%|████      | 409/1000 [36:27<2:37:37, 16.00s/it] 

Error extracting text from http://www.torontosun.com/2015/09/27/satellite-missile-test-or-space-junk-north-korea-readies-launch: 403 Client Error: Forbidden for url: https://torontosun.com/2015/09/27/satellite-missile-test-or-space-junk-north-korea-readies-launch


Processing URLs:  41%|████      | 410/1000 [36:30<1:58:27, 12.05s/it]

Error extracting text from http://store.russfeingold.com/Russ-Feingold-for-Sentate-Yard-Sign-YS62196.html: 404 Client Error: Not Found for url: https://www.facebook.com/russfeingold//Russ-Feingold-for-Sentate-Yard-Sign-YS62196.html


Processing URLs:  41%|████      | 411/1000 [36:32<1:28:53,  9.05s/it]

Error extracting text from https://www.argusmedia.com/News/Article?id=1116375: 404 Client Error: Not Found for url: https://www.argusmedia.com/not-found


Processing URLs:  41%|████      | 412/1000 [36:32<1:03:20,  6.46s/it]

Error extracting text from https://www.nytimes.com/2021/01/25/world/asia/india-china-border.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/25/world/asia/india-china-border.html


Processing URLs:  42%|████▏     | 419/1000 [36:53<28:48,  2.97s/it]  

Error extracting text from http://aranews.net/2017/02/syrian-tribal-leader-nawaf-al-bashir-rejoins-assad-regime-years-supporting-rebels/: 404 Client Error: Not Found for url: http://aranews.net/2017/02/syrian-tribal-leader-nawaf-al-bashir-rejoins-assad-regime-years-supporting-rebels/


Processing URLs:  42%|████▏     | 423/1000 [37:00<16:57,  1.76s/it]

Error extracting text from http://www.vanguardngr.com/2016/04/farmersherdsmen-clashes-fg-govs-agree-establish-ranches/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/04/farmersherdsmen-clashes-fg-govs-agree-establish-ranches/
Error extracting text from http://www.nbcnews.com/news/us-news/could-russian-hackers-spoil-election-day-n619321: 403 Client Error: Forbidden for url: http://www.nbcnews.com/news/us-news/could-russian-hackers-spoil-election-day-n619321


Processing URLs:  42%|████▎     | 425/1000 [37:00<09:48,  1.02s/it]

Error extracting text from https://www.yahoo.com/news/one-month-coup-bid-turkey-transformed-102620883.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/one-month-coup-bid-turkey-transformed-102620883.html


Processing URLs:  43%|████▎     | 427/1000 [37:00<06:29,  1.47it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-goldman-sachs-idUSKBN18Q1D6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-goldman-sachs-idUSKBN18Q1D6


Processing URLs:  43%|████▎     | 431/1000 [37:02<04:40,  2.03it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-usa-oil-idUSKBN1862Q0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-usa-oil-idUSKBN1862Q0


Processing URLs:  43%|████▎     | 434/1000 [37:06<08:30,  1.11it/s]

URL filtered: https://twitter.com/_stah/status/1441340999771439109


Processing URLs:  44%|████▎     | 437/1000 [37:11<11:13,  1.20s/it]

Error extracting text from https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00448-7/fulltext: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00448-7/fulltext


Processing URLs:  44%|████▍     | 440/1000 [37:15<10:36,  1.14s/it]

Error extracting text from https://www.timesofisrael.com/israel-confirms-vaccine-less-effective-against-delta-variant-eyes-third-dose/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/israel-confirms-vaccine-less-effective-against-delta-variant-eyes-third-dose/


Processing URLs:  44%|████▍     | 444/1000 [37:18<06:31,  1.42it/s]

Error extracting text from http://www.wsj.com/articles/how-the-fed-waiting-game-pushes-japan-to-ease-1442551219: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/how-the-fed-waiting-game-pushes-japan-to-ease-1442551219
Error extracting text from http://www.reuters.com/article/us-turkey-eu-idUSKBN17Y0U0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-idUSKBN17Y0U0


Processing URLs:  45%|████▍     | 449/1000 [37:22<05:32,  1.66it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/jul/31/haider-al-abadi-iraq-prime-minister-faces-power-st/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jul/31/haider-al-abadi-iraq-prime-minister-faces-power-st/
Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-subpoena-idUSKBN1AJ2V0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-subpoena-idUSKBN1AJ2V0


Processing URLs:  45%|████▌     | 452/1000 [37:30<15:47,  1.73s/it]

Error extracting text from https://iea.org.uk/publications/pass-the-remote/: 403 Client Error: Forbidden for url: https://iea.org.uk/publications/pass-the-remote/


Processing URLs:  45%|████▌     | 453/1000 [37:32<16:22,  1.80s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/08/09/770954/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/08/09/770954/story.html


Processing URLs:  46%|████▌     | 458/1000 [37:42<16:18,  1.80s/it]

URL filtered: https://www.bloomberg.com/view/articles/2016-07-08/russia-has-the-most-boring-election-of-2016
URL filtered: https://www.bloomberg.com/news/articles/2017-11-19/merkel-s-push-collapses-to-reach-four-party-coalition-agreement-ja7dc50v


Processing URLs:  46%|████▌     | 461/1000 [37:43<09:12,  1.03s/it]

Error extracting text from https://reut.rs/3j2hKsP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-usa-explainer/explainer-south-china-sea-tension-flares-again-as-biden-takes-charge-idUSKBN29U0LO?il=0


Processing URLs:  46%|████▋     | 464/1000 [37:47<08:32,  1.05it/s]

Error extracting text from http://southfront.org/u-s-carrier-strike-groups-locations-map-oct-16-2015/: HTTPConnectionPool(host='southfront.org', port=80): Max retries exceeded with url: /u-s-carrier-strike-groups-locations-map-oct-16-2015/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d9d90>: Failed to resolve 'southfront.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nato.int/cps/en/natohq/news_131132.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_131132.htm


Processing URLs:  47%|████▋     | 466/1000 [37:49<09:28,  1.06s/it]

Error extracting text from http://www.nytimes.com/2016/07/07/us/politics/hillary-clinton-loretta-lynch.html?smid=tw-share: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/07/us/politics/hillary-clinton-loretta-lynch.html?smid=tw-share


Processing URLs:  47%|████▋     | 468/1000 [37:54<14:36,  1.65s/it]

Error extracting text from http://www.takepart.com/article/2016/07/25/zero-days-stuxnet-iran: 404 Client Error: Not Found for url: https://participant.com/article/2016/07/25/zero-days-stuxnet-iran


Processing URLs:  47%|████▋     | 474/1000 [38:07<19:34,  2.23s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-saarland-idUSKBN16X00S?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-saarland-idUSKBN16X00S?il=0


Processing URLs:  48%|████▊     | 477/1000 [38:09<10:01,  1.15s/it]

Error extracting text from http://www.nejm.org/doi/full/10.1056/nejmoa1600651: 403 Client Error: Forbidden for url: http://www.nejm.org/doi/full/10.1056/nejmoa1600651
URL filtered: https://www.instagram.com/p/Bb4JiNZDvax/?taken-by=theeconomist


Processing URLs:  48%|████▊     | 479/1000 [38:10<06:41,  1.30it/s]

Error extracting text from http://english.yonhapnews.co.kr/national/2015/09/08/0301000000AEN20150908005300315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  48%|████▊     | 480/1000 [38:13<11:42,  1.35s/it]

Error extracting text from http://www.nioc-intl.ir/WebSiteEn/Pages/News1.aspx: 403 Client Error: Forbidden for url: https://www.nioc-intl.ir:443/WebSiteEn/Pages/News1.aspx


Processing URLs:  48%|████▊     | 481/1000 [38:13<09:10,  1.06s/it]

Error extracting text from https://www.nytimes.com/2021/01/25/world/europe/italy-government-conte.html?campaign_id=51&emc=edit_MBE_p_20210126&instance_id=26406&nl=morning-briefing&regi_id=124411317&section=topNews&segment_id=50252&te=1&user_id=f9b4299b888fb043c19d31525a9823ba: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/25/world/europe/italy-government-conte.html?campaign_id=51&emc=edit_MBE_p_20210126&instance_id=26406&nl=morning-briefing&regi_id=124411317&section=topNews&segment_id=50252&te=1&user_id=f9b4299b888fb043c19d31525a9823ba


Processing URLs:  48%|████▊     | 482/1000 [38:14<08:09,  1.06it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/267976-push-for-new-iran-missile-sanctions-divides-democrats: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/267976-push-for-new-iran-missile-sanctions-divides-democrats/


Processing URLs:  48%|████▊     | 483/1000 [38:15<07:08,  1.21it/s]

Error extracting text from http://thehill.com/homenews/campaign/312766-gop-aims-to-rein-in-liberal-cities: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/312766-gop-aims-to-rein-in-liberal-cities/


Processing URLs:  48%|████▊     | 485/1000 [38:28<36:51,  4.29s/it]

Error extracting text from https://www.reuters.com/article/health-coronavirus-nicaragua-russia/nicaragua-in-talks-to-acquire-russias-sputnik-v-covid-vaccine-says-paho-idUSL1N2JO36Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/health-coronavirus-nicaragua-russia/nicaragua-in-talks-to-acquire-russias-sputnik-v-covid-vaccine-says-paho-idUSL1N2JO36Q


Processing URLs:  49%|████▉     | 488/1000 [38:31<19:21,  2.27s/it]

URL filtered: https://twitter.com/UKenyatta?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  49%|████▉     | 491/1000 [38:36<15:54,  1.88s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-iran-deal-idUSKBN1491BN?feedType=RSS&amp;feedName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-iran-deal-idUSKBN1491BN?feedType=RSS&amp;feedName=worldNews


Processing URLs:  49%|████▉     | 492/1000 [38:38<16:01,  1.89s/it]

Error extracting text from http://the-japan-news.com/news/article/0003274804: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0003274804


Processing URLs:  50%|████▉     | 497/1000 [38:45<13:03,  1.56s/it]

Error extracting text from http://www.zacks.com/stock/news/189284/comcasts-jurassic-world-a-raging-hit-sets-global-record: 500 Server Error: Internal Server Error for url: https://www.zacks.com/stock/news/189284/comcasts-jurassic-world-a-raging-hit-sets-global-record


Processing URLs:  50%|████▉     | 498/1000 [38:46<11:28,  1.37s/it]

Error extracting text from https://azcapitoltimes.com/news/2020/12/14/ward-takes-election-challenge-to-u-s-supreme-court/: 403 Client Error: Forbidden for url: https://azcapitoltimes.com/news/2020/12/14/ward-takes-election-challenge-to-u-s-supreme-court/


Processing URLs:  50%|█████     | 501/1000 [38:51<12:08,  1.46s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-24/merkel-caught-off-guard-as-erdogan-threatens-to-ax-refugee-deal


Processing URLs:  50%|█████     | 505/1000 [38:55<08:52,  1.07s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/2016_elections_electoral_college_map.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/2016_elections_electoral_college_map.html


Processing URLs:  51%|█████     | 510/1000 [39:03<09:12,  1.13s/it]

Error extracting text from http://warontherocks.com/2016/08/the-decay-of-the-syrian-regime-is-much-worse-than-you-think/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/08/the-decay-of-the-syrian-regime-is-much-worse-than-you-think/


Processing URLs:  51%|█████     | 511/1000 [39:03<08:01,  1.02it/s]

Error extracting text from http://aranews.net/2015/03/isis-threatens-any-civilian-leaves-mosul-to-be-beheaded/: 404 Client Error: Not Found for url: http://aranews.net/2015/03/isis-threatens-any-civilian-leaves-mosul-to-be-beheaded/


Processing URLs:  51%|█████     | 512/1000 [39:05<08:17,  1.02s/it]

URL filtered: http://www.wsj.com/articles/facebook-moves-to-curtail-fake-news-on-trending-feature-1485367200


Processing URLs:  51%|█████▏    | 514/1000 [39:05<04:54,  1.65it/s]

Error extracting text from http://www.wsj.com/articles/reports-of-zika-cases-growing-quickly-in-u-s-1456521970: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/reports-of-zika-cases-growing-quickly-in-u-s-1456521970


Processing URLs:  52%|█████▏    | 518/1000 [39:07<04:14,  1.89it/s]

Error extracting text from http://www.cdm.me/english/slovakia-ratified-nato-accession-protocol-too: 403 Client Error: Forbidden for url: https://www.cdm.me/english/slovakia-ratified-nato-accession-protocol-too


Processing URLs:  52%|█████▏    | 521/1000 [39:08<03:50,  2.08it/s]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2406345&amp;CategoryId=12395: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2406345&amp;CategoryId=12395
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=fa&amp;u=http://www.tabnak.ir/fa/parliament&amp;usg=ALkJrhhBIKgb2aErMnLI8SFwh0k3HAyvxA: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=fa&amp;u=http://www.tabnak.ir/fa/parliament&amp;usg=ALkJrhhBIKgb2aErMnLI8SFwh0k3HAyvxA


Processing URLs:  53%|█████▎    | 528/1000 [39:18<06:54,  1.14it/s]

Error extracting text from http://www.ibtimes.com/new-isis-video-shows-french-speaking-militant-executing-alleged-apostates-spies-iraq-2287395: 403 Client Error: Forbidden for url: https://www.ibtimes.com/new-isis-video-shows-french-speaking-militant-executing-alleged-apostates-spies-iraq-2287395
URL filtered: https://www.bloomberg.com/news/articles/2017-08-25/venezuela-oil-minister-pdvsa-head-to-switch-roles-maduro-says
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0WD1WR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0WD1WR


Processing URLs:  53%|█████▎    | 532/1000 [39:35<18:09,  2.33s/it]

Error extracting text from https://www.barrons.com/news/trump-says-peace-deals-close-between-israel-and-five-or-six-other-countries-01600188005: 403 Client Error: Forbidden for url: https://www.barrons.com/news/trump-says-peace-deals-close-between-israel-and-five-or-six-other-countries-01600188005


Processing URLs:  53%|█████▎    | 533/1000 [40:35<2:26:36, 18.84s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-10-05/russian-hackers-get-us-cyber-defense-details-from-nsa-wsj: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  54%|█████▎    | 535/1000 [40:36<1:14:27,  9.61s/it]

Error extracting text from http://www.barrons.com/articles/BL-SWB-44851: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/BL-SWB-44851


Processing URLs:  54%|█████▍    | 538/1000 [40:38<28:07,  3.65s/it]

Error extracting text from https://www.iter.org/mach: HTTPSConnectionPool(host='www.iter.org', port=443): Max retries exceeded with url: /mach (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
Error extracting text from https://www.reuters.com/article/us-brazil-politics-poll/poll-shows-jump-in-approval-for-brazils-bolsonaro-amid-pandemic-idUSKCN26F369: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-poll/poll-shows-jump-in-approval-for-brazils-bolsonaro-amid-pandemic-idUSKCN26F369


Processing URLs:  54%|█████▍    | 542/1000 [40:41<11:09,  1.46s/it]

Error extracting text from https://inventariandochina.com/2016/06/29/china-new-rules-to-hold-party-officials-accountable-for-breach-of-duty/: HTTPSConnectionPool(host='inventariandochina.com', port=443): Max retries exceeded with url: /2016/06/29/china-new-rules-to-hold-party-officials-accountable-for-breach-of-duty/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff3dac00>: Failed to resolve 'inventariandochina.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2016/08/08/world/middleeast/how-an-iranians-spy-saga-ends-6-years-later-hes-executed.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/08/world/middleeast/how-an-iranians-spy-saga-ends-6-years-later-hes-executed.html?hp&amp;action=click&amp;pgtype=Homepage&amp;

Processing URLs:  54%|█████▍    | 543/1000 [41:41<2:01:38, 15.97s/it]

Error extracting text from http://www.usnews.com/news/articles/2012/03/20/us-nukes-face-up-to-10-million-cyber-attacks-daily: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
URL filtered: https://www.bloomberg.com/news/articles/2016-11-30/opec-said-to-agree-oil-production-cuts-as-saudis-soften-on-iran


Processing URLs:  55%|█████▍    | 545/1000 [41:44<1:14:40,  9.85s/it]

Error extracting text from http://cookpolitical.com/senate/maps: 404 Client Error: Not Found for url: https://www.cookpolitical.com/senate/maps


Processing URLs:  55%|█████▍    | 546/1000 [41:46<1:01:14,  8.09s/it]

Error extracting text from http://tribbleagency.com/2015/10/30/gallup-mitch-mcconnell-favorability-among-republican-voters/: 404 Client Error: Not Found for url: https://tribbleagency.com/2015/10/30/gallup-mitch-mcconnell-favorability-among-republican-voters/


Processing URLs:  55%|█████▍    | 547/1000 [41:48<50:24,  6.68s/it]

Error extracting text from http://marketrealist.com/2015/11/opec-meeting-will-crude-oil-prices-react/: 404 Client Error: Not Found for url: https://marketrealist.com:443/2015/11/opec-meeting-will-crude-oil-prices-react/


Processing URLs:  55%|█████▍    | 548/1000 [41:49<39:37,  5.26s/it]

Error extracting text from https://reut.rs/3tK2SnQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-china/india-says-troops-had-minor-face-off-with-china-in-sikkim-border-area-idUSKBN29U0MI?il=0


Processing URLs:  55%|█████▌    | 550/1000 [41:51<22:56,  3.06s/it]

Error extracting text from https://www.wsj.com/articles/trump-white-house-puts-iran-on-notice-after-missile-launch-1485979767?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-white-house-puts-iran-on-notice-after-missile-launch-1485979767?mod=e2fb


Processing URLs:  55%|█████▌    | 552/1000 [41:58<23:56,  3.21s/it]

Error extracting text from http://www.ibtimes.com/cern-lhc-update-large-hadron-collider-finishes-2016-proton-run-surpasses-luminosity-2440399: 403 Client Error: Forbidden for url: https://www.ibtimes.com/cern-lhc-update-large-hadron-collider-finishes-2016-proton-run-surpasses-luminosity-2440399


Processing URLs:  55%|█████▌    | 553/1000 [42:00<21:42,  2.91s/it]

Error extracting text from http://tass.ru/en/economy/845065: 404 Client Error: Not Found for url: https://tass.ru/en/economy/845065


Processing URLs:  56%|█████▌    | 555/1000 [42:03<14:37,  1.97s/it]

Error extracting text from http://www.wsj.com/articles/iran-threatened-to-shoot-down-u-s-surveillance-planes-1473794715: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-threatened-to-shoot-down-u-s-surveillance-planes-1473794715


Processing URLs:  56%|█████▌    | 559/1000 [42:06<07:51,  1.07s/it]

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-trump-west-virginia-rally-20170803-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/politics/ct-trump-west-virginia-rally-20170803-story.html


Processing URLs:  56%|█████▋    | 564/1000 [42:11<05:12,  1.39it/s]

Error extracting text from https://www.congress.gov/bill/114th-congress/senate-resolution/506/text?q=%7B%22search%22%3A%5B%22montenegro%22%5D%7D&amp;resultIndex=1&amp;overview=open#content: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/senate-resolution/506/text?q=%7B%22search%22%3A%5B%22montenegro%22%5D%7D&amp;resultIndex=1&amp;overview=open#content
Error extracting text from http://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#SG3SZ4QbcMclC0Lm.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#SG3SZ4QbcMclC0Lm.99


Processing URLs:  57%|█████▋    | 566/1000 [42:23<21:39,  2.99s/it]

Error extracting text from http://www.ibtimes.co.uk/two-districts-captured-mosul-among-intense-fighting-suicide-bombings-1591244: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/two-districts-captured-mosul-among-intense-fighting-suicide-bombings-1591244


Processing URLs:  57%|█████▋    | 568/1000 [42:24<11:47,  1.64s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKCN0VV01K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKCN0VV01K


Processing URLs:  57%|█████▋    | 573/1000 [42:29<07:10,  1.01s/it]

URL filtered: https://www.youtube.com/watch?v=6piUQsaOOSQ
URL filtered: https://www.youtube.com/watch?v=gLlCrw9TQZA
Error extracting text from https://www.reuters.com/article/us-asml-holding-smic/asml-extends-sales-deal-with-chinese-chipmaker-smic-to-end-of-2021-idUSKBN2AV1S6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asml-holding-smic/asml-extends-sales-deal-with-chinese-chipmaker-smic-to-end-of-2021-idUSKBN2AV1S6
Error extracting text from http://www.reuters.com/article/us-russia-iran-arms-idUSKCN0X80MM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-iran-arms-idUSKCN0X80MM


Processing URLs:  58%|█████▊    | 577/1000 [42:37<11:48,  1.67s/it]

Error extracting text from http://www.kplu.org/post/washington-aerospace-summit-will-focus-issues-such-export-import-bank-renewal: HTTPConnectionPool(host='www.kplu.org', port=80): Max retries exceeded with url: /post/washington-aerospace-summit-will-focus-issues-such-export-import-bank-renewal (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30451c920>: Failed to resolve 'www.kplu.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 580/1000 [43:39<1:29:05, 12.73s/it]

Error extracting text from http://infographics.demo.economist.com/2016/minifiedbrexit/?n=21011894/2016/01/daily-chart-18&amp;w=595: HTTPConnectionPool(host='infographics.demo.economist.com', port=80): Max retries exceeded with url: /2016/minifiedbrexit/?n=21011894/2016/01/daily-chart-18&amp;w=595 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30451d4f0>, 'Connection to infographics.demo.economist.com timed out. (connect timeout=60)'))
Error extracting text from http://www.latimes.com/opinion/op-ed/la-oe-0105-tilleman-china-electric-vehicles-20160105-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/op-ed/la-oe-0105-tilleman-china-electric-vehicles-20160105-story.html


Processing URLs:  58%|█████▊    | 582/1000 [43:46<57:30,  8.26s/it]

Error extracting text from http://vestnikkavkaza.net/articles/Nine-states-stand-for-stable-presence-of-NATO-in-Eastern-Europe-and-Baltic-Region.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/articles/Nine-states-stand-for-stable-presence-of-NATO-in-Eastern-Europe-and-Baltic-Region.html


Processing URLs:  59%|█████▉    | 590/1000 [43:58<14:05,  2.06s/it]

Error extracting text from http://www.amazon.com/Emperors-New-Mind-Concerning-Computers-ebook/dp/B00ARGXG7Q/ref=sr_1_1?s=books&ie=UTF8&qid=1449776046&sr=1-1&keywords=the+emperor%27s+new+mind: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Emperors-New-Mind-Concerning-Computers-ebook/dp/B00ARGXG7Q/ref=sr_1_1?s=books&ie=UTF8&qid=1449776046&sr=1-1&keywords=the+emperor%27s+new+mind
URL filtered: http://www.bloombergview.com/articles/2015-11-22/venezuela-s-opposition-smells-a-victory


Processing URLs:  59%|█████▉    | 593/1000 [44:01<10:07,  1.49s/it]

Error extracting text from http://tass.ru/en/politics/845464&gt: 404 Client Error: Not Found for url: https://tass.ru/en/politics/845464&gt
Error extracting text from https://www.latimes.com/politics/story/2020-03-20/stock-trades-by-lawmakers-who-got-coronavirus-briefings: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/story/2020-03-20/stock-trades-by-lawmakers-who-got-coronavirus-briefings


Processing URLs:  60%|█████▉    | 598/1000 [44:11<12:43,  1.90s/it]

Error extracting text from http://reut.rs/1MUjD1Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/us-brazil-rousseff-impeachment-idUSKCN0SM2NC20151028


Processing URLs:  60%|██████    | 602/1000 [44:14<05:59,  1.11it/s]

Error extracting text from https://www.france24.com/en/live-news/20210302-boeing-starliner-test-flight-postponed: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210302-boeing-starliner-test-flight-postponed
Error extracting text from http://www.nytimes.com/2015/10/14/us/politics/hillary-clinton-turns-up-heat-on-bernie-sanders-in-a-sharp-debate.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/14/us/politics/hillary-clinton-turns-up-heat-on-bernie-sanders-in-a-sharp-debate.html


Processing URLs:  61%|██████    | 606/1000 [44:26<22:10,  3.38s/it]

Error extracting text from https://www.washingtonpost.com/politics/manaforts-plan-to-greatly-benefit-the-putin-government/2017/03/22/52f6bfb8-0ee7-11e7-aa57-2ca1b05c41b8_story.html?utm_term=.1aaa0260a043: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/manaforts-plan-to-greatly-benefit-the-putin-government/2017/03/22/52f6bfb8-0ee7-11e7-aa57-2ca1b05c41b8_story.html?utm_term=.1aaa0260a043


Processing URLs:  61%|██████    | 607/1000 [44:26<17:13,  2.63s/it]

Error extracting text from http://gawker.com/france-expands-bombing-campaign-against-isis-into-syria-1733246001: 404 Client Error: Not Found for url: https://gawker.com/france-expands-bombing-campaign-against-isis-into-syria-1733246001


Processing URLs:  61%|██████    | 608/1000 [44:37<32:06,  4.91s/it]

URL filtered: https://twitter.com/matzschmale/status/1394880930687373313


Processing URLs:  61%|██████    | 611/1000 [44:43<22:03,  3.40s/it]

URL filtered: http://www.secureworldexpo.com/over-100-bugs-detected-dod-networks-hack-pentagon-program?utm_content=34033469&amp;utm_medium=social&amp;utm_source=linkedin


Processing URLs:  61%|██████▏   | 614/1000 [44:44<11:39,  1.81s/it]

Error extracting text from https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp
Error extracting text from https://www.reuters.com/article/us-southchinasea-philippines-china-idUSKCN1AV0VJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-china-idUSKCN1AV0VJ
URL filtered: https://www.youtube.com/watch?v=7S94ohyErSw


Processing URLs:  62%|██████▏   | 617/1000 [44:45<06:04,  1.05it/s]

Error extracting text from http://www.globalconstructionreview.com/news/china-designs-baby-reactors-pow7er-isla7nds-sou7th/: 403 Client Error: Forbidden for url: http://www.globalconstructionreview.com/news/china-designs-baby-reactors-pow7er-isla7nds-sou7th/


Processing URLs:  62%|██████▏   | 618/1000 [44:46<06:24,  1.01s/it]

Error extracting text from https://advance.lexis.com/document/?pdmfid=1000516&amp;crid=9a792027-4fb4-4ddf-aef0-5e856aad7b65&amp;pddocfullpath=%2Fshared%2Fdocument%2Fnews%2Furn%3AcontentItem%3A5KS1-FFP1-JDJN-629N-00000-00&amp;pddocid=urn%3AcontentItem%3A5KS1-FFP1-: 403 Client Error: Forbidden for url: https://advance.lexis.com/document/?pdmfid=1000516&amp;crid=9a792027-4fb4-4ddf-aef0-5e856aad7b65&amp;pddocfullpath=%2Fshared%2Fdocument%2Fnews%2Furn%3AcontentItem%3A5KS1-FFP1-JDJN-629N-00000-00&amp;pddocid=urn%3AcontentItem%3A5KS1-FFP1-


Processing URLs:  62%|██████▏   | 619/1000 [45:46<1:26:40, 13.65s/it]

Error extracting text from http://www.miamiherald.com/news/politics-government/article173492596.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▏   | 620/1000 [45:47<1:08:15, 10.78s/it]

Error extracting text from http://www.gov.me/en/News/154766/Brussels-Montenegro-s-progress-is-encouraging-sign-ahead-of-NATO-foreign-ministers-in-December.html: 404 Client Error: not found for url: https://www.gov.me/en/News/154766/Brussels-Montenegro-s-progress-is-encouraging-sign-ahead-of-NATO-foreign-ministers-in-December.html


Processing URLs:  62%|██████▏   | 624/1000 [45:54<26:21,  4.21s/it]

Error extracting text from https://www.medicare.gov/hospitalcompare/search.html: 403 Client Error: Forbidden for url: https://www.medicare.gov/hospitalcompare/search.html


Processing URLs:  62%|██████▎   | 625/1000 [45:57<23:50,  3.82s/it]

Error extracting text from http://www.trtworld.com/europe/russia-to-build-permanent-naval-base-in-syrias-tartus-204121: 404 Client Error: Not Found for url: https://www.trtworld.com:443/europe/russia-to-build-permanent-naval-base-in-syrias-tartus-204121
Error extracting text from http://news.mb.com.ph/2017/01/26/lorenzana-us-to-build-fix-facilities-in-ph-camps-under-edca/: HTTPConnectionPool(host='news.mb.com.ph', port=80): Max retries exceeded with url: /2017/01/26/lorenzana-us-to-build-fix-facilities-in-ph-camps-under-edca/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff3d8ad0>: Failed to resolve 'news.mb.com.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  63%|██████▎   | 627/1000 [45:58<14:05,  2.27s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-22/cut-oil-supply-or-drop-riyal-peg-saudis-face-critical-choice


Processing URLs:  63%|██████▎   | 632/1000 [46:01<06:31,  1.06s/it]

Error extracting text from http://www.nytimes.com/2015/11/25/world/europe/turkey-syria-russia-military-plane.html?ref=world: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/25/world/europe/turkey-syria-russia-military-plane.html?ref=world


Processing URLs:  63%|██████▎   | 633/1000 [46:02<06:04,  1.01it/s]

Error extracting text from https://finance.yahoo.com/news/polkadot-based-pontem-raises-4-201000510.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/polkadot-based-pontem-raises-4-201000510.html


Processing URLs:  64%|██████▎   | 637/1000 [46:07<07:16,  1.20s/it]

Error extracting text from https://wikileaks.org/podesta-emails/emailid/33728: 403 Client Error: Forbidden for url: https://wikileaks.org/podesta-emails/emailid/33728


Processing URLs:  64%|██████▍   | 638/1000 [46:08<07:02,  1.17s/it]

Error extracting text from http://www.iran-daily.com/News/135581.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  64%|██████▍   | 639/1000 [46:10<07:07,  1.18s/it]

Error extracting text from https://www.reuters.com/world/europe/frances-macron-launches-bid-second-term-president-2022-03-03/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/frances-macron-launches-bid-second-term-president-2022-03-03/


Processing URLs:  64%|██████▍   | 642/1000 [46:15<07:56,  1.33s/it]

Error extracting text from https://www.nytimes.com/2017/11/08/us/politics/senate-republicans-will-diverge-from-house-in-sweeping-tax-rewrite.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/08/us/politics/senate-republicans-will-diverge-from-house-in-sweeping-tax-rewrite.html


Processing URLs:  64%|██████▍   | 644/1000 [46:16<04:40,  1.27it/s]

Error extracting text from http://www.wsj.com/articles/iran-hard-liners-reassert-influence-on-election-slate-1453227140: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-hard-liners-reassert-influence-on-election-slate-1453227140
Error extracting text from http://www.reuters.com/article/us-poland-constitution-eu-schinas-idUSKCN0YF1OL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-constitution-eu-schinas-idUSKCN0YF1OL


Processing URLs:  65%|██████▍   | 646/1000 [46:19<06:44,  1.14s/it]

Error extracting text from https://en.radiofarda.com/a/iran-withdraw-from-nuclear-deal-if-no-economic-benefit/29056526.html: 403 Client Error: Forbidden for url: https://en.radiofarda.com/a/iran-withdraw-from-nuclear-deal-if-no-economic-benefit/29056526.html
URL filtered: http://www.dailymail.co.uk/news/article-4128748/Sturgeon-second-independence-referendum-likely.html?ito=social-twitter_dailymailUK


Processing URLs:  65%|██████▌   | 650/1000 [46:25<08:27,  1.45s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/02/23/736473/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/02/23/736473/story.html


Processing URLs:  65%|██████▌   | 653/1000 [46:34<14:29,  2.51s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/iraq-prepares-fight-post-mosul-44080169: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/iraq-prepares-fight-post-mosul-44080169


Processing URLs:  66%|██████▌   | 655/1000 [46:37<11:56,  2.08s/it]

Error extracting text from http://basc.berkeley.edu/wp-content/uploads/2016/04/BASC-Newsletter-Winter-15_16.pdf: 404 Client Error: Not Found for url: https://basc.berkeley.edu/wp-content/uploads/2016/04/BASC-Newsletter-Winter-15_16.pdf


Processing URLs:  66%|██████▌   | 656/1000 [46:39<10:39,  1.86s/it]

Error extracting text from https://www.nytimes.com/2017/07/06/world/europe/donald-trump-poland-speech.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&quot: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/06/world/europe/donald-trump-poland-speech.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&quot


Processing URLs:  66%|██████▌   | 659/1000 [46:40<05:18,  1.07it/s]

Error extracting text from https://www.yazooherald.net/front-page-slideshow-news-most-recent/multiple-people-shot-yazoo-city-nightclub#sthash.IAeqIIii.OXudbJrx.dpbs: 403 Client Error: Forbidden for url: https://www.yazooherald.net/front-page-slideshow-news-most-recent/multiple-people-shot-yazoo-city-nightclub#sthash.IAeqIIii.OXudbJrx.dpbs
Error extracting text from http://ajw.asahi.com/article/behind_news/politics/AJ201511160034: HTTPConnectionPool(host='ajw.asahi.com', port=80): Max retries exceeded with url: /article/behind_news/politics/AJ201511160034 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30018fbf0>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.reuters.com/article/us-britain-eu-brexiteers/brexit-campaigners-accuse-may-of-selling-uk-short-over-divorce-bill-idUSKBN1DT1QH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu

Processing URLs:  66%|██████▌   | 662/1000 [46:44<06:22,  1.13s/it]

Error extracting text from https://www.thecipherbrief.com/article/tech/power-botnets-amplifying-crime-disinformation-and-espionage-1092?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=856c599d29-EMAIL_CAMPAIGN_2017_04_30&amp;utm_medium=email&amp;utm_term=0_02cbee778d-856c599d29-122492589: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/tech/power-botnets-amplifying-crime-disinformation-and-espionage-1092?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=856c599d29-EMAIL_CAMPAIGN_2017_04_30&amp;utm_medium=email&amp;utm_term=0_02cbee778d-856c599d29-122492589


Processing URLs:  66%|██████▋   | 665/1000 [46:56<16:19,  2.92s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/venezuelas-assembly-declares-powerful-49106209: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/venezuelas-assembly-declares-powerful-49106209


Processing URLs:  67%|██████▋   | 669/1000 [47:04<10:08,  1.84s/it]

Error extracting text from https://www.rferl.org/a/macedonia-at-a-crossroad-explainer-violence/28457721.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/macedonia-at-a-crossroad-explainer-violence/28457721.html
Error extracting text from https://www.rferl.org/a/russia-open-skies-treaty/31294144.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/russia-open-skies-treaty/31294144.html


Processing URLs:  67%|██████▋   | 673/1000 [47:08<06:33,  1.20s/it]

Error extracting text from http://www.reuters.com/article/us-philippines-duterte-idUSKCN11Y1ZI?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-duterte-idUSKCN11Y1ZI?il=0


Processing URLs:  67%|██████▋   | 674/1000 [47:10<06:49,  1.26s/it]

Error extracting text from https://www.reuters.com/article/us-usa-iran-nuclear/u-s-s-blinken-the-path-to-diplomacy-is-open-right-now-with-iran-idUSKBN2AG2LT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-nuclear/u-s-s-blinken-the-path-to-diplomacy-is-open-right-now-with-iran-idUSKBN2AG2LT
URL filtered: https://www.bloomberg.com/news/live-blog/2016-11-28/putin-speaks-on-russian-economy-geopolitics-in-annual-news-conference


Processing URLs:  68%|██████▊   | 683/1000 [47:22<03:44,  1.41it/s]

Error extracting text from http://www.reuters.com/article/2015/11/16/eurozone-greece-nowotny-debt-idUSL8N13B4W020151116: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/16/eurozone-greece-nowotny-debt-idUSL8N13B4W020151116
Error extracting text from http://www.straitstimes.com/asia/se-asia/flight-to-spratly-reef-airfield-within-chinas-sovereignty-beijing: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  68%|██████▊   | 684/1000 [47:22<03:02,  1.73it/s]

Error extracting text from http://www.nytimes.com/2008/11/23/fashion/23biden.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2008/11/23/fashion/23biden.html


Processing URLs:  68%|██████▊   | 685/1000 [47:23<03:37,  1.45it/s]

Error extracting text from https://cleantechnica.com/2015/03/26/ev-battery-costs-already-probably-cheaper-than-2020-projections/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2015/03/26/ev-battery-costs-already-probably-cheaper-than-2020-projections/


Processing URLs:  69%|██████▊   | 687/1000 [47:38<21:57,  4.21s/it]

Error extracting text from http://www.focus-fen.net/news/2015/11/24/390470/nato-to-hold-emergency-meeting-over-downed-russian-jet.html: HTTPConnectionPool(host='www.focus-fen.net', port=80): Max retries exceeded with url: /news/2015/11/24/390470/nato-to-hold-emergency-meeting-over-downed-russian-jet.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303caee70>: Failed to resolve 'www.focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  69%|██████▉   | 689/1000 [47:39<12:26,  2.40s/it]

Error extracting text from http://www.nytimes.com/2016/03/20/health/zika-virus-puerto-rico.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/20/health/zika-virus-puerto-rico.html?_r=0
Error extracting text from http://www.reuters.com/article/us-germany-russia-cyber-idUSKCN0Y41FC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-russia-cyber-idUSKCN0Y41FC


Processing URLs:  69%|██████▉   | 692/1000 [47:41<06:39,  1.30s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/308023: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/308023


Processing URLs:  69%|██████▉   | 694/1000 [47:44<06:38,  1.30s/it]

URL filtered: http://www.bloombergview.com/articles/2016-02-29/iran-s-elections-are-magic


Processing URLs:  70%|██████▉   | 696/1000 [47:44<03:59,  1.27it/s]

Error extracting text from http://sacredheartspectrum.com/2015/12/brazil-supreme-court-makes-impeachment-of-president/: HTTPConnectionPool(host='sacredheartspectrum.com', port=80): Max retries exceeded with url: /2015/12/brazil-supreme-court-makes-impeachment-of-president/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304635640>: Failed to resolve 'sacredheartspectrum.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  70%|██████▉   | 697/1000 [47:45<04:26,  1.14it/s]

URL filtered: https://www.youtube.com/watch?v=B7gjr0m7GR8


Processing URLs:  70%|███████   | 703/1000 [47:50<03:44,  1.32it/s]

Error extracting text from http://www.populationlabs.com/colombia_population.asp: 404 Client Error: Not Found for url: http://www.populationlabs.com/colombia_population.asp
Error extracting text from http://www.nytimes.com/2016/07/26/world/europe/turkey-cracks-down-on-journalists-its-next-target-after-failed-coup.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/26/world/europe/turkey-cracks-down-on-journalists-its-next-target-after-failed-coup.html


Processing URLs:  70%|███████   | 705/1000 [47:54<06:20,  1.29s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/germanys-potential-coalition-partners-agree-on-energy-wrangle-over-health-idUSKBN1FN0BC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/germanys-potential-coalition-partners-agree-on-energy-wrangle-over-health-idUSKBN1FN0BC


Processing URLs:  71%|███████   | 707/1000 [47:55<04:48,  1.01it/s]

Error extracting text from https://www.crunchbase.com/funding_round/samumed-series-a--1963acc7: 403 Client Error: Forbidden for url: https://www.crunchbase.com/funding_round/samumed-series-a--1963acc7


Processing URLs:  72%|███████▏  | 715/1000 [48:11<10:42,  2.25s/it]

URL filtered: https://www.youtube.com/watch?v=1Oy9wtZ3MXI


Processing URLs:  72%|███████▏  | 719/1000 [48:16<06:24,  1.37s/it]

Error extracting text from https://www.wsj.com/articles/britains-unreal-brexit-transition-debate-1501702923: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/britains-unreal-brexit-transition-debate-1501702923


Processing URLs:  72%|███████▏  | 722/1000 [48:19<06:14,  1.35s/it]

Error extracting text from http://www.sabc.co.za/news/a/a99eab004d2586e3ae29ae830b7eb7b6/Brazil-interim-President-Temer-sought-illegal-campaign-funds-: 404 Client Error: Not Found for url: https://www.sabc.co.za:443/news/a/a99eab004d2586e3ae29ae830b7eb7b6/Brazil-interim-President-Temer-sought-illegal-campaign-funds-


Processing URLs:  72%|███████▎  | 725/1000 [48:24<06:09,  1.34s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-aid-idUSKCN1250UI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-aid-idUSKCN1250UI


Processing URLs:  73%|███████▎  | 726/1000 [48:25<06:11,  1.36s/it]

Error extracting text from http://uk.reuters.com/article/2015/11/29/uk-mideast-crisis-usa-military-idUKKBN0TI0UM20151129: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  73%|███████▎  | 727/1000 [48:27<06:21,  1.40s/it]

URL filtered: https://www.bloomberglaw.com/product/blaw/document/X1LHGB80000000?emc=BLAW%3A154417071%3A1&amp;resource_id=c1c88eb7a560c4ee383df2568a2e4d0c&amp;search32=


Processing URLs:  73%|███████▎  | 730/1000 [48:33<07:19,  1.63s/it]

Error extracting text from http://www.philstar.com/headlines/2017/05/08/1697947/australia-japan-forces-join-balikatan-2017: 404 Client Error: Not Found for url: https://www.philstar.com/404?msg=article%20404%20-%201


Processing URLs:  73%|███████▎  | 733/1000 [48:38<07:23,  1.66s/it]

Error extracting text from https://www.timesofisrael.com/court-pushes-off-ruling-on-evictions-of-palestinian-families-in-silwan/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/court-pushes-off-ruling-on-evictions-of-palestinian-families-in-silwan/


Processing URLs:  74%|███████▎  | 735/1000 [48:42<07:04,  1.60s/it]

Error extracting text from http://www.wsj.com/articles/u-n-security-council-chooses-antonio-guterres-as-secretary-general-1475692334: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-n-security-council-chooses-antonio-guterres-as-secretary-general-1475692334


Processing URLs:  74%|███████▎  | 737/1000 [48:45<07:11,  1.64s/it]

Error extracting text from http://www.atimes.com/article/shoal-concerns-china-asserts-power-via-control-concessions/: 404 Client Error: Not Found for url: https://atimes.com/article/shoal-concerns-china-asserts-power-via-control-concessions/


Processing URLs:  74%|███████▍  | 739/1000 [48:47<05:18,  1.22s/it]

Error extracting text from https://www.fdd.org/analysis/2021/07/14/mapping-the-taliban-offensive-in-afghanistan/: 403 Client Error: Forbidden for url: https://www.fdd.org/analysis/2021/07/14/mapping-the-taliban-offensive-in-afghanistan/


Processing URLs:  74%|███████▍  | 740/1000 [48:48<04:21,  1.01s/it]

Error extracting text from http://fortruss.blogspot.com/2016/01/montenegro-nato-membership-to-be-or-not.html: 404 Client Error: Not Found for url: http://fortruss.blogspot.com/2016/01/montenegro-nato-membership-to-be-or-not.html


Processing URLs:  74%|███████▍  | 742/1000 [48:50<04:03,  1.06it/s]

Error extracting text from http://www.nytimes.com/2016/05/11/world/americas/dilma-rousseff-impeachment-brazil.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/11/world/americas/dilma-rousseff-impeachment-brazil.html


Processing URLs:  74%|███████▍  | 744/1000 [48:51<03:19,  1.28it/s]

Error extracting text from https://thenextweb.com/insider/2017/08/22/the-us-navy-is-investigating-possibility-of-cyber-attack-in-latest-collision/: 403 Client Error: Forbidden for url: https://thenextweb.com/insider/2017/08/22/the-us-navy-is-investigating-possibility-of-cyber-attack-in-latest-collision/


Processing URLs:  74%|███████▍  | 745/1000 [48:52<02:48,  1.51it/s]

Error extracting text from https://resultadoselecciones2016.onpe.gob.pe/PRP2V2016/Resumen-GeneralPresidencial.html#posicion: HTTPSConnectionPool(host='resultadoselecciones2016.onpe.gob.pe', port=443): Max retries exceeded with url: /PRP2V2016/Resumen-GeneralPresidencial.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301cd3d10>: Failed to resolve 'resultadoselecciones2016.onpe.gob.pe' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  76%|███████▌  | 757/1000 [49:11<05:15,  1.30s/it]

Error extracting text from https://www.reuters.com/article/us-usa-russia-nuclear-idUSKBN29Q2I4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-nuclear-idUSKBN29Q2I4


Processing URLs:  76%|███████▌  | 758/1000 [49:12<04:20,  1.08s/it]

Error extracting text from http://gcaptain.com/panama-canal-expansion-to-open-end-june/: 403 Client Error: Forbidden for url: http://gcaptain.com/panama-canal-expansion-to-open-end-june/


Processing URLs:  76%|███████▋  | 763/1000 [49:26<07:30,  1.90s/it]

Error extracting text from http://www.chicagotribune.com/suburbs/highland-park/news/ct-hpn-election-integrity-forum-tl-1102-20171031-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/suburbs/highland-park/news/ct-hpn-election-integrity-forum-tl-1102-20171031-story.html


Processing URLs:  76%|███████▋  | 764/1000 [49:27<06:23,  1.63s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/iowa-governor-calls-anti-cruz-vote-ethanol-issue-36381493: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/iowa-governor-calls-anti-cruz-vote-ethanol-issue-36381493


Processing URLs:  76%|███████▋  | 765/1000 [49:30<07:53,  2.01s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-06/why-pravin-gordhan-can-t-quit-as-south-africa-s-finance-chief


Processing URLs:  77%|███████▋  | 767/1000 [49:32<06:14,  1.61s/it]

Error extracting text from https://www.fda.gov/forpatients/approvals/fast/ucm405405.htm: 403 Client Error: Forbidden for url: https://www.fda.gov/forpatients/approvals/fast/ucm405405.htm


Processing URLs:  77%|███████▋  | 772/1000 [49:38<04:22,  1.15s/it]

Error extracting text from http://in.reuters.com/article/yemen-security-court-idINKBN16X02Z?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  77%|███████▋  | 774/1000 [49:42<06:38,  1.76s/it]

URL filtered: https://www.youtube.com/watch?v=m5WJJVSE_BE


Processing URLs:  78%|███████▊  | 779/1000 [49:46<02:59,  1.23it/s]

Error extracting text from https://www.france24.com/en/live-news/20210806-on-the-frontline-afghan-woman-governor-recruits-anti-taliban-militia: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210806-on-the-frontline-afghan-woman-governor-recruits-anti-taliban-militia
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-islamic-state-idUSKCN0ZE1XK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-islamic-state-idUSKCN0ZE1XK


Processing URLs:  78%|███████▊  | 782/1000 [49:50<04:17,  1.18s/it]

Error extracting text from https://www.eurasiagroup.net/signal: 404 Client Error: Not Found for url: https://www.eurasiagroup.net/signal


Processing URLs:  79%|███████▊  | 786/1000 [49:52<02:22,  1.50it/s]

Error extracting text from http://www.nytimes.com/2015/12/11/us/politics/to-democrats-donald-trump-is-no-longer-a-laughing-matter.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/11/us/politics/to-democrats-donald-trump-is-no-longer-a-laughing-matter.html
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN17Q2F4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN17Q2F4


Processing URLs:  79%|███████▊  | 787/1000 [49:52<01:47,  1.98it/s]

Error extracting text from http://www.nytimes.com/2015/11/02/world/asia/china-japan-and-south-korea-conduct-first-trilateral-meeting-in-3-years.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/02/world/asia/china-japan-and-south-korea-conduct-first-trilateral-meeting-in-3-years.html?_r=0


Processing URLs:  80%|███████▉  | 796/1000 [50:06<04:33,  1.34s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-rights-un-idUSKCN0WG0OY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-rights-un-idUSKCN0WG0OY


Processing URLs:  80%|███████▉  | 799/1000 [50:08<03:08,  1.06it/s]

Error extracting text from https://focustaiwan.tw/politics/202012060015: 403 Client Error: Forbidden for url: https://focustaiwan.tw/politics/202012060015


Processing URLs:  80%|████████  | 801/1000 [50:11<04:08,  1.25s/it]

URL filtered: https://www.youtube.com/watch?v=dATyZBEeDJ4


Processing URLs:  80%|████████  | 804/1000 [50:13<02:56,  1.11it/s]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2422604&amp;CategoryId=10717: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2422604&amp;CategoryId=10717


Processing URLs:  81%|████████  | 808/1000 [50:22<04:53,  1.53s/it]

Error extracting text from http://www.capradio.org/articles/2016/01/06/el-nino-helps,-wont-end-historic-california-drought/: 403 Client Error: Forbidden for url: http://www.capradio.org/articles/2016/01/06/el-nino-helps,-wont-end-historic-california-drought/
Error extracting text from http://www.ndtv.com/world-news/fire-ruins-600-houses-in-myanmar-1277150: 403 Client Error: Forbidden for url: http://www.ndtv.com/world-news/fire-ruins-600-houses-in-myanmar-1277150


Processing URLs:  81%|████████  | 810/1000 [50:24<03:56,  1.24s/it]

Error extracting text from http://www.sabc.co.za/news/a/ddee5b804b8526db8b72ff77bc6a42c4/Dlamini-Zuma-still-undecided-on-second-term-at-AU: 404 Client Error: Not Found for url: https://www.sabc.co.za:443/news/a/ddee5b804b8526db8b72ff77bc6a42c4/Dlamini-Zuma-still-undecided-on-second-term-at-AU


Processing URLs:  81%|████████  | 811/1000 [50:24<03:09,  1.00s/it]

Error extracting text from https://www.yahoo.com/news/driverless-cars-hit-british-streets-landmark-trial-005545714.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/driverless-cars-hit-british-streets-landmark-trial-005545714.html


Processing URLs:  81%|████████▏ | 814/1000 [50:35<09:40,  3.12s/it]

error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.belgiumsun.com/index.php/sid/248310567: Document is empty
Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN1AL0R1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN1AL0R1


Processing URLs:  82%|████████▏ | 817/1000 [50:36<03:53,  1.28s/it]

Error extracting text from http://www.wsj.com/articles/uranium-provides-new-clue-on-irans-past-nuclear-arms-work-1466380760: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/uranium-provides-new-clue-on-irans-past-nuclear-arms-work-1466380760


Processing URLs:  82%|████████▏ | 819/1000 [50:38<03:49,  1.27s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1361D3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1361D3


Processing URLs:  82%|████████▏ | 822/1000 [50:43<04:24,  1.48s/it]

Error extracting text from http://www.theaustralian.com.au/news/latest-news/greek-pm-courts-opposition-amid-protests/story-fn3dxix6-1227624799536: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/news/latest-news/greek-pm-courts-opposition-amid-protests/story-fn3dxix6-1227624799536?nk=61bb09d96508bdbebd4bbfceffaba7a1-1706789647


Processing URLs:  82%|████████▏ | 823/1000 [50:44<03:56,  1.34s/it]

Error extracting text from http://news.usni.org/2016/02/16/expert-on-nato-calls-for-permanent-alliance-military-presence-in-baltics-as-hedge-against-russia-military-action: 403 Client Error: Forbidden for url: http://news.usni.org/2016/02/16/expert-on-nato-calls-for-permanent-alliance-military-presence-in-baltics-as-hedge-against-russia-military-action


Processing URLs:  82%|████████▎ | 825/1000 [50:47<03:59,  1.37s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2010-2015_15SEP.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2010-2015_15SEP.pdf
URL filtered: https://www.bloomberg.com/news/articles/2021-05-11/putin-moves-to-quit-open-skies-as-russia-looks-to-biden-summit


Processing URLs:  83%|████████▎ | 828/1000 [50:48<02:21,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/venezuelas-pdvsa-delays-121-million-interest-payment-investors-idUSKBN1CI2P7?utm_campaign=trueAnthem:+Trending+Content&amp;utm_content=59e: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/venezuelas-pdvsa-delays-121-million-interest-payment-investors-idUSKBN1CI2P7?utm_campaign=trueAnthem:+Trending+Content&amp;utm_content=59e


Processing URLs:  83%|████████▎ | 829/1000 [50:48<02:01,  1.40it/s]

Error extracting text from https://www.hhs.gov/about/news/2021/06/09/biden-administration-announces-us-government-procurement-mercks-investigational-antiviral-medicine-covid-19-treatment.html: 403 Client Error: Forbidden for url: https://www.hhs.gov/about/news/2021/06/09/biden-administration-announces-us-government-procurement-mercks-investigational-antiviral-medicine-covid-19-treatment.html


Processing URLs:  83%|████████▎ | 830/1000 [50:58<08:41,  3.07s/it]

Error extracting text from https://www.washingtonpost.com/politics/congress/house-oks-lifting-40-year-old-us-ban-on-oil-exports/2015/10/09/6dd69860-6eb8-11e5-91eb-27ad15c2b723_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/congress/house-oks-lifting-40-year-old-us-ban-on-oil-exports/2015/10/09/6dd69860-6eb8-11e5-91eb-27ad15c2b723_story.html


Processing URLs:  83%|████████▎ | 833/1000 [51:03<05:17,  1.90s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-aid-idUSKCN0YM28L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-aid-idUSKCN0YM28L


Processing URLs:  84%|████████▎ | 835/1000 [51:04<03:51,  1.40s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/china-must-be-ready-for-military-conflict-in-south-china-sea-state-media/articleshow/53054855.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/china-must-be-ready-for-military-conflict-in-south-china-sea-state-media/articleshow/53054855.cms


Processing URLs:  84%|████████▎ | 836/1000 [51:06<04:06,  1.51s/it]

Error extracting text from http://www.joc.com/maritime-news/container-lines/hapag-lloyd/hapag-won%E2%80%99t-upsize-panama-canal-tonnage-end-year-ceo-says_20160114.html: 404 Client Error: Not Found for url: https://www.joc.com/article/hapag-wont-upsize-panama-canal-tonnage-end-year-ceo-says_20160114.html
URL filtered: https://www.brookings.edu/blog/markaz/2016/10/11/egyptian-economic-policy-see-no-evil-hear-no-evil/?utm_medium=social&amp;utm_source=twitter&amp;utm_campaign=fp


Processing URLs:  84%|████████▍ | 841/1000 [51:12<03:45,  1.42s/it]

URL filtered: https://www.facebook.com/Channel4News/videos/10153492613721939/?pnref=story


Processing URLs:  84%|████████▍ | 843/1000 [51:12<02:24,  1.09it/s]

Error extracting text from https://theconversation.com/morrisons-ratings-take-a-hit-in-newspoll-as-coalition-notionally-loses-a-seat-in-redistribution-158048: 403 Client Error: Forbidden for url: https://theconversation.com/morrisons-ratings-take-a-hit-in-newspoll-as-coalition-notionally-loses-a-seat-in-redistribution-158048


Processing URLs:  86%|████████▌ | 859/1000 [51:38<03:06,  1.33s/it]

Error extracting text from http://news.yahoo.com/hundreds-thousands-migrants-waiting-libya-074938892.html: 404 Client Error: Not Found for url: http://news.yahoo.com/hundreds-thousands-migrants-waiting-libya-074938892.html


Processing URLs:  86%|████████▌ | 861/1000 [51:42<04:15,  1.84s/it]

Error extracting text from http://ewn.co.za/2018/02/10/critics-in-merkel-s-conservatives-vow-to-block-coalition-projects: 404 Client Error: Not Found for url: https://www.ewn.co.za/2018/02/10/critics-in-merkel-s-conservatives-vow-to-block-coalition-projects


Processing URLs:  87%|████████▋ | 867/1000 [52:01<07:30,  3.39s/it]

Error extracting text from https://www.predictit.org/Market/1448/Who-will-win-the-2016-Iowa-Democratic-caucus: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1448/Who-will-win-the-2016-Iowa-Democratic-caucus


Processing URLs:  87%|████████▋ | 869/1000 [52:02<04:05,  1.87s/it]

Error extracting text from http://www.tandfonline.com/doi/full/10.1080/00963402.2016.1170361: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/full/10.1080/00963402.2016.1170361


Processing URLs:  87%|████████▋ | 871/1000 [52:04<03:25,  1.59s/it]

Error extracting text from http://www.pcacases.com/web/sendAttach/1503: 406 Client Error: Not Acceptable for url: http://www.pcacases.com/web/sendAttach/1503
URL filtered: https://www.bloomberg.com/news/articles/2017-09-11/iran-nuclear-inspections-double-under-deal-questioned-by-trump


Processing URLs:  87%|████████▋ | 873/1000 [52:05<01:55,  1.10it/s]

Error extracting text from http://www.autonews.com/article/20160411/RETAIL06/304119993/lagging-infrastructure-slows-fuel-cell-growth-lentz-says: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20160411/RETAIL06/304119993/lagging-infrastructure-slows-fuel-cell-growth-lentz-says


Processing URLs:  88%|████████▊ | 875/1000 [52:07<02:04,  1.00it/s]

Error extracting text from https://psi-2020.org/: 404 Client Error: Not Found for url: https://psi-2020.org/
Error extracting text from http://www.nbcnews.com/news/us-news/doj-ex-manafort-associate-firtash-top-tier-comrade-russian-mobsters-n786806: 403 Client Error: Forbidden for url: http://www.nbcnews.com/news/us-news/doj-ex-manafort-associate-firtash-top-tier-comrade-russian-mobsters-n786806


Processing URLs:  88%|████████▊ | 877/1000 [52:07<01:16,  1.62it/s]

Error extracting text from http://www.reuters.com/article/us-japan-navy-southchinasea-exclusive-idUSKBN16K0UP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-japan-navy-southchinasea-exclusive-idUSKBN16K0UP


Processing URLs:  88%|████████▊ | 878/1000 [52:09<01:44,  1.17it/s]

Error extracting text from http://www.themoscowtimes.com/opinion/article/a-modest-deal-vladimir-putin-s-new-d-tente-op-ed/574322.html: 500 Server Error: Internal Server Error for url: https://www.themoscowtimes.com/opinion/article/a-modest-deal-vladimir-putin-s-new-d-tente-op-ed/574322.html
URL filtered: http://www.forbes.com/sites/chuckjones/2016/01/25/apple-dangerously-close-to-being-short-of-december-quarters-revenue-guidance/?utm_campaign=Forbes&amp;utm_source=TWITTER&amp;utm_medium=social&amp;utm_channel=Investing&amp;linkId=20656090#6bfd57d24129


Processing URLs:  88%|████████▊ | 881/1000 [52:10<01:19,  1.49it/s]

Error extracting text from http://www.wsj.com/articles/tesla-posts-second-profitable-quarter-ever-1477513454: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tesla-posts-second-profitable-quarter-ever-1477513454


Processing URLs:  88%|████████▊ | 883/1000 [52:13<01:58,  1.02s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Resources/StrategyWork/GPEI-MTR_July2015.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Resources/StrategyWork/GPEI-MTR_July2015.pdf


Processing URLs:  89%|████████▊ | 887/1000 [52:15<01:27,  1.29it/s]

Error extracting text from https://www.reuters.com/article/us-autoshow-detroit-peugeot/peugeot-ceo-outlines-ambitious-plan-to-re-enter-u-s-go-electric-idUSKBN1F706J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-autoshow-detroit-peugeot/peugeot-ceo-outlines-ambitious-plan-to-re-enter-u-s-go-electric-idUSKBN1F706J


Processing URLs:  89%|████████▉ | 891/1000 [52:18<01:17,  1.41it/s]

Error extracting text from http://www.caranddriver.com/columns/has-tesla-sunk-itself-with-the-model-x-column: 403 Client Error: Forbidden for url: http://www.caranddriver.com/columns/has-tesla-sunk-itself-with-the-model-x-column


Processing URLs:  89%|████████▉ | 892/1000 [52:19<01:30,  1.19it/s]

Error extracting text from https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/AHTRyQJtiRin22kth?commentId=Kh9KYXuaLPQrCSRK8: 403 Client Error: Forbidden for url: https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/AHTRyQJtiRin22kth?commentId=Kh9KYXuaLPQrCSRK8


Processing URLs:  89%|████████▉ | 894/1000 [52:20<00:59,  1.78it/s]

Error extracting text from https://www.rferl.org/a/syria-russia-severl-dozen-killed-idlib/29050568.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/syria-russia-severl-dozen-killed-idlib/29050568.html


Processing URLs:  90%|████████▉ | 896/1000 [52:24<02:10,  1.25s/it]

Error extracting text from http://reut.rs/1UeSNos: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  90%|████████▉ | 899/1000 [52:30<02:32,  1.51s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/12/forecasting-tournament-probabilities-are-overly-stable-compared-to-prediction-market-probabilities-due-to-stale-predictions/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/12/forecasting-tournament-probabilities-are-overly-stable-compared-to-prediction-market-probabilities-due-to-stale-predictions/


Processing URLs:  90%|█████████ | 900/1000 [53:30<30:27, 18.27s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/national/article186579713.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  90%|█████████ | 903/1000 [53:43<13:34,  8.40s/it]

Error extracting text from https://larswericson.wordpress.com/2016/04/13/gitrep-13apr16am/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/13/gitrep-13apr16am/


Processing URLs:  91%|█████████ | 906/1000 [53:47<06:00,  3.83s/it]



Processing URLs:  91%|█████████ | 912/1000 [53:58<01:55,  1.32s/it]

Error extracting text from https://www.nytimes.com/2018/05/08/world/middleeast/trump-iran-nuclear-deal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/05/08/world/middleeast/trump-iran-nuclear-deal.html
Error extracting text from http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?em_pos=large&amp;emc=edit_nn_20160527&amp;nl=morning-briefing&amp;nlid=47572839&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?em_pos=large&amp;emc=edit_nn_20160527&amp;nl=morning-briefing&amp;nlid=47572839&amp;_r=0


Processing URLs:  91%|█████████▏| 914/1000 [54:00<01:44,  1.22s/it]

Error extracting text from http://ir.tesla.com/releasedetail.cfm?ReleaseID=963460: 403 Client Error: Forbidden for url: http://ir.tesla.com/releasedetail.cfm?ReleaseID=963460


Processing URLs:  92%|█████████▏| 917/1000 [54:05<02:05,  1.51s/it]

Error extracting text from http://www.spacex.com/falcon9: 404 Client Error: The requested content does not exist. for url: https://www.spacex.com/falcon9


Processing URLs:  92%|█████████▏| 919/1000 [54:09<02:06,  1.57s/it]

Error extracting text from http://www.technologyreview.com/news/545276/how-i-learned-to-keep-worrying-even-if-north-korea-didnt-test-an-h-bomb/: 404 Client Error: Not Found for url: https://www.technologyreview.com/news/545276/how-i-learned-to-keep-worrying-even-if-north-korea-didnt-test-an-h-bomb/


Processing URLs:  92%|█████████▏| 920/1000 [54:09<01:39,  1.25s/it]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2015/09/03/venezuela-default-risk-overstated/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2015/09/03/venezuela-default-risk-overstated/


Processing URLs:  92%|█████████▏| 923/1000 [54:13<01:39,  1.29s/it]

Error extracting text from http://wpo.st/ksdA1: 503 Server Error: Service Unavailable: Back-end server is at capacity for url: http://wpo.st/ksdA1


Processing URLs:  92%|█████████▎| 925/1000 [54:14<01:01,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-southkorea-politics-candidates-idUSKBN13U0XT?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-candidates-idUSKBN13U0XT?il=0


Processing URLs:  93%|█████████▎| 927/1000 [54:16<01:01,  1.19it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-wants-outline-coalition-deal-with-spd-by-mid-january-idUSKBN1EC123: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-wants-outline-coalition-deal-with-spd-by-mid-january-idUSKBN1EC123


Processing URLs:  93%|█████████▎| 933/1000 [54:25<01:17,  1.15s/it]

Error extracting text from https://www.foxnews.com/politics/robert-redfield-cdc-covid-19-lab-leak-who-compromised: 403 Client Error: Forbidden for url: https://www.foxnews.com/politics/robert-redfield-cdc-covid-19-lab-leak-who-compromised
Error extracting text from http://www.nytimes.com/2016/03/07/world/europe/nato-expands-patrols-in-aegean-sea-to-stop-human-traffickers.html?emc=edit_th_20160307&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/07/world/europe/nato-expands-patrols-in-aegean-sea-to-stop-human-traffickers.html?emc=edit_th_20160307&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  94%|█████████▎| 936/1000 [54:27<00:52,  1.23it/s]

Error extracting text from http://www.foxnews.com/world/2017/09/15/cholera-aid-shortfalls-plague-desperate-refugees-conflict.html: 403 Client Error: Forbidden for url: http://www.foxnews.com/world/2017/09/15/cholera-aid-shortfalls-plague-desperate-refugees-conflict.html
Error extracting text from http://www.imf.org/external/pubs/ft/survey/so/2012/POL120312A.htm: 403 Client Error: Forbidden for url: http://www.imf.org/external/pubs/ft/survey/so/2012/POL120312A.htm


Processing URLs:  94%|█████████▎| 937/1000 [54:28<00:47,  1.32it/s]

Error extracting text from http://thehill.com/homenews/administration/347909-watergate-prosecutor-trump-acting-like-he-is-running-out-of-time: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/347909-watergate-prosecutor-trump-acting-like-he-is-running-out-of-time/


Processing URLs:  94%|█████████▍| 938/1000 [54:28<00:41,  1.48it/s]

Error extracting text from http://www.dispatch.com/content/stories/local/2016/05/11/portman-strickland-senate-ohio-poll.html: 404 Client Error: OK for url: https://www.dispatch.com/content/stories/local/2016/05/11/portman-strickland-senate-ohio-poll.html


Processing URLs:  94%|█████████▍| 940/1000 [54:30<00:46,  1.28it/s]

Error extracting text from http://interamericansecuritywatch.com/danilo-joao-and-odebrecht/: 406 Client Error: Not Acceptable for url: http://interamericansecuritywatch.com/danilo-joao-and-odebrecht/


Processing URLs:  94%|█████████▍| 943/1000 [54:33<00:48,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN16T03H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN16T03H


Processing URLs:  94%|█████████▍| 945/1000 [54:43<02:23,  2.61s/it]

Error extracting text from http://www.theaustralian.com.au/news/world/russia-draws-up-a-list-of-generals-to-replace-syrian-despot-assad/story-fnb64oi6-1227596507977: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/news/world/russia-draws-up-a-list-of-generals-to-replace-syrian-despot-assad/story-fnb64oi6-1227596507977?nk=d88d37ce127d7f70475cd53476c2e23c-1706789888


Processing URLs:  95%|█████████▌| 951/1000 [54:53<01:18,  1.59s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-turkey-pe-idUSKBN16H25O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-turkey-pe-idUSKBN16H25O


Processing URLs:  95%|█████████▌| 953/1000 [54:53<00:44,  1.05it/s]

Error extracting text from https://www.nytimes.com/2018/01/09/us/north-carolina-gerrymander.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/09/us/north-carolina-gerrymander.html


Processing URLs:  96%|█████████▋| 963/1000 [55:44<02:09,  3.49s/it]

Error extracting text from https://faculty.haas.berkeley.edu/arose/MR2Vox.pdf: 404 Client Error: Not Found for url: https://faculty.haas.berkeley.edu/arose/MR2Vox.pdf


Processing URLs:  97%|█████████▋| 968/1000 [55:53<01:03,  1.98s/it]

Error extracting text from http://thinkprogress.org/justice/2016/05/09/3776434/support-republican-partys-plans-supreme-court-collapsed/: 403 Client Error: Forbidden for url: https://thinkprogress.org/justice/2016/05/09/3776434/support-republican-partys-plans-supreme-court-collapsed/


Processing URLs:  97%|█████████▋| 969/1000 [55:53<00:45,  1.45s/it]

Error extracting text from http://www.msnbc.com/msnbc/joe-biden-calls-major-union-leader-fueling-speculation-run: 403 Client Error: Forbidden for url: http://www.msnbc.com/msnbc/joe-biden-calls-major-union-leader-fueling-speculation-run


Processing URLs:  97%|█████████▋| 970/1000 [55:54<00:39,  1.33s/it]

Error extracting text from https://www.technologyreview.com/s/603493/10-breakthrough-technologies-2017-self-driving-trucks/: 404 Client Error: Not Found for url: https://www.technologyreview.com/s/603493/10-breakthrough-technologies-2017-self-driving-trucks/


Processing URLs:  97%|█████████▋| 974/1000 [56:10<01:09,  2.67s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-britain-china-idUSKBN0UK0QQ20160106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-britain-china-idUSKBN0UK0QQ20160106


Processing URLs:  98%|█████████▊| 975/1000 [56:12<00:59,  2.36s/it]

Error extracting text from http://tass.ru/en/politics/882485: 404 Client Error: Not Found for url: https://tass.ru/en/politics/882485


Processing URLs:  98%|█████████▊| 976/1000 [56:15<01:03,  2.63s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/encuesta-ipsos-todos-cuadros-ultimo-sondeo-2-noticia-1891394/1: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/encuesta-ipsos-todos-cuadros-ultimo-sondeo-2-noticia-1891394/1/


Processing URLs:  98%|█████████▊| 980/1000 [56:21<00:31,  1.56s/it]

Error extracting text from http://www.ibtimes.co.uk/fancy-bear-returns-russian-hackers-target-us-cyber-conference-booby-trapped-file-1644181: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/fancy-bear-returns-russian-hackers-target-us-cyber-conference-booby-trapped-file-1644181
Error extracting text from http://www.rfi.fr/afrique/20160411-burundi-proces-putschistes-mai-2015-cyrille-ndayirukiye: 403 Client Error: Forbidden for url: http://www.rfi.fr/afrique/20160411-burundi-proces-putschistes-mai-2015-cyrille-ndayirukiye


Processing URLs:  98%|█████████▊| 982/1000 [56:22<00:18,  1.01s/it]

Error extracting text from http://www.nytimes.com/2015/09/26/us/politics/donald-trump-sees-in-marco-rubio-a-new-rival-to-taunt-but-gets-plenty-of-salvos-in-return.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/26/us/politics/donald-trump-sees-in-marco-rubio-a-new-rival-to-taunt-but-gets-plenty-of-salvos-in-return.html?_r=0


Processing URLs:  98%|█████████▊| 984/1000 [56:25<00:19,  1.20s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-satellite-idUSKCN10M07H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-satellite-idUSKCN10M07H


Processing URLs:  98%|█████████▊| 985/1000 [56:26<00:16,  1.12s/it]

Error extracting text from https://fantasyscotus.lexpredict.com/case-prediction/us-v-tx/: HTTPSConnectionPool(host='fantasyscotus.lexpredict.com', port=443): Max retries exceeded with url: /case-prediction/us-v-tx/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3047230e0>: Failed to resolve 'fantasyscotus.lexpredict.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  99%|█████████▉| 988/1000 [56:28<00:09,  1.29it/s]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.diariodopoder.com.br/noticia.php%3Fi%3D49916147614&amp;usg=ALkJrhgTpGLiUwb_c0x9YFQ_UO_1HCoWUg: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.diariodopoder.com.br/noticia.php%3Fi%3D49916147614&amp;usg=ALkJrhgTpGLiUwb_c0x9YFQ_UO_1HCoWUg


Processing URLs:  99%|█████████▉| 989/1000 [56:29<00:09,  1.21it/s]

Error extracting text from https://www.flightglobal.com/flight-international/can-supersonic-hopefuls-deliver-as-commercial-interest-booms/143002.article: 403 Client Error: Forbidden for url: https://www.flightglobal.com/flight-international/can-supersonic-hopefuls-deliver-as-commercial-interest-booms/143002.article


Processing URLs:  99%|█████████▉| 991/1000 [56:29<00:04,  1.93it/s]

Error extracting text from https://www.wsj.com/articles/saudi-arabia-set-to-raise-oil-output-amid-recovery-in-prices-11613570923?mod=hp_lista_pos2: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-arabia-set-to-raise-oil-output-amid-recovery-in-prices-11613570923?mod=hp_lista_pos2
Error extracting text from https://medium.com/@davidmarcus/good-stablecoins-a-protocol-for-money-and-digital-wallets-the-formula-to-fix-our-broken-payment-f11f59fc92d7: 403 Client Error: Forbidden for url: https://medium.com/@davidmarcus/good-stablecoins-a-protocol-for-money-and-digital-wallets-the-formula-to-fix-our-broken-payment-f11f59fc92d7


Processing URLs:  99%|█████████▉| 992/1000 [56:30<00:03,  2.29it/s]

Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)00564-X/abstract: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)00564-X/abstract


Processing URLs: 100%|█████████▉| 997/1000 [57:45<00:59, 19.82s/it]

Error extracting text from http://m.nasdaq.com/article/oil-prices-move-lower-opec-production-quota-eyed-20151126-00224: HTTPConnectionPool(host='m.nasdaq.com', port=80): Read timed out. (read timeout=60)


Processing URLs: 100%|██████████| 1000/1000 [57:48<00:00,  3.47s/it]


URL filtered: https://www.youtube.com/watch?v=EnmGKxMJ3Ik


Processing URLs:   0%|          | 1/1000 [00:00<12:53,  1.29it/s]

Error extracting text from https://origin-nyi.thehill.com/blogs/pundits-blog/international-affairs/340966-venezuela-has-two-tough-options-only-one-provides: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/international-affairs/340966-venezuela-has-two-tough-options-only-one-provides/


Processing URLs:   0%|          | 5/1000 [01:05<6:16:20, 22.69s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-06-27/trump-fbi-pick-chrisopher-wray-no-stranger-to-crisis: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
URL filtered: http://deadspin.com/nfl-twitter-hacked-announces-roger-goodells-death-1781061387


Processing URLs:   1%|          | 9/1000 [02:38<6:34:40, 23.90s/it]

Error extracting text from http://ec.europa.eu/trade/policy/countries-and-regions/regions/gulf-region/: 404 Client Error: (Not Found) for url: https://ec.europa.eu/policy/countries-and-regions/regions/gulf-region/


Processing URLs:   1%|          | 12/1000 [02:42<2:33:56,  9.35s/it]

Error extracting text from http://www.reuters.com/article/2015/09/20/us-iran-nuclear-visit-idUSKCN0RK0RX20150920: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/20/us-iran-nuclear-visit-idUSKCN0RK0RX20150920


Processing URLs:   1%|▏         | 14/1000 [03:35<4:17:23, 15.66s/it]

Error extracting text from http://www.reuters.com/article/2015/11/01/southchinasea-usa-defensechief-idUSL3N12W03K20151101: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/01/southchinasea-usa-defensechief-idUSL3N12W03K20151101


Processing URLs:   2%|▏         | 16/1000 [03:40<2:30:54,  9.20s/it]

URL filtered: https://twitter.com/eucopresident/status/1342124591884414979?s=20


Processing URLs:   2%|▏         | 18/1000 [03:41<1:25:16,  5.21s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/irans-constitutional-watchdog-ratifies-election-results-38368024: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/irans-constitutional-watchdog-ratifies-election-results-38368024


Processing URLs:   2%|▏         | 20/1000 [03:44<56:24,  3.45s/it]  

URL filtered: https://amti.csis.org/paracels-beijings-other-buildup/?utm_content=buffercb946&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:   2%|▏         | 24/1000 [03:53<42:48,  2.63s/it]

Error extracting text from https://www.congress.gov/bill/114th-congress/senate-bill/2012/text#idD22B17FAA36E4B0AB8373B2C79CF3A4B: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/senate-bill/2012/text#idD22B17FAA36E4B0AB8373B2C79CF3A4B


Processing URLs:   3%|▎         | 28/1000 [04:03<34:14,  2.11s/it]

Error extracting text from https://www.wsj.com/articles/venezuelan-default-fears-rise-with-billions-in-debt-coming-due-soon-1501680162: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelan-default-fears-rise-with-billions-in-debt-coming-due-soon-1501680162


Processing URLs:   4%|▎         | 36/1000 [04:19<30:15,  1.88s/it]

Error extracting text from http://www.c4isrnet.com/story/military-tech/omr/missile-defense/2016/02/01/homeland-missile-defense-system-successful-non-intercept-flight-test/79654430/: 404 Client Error: Not Found for url: https://www.c4isrnet.com/story/military-tech/omr/missile-defense/2016/02/01/homeland-missile-defense-system-successful-non-intercept-flight-test/79654430/


Processing URLs:   4%|▎         | 37/1000 [04:26<56:56,  3.55s/it]

URL filtered: http://www.thedailybeast.com/articles/2015/12/30/how-isis-actually-lost-ramadi.html?source=twitter_page&amp;utm_campaign=trueAnthem:+Trending+Content&amp;utm_content=5683a1c104d30108abd0cbf1&amp;utm_medium=trueAnthem&amp;utm_source=twitter&amp;via=twitter_page


Processing URLs:   4%|▍         | 40/1000 [04:31<37:08,  2.32s/it]

Error extracting text from http://www.nbcnews.com/news/us-news/significant-issues-remain-congress-works-avoid-government-shutdown-n479826: 403 Client Error: Forbidden for url: http://www.nbcnews.com/news/us-news/significant-issues-remain-congress-works-avoid-government-shutdown-n479826


Processing URLs:   4%|▍         | 41/1000 [04:33<36:47,  2.30s/it]

Error extracting text from http://www.frontpagemag.com/point/262732/eu-europeans-must-pay-280k-each-muslim-migrant-daniel-greenfield: 404 Client Error: Not Found for url: https://www.frontpagemag.com/point/262732/eu-europeans-must-pay-280k-each-muslim-migrant-daniel-greenfield


Processing URLs:   4%|▍         | 42/1000 [04:35<33:56,  2.13s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-17/iran-supports-oil-producer-pact-without-pledging-supply-curbs


Processing URLs:   4%|▍         | 44/1000 [04:35<19:50,  1.24s/it]

Error extracting text from https://www.yahoo.com/news/assad-vows-retake-syria-hours-truce-101254642.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/assad-vows-retake-syria-hours-truce-101254642.html


Processing URLs:   5%|▍         | 48/1000 [05:12<2:14:35,  8.48s/it]

Error extracting text from https://morethanwars.com/2016/02/08/from-plan-colombia-to-peace-colombia-will-the-us-remove-the-farcs-terrorism-status/: HTTPSConnectionPool(host='morethanwars.com', port=443): Max retries exceeded with url: /2016/02/08/from-plan-colombia-to-peace-colombia-will-the-us-remove-the-farcs-terrorism-status/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300db2c30>: Failed to resolve 'morethanwars.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 54/1000 [05:20<38:27,  2.44s/it]  

Error extracting text from http://www.newsweek.com/putin-cannot-afford-frozen-conflict-ukraine-421083: 403 Client Error: Forbidden for url: https://www.newsweek.com/putin-cannot-afford-frozen-conflict-ukraine-421083


Processing URLs:   6%|▋         | 63/1000 [05:36<15:51,  1.02s/it]

Error extracting text from https://www.yahoo.com/news/dr-congo-president-defies-growing-calls-resign-174446567.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/dr-congo-president-defies-growing-calls-resign-174446567.html


Processing URLs:   6%|▋         | 64/1000 [05:37<13:05,  1.19it/s]

Error extracting text from https://www.nytimes.com/reuters/2017/08/04/world/africa/04reuters-somalia-violence.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/08/04/world/africa/04reuters-somalia-violence.html


Processing URLs:   7%|▋         | 68/1000 [05:41<16:53,  1.09s/it]

Error extracting text from http://www.cissm.umd.edu/publications/iranian-public-opinion-nuclear-negotiations: 404 Client Error: Not Found for url: https://www.cissm.umd.edu/publications/iranian-public-opinion-nuclear-negotiations


Processing URLs:   7%|▋         | 69/1000 [05:41<14:31,  1.07it/s]

Error extracting text from https://www.wsj.com/video/more-money-is-flowing-into-green-energy-than-ever-before-heres-why/B227310D-5972-4B82-BD81-33E09FCFCECC.html: 403 Client Error: Forbidden for url: https://www.wsj.com/video/more-money-is-flowing-into-green-energy-than-ever-before-heres-why/B227310D-5972-4B82-BD81-33E09FCFCECC.html


Processing URLs:   7%|▋         | 71/1000 [05:43<13:38,  1.14it/s]

Error extracting text from https://wwwnc.cdc.gov/eid/article/12/1/05-1254_article: 403 Client Error: Forbidden for url: https://wwwnc.cdc.gov/eid/article/12/1/05-1254_article


Processing URLs:   8%|▊         | 79/1000 [06:01<16:08,  1.05s/it]

Error extracting text from https://www.reuters.com/technology/exclusive-us-give-ransomware-hacks-similar-priority-terrorism-official-says-2021-06-03/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/technology/exclusive-us-give-ransomware-hacks-similar-priority-terrorism-official-says-2021-06-03/


Processing URLs:   8%|▊         | 81/1000 [06:03<12:17,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-china-idUSKCN0Z01VH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-china-idUSKCN0Z01VH


Processing URLs:   8%|▊         | 82/1000 [06:03<10:51,  1.41it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-21/venezuela-reckoning-looms-as-barclays-says-time-may-have-run-out


Processing URLs:   8%|▊         | 85/1000 [06:06<12:23,  1.23it/s]

Error extracting text from https://www.reuters.com/article/us-russia-usa-nuclear/russia-says-it-is-fully-committed-to-nuclear-missile-pact-idUSKBN1E30HZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-nuclear/russia-says-it-is-fully-committed-to-nuclear-missile-pact-idUSKBN1E30HZ


Processing URLs:   9%|▊         | 87/1000 [07:06<3:17:39, 12.99s/it]

Error extracting text from https://vitalvegas.com/some-las-vegas-casinos-could-temporarily-close-again-due-to-covid-19-concerns/: HTTPSConnectionPool(host='www.casino.org', port=443): Max retries exceeded with url: /vitalvegas/some-las-vegas-casinos-could-temporarily-close-again-due-to-covid-19-concerns/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fefc4470>, 'Connection to www.casino.org timed out. (connect timeout=60)'))
Error extracting text from http://plus55.com/politics/2016/06/protests-michel-temer-34-cities: HTTPConnectionPool(host='plus55.com', port=80): Max retries exceeded with url: /politics/2016/06/protests-michel-temer-34-cities (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fefc7aa0>: Failed to resolve 'plus55.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 90/1000 [07:10<1:47:42,  7.10s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-18/cunha-delays-decision-on-brazil-impeachment-plea-lawmaker-says


Processing URLs:   9%|▉         | 93/1000 [07:16<1:09:18,  4.58s/it]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0W21BS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0W21BS


Processing URLs:  10%|▉         | 96/1000 [07:33<1:22:39,  5.49s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3148622/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3148622/


Processing URLs:  10%|▉         | 98/1000 [07:39<1:04:05,  4.26s/it]

Error extracting text from http://www.wsj.com/articles/china-completes-runway-on-artificial-island-in-south-china-sea-1443184818: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-completes-runway-on-artificial-island-in-south-china-sea-1443184818


Processing URLs:  10%|█         | 100/1000 [07:42<41:14,  2.75s/it] 

URL filtered: https://www.youtube.com/watch?v=qudNktYwRNo


Processing URLs:  10%|█         | 103/1000 [07:43<20:55,  1.40s/it]

Error extracting text from http://www.gov.scot/Resource/0050/00504615.pdf: 404 Client Error: Not Found for url: https://www.gov.scot/Resource/0050/00504615.pdf
Error extracting text from https://www.reuters.com/article/us-russia-politics-navalny-prison/navalny-emerges-in-jail-in-russias-vladimir-region-meets-lawyers-idUSKCN2AV2EQ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-politics-navalny-prison/navalny-emerges-in-jail-in-russias-vladimir-region-meets-lawyers-idUSKCN2AV2EQ?il=0


Processing URLs:  11%|█         | 107/1000 [07:49<20:14,  1.36s/it]



Processing URLs:  11%|█         | 109/1000 [07:52<21:00,  1.42s/it]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-iraq-kirkuk-20161026-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-iraq-kirkuk-20161026-snap-story.html


Processing URLs:  11%|█         | 111/1000 [07:59<36:42,  2.48s/it]

Error extracting text from http://news.trust.org/item/20160630125547-llttr: 404 Client Error:  for url: https://news.trust.org:443/item/20160630125547-llttr


Processing URLs:  11%|█         | 112/1000 [07:59<28:06,  1.90s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-06-15/fed-meeting-central-bank-needs-to-take-action-despite-its-framework?sref=x7nYEkiY


Processing URLs:  12%|█▏        | 115/1000 [08:03<25:11,  1.71s/it]

Error extracting text from http://www.water.ca.gov/sfmp/resources/Attachment_E_InfoGath_Appendices_A-H.pdf: 404 Client Error: Not Found for url: https://water.ca.gov/sfmp/resources/Attachment_E_InfoGath_Appendices_A-H.pdf


Processing URLs:  12%|█▏        | 116/1000 [08:04<23:17,  1.58s/it]

Error extracting text from https://www.rferl.org/a/putin-tillerson-friendship-fallen-in-with-wrong-crowd/28722554.html: 403 Client Error: Forbidden for url: https://www.rferl.org/a/putin-tillerson-friendship-fallen-in-with-wrong-crowd/28722554.html


Processing URLs:  12%|█▏        | 121/1000 [08:14<23:40,  1.62s/it]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-italy-idUSKCN0V42CK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-italy-idUSKCN0V42CK


Processing URLs:  12%|█▏        | 123/1000 [08:16<16:52,  1.15s/it]

Error extracting text from http://www.reuters.com/article/us-nato-russia-kaliningrad-idUSKCN0ZL0J7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-russia-kaliningrad-idUSKCN0ZL0J7


Processing URLs:  13%|█▎        | 127/1000 [08:25<20:54,  1.44s/it]

Error extracting text from http://www.balkans.com/open-news.php?uniquenumber=208036: 404 Client Error: Not Found for url: http://www.balkans.com/open-news.php?uniquenumber=208036
Error extracting text from https://www.france24.com/en/live-news/20220317-boe-agrees-third-straight-rate-hike-as-inflation-soars).: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20220317-boe-agrees-third-straight-rate-hike-as-inflation-soars).
URL filtered: https://twitter.com/adamparsons/status/1343483555750543362?s=21


Processing URLs:  13%|█▎        | 129/1000 [08:26<15:50,  1.09s/it]

URL filtered: https://www.youtube.com/watch?v=kcN9CDSDAE4


Processing URLs:  14%|█▎        | 137/1000 [08:37<17:56,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil/oil-trading-cautious-on-middle-east-tensions-rising-u-s-drilling-activity-idUSKBN1DD023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil/oil-trading-cautious-on-middle-east-tensions-rising-u-s-drilling-activity-idUSKBN1DD023


Processing URLs:  14%|█▍        | 139/1000 [08:38<11:02,  1.30it/s]

Error extracting text from http://www.vanguardngr.com/2016/03/agatu-genocide-benue-lawmakers-slam-buhari/“A: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/03/agatu-genocide-benue-lawmakers-slam-buhari/%E2%80%9CA


Processing URLs:  14%|█▍        | 143/1000 [08:47<30:45,  2.15s/it]

Error extracting text from http://ianmasters.com/sites/default/files/mp3/bbriefing_2016_05_17c_james%20paul.mp3: 404 Client Error: Not Found for url: https://www.backgroundbriefing.org/sites/default/files/mp3/bbriefing_2016_05_17c_james%20paul.mp3


Processing URLs:  15%|█▍        | 146/1000 [08:54<30:40,  2.16s/it]

Error extracting text from http://www.yenisafak.com/en/world/syrian-regime-faces-defeat-in-north-of-aleppo-2448557: 422 Client Error:  for url: http://www.yenisafak.com/en/world/syrian-regime-faces-defeat-in-north-of-aleppo-2448557


Processing URLs:  15%|█▍        | 147/1000 [08:56<25:45,  1.81s/it]

Error extracting text from http://www.aeltracker.org/bill-details/10723/california-2016-ab-1592: 403 Client Error: Forbidden for url: http://www.aeltracker.org/bill-details/10723/california-2016-ab-1592


Processing URLs:  15%|█▍        | 149/1000 [08:58<23:55,  1.69s/it]

Error extracting text from http://russia-insider.com: 503 Server Error: Service Unavailable for url: https://russia-insider.com/


Processing URLs:  15%|█▌        | 151/1000 [09:03<30:02,  2.12s/it]

Error extracting text from http://www.centralamericalink.com/en/News/Builders_want_to_double_cost_of_Canal_expansion/: 404 Client Error: Not Found for url: https://www.centralamericalink.com/en/News/Builders_want_to_double_cost_of_Canal_expansion/


Processing URLs:  16%|█▌        | 155/1000 [09:07<16:04,  1.14s/it]

Error extracting text from https://www.nytimes.com/2021/07/12/us/politics/us-diplomats-afghanistan-taliban.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/12/us/politics/us-diplomats-afghanistan-taliban.html
URL filtered: https://twitter.com/Zhivoderova/status/966280328087310337


Processing URLs:  16%|█▋        | 163/1000 [09:18<20:55,  1.50s/it]

URL filtered: https://www.youtube.com/watch?v=SjbPi00k_ME


Processing URLs:  16%|█▋        | 165/1000 [09:19<15:58,  1.15s/it]

Error extracting text from https://www.reuters.com/article/us-iran-tanker/iran-tells-south-korea-not-to-politicise-seized-vessel-demands-release-of-funds-idUSKBN29F0LF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-tanker/iran-tells-south-korea-not-to-politicise-seized-vessel-demands-release-of-funds-idUSKBN29F0LF
URL filtered: https://www.cnbc.com/2022/04/04/twitter-shares-soar-more-than-25percent-after-elon-musk-takes-9percent-stake-in-social-media-company.html


Processing URLs:  18%|█▊        | 175/1000 [09:27<11:05,  1.24it/s]

Error extracting text from http://www.roadandtrack.com/new-cars/car-technology/news/a31011/california-clears-fully-autonomous-cars-to-drive-on-public-roads/: 403 Client Error: Forbidden for url: http://www.roadandtrack.com/new-cars/car-technology/news/a31011/california-clears-fully-autonomous-cars-to-drive-on-public-roads/


Processing URLs:  18%|█▊        | 176/1000 [09:29<12:30,  1.10it/s]

URL filtered: https://twitter.com/Mij_Europe/status/1456520115814535186


Processing URLs:  18%|█▊        | 179/1000 [09:32<13:29,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-rebels-idUSKBN19B3A1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-rebels-idUSKBN19B3A1


Processing URLs:  18%|█▊        | 181/1000 [09:35<15:46,  1.16s/it]

Error extracting text from http://tass.ru/en/defense/867818: 404 Client Error: Not Found for url: https://tass.ru/en/defense/867818


Processing URLs:  18%|█▊        | 182/1000 [09:35<12:34,  1.08it/s]

Error extracting text from http://www.opec.org/opec_web/en/data_graphs/40.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/data_graphs/40.htm


Processing URLs:  19%|█▊        | 187/1000 [09:50<22:10,  1.64s/it]

Error extracting text from http://news.sky.com/story/1617197/exclusive-inside-is-terror-weapons-lab: 404 Client Error: Not Found for url: https://news.sky.com/story/1617197/exclusive-inside-is-terror-weapons-lab


Processing URLs:  19%|█▉        | 188/1000 [09:50<16:39,  1.23s/it]

Error extracting text from https://www.nytimes.com/2017/02/01/world/asia/ban-ki-moon-president-south-korea.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/01/world/asia/ban-ki-moon-president-south-korea.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  19%|█▉        | 189/1000 [10:50<4:13:33, 18.76s/it]

Error extracting text from http://www.edmunds.com/toyota/mirai/2016/long-term-road-test/introduction.html: HTTPConnectionPool(host='www.edmunds.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  19%|█▉        | 190/1000 [10:51<2:58:48, 13.24s/it]

Error extracting text from https://www.nytimes.com/2017/08/04/world/asia/vietnam-south-china-sea-repsol.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/04/world/asia/vietnam-south-china-sea-repsol.html


Processing URLs:  19%|█▉        | 193/1000 [10:54<1:10:53,  5.27s/it]

Error extracting text from http://aranews.net/2016/05/us-trained-1000-kurdish-peshmergas-join-battle-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/05/us-trained-1000-kurdish-peshmergas-join-battle-mosul/
URL filtered: https://www.voanews.com/a/trump-russia-facebook-/4040040.html


Processing URLs:  20%|█▉        | 196/1000 [10:57<37:37,  2.81s/it]  

Error extracting text from http://ir.teslamotors.com/releasedetail.cfm?ReleaseID=935005: HTTPConnectionPool(host='ir.teslamotors.com', port=80): Max retries exceeded with url: /releasedetail.cfm?ReleaseID=935005 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffd5f2f0>: Failed to resolve 'ir.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  20%|██        | 202/1000 [11:01<11:53,  1.12it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/265555-quinnipiac-poll-sanders-surges-to-lead-in-iowa: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/265555-quinnipiac-poll-sanders-surges-to-lead-in-iowa/
URL filtered: http://www.bbc.co.uk/news/world-asia-37968090?ocid=socialflow_twitter
Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN18R083: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN18R083


Processing URLs:  21%|██        | 207/1000 [11:08<13:02,  1.01it/s]

Error extracting text from http://cyberlaw.stanford.edu/wiki/index.php/Automated_Driving:_Legislative_and_Regulatory_Action: 404 Client Error: Not Found for url: https://cyberlaw.stanford.edu/wiki/index.php/Automated_Driving:_Legislative_and_Regulatory_Action
Error extracting text from http://www.worldbulletin.net/turkey/183028/borsa-istanbul-opens-higher: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/turkey/183028/borsa-istanbul-opens-higher


Processing URLs:  21%|██        | 208/1000 [11:09<11:45,  1.12it/s]

Error extracting text from https://www.reuters.com/business/energy/iran-wants-us-assurances-it-will-never-abandon-nuclear-deal-if-revived-2021-11-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/iran-wants-us-assurances-it-will-never-abandon-nuclear-deal-if-revived-2021-11-08/


Processing URLs:  21%|██        | 210/1000 [12:00<2:38:54, 12.07s/it]

Error extracting text from http://recode.net/2016/03/23/fueling-the-success-of-the-hydrogen-powered-toyota-mirai-is-a-balancing-act/: Exceeded 30 redirects.


Processing URLs:  22%|██▏       | 216/1000 [12:26<59:17,  4.54s/it]  

URL filtered: https://www.youtube.com/watch?v=dBgRhxRaEqg
URL filtered: https://www.youtube.com/watch?v=uqZZIDk2c_Y


Processing URLs:  22%|██▏       | 220/1000 [12:29<28:13,  2.17s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-30/leaky-locks-may-further-delay-5-3-billion-panama-canal-widening


Processing URLs:  22%|██▏       | 223/1000 [12:31<19:18,  1.49s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-24/yellen-says-fed-still-expects-to-increase-interest-rates-in-2015


Processing URLs:  23%|██▎       | 227/1000 [12:34<13:30,  1.05s/it]

Error extracting text from https://www.nytimes.com/2020/06/17/world/asia/india-china-border-clashes.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/06/17/world/asia/india-china-border-clashes.html


Processing URLs:  23%|██▎       | 229/1000 [12:38<18:06,  1.41s/it]

Error extracting text from http://www.reuters.com/article/oil-latam-prices-idUSL2N14Z2ZF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/oil-latam-prices-idUSL2N14Z2ZF


Processing URLs:  23%|██▎       | 231/1000 [12:41<17:35,  1.37s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-equity-idUSKCN1161OV?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-equity-idUSKCN1161OV?il=0


Processing URLs:  24%|██▎       | 237/1000 [12:53<21:31,  1.69s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-07/17/c_134419635.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-07/17/c_134419635.htm
URL filtered: http://www.bloomberg.com/news/articles/2016-01-27/goldman-sachs-s-leissner-said-to-move-to-los-angeles-take-leave


Processing URLs:  24%|██▍       | 241/1000 [13:02<22:44,  1.80s/it]

Error extracting text from https://www.reuters.com/markets/europe/fitch-cuts-russias-rating-says-debt-default-imminent-2022-03-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/markets/europe/fitch-cuts-russias-rating-says-debt-default-imminent-2022-03-08/


Processing URLs:  24%|██▍       | 245/1000 [13:15<35:14,  2.80s/it]

Error extracting text from http://interaksyon.com/article/125550/as-tensions-rise-us-reassessing-chinas-participation-in-rimpac-naval-drill: 404 Client Error: Not Found for url: https://interaksyon.philstar.com/article/125550/as-tensions-rise-us-reassessing-chinas-participation-in-rimpac-naval-drill


Processing URLs:  25%|██▍       | 246/1000 [13:18<38:02,  3.03s/it]

Error extracting text from http://inserbia.info/today/2015/10/montenegro-thousands-take-part-in-anti-govt-anti-nato-rally/: 404 Client Error: Not Found for url: https://inserbia.info/today/2015/10/montenegro-thousands-take-part-in-anti-govt-anti-nato-rally/


Processing URLs:  25%|██▌       | 250/1000 [13:39<45:44,  3.66s/it]  

Error extracting text from https://www.nytimes.com/2016/12/02/world/asia/afghanistan-security-terrorism-taliban.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/12/02/world/asia/afghanistan-security-terrorism-taliban.html?_r=0


Processing URLs:  25%|██▌       | 252/1000 [13:43<35:24,  2.84s/it]

Error extracting text from https://polcms.secure.europarl.europa.eu/cmsdata/114742/18%20AFCO%20with%20cover.pdf: 403 Client Error: Forbidden for url: https://telacms.europarl.europa.eu/cmsdata/114742/18%20AFCO%20with%20cover.pdf


Processing URLs:  25%|██▌       | 254/1000 [13:45<23:29,  1.89s/it]

Error extracting text from http://thehill.com/blogs/floor-action/scheduling/261322-this-week: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/scheduling/261322-this-week/


Processing URLs:  26%|██▌       | 258/1000 [14:15<46:59,  3.80s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-healthcare/white-house-says-rollback-of-obamacare-must-be-part-of-short-term-fix-idUSKBN1CO2N7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-healthcare/white-house-says-rollback-of-obamacare-must-be-part-of-short-term-fix-idUSKBN1CO2N7
URL filtered: http://www.bloomberg.com/news/articles/2016-03-10/leaders-in-biggest-brazil-party-said-to-push-to-abandon-rousseff


Processing URLs:  26%|██▋       | 265/1000 [14:27<20:03,  1.64s/it]

Error extracting text from http://promedmail.org/direct.php?id=20160811.4408853: 403 Client Error: Forbidden for url: http://promedmail.org/direct.php?id=20160811.4408853
Error extracting text from https://www.timesofisrael.com/hamas-pragmatism-in-gaza-handover-hints-at-a-new-future-for-the-strip/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/hamas-pragmatism-in-gaza-handover-hints-at-a-new-future-for-the-strip/
URL filtered: https://twitter.com/eucopresident/status/700794714254086146


Processing URLs:  27%|██▋       | 270/1000 [14:31<13:11,  1.08s/it]

Error extracting text from http://aranews.net/2016/02/syrian-democratic-forces-announce-liberation-of-main-isis-bastion-in-hasakah/: 404 Client Error: Not Found for url: http://aranews.net/2016/02/syrian-democratic-forces-announce-liberation-of-main-isis-bastion-in-hasakah/
URL filtered: https://twitter.com/1TVNewsAF/status/1417057103894695936


Processing URLs:  27%|██▋       | 273/1000 [15:32<3:00:46, 14.92s/it]

Error extracting text from http://en.kremlin.ru/acts/news/50805: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 274/1000 [15:32<2:16:14, 11.26s/it]

Error extracting text from https://www.fastcompany.com/90678306/whats-happening-with-evergrande-chinas-real-estate-debt-crisis-rattles-stock-markets: 403 Client Error: Forbidden for url: https://www.fastcompany.com/90678306/whats-happening-with-evergrande-chinas-real-estate-debt-crisis-rattles-stock-markets


Processing URLs:  28%|██▊       | 275/1000 [15:32<1:40:36,  8.33s/it]

Error extracting text from https://www.reuters.com/article/us-nicaragua-taiwan/taiwan-warships-drop-anchor-in-nicaragua-amid-sinking-ties-with-china-idUSKBN1HH179: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nicaragua-taiwan/taiwan-warships-drop-anchor-in-nicaragua-amid-sinking-ties-with-china-idUSKBN1HH179


Processing URLs:  28%|██▊       | 277/1000 [15:42<1:15:36,  6.27s/it]

Error extracting text from http://www.afro.who.int/en/nigeria/press-materials/item/8290-bill-gates-applauds-who-for-establishing-environmental-surveillance-in-nigeria.html: 404 Client Error: Not Found for url: https://www.afro.who.int/en/nigeria/press-materials/item/8290-bill-gates-applauds-who-for-establishing-environmental-surveillance-in-nigeria.html


Processing URLs:  29%|██▊       | 286/1000 [15:59<32:05,  2.70s/it]  

Error extracting text from http://www.nbcnews.com/storyline/iraq-turmoil/isis-occupiers-mosul-will: 403 Client Error: Forbidden for url: http://www.nbcnews.com/storyline/iraq-turmoil/isis-occupiers-mosul-will


Processing URLs:  29%|██▉       | 289/1000 [16:02<19:30,  1.65s/it]

Error extracting text from http://news.yahoo.com/project-fear-stalks-britains-eu-referendum-campaign-050007670.html: 404 Client Error: Not Found for url: http://news.yahoo.com/project-fear-stalks-britains-eu-referendum-campaign-050007670.html


Processing URLs:  29%|██▉       | 292/1000 [16:06<18:50,  1.60s/it]

Error extracting text from https://www.empr.com/home/news/molnupiravir-merck-ridgeback-oral-antiviral-investigational-covid-19-treatment/: 403 Client Error: Forbidden for url: https://www.empr.com/home/news/molnupiravir-merck-ridgeback-oral-antiviral-investigational-covid-19-treatment/


Processing URLs:  29%|██▉       | 294/1000 [16:18<37:09,  3.16s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN13A133: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN13A133


Processing URLs:  30%|██▉       | 295/1000 [16:18<27:39,  2.35s/it]

Error extracting text from https://www.usni.org/magazines/proceedings/2017-08/ship-collisions-address-underlying-causes-including-culture: 403 Client Error: Forbidden for url: https://www.usni.org/magazines/proceedings/2017-08/ship-collisions-address-underlying-causes-including-culture


Processing URLs:  30%|██▉       | 297/1000 [16:24<29:05,  2.48s/it]

Error extracting text from http://ict.usc.edu/pubs/Emotional%20Signaling%20in%20a%20Social%20Dilemma-an%20Automatic%20Analysis.pdf: HTTPSConnectionPool(host='ict.usc.edupubs', port=443): Max retries exceeded with url: /Emotional%20Signaling%20in%20a%20Social%20Dilemma-an%20Automatic%20Analysis.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30364c320>: Failed to resolve 'ict.usc.edupubs' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 299/1000 [16:26<18:05,  1.55s/it]

Error extracting text from http://english.aawsat.com/2016/04/article55349693/iraq-hanging-boiling-cauldron-washington-threatens-use-force: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/04/article55349693/iraq-hanging-boiling-cauldron-washington-threatens-use-force


Processing URLs:  30%|███       | 301/1000 [16:28<13:44,  1.18s/it]

Error extracting text from http://www.france24.com/en/20151206-author-brazils-last-presidential-impeachment-defends-rousseff: 403 Client Error: Forbidden for url: http://www.france24.com/en/20151206-author-brazils-last-presidential-impeachment-defends-rousseff


Processing URLs:  30%|███       | 304/1000 [16:34<20:46,  1.79s/it]

Error extracting text from http://cco.ndu.edu/Portals/96/Documents/Articles/russia&#39;s%20renewed%20Military%20Thinking.pdf: 400 Client Error: Bad Request for url: http://cco.ndu.edu/Portals/96/Documents/Articles/russia&#39;s%20renewed%20Military%20Thinking.pdf


Processing URLs:  31%|███       | 307/1000 [16:39<16:03,  1.39s/it]

Error extracting text from http://www.faa.gov/uas: 403 Client Error: Forbidden for url: http://www.faa.gov/uas


Processing URLs:  31%|███       | 311/1000 [16:46<20:45,  1.81s/it]

URL filtered: https://twitter.com/zerohedge/status/930184137478164482


Processing URLs:  31%|███▏      | 314/1000 [16:48<11:45,  1.03s/it]

Error extracting text from https://defence-blog.com/news/russian-navy-weapon-test-ended-in-failure-after-cruise-missile-failed.html: 403 Client Error: Forbidden for url: https://defence-blog.com/news/russian-navy-weapon-test-ended-in-failure-after-cruise-missile-failed.html


Processing URLs:  32%|███▏      | 317/1000 [17:50<3:20:28, 17.61s/it]

Error extracting text from http://www.seattletimes.com/business/highway-bill-compromise-would-revive-export-import-bank/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  32%|███▏      | 322/1000 [17:59<47:28,  4.20s/it]  

Error extracting text from https://www.un.org/press/en/2017/sc12961.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2017/sc12961.doc.htm


Processing URLs:  32%|███▏      | 324/1000 [18:10<48:27,  4.30s/it]  

Error extracting text from http://www.arabnews.com/columns/news/855876: 403 Client Error: Forbidden for url: https://www.arabnews.com/columns/news/855876


Processing URLs:  32%|███▎      | 325/1000 [18:12<42:40,  3.79s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Wild_poliovirus_list_2010-2016_26JAN.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Wild_poliovirus_list_2010-2016_26JAN.pdf


Processing URLs:  33%|███▎      | 327/1000 [18:16<30:36,  2.73s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-04-28/obama-s-push-for-court-pick-fizzles-as-republicans-stand-firm


Processing URLs:  33%|███▎      | 329/1000 [18:17<19:55,  1.78s/it]

Error extracting text from http://turkishweekly.net/2016/01/20/comment/turkey-energy-infrastructure-forecast-2016-risks-and-opportunities/: 404 Client Error: Not Found for url: https://turkishweekly.net/2016/01/20/comment/turkey-energy-infrastructure-forecast-2016-risks-and-opportunities/


Processing URLs:  34%|███▍      | 342/1000 [18:42<16:56,  1.54s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160120/1305184260.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160120/1305184260.html


Processing URLs:  34%|███▍      | 343/1000 [18:44<17:55,  1.64s/it]

Error extracting text from http://www.nationalreview.com/article/452064/supreme-court-gerrymandering-case-wisconsin-partisan-politics-constitution: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/452064/supreme-court-gerrymandering-case-wisconsin-partisan-politics-constitution/


Processing URLs:  35%|███▍      | 349/1000 [18:56<18:08,  1.67s/it]

Error extracting text from https://www.ft.com/content/73772db1-caf6-3beb-a642-9be4061344be: 404 Client Error: Not Found for url: https://www.ft.com/content/73772db1-caf6-3beb-a642-9be4061344be


Processing URLs:  35%|███▌      | 353/1000 [19:10<35:56,  3.33s/it]

Error extracting text from http://ewn.co.za/2017/06/17/listen-what-it-will-take-to-impeach-president-jacob-zuma: 404 Client Error: Not Found for url: https://www.ewn.co.za/2017/06/17/listen-what-it-will-take-to-impeach-president-jacob-zuma


Processing URLs:  36%|███▌      | 356/1000 [19:25<54:41,  5.10s/it]

URL filtered: https://www.youtube.com/watch?v=pa-dGYjSq5k


Processing URLs:  36%|███▌      | 358/1000 [19:28<38:04,  3.56s/it]

Error extracting text from http://iadb.jid.org/el-consejo-de-delegados/estados-miemros/brazil: 403 Client Error: Forbidden for url: http://iadb.jid.org/el-consejo-de-delegados/estados-miemros/brazil


Processing URLs:  36%|███▌      | 359/1000 [19:29<30:30,  2.86s/it]

Error extracting text from https://www.wsj.com/amp/articles/homes-price-rise-is-not-explained-by-duties-11618854766: 403 Client Error: Forbidden for url: https://www.wsj.com/amp/articles/homes-price-rise-is-not-explained-by-duties-11618854766


Processing URLs:  36%|███▌      | 360/1000 [19:30<24:04,  2.26s/it]

Error extracting text from http://www.reuters.com/article/us-iran-oil-opec-idUSKBN16L1DT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-opec-idUSKBN16L1DT


Processing URLs:  36%|███▌      | 361/1000 [19:30<18:42,  1.76s/it]

Error extracting text from https://www.investors.com/news/bitcoin-etfs-sec-approval-cryptocurrency-funds/: 403 Client Error: Forbidden for url: https://www.investors.com/news/bitcoin-etfs-sec-approval-cryptocurrency-funds/


Processing URLs:  36%|███▌      | 362/1000 [19:31<15:17,  1.44s/it]

Error extracting text from http://www.straitstimes.com/world/middle-east/hopes-for-palestinian-unity-delayed: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  37%|███▋      | 366/1000 [19:37<18:07,  1.72s/it]

Error extracting text from http://reut.rs/1QcFyHD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-kurds-idUSKCN0UU0AO


Processing URLs:  37%|███▋      | 367/1000 [19:37<14:46,  1.40s/it]

Error extracting text from https://www.hindustantimes.com/world-news/myanmar-military-using-battlefield-weapons-against-protesters-says-amnesty-101615453532215.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/myanmar-military-using-battlefield-weapons-against-protesters-says-amnesty-101615453532215.html


Processing URLs:  37%|███▋      | 370/1000 [19:40<10:19,  1.02it/s]

Error extracting text from http://www.brainyquote.com/quotes/authors/c/carl_von_clausewitz.html#qfvBmHcbPrQ2JJtl.99: 403 Client Error: Forbidden for url: https://www.brainyquote.com/quotes/authors/c/carl_von_clausewitz.html#qfvBmHcbPrQ2JJtl.99
URL filtered: https://www.bloomberg.com/news/articles/2021-06-03/united-bets-on-supersonic-future-with-3-billion-boom-jet-order


Processing URLs:  38%|███▊      | 375/1000 [19:44<08:18,  1.25it/s]

Error extracting text from http://www.wsj.com/articles/consumer-confidence-climbs-in-october-1445004887: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/consumer-confidence-climbs-in-october-1445004887


Processing URLs:  38%|███▊      | 377/1000 [19:47<10:12,  1.02it/s]

Error extracting text from http://ca.reuters.com/article/topNews/idCAKBN0UK0V520160106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  38%|███▊      | 383/1000 [20:05<27:22,  2.66s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-idUSKCN1190XG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-idUSKCN1190XG


Processing URLs:  39%|███▉      | 390/1000 [20:11<06:58,  1.46it/s]

Error extracting text from https://www.france24.com/en/middle-east/20210803-israel-can-act-alone-against-iran-after-ship-attack-pm-says: 403 Client Error: Forbidden for url: https://www.france24.com/en/middle-east/20210803-israel-can-act-alone-against-iran-after-ship-attack-pm-says
URL filtered: http://www.bloomberg.com/news/articles/2016-03-22/brazil-grants-states-debt-relief-as-impeachment-gains-traction
Error extracting text from http://www.reuters.com/article/idUSKCN0RP24Q20150925: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0RP24Q20150925


Processing URLs:  39%|███▉      | 393/1000 [20:13<07:27,  1.36it/s]

Error extracting text from https://www.axios.com/another-potential-mueller-honey-pot-spicers-notebooks-2487861790.html: 403 Client Error: Forbidden for url: https://www.axios.com/another-potential-mueller-honey-pot-spicers-notebooks-2487861790.html


Processing URLs:  40%|███▉      | 395/1000 [20:21<24:27,  2.43s/it]

Error extracting text from http://www.iwar.org.uk/news-archive/tia/futuremap-program.htm: 404 Client Error: Not Found for url: https://iwar.org.uk/news-archive/tia/futuremap-program.htm


Processing URLs:  40%|████      | 402/1000 [20:42<31:24,  3.15s/it]

Error extracting text from http://www.globalresearch.ca/obama-quietly-signs-propaganda...fake-news-eu.../5565373: 404 Client Error: Not Found for url: https://www.globalresearch.ca/obama-quietly-signs-propaganda...fake-news-eu.../5565373


Processing URLs:  40%|████      | 404/1000 [20:43<17:23,  1.75s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/09/21/Correspondent-Iraqi-parliament-votes-to-sack-finance-minister-Hoshyar-Zebari-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/09/21/Correspondent-Iraqi-parliament-votes-to-sack-finance-minister-Hoshyar-Zebari-.html
URL filtered: https://www.youtube.com/watch?v=RiRU2lHaRq4


Processing URLs:  41%|████      | 409/1000 [20:46<09:41,  1.02it/s]

Error extracting text from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.587.9776&amp;rep=rep1&amp;type=pdf: 401 Client Error: Unauthorized for url: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.587.9776&amp;rep=rep1&amp;type=pdf


Processing URLs:  41%|████      | 411/1000 [20:52<17:17,  1.76s/it]

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-charles-koch-donald-trump-20160731-story.html?ct_digitalads_politics_content-promotion_outbrain_________: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/politics/ct-charles-koch-donald-trump-20160731-story.html?ct_digitalads_politics_content-promotion_outbrain_________


Processing URLs:  42%|████▏     | 415/1000 [21:20<1:10:53,  7.27s/it]

Error extracting text from https://www.thebalance.com/u-s-debt-ceiling-why-it-matters-past-crises-3305868: 406 Client Error: Not Acceptable for url: https://www.thebalancemoney.com:443/u-s-debt-ceiling-why-it-matters-past-crises-3305868


Processing URLs:  42%|████▏     | 417/1000 [21:21<37:09,  3.82s/it]  

Error extracting text from http://www.reuters.com/article/us-france-navy-china-idUSKBN16O0QK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-france-navy-china-idUSKBN16O0QK


Processing URLs:  42%|████▏     | 423/1000 [21:46<26:49,  2.79s/it]  

Error extracting text from https://theconversation.com/spain-steels-itself-for-another-election-after-months-with-no-government-60675: 403 Client Error: Forbidden for url: https://theconversation.com/spain-steels-itself-for-another-election-after-months-with-no-government-60675


Processing URLs:  43%|████▎     | 432/1000 [21:59<14:49,  1.57s/it]

Error extracting text from https://www.wsj.com/articles/biden-says-mob-that-stormed-capitol-were-domestic-terrorists-11610046962: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/biden-says-mob-that-stormed-capitol-were-domestic-terrorists-11610046962
URL filtered: https://www.bloomberg.com/news/articles/2021-05-17/nyc-marathon-will-come-back-in-november-with-33-000-runners?srnd=premium&amp;sref=i2Bc5OtW


Processing URLs:  44%|████▎     | 437/1000 [22:03<09:14,  1.02it/s]

Error extracting text from http://new.time.com/4390090/istanbul-attack-russian-isis-militants/: HTTPConnectionPool(host='new.time.com', port=80): Max retries exceeded with url: /4390090/istanbul-attack-russian-isis-militants/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3009378c0>: Failed to resolve 'new.time.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▍     | 438/1000 [22:03<08:22,  1.12it/s]

Error extracting text from http://pressroom.toyota.com/article_display.cfm?article_id=5692: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/


Processing URLs:  44%|████▍     | 439/1000 [22:10<23:17,  2.49s/it]

Error extracting text from http://www.standwithnigeria.org/wp-content/uploads/2016/06/NIgeria-Fractured-and-Forgotten.pdf: 404 Client Error: Not Found for url: https://www.standwithnigeria.org/wp-content/uploads/2016/06/NIgeria-Fractured-and-Forgotten.pdf


Processing URLs:  44%|████▍     | 442/1000 [22:15<16:51,  1.81s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/today-headlines/2016-08/05/content_7192991.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/today-headlines/2016-08/05/content_7192991.htm


Processing URLs:  44%|████▍     | 443/1000 [22:16<15:40,  1.69s/it]

Error extracting text from http://europe.newsweek.com/beating-isis-caliphate-terrorism-jihadism-508559?utm_source=email&amp;utm_medium=newsletter&amp;utm_campaign=newsletter&amp;utm_content=read_more&amp;spMailingID=845588&amp;spUserID=MTI0NzM2MjM0NzYS1&amp;spJobID=650161361&amp;spReportId=NjUwMTYxMzYxS0: 403 Client Error: Forbidden for url: https://www.newsweek.com/2016/10/21/beating-isis-caliphate-terrorism-jihadism-508559.html


Processing URLs:  45%|████▍     | 446/1000 [22:20<11:40,  1.26s/it]

Error extracting text from http://in.rbth.com/economics/finance/2016/11/24/brics-bank-to-finance-indian-chinese-infrastructure-projects_650693: HTTPConnectionPool(host='in.rbth.com', port=80): Max retries exceeded with url: /economics/finance/2016/11/24/brics-bank-to-finance-indian-chinese-infrastructure-projects_650693 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3026e8560>: Failed to resolve 'in.rbth.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  45%|████▍     | 448/1000 [22:20<06:49,  1.35it/s]

Error extracting text from http://en.rfi.fr/africa/20160812-burundi-provides-no-answers-un-committee-against-torture: 403 Client Error: Forbidden for url: https://www.rfi.fr/en/africa/20160812-burundi-provides-no-answers-un-committee-against-torture
Error extracting text from http://www.nato.int/cps/en/natohq/opinions_125211.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/opinions_125211.htm


Processing URLs:  45%|████▌     | 452/1000 [22:24<06:37,  1.38it/s]

Error extracting text from https://www.nytimes.com/2018/02/19/world/middleeast/iran-syria-israel.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/19/world/middleeast/iran-syria-israel.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  45%|████▌     | 453/1000 [22:26<08:47,  1.04it/s]

Error extracting text from http://russia-insider.com/en/politics/russias-had-enough-no-more-business-usual-us-eu/ri12548: 503 Server Error: Service Unavailable for url: https://russia-insider.com/en/politics/russias-had-enough-no-more-business-usual-us-eu/ri12548


Processing URLs:  46%|████▌     | 455/1000 [22:27<07:17,  1.24it/s]

Error extracting text from http://www.ultimasnoticias.com.ve/noticias/actualidad/economia/pdvsa-cancela-intereses-a-tenedores-de-los-bonos-2.aspx: 403 Client Error: Forbidden for url: http://www.ultimasnoticias.com.ve/noticias/actualidad/economia/pdvsa-cancela-intereses-a-tenedores-de-los-bonos-2.aspx
Error extracting text from http://www.khaama.com/afghan-parliament-rejects-president-ghanis-decree-on-electoral-reforms-01242: 403 Client Error: Forbidden for url: http://www.khaama.com/afghan-parliament-rejects-president-ghanis-decree-on-electoral-reforms-01242


Processing URLs:  46%|████▌     | 456/1000 [23:28<2:50:17, 18.78s/it]

Error extracting text from http://aa.com.tr/en/economy/turkey-to-start-tender-for-3rd-nuke-plant-in-2016-17/492663: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  46%|████▌     | 459/1000 [23:35<1:07:26,  7.48s/it]

Error extracting text from http://www.abc.net.au/news/2018-02-13/jacob-zuma-given-48-hours-to-quit-as-south-africa-president/9425668: 500 Server Error: Internal Server Error for url: https://www.abc.net.au/news/2018-02-13/jacob-zuma-given-48-hours-to-quit-as-south-africa-president/9425668
Error extracting text from https://www.opensecrets.org/states/cands.php?cycle=2016&amp;state=NC: 403 Client Error: Forbidden for url: https://www.opensecrets.org/states/cands.php?cycle=2016&amp;state=NC


Processing URLs:  46%|████▌     | 461/1000 [23:35<34:04,  3.79s/it]  

Error extracting text from http://www.reuters.com/article/myanmar-land-idUSL8N1622WY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/myanmar-land-idUSL8N1622WY


Processing URLs:  47%|████▋     | 467/1000 [24:07<1:09:32,  7.83s/it]

Error extracting text from http://www.vinereport.com/article/game.of.thrones.news.the.winds.of.winter.release.will.be.in.the.middle.of.the.shows.season.six.reports.say/6108.htm: 522 Server Error:  for url: https://www.vinereport.com/article/game.of.thrones.news.the.winds.of.winter.release.will.be.in.the.middle.of.the.shows.season.six.reports.say/6108.htm


Processing URLs:  47%|████▋     | 468/1000 [24:08<51:46,  5.84s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-12-23/putin-praises-u-s-response-to-security-proposals-as-positive?srnd=premium


Processing URLs:  48%|████▊     | 475/1000 [24:19<19:51,  2.27s/it]

Error extracting text from http://www.conservativehome.com/thetorydiary/2016/06/vote-leave-versus-stronger-in-how-the-referendum-campaigns-ground-operations-measure-up.html: 403 Client Error: Forbidden for url: http://conservativehome.com/thetorydiary/2016/06/vote-leave-versus-stronger-in-how-the-referendum-campaigns-ground-operations-measure-up.html


Processing URLs:  48%|████▊     | 477/1000 [24:23<18:00,  2.07s/it]

Error extracting text from http://www.gov.me/en/News/158529/Podgorica-Montenegro-7-March-2016-Montenegro-s-Nato-Membership-Council-chaired-by-President-of-the-Council-and-Prime-Minister-Mi.html: 404 Client Error: not found for url: https://www.gov.me/en/News/158529/Podgorica-Montenegro-7-March-2016-Montenegro-s-Nato-Membership-Council-chaired-by-President-of-the-Council-and-Prime-Minister-Mi.html


Processing URLs:  48%|████▊     | 478/1000 [24:24<14:58,  1.72s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-09/02/c_134581785.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-09/02/c_134581785.htm


Processing URLs:  48%|████▊     | 479/1000 [24:24<11:08,  1.28s/it]

Error extracting text from https://www.nytimes.com/2017/04/18/world/asia/north-korea-missile-program-sabotage.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/18/world/asia/north-korea-missile-program-sabotage.html?_r=0


Processing URLs:  48%|████▊     | 480/1000 [24:25<09:17,  1.07s/it]

Error extracting text from http://warontherocks.com/2016/05/levantine-labyrinth-preparing-for-subterranean-warfare-in-iraq-and-syria/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/05/levantine-labyrinth-preparing-for-subterranean-warfare-in-iraq-and-syria/


Processing URLs:  48%|████▊     | 483/1000 [24:27<05:56,  1.45it/s]

Error extracting text from http://thehill.com/policy/national-security/312119-us-announces-sanctions-on-russia: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/312119-us-announces-sanctions-on-russia/
Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN18J02D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN18J02D


Processing URLs:  48%|████▊     | 484/1000 [24:28<07:31,  1.14it/s]

Error extracting text from https://www.popsugar.co.uk/celebrity/Why-Does-Sansa-Have-Littlefinger-Killed-43943804?utm_medium=redirect&amp;utm_campaign=US:GB&amp;utm_source=www.google.co.uk: 410 Client Error: Gone for url: https://www.popsugar.co.uk/celebrity/Why-Does-Sansa-Have-Littlefinger-Killed-43943804?utm_medium=redirect&amp;utm_campaign=US:GB&amp;utm_source=www.google.co.uk


Processing URLs:  48%|████▊     | 485/1000 [24:30<11:12,  1.31s/it]

Error extracting text from http://www.stripes.com/news/undeterred-by-china-navy-ops-continue-in-south-china-sea-1.384138: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/undeterred-by-china-navy-ops-continue-in-south-china-sea-1.384138


Processing URLs:  49%|████▊     | 487/1000 [24:33<09:32,  1.12s/it]

Error extracting text from http://eng.mod.gov.cn/MilitaryExercises/index.htm: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/MilitaryExercises/index.htm


Processing URLs:  49%|████▉     | 490/1000 [25:35<2:39:17, 18.74s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2021-04-26/americans-are-ok-with-bidens-spending-plans-poll: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  50%|████▉     | 495/1000 [25:42<38:24,  4.56s/it]  

URL filtered: http://www.npr.org/sections/thetwo-way/2015/09/03/437322607/speaker-of-iran-s-parliament-suggests-prisoner-swap-for-rezaian-other-americans?utm_source=twitter.com&amp;utm_campaign=npr&amp;utm_medium=social&amp;utm_term=nprnews


Processing URLs:  50%|████▉     | 498/1000 [25:44<17:39,  2.11s/it]

Error extracting text from https://www.geekwire.com/2020/analysis-read-antitrust-case-amazon-key-takeaways/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2020/analysis-read-antitrust-case-amazon-key-takeaways/


Processing URLs:  50%|█████     | 500/1000 [25:47<15:56,  1.91s/it]

Error extracting text from http://english.tse.jus.br/arquivos/federal-constitution: HTTPConnectionPool(host='english.tse.jus.br', port=80): Max retries exceeded with url: /arquivos/federal-constitution (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffc6f050>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  50%|█████     | 501/1000 [25:48<13:11,  1.59s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/04/07/44/0301000000AEN20160407008800315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  50%|█████     | 502/1000 [25:48<10:03,  1.21s/it]

Error extracting text from https://www.thestreet.com/markets/stocks-slip-lower-as-markets-re-set-after-hawkish-fed-rate-hike).: 403 Client Error: Forbidden for url: https://www.thestreet.com/markets/stocks-slip-lower-as-markets-re-set-after-hawkish-fed-rate-hike).


Processing URLs:  50%|█████     | 503/1000 [25:51<14:23,  1.74s/it]

Error extracting text from http://vestnikkavkaza.net/news/Egypt-s-Hurghada-Airport-to-reserve-terminal-for-Russian-tourists.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/news/Egypt-s-Hurghada-Airport-to-reserve-terminal-for-Russian-tourists.html


Processing URLs:  50%|█████     | 505/1000 [25:53<09:45,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-oil-venezuela-naphtha-exclusive-idUSKCN0UW1LE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-venezuela-naphtha-exclusive-idUSKCN0UW1LE


Processing URLs:  51%|█████     | 508/1000 [25:54<06:05,  1.35it/s]

Error extracting text from https://www.yahoo.com/gma/7-iranian-boats-harass-navy-ship-gulf-183806915--abc-news-topstories.html: 404 Client Error: Not Found for url: https://www.yahoo.com/gma/7-iranian-boats-harass-navy-ship-gulf-183806915--abc-news-topstories.html
Error extracting text from http://blogs.wsj.com/washwire/2015/10/23/why-the-idea-of-no-fly-zones-in-syria-should-be-grounded/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/10/23/why-the-idea-of-no-fly-zones-in-syria-should-be-grounded/


Processing URLs:  51%|█████     | 510/1000 [27:55<3:49:12, 28.07s/it]

Error extracting text from https://www.yang2020.com/policies/single-payer-healthcare/: HTTPSConnectionPool(host='www.yang2020.com', port=443): Max retries exceeded with url: /policies/single-payer-healthcare/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ffc6cbf0>, 'Connection to www.yang2020.com timed out. (connect timeout=60)'))


Processing URLs:  51%|█████     | 511/1000 [27:56<2:54:42, 21.44s/it]

Error extracting text from http://enenews.com/govt-report-plutonium-detected-california-air-samples-fallout-fukushima-nuclear-accident-be-blame: 404 Client Error: Not Found for url: http://enenews.com/govt-report-plutonium-detected-california-air-samples-fallout-fukushima-nuclear-accident-be-blame


Processing URLs:  52%|█████▏    | 516/1000 [28:00<38:00,  4.71s/it]  

Error extracting text from http://www.wsj.com/articles/can-i-get-that-with-extra-gmo-1461710531: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/can-i-get-that-with-extra-gmo-1461710531


Processing URLs:  52%|█████▏    | 519/1000 [28:02<15:56,  1.99s/it]

Error extracting text from https://www.opec.org/opec_web/en/publications/338.htm: 403 Client Error: Forbidden for url: https://www.opec.org/opec_web/en/publications/338.htm


Processing URLs:  52%|█████▏    | 524/1000 [28:20<26:31,  3.34s/it]

Error extracting text from http://www.reuters.com/article/2015/10/08/us-brazil-rousseff-impeachment-idUSKCN0S22CA20151008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/08/us-brazil-rousseff-impeachment-idUSKCN0S22CA20151008


Processing URLs:  53%|█████▎    | 527/1000 [28:24<16:12,  2.06s/it]

Error extracting text from https://nicholaswade.medium.com/origin-of-covid-following-the-clues-6f03564c038: 403 Client Error: Forbidden for url: https://nicholaswade.medium.com/origin-of-covid-following-the-clues-6f03564c038


Processing URLs:  53%|█████▎    | 529/1000 [28:29<16:45,  2.13s/it]

Error extracting text from http://www.embrussia.ru/node/597: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  53%|█████▎    | 534/1000 [28:46<18:52,  2.43s/it]

Error extracting text from https://www.nytimes.com/2021/02/27/nyregion/cuomo-charlotte-bennett-sexual-harassment.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/27/nyregion/cuomo-charlotte-bennett-sexual-harassment.html


Processing URLs:  54%|█████▎    | 536/1000 [28:55<24:30,  3.17s/it]

Error extracting text from https://browse.digitalglobe.com/imagefinder/main.jsp: HTTPSConnectionPool(host='browse.digitalglobe.com', port=443): Max retries exceeded with url: /imagefinder/main.jsp (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)')))


Processing URLs:  54%|█████▍    | 538/1000 [28:57<13:58,  1.82s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/23/gitrep-22mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/23/gitrep-22mar16pm/
Error extracting text from http://www.latimes.com/world/asia/la-fg-south-china-sea-ruling-20160712-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/asia/la-fg-south-china-sea-ruling-20160712-snap-story.html


Processing URLs:  54%|█████▍    | 540/1000 [28:58<09:40,  1.26s/it]

Error extracting text from http://www.barrons.com/articles/venezuela-bond-default-probable-fitch-downgrade-says-1504129573: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-bond-default-probable-fitch-downgrade-says-1504129573


Processing URLs:  54%|█████▍    | 542/1000 [29:00<07:09,  1.07it/s]

Error extracting text from http://www.cnbc.com/2015/09/27/financial-times-saudi-arabia-withdraws-overseas-funds.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/09/27/financial-times-saudi-arabia-withdraws-overseas-funds.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-radiation-idUSKCN0VQ22F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-radiation-idUSKCN0VQ22F


Processing URLs:  54%|█████▍    | 543/1000 [29:03<13:55,  1.83s/it]

URL filtered: https://twitter.com/hxhassan/status/787801615071752193


Processing URLs:  55%|█████▍    | 547/1000 [29:06<08:01,  1.06s/it]

Error extracting text from http://blog.dilbert.com/post/137375194651/the-biggest-trump-story-that-you-missed-master: HTTPConnectionPool(host='blog.dilbert.com', port=80): Max retries exceeded with url: /post/137375194651/the-biggest-trump-story-that-you-missed-master (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303988fe0>: Failed to resolve 'blog.dilbert.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  55%|█████▍    | 549/1000 [29:08<07:03,  1.07it/s]

Error extracting text from https://www.thalesgroup.com/sites/default/files/asset/document/thales-cyber-security-for-scada-systems.pdf: 404 Client Error: Not Found for url: https://www.thalesgroup.com/sites/default/files/asset/document/thales-cyber-security-for-scada-systems.pdf


Processing URLs:  55%|█████▌    | 552/1000 [29:12<07:48,  1.05s/it]

Error extracting text from http://www.imdb.com/title/tt1860359/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt1860359/


Processing URLs:  56%|█████▌    | 559/1000 [29:28<17:46,  2.42s/it]



Processing URLs:  56%|█████▌    | 560/1000 [29:29<15:10,  2.07s/it]

Error extracting text from http://www.gametheory.net/dictionary/CooperativeGame.html: 406 Client Error: Not Acceptable for url: http://www.gametheory.net/dictionary/CooperativeGame.html


Processing URLs:  57%|█████▋    | 569/1000 [29:49<23:31,  3.28s/it]

Error extracting text from http://www.israelhayom.com/site/newsletter_article.php?id=34339: 403 Client Error: Forbidden for url: https://www.israelhayom.com/site/newsletter_article.php?id=34339


Processing URLs:  57%|█████▋    | 572/1000 [29:54<15:32,  2.18s/it]

Error extracting text from https://www.worldoil.com/news/2021/3/19/maduro-enlists-foreign-oil-companies-to-help-end-us-sanctions: 404 Client Error: Not Found for url: https://www.worldoil.com/news/2021/3/19/maduro-enlists-foreign-oil-companies-to-help-end-us-sanctions


Processing URLs:  57%|█████▋    | 574/1000 [29:59<15:35,  2.19s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasts-cases.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasts-cases.html


Processing URLs:  57%|█████▊    | 575/1000 [30:02<16:55,  2.39s/it]

Error extracting text from https://www.thenationalnews.com/uae/expo-2020/2021/10/11/dubai-latest-updates-tickets-jobs-visit-passport/).: 404 Client Error: Not Found for url: https://www.thenationalnews.com/uae/expo-2020/2021/10/11/dubai-latest-updates-tickets-jobs-visit-passport/)./


Processing URLs:  58%|█████▊    | 577/1000 [30:03<10:12,  1.45s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_49212.htm#: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_49212.htm


Processing URLs:  58%|█████▊    | 579/1000 [30:04<06:22,  1.10it/s]

Error extracting text from https://www.yahoo.com/news/m/439cd7d6-e175-3602-8322-f892c134f0d2/china-and-russia-are.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/m/439cd7d6-e175-3602-8322-f892c134f0d2/china-and-russia-are.html


Processing URLs:  58%|█████▊    | 580/1000 [30:07<11:08,  1.59s/it]

Error extracting text from http://38north.org/2012/04/tongchang041012/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  58%|█████▊    | 582/1000 [30:12<14:10,  2.03s/it]

Error extracting text from http://www.hellenicshippingnews.com/uncertain-opening-date-of-panama-canal-expansion/: 404 Client Error: Not Found for url: https://www.hellenicshippingnews.com/uncertain-opening-date-of-panama-canal-expansion/


Processing URLs:  59%|█████▊    | 587/1000 [30:22<11:28,  1.67s/it]

Error extracting text from https://www.us-cert.gov/ncas/alerts/TA17-318A: 403 Client Error: Forbidden for url: https://www.us-cert.gov/ncas/alerts/TA17-318A
URL filtered: https://www.youtube.com/watch?v=Hf3jf0yUvCs


Processing URLs:  59%|█████▉    | 591/1000 [30:25<07:34,  1.11s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/12/05/world/second-russia-jet-crashes-failed-carrier-landing-near-syria/#.WEWXU-ArLnA: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/12/05/world/second-russia-jet-crashes-failed-carrier-landing-near-syria/#.WEWXU-ArLnA


Processing URLs:  59%|█████▉    | 594/1000 [30:30<08:28,  1.25s/it]

Error extracting text from http://www.nrttv.com/en/Details.aspx?Jimare=8495: 403 Client Error: Forbidden for url: https://www.nrttv.com/en/Details.aspx?Jimare=8495


Processing URLs:  60%|█████▉    | 596/1000 [30:33<09:37,  1.43s/it]

Error extracting text from http://www.theepochtimes.com/n3/2031862-xi-jinping-reins-in-chinas-politburo-with-10-commandments/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2031862-xi-jinping-reins-in-chinas-politburo-with-10-commandments/
Error extracting text from http://fusion.net/story/108985/nicaraguas-interest-in-russian-fighter-jets-could-trigger-the-stupidest-arms-race-ever/: HTTPConnectionPool(host='fusion.net', port=80): Max retries exceeded with url: /story/108985/nicaraguas-interest-in-russian-fighter-jets-could-trigger-the-stupidest-arms-race-ever/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30398a0f0>: Failed to resolve 'fusion.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  60%|██████    | 600/1000 [30:37<07:11,  1.08s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/0001418091/000110465922045641/tm2212748d1_sc13da.htm#ex-b_001: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/0001418091/000110465922045641/tm2212748d1_sc13da.htm#ex-b_001
Error extracting text from http://jakartaglobe.id/business/rcep-negotiations-conclude-2017-trade-minister/: 403 Client Error: Forbidden for url: https://jakartaglobe.id/business/rcep-negotiations-conclude-2017-trade-minister/


Processing URLs:  60%|██████    | 602/1000 [30:41<09:32,  1.44s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/kerry-condemns-hospital-attack-aleppo-working-truce-38849634: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/kerry-condemns-hospital-attack-aleppo-working-truce-38849634


Processing URLs:  60%|██████    | 603/1000 [30:43<10:13,  1.55s/it]

Error extracting text from http://blogs.spectator.co.uk/2016/02/why-are-amnesty-keeping-secret-the-details-of-their-plan-to-bring-in-more-migrants/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2016/02/why-are-amnesty-keeping-secret-the-details-of-their-plan-to-bring-in-more-migrants/


Processing URLs:  60%|██████    | 604/1000 [30:43<07:49,  1.19s/it]

Error extracting text from https://www.nytimes.com/2017/01/14/science/spacex-falcon-9-iridium-elon-musk.html?emc=edit_th_20170115&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/14/science/spacex-falcon-9-iridium-elon-musk.html?emc=edit_th_20170115&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  61%|██████    | 607/1000 [30:48<10:13,  1.56s/it]

Error extracting text from https://www.aljazeera.com/news/2022/1/11/new-email-piles-more-pressure-on-uk-pm-johnson-over-lockdown-parties,: 404 Client Error: Not Found for url: https://www.aljazeera.com/news/2022/1/11/new-email-piles-more-pressure-on-uk-pm-johnson-over-lockdown-parties,


Processing URLs:  61%|██████    | 608/1000 [30:50<10:18,  1.58s/it]

Error extracting text from https://www.pebble.com/new: 404 Client Error: Not Found for url: https://www.fitbit.com:443/new


Processing URLs:  61%|██████    | 609/1000 [30:50<07:55,  1.22s/it]

Error extracting text from http://www.scotsman.com/news/general-election-2017-can-snp-secure-indyref2-majority-1-4422569: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/general-election-2017-can-snp-secure-indyref2-majority-1-4422569


Processing URLs:  61%|██████▏   | 613/1000 [31:06<21:01,  3.26s/it]

Error extracting text from https://www.nytimes.com/2018/01/17/sports/north-south-korea-olympics-hockey.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/17/sports/north-south-korea-olympics-hockey.html


Processing URLs:  62%|██████▏   | 616/1000 [31:13<16:45,  2.62s/it]

Error extracting text from http://www.yenisafak.com/ekonomi/rusyada-egitime-devam-2386542: 422 Client Error:  for url: http://www.yenisafak.com/ekonomi/rusyada-egitime-devam-2386542
URL filtered: https://www.bloomberg.com/news/articles/2018-02-09/inside-the-abrupt-end-of-silicon-valley-s-biggest-trial


Processing URLs:  62%|██████▏   | 618/1000 [31:13<09:18,  1.46s/it]

Error extracting text from http://www.arabnews.com/node/1248371: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1248371


Processing URLs:  62%|██████▏   | 620/1000 [31:14<05:46,  1.10it/s]

Error extracting text from https://financialpost.com/pmn/press-releases-pmn/business-wire-news-releases-pmn/peru-holds-a-ribbon-cutting-ceremony-at-its-pavilion-at-dubai-expo-and-wins-4-world-travel-awards: 403 Client Error: Forbidden for url: https://financialpost.com/pmn/press-releases-pmn/business-wire-news-releases-pmn/peru-holds-a-ribbon-cutting-ceremony-at-its-pavilion-at-dubai-expo-and-wins-4-world-travel-awards
Error extracting text from http://www.reuters.com/article/us-nigeria-oil-idUSKCN0WZ0SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-oil-idUSKCN0WZ0SB


Processing URLs:  62%|██████▏   | 622/1000 [31:15<04:58,  1.27it/s]

Error extracting text from http://www.reuters.com/article/2015/10/03/russia-opec-oil-idUSL5N1230ET20151003: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/03/russia-opec-oil-idUSL5N1230ET20151003


Processing URLs:  62%|██████▏   | 623/1000 [31:16<04:04,  1.54it/s]

Error extracting text from http://www.wsj.com/articles/eu-keeps-stronger-russia-sanctions-in-reserve-at-syria-talks-1476884794: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-keeps-stronger-russia-sanctions-in-reserve-at-syria-talks-1476884794


Processing URLs:  62%|██████▎   | 625/1000 [31:18<04:43,  1.32it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2015/11/05/venezuela-can-opposition-win-dec-6-election/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2015/11/05/venezuela-can-opposition-win-dec-6-election/


Processing URLs:  63%|██████▎   | 628/1000 [31:23<07:28,  1.21s/it]

Error extracting text from https://www.reuters.com/article/us-yemen-security-ship-idUSKBN1AS0NX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-ship-idUSKBN1AS0NX


Processing URLs:  63%|██████▎   | 633/1000 [31:48<20:23,  3.33s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-idUSKBN15F17I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-idUSKBN15F17I


Processing URLs:  64%|██████▎   | 636/1000 [31:52<12:54,  2.13s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/iran-aims-to-rejoin-swift/2005714.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/iran-aims-to-rejoin-swift/2005714.html


Processing URLs:  64%|██████▍   | 638/1000 [31:53<08:08,  1.35s/it]

Error extracting text from http://www.peaceau.org/en/article/communique-of-the-741st-psc-meeting-on-the-situation-in-somalia-and-the-implementation-of-the-mandate-of-the-au-mission-in-somalia-amisom: 403 Client Error: Forbidden for url: http://www.peaceau.org/en/article/communique-of-the-741st-psc-meeting-on-the-situation-in-somalia-and-the-implementation-of-the-mandate-of-the-au-mission-in-somalia-amisom


Processing URLs:  64%|██████▍   | 640/1000 [31:54<05:20,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13S0LA?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13S0LA?il=0


Processing URLs:  65%|██████▍   | 646/1000 [32:04<09:30,  1.61s/it]

Error extracting text from https://www.google.com/amp/s/uk.mobile.reuters.com/article/amp/idINKBN28Y1D8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idINKBN28Y1D8


Processing URLs:  65%|██████▍   | 649/1000 [32:05<05:05,  1.15it/s]

Error extracting text from https://uk.news.yahoo.com/nato-set-invite-montenegro-join-alliance-sources-160122502.html#f7DVZAo: 404 Client Error: Not Found for url: https://uk.news.yahoo.com/nato-set-invite-montenegro-join-alliance-sources-160122502.html#f7DVZAo
Error extracting text from http://www.nytimes.com/aponline/2015/09/03/world/middleeast/ap-ml-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/09/03/world/middleeast/ap-ml-syria.html
Error extracting text from https://www.google.com/selfdrivingcar/where/: 404 Client Error: Not Found for url: https://www.google.com/selfdrivingcar/where/


Processing URLs:  65%|██████▌   | 651/1000 [32:10<08:14,  1.42s/it]

URL filtered: http://www.bloombergview.com/articles/2016-03-09/a-surprising-test-for-republicans-in-iowa


Processing URLs:  66%|██████▌   | 656/1000 [32:18<08:20,  1.45s/it]

Error extracting text from https://www.wsj.com/articles/u-s-sanctions-russia-over-moves-against-nalvany-11614693850: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-sanctions-russia-over-moves-against-nalvany-11614693850


Processing URLs:  66%|██████▌   | 658/1000 [32:19<05:47,  1.02s/it]

Error extracting text from http://business.financialpost.com/investing/the-aaa-credit-rating-club-is-getting-cozy-and-canada-stands-to-benefit-from-it: 403 Client Error: Forbidden for url: https://financialpost.com/investing/the-aaa-credit-rating-club-is-getting-cozy-and-canada-stands-to-benefit-from-it


Processing URLs:  66%|██████▌   | 660/1000 [32:22<06:10,  1.09s/it]

Error extracting text from https://www.iarpa.gov/challenges/gfchallenge2.html: 404 Client Error: Not Found for url: https://www.iarpa.gov/challenges/gfchallenge2.html
Error extracting text from http://www.nytimes.com/2016/05/13/health/zika-brazil-olympics-who-guidelines.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/13/health/zika-brazil-olympics-who-guidelines.html?_r=0


Processing URLs:  66%|██████▌   | 661/1000 [32:22<05:44,  1.02s/it]

Error extracting text from http://thehill.com/policy/international/265302-us-barrels-toward-lifting-iran-sanctions: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/265302-us-barrels-toward-lifting-iran-sanctions/


Processing URLs:  66%|██████▌   | 662/1000 [32:25<07:31,  1.33s/it]

Error extracting text from http://www.theglobeandmail.com/news/national/eu-sets-monday-deadline-for-belgium-to-back-ceta-deal/article32484471/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/national/eu-sets-monday-deadline-for-belgium-to-back-ceta-deal/article32484471/


Processing URLs:  66%|██████▋   | 665/1000 [32:31<10:53,  1.95s/it]

Error extracting text from http://ec.europa.eu/dgs/home-affairs/e-library/documents/policies/international-affairs/general/docs/turkey_second_progress_report_en.pdf: 404 Client Error: Not Found for url: https://home-affairs.ec.europa.eu/sites/default/files/e-library/documents/policies/international-affairs/general/docs/turkey_second_progress_report_en.pdf
URL filtered: http://www.reuters.com/article/us-germany-facebook-idUSKBN14Z0OH


Processing URLs:  67%|██████▋   | 670/1000 [32:38<07:34,  1.38s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_129197.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_129197.htm?selectedLocale=en


Processing URLs:  67%|██████▋   | 672/1000 [32:41<08:01,  1.47s/it]

Error extracting text from http://www.ibtimes.com/iphone-6s-japan-launch-sales-down-15-percent-compared-apple-incs-iphone-6-2126812: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iphone-6s-japan-launch-sales-down-15-percent-compared-apple-incs-iphone-6-2126812


Processing URLs:  68%|██████▊   | 677/1000 [32:53<11:00,  2.05s/it]

Error extracting text from https://crsreports.congress.gov/product/pdf/IF/IF10038: 403 Client Error: Forbidden for url: https://crsreports.congress.gov/product/pdf/IF/IF10038


Processing URLs:  68%|██████▊   | 678/1000 [32:58<15:41,  2.92s/it]

URL filtered: https://twitter.com/katyafimava/status/1440426033807724552


Processing URLs:  68%|██████▊   | 682/1000 [33:04<09:38,  1.82s/it]

Error extracting text from https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/moderna-covid-19-vaccine: 404 Client Error: Not Found for url: https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/moderna-covid-19-vaccine


Processing URLs:  68%|██████▊   | 684/1000 [33:06<06:50,  1.30s/it]

Error extracting text from http://www.scientificamerican.com/article/time-travel-simulation-resolves-grandfather-paradox/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/time-travel-simulation-resolves-grandfather-paradox/


Processing URLs:  69%|██████▊   | 686/1000 [33:08<06:07,  1.17s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-15/he-fed-s-new-chapter-will-be-written-in-dollar-denominated-oil


Processing URLs:  69%|██████▉   | 689/1000 [33:09<03:58,  1.30it/s]

Error extracting text from https://cruxnow.com/vatican/2017/04/09/pope-francis-hints-may-not-around-2019/: 404 Client Error: Not Found for url: https://cruxnow.com/vatican/2017/04/09/pope-francis-hints-may-not-around-2019


Processing URLs:  69%|██████▉   | 692/1000 [33:14<05:43,  1.12s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-11-22/alabama-senate-race-leaves-pro-lifers-no-choice


Processing URLs:  70%|██████▉   | 698/1000 [33:19<04:32,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-china-russia-idUSKCN0XQ0BP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-china-russia-idUSKCN0XQ0BP


Processing URLs:  70%|██████▉   | 699/1000 [33:21<05:52,  1.17s/it]

URL filtered: https://www.youtube.com/watch?v=i2c8L5XwCvI#t=885


Processing URLs:  70%|███████   | 705/1000 [33:30<07:23,  1.50s/it]

Error extracting text from https://www.commonspace.scot/articles/8796/poll-europeans-push-eu-governments-accept-independent-scotland: HTTPSConnectionPool(host='www.commonspace.scot', port=443): Max retries exceeded with url: /articles/8796/poll-europeans-push-eu-governments-accept-independent-scotland (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.commonspace.scot'. (_ssl.c:1000)")))
URL filtered: http://www.scmp.com/news/asia/southeast-asia/article/2080895/philippines-protest-chinas-plan-build-disputed-shoal?utm_content=bufferef664&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  71%|███████   | 708/1000 [33:31<04:00,  1.21it/s]

Error extracting text from http://www.nytimes.com/1996/08/07/us/clues-in-meteorite-seem-to-show-signs-of-life-on-mars-long-ago.html?pagewanted=all: 403 Client Error: Forbidden for url: http://www.nytimes.com/1996/08/07/us/clues-in-meteorite-seem-to-show-signs-of-life-on-mars-long-ago.html?pagewanted=all


Processing URLs:  71%|███████   | 709/1000 [33:32<04:27,  1.09it/s]

URL filtered: https://twitter.com/BorisJohnson/status/1356143343600885761


Processing URLs:  71%|███████   | 712/1000 [33:34<03:14,  1.48it/s]

Error extracting text from http://uk.reuters.com/article/2015/11/14/uk-mideast-crisis-syria-steinmeier-idUKKCN0T310T20151114: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.nytimes.com/2016/06/09/world/asia/bangladesh-killings-bloggers.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/09/world/asia/bangladesh-killings-bloggers.html


Processing URLs:  72%|███████▏  | 719/1000 [33:45<04:13,  1.11it/s]

Error extracting text from https://www.reuters.com/article/us-usa-tax-salt/discord-among-republicans-already-weighs-on-trumps-tax-plan-idUSKBN1CB13X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax-salt/discord-among-republicans-already-weighs-on-trumps-tax-plan-idUSKBN1CB13X
URL filtered: https://twitter.com/ScotGovFM/status/841235110112481281
Error extracting text from http://www.nytimes.com/1988/01/21/books/books-of-the-times-403888.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1988/01/21/books/books-of-the-times-403888.html


Processing URLs:  72%|███████▏  | 721/1000 [33:47<05:05,  1.10s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/why-bosnia-needs-join-nato-asap-16175: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/why-bosnia-needs-join-nato-asap-16175


Processing URLs:  72%|███████▏  | 724/1000 [33:55<07:59,  1.74s/it]

error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.foliomag.com/2016/study-two-thirds-americans-still-read-print-magazines/: Document is empty


Processing URLs:  73%|███████▎  | 728/1000 [34:02<07:31,  1.66s/it]

URL filtered: https://www.oximity.com/article/Russia-completes-preparations-to-deliv-1#.ViN-1DBCg4Q.twitter


Processing URLs:  73%|███████▎  | 731/1000 [34:04<04:36,  1.03s/it]

Error extracting text from http://news.yahoo.com/montenegro-pm-shrugs-off-violent-demos-calling-resignation-174632502.html: 404 Client Error: Not Found for url: http://news.yahoo.com/montenegro-pm-shrugs-off-violent-demos-calling-resignation-174632502.html


Processing URLs:  74%|███████▎  | 735/1000 [34:07<03:14,  1.36it/s]

Error extracting text from http://tass.ru/en/defense/832390: 404 Client Error: Not Found for url: https://tass.ru/en/defense/832390
URL filtered: http://www.bloomberg.com/politics/articles/2016-02-05/who-will-win-new-hampshire-here-are-eight-credible-predictions
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=fa&amp;u=http://www.titr24.com/News-XcZBy0SNnlmVI45gPDDokw==/نتایج-دهمین-دوره-انتخابات-مجلس-به-تفکیک-گرایش-سیاسی: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=fa&amp;u=http://www.titr24.com/News-XcZBy0SNnlmVI45gPDDokw==/%D9%86%D8%AA%D8%A7%DB%8C%D8%AC-%D8%AF%D9%87%D9%85%DB%8C%D9%86-%D8%AF%D9%88%D8%B1%D9%87-%D8%A7%D9%86%D8%AA%D8%AE%D8%A7%D8%A8%D8%A7%D8%AA-%D9%85%D8%AC%D9%84%D8%B3-%D8%A8%D9%87-%D8%AA%D9%81%DA%A9%DB%8C%DA%A9-%DA%AF%D8%B1%D8%A7%DB%8C%D8%B4-%D8%B3%DB%8C%D8%A7%D8%B3%DB%8C


Processing URLs:  74%|███████▍  | 740/1000 [34:19<08:25,  1.94s/it]

Error extracting text from https://www.cnbc.com/2017/10/31/reuters-america-table-opec-oil-output-falls-by-80000-bpd-in-october--reuters-survey.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/10/31/reuters-america-table-opec-oil-output-falls-by-80000-bpd-in-october--reuters-survey.html


Processing URLs:  74%|███████▍  | 745/1000 [34:23<04:22,  1.03s/it]

Error extracting text from http://www.voteleavetakecontrol.org/briefing_newdeal: 404 Client Error: Not Found for url: http://www.voteleavetakecontrol.org/briefing_newdeal


Processing URLs:  75%|███████▍  | 748/1000 [34:27<04:49,  1.15s/it]

Error extracting text from https://coconuts.co/yangon/news/junta-wants-us-to-extradite-un-rep-for-prosecution-on-treason/: 403 Client Error: Forbidden for url: https://coconuts.co/yangon/news/junta-wants-us-to-extradite-un-rep-for-prosecution-on-treason/


Processing URLs:  75%|███████▌  | 750/1000 [34:29<03:42,  1.12it/s]

Error extracting text from http://seekingalpha.com/article/3654496-opecs-december-meeting-will-have-to-address-the-financial-woes: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3654496-opecs-december-meeting-will-have-to-address-the-financial-woes


Processing URLs:  75%|███████▌  | 751/1000 [34:29<03:10,  1.31it/s]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/05/27/0301000000AEN20160527000200315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  75%|███████▌  | 752/1000 [34:30<03:56,  1.05it/s]

Error extracting text from http://www.gov.me/en/News/163034/NATO-welcomes-Montenegro-as-a-new-member.html: 404 Client Error: not found for url: https://www.gov.me/en/News/163034/NATO-welcomes-Montenegro-as-a-new-member.html


Processing URLs:  76%|███████▌  | 755/1000 [34:37<05:49,  1.43s/it]

Error extracting text from https://ecfsapi.fcc.gov/file/1113201502040/171113%20-%20CTIA%20Ex%20Parte.pdf: 403 Client Error: Forbidden for url: https://ecfsapi.fcc.gov/file/1113201502040/171113%20-%20CTIA%20Ex%20Parte.pdf


Processing URLs:  76%|███████▌  | 757/1000 [34:39<05:09,  1.27s/it]

Error extracting text from http://webcache.googleusercontent.com/search?q=cache:yDRixyIMes4J:www.ft.com/fastft/2016/01/22/venezuela-risks-argentina-style-legal-drama-if-it-defaults/+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=u: 404 Client Error: Not Found for url: http://webcache.googleusercontent.com/search?q=cache:yDRixyIMes4J:www.ft.com/fastft/2016/01/22/venezuela-risks-argentina-style-legal-drama-if-it-defaults/+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=u


Processing URLs:  76%|███████▌  | 758/1000 [34:41<05:02,  1.25s/it]

URL filtered: https://www.thecipherbrief.com/column/strategic-view/wanted-robust-intelligence-and-diplomatic-reporting#.WYVEp04mEyY.linkedin


Processing URLs:  76%|███████▌  | 762/1000 [34:46<05:25,  1.37s/it]

Error extracting text from https://coronavirus.data.gov.uk/deaths: 404 Client Error: Not Found for url: https://coronavirus.data.gov.uk/deaths


Processing URLs:  76%|███████▋  | 764/1000 [34:51<07:21,  1.87s/it]

Error extracting text from http://www.ecoti.in/oa99LY: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/beijing-takes-its-south-china-sea-strategy-to-the-himalayas/articleshow/79460089.cms?utm_source%3Dwhatsapp_web%26utm_medium%3Dsocial%26utm_campaign%3Dsocialsharebuttons


Processing URLs:  77%|███████▋  | 770/1000 [35:12<11:31,  3.01s/it]

Error extracting text from http://247wallst.com/energy-business/2016/10/07/so-hows-that-saudi-aramco-ipo-coming-along/: 403 Client Error: Forbidden for url: https://247wallst.com/energy-business/2016/10/07/so-hows-that-saudi-aramco-ipo-coming-along/


Processing URLs:  77%|███████▋  | 771/1000 [35:17<13:44,  3.60s/it]

Error extracting text from http://rips.or.jp/Policy_Perspectives/Policy%20Perpsectives%20No25_Restless%20Rivals_2017.pdf: 404 Client Error: Not Found for url: https://www.rips.or.jp/Policy_Perspectives/Policy%20Perpsectives%20No25_Restless%20Rivals_2017.pdf


Processing URLs:  77%|███████▋  | 774/1000 [35:22<08:06,  2.15s/it]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-iran-prisoners-released-20160116-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-iran-prisoners-released-20160116-story.html


Processing URLs:  78%|███████▊  | 779/1000 [35:33<06:33,  1.78s/it]

Error extracting text from https://www.middleeastmonitor.com/20171109-abbas-and-saudi-crown-prince-discuss-anti-iran-alliance/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20171109-abbas-and-saudi-crown-prince-discuss-anti-iran-alliance/


Processing URLs:  78%|███████▊  | 781/1000 [35:34<03:57,  1.09s/it]

Error extracting text from http://www.imdb.com/title/tt0078841/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0078841/


Processing URLs:  78%|███████▊  | 782/1000 [35:36<05:36,  1.54s/it]

Error extracting text from http://38north.org/2015/12/punggye120215/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  78%|███████▊  | 784/1000 [35:42<08:28,  2.36s/it]

Error extracting text from http://www.ft: HTTPConnectionPool(host='www.ft', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30018fa70>: Failed to resolve 'www.ft' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▊  | 786/1000 [35:42<04:49,  1.35s/it]

Error extracting text from https://www.chathamhouse.org/2021/03/myanmars-month-long-phony-war-over: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/2021/03/myanmars-month-long-phony-war-over


Processing URLs:  79%|███████▊  | 787/1000 [35:43<04:17,  1.21s/it]

Error extracting text from http://www.noilbeveragetax.com/faq.aspx: 404 Client Error: Not Found for url: http://www.noilbeveragetax.com/faq.aspx


Processing URLs:  80%|███████▉  | 795/1000 [35:58<04:43,  1.38s/it]

Error extracting text from http://24-m.com/: 403 Client Error: Forbidden for url: http://24-m.com/
Error extracting text from http://www.reuters.com/article/2015/11/21/us-usa-fed-williams-rates-idUSKCN0TA0WZ20151121: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/21/us-usa-fed-williams-rates-idUSKCN0TA0WZ20151121


Processing URLs:  80%|███████▉  | 796/1000 [35:59<04:50,  1.42s/it]

Error extracting text from http://www.newsweek.com/isis-attacks-kill-47-iraqi-soldiers-near-ramadi-436863: 403 Client Error: Forbidden for url: https://www.newsweek.com/isis-attacks-kill-47-iraqi-soldiers-near-ramadi-436863


Processing URLs:  80%|███████▉  | 799/1000 [36:01<02:46,  1.20it/s]

Error extracting text from http://www.wsj.com/articles/iraqi-city-of-mosul-transformed-a-year-after-islamic-state-capture-1433888626: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-city-of-mosul-transformed-a-year-after-islamic-state-capture-1433888626
Error extracting text from https://www.researchgate.net/publication/286512331_Collective_Intelligence_and_Group_Performance: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/286512331_Collective_Intelligence_and_Group_Performance


Processing URLs:  80%|████████  | 802/1000 [36:06<03:31,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-mattis-asia-idUSKBN15G3FG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-mattis-asia-idUSKBN15G3FG


Processing URLs:  80%|████████  | 804/1000 [36:10<04:23,  1.35s/it]

Error extracting text from http://www.reuters.com/article/us-usa-markets-trump-analysis-idUSKBN16T2YH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-markets-trump-analysis-idUSKBN16T2YH


Processing URLs:  81%|████████  | 810/1000 [36:16<03:15,  1.03s/it]

Error extracting text from http://www.wsj.com/articles/global-stocks-steady-ahead-of-fed-rate-decision-1446022222: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/global-stocks-steady-ahead-of-fed-rate-decision-1446022222


Processing URLs:  81%|████████  | 811/1000 [36:17<03:16,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-usa-cyber-russia-congress-sanctions-idUSKBN14T2HS?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-congress-sanctions-idUSKBN14T2HS?il=0


Processing URLs:  81%|████████▏ | 813/1000 [36:17<02:03,  1.52it/s]

Error extracting text from https://www.reuters.com/world/europe/neutral-swiss-adopt-sanctions-against-russia-2022-02-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/neutral-swiss-adopt-sanctions-against-russia-2022-02-28/


Processing URLs:  82%|████████▏ | 815/1000 [36:18<01:32,  2.01it/s]

Error extracting text from http://www.wsj.com/articles/spacex-plans-to-resume-rocket-launches-1483363705: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/spacex-plans-to-resume-rocket-launches-1483363705


Processing URLs:  82%|████████▏ | 816/1000 [36:19<01:48,  1.69it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=58470#.WmoYVEysOfU: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=58470#.WmoYVEysOfU


Processing URLs:  82%|████████▏ | 820/1000 [36:38<08:03,  2.68s/it]

Error extracting text from http://portal.stf.jus.br/noticias/verNoticiaDetalhe.asp?idConteudo=464261&amp;ori=1: 403 Client Error: Forbidden for url: http://portal.stf.jus.br/noticias/verNoticiaDetalhe.asp?idConteudo=464261&amp;ori=1


Processing URLs:  82%|████████▏ | 821/1000 [36:39<06:39,  2.23s/it]

URL filtered: https://www.youtube.com/watch?v=3W7-ngmO_p8


Processing URLs:  82%|████████▏ | 824/1000 [36:44<06:02,  2.06s/it]

URL filtered: https://www.cnet.com/news/amazon-apple-google-facebook-targeted-by-new-pack-of-antitrust-bills/


Processing URLs:  83%|████████▎ | 826/1000 [36:44<03:39,  1.26s/it]

Error extracting text from https://www.extremetech.com/internet/249575-fcc-votes-kill-net-neutrality-dismantle-title-ii-rules-governing-isps: 403 Client Error: Forbidden for url: https://www.extremetech.com/internet/249575-fcc-votes-kill-net-neutrality-dismantle-title-ii-rules-governing-isps


Processing URLs:  83%|████████▎ | 829/1000 [36:47<02:33,  1.12it/s]

Error extracting text from https://www.usaid.gov/sites/default/files/documents/1866/yemen_ce_fs17_09-30-2017.pdf: 404 Client Error: Not Found for url: https://www.usaid.gov/sites/default/files/documents/1866/yemen_ce_fs17_09-30-2017.pdf
Error extracting text from https://english.a2news.com/2021/04/03/population-census-postponed-in-north-macedonia-albanian-mps-accuse-lawmakers-of-manipulation-attempts/: HTTPSConnectionPool(host='english.a2news.com', port=443): Max retries exceeded with url: /2021/04/03/population-census-postponed-in-north-macedonia-albanian-mps-accuse-lawmakers-of-manipulation-attempts/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303cad610>: Failed to resolve 'english.a2news.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-japan-companies-yen-idUSKBN0TQ2OG20151207#chGP2OhB88qGyBXM.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/u

Processing URLs:  83%|████████▎ | 831/1000 [36:48<02:17,  1.23it/s]

Error extracting text from http://www.nytimes.com/2016/02/17/us/politics/supreme-court-path-is-littered-with-pitfalls-for-president-and-gop.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/17/us/politics/supreme-court-path-is-littered-with-pitfalls-for-president-and-gop.html


Processing URLs:  84%|████████▎ | 835/1000 [36:51<01:57,  1.40it/s]

Error extracting text from http://www.nytimes.com/2016/06/03/business/energy-environment/opec-meeting-oil-production-saudi-arabia.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/03/business/energy-environment/opec-meeting-oil-production-saudi-arabia.html?_r=0


Processing URLs:  84%|████████▎ | 836/1000 [37:51<48:35, 17.78s/it]

Error extracting text from https://olympics.com/ioc/news/joint-statement-by-the-ioc-ipc-tokyo-2020-tokyo-metropolitan-government-and-the-government-of-japan: HTTPSConnectionPool(host='olympics.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  84%|████████▎ | 837/1000 [37:51<34:23, 12.66s/it]

Error extracting text from http://www.wsj.com/articles/pacific-shoals-of-trouble-1473203623: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pacific-shoals-of-trouble-1473203623


Processing URLs:  84%|████████▍ | 839/1000 [37:54<18:20,  6.84s/it]

Error extracting text from https://www.nytimes.com/2021/04/28/world/europe/uk-johnson-ethics-probe.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/28/world/europe/uk-johnson-ethics-probe.html


Processing URLs:  84%|████████▍ | 841/1000 [37:55<09:36,  3.62s/it]

Error extracting text from http://www.wsj.com/articles/oil-price-slump-could-force-u-s-non-opec-suppliers-to-make-deep-cuts-1441959208: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-price-slump-could-force-u-s-non-opec-suppliers-to-make-deep-cuts-1441959208


Processing URLs:  84%|████████▍ | 843/1000 [38:02<08:50,  3.38s/it]

URL filtered: https://www.youtube.com/watch?v=7jiaU0xbOKs


Processing URLs:  85%|████████▍ | 849/1000 [38:12<05:24,  2.15s/it]

Error extracting text from https://www.google.com/amp/www.mtlblog.com/2016/10/canada-to-vote-on-legalizing-recreational-marijuana-in-2017/amp/?client=safari: 404 Client Error: Not Found for url: https://www.mtlblog.com/2016/10/canada-to-vote-on-legalizing-recreational-marijuana-in-2017/amp/


Processing URLs:  85%|████████▌ | 850/1000 [38:12<04:04,  1.63s/it]

Error extracting text from https://www.opposingviews.com/sports/3-explanations-3-perfect-games-2012: 403 Client Error: Forbidden for url: https://www.opposingviews.com/sports/3-explanations-3-perfect-games-2012


Processing URLs:  86%|████████▌ | 857/1000 [38:21<03:09,  1.32s/it]

URL filtered: https://www.linkedin.com/in/regina-joseph-780b211/


Processing URLs:  86%|████████▌ | 860/1000 [38:24<02:19,  1.01it/s]

Error extracting text from http://www.nytimes.com/2016/10/04/world/europe/russia-plutonium-nuclear-treaty.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/04/world/europe/russia-plutonium-nuclear-treaty.html


Processing URLs:  86%|████████▌ | 862/1000 [38:25<01:33,  1.47it/s]

Error extracting text from https://medium.com/the-physics-arxiv-blog/the-last-ai-breakthrough-deepmind-made-before-google-bought-it-for-400m-7952031ee5e1#.cv903tajf: 403 Client Error: Forbidden for url: https://medium.com/the-physics-arxiv-blog/the-last-ai-breakthrough-deepmind-made-before-google-bought-it-for-400m-7952031ee5e1#.cv903tajf


Processing URLs:  87%|████████▋ | 867/1000 [38:41<05:58,  2.69s/it]

Error extracting text from https://www.nytimes.com/2021/06/17/health/covid-pill-antiviral.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/17/health/covid-pill-antiviral.html


Processing URLs:  87%|████████▋ | 873/1000 [38:55<04:29,  2.13s/it]

Error extracting text from http://www.reuters.com/article/us-hongkong-election-idUSKBN16W0WF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-election-idUSKBN16W0WF?il=0


Processing URLs:  88%|████████▊ | 876/1000 [39:05<04:57,  2.40s/it]

Error extracting text from https://aoav.org.uk/category/explosive_violence_by_country/pakistan/: 403 Client Error: Forbidden for url: https://aoav.org.uk/category/explosive_violence_by_country/pakistan/


Processing URLs:  88%|████████▊ | 879/1000 [39:12<04:25,  2.19s/it]

Error extracting text from http://omantribune.com/details/10456/: 404 Client Error: Not Found for url: http://omantribune.com/details/10456/


Processing URLs:  88%|████████▊ | 881/1000 [39:15<03:30,  1.77s/it]

Error extracting text from http://www.theregister.co.uk/2016/03/03/ddos_hacker_secrets_exposed/: 403 Client Error: Forbidden for url: https://www.theregister.com/2016/03/03/ddos_hacker_secrets_exposed/


Processing URLs:  88%|████████▊ | 882/1000 [39:16<03:02,  1.54s/it]

Error extracting text from http://www.andrewerickson.com/wp-content/uploads/2016/07/PH-CN-20160712-Press-Release-No-11-English.pdf: 403 Client Error: Forbidden for url: http://www.andrewerickson.com/wp-content/uploads/2016/07/PH-CN-20160712-Press-Release-No-11-English.pdf


Processing URLs:  89%|████████▉ | 890/1000 [40:07<22:30, 12.28s/it]

Error extracting text from http://www.nytimes.com/2015/12/20/us/politics/donald-trump-campaign-lags-in-mobilizing-iowa-caucus-voters.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/20/us/politics/donald-trump-campaign-lags-in-mobilizing-iowa-caucus-voters.html


Processing URLs:  90%|████████▉ | 896/1000 [40:19<05:17,  3.05s/it]

Error extracting text from https://translate.google.com/translate?hl=&amp;sl=am&amp;tl=en&amp;u=https%3A%2F%2Fwww.press.et%2Fama%2F%3Fp%3D44496: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=&amp;sl=am&amp;tl=en&amp;u=https%3A%2F%2Fwww.press.et%2Fama%2F%3Fp%3D44496


Processing URLs:  90%|████████▉ | 897/1000 [40:21<04:26,  2.59s/it]

URL filtered: https://www.youtube.com/watch?v=KJkmcNjh_bg


Processing URLs:  90%|█████████ | 902/1000 [40:26<02:28,  1.51s/it]

Error extracting text from https://www.indiegogo.com/projects/diy-crispr-genome-engineering-kits-from-the-odin#/: 403 Client Error: Forbidden for url: https://www.indiegogo.com/projects/diy-crispr-genome-engineering-kits-from-the-odin#/


Processing URLs:  90%|█████████ | 905/1000 [40:29<01:50,  1.16s/it]

Error extracting text from http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=118102: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=118102


Processing URLs:  91%|█████████ | 910/1000 [40:39<02:24,  1.61s/it]

Error extracting text from https://www.newsweek.com/israel-threatens-lebanon-over-rocket-attacks-syria-warns-israel-against-airstrikes-1611665: 403 Client Error: Forbidden for url: https://www.newsweek.com/israel-threatens-lebanon-over-rocket-attacks-syria-warns-israel-against-airstrikes-1611665


Processing URLs:  92%|█████████▏| 922/1000 [41:10<03:57,  3.04s/it]

Error extracting text from https://www.the-newshub.com/us-politics/benghazi-panel-grilling-of-hillary-clinton-could-be-joe-bidens-moment-to-enter-the-presidential-race: 403 Client Error: Forbidden for url: https://www.the-newshub.com/us-politics/benghazi-panel-grilling-of-hillary-clinton-could-be-joe-bidens-moment-to-enter-the-presidential-race
URL filtered: http://www.bloomberg.com/news/articles/2015-11-02/opec-seen-staying-pat-on-output-by-oil-guru-who-called-2014-rout


Processing URLs:  93%|█████████▎| 926/1000 [41:15<02:03,  1.66s/it]

Error extracting text from http://on.wsj.com/2ewfyfS: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraq-s-kirkuk-hit-by-renewed-clashes-with-islamic-state-1477224953?emailToken=JRr8cPB9ZniTh9Y0bcw9zlQxK7JNCO6TRU7UaXDLJg3GpTnPrOSs2Kg5wtCzqHivSF0/%2BNEY7ys%2BXjnYhWthGdSNkqJy1A74JCYH9M2VjFY%3D


Processing URLs:  93%|█████████▎| 929/1000 [41:20<01:46,  1.50s/it]

Error extracting text from https://www.timesofisrael.com/breaking-from-longheld-israeli-stance-gantz-willing-to-accept-jcpoa-revival/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/breaking-from-longheld-israeli-stance-gantz-willing-to-accept-jcpoa-revival/


Processing URLs:  93%|█████████▎| 931/1000 [41:23<01:46,  1.55s/it]

Error extracting text from http://polling.reuters.com/#poll/TR130/filters/PARTY_ID_:2/dates/20151107-20151113/type/day: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/#poll/TR130/filters/PARTY_ID_:2/dates/20151107-20151113/type/day


Processing URLs:  93%|█████████▎| 934/1000 [41:26<01:18,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-g20-growth-europe-idUSKBN17A0GA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-g20-growth-europe-idUSKBN17A0GA


Processing URLs:  94%|█████████▎| 935/1000 [41:27<01:06,  1.02s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/05/11/national/politics-diplomacy/putin-may-pay-visit-abes-home-turf-yamaguchi/#.VzPaW2Nh0dw: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/05/11/national/politics-diplomacy/putin-may-pay-visit-abes-home-turf-yamaguchi/#.VzPaW2Nh0dw


Processing URLs:  94%|█████████▍| 938/1000 [41:32<01:23,  1.35s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-29/defying-logic-and-sanctions-venezuela-s-economy-slumps-along


Processing URLs:  94%|█████████▍| 945/1000 [41:35<00:28,  1.90it/s]

Error extracting text from http://problem.Goverment: HTTPConnectionPool(host='problem.goverment', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3035283b0>: Failed to resolve 'problem.goverment' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/news/articles/2017-11-03/venezuela-cut-deeper-into-junk-by-fitch-amid-high-default-odds
Error extracting text from https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/327427067_The_Theory_of_Graceful_Extensibility_Basic_rules_that_govern_adaptive_systems


Processing URLs:  95%|█████████▍| 946/1000 [41:37<00:37,  1.43it/s]

Error extracting text from http://economictimes.indiatimes.com/news/defence/vietnam-moves-new-rocket-launchers-into-disputed-south-china-sea-sources/articleshow/53629487.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/vietnam-moves-new-rocket-launchers-into-disputed-south-china-sea-sources/articleshow/53629487.cms


Processing URLs:  95%|█████████▍| 947/1000 [41:40<01:00,  1.15s/it]

Error extracting text from http://lasillavacia.com/historia/el-nuevo-ultimatum-de-santos-53113: 404 Client Error: Not Found for url: https://www.lasillavacia.com/historia/el-nuevo-ultimatum-de-santos-53113


Processing URLs:  95%|█████████▍| 948/1000 [41:41<00:59,  1.15s/it]

Error extracting text from http://www.universetoday.com/129984/uh-going-need-bigger-landing-pad/: 503 Server Error: Service Unavailable for url: https://www.universetoday.com/129984/uh-going-need-bigger-landing-pad/


Processing URLs:  95%|█████████▍| 949/1000 [41:41<00:48,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/france-takes-an-activist-line-in-the-muslim-world-1421873087: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/france-takes-an-activist-line-in-the-muslim-world-1421873087


Processing URLs:  95%|█████████▌| 952/1000 [41:44<00:42,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-tabqa-idUSKBN16T1KM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-tabqa-idUSKBN16T1KM


Processing URLs:  96%|█████████▌| 956/1000 [41:54<01:27,  2.00s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN19P02W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN19P02W


Processing URLs:  96%|█████████▌| 959/1000 [41:57<00:47,  1.16s/it]

Error extracting text from http://thefederalist.com/2015/12/16/military-strategist-explains-why-donald-trump-leads-and-how-he-will-fail/: 403 Client Error: Forbidden for url: http://thefederalist.com/2015/12/16/military-strategist-explains-why-donald-trump-leads-and-how-he-will-fail/


Processing URLs:  96%|█████████▌| 961/1000 [41:59<00:49,  1.26s/it]

URL filtered: https://m.youtube.com/watch?v=88LSYM2RGG0


Processing URLs:  96%|█████████▋| 965/1000 [42:02<00:33,  1.03it/s]

Error extracting text from http://uk.reuters.com/article/2015/12/01/uk-nato-montenegro-idUKKBN0TK4BF20151201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  97%|█████████▋| 968/1000 [42:10<00:58,  1.84s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/un-faces-rival-claims-myanmar-seat-doubts-over-afghanistan-2021-09-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/un-faces-rival-claims-myanmar-seat-doubts-over-afghanistan-2021-09-13/


Processing URLs:  98%|█████████▊| 975/1000 [42:18<00:21,  1.15it/s]

Error extracting text from http://www.reuters.com/article/us-philippines-southchinasea-china-idUSKCN12S18B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-southchinasea-china-idUSKCN12S18B


Processing URLs:  98%|█████████▊| 978/1000 [42:23<00:29,  1.34s/it]

Error extracting text from https://www.nasdaq.com/articles/u.s.-drillers-add-oil-and-gas-rigs-for-fourth-week-in-a-row-baker-hughes-2021-10-01: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/u.s.-drillers-add-oil-and-gas-rigs-for-fourth-week-in-a-row-baker-hughes-2021-10-01


Processing URLs:  98%|█████████▊| 981/1000 [42:24<00:12,  1.50it/s]

Error extracting text from https://www.aninews.in/news/world/asia/chinese-top-official-defected-to-us-gave-biden-administration-info-about-wuhan-lab-report-suggests20210618094457/: 403 Client Error: Forbidden for url: https://www.aninews.in/news/world/asia/chinese-top-official-defected-to-us-gave-biden-administration-info-about-wuhan-lab-report-suggests20210618094457/
Error extracting text from https://ahvalnews.com/turkish-lira/fitch-signals-possible-turkey-downgrade-after-central-bank-chief-sacked: 403 Client Error: Forbidden for url: https://ahvalnews.com/turkish-lira/fitch-signals-possible-turkey-downgrade-after-central-bank-chief-sacked


Processing URLs:  98%|█████████▊| 983/1000 [42:26<00:12,  1.31it/s]

Error extracting text from https://www.france24.com/en/diplomacy/20211202-iran-steps-up-uranium-enrichment-capacity-despite-talks-to-salvage-nuclear-deal): 403 Client Error: Forbidden for url: https://www.france24.com/en/diplomacy/20211202-iran-steps-up-uranium-enrichment-capacity-despite-talks-to-salvage-nuclear-deal)


Processing URLs:  99%|█████████▊| 986/1000 [42:28<00:08,  1.70it/s]

Error extracting text from https://www.wsj.com/articles/syria-peace-talks-end-with-little-progress-1488564683: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syria-peace-talks-end-with-little-progress-1488564683
Error extracting text from http://www.nytimes.com/2015/11/26/world/americas/brazil-corruption-petrobas.html?emc=edit_th_20151126&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/26/world/americas/brazil-corruption-petrobas.html?emc=edit_th_20151126&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  99%|█████████▉| 993/1000 [42:40<00:09,  1.36s/it]

Error extracting text from http://news.yahoo.com/russia-opposes-sanctions-iran-ballistic-missile-test-143231950.html: 404 Client Error: Not Found for url: http://news.yahoo.com/russia-opposes-sanctions-iran-ballistic-missile-test-143231950.html


Processing URLs: 100%|█████████▉| 996/1000 [42:44<00:05,  1.49s/it]

Error extracting text from http://sos.nh.gov/DemBal16PP.aspx?id=8589953253: 404 Client Error: Not Found for url: https://sos.nh.gov/DemBal16PP.aspx?id=8589953253


Processing URLs: 100%|█████████▉| 998/1000 [42:48<00:03,  1.52s/it]

Error extracting text from http://www.reuters.com/article/us-research-crude-goldman-idUSKBN1610IS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-research-crude-goldman-idUSKBN1610IS


Processing URLs: 100%|██████████| 1000/1000 [43:11<00:00,  2.59s/it]
Processing URLs:   0%|          | 4/1000 [00:04<22:04,  1.33s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/radar-on-line/impeachment/a-diferenca-dos-abaixo-assinados-pro-e-contra-o-impeachment/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/radar-on-line/impeachment/a-diferenca-dos-abaixo-assinados-pro-e-contra-o-impeachment/&amp;prev=search


Processing URLs:   1%|          | 8/1000 [00:08<20:16,  1.23s/it]

Error extracting text from http://www.thinkspain.com/news-spain/27699/psoe-dead-against-mega-coalition-with-pp-and-ciudadanos-says-only-if-rajoy-steps-aside: 404 Client Error: Not Found for url: https://www.thinkspain.com:443/news-spain/27699/psoe-dead-against-mega-coalition-with-pp-and-ciudadanos-says-only-if-rajoy-steps-aside


Processing URLs:   1%|          | 9/1000 [00:10<20:16,  1.23s/it]

Error extracting text from http://www.ibtimes.co.uk/iraqi-pm-haider-al-abadi-announces-operation-liberate-shirqat-isis-1582203: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/iraqi-pm-haider-al-abadi-announces-operation-liberate-shirqat-isis-1582203


Processing URLs:   1%|          | 10/1000 [00:13<32:45,  1.99s/it]

Error extracting text from http://38north.org/2016/06/jschilling062316/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:   2%|▏         | 16/1000 [00:27<37:21,  2.28s/it]

Error extracting text from http://www.petra.gov.jo/Public_News/Nws_NewsDetails.aspx?Site_Id=1&amp;lang=2&amp;NewsID=267426&amp;CatID=13&amp;Type=Home&amp;GType=1: 404 Client Error:  for url: https://www.petra.gov.jo/Public_News/Nws_NewsDetails.aspx?Site_Id=1&amp;lang=2&amp;NewsID=267426&amp;CatID=13&amp;Type=Home&amp;GType=1
URL filtered: https://www.bloomberg.com/news/articles/2017-11-21/u-s-vows-tough-sanctions-if-south-sudan-doesn-t-end-conflict


Processing URLs:   2%|▏         | 18/1000 [00:28<24:26,  1.49s/it]

Error extracting text from http://www.ipsos-nederland.nl/ipsos-politieke-barometer/barometer-van-deze-week: 404 Client Error: Not Found for url: http://www.ipsos-nederland.nl/ipsos-politieke-barometer/barometer-van-deze-week


Processing URLs:   2%|▏         | 23/1000 [00:31<10:02,  1.62it/s]

Error extracting text from https://www.reuters.com/article/us-usa-fed-kaplan/feds-kaplan-says-low-10-year-yield-an-ominous-sign-idUSKBN1CG005?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fed-kaplan/feds-kaplan-says-low-10-year-yield-an-ominous-sign-idUSKBN1CG005?il=0
URL filtered: https://www.bloomberg.com/news/articles/2017-08-03/why-investors-shouldn-t-trust-low-volatility
Error extracting text from http://www.reuters.com/article/us-usa-russia-cyber-idUSKBN14H1SR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-cyber-idUSKBN14H1SR


Processing URLs:   3%|▎         | 26/1000 [00:39<27:53,  1.72s/it]

Error extracting text from http://money.cnn.com/2016/02/29/investing/oil-prices-surge-opec/index.html-: 404 Client Error: Not Found for url: https://money.cnn.com/2016/02/29/investing/oil-prices-surge-opec/index.html-


Processing URLs:   3%|▎         | 29/1000 [00:42<21:55,  1.35s/it]

Error extracting text from http://www.who.int/topics/poliomyelitis/polio-free/en/: 404 Client Error: Not Found for url: https://www.who.int/topics/poliomyelitis/polio-free/en/


Processing URLs:   3%|▎         | 30/1000 [01:42<4:58:13, 18.45s/it]

Error extracting text from http://www.cmegroup.com/trading/interest-rates/fed-funds-flash.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   3%|▎         | 31/1000 [01:43<3:33:57, 13.25s/it]

Error extracting text from http://election.princeton.edu/2016/04/09/current-polls-favor-a-trump-delegate-majority/#more-14926: HTTPSConnectionPool(host='election.princeton.edu2016', port=443): Max retries exceeded with url: /04/09/current-polls-favor-a-trump-delegate-majority/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301304890>: Failed to resolve 'election.princeton.edu2016' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   4%|▍         | 38/1000 [01:58<42:23,  2.64s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-20/bond-markets-stir-across-middle-east-after-historic-saudi-sale


Processing URLs:   4%|▍         | 41/1000 [02:00<25:21,  1.59s/it]

Error extracting text from https://science.sciencemag.org/content/early/2021/03/03/science.abg3055: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.abg3055


Processing URLs:   4%|▍         | 43/1000 [02:03<23:36,  1.48s/it]

Error extracting text from http://www.newsweek.com/russias-nuclear-submarine-tests-ballistic-missile-fire-arctic-sea-629187: 403 Client Error: Forbidden for url: https://www.newsweek.com/russias-nuclear-submarine-tests-ballistic-missile-fire-arctic-sea-629187


Processing URLs:   4%|▍         | 44/1000 [02:04<21:37,  1.36s/it]

Error extracting text from https://cleantechnica.com/2016/10/14/atievas-complicated-ev-history-concept-image-name-atvus-unveiling-plans/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/10/14/atievas-complicated-ev-history-concept-image-name-atvus-unveiling-plans/


Processing URLs:   5%|▍         | 47/1000 [02:08<20:09,  1.27s/it]

Error extracting text from https://www.envio.org.ni/articulo/3480: HTTPSConnectionPool(host='www.envio.org.ni', port=443): Max retries exceeded with url: /articulo/3480 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x302d13110>: Failed to resolve 'www.envio.org.ni' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 50/1000 [02:13<24:08,  1.52s/it]

Error extracting text from http://warisboring.com/articles/the-u-s-militarys-poor-record-training-the-iraqi-army/?mc_cid=21ad7409d3&amp;mc_eid=0467f21653: 403 Client Error: Forbidden for url: http://warisboring.com/articles/the-u-s-militarys-poor-record-training-the-iraqi-army/?mc_cid=21ad7409d3&amp;mc_eid=0467f21653


Processing URLs:   5%|▌         | 52/1000 [02:16<20:40,  1.31s/it]

Error extracting text from http://www.wsj.com/articles/india-cant-find-love-for-its-offshore-rupee-bonds-1459659632: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/india-cant-find-love-for-its-offshore-rupee-bonds-1459659632


Processing URLs:   5%|▌         | 54/1000 [02:18<17:00,  1.08s/it]

Error extracting text from http://www.geekwire.com/2015/speculation-mounts-over-elon-musks-plans-for-spacexs-mars-colonial-transporter/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2015/speculation-mounts-over-elon-musks-plans-for-spacexs-mars-colonial-transporter/


Processing URLs:   6%|▌         | 57/1000 [02:22<20:13,  1.29s/it]

Error extracting text from http://www.reuters.com/article/russia-usa-sanctions-idINKBN14H1G0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/russia-usa-sanctions-idINKBN14H1G0


Processing URLs:   6%|▌         | 61/1000 [02:25<12:44,  1.23it/s]

Error extracting text from https://www.predictit.org/markets/detail/3633/Who-will-win-the-2020-Democratic-presidential-nomination: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/3633/Who-will-win-the-2020-Democratic-presidential-nomination


Processing URLs:   6%|▋         | 64/1000 [02:28<16:15,  1.04s/it]

Error extracting text from http://www.rtlnieuws.nl/nederland/politiek/rutte-kans-dat-vvd-gaat-regeren-met-pvv-is-nul: 404 Client Error: Not Found for url: https://www.rtlnieuws.nl/nederland/politiek/rutte-kans-dat-vvd-gaat-regeren-met-pvv-is-nul


Processing URLs:   7%|▋         | 66/1000 [02:29<12:03,  1.29it/s]

Error extracting text from http://www.cnbc.com/2015/10/20/report-north-korea-prepping-for-new-nuclear-test.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/10/20/report-north-korea-prepping-for-new-nuclear-test.html


Processing URLs:   7%|▋         | 67/1000 [02:31<15:13,  1.02it/s]

Error extracting text from https://rabbisacks.org/topics/hope-vrs-optimism/: 404 Client Error: Not Found for url: https://rabbisacks.org/topics/hope-vrs-optimism/


Processing URLs:   7%|▋         | 72/1000 [02:37<18:56,  1.22s/it]

URL filtered: https://twitter.com/MZ_GOV_PL?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:   8%|▊         | 77/1000 [02:41<14:39,  1.05it/s]

Error extracting text from https://www.russiamatters.org/blog/growing-scope-russian-chinese-naval-exercises-points-closer-ties: 403 Client Error: Forbidden for url: https://www.russiamatters.org/blog/growing-scope-russian-chinese-naval-exercises-points-closer-ties


Processing URLs:   8%|▊         | 78/1000 [02:47<35:26,  2.31s/it]

Error extracting text from http://www.chicagobooth.edu/ideas/efficientMarket.aspx: 404 Client Error: Page not found for url: https://www.chicagobooth.edu/ideas/efficientMarket


Processing URLs:   8%|▊         | 80/1000 [02:51<32:38,  2.13s/it]



Processing URLs:   8%|▊         | 82/1000 [03:04<1:02:54,  4.11s/it]

URL filtered: https://www.thecipherbrief.com/article/north-america/defending-us-north-korean-long-range-missiles-1091?utm_content=buffer1d157&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_ca


Processing URLs:   8%|▊         | 84/1000 [03:06<41:53,  2.74s/it]  

Error extracting text from http://www.reuters.com/article/us-venezuela-economy-idUSKBN0UL2OK20160107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-idUSKBN0UL2OK20160107


Processing URLs:   9%|▊         | 87/1000 [03:08<24:47,  1.63s/it]

Error extracting text from http://www.reuters.com/article/2015/10/28/us-mideast-crisis-syria-iran-idUSKCN0SM14C20151028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/us-mideast-crisis-syria-iran-idUSKCN0SM14C20151028


Processing URLs:   9%|▉         | 90/1000 [03:23<45:47,  3.02s/it]  

Error extracting text from https://af.reuters.com/article/africaTech/idAFL4N1ME2UT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:   9%|▉         | 91/1000 [03:25<42:42,  2.82s/it]

Error extracting text from https://reut.rs/2WXPGwA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   9%|▉         | 93/1000 [03:29<31:41,  2.10s/it]

Error extracting text from http://pulitzercenter.org/projects/iraq-battle-mosul: 403 Client Error: Forbidden for url: http://pulitzercenter.org/projects/iraq-battle-mosul
URL filtered: https://www.linkedin.com/jobs/nextev-usa-jobs


Processing URLs:  10%|▉         | 95/1000 [03:29<19:54,  1.32s/it]

URL filtered: https://www.youtube.com/watch?v=QgaRd4d8hOY&amp;t=1m10s


Processing URLs:  10%|▉         | 98/1000 [03:33<18:15,  1.22s/it]

Error extracting text from http://www.washingtontimes.com/news/2015/sep/21/hillary-clintons-free-fall-in-polls-stops-amid-a-c/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/sep/21/hillary-clintons-free-fall-in-polls-stops-amid-a-c/


Processing URLs:  10%|▉         | 99/1000 [03:34<15:52,  1.06s/it]

Error extracting text from http://warontherocks.com/2016/11/trolling-for-trump-how-russia-is-trying-to-destroy-our-democracy/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/11/trolling-for-trump-how-russia-is-trying-to-destroy-our-democracy/


Processing URLs:  10%|█         | 104/1000 [03:40<19:08,  1.28s/it]

Error extracting text from http://www.superforecasting.com/iran-adversarial-collaboration-challenge-update-the-neutral-forecasters-are-leading/: HTTPSConnectionPool(host='www.superforecasting.com', port=443): Max retries exceeded with url: /iran-adversarial-collaboration-challenge-update-the-neutral-forecasters-are-leading/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  11%|█         | 109/1000 [03:46<18:32,  1.25s/it]

Error extracting text from https://thehill.com/homenews/administration/544333-russia-says-us-refused-biden-putin-call-amid-tensions: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/544333-russia-says-us-refused-biden-putin-call-amid-tensions/


Processing URLs:  11%|█         | 111/1000 [03:49<17:58,  1.21s/it]

Error extracting text from http://www.politico.com/story/2017/08/10/trump-thanks-vladimir-putin-diplomats-241498: 404 Client Error: Not Found for url: https://www.politico.com/story/2017/08/10/trump-thanks-vladimir-putin-diplomats-241498


Processing URLs:  11%|█         | 112/1000 [03:50<19:51,  1.34s/it]

Error extracting text from http://nanonews.org/crude-prices-fall-as-opec-fails-to-cut-quota/: 500 Server Error: Internal Server Error for url: https://nanonews.org/crude-prices-fall-as-opec-fails-to-cut-quota/


Processing URLs:  11%|█▏        | 113/1000 [03:52<21:28,  1.45s/it]

Error extracting text from http://www.ibtimes.com/china-warns-war-us-over-south-china-sea-australia-debate-naval-patrols-support-2160081: 403 Client Error: Forbidden for url: https://www.ibtimes.com/china-warns-war-us-over-south-china-sea-australia-debate-naval-patrols-support-2160081


Processing URLs:  11%|█▏        | 114/1000 [03:55<27:21,  1.85s/it]

Error extracting text from http://38north.org/category/05-james-church/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  12%|█▏        | 116/1000 [04:06<1:00:01,  4.07s/it]

Error extracting text from http://gardow.com/davebradlee/redistricting: 404 Client Error: Not Found for url: https://www.gardow.com/davebradlee/redistricting


Processing URLs:  12%|█▏        | 121/1000 [04:10<17:46,  1.21s/it]  

Error extracting text from http://english.aawsat.com/theaawsat/business/uae-china-agree-complete-free-trade-gulf-states: 403 Client Error: Forbidden for url: http://english.aawsat.com/theaawsat/business/uae-china-agree-complete-free-trade-gulf-states
Error extracting text from https://www.wsj.com/articles/kim-jong-un-shown-ordering-more-warheads-after-tillersons-praise-for-restraint-1503488094: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/kim-jong-un-shown-ordering-more-warheads-after-tillersons-praise-for-restraint-1503488094


Processing URLs:  12%|█▏        | 124/1000 [04:17<26:11,  1.79s/it]

Error extracting text from http://www.oxitec.com/fda-preliminary-finding-no-significant-impact-oxitecs-self-limiting-mosquito/: 404 Client Error: Not Found for url: https://www.oxitec.com/fda-preliminary-finding-no-significant-impact-oxitecs-self-limiting-mosquito/


Processing URLs:  13%|█▎        | 128/1000 [04:22<17:39,  1.21s/it]

Error extracting text from https://www.nytimes.com/live/2021/07/10/world/jovenel-moise-assassinated/the-assassination-of-moise-stirs-a-battle-for-control-in-public-and-behind-the-scenes: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/07/10/world/jovenel-moise-assassinated/the-assassination-of-moise-stirs-a-battle-for-control-in-public-and-behind-the-scenes


Processing URLs:  13%|█▎        | 131/1000 [04:24<11:58,  1.21it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKBN0UG0ML20160103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN0UG0ML20160103
Error extracting text from http://www.reuters.com/article/us-thailand-king-constitution-idUSKBN1780VB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-king-constitution-idUSKBN1780VB


Processing URLs:  14%|█▎        | 135/1000 [04:36<28:06,  1.95s/it]

Error extracting text from https://jonathancohn.medium.com/why-do-these-64-democrats-not-support-a-15-minimum-wage-d96a5527f41b: 403 Client Error: Forbidden for url: https://jonathancohn.medium.com/why-do-these-64-democrats-not-support-a-15-minimum-wage-d96a5527f41b
Error extracting text from http://www.reuters.com/article/us-global-forex-idUSKBN0U034920151218: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-forex-idUSKBN0U034920151218


Processing URLs:  14%|█▎        | 137/1000 [04:38<20:43,  1.44s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/02/23/736474/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/02/23/736474/story.html


Processing URLs:  14%|█▍        | 139/1000 [04:40<16:37,  1.16s/it]

Error extracting text from https://www.nytimes.com/2017/11/20/business/chrysler-pacifica.html?hpw&amp;rref=automobiles&amp;action=click&amp;pgtype=Homepage&amp;module=well-region&amp;region=bottom-well&amp;WT.nav=bottom-well: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/20/business/chrysler-pacifica.html?hpw&amp;rref=automobiles&amp;action=click&amp;pgtype=Homepage&amp;module=well-region&amp;region=bottom-well&amp;WT.nav=bottom-well


Processing URLs:  14%|█▍        | 141/1000 [04:42<18:38,  1.30s/it]

URL filtered: https://mobile.twitter.com/BoeingSpace/status/1383399212163878915


Processing URLs:  14%|█▍        | 145/1000 [04:48<18:31,  1.30s/it]

Error extracting text from https://www.spaceweatherlive.com/en/solar-activity/solar-flares: 403 Client Error: Forbidden for url: https://www.spaceweatherlive.com/en/solar-activity/solar-flares


Processing URLs:  15%|█▌        | 151/1000 [04:54<14:19,  1.01s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-02/15/c_135099371.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-02/15/c_135099371.htm


Processing URLs:  16%|█▌        | 155/1000 [04:57<10:48,  1.30it/s]

Error extracting text from http://thehill.com/homenews/administration/348467-gop-rep-rendezvous-being-set-up-with-trump-to-relay-info-from: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/348467-gop-rep-rendezvous-being-set-up-with-trump-to-relay-info-from/


Processing URLs:  16%|█▌        | 161/1000 [05:19<1:19:08,  5.66s/it]

Error extracting text from http://www.investopedia.com/university/charts/: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/university/charts/


Processing URLs:  16%|█▌        | 162/1000 [05:21<1:00:03,  4.30s/it]

Error extracting text from http://www.nytimes.com/2016/06/06/world/asia/afghanistan-lajwardeen-mining-lapis-lazuli.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/06/world/asia/afghanistan-lajwardeen-mining-lapis-lazuli.html


Processing URLs:  16%|█▋        | 164/1000 [05:22<33:05,  2.37s/it]  

Error extracting text from http://thehill.com/blogs/pundits-blog/presidential-campaign/306793-the-exit-polls-tell-us-one-sure-thing-voters-wanted: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/presidential-campaign/306793-the-exit-polls-tell-us-one-sure-thing-voters-wanted/


Processing URLs:  17%|█▋        | 168/1000 [05:24<12:55,  1.07it/s]

URL filtered: https://www.inc.com/bill-murphy-jr/facebook-just-announced-its-banning-7-of-its-most-dangerous-accounts-heres-full-list-why-they-did-it.html
Error extracting text from https://www.reuters.com/article/us-health-coronavirus-usa/romney-floats-sweeping-vaccine-plan-as-u-s-nears-20-million-covid-19-cases-idUSKBN2962L3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-usa/romney-floats-sweeping-vaccine-plan-as-u-s-nears-20-million-covid-19-cases-idUSKBN2962L3


Processing URLs:  17%|█▋        | 169/1000 [06:25<3:41:03, 15.96s/it]

Error extracting text from https://www.aa.com.tr/en/middle-east/new-israeli-government-approves-construction-in-jewish-settlements/2283678: HTTPSConnectionPool(host='www.aa.com.tr', port=443): Read timed out. (read timeout=60)


Processing URLs:  17%|█▋        | 170/1000 [06:26<2:46:17, 12.02s/it]

Error extracting text from http://www.financialexpress.com/india-news/closer-us-india-ties-reason-for-pakistans-burgeoning-ties-with-russia-nawaz-sharifs-special-envoy-on-jk/411215/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/india-news/closer-us-india-ties-reason-for-pakistans-burgeoning-ties-with-russia-nawaz-sharifs-special-envoy-on-jk/411215/


Processing URLs:  17%|█▋        | 173/1000 [06:27<1:14:05,  5.38s/it]

Error extracting text from http://www.google.com/trends/explore#q=brexit: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#q=brexit
Error extracting text from https://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN1A6248: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN1A6248


Processing URLs:  18%|█▊        | 175/1000 [06:31<52:40,  3.83s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-sanctions-idUSKBN19B2Y7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-sanctions-idUSKBN19B2Y7


Processing URLs:  18%|█▊        | 178/1000 [06:43<56:16,  4.11s/it]

URL filtered: https://www.youtube.com/watch?v=dmR0BiGvXUw


Processing URLs:  18%|█▊        | 180/1000 [06:44<33:36,  2.46s/it]

Error extracting text from http://www.un.org/depts/los/convention_agreements/texts/unclos/part15.htm: 403 Client Error: Forbidden for url: https://www.un.org/depts/los/convention_agreements/texts/unclos/part15.htm


Processing URLs:  18%|█▊        | 185/1000 [06:48<13:29,  1.01it/s]

Error extracting text from http://www.smh.com.au/business/markets/goldman-sachs-models-market-reaction-to-trumps-economic-and-trade-policies-20170214-gud50o.html: 404 Client Error: Not Found for url: https://www.smh.com.au/business/markets/goldman-sachs-models-market-reaction-to-trumps-economic-and-trade-policies-20170214-gud50o.html
Error extracting text from http://www.nytimes.com/aponline/2016/12/09/world/middleeast/ap-un-united-nations-syria.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/12/09/world/middleeast/ap-un-united-nations-syria.html?_r=0


Processing URLs:  19%|█▉        | 189/1000 [06:53<15:23,  1.14s/it]

Error extracting text from http://en.tuidang.org/news/communist-regime/2016/06/is-chinas-propaganda-chief-headed-for-a-fall.html: 403 Client Error: Forbidden for url: https://global.tuidang.org/news/communist-regime/2016/06/is-chinas-propaganda-chief-headed-for-a-fall.html


Processing URLs:  19%|█▉        | 193/1000 [07:09<42:53,  3.19s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S1018363914000221: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S1018363914000221


Processing URLs:  20%|█▉        | 195/1000 [07:11<26:45,  1.99s/it]

Error extracting text from https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(20)30389-8/fulltext: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(20)30389-8/fulltext


Processing URLs:  20%|█▉        | 199/1000 [07:17<20:38,  1.55s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN0TT1G420151210: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN0TT1G420151210


Processing URLs:  20%|██        | 202/1000 [07:25<29:08,  2.19s/it]

Error extracting text from http://www.myplaniq.com/articles/20160320-brazil-needs-a-stronger-currency-like-it-needs-a-hole-in-the-head-goldman-warns/: 404 Client Error: Not Found for url: https://www.myplaniq.com/invest/20160320-brazil-needs-a-stronger-currency-like-it-needs-a-hole-in-the-head-goldman-warns/


Processing URLs:  20%|██        | 204/1000 [07:29<29:25,  2.22s/it]

Error extracting text from http://38north.org/?s=submarine: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  21%|██        | 206/1000 [07:29<16:02,  1.21s/it]

Error extracting text from http://www.washingtontimes.com/news/2016/jan/20/us-airstrike-on-mosul-banks-kill-iraqi-civilians/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/20/us-airstrike-on-mosul-banks-kill-iraqi-civilians/


Processing URLs:  21%|██        | 208/1000 [07:30<10:15,  1.29it/s]

Error extracting text from http://globalriskinsights.com/2015/12/the-nuclear-implications-of-turkey-russia-tensions/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2015/12/the-nuclear-implications-of-turkey-russia-tensions/
Error extracting text from http://www.reuters.com/article/us-russia-forum-brics-bank-idUSKCN0Z61FF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-forum-brics-bank-idUSKCN0Z61FF


Processing URLs:  21%|██        | 209/1000 [07:31<10:30,  1.25it/s]

Error extracting text from https://finance.yahoo.com/news/merkel-german-governors-mull-way-122438235.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/merkel-german-governors-mull-way-122438235.html
URL filtered: http://www.bloomberg.com/news/articles/2015-11-01/imf-pushes-europe-for-formal-restructuring-accord-on-greek-debt


Processing URLs:  21%|██        | 211/1000 [07:33<10:58,  1.20it/s]

Error extracting text from http://en.trend.az/iran/nuclearp/2435758.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/nuclearp/2435758.html


Processing URLs:  22%|██▏       | 215/1000 [07:35<08:38,  1.51it/s]

Error extracting text from http://www.hackmageddon.com/2016/01/11/2015-cyber-attacks-statistics/: 403 Client Error: Forbidden for url: http://www.hackmageddon.com/2016/01/11/2015-cyber-attacks-statistics/


Processing URLs:  22%|██▏       | 216/1000 [07:35<07:05,  1.84it/s]

Error extracting text from https://www.nytimes.com/2021/08/17/world/asia/taliban-afghanistan-al-qaeda.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/17/world/asia/taliban-afghanistan-al-qaeda.html


Processing URLs:  22%|██▏       | 218/1000 [07:39<19:04,  1.46s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-04-04/scottish-nationalists-seen-winning-supermajority-poll-shows


Processing URLs:  22%|██▏       | 220/1000 [07:40<10:59,  1.18it/s]

Error extracting text from http://www.nytimes.com/2016/02/07/world/middleeast/iran-panel-reverses-disqualification-of-election-candidates.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/world/middleeast/iran-panel-reverses-disqualification-of-election-candidates.html
URL filtered: https://twitter.com/BattlesStudies/status/965936322715308033?ref_src=twcamp%5Ecopy%7Ctwsrc%5Eandroid%7Ctwgr%5Ecopy%7Ctwcon%5E7090%7Ctwterm%5E3


Processing URLs:  23%|██▎       | 226/1000 [07:45<10:00,  1.29it/s]

Error extracting text from http://www.wsj.com/articles/north-korea-says-its-in-final-phase-of-developing-new-satellite-1442273800: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-says-its-in-final-phase-of-developing-new-satellite-1442273800


Processing URLs:  23%|██▎       | 233/1000 [07:55<17:36,  1.38s/it]

Error extracting text from http://www.gallup.com/opinion/polling-matters/191264/cruz-image-plummets-trump-improves-among-republicans.aspx: 404 Client Error: Not Found for url: https://www.gallup.com/opinion/polling-matters/191264/cruz-image-plummets-trump-improves-among-republicans.aspx


Processing URLs:  23%|██▎       | 234/1000 [07:56<14:37,  1.15s/it]

Error extracting text from https://www.jns.org/hamas-calls-on-jerusalem-arabs-to-protect-al-aqsa-from-flag-march/: 403 Client Error: Forbidden for url: https://www.jns.org/hamas-calls-on-jerusalem-arabs-to-protect-al-aqsa-from-flag-march/


Processing URLs:  24%|██▎       | 235/1000 [07:56<11:29,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/11/29/business/gm-driverless-cars.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/29/business/gm-driverless-cars.html


Processing URLs:  24%|██▎       | 237/1000 [07:57<09:45,  1.30it/s]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-afghanistan-marines-helmand-2017-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-afghanistan-marines-helmand-2017-htmlstory.html


Processing URLs:  24%|██▍       | 240/1000 [08:04<23:22,  1.84s/it]

URL filtered: https://www.facebook.com/Assemblyman-Mike-Gatto-248355458520271/


Processing URLs:  24%|██▍       | 244/1000 [08:07<13:47,  1.09s/it]

Error extracting text from http://www.reuters.com/article/2015/08/13/us-russia-crisis-food-idUSKCN0QI18R20150813: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/08/13/us-russia-crisis-food-idUSKCN0QI18R20150813
URL filtered: http://www.bloomberg.com/news/articles/2015-10-01/venezuelan-bonds-rally-as-government-said-to-own-25-of-debt
Error extracting text from http://agilisanalysis.com/home/?p=179: HTTPConnectionPool(host='agilisanalysis.com', port=80): Max retries exceeded with url: /home/?p=179 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff2e4da0>: Failed to resolve 'agilisanalysis.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  25%|██▍       | 248/1000 [08:08<07:07,  1.76it/s]

Error extracting text from http://in.reuters.com/article/turkey-referendum-erdogan-europe-idINKBN16S12G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
Error extracting text from http://www.nytimes.com/2015/09/29/technology/personaltech/apple-iphone-6s-breaks-first-weekend-sales-record.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/29/technology/personaltech/apple-iphone-6s-breaks-first-weekend-sales-record.html?_r=0


Processing URLs:  25%|██▌       | 251/1000 [08:12<11:57,  1.04it/s]

Error extracting text from http://abcnews.go.com/Politics/wireStory/iran-sanctions-renewal-law-obama-signature-44204828: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/iran-sanctions-renewal-law-obama-signature-44204828


Processing URLs:  26%|██▌       | 258/1000 [08:32<23:49,  1.93s/it]

Error extracting text from http://www.reuters.com/article/us-britain-europe-reactions-finland-idUSKCN0ZA0PG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-europe-reactions-finland-idUSKCN0ZA0PG


Processing URLs:  26%|██▌       | 259/1000 [08:32<17:31,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-business/transition-to-be-agreed-with-brexit-trade-deal-uks-pm-may-idUSKBN1CS0TK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-business/transition-to-be-agreed-with-brexit-trade-deal-uks-pm-may-idUSKBN1CS0TK


Processing URLs:  26%|██▌       | 262/1000 [08:35<12:16,  1.00it/s]

Error extracting text from http://newsinfo.inquirer.net/825665/duterte-wont-compromise-ph-stand-on-west-philippine-sea: 403 Client Error: Forbidden for url: https://newsinfo.inquirer.net/825665/duterte-wont-compromise-ph-stand-on-west-philippine-sea


Processing URLs:  26%|██▋       | 263/1000 [08:35<10:54,  1.13it/s]

Error extracting text from https://www.nytimes.com/2017/10/21/opinion/sunday/north-korea-cyberthreat.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/21/opinion/sunday/north-korea-cyberthreat.html


Processing URLs:  26%|██▋       | 265/1000 [08:37<10:12,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-china-iran-oil-idUSKBN14P15W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-iran-oil-idUSKBN14P15W


Processing URLs:  27%|██▋       | 267/1000 [08:37<07:25,  1.64it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-20/the-bank-of-japan-s-2-5-billion-plan-to-buy-non-existent-etfs


Processing URLs:  27%|██▋       | 273/1000 [08:45<15:55,  1.31s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-iowa-presidential-republican-caucus: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-iowa-presidential-republican-caucus


Processing URLs:  28%|██▊       | 275/1000 [08:46<12:43,  1.05s/it]

Error extracting text from http://www.reuters.com/article/us-serbia-germany-idUSKBN17E2B0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-serbia-germany-idUSKBN17E2B0


Processing URLs:  28%|██▊       | 276/1000 [08:47<10:40,  1.13it/s]

Error extracting text from http://www.conservativewoman.co.uk/david-keighley-another-referendum-stitch-up-pro-european-emmott-to-rule-on-broadcast-bias/: 403 Client Error: Forbidden for url: http://www.conservativewoman.co.uk/david-keighley-another-referendum-stitch-up-pro-european-emmott-to-rule-on-broadcast-bias/


Processing URLs:  28%|██▊       | 278/1000 [08:48<09:51,  1.22it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57353#.WZMhMjMfmu4: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57353#.WZMhMjMfmu4


Processing URLs:  28%|██▊       | 285/1000 [08:58<17:44,  1.49s/it]

Error extracting text from http://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Haug_Taleb_2011.pdf: 404 Client Error: Not Found for url: https://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Haug_Taleb_2011.pdf


Processing URLs:  29%|██▊       | 287/1000 [09:01<18:48,  1.58s/it]

Error extracting text from http://www.reuters.com/article/2015/10/16/us-usa-election-funding-idUSKCN0SA2RP20151016: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/16/us-usa-election-funding-idUSKCN0SA2RP20151016


Processing URLs:  29%|██▉       | 289/1000 [09:02<11:30,  1.03it/s]

Error extracting text from http://theconversation.com/why-russia-thinks-its-exceptional-85240: 403 Client Error: Forbidden for url: http://theconversation.com/why-russia-thinks-its-exceptional-85240


Processing URLs:  29%|██▉       | 292/1000 [09:03<08:18,  1.42it/s]

Error extracting text from http://thehill.com/homenews/administration/277634-obama-us-prepping-shield-to-counter-north-korea: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/277634-obama-us-prepping-shield-to-counter-north-korea/


Processing URLs:  30%|███       | 301/1000 [09:20<17:46,  1.53s/it]

Error extracting text from https://www.espn.com/olympics/story/_/id/32004162/calls-boycott-china-olympics-amid-uyghur-genocide-fall-flat-washington: 403 Client Error: Forbidden for url: https://www.espn.com/olympics/story/_/id/32004162/calls-boycott-china-olympics-amid-uyghur-genocide-fall-flat-washington


Processing URLs:  30%|███       | 304/1000 [09:23<14:36,  1.26s/it]

Error extracting text from https://www.csoonline.com/article/3237324/cyber-attacks-espionage/what-is-a-cyber-attack-recent-examples-show-disturbing-trends.html: 404 Client Error: Not Found for url: https://www.csoonline.com/article/3237324/cyber-attacks-espionage/what-is-a-cyber-attack-recent-examples-show-disturbing-trends.html


Processing URLs:  31%|███       | 310/1000 [09:37<16:48,  1.46s/it]

Error extracting text from http://www.wsj.com/video/one-china-two-leaders-two-messages/CF31B0D8-ABCC-4A8A-AD7D-32905484EC6D.html: 403 Client Error: Forbidden for url: https://www.wsj.com/video/one-china-two-leaders-two-messages/CF31B0D8-ABCC-4A8A-AD7D-32905484EC6D.html


Processing URLs:  31%|███▏      | 313/1000 [09:41<13:29,  1.18s/it]

Error extracting text from http://www.nytimes.com/2015/10/10/business/economy/export-import-bank-will-come-to-new-house-vote.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/10/business/economy/export-import-bank-will-come-to-new-house-vote.html
URL filtered: https://twitter.com/NASAWebb/status/1474424081286062085


Processing URLs:  32%|███▏      | 316/1000 [09:43<10:40,  1.07it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VD07T?feedType=RSS&amp;feedName=businessNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VD07T?feedType=RSS&amp;feedName=businessNews


Processing URLs:  32%|███▏      | 317/1000 [10:43<3:07:02, 16.43s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-08-22/no-us-russia-cyber-unit-without-trump-notifying-congress-bill-says: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  32%|███▏      | 320/1000 [10:47<1:18:32,  6.93s/it]

Error extracting text from https://www.caracaschronicles.com/2016/11/21/schrodingers-coupon-payment/: 403 Client Error: Forbidden for url: https://www.caracaschronicles.com/2016/11/21/schrodingers-coupon-payment/


Processing URLs:  32%|███▏      | 321/1000 [10:47<56:30,  4.99s/it]  

Error extracting text from http://streetwiseprofessor.com/?p=10826: 403 Client Error: Forbidden for url: https://streetwiseprofessor.com/?p=10826


Processing URLs:  33%|███▎      | 326/1000 [10:57<24:22,  2.17s/it]

Error extracting text from http://www.gov.me/en/News/152390/Parliament-of-Montenegro-adopts-Government-proposed-Resolution-on-joining-NATO.html: 404 Client Error: not found for url: https://www.gov.me/en/News/152390/Parliament-of-Montenegro-adopts-Government-proposed-Resolution-on-joining-NATO.html


Processing URLs:  33%|███▎      | 331/1000 [11:06<14:04,  1.26s/it]

Error extracting text from https://journals.sagepub.com/doi/abs/10.1177/0956797619897915: 403 Client Error: Forbidden for url: https://journals.sagepub.com/doi/abs/10.1177/0956797619897915


Processing URLs:  33%|███▎      | 333/1000 [11:08<14:46,  1.33s/it]

Error extracting text from https://www.evms.edu/education/centers_institutes_departments/internal_medicine/faculty_staff/pulmonary__critical_care_faculty/name_11909_en.html: 404 Client Error: Not Found for url: https://www.evms.edu/education/centers_institutes_departments/internal_medicine/faculty_staff/pulmonary__critical_care_faculty/name_11909_en.html


Processing URLs:  33%|███▎      | 334/1000 [11:10<17:01,  1.53s/it]

Error extracting text from http://www.joc.com/port-news/panama-canal-news/panama-locks-opening-now-expected-second-half_20160118.html: 404 Client Error: Not Found for url: https://www.joc.com/article/panama-locks-opening-now-expected-second-half_20160118.html


Processing URLs:  34%|███▍      | 338/1000 [11:16<17:30,  1.59s/it]

Error extracting text from http://thehill.com/policy/energy-environment/322149-trump-team-split-on-climate-pact-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/322149-trump-team-split-on-climate-pact-report/
URL filtered: https://twitter.com/florian_krammer/status/1397880781675053057


Processing URLs:  34%|███▍      | 341/1000 [11:19<14:16,  1.30s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-01/destructive-hacks-strike-saudi-arabia-posing-challenge-to-trump


Processing URLs:  35%|███▍      | 346/1000 [11:25<14:29,  1.33s/it]

Error extracting text from http://www.foxnews.com/politics/2015/10/13/white-house-iran-missile-test-likely-un-violation/: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/10/13/white-house-iran-missile-test-likely-un-violation/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.folhape.com.br/blogdafolha/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.folhape.com.br/blogdafolha/&amp;prev=search


Processing URLs:  35%|███▌      | 351/1000 [11:37<19:10,  1.77s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN15A10O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN15A10O?il=0


Processing URLs:  35%|███▌      | 352/1000 [11:38<17:43,  1.64s/it]

Error extracting text from https://reut.rs/3b4u4VX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics-draghi/draghi-forms-new-italian-government-names-politicians-technocrats-as-ministers-idUSKBN2AC2AG?il=0


Processing URLs:  36%|███▌      | 355/1000 [11:44<19:06,  1.78s/it]

Error extracting text from https://thehill.com/policy/finance/541886-senate-relief-bill-would-exempt-student-loan-forgiveness-from-taxes: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/541886-senate-relief-bill-would-exempt-student-loan-forgiveness-from-taxes/


Processing URLs:  36%|███▌      | 356/1000 [11:44<14:20,  1.34s/it]

Error extracting text from http://news.yahoo.com/why-donald-us-republicans-explain-trump-fever-174920630.html: 404 Client Error: Not Found for url: http://news.yahoo.com/why-donald-us-republicans-explain-trump-fever-174920630.html


Processing URLs:  36%|███▌      | 360/1000 [11:51<17:00,  1.59s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/brazil-lawmakers-relaunch-dilma-rousseff-impeachment-proceedings/articleshow/51449690.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/brazil-lawmakers-relaunch-dilma-rousseff-impeachment-proceedings/articleshow/51449690.cms


Processing URLs:  37%|███▋      | 366/1000 [12:01<16:06,  1.53s/it]

Error extracting text from http://ec.europa.eu/budget/mff/figures/index_en.cfm: 404 Client Error: (Not Found) for url: https://ec.europa.eu/not_found


Processing URLs:  37%|███▋      | 369/1000 [12:06<17:23,  1.65s/it]

Error extracting text from http://www.straitstimes.com/asia/se-asia/suu-kyis-party-stays-mum-on-choice-for-president: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  37%|███▋      | 373/1000 [12:11<14:05,  1.35s/it]

Error extracting text from https://www.amazon.com/gp/search?index=books&amp;linkCode=qs&amp;keywords=9780801444579: 503 Server Error: Service Unavailable for url: https://www.amazon.com/gp/search?index=books&amp;linkCode=qs&amp;keywords=9780801444579


Processing URLs:  38%|███▊      | 379/1000 [12:29<37:13,  3.60s/it]

Error extracting text from http://www.news-sentinel.com/news/us-and-world/ap-interview-yemen-factions-said-to-have-pledged-easing-aid_20170727&amp;profile=-1: 404 Client Error: Not Found for url: https://www.fortwayne.com/news/us-and-world/ap-interview-yemen-factions-said-to-have-pledged-easing-aid_20170727&amp;profile=-1


Processing URLs:  38%|███▊      | 384/1000 [12:37<14:24,  1.40s/it]

Error extracting text from http://www.reuters.com/article/us-india-pakistan-water-idUSKBN16N03E?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-pakistan-water-idUSKBN16N03E?il=0


Processing URLs:  39%|███▉      | 388/1000 [12:43<12:30,  1.23s/it]

URL filtered: https://www.youtube.com/watch?v=o1ZAVkwNmN8


Processing URLs:  39%|███▉      | 391/1000 [12:45<09:35,  1.06it/s]

Error extracting text from http://www.eurasianet.org/node/66189: 403 Client Error: Forbidden for url: http://www.eurasianet.org/node/66189


Processing URLs:  39%|███▉      | 392/1000 [12:45<07:43,  1.31it/s]

Error extracting text from https://www.nytimes.com/2017/03/01/us/politics/obama-trump-russia-election-hacking.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/01/us/politics/obama-trump-russia-election-hacking.html?_r=0


Processing URLs:  40%|███▉      | 396/1000 [12:50<09:15,  1.09it/s]

Error extracting text from https://onlinelibrary.wiley.com/doi/full/10.1002/smll.202002169: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/full/10.1002/smll.202002169


Processing URLs:  40%|███▉      | 398/1000 [13:54<3:10:54, 19.03s/it]

Error extracting text from http://en.kremlin.ru/events/president/news/53039: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)
URL filtered: http://www.bloomberg.com/news/articles/2015-10-07/rousseff-s-job-at-risk-as-brazil-court-accepts-case-on-campaign


Processing URLs:  40%|████      | 403/1000 [13:56<48:13,  4.85s/it]  

URL filtered: https://twitter.com/germanyintheeu/status/1497629293937049606
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-qayyara-idUSKCN0ZP0FN?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-qayyara-idUSKCN0ZP0FN?il=0


Processing URLs:  40%|████      | 404/1000 [13:56<37:26,  3.77s/it]

Error extracting text from http://postimg.org/image/3ko5dhg3z/full/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/3ko5dhg3z/full/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffd5fc80>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|████      | 405/1000 [13:58<32:30,  3.28s/it]

URL filtered: https://tass.com/politics/1374073?utm_source=twitter.com&amp;utm_medium=social&amp;utm_campaign=smm_social_share


Processing URLs:  41%|████      | 408/1000 [14:01<22:05,  2.24s/it]

Error extracting text from http://www.anl.gov/articles/finding-functionals-fission: 404 Client Error: Not Found for url: https://www.anl.gov/articles/finding-functionals-fission


Processing URLs:  41%|████      | 411/1000 [14:05<14:55,  1.52s/it]

Error extracting text from https://www.nytimes.com/2017/03/19/world/europe/germany-martin-schulz-angela-merkel.html?emc=edit_mbe_20170320&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/19/world/europe/germany-martin-schulz-angela-merkel.html?emc=edit_mbe_20170320&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1


Processing URLs:  41%|████      | 412/1000 [14:07<17:23,  1.78s/it]

Error extracting text from http://www.reuters.com/article/us-southkorea-usa-drills-china-idUSKBN16A10X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-usa-drills-china-idUSKBN16A10X


Processing URLs:  41%|████▏     | 414/1000 [14:09<13:04,  1.34s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-12-18/house-passes-u-s-spending-bill-that-ends-crude-oil-export-ban


Processing URLs:  42%|████▏     | 420/1000 [14:24<23:06,  2.39s/it]

Error extracting text from https://bluebook.unmeetings.org/ : 403 Client Error: Forbidden for url: https://bluebook.unmeetings.org/%20


Processing URLs:  42%|████▏     | 423/1000 [14:27<13:44,  1.43s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-iraq-idUSKCN0VA36N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-iraq-idUSKCN0VA36N


Processing URLs:  43%|████▎     | 428/1000 [14:31<07:07,  1.34it/s]

Error extracting text from http://www.roadandtrack.com/new-cars/car-technology/news/a31072/google-autonomous-cars-involved-in-wreck-that-hospitalized-driver/: 403 Client Error: Forbidden for url: http://www.roadandtrack.com/new-cars/car-technology/news/a31072/google-autonomous-cars-involved-in-wreck-that-hospitalized-driver/
Error extracting text from http://www.reuters.com/article/us-usa-fiscal-idUSKBN0TT28J20151210: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fiscal-idUSKBN0TT28J20151210


Processing URLs:  43%|████▎     | 431/1000 [14:31<03:33,  2.66it/s]

Error extracting text from http://www.france24.com/en/20161203-fatah-picks-party-officials-amid-talk-abbas-succession?ref=tw_i: 403 Client Error: Forbidden for url: http://www.france24.com/en/20161203-fatah-picks-party-officials-amid-talk-abbas-succession?ref=tw_i
Error extracting text from https://www.yahoo.com/news/asean-split-deal-china-south-china-sea-row-044005997.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/asean-split-deal-china-south-china-sea-row-044005997.html


Processing URLs:  44%|████▎     | 437/1000 [14:41<12:36,  1.34s/it]

Error extracting text from http://thehill.com/homenews/campaign/344298-trump-job-approval-swings-lower: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/344298-trump-job-approval-swings-lower/


Processing URLs:  44%|████▍     | 439/1000 [14:43<10:17,  1.10s/it]

Error extracting text from https://www.nytimes.com/2021/06/20/world/europe/coronavirus-lab-anthrax.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/20/world/europe/coronavirus-lab-anthrax.html


Processing URLs:  44%|████▍     | 440/1000 [14:44<10:39,  1.14s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/27/gitrep-27mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/27/gitrep-27mar16pm/


Processing URLs:  45%|████▌     | 450/1000 [15:08<21:21,  2.33s/it]

Error extracting text from https://www.teslamotors.com/blog/your-autopilot-has-arrived: 403 Client Error: Forbidden for url: https://www.teslamotors.com/blog/your-autopilot-has-arrived


Processing URLs:  45%|████▌     | 451/1000 [15:10<19:51,  2.17s/it]

Error extracting text from http://thehill.com/homenews/special/362041-alabama-gop-senator-i-voted-for-a-write-in-instead-of-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/special/362041-alabama-gop-senator-i-voted-for-a-write-in-instead-of-moore/


Processing URLs:  45%|████▌     | 452/1000 [15:10<14:39,  1.61s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu-peston/brexit-trade-deal-between-eu-and-uk-possible-on-wednesday-itvs-peston-idUSKBN28X02R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-peston/brexit-trade-deal-between-eu-and-uk-possible-on-wednesday-itvs-peston-idUSKBN28X02R


Processing URLs:  45%|████▌     | 454/1000 [15:15<20:06,  2.21s/it]

Error extracting text from http://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/fahrzeugzulassungen_node.html: 404 Client Error: Not Found for url: https://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/fahrzeugzulassungen_node.html


Processing URLs:  46%|████▌     | 455/1000 [15:17<19:11,  2.11s/it]

Error extracting text from https://electionarium.com/2021-canadian-election-results-and-predictions/: HTTPSConnectionPool(host='electionarium.com', port=443): Max retries exceeded with url: /2021-canadian-election-results-and-predictions/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'electionarium.com'. (_ssl.c:1000)")))


Processing URLs:  46%|████▌     | 456/1000 [15:20<21:27,  2.37s/it]

Error extracting text from http://www.alamy.com/stock-photo-female-north-korean-soldiers-patrol-along-the-banks-of-yalu-river-112446002.html: 404 Client Error: Not Found for url: https://www.alamy.com:443/stock-photo-female-north-korean-soldiers-patrol-along-the-banks-of-yalu-river-112446002.html


Processing URLs:  46%|████▌     | 457/1000 [15:28<38:19,  4.24s/it]

Error extracting text from https://news.abs-cbn.com/news/03/29/21/australia-concerned-over-destabilizing-actions-in-south-china-sea-envoy: 403 Client Error: Forbidden for url: https://news.abs-cbn.com/news/03/29/21/australia-concerned-over-destabilizing-actions-in-south-china-sea-envoy


Processing URLs:  46%|████▌     | 461/1000 [15:39<29:20,  3.27s/it]

URL filtered: https://www.youtube.com/watch?v=GcORpieKvIs


Processing URLs:  46%|████▋     | 465/1000 [15:46<19:57,  2.24s/it]

Error extracting text from http://www.eluniversal.com/noticias/daily-news/guerra-venezuela-lost-over-200000-bpd-oil-production-2017_658331: 404 Client Error: Not Found for url: https://www.eluniversal.com/noticias/daily-news/guerra-venezuela-lost-over-200000-bpd-oil-production-2017_658331


Processing URLs:  47%|████▋     | 469/1000 [15:55<21:32,  2.43s/it]

Error extracting text from http://www.kitco.com/charts/popup/au0030lnb.html: 404 Client Error: Not Found for url: https://frontend.prod.kitco.com/charts/popup/au0030lnb.html


Processing URLs:  47%|████▋     | 472/1000 [16:58<2:48:15, 19.12s/it]

Error extracting text from https://www.iaea.org/press/?p=5189: HTTPSConnectionPool(host='www.iaea.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  47%|████▋     | 473/1000 [16:59<2:02:45, 13.98s/it]

Error extracting text from https://www.regioactive.de/party/endlich-offen-berlin-humboldt-forum-2021-07-20-Zgm1WkDJtw: 404 Client Error:  for url: https://www.regioactive.de/party/endlich-offen-berlin-humboldt-forum-2021-07-20-Zgm1WkDJtw


Processing URLs:  48%|████▊     | 479/1000 [18:06<2:58:00, 20.50s/it]

Error extracting text from https://archive.is/xeMIg: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  48%|████▊     | 484/1000 [18:26<56:03,  6.52s/it]  

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-spratlys-idUSKCN0WC2I0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-spratlys-idUSKCN0WC2I0


Processing URLs:  49%|████▉     | 488/1000 [18:30<18:45,  2.20s/it]

Error extracting text from https://www.nytimes.com/2020/08/03/opinion/trump-biden-presidential-debates-2020.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/08/03/opinion/trump-biden-presidential-debates-2020.html


Processing URLs:  49%|████▉     | 492/1000 [18:34<10:16,  1.21s/it]

Error extracting text from https://seekingalpha.com/article/4432757-time-to-bet-on-crude-buy-clu1: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4432757-time-to-bet-on-crude-buy-clu1


Processing URLs:  49%|████▉     | 493/1000 [18:35<09:19,  1.10s/it]

Error extracting text from http://www.bizpacreview.com/2016/09/04/allah-obliterates-us-navy-fleet-iran-spits-face-new-music-video-386764: 403 Client Error: Forbidden for url: https://www.bizpacreview.com/2016/09/04/allah-obliterates-us-navy-fleet-iran-spits-face-new-music-video-386764


Processing URLs:  50%|████▉     | 496/1000 [18:40<12:08,  1.45s/it]

Error extracting text from https://frbatlanta.org/cqer/research/gdpnow/?panel=1: HTTPSConnectionPool(host='frbatlanta.org', port=443): Max retries exceeded with url: /cqer/research/gdpnow/?panel=1 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  50%|████▉     | 499/1000 [18:44<11:30,  1.38s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_131378.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_131378.htm?selectedLocale=en


Processing URLs:  50%|█████     | 505/1000 [18:55<12:48,  1.55s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/schumer-dam-cyberattack-shot-bow-iran-37583193: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/schumer-dam-cyberattack-shot-bow-iran-37583193


Processing URLs:  51%|█████     | 506/1000 [18:55<10:35,  1.29s/it]

Error extracting text from http://wtocenter.vn/news/india-changes-tack-rcep-negotiations: 403 Client Error: Forbidden for url: https://wtocenter.vn/news/india-changes-tack-rcep-negotiations


Processing URLs:  51%|█████     | 507/1000 [18:56<08:07,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-usa-russia-cyber-idUSKBN14Q1T8?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-cyber-idUSKBN14Q1T8?il=0


Processing URLs:  51%|█████     | 510/1000 [18:59<08:16,  1.01s/it]

Error extracting text from https://www.reuters.com/article/china-france-cgtn/chinas-state-broadcaster-applies-to-france-for-right-to-air-in-europe-ft-idUSFWN2KP1N7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/china-france-cgtn/chinas-state-broadcaster-applies-to-france-for-right-to-air-in-europe-ft-idUSFWN2KP1N7


Processing URLs:  51%|█████     | 512/1000 [19:08<20:27,  2.52s/it]

URL filtered: https://twitter.com/zerohedge/status/955874438725152769


Processing URLs:  52%|█████▏    | 521/1000 [19:24<11:16,  1.41s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-insight-idUSKCN0X50O0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-insight-idUSKCN0X50O0


Processing URLs:  52%|█████▎    | 525/1000 [19:33<14:10,  1.79s/it]

Error extracting text from http://www.nytimes.com/2016/04/14/health/zika-virus-causes-birth-defects-cdc.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/14/health/zika-virus-causes-birth-defects-cdc.html


Processing URLs:  53%|█████▎    | 526/1000 [19:36<16:27,  2.08s/it]

Error extracting text from http://38north.org/2015/08/sohae080415/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  53%|█████▎    | 530/1000 [19:55<31:54,  4.07s/it]

Error extracting text from http://www.parl.gc.ca/legisinfo/LAAG.aspx?Language=E&amp;Mode=1: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
URL filtered: https://www.bloomberg.com/news/features/2021-03-01/will-amazon-workers-unionize-it-s-a-tough-sell-at-bessemer-alabama-plant
URL filtered: https://www.youtube.com/watch?v=DtVBCG6ThDk


Processing URLs:  53%|█████▎    | 534/1000 [19:57<13:09,  1.69s/it]

Error extracting text from https://csis-prod.s3.amazonaws.com/s3fs-public/legacy_files/files/publication/twq05winterhaqqani.pdf: 403 Client Error: Forbidden for url: https://csis-prod.s3.amazonaws.com/s3fs-public/legacy_files/files/publication/twq05winterhaqqani.pdf


Processing URLs:  54%|█████▎    | 537/1000 [20:03<13:43,  1.78s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/latest-monitor-500-sites-hit-air-offensive-36706984: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/latest-monitor-500-sites-hit-air-offensive-36706984


Processing URLs:  54%|█████▍    | 539/1000 [20:10<20:51,  2.71s/it]

Error extracting text from https://www.faa.gov/uas/request_waiver/waivers_granted/: 404 Client Error: Not Found for url: https://www.faa.gov/uas/request_waiver/waivers_granted/


Processing URLs:  54%|█████▍    | 541/1000 [20:12<14:05,  1.84s/it]

Error extracting text from https://www.niaid.nih.gov/research/vincent-j-munster-phd: 403 Client Error: Forbidden for url: https://www.niaid.nih.gov/research/vincent-j-munster-phd


Processing URLs:  54%|█████▍    | 544/1000 [20:15<09:41,  1.27s/it]

Error extracting text from https://www.reuters.com/article/us-usa-russia-nuclear/biden-seeks-five-year-extension-of-new-start-arms-treaty-with-russia-idUSKBN29Q2I4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-nuclear/biden-seeks-five-year-extension-of-new-start-arms-treaty-with-russia-idUSKBN29Q2I4


Processing URLs:  55%|█████▌    | 552/1000 [20:29<08:57,  1.20s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/300285-poll-gop-senator-takes-lead-in-key-senate-race: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/300285-poll-gop-senator-takes-lead-in-key-senate-race/


Processing URLs:  56%|█████▋    | 564/1000 [20:43<07:13,  1.01it/s]

Error extracting text from https://www.us-cert.gov/ncas/alerts/TA17-293A: 403 Client Error: Forbidden for url: https://www.us-cert.gov/ncas/alerts/TA17-293A
Error extracting text from http://www.reuters.com/article/us-tesla-investigation-idUSKCN0ZG2ZC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-investigation-idUSKCN0ZG2ZC


Processing URLs:  57%|█████▋    | 566/1000 [20:45<05:50,  1.24it/s]

Error extracting text from https://nationalpost.com/opinion/maxime-bernier-why-my-new-political-movement-because-canada-has-been-hijacked: 403 Client Error: Forbidden for url: https://nationalpost.com/opinion/maxime-bernier-why-my-new-political-movement-because-canada-has-been-hijacked


Processing URLs:  57%|█████▋    | 570/1000 [20:47<04:22,  1.64it/s]

Error extracting text from https://www.nytimes.com/2017/11/22/science/oumuamua-space-asteroid.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/22/science/oumuamua-space-asteroid.html?_r=0


Processing URLs:  57%|█████▋    | 572/1000 [20:48<04:40,  1.53it/s]

URL filtered: https://m.youtube.com/watch?v=pWDITp-Bngg


Processing URLs:  58%|█████▊    | 577/1000 [20:54<05:44,  1.23it/s]

Error extracting text from http://www.schengenvisainfo.com/schengen-visa-countries-list/: 403 Client Error: Forbidden for url: https://www.schengenvisainfo.com/schengen-visa-countries-list/


Processing URLs:  58%|█████▊    | 579/1000 [20:59<11:23,  1.62s/it]

URL filtered: https://www.youtube.com/watch?v=YUH9jD__qHY


Processing URLs:  58%|█████▊    | 581/1000 [21:00<07:18,  1.05s/it]

Error extracting text from https://www.thenewhumanitarian.org/analysis/2018/03/05/how-declare-famine-primer-south: 404 Client Error: Not Found for url: https://www.thenewhumanitarian.org/analysis/2018/03/05/how-declare-famine-primer-south


Processing URLs:  58%|█████▊    | 582/1000 [21:00<06:54,  1.01it/s]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/01/21/74/0301000000AEN20160121009800315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  58%|█████▊    | 584/1000 [21:01<04:20,  1.60it/s]

Error extracting text from http://allafrica.com/stories/201604220408.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201604220408.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffa53980>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN14P1HH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN14P1HH?il=0


Processing URLs:  58%|█████▊    | 585/1000 [21:01<03:23,  2.04it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.folhamax.com.br/politica/taques-comemora-dia-historico-e-ja-acredita-em-impeachment-de-dilma/78089&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.folhamax.com.br/politica/taques-comemora-dia-historico-e-ja-acredita-em-impeachment-de-dilma/78089&amp;prev=search


Processing URLs:  59%|█████▉    | 588/1000 [22:23<2:31:36, 22.08s/it]

Error extracting text from http://www.usnews.com/news/blogs/data-mine/2016/04/08/cyberattacks-surge-on-energy-companies-electric-grid: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  59%|█████▉    | 590/1000 [22:24<1:17:02, 11.27s/it]

Error extracting text from http://www.poynter.org/fact-checkers-code-of-principles/: 403 Client Error: Forbidden for url: http://www.poynter.org/fact-checkers-code-of-principles/


Processing URLs:  59%|█████▉    | 591/1000 [22:25<55:54,  8.20s/it]  

Error extracting text from http://www.nasdaq.com/article/venezuelas-pdvsa-postpones-bond-investor-calls-20170711-01018: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/venezuelas-pdvsa-postpones-bond-investor-calls-20170711-01018


Processing URLs:  59%|█████▉    | 593/1000 [22:25<30:38,  4.52s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/3193.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/3193.htm


Processing URLs:  60%|█████▉    | 597/1000 [22:29<12:13,  1.82s/it]

Error extracting text from http://www.reuters.com/article/us-iran-election-idUSKCN0XR04L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-idUSKCN0XR04L


Processing URLs:  60%|██████    | 603/1000 [22:49<23:09,  3.50s/it]

Error extracting text from http://iopscience.iop.org/article/10.1088/0264-9381/27/17/173001/meta;jsessionid=87CEE0BA26769494D369F2E4B8FF81D1.c3.iopscience.cld.iop.org: 403 Client Error:  for url: https://iopscience.iop.org:443/article/10.1088/0264-9381/27/17/173001/meta;jsessionid=87CEE0BA26769494D369F2E4B8FF81D1.c3.iopscience.cld.iop.org


Processing URLs:  60%|██████    | 604/1000 [22:50<17:24,  2.64s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/28/models-read-my-book/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/28/models-read-my-book/


Processing URLs:  61%|██████    | 607/1000 [22:53<10:54,  1.67s/it]

Error extracting text from https://www.theregreview.org/2020/10/01/hovenkamp-antitrust-regulation-over-time/: 403 Client Error: Forbidden for url: https://www.theregreview.org/2020/10/01/hovenkamp-antitrust-regulation-over-time/


Processing URLs:  61%|██████    | 611/1000 [22:57<08:09,  1.26s/it]

Error extracting text from http://syriadirect.org/news/jabha-shamiya-commander-blames-%E2%80%98complete-lack-of-coordination%E2%80%99for-aleppo-losses/: 404 Client Error: Not Found for url: http://syriadirect.org/news/jabha-shamiya-commander-blames-%E2%80%98complete-lack-of-coordination%E2%80%99for-aleppo-losses/


Processing URLs:  61%|██████    | 612/1000 [22:58<07:43,  1.19s/it]

Error extracting text from http://polling.reuters.com/#!poll/TM651Y15_13: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/#!poll/TM651Y15_13


Processing URLs:  61%|██████▏   | 613/1000 [23:59<2:01:37, 18.86s/it]

Error extracting text from http://www.seattletimes.com/business/army-plans-dakota-access-oil-pipeline-environmental-study/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▏   | 615/1000 [23:59<1:00:27,  9.42s/it]

Error extracting text from http://www.shanghaidaily.com/article/article_xinhua.aspx?id=320290: 404 Client Error: Not Found for url: http://www.shanghaidaily.com/article/article_xinhua.aspx?id=320290
Error extracting text from http://english.ahram.org.eg/NewsContent/50/1203/399228/AlAhram-Weekly/World/Sudan-vs-Ethiopia-The-luxury-of-war.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/50/1203/399228/AlAhram-Weekly/World/Sudan-vs-Ethiopia-The-luxury-of-war.aspx


Processing URLs:  62%|██████▏   | 616/1000 [24:00<43:07,  6.74s/it]  

Error extracting text from https://www.wsj.com/articles/funding-for-bezos-space-company-fails-to-launch-in-house-11624008601: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/funding-for-bezos-space-company-fails-to-launch-in-house-11624008601


Processing URLs:  62%|██████▏   | 619/1000 [24:04<20:07,  3.17s/it]

Error extracting text from http://www.reuters.com/article/2015/11/19/us-usa-fed-fischer-idUSKCN0T82ZA20151119: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/19/us-usa-fed-fischer-idUSKCN0T82ZA20151119


Processing URLs:  63%|██████▎   | 626/1000 [24:12<11:39,  1.87s/it]

Error extracting text from http://www.ethnologue.com/about/language-status: 404 Client Error: Not Found for url: https://www.ethnologue.com/about/language-status


Processing URLs:  63%|██████▎   | 630/1000 [24:15<05:47,  1.06it/s]

Error extracting text from https://www.vanguardngr.com/2017/08/famine-looms-in-northeast-nigeria-as-food-crisis-worsens/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2017/08/famine-looms-in-northeast-nigeria-as-food-crisis-worsens/


Processing URLs:  63%|██████▎   | 631/1000 [24:17<07:34,  1.23s/it]

Error extracting text from http://www.entergy.com/News_Room/newsrelease.aspx?NR_ID=3077: 404 Client Error: Not Found for url: https://www.entergy.com/News_Room/newsrelease.aspx?NR_ID=3077


Processing URLs:  63%|██████▎   | 633/1000 [24:19<05:53,  1.04it/s]

Error extracting text from https://www.nytimes.com/2022/01/25/world/asia/north-korea-launches-missiles-kim.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/25/world/asia/north-korea-launches-missiles-kim.html
URL filtered: https://www.google.com/amp/s/www.vice.com/amp/en/article/jgqyvg/facebook-helped-alex-jones-share-prepare-for-war-posts-before-the-capitol-riots
URL filtered: http://www.bloomberg.com/news/articles/2016-02-11/imf-warns-of-renewed-grexit-fears-without-credible-greece-plan


Processing URLs:  64%|██████▎   | 637/1000 [24:21<03:55,  1.54it/s]

Error extracting text from http://www.reuters.com/article/us-un-syria-idUSKBN15F2FG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-syria-idUSKBN15F2FG


Processing URLs:  64%|██████▍   | 638/1000 [24:21<03:50,  1.57it/s]

Error extracting text from http://cleantechnica.com/2015/06/06/why-california-gets-electric-cars-other-zev-mandate-states-dont-get/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2015/06/06/why-california-gets-electric-cars-other-zev-mandate-states-dont-get/


Processing URLs:  64%|██████▍   | 639/1000 [24:22<03:14,  1.86it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/oct/22/outside-observers-vital-fair-venezuela-vote-opposi/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/oct/22/outside-observers-vital-fair-venezuela-vote-opposi/
URL filtered: https://www.youtube.com/watch?v=7X8LHRAsaCo


Processing URLs:  64%|██████▍   | 643/1000 [24:24<03:32,  1.68it/s]

Error extracting text from https://www.timesofisrael.com/hamas-gaza-chief-threatens-to-renew-fighting-if-israel-violates-al-aqsa/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/hamas-gaza-chief-threatens-to-renew-fighting-if-israel-violates-al-aqsa/


Processing URLs:  65%|██████▍   | 647/1000 [25:32<1:50:22, 18.76s/it]

Error extracting text from https://www.cmegroup.com/markets/energy/crude-oil/light-sweet-crude.quotes.html: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  65%|██████▍   | 649/1000 [25:33<56:53,  9.72s/it]  

Error extracting text from http://www.strategicstudiesinstitute.army.mil/pubs/display.cfm?pubID=1300: HTTPConnectionPool(host='www.strategicstudiesinstitute.army.mil', port=80): Max retries exceeded with url: /pubs/display.cfm?pubID=1300 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffa53b30>: Failed to resolve 'www.strategicstudiesinstitute.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  65%|██████▌   | 651/1000 [25:35<31:16,  5.38s/it]

Error extracting text from http://www.cnbc.com/2015/12/11/us-oil-export-ban-very-likely-to-be-lifted-in-spending-bill-source.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/12/11/us-oil-export-ban-very-likely-to-be-lifted-in-spending-bill-source.html


Processing URLs:  66%|██████▌   | 658/1000 [25:48<13:22,  2.35s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-27/euro-area-confidence-at-highest-in-4-years-as-ecb-mulls-stimulus


Processing URLs:  66%|██████▌   | 660/1000 [25:49<07:33,  1.33s/it]

Error extracting text from http://uisjournal.com/features/2017/02/01/illinois-proposes-a-sugar-tax-to-potentially-ease-budget-concerns/: 404 Client Error: Not Found for url: http://uisjournal.com/features/2017/02/01/illinois-proposes-a-sugar-tax-to-potentially-ease-budget-concerns/


Processing URLs:  66%|██████▌   | 662/1000 [25:56<12:43,  2.26s/it]

Error extracting text from http://38north.org/2015/09/sohae091515/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  66%|██████▋   | 663/1000 [25:57<11:35,  2.06s/it]

Error extracting text from https://tradingeconomics.com/commodity/eu-natural-gas: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/eu-natural-gas


Processing URLs:  66%|██████▋   | 664/1000 [26:59<1:44:30, 18.66s/it]

Error extracting text from https://dfat.gov.au/trade/agreements/rcep/news/Documents/rcep-joint-leaders-statement-8-september-2016.pdf: HTTPSConnectionPool(host='www.dfat.gov.au', port=443): Read timed out. (read timeout=60)


Processing URLs:  66%|██████▋   | 665/1000 [26:59<1:15:33, 13.53s/it]

Error extracting text from http://www.wsj.com/article_email/shows-of-strength-from-trump-and-putin-1444347565-lMyQjAxMTE1NzA4OTUwMzk0Wj: 403 Client Error: Forbidden for url: https://www.wsj.com/article_email/shows-of-strength-from-trump-and-putin-1444347565-lMyQjAxMTE1NzA4OTUwMzk0Wj


Processing URLs:  67%|██████▋   | 666/1000 [27:05<1:02:08, 11.16s/it]

URL filtered: https://www.youtube.com/watch?v=9CO6M2HsoIA


Processing URLs:  67%|██████▋   | 668/1000 [27:05<34:26,  6.23s/it]  

Error extracting text from http://davemarash.com/2016/03/02/march-2-2016-shane-harris/: 406 Client Error: Not Acceptable for url: http://davemarash.com/2016/03/02/march-2-2016-shane-harris/


Processing URLs:  67%|██████▋   | 671/1000 [27:09<18:16,  3.33s/it]

Error extracting text from http://www.ibtimes.co.uk/saudi-arabia-seeking-8bn-loan-plug-fiscal-deficit-1548606: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/saudi-arabia-seeking-8bn-loan-plug-fiscal-deficit-1548606


Processing URLs:  67%|██████▋   | 673/1000 [27:11<10:50,  1.99s/it]

URL filtered: https://www.youtube.com/watch?v=U8BWBn26bX0


Processing URLs:  68%|██████▊   | 675/1000 [27:11<06:17,  1.16s/it]

Error extracting text from http://www.arabnews.com/world/news/909471: 403 Client Error: Forbidden for url: https://www.arabnews.com/world/news/909471


Processing URLs:  68%|██████▊   | 678/1000 [27:15<07:33,  1.41s/it]

Error extracting text from http://rowvid.com/?v=_BgJEXQkjNQ&amp;t=71.75&amp;s=0.25: 404 Client Error: Not Found for url: http://ww1.rowvid.com


Processing URLs:  68%|██████▊   | 679/1000 [27:16<06:15,  1.17s/it]

Error extracting text from http://seekingalpha.com/article/1287761-how-often-does-s-and-p-500-have-10-percent-and-20-percent-negative-price-moves: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/1287761-how-often-does-s-and-p-500-have-10-percent-and-20-percent-negative-price-moves


Processing URLs:  68%|██████▊   | 680/1000 [27:17<06:31,  1.22s/it]

Error extracting text from http://www.citizen.co.za/1245811/malema-summoned-for-anc-coalition-talks-report/: 404 Client Error: Not Found for url: https://www.citizen.co.za/malema-summoned-for-anc-coalition-talks-report/


Processing URLs:  68%|██████▊   | 685/1000 [27:34<16:46,  3.19s/it]

Error extracting text from https://www.un.org/press/en/2021/db210326.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2021/db210326.doc.htm


Processing URLs:  69%|██████▊   | 687/1000 [27:50<27:35,  5.29s/it]

Error extracting text from https://gohofstra.com/news/2021/8/9/mens-lacrosse-mlax-jon-cooper-named-head-coach-of-team-canada-for-2022-winter-olympics.aspx: 404 Client Error: Not Found for url: https://gohofstra.com/news/2021/8/9/mens-lacrosse-mlax-jon-cooper-named-head-coach-of-team-canada-for-2022-winter-olympics.aspx


Processing URLs:  69%|██████▉   | 689/1000 [27:51<15:38,  3.02s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/ia/iowa_republican_presidential_caucus-3194.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/ia/iowa_republican_presidential_caucus-3194.html


Processing URLs:  69%|██████▉   | 691/1000 [27:54<10:28,  2.03s/it]

Error extracting text from https://webforms.ey.com/Publication/vwLUAssets/EY_Artikel_-_Effective_IFRS_conversion_for_an_IPO/$FILE/EY-Effective-IFRS-conversion-for-an-IPO-11-2013.pdf: HTTPSConnectionPool(host='webforms.ey.com', port=443): Max retries exceeded with url: /Publication/vwLUAssets/EY_Artikel_-_Effective_IFRS_conversion_for_an_IPO/$FILE/EY-Effective-IFRS-conversion-for-an-IPO-11-2013.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ffa51400>: Failed to resolve 'webforms.ey.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  69%|██████▉   | 692/1000 [27:54<08:18,  1.62s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1AQ2QQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1AQ2QQ


Processing URLs:  69%|██████▉   | 693/1000 [27:56<08:35,  1.68s/it]

Error extracting text from https://www.iafrikan.com/2017/11/06/iox-cable-ltd-to-provide-first-subsea-route-between-u-s-and-india-via-brazil-and-south-africa/: 404 Client Error: Not Found for url: https://iafrikan.com/2017/11/06/iox-cable-ltd-to-provide-first-subsea-route-between-u-s-and-india-via-brazil-and-south-africa/


Processing URLs:  70%|██████▉   | 697/1000 [28:02<07:23,  1.46s/it]

Error extracting text from http://warontherocks.com/2015/12/the-seven-deadly-sins-of-russia-analysis/: 403 Client Error: Forbidden for url: http://warontherocks.com/2015/12/the-seven-deadly-sins-of-russia-analysis/


Processing URLs:  70%|███████   | 705/1000 [28:14<06:12,  1.26s/it]

URL filtered: https://www.youtube.com/watch?v=vHLKFSWzImk


Processing URLs:  71%|███████   | 710/1000 [28:22<08:01,  1.66s/it]

Error extracting text from http://www.latimes.com/business/la-fi-spacex-investigation-20160909-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-spacex-investigation-20160909-snap-story.html


Processing URLs:  71%|███████   | 711/1000 [28:24<08:41,  1.80s/it]

Error extracting text from http://www.meddeviceonline.com/doc/insulet-enrolls-first-patients-omnipod-artificial-pancreas-system-0001: 410 Client Error: Gone for url: https://www.meddeviceonline.com/doc/insulet-enrolls-first-patients-omnipod-artificial-pancreas-system-0001


Processing URLs:  71%|███████▏  | 714/1000 [28:27<05:42,  1.20s/it]

Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BS1P9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BS1P9


Processing URLs:  72%|███████▏  | 717/1000 [29:32<1:29:56, 19.07s/it]

Error extracting text from http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  72%|███████▏  | 718/1000 [29:34<1:05:47, 14.00s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-10-23/carson-surges-past-trump-in-latest-bloomberg-politics-des-moines-register-iowa-poll


Processing URLs:  72%|███████▏  | 724/1000 [29:40<13:24,  2.91s/it]  

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-usa-idUSKCN0UT201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-usa-idUSKCN0UT201


Processing URLs:  72%|███████▎  | 725/1000 [29:41<10:05,  2.20s/it]

Error extracting text from http://www.wsj.com/articles/germanys-angela-merkel-becomes-unexpected-greek-ally-in-migrant-crisis-1456773578: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germanys-angela-merkel-becomes-unexpected-greek-ally-in-migrant-crisis-1456773578


Processing URLs:  73%|███████▎  | 727/1000 [29:43<07:19,  1.61s/it]

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-alabama-senate-race-poll-20171202-story,amp.html: 410 Client Error: Gone for url: https://www.chicagotribune.com/news/nationworld/politics/ct-alabama-senate-race-poll-20171202-story,amp.html


Processing URLs:  73%|███████▎  | 730/1000 [29:47<06:44,  1.50s/it]

Error extracting text from https://www.fpri.org/article/2020/01/a-faint-breeze-of-change-malaysias-relations-with-china/: 403 Client Error: Forbidden for url: https://www.fpri.org/article/2020/01/a-faint-breeze-of-change-malaysias-relations-with-china/


Processing URLs:  73%|███████▎  | 734/1000 [29:55<07:31,  1.70s/it]

Error extracting text from http://archive.fortune.com/magazines/fortune/fortune_archive/2006/07/10/8380798/index.htm: 404 Client Error: Not Found for url: https://archive.fortune.com/magazines/fortune/fortune_archive/2006/07/10/8380798/index.htm


Processing URLs:  74%|███████▎  | 735/1000 [29:57<07:53,  1.79s/it]

Error extracting text from http://af.reuters.com/article/energyOilNews/idAFE6N10I06120151001: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  74%|███████▎  | 737/1000 [31:06<1:32:31, 21.11s/it]

Error extracting text from http://www.ncnn.com/edit-news/9892-elon-poll-support-slips-for-hb2-mccrory-leads-cooper: HTTPConnectionPool(host='www.ncnn.com', port=80): Max retries exceeded with url: /edit-news/9892-elon-poll-support-slips-for-hb2-mccrory-leads-cooper (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ff0bbd70>, 'Connection to www.ncnn.com timed out. (connect timeout=60)'))
URL filtered: https://twitter.com/iraqi_day/status/799590921792692224


Processing URLs:  74%|███████▍  | 739/1000 [31:08<51:32, 11.85s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-05-13/venezuelan-economy-czar-says-more-import-cuts-coming-to-pay-debt


Processing URLs:  74%|███████▍  | 743/1000 [31:11<20:34,  4.80s/it]

Error extracting text from http://english.yonhapnews.co.kr/business/2016/01/25/37/0503000000AEN20160125004900320F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  74%|███████▍  | 745/1000 [31:12<12:46,  3.00s/it]

Error extracting text from http://www.reuters.com/article/us-china-party-idUSKCN0ZH3Q2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-party-idUSKCN0ZH3Q2


Processing URLs:  75%|███████▍  | 748/1000 [31:15<08:00,  1.91s/it]

Error extracting text from http://www.latimes.com/world/la-fg-air-war-syria-20161028-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-air-war-syria-20161028-story.html


Processing URLs:  75%|███████▍  | 749/1000 [31:15<06:27,  1.54s/it]

Error extracting text from https://pca-cpa.org/en/news/pca-press-release-the-south-china-sea-arbitration-the-republic-of-the-philippines-v-the-peoples-republic-of-china/: 403 Client Error: Forbidden for url: https://pca-cpa.org/en/news/pca-press-release-the-south-china-sea-arbitration-the-republic-of-the-philippines-v-the-peoples-republic-of-china/


Processing URLs:  75%|███████▌  | 752/1000 [31:19<05:34,  1.35s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-syria-assad-idUSKCN0WX133: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-syria-assad-idUSKCN0WX133


Processing URLs:  75%|███████▌  | 754/1000 [31:24<07:56,  1.94s/it]

Error extracting text from http://www.wsj.com/articles/russian-special-forces-seen-as-key-to-aleppo-victory-1481884200?cx_campaign=poptart&amp;mod=cx_poptart#cxrecs_s: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russian-special-forces-seen-as-key-to-aleppo-victory-1481884200?cx_campaign=poptart&amp;mod=cx_poptart#cxrecs_s


Processing URLs:  76%|███████▌  | 755/1000 [31:26<07:00,  1.72s/it]

Error extracting text from https://definitions.uslegal.com/c/chaptered/: 405 Client Error: Not Allowed for url: https://definitions.uslegal.com/c/chaptered/


Processing URLs:  76%|███████▌  | 758/1000 [31:29<04:47,  1.19s/it]

Error extracting text from https://www.reuters.com/article/uk-russia-putin-health/uk-media-report-that-putin-is-ill-and-poised-to-quit-is-nonsense-says-kremlin-idUKKBN27M17H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-russia-putin-health/uk-media-report-that-putin-is-ill-and-poised-to-quit-is-nonsense-says-kremlin-idUKKBN27M17H


Processing URLs:  76%|███████▌  | 761/1000 [31:36<08:16,  2.08s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/841043/dod-cyber-strategy-defines-how-officials-discern-cyber-incidents-from-armed-att: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/841043/dod-cyber-strategy-defines-how-officials-discern-cyber-incidents-from-armed-att


Processing URLs:  76%|███████▌  | 762/1000 [31:36<06:27,  1.63s/it]

Error extracting text from http://www.wsj.com/articles/brazilian-police-search-home-of-dilma-rousseffs-campaign-strategist-1456150375: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazilian-police-search-home-of-dilma-rousseffs-campaign-strategist-1456150375


Processing URLs:  76%|███████▋  | 764/1000 [31:37<04:27,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15N2HP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15N2HP


Processing URLs:  76%|███████▋  | 765/1000 [31:40<05:47,  1.48s/it]

Error extracting text from https://www.nytimes.com/2021/02/05/us/politics/fbi-intelligence-domestic-terrorism.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/05/us/politics/fbi-intelligence-domestic-terrorism.html


Processing URLs:  77%|███████▋  | 768/1000 [31:41<03:26,  1.12it/s]

Error extracting text from https://www.reuters.com/article/uk-russia-putin-health-idUKKBN27M17H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-russia-putin-health-idUKKBN27M17H


Processing URLs:  77%|███████▋  | 773/1000 [31:49<05:31,  1.46s/it]

Error extracting text from https://daplpipelinefacts.com/about-the-dakota-access-pipeline/: 404 Client Error: Not Found for url: https://daplpipelinefacts.com/about-the-dakota-access-pipeline/


Processing URLs:  77%|███████▋  | 774/1000 [31:50<04:51,  1.29s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/Economy/Central-bank-rules-out-capital-control-to-support-ringgit: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/Economy/Central-bank-rules-out-capital-control-to-support-ringgit


Processing URLs:  78%|███████▊  | 781/1000 [31:59<04:13,  1.16s/it]

Error extracting text from http://www.securitycouncilreport.org/un-documents/south-sudan/: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/un-documents/south-sudan/


Processing URLs:  78%|███████▊  | 782/1000 [31:59<03:11,  1.14it/s]

Error extracting text from https://www.timesofisrael.com/israeli-officials-said-to-push-calmer-waters-with-iran-after-alleged-ship-blasts/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/israeli-officials-said-to-push-calmer-waters-with-iran-after-alleged-ship-blasts/


Processing URLs:  78%|███████▊  | 785/1000 [33:03<1:08:08, 19.02s/it]

Error extracting text from https://www.betfair.com/exchange/plus/politics/market/1.128390571: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/plus/politics/market/1.128390571 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3051d98b0>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))


Processing URLs:  79%|███████▉  | 788/1000 [33:06<25:03,  7.09s/it]  

Error extracting text from https://www.nytimes.com/2017/06/12/opinion/a-fierce-famine-stalks-africa.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/12/opinion/a-fierce-famine-stalks-africa.html


Processing URLs:  79%|███████▉  | 789/1000 [33:07<17:43,  5.04s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-araq-idUSKCN0UP1Y120160111: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-araq-idUSKCN0UP1Y120160111


Processing URLs:  79%|███████▉  | 790/1000 [33:08<13:39,  3.90s/it]

URL filtered: https://www.youtube.com/watch?v=KIiQvCNzU7s


Processing URLs:  79%|███████▉  | 793/1000 [33:08<05:49,  1.69s/it]

Error extracting text from http://www.wsj.com/articles/u-n-adopts-new-sanctions-against-north-korea-1456934616: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-n-adopts-new-sanctions-against-north-korea-1456934616
Error extracting text from https://www.reuters.com/markets/stocks/biontech-starts-work-omicron-specific-vaccine-2021-11-29/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/markets/stocks/biontech-starts-work-omicron-specific-vaccine-2021-11-29/


Processing URLs:  80%|████████  | 801/1000 [33:18<03:42,  1.12s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/265421-trump-leads-cruz-by-two-points-in-new-iowa-poll: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/265421-trump-leads-cruz-by-two-points-in-new-iowa-poll/


Processing URLs:  80%|████████  | 805/1000 [33:25<05:13,  1.61s/it]

Error extracting text from http://blog.whatscotlandthinks.org/2017/02/on-sampling-error/: HTTPSConnectionPool(host='blog.whatscotlandthinks.org', port=443): Max retries exceeded with url: /2017/02/on-sampling-error/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'blog.whatscotlandthinks.org'. (_ssl.c:1000)")))


Processing URLs:  81%|████████  | 808/1000 [33:30<05:09,  1.61s/it]

Error extracting text from https://finance.yahoo.com/news/zuma-succession-fight-may-spur-220001592.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/zuma-succession-fight-may-spur-220001592.html


Processing URLs:  81%|████████  | 810/1000 [33:33<04:55,  1.56s/it]

Error extracting text from http://www.colorado.edu/AmStudies/lewis/ecology/rolecreditagencies.pdf: 404 Client Error: Not Found for url: https://www.colorado.edu/AmStudies/lewis/ecology/rolecreditagencies.pdf


Processing URLs:  81%|████████▏ | 814/1000 [33:38<03:36,  1.17s/it]

Error extracting text from https://finance.yahoo.com/news/ameren-misses-4q-profit-forecasts-131449130.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/ameren-misses-4q-profit-forecasts-131449130.html


Processing URLs:  82%|████████▏ | 816/1000 [33:40<02:55,  1.05it/s]

Error extracting text from http://english.alarabiya.net/en/views/news/middle-east/2016/06/21/Hezbollah-s-hot-summer-between-Aleppo-s-battle-and-banking-wars-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/views/news/middle-east/2016/06/21/Hezbollah-s-hot-summer-between-Aleppo-s-battle-and-banking-wars-.html


Processing URLs:  82%|████████▏ | 822/1000 [34:52<58:15, 19.64s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-09-12/factbox-automakers-get-serious-about-electric-cars: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  82%|████████▎ | 825/1000 [34:54<22:41,  7.78s/it]

Error extracting text from http://m.screendaily.com/5103203.article: HTTPConnectionPool(host='m.screendaily.com', port=80): Max retries exceeded with url: /5103203.article (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300db2150>: Failed to resolve 'm.screendaily.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-usa-obamacare-brady-idUSKBN16R2F9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-brady-idUSKBN16R2F9


Processing URLs:  83%|████████▎ | 833/1000 [35:19<10:39,  3.83s/it]

Error extracting text from http://www.un.org/press/en/2016/sc12234.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2016/sc12234.doc.htm


Processing URLs:  84%|████████▍ | 839/1000 [35:42<08:27,  3.15s/it]

Error extracting text from http://www.channelnewsasia.com/news/business/weak-japan-data-supplies/2298872.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/business/weak-japan-data-supplies/2298872.html


Processing URLs:  84%|████████▍ | 842/1000 [35:45<04:14,  1.61s/it]

URL filtered: https://www.youtube.com/watch?v=gHjronXyeqI


Processing URLs:  85%|████████▍ | 848/1000 [36:11<15:55,  6.28s/it]

Error extracting text from http://www.investopedia.com/ask/answers/032515/how-does-price-oil-affect-venezuelas-economy.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/ask/answers/032515/how-does-price-oil-affect-venezuelas-economy.asp


Processing URLs:  85%|████████▌ | 853/1000 [36:22<06:51,  2.80s/it]

Error extracting text from https://nation.com.pk/20-Jan-2021/together-for-peace: 503 Server Error: Backend fetch failed for url: https://www.nation.com.pk/20-Jan-2021/together-for-peace


Processing URLs:  86%|████████▌ | 860/1000 [36:27<01:42,  1.37it/s]

Error extracting text from http://www.chron.com/news/article/A-third-of-SC-superdelegates-voice-support-for-6629774.php: 403 Client Error: Forbidden for url: https://www.chron.com/news/article/A-third-of-SC-superdelegates-voice-support-for-6629774.php
Error extracting text from https://www.reuters.com/article/us-usa-cyber-russia-microfocus/uk-tech-firm-micro-focus-to-curb-code-reviews-by-high-risk-governments-idUSKBN1CE2M6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-microfocus/uk-tech-firm-micro-focus-to-curb-code-reviews-by-high-risk-governments-idUSKBN1CE2M6


Processing URLs:  86%|████████▋ | 865/1000 [36:31<01:32,  1.46it/s]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-top-court-rules-parliament-failed-to-hold-zuma-to-account-over-scandal-idUSKBN1EN0JO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-top-court-rules-parliament-failed-to-hold-zuma-to-account-over-scandal-idUSKBN1EN0JO
URL filtered: https://www.bloomberg.com/news/articles/2016-10-20/where-the-next-crisis-will-come-from


Processing URLs:  87%|████████▋ | 867/1000 [36:34<02:07,  1.05it/s]

Error extracting text from https://www.sandvine.com/downloads/general/global-internet-phenomena/2015/encrypted-internet-traffic.pdf: 404 Client Error: Not Found for url: https://www.sandvine.com/hubfs/downloads/archive/global-internet-phenomena-spotlight-encrypted-internet-traffic.pdf


Processing URLs:  87%|████████▋ | 870/1000 [36:42<04:17,  1.98s/it]

URL filtered: http://www.bbc.com/news/world-middle-east-36557092?ns_mchannel=social&amp;ns_campaign=bbc_breaking&amp;ns_source=twitter&amp;ns_linkname=news_centra


Processing URLs:  87%|████████▋ | 874/1000 [36:49<03:27,  1.65s/it]

URL filtered: https://twitter.com/ClayGraubard


Processing URLs:  88%|████████▊ | 877/1000 [36:50<02:10,  1.06s/it]

Error extracting text from https://www.japantimes.co.jp/news/2021/02/01/national/japan-china-coast-guard-law-senkakus/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2021/02/01/national/japan-china-coast-guard-law-senkakus/


Processing URLs:  88%|████████▊ | 879/1000 [36:51<01:25,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-burundi-rwanda-congodemocratic-un-idUSKCN0Y4013: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-rwanda-congodemocratic-un-idUSKCN0Y4013


Processing URLs:  88%|████████▊ | 880/1000 [36:53<01:56,  1.03it/s]

Error extracting text from http://www.wsj.com/articles/islamic-state-uses-syrias-biggest-dam-as-rampart-and-potential-weapon-1453333531: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/islamic-state-uses-syrias-biggest-dam-as-rampart-and-potential-weapon-1453333531


Processing URLs:  88%|████████▊ | 882/1000 [36:56<02:11,  1.12s/it]

Error extracting text from http://ir.avisbudgetgroup.com/releasedetail.cfm?ReleaseID=1031344: 403 Client Error: Forbidden for url: http://ir.avisbudgetgroup.com/releasedetail.cfm?ReleaseID=1031344
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.jb.com.br/pais/noticias/2016/03/19/datafolha-maioria-quer-impeachment-mas-rejeita-michel-temer-governando/&amp;usg=ALkJrhhRT7N6q-5_fz8x6o7f_cMcLi_oow: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.jb.com.br/pais/noticias/2016/03/19/datafolha-maioria-quer-impeachment-mas-rejeita-michel-temer-governando/&amp;usg=ALkJrhhRT7N6q-5_fz8x6o7f_cMcLi_oow


Processing URLs:  89%|████████▊ | 886/1000 [36:59<01:48,  1.05it/s]

URL filtered: https://worldview.stratfor.com/situation-report/venezuela-creditor-requests-default-ruling-bond-payments?utm_source=Twitter&amp;utm_medium=social&amp;utm_campaign=article


Processing URLs:  89%|████████▉ | 890/1000 [37:02<01:34,  1.16it/s]

Error extracting text from http://www.nasdaq.com/article/eu-to-resolve-google-antitrust-cases-in-the-next-few-months-cm793420#ixzz4i7s374hS: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/eu-to-resolve-google-antitrust-cases-in-the-next-few-months-cm793420#ixzz4i7s374hS


Processing URLs:  89%|████████▉ | 893/1000 [37:05<01:34,  1.13it/s]

Error extracting text from https://www.nytimes.com/live/2021/01/27/world/covid-19-coronavirus#tomorrow-our-fridges-will-be-empty-one-european-health-official-says-as-vaccines-dwindle: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/01/27/world/covid-19-coronavirus#tomorrow-our-fridges-will-be-empty-one-european-health-official-says-as-vaccines-dwindle
Error extracting text from http://zoepionierin.de/renault-zoe-ein-fazit/: HTTPConnectionPool(host='zoepionierin.de', port=80): Max retries exceeded with url: /renault-zoe-ein-fazit/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30451c5c0>: Failed to resolve 'zoepionierin.de' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|████████▉ | 896/1000 [37:08<01:33,  1.12it/s]

Error extracting text from http://www.cncda.org/CMS/Pubs/Cal%20Covering%204Q%2015.pdf: 403 Client Error: Forbidden for url: http://www.cncda.org/CMS/Pubs/Cal%20Covering%204Q%2015.pdf


Processing URLs:  90%|████████▉ | 899/1000 [37:12<01:53,  1.12s/it]

Error extracting text from http://thehill.com/homenews/administration/343487-poll-only-1-in-4-think-trump-will-serve-entire-term: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/343487-poll-only-1-in-4-think-trump-will-serve-entire-term/


Processing URLs:  90%|█████████ | 901/1000 [37:17<02:50,  1.72s/it]

Error extracting text from http://ewn.co.za/2016/09/05/ANCYL-ANCWL-and-MKMVA-rally-behind-Zuma: 404 Client Error: Not Found for url: https://www.ewn.co.za/2016/09/05/ANCYL-ANCWL-and-MKMVA-rally-behind-Zuma


Processing URLs:  90%|█████████ | 903/1000 [37:18<01:41,  1.04s/it]

Error extracting text from https://www.opendemocracy.net/5050/anne-marie-goetz/still-no-country-for-women-double-standards-choosing-next-UN-Secretary-General: 403 Client Error: Forbidden for url: https://www.opendemocracy.net/5050/anne-marie-goetz/still-no-country-for-women-double-standards-choosing-next-UN-Secretary-General


Processing URLs:  91%|█████████ | 906/1000 [37:22<02:05,  1.34s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-01-24/scotland-edges-closer-to-independence-vote-after-brexit-ruling


Processing URLs:  91%|█████████ | 909/1000 [37:23<01:03,  1.43it/s]

Error extracting text from http://www.worldtribune.com/iraqi-forces-advance-toward-airbase-near-mosul/: 403 Client Error: Forbidden for url: http://www.worldtribune.com/iraqi-forces-advance-toward-airbase-near-mosul/


Processing URLs:  91%|█████████ | 910/1000 [37:24<00:56,  1.59it/s]

Error extracting text from http://theconversation.com/spain-is-a-third-election-in-a-year-on-the-horizon-63681: 403 Client Error: Forbidden for url: http://theconversation.com/spain-is-a-third-election-in-a-year-on-the-horizon-63681


Processing URLs:  91%|█████████▏| 913/1000 [38:28<26:25, 18.22s/it]

Error extracting text from https://www.teamusa.org/-/media/TeamUSA/SafeSport/Documents/USOPC-Commitment-to-US-Olympic-and-Paralympic-Community_July-2019.pdf: HTTPSConnectionPool(host='www.teamusa.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  92%|█████████▏| 915/1000 [38:31<13:57,  9.85s/it]

Error extracting text from https://www.wsj.com/articles/after-navalny-challenging-russias-putin-is-getting-even-harder-11627464602: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/after-navalny-challenging-russias-putin-is-getting-even-harder-11627464602


Processing URLs:  92%|█████████▏| 919/1000 [38:35<04:04,  3.01s/it]

Error extracting text from http://www.reuters.com/article/us-eu-usa-ttip-idUSKCN12I2FZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-ttip-idUSKCN12I2FZ?il=0


Processing URLs:  92%|█████████▏| 920/1000 [38:37<03:39,  2.74s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-02-26/art-of-the-trade-deal-china-championed-pact-faces-tricky-talks


Processing URLs:  92%|█████████▏| 924/1000 [38:39<01:31,  1.20s/it]

URL filtered: https://www.youtube.com/watch?v=AJX60CSED1s
Error extracting text from https://www.irrawaddy.com/news/burma/myanmar-asks-help-fight-h1n1-virus.html: 403 Client Error: Forbidden for url: https://www.irrawaddy.com/news/burma/myanmar-asks-help-fight-h1n1-virus.html


Processing URLs:  93%|█████████▎| 926/1000 [38:41<01:19,  1.07s/it]

Error extracting text from https://www.fastcompany.com/40514189/intel-new-chip-aims-for-quantum-supremacy: 403 Client Error: Forbidden for url: https://www.fastcompany.com/40514189/intel-new-chip-aims-for-quantum-supremacy


Processing URLs:  93%|█████████▎| 929/1000 [38:45<01:24,  1.19s/it]

Error extracting text from http://www.armed-services.senate.gov/imo/media/doc/Clapper_02-09-16.pdf: 403 Client Error: Forbidden for url: http://www.armed-services.senate.gov/imo/media/doc/Clapper_02-09-16.pdf


Processing URLs:  93%|█████████▎| 931/1000 [38:47<01:19,  1.15s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN14H0WE?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN14H0WE?il=0


Processing URLs:  93%|█████████▎| 933/1000 [38:49<01:08,  1.02s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-economy-idUSKCN1AX2EH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-idUSKCN1AX2EH


Processing URLs:  93%|█████████▎| 934/1000 [38:51<01:16,  1.16s/it]

URL filtered: https://twitter.com/dylanmatt


Processing URLs:  94%|█████████▎| 937/1000 [38:54<01:13,  1.17s/it]

Error extracting text from http://www.lokmarg.com/ban-ki-moon-returns-to-s-korea-poised-for-presidential-bid/: 404 Client Error: Not Found for url: https://lokmarg.com/ban-ki-moon-returns-to-s-korea-poised-for-presidential-bid/


Processing URLs:  94%|█████████▍| 938/1000 [38:55<01:06,  1.08s/it]

Error extracting text from https://finance.yahoo.com/news/upper-house-russian-parliament-ratifies-111144358.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/upper-house-russian-parliament-ratifies-111144358.html


Processing URLs:  94%|█████████▍| 944/1000 [39:04<01:19,  1.42s/it]

Error extracting text from http://www.cfr.org/migration/europes-migration-crisis/p32874: 404 Client Error: Not Found for url: https://www.cfr.org/migration/europes-migration-crisis/p32874


Processing URLs:  94%|█████████▍| 945/1000 [39:04<01:04,  1.16s/it]

URL filtered: https://twitter.com/ThomasErdbrink/status/687158033844203520


Processing URLs:  95%|█████████▍| 947/1000 [39:05<00:45,  1.16it/s]

Error extracting text from https://www.fpri.org/article/2016/07/everything-old-new-russia-returns-nicaragua/: 403 Client Error: Forbidden for url: https://www.fpri.org/article/2016/07/everything-old-new-russia-returns-nicaragua/


Processing URLs:  95%|█████████▌| 950/1000 [39:10<00:54,  1.09s/it]

Error extracting text from https://larswericson.wordpress.com/2016/04/01/gitrep-31mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/01/gitrep-31mar16pm/


Processing URLs:  95%|█████████▌| 951/1000 [39:10<00:42,  1.16it/s]

Error extracting text from http://petroglobalnews24.com/2017-03-07-goldman-sachs-group-inc-the-reaffirms-hold-rating-for-tesla-motors-inc-tsla/: 404 Client Error: Not Found for url: http://petroglobalnews24.com/2017-03-07-goldman-sachs-group-inc-the-reaffirms-hold-rating-for-tesla-motors-inc-tsla/


Processing URLs:  95%|█████████▌| 952/1000 [39:11<00:42,  1.13it/s]

URL filtered: https://www.linkedin.com/in/larswe/
URL filtered: http://venturebeat.com/2016/06/30/facebook-messenger-now-has-11000-chatbots-for-you-to-try/


Processing URLs:  97%|█████████▋| 966/1000 [39:27<00:40,  1.18s/it]

Error extracting text from http://www.realcleardefense.com/articles/2016/10/31/iraqi_special_forces_dawn_assault_on_eastern_mosul_110285.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/10/31/iraqi_special_forces_dawn_assault_on_eastern_mosul_110285.html


Processing URLs:  97%|█████████▋| 968/1000 [39:30<00:35,  1.10s/it]

Error extracting text from http://www.nytimes.com/2015/10/03/business/economy/jobs-report-hiring-unemployment-wages-fed-rates.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/03/business/economy/jobs-report-hiring-unemployment-wages-fed-rates.html?_r=0


Processing URLs:  97%|█████████▋| 970/1000 [39:31<00:21,  1.38it/s]

Error extracting text from http://www.nrttv.com/en/birura-details.aspx?Jimare=3834: 403 Client Error: Forbidden for url: https://www.nrttv.com/en/birura-details.aspx?Jimare=3834


Processing URLs:  97%|█████████▋| 972/1000 [39:33<00:24,  1.14it/s]

Error extracting text from http://business.financialpost.com/news/fp-street/if-a-retired-birdwatcher-can-be-a-superforecaster-maybe-you-can-too-heres-how: 403 Client Error: Forbidden for url: https://financialpost.com/news/fp-street/if-a-retired-birdwatcher-can-be-a-superforecaster-maybe-you-can-too-heres-how


Processing URLs:  98%|█████████▊| 977/1000 [39:40<00:29,  1.29s/it]

Error extracting text from https://www.sofx.com/welcome-to-sofx/: 403 Client Error: Forbidden for url: https://www.sofx.com/welcome-to-sofx/


Processing URLs:  99%|█████████▊| 986/1000 [39:52<00:17,  1.24s/it]

Error extracting text from http://science.nasa.gov/missions/ace/: 404 Client Error: Page not found: /missions/ace/ for url: https://science.nasa.gov/missions/ace/


Processing URLs:  99%|█████████▉| 989/1000 [40:07<00:31,  2.90s/it]

Error extracting text from http://www.parl.ca/legisinfo/Home.aspx?BillStatus=RoyalAssentGiven&amp;Page=1&amp;Language=E&amp;download=xml: 404 Client Error: Not Found for url: https://www.parl.ca/ErrorPage/Default.aspx?Url=https%3a%2f%2fwww.parl.ca%2flegisinfo%2fHome.aspx%3fBillStatus%3dRoyalAssentGiven%26amp%3bPage%3d1%26amp%3bLanguage%3dE%26amp%3bdownload%3dxml&StatusCode=404


Processing URLs:  99%|█████████▉| 990/1000 [40:08<00:25,  2.52s/it]

URL filtered: https://www.youtube.com/watch?v=PIQSEq6tEVs#t=287


Processing URLs:  99%|█████████▉| 993/1000 [40:09<00:08,  1.25s/it]

Error extracting text from http://www.tribuneindia.com/news/sunday-special/people/-buddha-back-from-traffic-jam/236921.html: 403 Client Error: Forbidden for url: http://www.tribuneindia.com/news/sunday-special/people/-buddha-back-from-traffic-jam/236921.html
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/kognitivnoe-modelirovanie-dlya-resheniya-zadach-upravleniya-slabostrukturirovannymi-sistemami-situatsiyami&amp;usg=ALkJrhgT0K0paNVdA8G1RDYs-0h55JdHWg: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/kognitivnoe-modelirovanie-dlya-resheniya-zadach-upravleniya-slabostrukturirovannymi-sistemami-situatsiyami&amp;usg=ALkJrhgT0K0paNVdA8G1RDYs-0h55JdHWg


Processing URLs: 100%|█████████▉| 995/1000 [40:10<00:04,  1.22it/s]

Error extracting text from http://www.reuters.com/article/2015/09/25/us-usa-election-biden-exclusive-idUSKCN0RP24Q20150925: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/25/us-usa-election-biden-exclusive-idUSKCN0RP24Q20150925
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O0UEKB6S972F01-0N87UFAMS8TBL8O2J6Q4DBES0B


Processing URLs: 100%|█████████▉| 999/1000 [40:32<00:03,  3.67s/it]

Error extracting text from http://eng.mod.gov.cn/DefenseNews/2016-01/27/content_4637896.htm: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/DefenseNews/2016-01/27/content_4637896.htm


Processing URLs: 100%|██████████| 1000/1000 [40:33<00:00,  2.43s/it]
Processing URLs:   0%|          | 1/1000 [00:00<02:24,  6.91it/s]

Error extracting text from http://www.reuters.com/article/usa-economy-growth-idUSKCN0YU1ZH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/usa-economy-growth-idUSKCN0YU1ZH


Processing URLs:   0%|          | 2/1000 [00:00<06:21,  2.62it/s]

Error extracting text from https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/60777ec3bc2c737f0cdaafae/1618444011978/REINZ+Monthly+Property+Report+-+March+2021.pdf: 403 Client Error: Forbidden for url: https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/60777ec3bc2c737f0cdaafae/1618444011978/REINZ+Monthly+Property+Report+-+March+2021.pdf


Processing URLs:   0%|          | 5/1000 [00:03<09:29,  1.75it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-11/putin-tells-defense-chiefs-to-strengthen-russian-nuclear-forces
Error extracting text from https://thefly.com/landingPageNews.php?id=3063893&headline=GNW-Genworth-Oceanwide-extend-merger-agreement: HTTPSConnectionPool(host='thefly.com', port=443): Max retries exceeded with url: /landingPageNews.php?id=3063893&headline=GNW-Genworth-Oceanwide-extend-merger-agreement (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:   1%|          | 7/1000 [00:07<22:06,  1.34s/it]

Error extracting text from http://www.reuters.com/article/2015/10/27/usa-congress-exim-vote-idUSL1N12R3GJ20151027: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/27/usa-congress-exim-vote-idUSL1N12R3GJ20151027


Processing URLs:   2%|▏         | 15/1000 [00:16<15:50,  1.04it/s]

Error extracting text from http://www.tradingeconomics.com/venezuela/indicators: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/venezuela/indicators


Processing URLs:   2%|▏         | 17/1000 [00:17<11:05,  1.48it/s]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2092.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2092.htm


Processing URLs:   2%|▏         | 22/1000 [00:23<16:57,  1.04s/it]

Error extracting text from https://www.debka.com/: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:   2%|▎         | 25/1000 [00:26<17:06,  1.05s/it]

Error extracting text from http://www.who.int/entity/influenza/human_animal_interface/Influenza_Summary_IRA_HA_interface_07_25_2017.pdf: 404 Client Error: Not Found for url: https://www.who.int/entity/influenza/human_animal_interface/Influenza_Summary_IRA_HA_interface_07_25_2017.pdf


Processing URLs:   3%|▎         | 30/1000 [00:33<18:52,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-idUSKBN13S05C?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-idUSKBN13S05C?il=0


Processing URLs:   3%|▎         | 34/1000 [00:42<27:50,  1.73s/it]

Error extracting text from http://www.nytimes.com/1956/12/30/archives/cold-fusion-of-hydrogen-atoms-a-fourth-method-pulling-together.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1956/12/30/archives/cold-fusion-of-hydrogen-atoms-a-fourth-method-pulling-together.html


Processing URLs:   4%|▍         | 42/1000 [00:56<22:16,  1.39s/it]

Error extracting text from http://www.nytimes.com/2016/04/10/world/asia/john-kerry-arrives-in-afghanistan-with-message-of-support.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/10/world/asia/john-kerry-arrives-in-afghanistan-with-message-of-support.html


Processing URLs:   4%|▍         | 44/1000 [01:02<30:43,  1.93s/it]

Error extracting text from http://www.extremetech.com/internet/232325-china-bans-internet-companies-from-reporting-news-tightens-grip-on-media: 403 Client Error: Forbidden for url: http://www.extremetech.com/internet/232325-china-bans-internet-companies-from-reporting-news-tightens-grip-on-media
Error extracting text from http://blogs.wsj.com/washwire/2015/12/07/ted-cruz-surpasses-donald-trump-in-new-poll-of-iowa-republicans/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/12/07/ted-cruz-surpasses-donald-trump-in-new-poll-of-iowa-republicans/


Processing URLs:   5%|▍         | 47/1000 [01:08<30:06,  1.90s/it]

Error extracting text from http://www.atimes.com/att-ceo-stephenson-takes-alphabet-time-warner-bid/: 404 Client Error: Not Found for url: https://atimes.com/att-ceo-stephenson-takes-alphabet-time-warner-bid/
URL filtered: https://www.bloomberg.com/politics/articles/2017-03-27/scottish-parliament-to-back-independence-vote-in-brexit-defiance


Processing URLs:   5%|▌         | 53/1000 [01:14<18:08,  1.15s/it]

Error extracting text from http://www.wsj.com/articles/in-terror-fight-france-feels-increasingly-isolated-1448065952: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-terror-fight-france-feels-increasingly-isolated-1448065952


Processing URLs:   6%|▌         | 58/1000 [01:22<22:58,  1.46s/it]

Error extracting text from http://www.absaconference.org/pdf56/II930Byers.pdf: 403 Client Error: Forbidden for url: https://absaconference.org/pdf56/II930Byers.pdf


Processing URLs:   6%|▌         | 60/1000 [01:24<18:05,  1.15s/it]

Error extracting text from http://www.demconvention.com/wp-content/uploads/2016/07/Democratic-Party-Platform-7.21.16-no-lines.pdf: 404 Client Error: Not Found for url: https://www.demconvention.com/wp-content/uploads/2016/07/Democratic-Party-Platform-7.21.16-no-lines.pdf
Error extracting text from http://www.nytimes.com/2016/04/14/world/europe/russian-plane-us-ship-baltic-sea.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/14/world/europe/russian-plane-us-ship-baltic-sea.html


Processing URLs:   6%|▌         | 61/1000 [01:24<14:04,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/08/28/world/middleeast/saudi-yemen-united-nations-guterres-child-deaths.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/28/world/middleeast/saudi-yemen-united-nations-guterres-child-deaths.html?_r=0


Processing URLs:   6%|▋         | 65/1000 [01:27<10:44,  1.45it/s]

Error extracting text from http://www.wsj.com/articles/u-s-jet-flies-over-waters-claimed-by-china-1450466358: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-jet-flies-over-waters-claimed-by-china-1450466358
Error extracting text from http://www.reuters.com/article/us-britain-eu-poll-idUSKCN0YR0WB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-poll-idUSKCN0YR0WB


Processing URLs:   7%|▋         | 68/1000 [01:30<15:44,  1.01s/it]

Error extracting text from https://www.euvlitho.com/Blogs/The%20Chip%20Choke%20Point%20-%20The%20Wire%20China.pdf: 406 Client Error: Not Acceptable for url: https://www.euvlitho.com/Blogs/The%20Chip%20Choke%20Point%20-%20The%20Wire%20China.pdf


Processing URLs:   7%|▋         | 69/1000 [01:32<16:33,  1.07s/it]

Error extracting text from http://www.politicalirish.com/threads/communist-farm-chief-aims-to-harvest-anti-putin-vote.24772/: 404 Client Error: Not Found for url: http://www.politicalirish.com/threads/communist-farm-chief-aims-to-harvest-anti-putin-vote.24772/


Processing URLs:   8%|▊         | 75/1000 [01:50<36:40,  2.38s/it]

Error extracting text from http://buenosairesherald.com/article/199083/rousseff-has-votes-to-fend-off-impeachment: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/199083/rousseff-has-votes-to-fend-off-impeachment


Processing URLs:   8%|▊         | 76/1000 [01:51<28:15,  1.84s/it]

Error extracting text from http://asiafoundation.org/in-asia/2015/11/18/looking-ahead-in-afghanistan-a-conversation-with-political-economist-timor-sharan/: 403 Client Error: Forbidden for url: http://asiafoundation.org/in-asia/2015/11/18/looking-ahead-in-afghanistan-a-conversation-with-political-economist-timor-sharan/


Processing URLs:   8%|▊         | 78/1000 [01:55<30:53,  2.01s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X708N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X708N


Processing URLs:   8%|▊         | 79/1000 [02:57<5:04:51, 19.86s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/article152136047.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   8%|▊         | 85/1000 [03:07<58:52,  3.86s/it]

Error extracting text from https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/July/REINZ%20Residential%20Press%20Release%20-%20July%202021.pdf: 404 Client Error: Not Found for url: https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/July/REINZ%20Residential%20Press%20Release%20-%20July%202021.pdf


Processing URLs:   9%|▊         | 86/1000 [03:07<42:07,  2.77s/it]

Error extracting text from http://www.balkaninsight.com/en/article/protests-won-t-stop-until-montenegro-pm-resigns-09-28-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/protests-won-t-stop-until-montenegro-pm-resigns-09-28-2015


Processing URLs:   9%|▉         | 89/1000 [03:10<22:40,  1.49s/it]

Error extracting text from http://www.reuters.com/article/us-ukraine-crisis-cyber-idUSKBN1421YT?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-crisis-cyber-idUSKBN1421YT?il=0


Processing URLs:   9%|▉         | 90/1000 [03:11<21:31,  1.42s/it]

Error extracting text from http://www.brookings.edu/research/opinions/2013/12/17-china-air-defense-identification-zone-osawa: 404 Client Error: Not Found for url: https://www.brookings.edu/articles/opinions/2013/12/17-china-air-defense-identification-zone-osawa


Processing URLs:   9%|▉         | 92/1000 [03:13<18:45,  1.24s/it]

Error extracting text from http://gadaa.net/FinfinneTribune/2016/02/habtamu-dugo-will-expressing-concern-prevent-state-led-mass-murder-in-oromia-ethiopia/: 404 Client Error: Not Found for url: https://gadaa.net/FinfinneTribune/2016/02/habtamu-dugo-will-expressing-concern-prevent-state-led-mass-murder-in-oromia-ethiopia/


Processing URLs:  10%|▉         | 97/1000 [03:19<16:19,  1.08s/it]

Error extracting text from http://thehill.com/policy/national-security/353102-trump-faces-key-deadline-on-russia-sanctions: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/353102-trump-faces-key-deadline-on-russia-sanctions/
Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/1756805355x0x637040/4E3260F0-B711-47DF-9C9E-16A8DDA99A19/Q4_12_SHL_022013_final.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/1756805355x0x637040/4E3260F0-B711-47DF-9C9E-16A8DDA99A19/Q4_12_SHL_022013_final.pdf


Processing URLs:  10%|█         | 101/1000 [03:24<19:14,  1.28s/it]

Error extracting text from https://www.gov.ca.gov/news.php?id=19515: 403 Client Error: Forbidden for url: https://www.gov.ca.gov/news.php?id=19515


Processing URLs:  10%|█         | 103/1000 [03:27<18:34,  1.24s/it]

URL filtered: https://twitter.com/Snowden/status/787324496491479040


Processing URLs:  11%|█         | 106/1000 [04:27<2:40:06, 10.75s/it]

Error extracting text from https://dc.isda.org/cds/petroleos-de-venezuela-s-a/: HTTPSConnectionPool(host='dc.isda.org', port=443): Max retries exceeded with url: /cds/petroleos-de-venezuela-s-a/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x300725d90>, 'Connection to dc.isda.org timed out. (connect timeout=60)'))
Error extracting text from http://www.nytimes.com/2016/02/28/us/politics/donald-trump-republican-party.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/28/us/politics/donald-trump-republican-party.html?_r=0


Processing URLs:  11%|█         | 107/1000 [04:27<1:59:29,  8.03s/it]

Error extracting text from https://www.reuters.com/business/us-tariff-review-considers-commodity-shortages-inflation-official-2021-05-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/us-tariff-review-considers-commodity-shortages-inflation-official-2021-05-14/


Processing URLs:  11%|█         | 112/1000 [04:30<27:12,  1.84s/it]

Error extracting text from http://www.nytimes.com/2016/02/27/business/media/sarah-kershaw-former-times-reporter-dies-at-49.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/27/business/media/sarah-kershaw-former-times-reporter-dies-at-49.html?_r=0
Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2816%2900320-2/fulltext?elsca1=etoc&amp;amp;elsca2=email&amp;amp;elsca3=0140-6736_20160220_387_10020_&amp;amp;elsca4=Public%20Health|Infectious%20Diseases|Health%20Policy|Internal%2FFamily%20Medicine|General%20Surgery|Lancet: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2816%2900320-2/fulltext?elsca1=etoc&amp;amp;elsca2=email&amp;amp;elsca3=0140-6736_20160220_387_10020_&amp;amp;elsca4=Public%20Health%7CInfectious%20Diseases%7CHealth%20Policy%7CInternal%2FFamily%20Medicine%7CGeneral%20Surgery%7CLancet


Processing URLs:  12%|█▏        | 115/1000 [04:34<23:04,  1.56s/it]

Error extracting text from https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/AHTRyQJtiRin22kth?commentId=PvhNbXPLJb4e6Yg9E: 403 Client Error: Forbidden for url: https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/AHTRyQJtiRin22kth?commentId=PvhNbXPLJb4e6Yg9E


Processing URLs:  12%|█▏        | 122/1000 [04:53<21:43,  1.49s/it]

Error extracting text from http://www.reuters.com/article/us-fiatchrysler-tesla-idUSKCN0XC26B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-fiatchrysler-tesla-idUSKCN0XC26B
Error extracting text from http://www.reuters.com/article/2015/09/30/us-usa-crude-exports-senate-idUSKCN0RU2VV20150930: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/30/us-usa-crude-exports-senate-idUSKCN0RU2VV20150930


Processing URLs:  12%|█▏        | 124/1000 [04:56<20:29,  1.40s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/erdogan-says-allies-appro/2251234.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/erdogan-says-allies-appro/2251234.html


Processing URLs:  13%|█▎        | 127/1000 [05:59<4:39:11, 19.19s/it]

Error extracting text from http://aa.com.tr/en/politics/turkish-assembly-to-renew-friendship-with-russian-duma/688225: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)
URL filtered: https://www.bloomberg.com/politics/articles/2017-03-08/brexit-talks-where-will-the-eu-red-lines-be-after-article-50


Processing URLs:  13%|█▎        | 130/1000 [06:03<2:02:53,  8.47s/it]

URL filtered: https://twitter.com/nuskowi/status/703663447695859716


Processing URLs:  13%|█▎        | 133/1000 [06:06<1:04:08,  4.44s/it]

Error extracting text from http://thehill.com/homenews/administration/312245-trump-praises-very-smart-putin: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/312245-trump-praises-very-smart-putin/


Processing URLs:  13%|█▎        | 134/1000 [06:07<52:51,  3.66s/it]

Error extracting text from http://www.heavyliftpfi.com/news/airlander-cleared-for-take-off.html: 403 Client Error: Forbidden for url: https://www.heavyliftpfi.com/news/airlander-cleared-for-take-off.html


Processing URLs:  14%|█▍        | 139/1000 [06:15<25:58,  1.81s/it]

Error extracting text from http://nationalinterest.org/feature/irans-elections-reformists-hardliners-the-%E2%80%98deep-state%E2%80%99-15212?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/irans-elections-reformists-hardliners-the-%E2%80%98deep-state%E2%80%99-15212?page=2


Processing URLs:  14%|█▍        | 143/1000 [06:21<18:16,  1.28s/it]

Error extracting text from http://thehill.com/policy/national-security/281565-feds-fight-to-prevent-clinton-deposition-in-email-case: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/281565-feds-fight-to-prevent-clinton-deposition-in-email-case/


Processing URLs:  14%|█▍        | 144/1000 [06:22<16:37,  1.16s/it]

Error extracting text from http://www.wsj.com/articles/next-up-for-greece-how-to-shrink-the-debt-1452797090: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/next-up-for-greece-how-to-shrink-the-debt-1452797090
URL filtered: https://twitter.com/markknoller/status/656865717376163840


Processing URLs:  15%|█▍        | 149/1000 [06:33<29:51,  2.11s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-02-20/merkel-s-final-barrier-to-historic-fourth-term-has-a-name-kevin


Processing URLs:  16%|█▌        | 158/1000 [06:59<38:27,  2.74s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-06-24/how-iranian-oil-tankers-keep-syria-s-war-machine-alive


Processing URLs:  16%|█▋        | 163/1000 [07:10<34:53,  2.50s/it]

Error extracting text from https://websays.com/es/marcas/tecnologia/websays-desarrolla-el-programa-netsentiment-para-medir-la-audiencia-online-de-las-elecciones-italianas/: 403 Client Error: Forbidden for url: https://websays.com/es/marcas/tecnologia/websays-desarrolla-el-programa-netsentiment-para-medir-la-audiencia-online-de-las-elecciones-italianas/


Processing URLs:  16%|█▋        | 164/1000 [07:11<29:53,  2.14s/it]

Error extracting text from https://nairametrics.com/2021/07/13/brent-crude-back-above-75-despite-u-s-inflation-scare/: 403 Client Error: Forbidden for url: https://nairametrics.com/2021/07/13/brent-crude-back-above-75-despite-u-s-inflation-scare/


Processing URLs:  16%|█▋        | 165/1000 [07:12<22:51,  1.64s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-davidson-idUSKBN16K1I1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-davidson-idUSKBN16K1I1
URL filtered: https://www.bloomberg.com/news/articles/2017-09-20/venezuela-said-to-be-late-on-185-million-sovereign-bond-payment


Processing URLs:  17%|█▋        | 169/1000 [07:16<18:42,  1.35s/it]

Error extracting text from https://www.google.ca/amp/www.iraqinews.com/iraq-war/isis-sharia-court-in-mosul-destroyed/amp/?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/isis-sharia-court-in-mosul-destroyed/amp/


Processing URLs:  17%|█▋        | 172/1000 [07:21<16:01,  1.16s/it]

Error extracting text from http://www.counterpunch.org/2016/02/05/peace-talks-paused-after-putins-triumph-in-aleppo/: 403 Client Error: Forbidden for url: http://www.counterpunch.org/2016/02/05/peace-talks-paused-after-putins-triumph-in-aleppo/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=fa&amp;tl=en&amp;u=http://www.tasnimnews.com/fa/news/1394/12/09/1013647/: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=fa&amp;tl=en&amp;u=http://www.tasnimnews.com/fa/news/1394/12/09/1013647/


Processing URLs:  17%|█▋        | 174/1000 [07:23<14:48,  1.08s/it]

Error extracting text from https://www.kyivpost.com/content/ukraine-abroad/us-tightens-sanctions-against-russia-eu-rolls-over-measures-for-six-months-397021.html: 403 Client Error: Forbidden for url: https://www.kyivpost.com/content/ukraine-abroad/us-tightens-sanctions-against-russia-eu-rolls-over-measures-for-six-months-397021.html


Processing URLs:  18%|█▊        | 178/1000 [07:34<31:33,  2.30s/it]

Error extracting text from http://www.wsta.co.uk/press/731-uk-becomes-a-nation-of-wine-drinkers-as-wine-replaces-beer-as-our-drink-of-choice: 404 Client Error: Not Found for url: https://wsta.co.uk/press/731-uk-becomes-a-nation-of-wine-drinkers-as-wine-replaces-beer-as-our-drink-of-choice


Processing URLs:  18%|█▊        | 179/1000 [07:35<25:27,  1.86s/it]

Error extracting text from https://www.newsweek.com/2022-beijing-winter-olympics-boycott-looming-genocide-covid-1503868: 403 Client Error: Forbidden for url: https://www.newsweek.com/2022-beijing-winter-olympics-boycott-looming-genocide-covid-1503868


Processing URLs:  18%|█▊        | 184/1000 [07:41<13:41,  1.01s/it]

Error extracting text from http://www.nytimes.com/2016/01/07/world/asia/north-korea-hydrogen-bomb-claim-reactions.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/07/world/asia/north-korea-hydrogen-bomb-claim-reactions.html?_r=0
Error extracting text from https://www.techdirt.com/articles/20151231/08133333210/us-department-agriculture-taftattip-study-small-gains-us-losses-eu.shtml: 403 Client Error: Forbidden for url: https://www.techdirt.com/articles/20151231/08133333210/us-department-agriculture-taftattip-study-small-gains-us-losses-eu.shtml


Processing URLs:  19%|█▊        | 186/1000 [07:44<15:50,  1.17s/it]

Error extracting text from https://www.wsj.com/articles/wall-street-watchdog-keeps-bitcoin-etfs-on-ice-11626616801: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/wall-street-watchdog-keeps-bitcoin-etfs-on-ice-11626616801


Processing URLs:  19%|█▊        | 187/1000 [07:45<16:21,  1.21s/it]

Error extracting text from https://unama.unmissions.org/sites/default/files/protection_of_civilians_in_armed_conflict_midyear_report_2016_final.pdf: HTTPSConnectionPool(host='unama.unmissions.org', port=443): Max retries exceeded with url: /sites/default/files/protection_of_civilians_in_armed_conflict_midyear_report_2016_final.pdf (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: https://www.bloomberg.com/news/features/2021-08-31/will-business-travel-come-back-data-show-air-hotel-travel-forever-changed?sref=wUBMASPo
URL filtered: https://www.youtube.com/watch?v=hEPatDpT_ek


Processing URLs:  19%|█▉        | 193/1000 [07:49<12:38,  1.06it/s]

Error extracting text from http://www.businesskorea.co.kr/news/ict/12573-android-iphone-divergence-mid-price-smartphones-disappearing-korean-market: 404 Client Error: Not Found for url: https://www.businesskorea.co.kr/news/ict/12573-android-iphone-divergence-mid-price-smartphones-disappearing-korean-market


Processing URLs:  20%|█▉        | 196/1000 [07:52<10:39,  1.26it/s]

Error extracting text from https://www.nytimes.com/2021/11/05/world/africa/ethiopia-tigray-eight-groups.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/11/05/world/africa/ethiopia-tigray-eight-groups.html


Processing URLs:  20%|█▉        | 197/1000 [07:53<12:29,  1.07it/s]

Error extracting text from http://www.jsonline.com/news/statepolitics/koch-brothers-pull-ad-buy-backing-ron-johnson-b99765361z1-387673851.html: 404 Client Error: OK for url: https://www.jsonline.com/news/statepolitics/koch-brothers-pull-ad-buy-backing-ron-johnson-b99765361z1-387673851.html/


Processing URLs:  20%|█▉        | 199/1000 [07:56<15:59,  1.20s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481590/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481590/


Processing URLs:  20%|██        | 202/1000 [08:06<27:10,  2.04s/it]

Error extracting text from http://www.sense-eu.info/: 410 Client Error: Gone for url: http://www.sense-eu.info/


Processing URLs:  20%|██        | 203/1000 [08:08<24:38,  1.85s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/trump-administration-says-iran-complying-with-nuclear-deal/articleshow/58252637.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/trump-administration-says-iran-complying-with-nuclear-deal/articleshow/58252637.cms


Processing URLs:  21%|██        | 206/1000 [08:10<12:05,  1.10it/s]

Error extracting text from http://www.timesofisrael.com/hezbollah-said-pulling-back-from-syria-fighting/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/hezbollah-said-pulling-back-from-syria-fighting/


Processing URLs:  21%|██        | 208/1000 [08:13<14:29,  1.10s/it]

Error extracting text from http://www.ohchr.org/EN/HRBodies/HRC/UNIIB/Pages/UNIIB.aspx: 403 Client Error: Forbidden for url: https://www.ohchr.org/EN/HRBodies/HRC/UNIIB/Pages/UNIIB.aspx


Processing URLs:  21%|██        | 210/1000 [08:15<14:08,  1.07s/it]

Error extracting text from http://www.businessinsider.com/r-eus-timmermans-sees-room-for-solution-in-polands-constitutional-crisis-2016-4: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-eus-timmermans-sees-room-for-solution-in-polands-constitutional-crisis-2016-4


Processing URLs:  21%|██        | 211/1000 [08:16<13:59,  1.06s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/capitol-hill-establishments-turn-rise-35595513: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/capitol-hill-establishments-turn-rise-35595513


Processing URLs:  21%|██        | 212/1000 [08:16<11:34,  1.13it/s]

Error extracting text from https://www.thelocal.it/20210126/explained-why-has-italy-prime-minister-resigned-and-what-happens-now: 403 Client Error: Forbidden for url: https://www.thelocal.it/20210126/explained-why-has-italy-prime-minister-resigned-and-what-happens-now


Processing URLs:  21%|██▏       | 213/1000 [08:17<12:58,  1.01it/s]

Error extracting text from https://www.lesswrong.com/posts/AHTRyQJtiRin22kth/the-darwin-game-1?commentId=PeHa27vgerZYpgery: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/AHTRyQJtiRin22kth/the-darwin-game-1?commentId=PeHa27vgerZYpgery


Processing URLs:  22%|██▏       | 215/1000 [08:19<10:03,  1.30it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-lavrov-idUSKBN14G0PG?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-lavrov-idUSKBN14G0PG?mod=related&amp;channelName=worldNews


Processing URLs:  22%|██▏       | 218/1000 [08:22<10:03,  1.29it/s]

Error extracting text from https://english.alarabiya.net/en/business/economy/2017/01/17/Aramco-CEO-Shares-listing-to-include-Saudi-one-or-two-international-markets.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/economy/2017/01/17/Aramco-CEO-Shares-listing-to-include-Saudi-one-or-two-international-markets.html


Processing URLs:  22%|██▏       | 221/1000 [08:26<14:47,  1.14s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-09-27/europe-s-energy-crisis-is-about-to-go-global-as-gas-prices-soar?sref=x7nYEkiY


Processing URLs:  22%|██▏       | 224/1000 [08:26<06:50,  1.89it/s]

Error extracting text from http://news.yahoo.com/nato-set-invite-montenegro-join-alliance-sources-160122506.html: 404 Client Error: Not Found for url: http://news.yahoo.com/nato-set-invite-montenegro-join-alliance-sources-160122506.html
Error extracting text from http://religion.blogs.cnn.com/2011/03/25/jesuits-pay-record-166-1-million-in-child-abuse-case/: 410 Client Error: Gone for url: http://religion.blogs.cnn.com/2011/03/25/jesuits-pay-record-166-1-million-in-child-abuse-case/


Processing URLs:  23%|██▎       | 226/1000 [08:27<06:11,  2.08it/s]

Error extracting text from https://www.nytimes.com/2017/04/21/world/europe/paris-champs-elysees-gunman.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/21/world/europe/paris-champs-elysees-gunman.html


Processing URLs:  23%|██▎       | 227/1000 [08:27<05:23,  2.39it/s]

Error extracting text from https://www.nytimes.com/reuters/2017/11/03/business/03reuters-global-oil.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/11/03/business/03reuters-global-oil.html


Processing URLs:  23%|██▎       | 228/1000 [08:28<05:10,  2.49it/s]

Error extracting text from http://www.marinetraffic.com/it/ais/home/centerx:51/centery:26/zoom:7: 403 Client Error: Forbidden for url: https://www.marinetraffic.com/it/ais/home/centerx:51/centery:26/zoom:7
Error extracting text from http://www.bakhtarnews.com.af/eng/politics/item/23742-president-ghani-outlines-elections-plans-to-un-envoy.html?tmpl=component&amp;print=1: HTTPConnectionPool(host='www.bakhtarnews.com.af', port=80): Max retries exceeded with url: /eng/politics/item/23742-president-ghani-outlines-elections-plans-to-un-envoy.html?tmpl=component&amp;print=1 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fce8d0>: Failed to resolve 'www.bakhtarnews.com.af' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  23%|██▎       | 231/1000 [08:29<06:51,  1.87it/s]

Error extracting text from http://www.un.org/sg/offthecuff/index.asp?nid=4259: 403 Client Error: Forbidden for url: https://www.un.org/sg/offthecuff/index.asp?nid=4259


Processing URLs:  24%|██▎       | 236/1000 [08:37<13:31,  1.06s/it]

Error extracting text from https://www.sfgate.com/news/article/EXPLAINER-Senate-eyes-budget-rule-to-push-past-16081541.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/news/article/EXPLAINER-Senate-eyes-budget-rule-to-push-past-16081541.php
URL filtered: https://www.youtube.com/watch?v=F7k2GUSMhGQ


Processing URLs:  24%|██▍       | 239/1000 [08:39<10:02,  1.26it/s]

Error extracting text from https://www.nytimes.com/2017/12/26/world/asia/india-pakistan-border.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/26/world/asia/india-pakistan-border.html


Processing URLs:  24%|██▍       | 244/1000 [08:44<14:58,  1.19s/it]

Error extracting text from http://www.todayonline.com/world/turkish-defense-minister-warns-mosul-offensive-may-trigger-1-million-migrant-flood: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/turkish-defense-minister-warns-mosul-offensive-may-trigger-1-million-migrant-flood


Processing URLs:  25%|██▍       | 247/1000 [08:51<27:25,  2.18s/it]

Error extracting text from http://www.arbeitgeber.de/www/arbeitgeber.nsf/res/GDP_CPI_Prod_Forecasts.pdf/%24file/GDP_CPI_Prod_Forecasts.pdf: 404 Client Error: Not Found for url: https://arbeitgeber.de/www/arbeitgeber.nsf/res/GDP_CPI_Prod_Forecasts.pdf/%24file/GDP_CPI_Prod_Forecasts.pdf


Processing URLs:  25%|██▍       | 248/1000 [08:52<23:02,  1.84s/it]

Error extracting text from https://www.axios.com/fbi-homeland-security-capitol-threat-9e2f3c8f-de45-4e5d-8fb7-2aeb658432d1.html: 403 Client Error: Forbidden for url: https://www.axios.com/fbi-homeland-security-capitol-threat-9e2f3c8f-de45-4e5d-8fb7-2aeb658432d1.html


Processing URLs:  25%|██▌       | 253/1000 [09:12<51:51,  4.16s/it]

Error extracting text from https://morningconsult.com/alert/reports-obama-to-nominate-merrick-garland-for-scotus/: 404 Client Error: Not Found for url: https://morningconsult.com/alert/reports-obama-to-nominate-merrick-garland-for-scotus/


Processing URLs:  26%|██▌       | 256/1000 [09:16<28:12,  2.28s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN18Y17H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN18Y17H


Processing URLs:  26%|██▌       | 257/1000 [10:17<4:04:01, 19.71s/it]

Error extracting text from http://www.spaceflightinsider.com/missions/defense/north-korea-launches-long-range-rocket-kwangmyongsong-4-satellite/#wDhKIgWmPhRxzVJF.99: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /missions/defense/north-korea-launches-long-range-rocket-kwangmyongsong-4-satellite/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ff933b00>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  27%|██▋       | 270/1000 [11:30<35:33,  2.92s/it]  

URL filtered: https://www.facebook.com/dangerburundi/
Error extracting text from http://www.reuters.com/article/us-usa-election-cyber-idUSKBN12G0XN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-cyber-idUSKBN12G0XN


Processing URLs:  27%|██▋       | 271/1000 [11:31<29:36,  2.44s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/US-looms-behind-China-stance-on-North-Korea: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/US-looms-behind-China-stance-on-North-Korea


Processing URLs:  27%|██▋       | 273/1000 [11:31<17:44,  1.46s/it]

Error extracting text from http://www.ghanaweb.com/GhanaHomePage/NewsArchive/I-m-first-African-leader-to-visit-Iran-after-nuclear-treaty-Mahama-416093: 403 Client Error: Forbidden for url: https://www.ghanaweb.com/GhanaHomePage/NewsArchive/I-m-first-African-leader-to-visit-Iran-after-nuclear-treaty-Mahama-416093
Error extracting text from http://www.business-standard.com/article/economy-policy/trade-pact-battle-likely-to-continue-116042500010_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/economy-policy/trade-pact-battle-likely-to-continue-116042500010_1.html


Processing URLs:  27%|██▋       | 274/1000 [11:32<15:08,  1.25s/it]

Error extracting text from http://www.theregister.co.uk/2016/02/23/utah_govt_systems_copping_300_million_security_incidents_a_day/: 403 Client Error: Forbidden for url: https://www.theregister.com/2016/02/23/utah_govt_systems_copping_300_million_security_incidents_a_day/


Processing URLs:  28%|██▊       | 275/1000 [11:33<13:46,  1.14s/it]

Error extracting text from http://www.who.int/mediacentre/news/statements/2015/ihr-polio-17-august-2015/en/: 404 Client Error: Not Found for url: https://www.who.int/mediacentre/news/statements/2015/ihr-polio-17-august-2015/en/


Processing URLs:  28%|██▊       | 277/1000 [11:36<14:13,  1.18s/it]

Error extracting text from http://www.japantimes.co.jp/news/2015/10/14/national/politics-diplomacy/rouhani-conveys-kishida-invite-abe-visit-iran/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/10/14/national/politics-diplomacy/rouhani-conveys-kishida-invite-abe-visit-iran/


Processing URLs:  28%|██▊       | 279/1000 [11:38<13:22,  1.11s/it]

Error extracting text from http://www.comres.co.uk/polls/itv-news-daily-mail-eu-referendum-poll-may-2016/: 403 Client Error: Forbidden for url: http://comresglobal.com/polls/itv-news-daily-mail-eu-referendum-poll-may-2016/


Processing URLs:  28%|██▊       | 280/1000 [11:38<10:31,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1AM0HA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1AM0HA


Processing URLs:  28%|██▊       | 283/1000 [11:39<05:13,  2.29it/s]

Error extracting text from https://www.nato.int/cps/en/natohq/news_188552.htm: 403 Client Error: Forbidden for url: https://www.nato.int/cps/en/natohq/news_188552.htm


Processing URLs:  28%|██▊       | 284/1000 [11:39<06:29,  1.84it/s]

Error extracting text from http://www.dailypress.com/news/science/dp-nws-darklight-jeff-lab-20160617-story.html: 404 Client Error: Not Found for url: https://www.dailypress.com/news/science/dp-nws-darklight-jeff-lab-20160617-story.html


Processing URLs:  29%|██▊       | 286/1000 [11:41<07:09,  1.66it/s]

Error extracting text from https://larswericson.wordpress.com/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/


Processing URLs:  29%|██▉       | 294/1000 [11:47<08:07,  1.45it/s]

Error extracting text from http://thehill.com/policy/energy-environment/257582-refiners-call-for-clean-oil-export-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/257582-refiners-call-for-clean-oil-export-bill/
Error extracting text from http://www.reuters.com/article/us-china-parliament-defence-idUSKBN16B043: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-parliament-defence-idUSKBN16B043


Processing URLs:  30%|██▉       | 295/1000 [11:48<10:12,  1.15it/s]

Error extracting text from http://www.ibtimes.co.uk/yes-i-want-elections-now-venezuelan-president-maduro-says-he-calls-talks-opposition-1618268: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/yes-i-want-elections-now-venezuelan-president-maduro-says-he-calls-talks-opposition-1618268


Processing URLs:  30%|███       | 302/1000 [11:59<20:25,  1.76s/it]

Error extracting text from http://buenosairesherald.com/article/208650/top-court-oks-maduro-emergency-decree-: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/208650/top-court-oks-maduro-emergency-decree-


Processing URLs:  30%|███       | 303/1000 [12:00<19:06,  1.64s/it]

Error extracting text from https://larswericson.wordpress.com/2016/01/12/memo-to-resonancia/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/01/12/memo-to-resonancia/


Processing URLs:  30%|███       | 304/1000 [12:02<19:27,  1.68s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/official_texts_112964.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/official_texts_112964.htm


Processing URLs:  31%|███       | 310/1000 [12:14<24:10,  2.10s/it]

Error extracting text from http://www.onr.org.uk/jobs/disciplines/nuclear-security-information-and-cyber-security-inspector.htm: 404 Client Error: Not Found for url: https://www.onr.org.uk/jobs/disciplines/nuclear-security-information-and-cyber-security-inspector.htm
URL filtered: http://www.bloomberg.com/news/articles/2015-10-01/kuroda-struggles-to-wean-banks-off-bonds-as-more-boj-easing-seen


Processing URLs:  31%|███▏      | 313/1000 [12:16<14:03,  1.23s/it]

Error extracting text from https://constitutioncenter.org/amp/blog/what-will-justice-breyer-do: 403 Client Error: Forbidden for url: https://constitutioncenter.org/amp/blog/what-will-justice-breyer-do


Processing URLs:  31%|███▏      | 314/1000 [12:17<15:09,  1.33s/it]

Error extracting text from http://www.wsj.com/articles/u-s-china-split-over-north-korea-casts-pall-on-ties-1454655805: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-china-split-over-north-korea-casts-pall-on-ties-1454655805


Processing URLs:  32%|███▏      | 316/1000 [12:20<13:39,  1.20s/it]

Error extracting text from http://www.almanis.com/blog/brexit-debate-forecasters-confident-remain-win/: 404 Client Error: Not Found for url: https://www.almanisprivate.com/blog/brexit-debate-forecasters-confident-remain-win/
Error extracting text from https://www.france24.com/en/live-news/20220123-europe-could-be-headed-for-pandemic-endgame-who: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20220123-europe-could-be-headed-for-pandemic-endgame-who


Processing URLs:  32%|███▏      | 318/1000 [12:22<12:53,  1.13s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-03/northeast-nigeria-recovering-from-islamist-violence-unicef-says


Processing URLs:  32%|███▏      | 320/1000 [12:22<07:47,  1.45it/s]

Error extracting text from https://www.nytimes.com/2021/08/13/world/asia/afghanistan-ghani-president-isolated.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/13/world/asia/afghanistan-ghani-president-isolated.html


Processing URLs:  32%|███▏      | 321/1000 [12:23<07:15,  1.56it/s]

Error extracting text from http://www.realcleardefense.com/articles/2016/06/22/russias_growing_strategic_nuclear_forces_and_new_start_treaty_compliance_109472-3.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/06/22/russias_growing_strategic_nuclear_forces_and_new_start_treaty_compliance_109472-3.html


Processing URLs:  32%|███▏      | 322/1000 [12:24<08:26,  1.34it/s]

Error extracting text from https://cleantechnica.com/2016/11/16/tesla-continues-streamline-production-now-bundling-various-model-x-interior-options/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/11/16/tesla-continues-streamline-production-now-bundling-various-model-x-interior-options/


Processing URLs:  32%|███▏      | 323/1000 [12:24<08:14,  1.37it/s]

Error extracting text from http://www.wsj.com/articles/china-to-hold-military-exercises-in-south-china-sea-raising-tensions-1467542325: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-to-hold-military-exercises-in-south-china-sea-raising-tensions-1467542325


Processing URLs:  33%|███▎      | 326/1000 [12:26<06:35,  1.70it/s]

Error extracting text from https://www.reuters.com/article/us-ethiopia-dam-sudan-egypt/three-way-talks-on-ethiopian-dam-reach-new-impasse-idUSKBN29F0J7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ethiopia-dam-sudan-egypt/three-way-talks-on-ethiopian-dam-reach-new-impasse-idUSKBN29F0J7
Error extracting text from https://news.usni.org/2016/03/30/china-defends-deployment-of-anti-ship-missiles-to-south-china-sea-island: 403 Client Error: Forbidden for url: https://news.usni.org/2016/03/30/china-defends-deployment-of-anti-ship-missiles-to-south-china-sea-island


Processing URLs:  33%|███▎      | 328/1000 [12:29<10:09,  1.10it/s]

Error extracting text from http://www.businessinsider.com/iran-has-reportedly-stopped-dismantling-nuclear-centrifuges-2015-11: 404 Client Error: Not Found for url: https://www.businessinsider.com/iran-has-reportedly-stopped-dismantling-nuclear-centrifuges-2015-11


Processing URLs:  33%|███▎      | 329/1000 [12:30<11:27,  1.03s/it]

Error extracting text from https://www.ancestry.com/cs/offers/traitsadd?ucdmid=05955548-0006-0000-0000-000000000000&tg=E25d6e79-60a5-465d-92d6-4bd27adb8718: 403 Client Error: Forbidden for url: https://www.ancestry.com/cs/offers/traitsadd?ucdmid=05955548-0006-0000-0000-000000000000&tg=E25d6e79-60a5-465d-92d6-4bd27adb8718


Processing URLs:  33%|███▎      | 331/1000 [12:32<10:06,  1.10it/s]

Error extracting text from https://www.thelocal.it/20171218/italys-five-star-movement-leader-softens-stance-on-euro-and-party-alliances-ahead-of-2018-election: 403 Client Error: Forbidden for url: https://www.thelocal.it/20171218/italys-five-star-movement-leader-softens-stance-on-euro-and-party-alliances-ahead-of-2018-election


Processing URLs:  33%|███▎      | 332/1000 [12:33<10:33,  1.05it/s]

Error extracting text from http://news.xinhuanet.com/english/2016-04/10/c_135263976.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-04/10/c_135263976.htm


Processing URLs:  34%|███▍      | 343/1000 [12:55<13:21,  1.22s/it]

Error extracting text from https://www.nytimes.com/2014/09/10/upshot/why-does-scotland-want-independence-its-culture-vs-economics.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2014/09/10/upshot/why-does-scotland-want-independence-its-culture-vs-economics.html


Processing URLs:  35%|███▍      | 347/1000 [12:59<10:05,  1.08it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/03/03/why-venezuela-wont-default-on-debts/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/03/03/why-venezuela-wont-default-on-debts/


Processing URLs:  35%|███▍      | 348/1000 [12:59<08:47,  1.24it/s]

Error extracting text from http://www.wsj.com/articles/the-mystery-of-missing-inflation-weighs-on-fed-rate-move-1450056838?mod=trending_now_1: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-mystery-of-missing-inflation-weighs-on-fed-rate-move-1450056838?mod=trending_now_1


Processing URLs:  35%|███▌      | 352/1000 [13:03<08:01,  1.35it/s]

Error extracting text from http://www.geoengineeringwatch.org/geoengineering-frankenstorm-hurricane0sandy-and-the-air-force-weather-weapon-system-pt1/: 403 Client Error: Forbidden for url: http://www.geoengineeringwatch.org/geoengineering-frankenstorm-hurricane0sandy-and-the-air-force-weather-weapon-system-pt1/


Processing URLs:  36%|███▌      | 355/1000 [13:05<07:42,  1.39it/s]

Error extracting text from http://www.reuters.com/article/us-volkswagen-usa-idUSKBN0UI1QP20160105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-volkswagen-usa-idUSKBN0UI1QP20160105


Processing URLs:  36%|███▌      | 357/1000 [13:08<10:42,  1.00it/s]

Error extracting text from https://abcnews.go.com/International/wireStory/taliban-districts-ne-afghanistan-fleeing-troops-78658014: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/taliban-districts-ne-afghanistan-fleeing-troops-78658014
URL filtered: https://twitter.com/FCDOGovUK/status/1415290772111380482


Processing URLs:  36%|███▌      | 360/1000 [13:09<07:17,  1.46it/s]

Error extracting text from http://www.nytimes.com/2015/10/01/us/politics/hillary-clinton-camp-begins-to-fear-run-by-joe-biden.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/01/us/politics/hillary-clinton-camp-begins-to-fear-run-by-joe-biden.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  36%|███▌      | 361/1000 [13:10<06:27,  1.65it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-spd-leaders-upbeat-as-biggest-branch-backs-coalition-talks-idUSKBN1F90K9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-spd-leaders-upbeat-as-biggest-branch-backs-coalition-talks-idUSKBN1F90K9


Processing URLs:  37%|███▋      | 367/1000 [13:21<17:26,  1.65s/it]



Processing URLs:  37%|███▋      | 369/1000 [13:25<21:03,  2.00s/it]

Error extracting text from http://m.nzherald.co.nz/the-country/news/article.cfm?c_id=16&amp;objectid=11800192: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/the-country/news/article.cfm?c_id=16&amp;objectid=11800192


Processing URLs:  37%|███▋      | 371/1000 [14:00<1:40:47,  9.61s/it]

Error extracting text from http://summitlake.com/wp_1commentary/?p=534: HTTPConnectionPool(host='summitlake.com', port=80): Max retries exceeded with url: /wp_1commentary/?p=534 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3039896a0>: Failed to resolve 'summitlake.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 376/1000 [14:19<1:04:48,  6.23s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-12/china-s-stocks-bonds-yuan-slump-in-unison-on-liquidity-concern


Processing URLs:  38%|███▊      | 380/1000 [14:22<24:25,  2.36s/it]  

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKBN15P1AK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKBN15P1AK


Processing URLs:  38%|███▊      | 381/1000 [14:26<29:43,  2.88s/it]

Error extracting text from http://www.mq.edu.au/about_us/faculties_and_departments/faculty_of_arts/mhpir/staff/staff-politics_and_international_relations/john_kilcullen/a_comparison_of_australian_british_canadian_and_us_political_systems/: 503 Server Error: Service Temporarily Unavailable for url: https://www.mq.edu.au/about_us/faculties_and_departments/faculty_of_arts/mhpir/staff/staff-politics_and_international_relations/john_kilcullen/a_comparison_of_australian_british_canadian_and_us_political_systems/


Processing URLs:  39%|███▊      | 386/1000 [14:34<14:36,  1.43s/it]

Error extracting text from http://www.nytimes.com/2016/02/29/world/middleeast/iran-elections-parliament.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/29/world/middleeast/iran-elections-parliament.html


Processing URLs:  39%|███▉      | 391/1000 [14:49<22:13,  2.19s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-24/venezuela-wins-relief-from-bondholders-as-39-accept-pdvsa-swap


Processing URLs:  40%|███▉      | 395/1000 [14:55<19:53,  1.97s/it]

Error extracting text from https://thebulletin.org/rolling-apocalypse-maturing-cyber-threat11202: 404 Client Error: Not Found for url: https://thebulletin.org/rolling-apocalypse-maturing-cyber-threat11202/


Processing URLs:  40%|████      | 403/1000 [15:15<28:48,  2.90s/it]

Error extracting text from http://www.rounds.senate.gov/newsroom/press-releases/rounds-introduces-iran-cyber-sanctions-act: 403 Client Error: Forbidden for url: http://www.rounds.senate.gov/newsroom/press-releases/rounds-introduces-iran-cyber-sanctions-act


Processing URLs:  40%|████      | 404/1000 [15:17<26:18,  2.65s/it]

Error extracting text from https://aoav.org.uk/2016/easter-sunday-suicide-bomber-pakistan-kills/: 403 Client Error: Forbidden for url: https://aoav.org.uk/2016/easter-sunday-suicide-bomber-pakistan-kills/


Processing URLs:  41%|████      | 409/1000 [15:24<14:22,  1.46s/it]

Error extracting text from https://www.nytimes.com/2017/08/21/us/rinat-akhmetshin-russia-trump-meeting.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/21/us/rinat-akhmetshin-russia-trump-meeting.html


Processing URLs:  41%|████▏     | 414/1000 [15:43<35:29,  3.63s/it]

Error extracting text from http://www.reuters.com/article/us-thailand-king-idUSKBN12H08D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-king-idUSKBN12H08D


Processing URLs:  42%|████▏     | 415/1000 [15:45<29:10,  2.99s/it]

Error extracting text from http://ec.europa.eu/budget/mycountry/PL/index_en.cfm: 404 Client Error: (Not Found) for url: https://ec.europa.eu/not_found


Processing URLs:  42%|████▏     | 424/1000 [16:11<25:09,  2.62s/it]

Error extracting text from http://uk.reuters.com/article/uk-saudi-economy-reserves-analysis-idUKKBN19I17R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  43%|████▎     | 427/1000 [16:14<12:28,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-talks-idUSKCN0WG25W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-talks-idUSKCN0WG25W
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN12U0AR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN12U0AR
Error extracting text from http://libertypell.com/remain-leave-lessons-brexit-vote/: HTTPConnectionPool(host='libertypell.com', port=80): Max retries exceeded with url: /remain-leave-lessons-brexit-vote/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe913920>: Failed to resolve 'libertypell.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  43%|████▎     | 431/1000 [16:20<17:54,  1.89s/it]

Error extracting text from http://tass.ru/en/world/874017: 404 Client Error: Not Found for url: https://tass.ru/en/world/874017


Processing URLs:  43%|████▎     | 434/1000 [16:33<32:06,  3.40s/it]

Error extracting text from http://www.generalfusion.com/: 403 Client Error: Forbidden for url: https://generalfusion.com


Processing URLs:  44%|████▎     | 435/1000 [16:35<28:25,  3.02s/it]

Error extracting text from http://agilisanalysis.com/home/?p=351: HTTPConnectionPool(host='agilisanalysis.com', port=80): Max retries exceeded with url: /home/?p=351 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff715d60>: Failed to resolve 'agilisanalysis.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▎     | 437/1000 [16:35<16:03,  1.71s/it]

Error extracting text from https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-apt28.pdf: 530 Server Error:  for url: https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-apt28.pdf


Processing URLs:  44%|████▍     | 440/1000 [16:42<17:10,  1.84s/it]

Error extracting text from http://finance.yahoo.com/news/apple-could-surprise-with-higher-iphone-and-watch-sales-155558289.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/apple-could-surprise-with-higher-iphone-and-watch-sales-155558289.html


Processing URLs:  44%|████▍     | 442/1000 [16:43<12:09,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-china-parliament-defence-idUSKBN16H049: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-parliament-defence-idUSKBN16H049


Processing URLs:  44%|████▍     | 443/1000 [16:44<11:54,  1.28s/it]

Error extracting text from http://www.pajhwok.com/en/2016/07/12/decrees-election-reforms-7-senators-named-parliamentary-team: 404 Client Error: Not Found for url: https://pajhwok.com/en/2016/07/12/decrees-election-reforms-7-senators-named-parliamentary-team


Processing URLs:  44%|████▍     | 445/1000 [16:46<09:31,  1.03s/it]

Error extracting text from https://www.nytimes.com/2017/05/31/opinion/trumps-united-american-emirate.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/31/opinion/trumps-united-american-emirate.html


Processing URLs:  45%|████▍     | 446/1000 [16:48<12:17,  1.33s/it]

Error extracting text from http://reliefweb.int/report/burundi/dozen-injured-burundi-grenade-blast-un-chief-visits: 404 Client Error: Not Found for url: https://reliefweb.int:443/report/burundi/dozen-injured-burundi-grenade-blast-un-chief-visits


Processing URLs:  45%|████▍     | 447/1000 [16:49<11:41,  1.27s/it]

Error extracting text from http://www.nasa.gov/mission_pages/rbsp/main/index.html: 404 Client Error: Not Found for url: https://www.nasa.gov/mission_pages/rbsp/main/index.html
URL filtered: http://www.bloomberg.com/markets


Processing URLs:  45%|████▌     | 450/1000 [16:52<09:59,  1.09s/it]

Error extracting text from http://nanonews.org/scottish-independence-nicola-sturgeon-to-launch-new-drive/: 500 Server Error: Internal Server Error for url: https://nanonews.org/scottish-independence-nicola-sturgeon-to-launch-new-drive/
URL filtered: https://www.bloomberg.com/news/articles/2017-07-13/u-k-accepts-it-must-pay-brexit-bill-on-departing-european-union


Processing URLs:  46%|████▌     | 455/1000 [17:32<37:42,  4.15s/it]  

Error extracting text from http://thehill.com/homenews/media/360490-roy-moore-attorney-canadian-msnbc-hosts-background-could-help-him-understand: 403 Client Error: Forbidden for url: https://thehill.com/homenews/media/360490-roy-moore-attorney-canadian-msnbc-hosts-background-could-help-him-understand/


Processing URLs:  46%|████▌     | 457/1000 [17:33<22:59,  2.54s/it]

URL filtered: https://www.youtube.com/watch?v=s-c8X52Qg4o


Processing URLs:  46%|████▌     | 461/1000 [17:33<08:28,  1.06it/s]

Error extracting text from https://www.nytimes.com/2017/11/02/world/americas/venezuela-debt.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/02/world/americas/venezuela-debt.html
Error extracting text from https://query1.finance.yahoo.com/v7/finance/download/&quot;: 403 Client Error: Forbidden for url: https://query1.finance.yahoo.com/v7/finance/download/&quot;
Error extracting text from http://www.reuters.com/article/us-usa-cyber-congress-idUSKBN13P2ER: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-congress-idUSKBN13P2ER
URL filtered: http://www.bloombergview.com/articles/2015-09-14/syria-at-peace-to-imagine-it-think-of-bosnia-


Processing URLs:  46%|████▋     | 463/1000 [17:35<07:13,  1.24it/s]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/impeachment-trial-looms-for-brazils-beleaguered-rousseff/articleshow/53630316.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/impeachment-trial-looms-for-brazils-beleaguered-rousseff/articleshow/53630316.cms


Processing URLs:  46%|████▋     | 464/1000 [17:35<06:19,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN14D0JA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN14D0JA
URL filtered: http://fortune.com/2018/07/26/data-sheet-facebook-stock-price-plunge/


Processing URLs:  47%|████▋     | 466/1000 [17:36<05:09,  1.72it/s]

Error extracting text from http://www.euronews.com/business-newswires/3158433-greece-wont-cut-pensions-again-to-meet-euimf-demands-labour-minister/: 404 Client Error: Not Found for url: https://www.euronews.com/business-newswires/3158433-greece-wont-cut-pensions-again-to-meet-euimf-demands-labour-minister/


Processing URLs:  47%|████▋     | 471/1000 [17:42<07:50,  1.12it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-09/brazil-s-rousseff-said-to-be-pressured-to-name-lula-minister
Error extracting text from https://www.reuters.com/article/us-afghanistan-taliban/taliban-attack-afghan-checkpoints-killing-more-than-20-police-idUSKBN1DE0IV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban/taliban-attack-afghan-checkpoints-killing-more-than-20-police-idUSKBN1DE0IV


Processing URLs:  48%|████▊     | 475/1000 [17:46<07:43,  1.13it/s]

Error extracting text from https://bit.ly/3pSGtCk: 403 Client Error: Forbidden for url: https://zeenews.india.com/india/indian-and-vietnam-navies-to-undertake-passex-exercises-in-south-china-sea-2332750.html


error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserErro

Error extracting text from http://www.streetinsider.com/Reuters/Swiss+to+return+$70+million+more+to+Brazil+in+Petrobras+investigation/11429292.html: Document is empty


Processing URLs:  48%|████▊     | 477/1000 [17:49<10:00,  1.15s/it]

URL filtered: https://www.youtube.com/watch?v=rIr6rEndy0A


Processing URLs:  48%|████▊     | 483/1000 [17:55<08:00,  1.08it/s]

Error extracting text from http://www.securityweek.com/south-korea-accuses-north-cyber-attacks-nuclear-plants: 403 Client Error: Forbidden for url: https://www.securityweek.com/south-korea-accuses-north-cyber-attacks-nuclear-plants


Processing URLs:  49%|████▊     | 486/1000 [18:03<17:48,  2.08s/it]

Error extracting text from http://buenosairesherald.com/article/214590/recount-confirms-fujimori%E2%80%99s-majority-in-peru-congress: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/214590/recount-confirms-fujimori%E2%80%99s-majority-in-peru-congress


Processing URLs:  49%|████▉     | 493/1000 [18:33<30:55,  3.66s/it]  

Error extracting text from http://finance.yahoo.com/news/saudi-minister-believer-reform-low-oil-price-022818070.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/saudi-minister-believer-reform-low-oil-price-022818070.html


Processing URLs:  49%|████▉     | 494/1000 [19:05<1:40:19, 11.90s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/mandalay-upper-myanmar/19189-could-u-zaw-myint-maung-be-mandalay-s-next-chief-minister.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/mandalay-upper-myanmar/19189-could-u-zaw-myint-maung-be-mandalay-s-next-chief-minister.html


Processing URLs:  50%|████▉     | 499/1000 [19:20<34:13,  4.10s/it]  

Error extracting text from http://www.globalpost.com/article/6732344/2016/02/12/uae-deploy-special-forces-jets-anti-campaign: 404 Client Error: Not Found for url: https://theworld.org/article/6732344/2016/02/12/uae-deploy-special-forces-jets-anti-campaign


Processing URLs:  50%|█████     | 501/1000 [19:22<21:02,  2.53s/it]

Error extracting text from https://www.google.ca/amp/mobile.reuters.com/article/amp/idUSKBN12V147?client=ms-android-rogers-ca: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKBN12V147
Error extracting text from http://www.reuters.com/article/us-poland-constitution-eu-idUSKCN0YN458?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-constitution-eu-idUSKCN0YN458?il=0


Processing URLs:  50%|█████     | 504/1000 [19:24<10:13,  1.24s/it]

Error extracting text from https://www.wsj.com/articles/three-way-contest-for-raqqa-to-shape-mideast-1489055407: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/three-way-contest-for-raqqa-to-shape-mideast-1489055407


Processing URLs:  51%|█████     | 507/1000 [19:25<05:42,  1.44it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16P0BS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16P0BS


Processing URLs:  52%|█████▏    | 516/1000 [19:39<12:21,  1.53s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-10/28/c_134757333.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-10/28/c_134757333.htm


Processing URLs:  52%|█████▏    | 517/1000 [19:40<09:40,  1.20s/it]

Error extracting text from http://www.barrons.com/articles/venezuela-goldman-selling-selling-gold-1498850055: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-goldman-selling-selling-gold-1498850055


Processing URLs:  52%|█████▏    | 521/1000 [19:42<06:24,  1.25it/s]

Error extracting text from https://www.nytimes.com/2017/07/26/world/asia/afghanistan-taliban-kandahar-slaughter.html?mabReward=CTM7&amp;recp=0&amp;action=click&amp;pgtype=Homepage&amp;region=CColumn&amp;module=Recommendation&amp;src=rechp&amp;WT.nav=RecEngine: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/26/world/asia/afghanistan-taliban-kandahar-slaughter.html?mabReward=CTM7&amp;recp=0&amp;action=click&amp;pgtype=Homepage&amp;region=CColumn&amp;module=Recommendation&amp;src=rechp&amp;WT.nav=RecEngine


Processing URLs:  52%|█████▎    | 525/1000 [19:52<12:52,  1.63s/it]

Error extracting text from https://www.thelocal.de/20161206/germany-among-top-european-countries-for-education: 403 Client Error: Forbidden for url: https://www.thelocal.de/20161206/germany-among-top-european-countries-for-education


Processing URLs:  53%|█████▎    | 527/1000 [19:56<12:52,  1.63s/it]

Error extracting text from https://mishtalk.com/2016/06/17/brexit-about-that-onlinephone-discrepancy-it-largely-vanished-surprising-poll-after-cox/: 404 Client Error: Not Found for url: https://mishtalk.com/2016/06/17/brexit-about-that-onlinephone-discrepancy-it-largely-vanished-surprising-poll-after-cox/


Processing URLs:  53%|█████▎    | 529/1000 [19:59<12:30,  1.59s/it]

Error extracting text from https://www.nytimes.com/2017/02/16/us/politics/neil-gorsuch-supreme-court-senate-hearing.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/16/us/politics/neil-gorsuch-supreme-court-senate-hearing.html


Processing URLs:  53%|█████▎    | 531/1000 [20:02<11:22,  1.46s/it]

Error extracting text from http://www.neweuropeinvestor.com/news/four-new-countrues-added-to-russian-food-ban-list-10508/: 404 Client Error: Not Found for url: https://www.neweuropeinvestor.com/news/four-new-countrues-added-to-russian-food-ban-list-10508/


Processing URLs:  54%|█████▎    | 537/1000 [20:10<07:24,  1.04it/s]

Error extracting text from http://www.charlesrehn.com/charlesrehn/books/aconversationwithamerica/essays/psychotronics/symptomsofpsychotronicattacks.htm: 406 Client Error: Not Acceptable for url: http://www.charlesrehn.com/charlesrehn/books/aconversationwithamerica/essays/psychotronics/symptomsofpsychotronicattacks.htm
Error extracting text from http://www.nytimes.com/2011/12/13/world/asia/chinese-fisherman-kills-south-korean-coast-guardsman.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2011/12/13/world/asia/chinese-fisherman-kills-south-korean-coast-guardsman.html?_r=0


Processing URLs:  54%|█████▍    | 540/1000 [20:15<08:44,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-clapper-idUSKBN14W1R8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-clapper-idUSKBN14W1R8


Processing URLs:  54%|█████▍    | 541/1000 [20:17<08:47,  1.15s/it]

Error extracting text from http://www.gov.me/en/News/156951/Prime-Minister-Milo-dukanovic-hosts-delegation-of-World-Bank.html: 404 Client Error: not found for url: https://www.gov.me/en/News/156951/Prime-Minister-Milo-dukanovic-hosts-delegation-of-World-Bank.html
URL filtered: https://twitter.com/DAlperovitch/status/1473362460673515527


Processing URLs:  54%|█████▍    | 543/1000 [20:17<05:16,  1.44it/s]

Error extracting text from https://www.nytimes.com/2017/01/24/us/politics/wall-border-trump.html?emc=edit_ee_20170125&amp;nl=todaysheadlines-europe&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/24/us/politics/wall-border-trump.html?emc=edit_ee_20170125&amp;nl=todaysheadlines-europe&amp;nlid=77825025


Processing URLs:  55%|█████▌    | 550/1000 [20:27<09:04,  1.21s/it]

Error extracting text from http://icirnigeria.org/outlawed-shiites-allege-mass-killing-of-members/: 403 Client Error: Forbidden for url: https://icirnigeria.org/outlawed-shiites-allege-mass-killing-of-members/


Processing URLs:  55%|█████▌    | 551/1000 [20:28<07:22,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-britain-royals-germany-merkel-idUSKBN1A413V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-royals-germany-merkel-idUSKBN1A413V


Processing URLs:  56%|█████▌    | 555/1000 [20:31<05:56,  1.25it/s]

Error extracting text from http://www.newsweek.com/trump-approval-rating-base-president-778628: 403 Client Error: Forbidden for url: https://www.newsweek.com/trump-approval-rating-base-president-778628
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www1.folha.uol.com.br/poder/2016/03/1745997-oposicao-pede-renuncia-de-dilma-para-aproximar-pmdb-de-impeachment.shtml&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www1.folha.uol.com.br/poder/2016/03/1745997-oposicao-pede-renuncia-de-dilma-para-aproximar-pmdb-de-impeachment.shtml&amp;prev=search


Processing URLs:  56%|█████▌    | 556/1000 [20:42<27:43,  3.75s/it]

Error extracting text from https://www.washingtonpost.com/world/africa/burundi-1-killed-in-grenade-attack-in-presidents-hometown/2016/06/14/a72478fe-322f-11e6-ab9d-1da2b0f24f93_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/africa/burundi-1-killed-in-grenade-attack-in-presidents-hometown/2016/06/14/a72478fe-322f-11e6-ab9d-1da2b0f24f93_story.html
URL filtered: https://www.bloomberg.com/news/articles/2017-01-12/tillerson-says-china-can-t-have-access-to-south-china-sea-isles


Processing URLs:  56%|█████▌    | 561/1000 [20:45<09:48,  1.34s/it]

Error extracting text from http://www.nytimes.com/2015/08/30/world/europe/with-rubles-decline-russian-tourists-gain-appreciation-for-the-motherland.html?partner=MYWAY&amp;ei=5065: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/30/world/europe/with-rubles-decline-russian-tourists-gain-appreciation-for-the-motherland.html?partner=MYWAY&amp;ei=5065


Processing URLs:  56%|█████▋    | 564/1000 [20:54<14:47,  2.04s/it]

Error extracting text from https://www.wsj.com/articles/opec-has-a-crippling-problem-its-members-cant-stop-pumping-1501443711: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-has-a-crippling-problem-its-members-cant-stop-pumping-1501443711


Processing URLs:  57%|█████▋    | 566/1000 [20:56<11:33,  1.60s/it]

Error extracting text from http://www.spacex.com/missions: 404 Client Error: The requested content does not exist. for url: https://www.spacex.com/missions


Processing URLs:  57%|█████▋    | 569/1000 [21:02<10:22,  1.44s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-04-08/amazon-union-vote-count-expected-to-start-as-early-as-thursday?sref=KkPzpZvz


Processing URLs:  57%|█████▊    | 575/1000 [22:08<2:06:22, 17.84s/it]

Error extracting text from http://aa.com.tr/en/economy/turkeys-current-account-deficit-falls-in-july/643661: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  58%|█████▊    | 577/1000 [22:09<1:05:40,  9.31s/it]

Error extracting text from http://www.latimes.com/science/sciencenow/la-sci-sn-planet-nine-new-evidence-20161022-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/science/sciencenow/la-sci-sn-planet-nine-new-evidence-20161022-snap-story.html


Processing URLs:  58%|█████▊    | 583/1000 [22:38<26:56,  3.88s/it]  

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-ireland-close/brexit-irish-border-deal-possible-within-hours-idUKKBN1E12PT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  59%|█████▊    | 587/1000 [22:49<14:28,  2.10s/it]

Error extracting text from http://utv.ie/News/2015/11/17/As-It-Happened-Deal-reached-at-Stormont-49029: 403 Client Error: Forbidden for url: http://utv.ie/News/2015/11/17/As-It-Happened-Deal-reached-at-Stormont-49029


Processing URLs:  59%|█████▉    | 589/1000 [22:53<14:25,  2.11s/it]

URL filtered: https://www.bloomberglaw.com/product/blaw/document/OZTKIT6TTDS0?bc=W1siU2VhcmNoIFJlc3VsdHMiLCIvcHJvZHVjdC9ibGF3L3NlYXJjaC9yZXN1bHRzLzI2NjhhYjg0OTE3M


Processing URLs:  59%|█████▉    | 594/1000 [22:59<08:43,  1.29s/it]

Error extracting text from http://www.realclearpolitics.com/articles/2016/01/15/threat_of_brokered_convention_fuels_gop_rules_panel_129335.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2016/01/15/threat_of_brokered_convention_fuels_gop_rules_panel_129335.html


Processing URLs:  60%|█████▉    | 596/1000 [23:01<08:59,  1.34s/it]

Error extracting text from http://www.brookings.edu/research/papers/2016/01/27-islamic-state-challenges-alqaida-lister: 404 Client Error: Not Found for url: https://www.brookings.edu/articles/papers/2016/01/27-islamic-state-challenges-alqaida-lister


Processing URLs:  60%|█████▉    | 597/1000 [23:03<09:41,  1.44s/it]

Error extracting text from https://amzn.to/3mScARX: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Politics-Industry-Political-Innovation-Democracy/dp/B08B8YR257?crid=3TD1OJHK57EFK&keywords=politics+industry&qid=1618544051&sprefix=politics+industry,aps,226&sr=8-1&linkCode=sl1&tag=cafeconmuerte-20&linkId=ae7eb45bf635b30212ad9884b8c00282&language=en_US&ref_=as_li_ss_tl


Processing URLs:  60%|█████▉    | 599/1000 [23:04<06:36,  1.01it/s]

Error extracting text from https://www.nytimes.com/2017/02/16/us/politics/affordable-care-act-congress.html?emc=edit_th_20170217&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/16/us/politics/affordable-care-act-congress.html?emc=edit_th_20170217&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  60%|██████    | 601/1000 [23:08<10:04,  1.51s/it]

URL filtered: https://fivethirtyeight.com/features/donald-trump-is-making-europe-liberal-again/?ex_cid=538twitter


Processing URLs:  60%|██████    | 604/1000 [23:11<08:21,  1.27s/it]

error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.streetinsider.com/Reuters/South+Africa+finance+minister+to+target+narrower+deficits+to+shore+up+ratings%3A+Reuters+poll/11340258.html: Document is empty
URL filtered: http://www.bloomberg.com/news/articles/2015-12-01/china-s-central-bank-steps-up-cash-injections-amid-ipo-restart


Processing URLs:  61%|██████    | 610/1000 [23:21<11:35,  1.78s/it]

Error extracting text from http://marketrealist.com/2016/03/goldman-sachs-lowers-wti-brent-price-forecast/: 404 Client Error: Not Found for url: https://marketrealist.com:443/2016/03/goldman-sachs-lowers-wti-brent-price-forecast/


Processing URLs:  61%|██████    | 612/1000 [23:24<09:40,  1.50s/it]

Error extracting text from http://trade.ec.europa.eu/doclib/docs/2006/december/tradoc_118238.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2006/december/tradoc_118238.pdf


Processing URLs:  61%|██████▏   | 614/1000 [23:25<07:28,  1.16s/it]

URL filtered: https://www.youtube.com/watch?v=1Qo3F-0keq8


Processing URLs:  62%|██████▏   | 616/1000 [24:26<1:31:46, 14.34s/it]

Error extracting text from http://aa.com.tr/en/politics/no-transitional-role-for-syrias-assad-opposition/556141: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▏   | 617/1000 [24:27<1:10:53, 11.11s/it]

Error extracting text from https://tradingeconomics.com/china/military-expenditure-percent-of-gdp-wb-data.html: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/china/military-expenditure-percent-of-gdp-wb-data.html


Processing URLs:  62%|██████▏   | 618/1000 [24:27<53:24,  8.39s/it]  

Error extracting text from http://www.letsgetitrightca.org/news/press-releases/historic-california-coalition-celebrates-signature-filing-campaign-kickoff-for-adult-use-of-marijuana-act-on-nov-ballot: 404 Client Error: Not Found for url: https://www.letsgetitrightca.org/news/press-releases/historic-california-coalition-celebrates-signature-filing-campaign-kickoff-for-adult-use-of-marijuana-act-on-nov-ballot


Processing URLs:  62%|██████▏   | 620/1000 [24:31<33:08,  5.23s/it]

Error extracting text from https://www.pecantreeog.com/blog/2020-5-29-what-does-it-mean-to-shut-in-a-well/: 406 Client Error: Not Acceptable for url: https://www.pecantreeog.com/blog/2020-5-29-what-does-it-mean-to-shut-in-a-well/


Processing URLs:  62%|██████▏   | 622/1000 [24:34<21:29,  3.41s/it]

Error extracting text from http://www.dtic.mil/dtic/tr/fulltext/u2/a956443.pdf: HTTPSConnectionPool(host='www.dtic.mil', port=443): Max retries exceeded with url: /dtic/tr/fulltext/u2/a956443.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  62%|██████▏   | 624/1000 [24:36<13:12,  2.11s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/12/14/world/europe/ap-eu-greece-bailout.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/12/14/world/europe/ap-eu-greece-bailout.html?_r=0


Processing URLs:  63%|██████▎   | 629/1000 [24:43<08:10,  1.32s/it]

Error extracting text from http://aranews.net/2016/06/us-led-coalition-shifts-focus-mosul-liberation/: 404 Client Error: Not Found for url: http://aranews.net/2016/06/us-led-coalition-shifts-focus-mosul-liberation/


Processing URLs:  63%|██████▎   | 631/1000 [24:45<07:37,  1.24s/it]

Error extracting text from http://www.nytimes.com/2015/10/23/us/politics/hillary-clinton-benghazi-committee.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/23/us/politics/hillary-clinton-benghazi-committee.html


Processing URLs:  63%|██████▎   | 632/1000 [24:50<13:28,  2.20s/it]

Error extracting text from http://www.quinnipiac.edu/news-and-events/quinnipiac-university-poll/iowa/release-detail?ReleaseID=2291: 404 Client Error: Not Found for url: https://www.qu.edu/news-and-events/quinnipiac-university-poll/iowa/release-detail/?ReleaseID=2291


Processing URLs:  64%|██████▎   | 636/1000 [24:54<07:35,  1.25s/it]

Error extracting text from http://economictimes.indiatimes.com/news/politics-and-nation/negativity-at-heart-of-asia-exposed-india-says-pakistan/articleshow/55880567.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/politics-and-nation/negativity-at-heart-of-asia-exposed-india-says-pakistan/articleshow/55880567.cms


Processing URLs:  64%|██████▍   | 641/1000 [25:30<44:41,  7.47s/it]  

Error extracting text from https://balkaninsight.com/2021/04/05/detained-north-macedonia-tycoon-was-linchpin-of-authoritarian-regime/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/04/05/detained-north-macedonia-tycoon-was-linchpin-of-authoritarian-regime/


Processing URLs:  64%|██████▍   | 643/1000 [25:33<25:49,  4.34s/it]

Error extracting text from http://www.poleafrique.info/rehabilitation-cites-universitaires-pression-de-fesci-comite-de-reflexion-annonce/: 403 Client Error: Forbidden for url: https://www.poleafrique.info/rehabilitation-cites-universitaires-pression-de-fesci-comite-de-reflexion-annonce/


Processing URLs:  64%|██████▍   | 644/1000 [25:34<19:10,  3.23s/it]

URL filtered: http://www.bloomberglaw.com


Processing URLs:  65%|██████▍   | 646/1000 [25:35<12:02,  2.04s/it]

URL filtered: https://twitter.com/zeynep?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  65%|██████▍   | 648/1000 [25:36<08:29,  1.45s/it]

Error extracting text from https://www.nasa.gov/mission_pages/WISE/main/index.html: 404 Client Error: Not Found for url: https://www.nasa.gov/mission/wise


Processing URLs:  65%|██████▌   | 653/1000 [27:46<3:24:10, 35.30s/it]

Error extracting text from https://www.iagopcaucuses.com/#/state: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  66%|██████▌   | 656/1000 [27:50<1:18:30, 13.69s/it]

Error extracting text from http://www.ibtimes.co.uk/isis-blows-abrams-tank-rocket-iraqi-special-forces-penetrate-deeper-into-mosul-1589860: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/isis-blows-abrams-tank-rocket-iraqi-special-forces-penetrate-deeper-into-mosul-1589860


Processing URLs:  66%|██████▌   | 657/1000 [27:51<57:07,  9.99s/it]  

Error extracting text from https://ec.europa.eu/europeaid/sites/devco/files/nip-edf11-dominican-republic-2014-2020_en.pdf: 404 Client Error: (Not Found) for url: https://ec.europa.eu/europeaid/sites/devco/files/nip-edf11-dominican-republic-2014-2020_en.pdf


Processing URLs:  66%|██████▌   | 658/1000 [28:52<2:22:21, 24.97s/it]

Error extracting text from http://aa.com.tr/en/americas/maduro-calls-for-talks-with-venezuela-opposition/871178: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  66%|██████▌   | 660/1000 [28:54<1:19:14, 13.98s/it]

Error extracting text from https://governor.hawaii.gov/wp-content/uploads/2021/09/2109007-ATG_Executive-Order-No.-21-06-distribution-signed.pdf: 404 Client Error: Not Found for url: https://governor.hawaii.gov/wp-content/uploads/2021/09/2109007-ATG_Executive-Order-No.-21-06-distribution-signed.pdf


Processing URLs:  67%|██████▋   | 666/1000 [29:03<19:59,  3.59s/it]  

Error extracting text from http://www.pcmag.me/a/2508472: 403 Client Error: Forbidden for url: https://www.pcmag.com/en/a/2508472/


Processing URLs:  67%|██████▋   | 671/1000 [29:09<08:28,  1.55s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN1990XI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN1990XI


Processing URLs:  67%|██████▋   | 672/1000 [29:10<07:09,  1.31s/it]

Error extracting text from https://www.transparency.org/country/#HRV: 404 Client Error: Not Found for url: https://www.transparency.org/en/country/#HRV


Processing URLs:  68%|██████▊   | 675/1000 [29:12<05:08,  1.05it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN10J1CM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN10J1CM


Processing URLs:  68%|██████▊   | 677/1000 [30:12<1:17:54, 14.47s/it]

Error extracting text from http://aa.com.tr/en/todays-headlines/operation-to-retake-mosul-not-in-deep-future-us-general/529359: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  68%|██████▊   | 680/1000 [30:24<40:18,  7.56s/it]  

Error extracting text from http://www.barrons.com/articles/opec-june-oil-production-rose-except-in-venezuela-1499875589: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/opec-june-oil-production-rose-except-in-venezuela-1499875589


Processing URLs:  68%|██████▊   | 681/1000 [30:25<31:06,  5.85s/it]

Error extracting text from http://sana.sy/en/?p=80720: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
URL filtered: https://www.bloomberg.com/news/articles/2021-11-11/u-s-warns-europe-that-russian-troops-may-plan-ukraine-invasion


Processing URLs:  68%|██████▊   | 683/1000 [30:26<18:13,  3.45s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56591#.WPxThojyuUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56591#.WPxThojyuUk
URL filtered: https://www.bloomberg.com/news/articles/2018-02-06/day-of-reckoning-for-south-africa-s-zuma-as-anc-top-brass-meets


Processing URLs:  68%|██████▊   | 685/1000 [30:28<12:50,  2.45s/it]

Error extracting text from http://www.wola.org/commentary/colombia_s_peace_process_ensuring_the_success_of_a_potential_bilateral_ceasefire_agreemen: 403 Client Error: Forbidden for url: https://www.wola.org/commentary/colombia_s_peace_process_ensuring_the_success_of_a_potential_bilateral_ceasefire_agreemen


Processing URLs:  69%|██████▊   | 687/1000 [30:35<14:23,  2.76s/it]

Error extracting text from http://www.thelocal.de/20160131/border-police-should-threaten-to-shoot-migrants-afd: 403 Client Error: Forbidden for url: https://www.thelocal.de/20160131/border-police-should-threaten-to-shoot-migrants-afd


Processing URLs:  69%|██████▉   | 689/1000 [30:40<13:30,  2.60s/it]

Error extracting text from https://durangoherald.com/articles/114965-more-farmers-are-having-trouble-making-loan-payments?wallit_nosession=1: 404 Client Error: Not Found for url: https://www.durangoherald.com/articles/114965-more-farmers-are-having-trouble-making-loan-payments?wallit_nosession=1


Processing URLs:  69%|██████▉   | 690/1000 [30:41<11:03,  2.14s/it]

Error extracting text from http://www.businessinsider.com/r-anti-rousseff-impeachment-push-in-brazil-loses-ground-2016-1: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-anti-rousseff-impeachment-push-in-brazil-loses-ground-2016-1
URL filtered: https://www.youtube.com/watch?v=zLAlen2RvnI
URL filtered: https://twitter.com/wikileaks/status/831978909495353366


Processing URLs:  70%|██████▉   | 696/1000 [30:52<09:26,  1.86s/it]

Error extracting text from http://www.ibtimes.co.uk/top-isis-commander-killed-mosul-daesh-fighters-close-key-bridges-city-1567420: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/top-isis-commander-killed-mosul-daesh-fighters-close-key-bridges-city-1567420


Processing URLs:  70%|██████▉   | 698/1000 [30:57<10:37,  2.11s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2953976/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2953976/


Processing URLs:  70%|███████   | 702/1000 [31:04<08:35,  1.73s/it]

Error extracting text from http://europeum.blogactiv.eu/2017/07/17/same-sex-marriage-in-germany-a-lost-vote-for-merkel-or-a-part-of-canny-politics/: 404 Client Error: Not Found for url: http://europeum.blogactiv.eu/2017/07/17/same-sex-marriage-in-germany-a-lost-vote-for-merkel-or-a-part-of-canny-politics/


Processing URLs:  70%|███████   | 704/1000 [31:07<08:09,  1.65s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-23/putin-said-to-plan-islamic-state-strikes-with-or-without-u-s-


Processing URLs:  71%|███████   | 710/1000 [31:13<04:48,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-turkey-idUSKCN12C0KF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-turkey-idUSKCN12C0KF


Processing URLs:  71%|███████▏  | 713/1000 [31:16<04:37,  1.03it/s]

Error extracting text from https://focustaiwan.tw/cross-strait/202012050017XlqFkKHWMtAaEQFjACegQIBRAB&amp;usg=AOvVaw3U41: 403 Client Error: Forbidden for url: https://focustaiwan.tw/cross-strait/202012050017XlqFkKHWMtAaEQFjACegQIBRAB&amp;usg=AOvVaw3U41


Processing URLs:  71%|███████▏  | 714/1000 [31:17<04:34,  1.04it/s]

Error extracting text from http://seekingalpha.com/article/4018774-iran-fails-attract-international-oil-companies-oil-markets-daily?page=2: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4018774-iran-fails-attract-international-oil-companies-oil-markets-daily?page=2


Processing URLs:  72%|███████▏  | 715/1000 [31:19<05:56,  1.25s/it]

Error extracting text from http://www.ibtimes.com/tesla-motors-says-model-3-deliveries-will-be-staggered-starting-california-moving-2340503: 403 Client Error: Forbidden for url: https://www.ibtimes.com/tesla-motors-says-model-3-deliveries-will-be-staggered-starting-california-moving-2340503


Processing URLs:  72%|███████▏  | 716/1000 [31:20<05:07,  1.08s/it]

Error extracting text from https://abcnews.go.com/US/wireStory/explainer-charges-kyle-rittenhouse-face-81084162: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/explainer-charges-kyle-rittenhouse-face-81084162


Processing URLs:  72%|███████▏  | 717/1000 [31:21<05:31,  1.17s/it]

Error extracting text from http://seekingalpha.com/article/3962475-oil-outlook-secrets-april-will-reveal-least-better: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3962475-oil-outlook-secrets-april-will-reveal-least-better


Processing URLs:  72%|███████▏  | 720/1000 [32:28<1:33:10, 19.96s/it]

Error extracting text from https://www.kentucky.com/news/coronavirus/article253602358.html: HTTPSConnectionPool(host='www.kentucky.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  72%|███████▏  | 722/1000 [33:29<2:09:55, 28.04s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-03-05/russian-court-orders-navalny-to-pay-damages-in-lawsuit-filed-by-kremlin-ally: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  73%|███████▎  | 731/1000 [34:04<14:28,  3.23s/it]  

Error extracting text from http://www.japantimes.co.jp/news/2016/01/26/national/politics-diplomacy/russia-hold-summer-forum-disputed-island/#.Vqkpn_krKUk: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/01/26/national/politics-diplomacy/russia-hold-summer-forum-disputed-island/#.Vqkpn_krKUk


Processing URLs:  73%|███████▎  | 732/1000 [34:07<13:52,  3.10s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-21/abu-dhabi-s-biggest-bank-says-u-s-oil-prices-may-drop-to-20


Processing URLs:  74%|███████▍  | 738/1000 [34:12<04:49,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-russia-venezuela-debt/russia-reaches-outline-debt-restructuring-agreement-with-venezuela-ria-idUSKBN1CJ0IN?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-venezuela-debt/russia-reaches-outline-debt-restructuring-agreement-with-venezuela-ria-idUSKBN1CJ0IN?il=0


Processing URLs:  74%|███████▍  | 739/1000 [34:13<03:47,  1.15it/s]

Error extracting text from https://www.nytimes.com/2017/10/17/nyregion/driverless-cars-manhattan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/17/nyregion/driverless-cars-manhattan.html


Processing URLs:  74%|███████▍  | 740/1000 [34:14<04:34,  1.05s/it]

Error extracting text from http://www.thesun.co.uk/sol/homepage/news/politics/6886601/David-Cameron-urged-to-delay-EU-referendum.html: 404 Client Error: Not Found for url: https://www.thesun.co.uk/sol/homepage/news/politics/6886601/David-Cameron-urged-to-delay-EU-referendum.html


Processing URLs:  74%|███████▍  | 741/1000 [42:14<10:10:16, 141.38s/it]

Error extracting text from https://www.thespainreport.com/articles/878-160829135138-sanchez-confirms-psoe-no-after-meeting-rajoy#636: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/878-160829135138-sanchez-confirms-psoe-no-after-meeting-rajoy (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30300a360>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  74%|███████▍  | 744/1000 [42:17<3:32:25, 49.79s/it]  

Error extracting text from https://news.usni.org/2017/09/28/navy-using-legally-creative-contract-structure-keep-ship-availabilities-track-despite-continuing-resolutions?utm_source=USNI+News&amp;utm_campaign: 403 Client Error: Forbidden for url: https://news.usni.org/2017/09/28/navy-using-legally-creative-contract-structure-keep-ship-availabilities-track-despite-continuing-resolutions?utm_source=USNI+News&amp;utm_campaign


Processing URLs:  75%|███████▍  | 746/1000 [42:42<2:05:33, 29.66s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-ipsos-idUSKCN0X501V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-ipsos-idUSKCN0X501V


Processing URLs:  75%|███████▍  | 749/1000 [42:46<45:46, 10.94s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-01-15/u-k-markets-pounded-and-brexit-vote-doesn-t-even-have-a-date


Processing URLs:  75%|███████▌  | 751/1000 [42:48<26:27,  6.37s/it]

Error extracting text from https://www.audible.co.uk/pd/Putin-Prisoner-of-Power-Audiobook/B07W4WVNXK?qid=1612345328&amp;sr=1-1&amp;ref=a_search_c3_lProduct_1_1&amp;pf_rd_p=c6e316b8-14da-418d-8f91-b3cad83c5183&amp;pf_rd_r=K8JM0VMYKRP3VDSQFATE: 503 Server Error: Service Unavailable for url: https://www.audible.co.uk/pd/Putin-Prisoner-of-Power-Audiobook/B07W4WVNXK?qid=1612345328&amp;sr=1-1&amp;ref=a_search_c3_lProduct_1_1&amp;pf_rd_p=c6e316b8-14da-418d-8f91-b3cad83c5183&amp;pf_rd_r=K8JM0VMYKRP3VDSQFATE


Processing URLs:  75%|███████▌  | 752/1000 [42:49<21:08,  5.11s/it]

Error extracting text from https://www.reuters.com/article/us-asml-holding-smic-idUSKBN2AV1S6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asml-holding-smic-idUSKBN2AV1S6


Processing URLs:  76%|███████▌  | 756/1000 [42:51<08:45,  2.16s/it]

Error extracting text from https://www.reuters.com/article/us-usa-cyber-russia-idUSKBN1AI2RV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-idUSKBN1AI2RV


Processing URLs:  76%|███████▌  | 760/1000 [42:58<07:43,  1.93s/it]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/geos/iv.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook/geos/iv.html


Processing URLs:  76%|███████▌  | 761/1000 [42:59<06:24,  1.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-01/brazil-s-temer-fares-better-than-rousseff-in-opinion-poll


Processing URLs:  76%|███████▋  | 765/1000 [43:10<07:24,  1.89s/it]

Error extracting text from http://www.nytimes.com/2016/06/08/world/middleeast/defiant-assad-vows-to-retake-every-inch-of-syria-from-his-foes.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/08/world/middleeast/defiant-assad-vows-to-retake-every-inch-of-syria-from-his-foes.html


Processing URLs:  77%|███████▋  | 769/1000 [43:13<04:07,  1.07s/it]

Error extracting text from http://charts.stocktwits.com/production/original_44753248.png?1446324597: 403 Client Error: Forbidden for url: http://charts.stocktwits.com/production/original_44753248.png?1446324597


Processing URLs:  77%|███████▋  | 770/1000 [43:13<03:13,  1.19it/s]

Error extracting text from http://www.arabnews.com/node/1011826/saudi-arabia: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1011826/saudi-arabia


Processing URLs:  77%|███████▋  | 771/1000 [43:14<02:54,  1.31it/s]

Error extracting text from http://warontherocks.com/2016/02/the-long-road-to-mosul/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/02/the-long-road-to-mosul/


Processing URLs:  77%|███████▋  | 774/1000 [43:29<12:52,  3.42s/it]

Error extracting text from http://www.latimes.com/business/hiltzik/la-fi-mh-vw-is-a-great-test-on-white-collar-crooks-20150921-column.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/hiltzik/la-fi-mh-vw-is-a-great-test-on-white-collar-crooks-20150921-column.html


Processing URLs:  79%|███████▉  | 788/1000 [43:53<07:36,  2.15s/it]

URL filtered: https://www.youtube.com/watch?v=MKFFrfT4E1c


Processing URLs:  79%|███████▉  | 791/1000 [43:54<04:01,  1.16s/it]

Error extracting text from http://www.reuters.com/article/us-apple-stocks/apple-market-value-we-may-need-a-bigger-chart-idUSKBN1D20BQ?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-11-02&amp;utm_term=US%20Reuters%20News%20Now: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apple-stocks/apple-market-value-we-may-need-a-bigger-chart-idUSKBN1D20BQ?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-11-02&amp;utm_term=US%20Reuters%20News%20Now


Processing URLs:  79%|███████▉  | 793/1000 [43:59<05:55,  1.72s/it]

Error extracting text from http://92newshd.tv/pm-nawaz-sharif-to-return-country-on-3rd-day-of-eid/: 404 Client Error: Not Found for url: https://92newshd.tv/pm-nawaz-sharif-to-return-country-on-3rd-day-of-eid


Processing URLs:  80%|███████▉  | 796/1000 [44:03<04:21,  1.28s/it]

Error extracting text from http://www.nytimes.com/2016/11/02/world/middleeast/bashar-assad-syria-civil-war.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/02/world/middleeast/bashar-assad-syria-civil-war.html


Processing URLs:  80%|███████▉  | 798/1000 [44:06<05:02,  1.50s/it]

URL filtered: https://www.youtube.com/watch?v=1supv13gW8g


Processing URLs:  80%|████████  | 801/1000 [44:07<02:46,  1.20it/s]

Error extracting text from https://cleantechnica.com/2016/10/03/tesla-shatters-quarterly-sales-record-24500-model-s-model-x-evs-delivered-q3-70-increase-q2/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/10/03/tesla-shatters-quarterly-sales-record-24500-model-s-model-x-evs-delivered-q3-70-increase-q2/
Error extracting text from http://www.reuters.com/article/2015/10/20/us-usa-oilexports-refiners-idUSKCN0SE2VW20151020: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/20/us-usa-oilexports-refiners-idUSKCN0SE2VW20151020


Processing URLs:  80%|████████  | 802/1000 [44:13<07:19,  2.22s/it]

Error extracting text from http://www.suffolk.edu/documents/SUPRC/1_28_2016_tables.pdf: 404 Client Error: Not Found for url: https://www.suffolk.edu/documents/SUPRC/1_28_2016_tables.pdf
URL filtered: https://twitter.com/trvrb/status/1410376325303468040
URL filtered: http://www.bbc.com/news/world-middle-east-43051249?ns_mchannel=social&amp;ns_campaign=bbc_breaking&amp;ns_source=twitter&amp;ns_linkname=news_central


Processing URLs:  81%|████████  | 808/1000 [44:18<04:16,  1.34s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-03/caracas-burns-traders-buy-big-venezuelan-credit-dashboard


Processing URLs:  82%|████████▏ | 820/1000 [48:14<1:37:27, 32.49s/it]

Error extracting text from http://iran-times.com/independents-largest-bloc-elected/: 406 Client Error: Not Acceptable for url: http://iran-times.com/independents-largest-bloc-elected/


Processing URLs:  82%|████████▏ | 824/1000 [48:32<31:14, 10.65s/it]  

Error extracting text from http://www.trust.org/item/20151021195359-ek2s4: 404 Client Error:  for url: https://www.trust.org:443/item/20151021195359-ek2s4


Processing URLs:  83%|████████▎ | 826/1000 [48:33<16:17,  5.62s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-eu-minister-idUSKBN1871XA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-minister-idUSKBN1871XA


Processing URLs:  83%|████████▎ | 829/1000 [48:40<09:00,  3.16s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0WF0Q3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0WF0Q3


Processing URLs:  83%|████████▎ | 831/1000 [48:43<06:21,  2.26s/it]

URL filtered: https://twitter.com/kylieatwood/status/1426853588782010370?s=21


Processing URLs:  84%|████████▎ | 837/1000 [48:48<02:46,  1.02s/it]

Error extracting text from http://www.nytimes.com/2016/12/27/business/dealbook/grading-the-big-deals-of-2016-low-and-incomplete-marks-abound.html?ribbon-ad-idx=6&amp;rref=business/dealbook&amp;module=Ribbon&amp;version=context&amp;region=Header&amp;action=click&amp;contentCollection=DealBook&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/27/business/dealbook/grading-the-big-deals-of-2016-low-and-incomplete-marks-abound.html?ribbon-ad-idx=6&amp;rref=business/dealbook&amp;module=Ribbon&amp;version=context&amp;region=Header&amp;action=click&amp;contentCollection=DealBook&amp;pgtype=article


Processing URLs:  84%|████████▍ | 842/1000 [48:56<04:28,  1.70s/it]

URL filtered: https://www.youtube.com/watch?time_continue=11&amp;v=mNhdXPpXT0E


Processing URLs:  84%|████████▍ | 845/1000 [49:00<03:36,  1.40s/it]

Error extracting text from https://www.clinicaltrialsarena.com/analysis/molnupiravir-market/: 404 Client Error: Not Found for url: https://www.clinicaltrialsarena.com/analysis/molnupiravir-market/


Processing URLs:  85%|████████▍ | 847/1000 [49:04<04:22,  1.72s/it]

Error extracting text from https://www.reuters.com/business/us-trade-chief-pressured-lift-duties-canadian-lumber-2021-05-16/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/us-trade-chief-pressured-lift-duties-canadian-lumber-2021-05-16/


Processing URLs:  86%|████████▌ | 857/1000 [49:20<04:38,  1.95s/it]

Error extracting text from http://www.universetoday.com/130086/nasa-estimates-spacex-2018-mars-mission-will-cost-300-million/: 503 Server Error: Service Unavailable for url: https://www.universetoday.com/130086/nasa-estimates-spacex-2018-mars-mission-will-cost-300-million/


Processing URLs:  86%|████████▌ | 859/1000 [49:24<04:40,  1.99s/it]

Error extracting text from http://www.reuters.com/article/us-usa-defense-cybersecurity-idUSKBN0MF2G920150319: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-defense-cybersecurity-idUSKBN0MF2G920150319


Processing URLs:  86%|████████▋ | 865/1000 [49:32<03:29,  1.55s/it]

Error extracting text from http://data.unhcr.org/mediterranean/regional.php#_ga=1.1515249.1734802494.1454954886: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/regional.php#_ga=1.1515249.1734802494.1454954886


Processing URLs:  87%|████████▋ | 867/1000 [49:34<02:46,  1.25s/it]

Error extracting text from http://phys.org/news/2016-01-killer-robots-late-scientists-davos.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-01-killer-robots-late-scientists-davos.html


Processing URLs:  87%|████████▋ | 872/1000 [49:47<03:59,  1.87s/it]

Error extracting text from http://science.sciencemag.org/content/339/6121/819.long: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.1231143


Processing URLs:  88%|████████▊ | 875/1000 [49:49<02:16,  1.09s/it]

Error extracting text from http://www.reuters.com/article/us-thailand-politics-idUSKBN15N19M?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-politics-idUSKBN15N19M?il=0


Processing URLs:  88%|████████▊ | 877/1000 [49:51<01:45,  1.17it/s]

Error extracting text from http://www.latimes.com/local/lanow/la-me-ln-el-nino-california-20160121-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/local/lanow/la-me-ln-el-nino-california-20160121-htmlstory.html


Processing URLs:  88%|████████▊ | 882/1000 [49:55<01:16,  1.54it/s]

URL filtered: https://twitter.com/allmyalibis/status/748918204538445826


Processing URLs:  89%|████████▊ | 887/1000 [50:02<02:22,  1.26s/it]

Error extracting text from http://www.mtlblog.com/2016/10/canada-to-vote-on-legalizing-recreational-marijuana-in-2017/: 404 Client Error: Not Found for url: https://www.mtlblog.com/2016/10/canada-to-vote-on-legalizing-recreational-marijuana-in-2017/


Processing URLs:  89%|████████▉ | 890/1000 [50:09<03:49,  2.09s/it]

URL filtered: https://twitter.com/Arianespace/status/1473406138083426305


Processing URLs:  89%|████████▉ | 894/1000 [50:12<02:19,  1.32s/it]

Error extracting text from http://apne.ws/2sHLe6K: 404 Client Error: Not Found for url: http://trib.al/2sHLe6K


Processing URLs:  90%|████████▉ | 896/1000 [50:14<01:58,  1.14s/it]

Error extracting text from http://www.imdb.com/title/tt0093191/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0093191/


Processing URLs:  90%|█████████ | 900/1000 [50:19<02:27,  1.47s/it]

Error extracting text from http://ec.europa.eu/atwork/applying-eu-law/infringements-proceedings/index_en.htm: 404 Client Error: Not Found for url: https://commission.europa.eu/strategy/decision-making_en


Processing URLs:  90%|█████████ | 902/1000 [50:24<03:15,  2.00s/it]

Error extracting text from http://www.state.gov/documents/organization/210204.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/
Error extracting text from https://www.reuters.com/article/us-nigeria-security/suspected-boko-haram-members-kill-18-people-in-northeast-nigeria-idUSKCN1BD003: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-security/suspected-boko-haram-members-kill-18-people-in-northeast-nigeria-idUSKCN1BD003


Processing URLs:  91%|█████████ | 907/1000 [50:30<01:52,  1.21s/it]

Error extracting text from https://www.thelocal.de/20180202/serious-differences-still-blocking-government-deal-says-merkel: 403 Client Error: Forbidden for url: https://www.thelocal.de/20180202/serious-differences-still-blocking-government-deal-says-merkel


Processing URLs:  91%|█████████ | 910/1000 [50:35<02:01,  1.36s/it]

Error extracting text from https://bit.ly/3ps6S9w: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement/


Processing URLs:  91%|█████████ | 912/1000 [50:38<01:59,  1.36s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-gorsuch-democrats-idUSKBN16Z2NJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-gorsuch-democrats-idUSKBN16Z2NJ


Processing URLs:  92%|█████████▏| 915/1000 [50:41<01:53,  1.34s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2016/Mar-07/340944-irans-rouhani-praises-khatami-role-in-recent-vote.ashx?utm_content=bufferbe7de: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Mar-07/340944-irans-rouhani-praises-khatami-role-in-recent-vote.ashx?utm_content=bufferbe7de


Processing URLs:  92%|█████████▏| 920/1000 [50:47<01:32,  1.15s/it]

Error extracting text from http://www.reuters.com/article/nigeria-currency-idUSL8N1973DJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/nigeria-currency-idUSL8N1973DJ


Processing URLs:  92%|█████████▏| 924/1000 [50:52<01:15,  1.00it/s]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#AMH82qrXI7lXuofm.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#AMH82qrXI7lXuofm.97


Processing URLs:  92%|█████████▎| 925/1000 [50:57<03:00,  2.40s/it]

Error extracting text from http://m.philstar.com/314191/show/01a5ba87e0b0036672be4beecb26b73f/: 404 Client Error: Not Found for url: https://www.philstar.com/314191/show/01a5ba87e0b0036672be4beecb26b73f/


Processing URLs:  93%|█████████▎| 928/1000 [51:00<01:35,  1.32s/it]

Error extracting text from https://southfront.org/iraqi-map-update-results-of-6th-day-of-battle-for-mosul/: HTTPSConnectionPool(host='southfront.org', port=443): Max retries exceeded with url: /iraqi-map-update-results-of-6th-day-of-battle-for-mosul/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ffa50980>: Failed to resolve 'southfront.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  93%|█████████▎| 932/1000 [51:10<02:48,  2.48s/it]

URL filtered: https://www.youtube.com/watch?v=kEHdyiBMgAg


Processing URLs:  94%|█████████▎| 935/1000 [51:15<02:17,  2.12s/it]

Error extracting text from http://www.autoguangzhou.com.cn/zwtQV/list_97.aspx: 404 Client Error: Not Found for url: https://www.autoguangzhou.com.cn/zwtQV/list_97.aspx


Processing URLs:  94%|█████████▍| 939/1000 [51:20<01:27,  1.43s/it]

Error extracting text from https://www.wsj.com/articles/former-u-n-secretary-general-ban-ki-moon-drops-out-of-south-korean-presidential-race-1485931698: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/former-u-n-secretary-general-ban-ki-moon-drops-out-of-south-korean-presidential-race-1485931698


Processing URLs:  94%|█████████▍| 943/1000 [51:29<01:41,  1.77s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-protests-idUSKCN0WE05B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-protests-idUSKCN0WE05B
URL filtered: http://www.bloomberg.com/news/articles/2016-11-12/colombia-reaches-new-final-peace-agreement-with-farc-rebels


Processing URLs:  95%|█████████▍| 947/1000 [51:32<00:52,  1.00it/s]

Error extracting text from https://www.nytimes.com/2017/08/03/world/middleeast/iran-hassan-rouhani-ayatollah-khamenei.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/03/world/middleeast/iran-hassan-rouhani-ayatollah-khamenei.html


Processing URLs:  95%|█████████▌| 951/1000 [51:39<01:03,  1.29s/it]

Error extracting text from http://www.stripes.com/news/pacific/abe-putin-to-negotiate-on-territories-peace-treaty-1.408499: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/asia_pacific/abe-putin-to-negotiate-on-territories-peace-treaty-1.408499


Processing URLs:  96%|█████████▌| 955/1000 [51:42<00:38,  1.16it/s]

Error extracting text from http://www2.politicalbetting.com/index.php/archives/2016/04/18/the-referendum-is-currently-much-more-about-the-economy-than-immigration/: 404 Client Error: Not Found for url: http://www2.politicalbetting.com/index.php/archives/2016/04/18/the-referendum-is-currently-much-more-about-the-economy-than-immigration/
Error extracting text from http://www.washingtontimes.com/news/2016/may/19/gates-blasts-obama-deceiving-public-role-combat-tr/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/may/19/gates-blasts-obama-deceiving-public-role-combat-tr/


Processing URLs:  96%|█████████▌| 958/1000 [51:45<00:42,  1.02s/it]

Error extracting text from http://citizen.co.za/1250226/killing-of-eastern-cape-anc-ward-councillor-condemned/: 404 Client Error: Not Found for url: https://www.citizen.co.za/killing-of-eastern-cape-anc-ward-councillor-condemned/


Processing URLs:  96%|█████████▋| 963/1000 [51:52<00:37,  1.00s/it]

Error extracting text from https://www.rand.org/blog/2016/07/indonesia-china-tensions-in-the-natuna-sea-evidence.html?adbsc=social_20160706_902811&amp;adbid=UPDATE-c165654-6156535790752010240&amp;adbpl=li&amp;adbpr=165654: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2016/07/indonesia-china-tensions-in-the-natuna-sea-evidence.html?adbsc=social_20160706_902811&amp;adbid=UPDATE-c165654-6156535790752010240&amp;adbpl=li&amp;adbpr=165654


Processing URLs:  96%|█████████▋| 964/1000 [51:53<00:28,  1.28it/s]

Error extracting text from https://www.nytimes.com/2021/06/09/world/europe/navalny-ban-putin-biden-summit.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/09/world/europe/navalny-ban-putin-biden-summit.html


Processing URLs:  97%|█████████▋| 969/1000 [51:58<00:28,  1.08it/s]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-syria-trump-idUKKBN13A2UD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  97%|█████████▋| 971/1000 [51:59<00:23,  1.23it/s]

Error extracting text from https://www.nytimes.com/2017/01/10/us/politics/donald-trump-russia-intelligence.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/10/us/politics/donald-trump-russia-intelligence.html


Processing URLs:  98%|█████████▊| 978/1000 [52:27<01:50,  5.02s/it]

Error extracting text from http://www.reuters.com/article/safrica-ratings-sp-idUSL8N15519S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/safrica-ratings-sp-idUSL8N15519S


Processing URLs:  98%|█████████▊| 979/1000 [52:28<01:18,  3.72s/it]

Error extracting text from http://www.amazon.com/Shadows-Mind-Missing-Science-Consciousness/dp/0195106466: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Shadows-Mind-Missing-Science-Consciousness/dp/0195106466


Processing URLs:  98%|█████████▊| 981/1000 [52:29<00:40,  2.15s/it]

URL filtered: https://www.facebook.com/MIDRussia/posts/today-the-republic-of-nicaragua-celebrates-its-independence-daythe-ussr-and-nica/2784672414965515/


Processing URLs:  98%|█████████▊| 983/1000 [52:29<00:20,  1.22s/it]

Error extracting text from https://www.jnj.com/johnson-johnson-announces-single-shot-janssen-covid-19-vaccine-candidate-met-primary-endpoints-in-interim-analysis-of-its-phase-3-ensemble-trial: 403 Client Error: Forbidden for url: https://www.jnj.com/johnson-johnson-announces-single-shot-janssen-covid-19-vaccine-candidate-met-primary-endpoints-in-interim-analysis-of-its-phase-3-ensemble-trial


Processing URLs:  98%|█████████▊| 985/1000 [52:31<00:13,  1.09it/s]

Error extracting text from http://www.wsj.com/articles/putin-and-erdogan-meet-to-restore-ties-after-jet-shot-down-1470747139: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/putin-and-erdogan-meet-to-restore-ties-after-jet-shot-down-1470747139


Processing URLs:  99%|█████████▊| 987/1000 [52:34<00:16,  1.27s/it]

Error extracting text from https://me.me/i/what-did-i-tell-you-about-hiring-nazis-not-to-11491693: 503 Server Error: Service Unavailable for url: https://me.me/i/what-did-i-tell-you-about-hiring-nazis-not-to-11491693


Processing URLs:  99%|█████████▉| 989/1000 [52:35<00:09,  1.11it/s]

URL filtered: http://www.recode.net/2016/12/15/13967928/facebook-fake-news-plan-abc-snopes-politfact-factcheck


Processing URLs:  99%|█████████▉| 993/1000 [52:55<00:21,  3.14s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-08/impeaching-a-brazilian-president-is-complicated-a-quick-guide


Processing URLs: 100%|█████████▉| 995/1000 [52:57<00:11,  2.35s/it]

Error extracting text from http://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/9f985b33-23bc-4c9f-961b-7edf1ab902d8.pdf: 404 Client Error: Not Found for url: https://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/9f985b33-23bc-4c9f-961b-7edf1ab902d8.pdf
URL filtered: https://twitter.com/juliamacfarlane/status/1426599158408957954?s=21
URL filtered: https://twitter.com/David_Cameron/status/700798241638629377?ref_src=twsrc^tfw


Processing URLs: 100%|█████████▉| 999/1000 [52:59<00:01,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-usa-china-southchinasea-idUSKBN19S0IU?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-china-southchinasea-idUSKBN19S0IU?il=0


Processing URLs: 100%|██████████| 1000/1000 [53:30<00:00,  3.21s/it]


Error extracting text from http://news.morningstar.com/articlenet/article.aspx?id=826833: 504 Server Error: Gateway Time-out for url: http://news.morningstar.com/articlenet/article.aspx?id=826833


Processing URLs:   0%|          | 0/1000 [00:00<?, ?it/s]

URL filtered: http://www.theguardian.com/world/2015/aug/26/venice-mayor-gay-pride-parade?utm_content=buffer072b9&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer
Error extracting text from http://reinforced.Heavy: HTTPConnectionPool(host='reinforced.heavy', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe6543b0>: Failed to resolve 'reinforced.heavy' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   0%|          | 3/1000 [00:02<13:04,  1.27it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-23/venezuela-s-lame-duck-congress-names-new-supreme-court-justices


Processing URLs:   1%|          | 6/1000 [00:07<20:14,  1.22s/it]

Error extracting text from http://www.debevoise.com/~/media/files/insights/publications/2015/07/debevoise_sanctions_alert_issue40.pdf: 403 Client Error: Forbidden for url: http://www.debevoise.com/~/media/files/insights/publications/2015/07/debevoise_sanctions_alert_issue40.pdf


Processing URLs:   1%|          | 9/1000 [00:13<24:30,  1.48s/it]

Error extracting text from http://www.timesofisrael.com/bill-clinton-tells-florida-jews-hillary-will-prioritize-israel-ties/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/bill-clinton-tells-florida-jews-hillary-will-prioritize-israel-ties/


Processing URLs:   1%|▏         | 14/1000 [00:28<50:33,  3.08s/it]

Error extracting text from http://www.process-worldwide.com/india-keen-on-setting-up-lng-terminal-and-petrochemical-plants-at-iranian-port-a-533286/: 410 Client Error: Gone for url: https://www.process-worldwide.com/india-keen-on-setting-up-lng-terminal-and-petrochemical-plants-at-iranian-port-a-533286/


Processing URLs:   2%|▏         | 15/1000 [00:29<39:29,  2.41s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/360954-trump-and-wikileaks-five-things-to-know: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/360954-trump-and-wikileaks-five-things-to-know/


Processing URLs:   2%|▏         | 16/1000 [00:29<29:07,  1.78s/it]

Error extracting text from http://www.reuters.com/article/2015/09/27/us-usa-congress-boehner-idUSKCN0RR0O320150927: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/27/us-usa-congress-boehner-idUSKCN0RR0O320150927


Processing URLs:   2%|▏         | 17/1000 [00:30<23:23,  1.43s/it]

Error extracting text from https://www.reuters.com/world/uk/uks-frost-says-eu-must-concede-more-brexit-2021-10-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/uks-frost-says-eu-must-concede-more-brexit-2021-10-15/


Processing URLs:   2%|▏         | 19/1000 [00:30<13:50,  1.18it/s]

Error extracting text from http://m.hindustantimes.com/analysis/why-indus-water-treaty-is-a-bad-bargaining-chip-for-india/story-LLyx2YvAPsSSU4y9uLWxyK.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/analysis/why-indus-water-treaty-is-a-bad-bargaining-chip-for-india/story-LLyx2YvAPsSSU4y9uLWxyK.html


Processing URLs:   2%|▏         | 20/1000 [00:30<11:37,  1.40it/s]

Error extracting text from https://pythagorassite.files.wordpress.com/2016/04/screenshot_4_1_16__9_47_pm.png?w=920: 404 Client Error: Not Found for url: https://pythagorassite.files.wordpress.com/2016/04/screenshot_4_1_16__9_47_pm.png?w=920


Processing URLs:   2%|▏         | 23/1000 [00:35<19:04,  1.17s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Portugal-s-Guterres-poised-to-be-next-UN-secretary-general: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Portugal-s-Guterres-poised-to-be-next-UN-secretary-general
Error extracting text from https://www.reuters.com/article/russia-nato-drills-idUSKBN28K1K7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/russia-nato-drills-idUSKBN28K1K7


Processing URLs:   2%|▎         | 25/1000 [00:35<11:26,  1.42it/s]

Error extracting text from http://www.straitstimes.com/asia/philippines-v-china-in-the-south-china-sea-all-you-need-to-know-ahead-of-the-hague-ruling: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:   3%|▎         | 26/1000 [00:36<10:59,  1.48it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-24/syria-russia-warplanes-pound-islamic-state-with-eye-on-key-road


Processing URLs:   3%|▎         | 33/1000 [00:52<31:50,  1.98s/it]

Error extracting text from https://www.warren.senate.gov/newsroom/press-releases/schumer-warren-the-next-president-can-and-should-cancel-up-to-50000-in-student-loan-debt-immediately-democrats-outline-plan-for-immediate-action-in-2021: 403 Client Error: Forbidden for url: https://www.warren.senate.gov/newsroom/press-releases/schumer-warren-the-next-president-can-and-should-cancel-up-to-50000-in-student-loan-debt-immediately-democrats-outline-plan-for-immediate-action-in-2021


Processing URLs:   4%|▎         | 35/1000 [00:55<30:05,  1.87s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-28/venezuela-needs-oil-s-rally-more-than-anyone-as-economy-teeters


Processing URLs:   4%|▍         | 41/1000 [02:00<4:40:47, 17.57s/it]

Error extracting text from http://www.seattletimes.com/seattle-news/politics/president-clinton-to-raise-funds-in-seattle-friday-for-hillary-clintons-campaign/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   4%|▍         | 44/1000 [03:04<6:32:38, 24.64s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-08-14/haiti-since-the-assassination-of-president-moise: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   4%|▍         | 45/1000 [03:05<4:40:30, 17.62s/it]

URL filtered: https://twitter.com/usembassyburma/status/1441184315052679168


Processing URLs:   5%|▌         | 50/1000 [03:09<1:12:09,  4.56s/it]

Error extracting text from http://www.washingtontimes.com/news/2015/oct/7/export-import-bank-roils-house-gop-discussion-ahea/?page=all: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/oct/7/export-import-bank-roils-house-gop-discussion-ahea/?page=all


Processing URLs:   5%|▌         | 53/1000 [03:21<55:23,  3.51s/it]  

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-russia/russian-frigate-fires-cruise-missiles-at-islamic-state-targets-near-syrias-deir-al-zor-idUSKCN1BG16D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia/russian-frigate-fires-cruise-missiles-at-islamic-state-targets-near-syrias-deir-al-zor-idUSKCN1BG16D


Processing URLs:   6%|▌         | 56/1000 [03:26<35:12,  2.24s/it]

Error extracting text from https://undocs.org/en/S/PV.8697: HTTPSConnectionPool(host='daccess-ods.un.org', port=443): Max retries exceeded with url: /access.nsf/Get?OpenAgent&DS=S/PV.8697&Lang=E (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:   6%|▌         | 57/1000 [03:27<31:20,  1.99s/it]

URL filtered: http://www.chron.com/news/article/Google-uncovers-Russian-bought-ads-on-YouTube-12263383.php


Processing URLs:   6%|▌         | 59/1000 [03:27<17:53,  1.14s/it]

Error extracting text from https://www.barrons.com/articles/meta-faces-another-tough-quarter-one-analyst-sees-more-trouble-ahead-51649949068: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/meta-faces-another-tough-quarter-one-analyst-sees-more-trouble-ahead-51649949068


Processing URLs:   6%|▌         | 60/1000 [04:27<4:06:15, 15.72s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-03-02/biden-sanctions-putins-inner-circle-for-navalny-imprisonment: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   6%|▌         | 61/1000 [04:28<3:03:41, 11.74s/it]

Error extracting text from https://www.reuters.com/business/energy/hungarian-foreign-minister-says-agrees-long-term-gas-deal-with-russia-2021-08-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/hungarian-foreign-minister-says-agrees-long-term-gas-deal-with-russia-2021-08-30/


Processing URLs:   6%|▋         | 63/1000 [04:29<1:50:06,  7.05s/it]

Error extracting text from http://www.techweekeurope.co.uk/security/cyberwar/energy-sector-cyber-attacks-198429: 500 Server Error: Internal Server Error for url: https://techweekeurope.co.uk/security/cyberwar/energy-sector-cyber-attacks-198429


Processing URLs:   6%|▋         | 65/1000 [04:31<1:10:25,  4.52s/it]

Error extracting text from https://www.conservativeoutfitters.com/blogs/news/breaking-president-trump-to-sign-new-executive-orders-on-immigration: 404 Client Error: Not Found for url: https://conservativeoutfitters.com/blogs/news/breaking-president-trump-to-sign-new-executive-orders-on-immigration


Processing URLs:   7%|▋         | 66/1000 [04:32<55:02,  3.54s/it]  

Error extracting text from http://gawker.com/breaking-news-joe-biden-still-available-if-youre-inte-1736868252: 404 Client Error: Not Found for url: https://gawker.com/breaking-news-joe-biden-still-available-if-youre-inte-1736868252


Processing URLs:   7%|▋         | 67/1000 [04:33<41:29,  2.67s/it]

Error extracting text from http://www.wsj.com/articles/australia-wont-back-ex-prime-ministers-bid-to-lead-u-n-1469768538: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/australia-wont-back-ex-prime-ministers-bid-to-lead-u-n-1469768538


Processing URLs:   7%|▋         | 70/1000 [04:36<28:19,  1.83s/it]

Error extracting text from https://www.dni.gov/index.php/about/organization/global-trends-2030: 404 Client Error: Not Found for url: https://www.dni.gov/index.php/about/organization/global-trends-2030


Processing URLs:   7%|▋         | 72/1000 [04:39<23:33,  1.52s/it]

Error extracting text from http://jen.jiji.com/jc/eng?g=eco&amp;k=2016021200824: HTTPSConnectionPool(host='jen.jiji.com', port=443): Max retries exceeded with url: /jc/eng?g=eco&amp;k=2016021200824 (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1000)')))
URL filtered: https://www.bloomberg.com/politics/articles/2017-06-18/sanders-signals-backing-of-senate-slowdown-over-health-care-bill


Processing URLs:   7%|▋         | 74/1000 [04:43<25:46,  1.67s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-14/oil-fall-means-almost-everything-for-sale-as-deals-accelerate


Processing URLs:   8%|▊         | 80/1000 [04:46<13:25,  1.14it/s]

Error extracting text from http://euanmearns.com/electrocuted/: 403 Client Error: Forbidden for url: http://euanmearns.com/electrocuted/


Processing URLs:   8%|▊         | 82/1000 [04:49<17:44,  1.16s/it]

Error extracting text from https://abcnews.go.com/International/wireStory/high-level-talks-resume-returning-us-iran-nuclear-77550806: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/high-level-talks-resume-returning-us-iran-nuclear-77550806


Processing URLs:   8%|▊         | 83/1000 [04:50<14:59,  1.02it/s]

Error extracting text from http://www.cdm.me/english/membership-fee-for-nato-will-not-destroy-montenegro-continue-with-reforms: 403 Client Error: Forbidden for url: https://www.cdm.me/english/membership-fee-for-nato-will-not-destroy-montenegro-continue-with-reforms


Processing URLs:   8%|▊         | 84/1000 [04:50<12:01,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-mitsubishimotors-regulations-idUSKCN0XJ00B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mitsubishimotors-regulations-idUSKCN0XJ00B


Processing URLs:   9%|▊         | 86/1000 [04:55<23:53,  1.57s/it]

Error extracting text from http://csis.org/files/publication/141218_Cyber_Operations_North_Korea.pdf: 404 Client Error: Not Found for url: https://www.csis.org/files/publication/141218_Cyber_Operations_North_Korea.pdf


Processing URLs:   9%|▉         | 89/1000 [05:01<30:52,  2.03s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-11/goldman-was-skewered-for-venezuela-deal-so-what-about-big-oil


Processing URLs:   9%|▉         | 91/1000 [05:02<17:28,  1.15s/it]

Error extracting text from http://english.alarabiya.net/en/features/2017/07/23/Houthis-withhold-cholera-medicine-in-al-Hudaydah-exposes-them-to-damage.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/features/2017/07/23/Houthis-withhold-cholera-medicine-in-al-Hudaydah-exposes-them-to-damage.html


Processing URLs:  10%|▉         | 96/1000 [05:11<26:31,  1.76s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XC0D4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XC0D4


Processing URLs:  10%|▉         | 98/1000 [05:12<17:31,  1.17s/it]

Error extracting text from https://www.oecd.org/eco/outlook/OECD-February-2016-Interim-Economic-Outlook-Forecasts-data.xlsx: 410 Client Error: Gone for url: https://www.oecd.org/economy/outlook/OECD-February-2016-Interim-Economic-Outlook-Forecasts-data.xlsx
Error extracting text from http://www.reuters.com/article/us-russia-aircraft-carrier-commentary-idUSKCN12J1L2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-aircraft-carrier-commentary-idUSKCN12J1L2


Processing URLs:  10%|▉         | 99/1000 [05:12<13:34,  1.11it/s]

URL filtered: http://firstnewshawk.com/facebook-launches-fake-news-filter-in-france/


Processing URLs:  10%|█         | 101/1000 [05:13<08:47,  1.70it/s]

Error extracting text from http://www.wsj.com/articles/greeces-creditors-divided-over-measures-athens-must-take-1457365854: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/greeces-creditors-divided-over-measures-athens-must-take-1457365854


Processing URLs:  10%|█         | 102/1000 [05:21<37:22,  2.50s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-08/russian-hackers-fueled-catalan-separatism-madrid-institute-says
URL filtered: https://twitter.com/britainelects/status/722196236921671681


Processing URLs:  11%|█         | 108/1000 [05:28<21:05,  1.42s/it]

Error extracting text from http://www.hybridcars.com/nikola-one-series-hybrid-truck-takes-aim-at-diesels-dominance/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/nikola-one-series-hybrid-truck-takes-aim-at-diesels-dominance/


Processing URLs:  11%|█         | 110/1000 [05:38<48:25,  3.26s/it]

Error extracting text from http://m.belfasttelegraph.co.uk/video-news/video-russian-su34-jets-get-airtoair-missiles-first-time-since-start-of-operation-in-syria-34246159.html: 404 Client Error: Not Found for url: https://m.belfasttelegraph.co.uk/video-news/video-russian-su34-jets-get-airtoair-missiles-first-time-since-start-of-operation-in-syria-34246159.html


Processing URLs:  11%|█         | 111/1000 [05:40<45:29,  3.07s/it]

Error extracting text from http://tass.ru/en/world/858268: 404 Client Error: Not Found for url: https://tass.ru/en/world/858268


Processing URLs:  11%|█▏        | 113/1000 [05:41<26:51,  1.82s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/kurdistan/268243: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/kurdistan/268243


Processing URLs:  12%|█▏        | 115/1000 [05:42<15:12,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/09/21/us-iran-nuclear-iaea-idUSKCN0RL0Z020150921: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/21/us-iran-nuclear-iaea-idUSKCN0RL0Z020150921


Processing URLs:  12%|█▏        | 117/1000 [05:43<11:21,  1.30it/s]

Error extracting text from https://www.fbi.gov/about-us/investigate/organizedcrime/cases/carting-industry: 403 Client Error: Forbidden for url: https://www.fbi.gov/about-us/investigate/organizedcrime/cases/carting-industry


Processing URLs:  12%|█▏        | 118/1000 [05:44<12:47,  1.15it/s]

Error extracting text from https://superforecasting.squarespace.com: 404 Client Error: Not Found for url: https://superforecasting.squarespace.com/


Processing URLs:  12%|█▏        | 119/1000 [05:44<10:10,  1.44it/s]

Error extracting text from https://www.nytimes.com/2017/03/02/world/asia/xi-jinping-china-retirement-rules.html?mcubz=3&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/02/world/asia/xi-jinping-china-retirement-rules.html?mcubz=3&amp;_r=0


Processing URLs:  12%|█▏        | 121/1000 [05:46<11:12,  1.31it/s]

Error extracting text from http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=118883: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=118883


Processing URLs:  12%|█▏        | 123/1000 [05:48<11:43,  1.25it/s]

Error extracting text from https://www.reuters.com/article/us-china-russia-military/china-russia-to-hold-first-joint-mediterranean-naval-drills-in-may-idUSKBN0NL16F20150430: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-russia-military/china-russia-to-hold-first-joint-mediterranean-naval-drills-in-may-idUSKBN0NL16F20150430


Processing URLs:  13%|█▎        | 128/1000 [05:56<16:45,  1.15s/it]

Error extracting text from http://www.nytimes.com/2016/01/29/upshot/surge-for-sanders-or-trump-in-iowa-voter-registration-doesnt-suggest-it.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/29/upshot/surge-for-sanders-or-trump-in-iowa-voter-registration-doesnt-suggest-it.html?_r=0
URL filtered: https://www.bloomberg.com/politics/articles/2017-04-30/china-scores-tacit-victory-at-southeast-asian-conclave-in-manila


Processing URLs:  13%|█▎        | 131/1000 [05:56<08:28,  1.71it/s]

Error extracting text from http://www.nationmultimedia.com/opinion/Abe-should-find-way-forward-on-Northern-Territorie-30279293.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/opinion/Abe-should-find-way-forward-on-Northern-Territorie-30279293.html
Error extracting text from http://www.cdm.me/english/darmanovic-usa-recognised-montenegros-efforts-and-commitment: 403 Client Error: Forbidden for url: https://www.cdm.me/english/darmanovic-usa-recognised-montenegros-efforts-and-commitment


Processing URLs:  14%|█▎        | 135/1000 [06:02<16:14,  1.13s/it]

Error extracting text from http://www.reuters.com/article/2015/11/27/us-nato-georgia-idUSKBN0TG1HP20151127#5B1TXJVHXH37uhwW.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/27/us-nato-georgia-idUSKBN0TG1HP20151127#5B1TXJVHXH37uhwW.97


Processing URLs:  14%|█▍        | 140/1000 [06:20<35:40,  2.49s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-election-trump-israel-idUSKBN0TN0I720151204: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-trump-israel-idUSKBN0TN0I720151204


Processing URLs:  14%|█▍        | 145/1000 [06:29<24:08,  1.69s/it]

Error extracting text from http://www.vanguardngr.com/2016/06/new-niger-delta-militant-group-warns-widespread-attacks/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/06/new-niger-delta-militant-group-warns-widespread-attacks/


Processing URLs:  15%|█▍        | 148/1000 [06:33<20:48,  1.47s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-18/spacex-halts-launch-of-rocket-10-seconds-before-planned-liftoff
Error extracting text from http://www.reuters.com/article/us-turkey-politics-erdogan-syria-idUSKBN17R2GU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-politics-erdogan-syria-idUSKBN17R2GU


Processing URLs:  15%|█▌        | 152/1000 [06:37<14:45,  1.04s/it]

Error extracting text from http://www.komodoexercise.org/#!events/c23wi: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300db2870>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 153/1000 [06:38<13:28,  1.05it/s]

Error extracting text from http://globalriskinsights.com/2016/04/montenegro-another-balkan-nation-experience-unrest/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2016/04/montenegro-another-balkan-nation-experience-unrest/
URL filtered: https://www.bloomberg.com/news/articles/2017-11-08/nato-to-boost-command-structure-cyber-policy-with-eye-on-russia


Processing URLs:  16%|█▌        | 157/1000 [06:42<17:35,  1.25s/it]

Error extracting text from http://www.ibtimes.com/bernie-sanders-jewish-candidate-invokes-spirituality-when-asked-about-god-jimmy-2152474: 403 Client Error: Forbidden for url: https://www.ibtimes.com/bernie-sanders-jewish-candidate-invokes-spirituality-when-asked-about-god-jimmy-2152474


Processing URLs:  16%|█▌        | 159/1000 [06:44<15:01,  1.07s/it]

Error extracting text from http://www.reuters.com/video/2017/05/10/turkey-balks-at-trumps-arming-of-syrian?videoId=371648242&amp;videoChannel=-13668: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2017/05/10/turkey-balks-at-trumps-arming-of-syrian?videoId=371648242&amp;videoChannel=-13668


Processing URLs:  16%|█▌        | 161/1000 [06:46<13:07,  1.06it/s]

Error extracting text from http://templatelab.com/intermediate-range-nuclear-forces-treaty/: 403 Client Error: Forbidden for url: https://templatelab.com/intermediate-range-nuclear-forces-treaty/


Processing URLs:  16%|█▋        | 164/1000 [06:52<20:54,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-army-idUSKBN13V0YI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-army-idUSKBN13V0YI


Processing URLs:  17%|█▋        | 167/1000 [06:55<15:47,  1.14s/it]

Error extracting text from http://www.hybridcars.com/toyota-tells-dealers-to-stop-selling-mirai-fcv-due-to-lack-of-refueling-stations/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/toyota-tells-dealers-to-stop-selling-mirai-fcv-due-to-lack-of-refueling-stations/


Processing URLs:  17%|█▋        | 172/1000 [07:02<20:15,  1.47s/it]

Error extracting text from http://www.lowyinterpreter.org/post/2016/08/10/UN-secretary-general-race-Whats-really-behind-the-straw-poll-results.aspx: 404 Client Error: Not Found for url: https://www.lowyinstitute.org/the-interpreter/post/2016/08/10/UN-secretary-general-race-Whats-really-behind-the-straw-poll-results.aspx


Processing URLs:  18%|█▊        | 175/1000 [07:04<10:52,  1.26it/s]

Error extracting text from https://wfirst.gsfc.nasa.gov/about.html: HTTPSConnectionPool(host='wfirst.gsfc.nasa.gov', port=443): Max retries exceeded with url: /about.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300db1be0>: Failed to resolve 'wfirst.gsfc.nasa.gov' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.handelsblatt.com/politik/international/15-laender-waren-dafuer-ein-timeout-griechenlands-fand-nicht-nur-schaeuble-gut/12475568.html: 403 Client Error: Forbidden for url: http://www.handelsblatt.com/politik/international/15-laender-waren-dafuer-ein-timeout-griechenlands-fand-nicht-nur-schaeuble-gut/12475568.html


Processing URLs:  18%|█▊        | 177/1000 [07:05<10:07,  1.35it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-politics-idUSKBN17J0PY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-politics-idUSKBN17J0PY


Processing URLs:  18%|█▊        | 179/1000 [07:07<10:39,  1.28it/s]

Error extracting text from http://researchbriefings.files.parliament.uk/documents/CBP-8039/CBP-8039.pdf: 403 Client Error: Forbidden for url: http://researchbriefings.files.parliament.uk/documents/CBP-8039/CBP-8039.pdf


Processing URLs:  18%|█▊        | 182/1000 [07:12<13:27,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-syria-idUSKCN12I21B?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-syria-idUSKCN12I21B?il=0
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN19E2G8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN19E2G8


Processing URLs:  19%|█▊        | 186/1000 [07:23<30:42,  2.26s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_49127.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_49127.htm


Processing URLs:  19%|█▉        | 189/1000 [07:33<39:26,  2.92s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/09/26/world/americas/ap-lt-colombia-peace-ceremony-the-latest.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/09/26/world/americas/ap-lt-colombia-peace-ceremony-the-latest.html


Processing URLs:  19%|█▉        | 191/1000 [07:36<26:20,  1.95s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-ghouta-idUSKBN16F1AX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-ghouta-idUSKBN16F1AX


Processing URLs:  19%|█▉        | 192/1000 [07:39<31:51,  2.37s/it]

Error extracting text from http://www.reuters.com/article/us-colombia-rebels-idUSKCN0W51ZP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-colombia-rebels-idUSKCN0W51ZP


Processing URLs:  20%|█▉        | 195/1000 [07:46<35:01,  2.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-22/french-military-overstretched-as-hollande-pushes-active-role


Processing URLs:  20%|██        | 202/1000 [07:56<17:53,  1.34s/it]

Error extracting text from http://www.ibtimes.com/obama-seeks-over-one-third-rise-us-cybersecurity-funding-2299647: 403 Client Error: Forbidden for url: https://www.ibtimes.com/obama-seeks-over-one-third-rise-us-cybersecurity-funding-2299647
Error extracting text from http://www.nytimes.com/2006/07/24/opinion/24gilbert.html?pagewanted=all: 403 Client Error: Forbidden for url: http://www.nytimes.com/2006/07/24/opinion/24gilbert.html?pagewanted=all


Processing URLs:  20%|██        | 203/1000 [07:57<18:35,  1.40s/it]

Error extracting text from https://www.reuters.com/article/us-illinois-ratings/illinois-avoids-downgrade-to-junk-as-sp-affirms-bbb-minus-rating-idUSKBN19X2PW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-illinois-ratings/illinois-avoids-downgrade-to-junk-as-sp-affirms-bbb-minus-rating-idUSKBN19X2PW


Processing URLs:  20%|██        | 205/1000 [07:58<11:27,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-tribune-publshng-m-a-gannett-idUSKCN0XM17S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tribune-publshng-m-a-gannett-idUSKCN0XM17S


Processing URLs:  21%|██        | 207/1000 [07:59<09:47,  1.35it/s]

Error extracting text from https://www.reuters.com/article/us-southkorea-cyber-hackers/multi-stage-cyber-attacks-net-north-korea-millions-in-virtual-currencies-researchers-idUSKBN1ED0ZC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-cyber-hackers/multi-stage-cyber-attacks-net-north-korea-millions-in-virtual-currencies-researchers-idUSKBN1ED0ZC


Processing URLs:  21%|██        | 208/1000 [08:00<09:12,  1.43it/s]

Error extracting text from http://thehill.com/policy/defense/279524-dni-doubts-mosul-can-be-retaken-from-isis-this-year: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/279524-dni-doubts-mosul-can-be-retaken-from-isis-this-year/


Processing URLs:  21%|██        | 209/1000 [08:01<10:35,  1.24it/s]

Error extracting text from http://hyperlooptech.com/: HTTPSConnectionPool(host='virginhyperloop.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30341a450>: Failed to resolve 'virginhyperloop.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-spacex-blast-idUSKCN1182GA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spacex-blast-idUSKCN1182GA


Processing URLs:  21%|██▏       | 213/1000 [08:08<21:01,  1.60s/it]

Error extracting text from https://csglobalpartners.com/visa-free-digest-december/: 403 Client Error: Forbidden for url: https://csglobalpartners.com/visa-free-digest-december/
Error extracting text from https://www.unicef.org/press-releases/window-prevent-famine-yemen-narrowing-un-agencies-warn: 403 Client Error: Forbidden for url: https://www.unicef.org/press-releases/window-prevent-famine-yemen-narrowing-un-agencies-warn


Processing URLs:  22%|██▏       | 215/1000 [08:09<12:41,  1.03it/s]

Error extracting text from https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(20)30131-0/fulltext: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(20)30131-0/fulltext
URL filtered: https://www.bloomberg.com/news/articles/2017-11-10/venezuela-bulls-say-market-rushing-to-judgment-on-eve-of-default


Processing URLs:  22%|██▏       | 220/1000 [08:15<16:50,  1.30s/it]

Error extracting text from http://www.cnbc.com/2015/12/01/us-ism-manufacturing-nov-2015.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/12/01/us-ism-manufacturing-nov-2015.html


Processing URLs:  22%|██▏       | 222/1000 [08:17<12:09,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN16O024: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN16O024


Processing URLs:  22%|██▏       | 224/1000 [08:19<15:00,  1.16s/it]

Error extracting text from http://www.unc.edu/depts/diplomat/AD_Issues/amdipl_17/articles/deatkine_arabs1.html: 404 Client Error: Not Found for url: https://www.unc.edu/a-z/diplomat/AD_Issues/amdipl_17/articles/deatkine_arabs1.html


Processing URLs:  23%|██▎       | 228/1000 [09:26<4:02:50, 18.87s/it]

Error extracting text from http://www.edmunds.com/car-news/toyota-ramps-up-production-of-2016-toyota-mirai-fuel-cell-to-meet-demand.html: HTTPConnectionPool(host='www.edmunds.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  23%|██▎       | 233/1000 [09:34<56:45,  4.44s/it]  

URL filtered: https://twitter.com/angelamatusik/status/761552004686708736?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet


Processing URLs:  24%|██▎       | 235/1000 [09:36<34:47,  2.73s/it]

Error extracting text from http://uk.mobile.reuters.com/article/idUKKBN0U80C620151225?irpc=932: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUKKBN0U80C620151225?irpc=932


Processing URLs:  24%|██▎       | 237/1000 [09:46<42:52,  3.37s/it]

Error extracting text from https://pythagorassite.files.wordpress.com/2016/03/renwick___don_t_trust_your_poll_lead___in_cowley_and_ford__2014__pdf.png?w=978: 404 Client Error: Not Found for url: https://pythagorassite.files.wordpress.com/2016/03/renwick___don_t_trust_your_poll_lead___in_cowley_and_ford__2014__pdf.png?w=978


Processing URLs:  24%|██▍       | 241/1000 [09:55<32:04,  2.54s/it]

Error extracting text from http://www.france24.com/en/20160204-ethiopia-drought-response-government-aid-agency: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160204-ethiopia-drought-response-government-aid-agency


Processing URLs:  24%|██▍       | 243/1000 [09:55<18:40,  1.48s/it]

Error extracting text from http://www.todayonline.com/singapore/ringgit-strongest-against-singdollar-more-4-months: 403 Client Error: Forbidden for url: https://www.todayonline.com/singapore/ringgit-strongest-against-singdollar-more-4-months


Processing URLs:  25%|██▍       | 246/1000 [10:01<21:13,  1.69s/it]

Error extracting text from https://www.dhs.gov/news/2021/01/27/dhs-issues-national-terrorism-advisory-system-ntas-bulletin: 403 Client Error: Forbidden for url: https://www.dhs.gov/news/2021/01/27/dhs-issues-national-terrorism-advisory-system-ntas-bulletin


Processing URLs:  25%|██▌       | 251/1000 [10:04<10:29,  1.19it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-10/iran-gets-ready-to-sell-oil-to-the-world
Error extracting text from https://news.usni.org/2017/03/24/panel-converstations-missiles-nuclear-weapons-key-rebuilding-u-s-russian-realatiohship: 403 Client Error: Forbidden for url: https://news.usni.org/2017/03/24/panel-converstations-missiles-nuclear-weapons-key-rebuilding-u-s-russian-realatiohship


Processing URLs:  25%|██▌       | 252/1000 [10:04<08:41,  1.43it/s]

URL filtered: http://foreignpolicy.com/2015/09/25/russias-game-plan-in-syria-is-simple-putin-assad/?utm_content=buffer9d22c&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  25%|██▌       | 254/1000 [10:05<06:16,  1.98it/s]

Error extracting text from http://app.yonhapnews.co.kr/YNA/Basic/ForeignGallery/view.aspx?lang=EN&amp;contents_id=PYH20180209278700341: 404 Client Error: Not Found for url: http://app.yonhapnews.co.kr/YNA/Basic/ForeignGallery/view.aspx?lang=EN&amp;contents_id=PYH20180209278700341


Processing URLs:  26%|██▌       | 260/1000 [10:20<24:06,  1.95s/it]

Error extracting text from https://www.governor.ny.gov/news/governor-cuomo-announces-autonomous-vehicle-testing-begin-new-york-state: 403 Client Error: Forbidden for url: https://www.governor.ny.gov/news/governor-cuomo-announces-autonomous-vehicle-testing-begin-new-york-state


Processing URLs:  26%|██▋       | 263/1000 [10:26<25:25,  2.07s/it]

Error extracting text from http://marketrealist.com/2017/08/whats-really-stalling-the-att-time-warner-merger/: 404 Client Error: Not Found for url: https://marketrealist.com:443/2017/08/whats-really-stalling-the-att-time-warner-merger/


Processing URLs:  26%|██▋       | 265/1000 [10:27<15:59,  1.31s/it]

Error extracting text from https://www.aier.org/article/political-economy-vs-federal-fairy-tales/: 403 Client Error: Forbidden for url: https://www.aier.org/article/political-economy-vs-federal-fairy-tales/
Error extracting text from https://www.reuters.com/world/china/eu-executive-slows-push-china-investment-deal-2021-05-05/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/china/eu-executive-slows-push-china-investment-deal-2021-05-05/


Processing URLs:  27%|██▋       | 271/1000 [10:34<11:13,  1.08it/s]

Error extracting text from https://www.insidetechlaw.com/autonomous-vehicles/05_china: 403 Client Error: Forbidden for url: https://www.insidetechlaw.com/autonomous-vehicles/05_china
Error extracting text from http://blogs.wsj.com/brussels/2015/11/28/turkey-eu-summit-climate-talks-nato-eu-week-ahead-nov-29-dec-4/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/brussels/2015/11/28/turkey-eu-summit-climate-talks-nato-eu-week-ahead-nov-29-dec-4/


Processing URLs:  27%|██▋       | 272/1000 [10:35<13:33,  1.12s/it]

Error extracting text from http://www.militarytimes.com/story/military/war-on-is/2016/03/31/pentagon-us-troops-mosul-iraq/82451868/: 404 Client Error: Not Found for url: https://www.militarytimes.com/story/military/war-on-is/2016/03/31/pentagon-us-troops-mosul-iraq/82451868/


Processing URLs:  28%|██▊       | 275/1000 [10:39<12:01,  1.00it/s]

Error extracting text from http://www.worldbulletin.net/world/169470/austria-declares-morocco-algeria-tunisia-safe-countries: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/world/169470/austria-declares-morocco-algeria-tunisia-safe-countries


Processing URLs:  28%|██▊       | 280/1000 [11:47<3:52:17, 19.36s/it]

Error extracting text from https://legis.wisconsin.gov/assembly/acc/media/1106/howabillbecomeslaw.pdf: HTTPSConnectionPool(host='legis.wisconsin.gov', port=443): Max retries exceeded with url: /assembly/acc/media/1106/howabillbecomeslaw.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x301bca150>, 'Connection to legis.wisconsin.gov timed out. (connect timeout=60)'))


Processing URLs:  28%|██▊       | 281/1000 [11:48<2:48:39, 14.07s/it]

Error extracting text from http://www.presidency.ucsb.edu/ws/?pid=19253: 404 Client Error: Not Found for url: https://www.presidency.ucsb.edu/ws?pid=19253


Processing URLs:  28%|██▊       | 283/1000 [11:51<1:30:34,  7.58s/it]

Error extracting text from http://www.financialexpress.com/article/world-news/david-cameron-points-to-brexit-referendum-next-year/181184/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/world-news/david-cameron-points-to-brexit-referendum-next-year/181184/


Processing URLs:  29%|██▉       | 292/1000 [12:06<26:43,  2.27s/it]  

Error extracting text from http://www.criticalthreats.org/iran-news-round-february-16-2016: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-february-16-2016


Processing URLs:  30%|██▉       | 296/1000 [12:24<40:17,  3.43s/it]  

Error extracting text from http://www.reuters.com/article/us-turkey-russia-airspace-idUSKCN0V80NM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-russia-airspace-idUSKCN0V80NM


Processing URLs:  30%|██▉       | 297/1000 [12:28<41:37,  3.55s/it]

Error extracting text from http://www.newsinsight.net/FixersAndcricket.aspx#page=page-1: HTTPConnectionPool(host='www.newsinsight.net', port=80): Max retries exceeded with url: /FixersAndcricket.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301bca450>: Failed to resolve 'www.newsinsight.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 298/1000 [12:29<33:43,  2.88s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-kazakhstan-russia-idUSKBN1640DO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-kazakhstan-russia-idUSKBN1640DO


Processing URLs:  30%|███       | 301/1000 [12:50<1:18:45,  6.76s/it]

Error extracting text from http://www.investopedia.com/news/inverted-yield-curve-guide-recession/: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/news/inverted-yield-curve-guide-recession/


Processing URLs:  30%|███       | 303/1000 [12:52<45:19,  3.90s/it]  

Error extracting text from https://www.warner.senate.gov/public/_cache/files/4/f/4fa9c9ba-2b34-4854-8c19-59a0a9676a31/66DECFBC0D6E6958C2520C3A6A69EAF6.safe-tech-act---final.pdf: 403 Client Error: Forbidden for url: https://www.warner.senate.gov/public/_cache/files/4/f/4fa9c9ba-2b34-4854-8c19-59a0a9676a31/66DECFBC0D6E6958C2520C3A6A69EAF6.safe-tech-act---final.pdf


Processing URLs:  30%|███       | 305/1000 [12:54<29:24,  2.54s/it]

Error extracting text from http://www.the: HTTPConnectionPool(host='www.the', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffa514f0>: Failed to resolve 'www.the' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  31%|███       | 307/1000 [12:55<17:52,  1.55s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/355040-intrigue-grows-with-new-kaspersky-revelations: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/355040-intrigue-grows-with-new-kaspersky-revelations/


Processing URLs:  31%|███       | 308/1000 [12:56<16:34,  1.44s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN15Z032: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN15Z032


Processing URLs:  31%|███       | 312/1000 [13:01<12:45,  1.11s/it]

Error extracting text from https://www.axios.com/iran-nuclear-deal-israeli-intel-vienna-7209e356-f8b5-4c11-8d2a-69c73bca94fc.html: 403 Client Error: Forbidden for url: https://www.axios.com/iran-nuclear-deal-israeli-intel-vienna-7209e356-f8b5-4c11-8d2a-69c73bca94fc.html


Processing URLs:  32%|███▏      | 317/1000 [13:11<17:25,  1.53s/it]

Error extracting text from http://uk.reuters.com/article/2015/11/05/uk-eurozone-greece-parliament-idUKKCN0SU38420151105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  32%|███▏      | 322/1000 [13:20<20:16,  1.79s/it]

Error extracting text from http://www.ibtimes.com/colombian-oil-money-flowed-clintons-state-department-took-no-action-prevent-labor-1874464: 403 Client Error: Forbidden for url: https://www.ibtimes.com/colombian-oil-money-flowed-clintons-state-department-took-no-action-prevent-labor-1874464


Processing URLs:  33%|███▎      | 331/1000 [13:34<12:49,  1.15s/it]

Error extracting text from http://www.cdm.me/english/video-brumeaud-and-pajovic-french-ratification-of-the-protocol-is-unquestionable: 403 Client Error: Forbidden for url: https://www.cdm.me/english/video-brumeaud-and-pajovic-french-ratification-of-the-protocol-is-unquestionable


Processing URLs:  33%|███▎      | 334/1000 [13:37<10:42,  1.04it/s]

URL filtered: https://www.metaculus.com/questions/3257/by-november-2nd-2020-will-twitter-temporarily-or-permanently-suspend-realdonaldtrump-or-teamtrump-or-potus-based-on-alleged-violations-of-twitters-terms-of-service/


Processing URLs:  34%|███▎      | 336/1000 [13:42<18:53,  1.71s/it]

Error extracting text from http://www.ktbs.com/story/31459773/big-protests-across-brazil-put-more-pressure-on-president: 404 Client Error: Not Found for url: https://www.ktbs.com/story/31459773/big-protests-across-brazil-put-more-pressure-on-president/


Processing URLs:  34%|███▍      | 338/1000 [13:47<21:08,  1.92s/it]

Error extracting text from https://tradingeconomics.com/united-states/government-bond-yield: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/united-states/government-bond-yield


Processing URLs:  34%|███▍      | 339/1000 [13:49<22:21,  2.03s/it]

Error extracting text from http://www.ibtimes.co.uk/saudi-aramco-consider-london-stock-exchange-2018-listing-says-ceo-1583296: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/saudi-aramco-consider-london-stock-exchange-2018-listing-says-ceo-1583296


Processing URLs:  34%|███▍      | 340/1000 [13:53<29:08,  2.65s/it]

Error extracting text from http://38north.org/2015/12/icbm122115/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  34%|███▍      | 342/1000 [13:55<19:29,  1.78s/it]

Error extracting text from http://csbcorrespondent.com/market-update-october-14-2015: 403 Client Error: Forbidden for url: https://www.southstatecorrespondent.com


Processing URLs:  34%|███▍      | 343/1000 [13:56<17:27,  1.59s/it]

URL filtered: https://www.youtube.com/watch?v=yJUZbCpGvKY


Processing URLs:  35%|███▍      | 347/1000 [13:58<08:42,  1.25it/s]

Error extracting text from http://onlinelibrary.wiley.com/doi/10.1002/2016EF000508/pdf: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/10.1002/2016EF000508/pdf


Processing URLs:  35%|███▌      | 350/1000 [14:17<43:16,  3.99s/it]

Error extracting text from https://www.newsweek.com/kyrsten-sinema-faces-no-confidence-threat-arizona-dems-over-filibuster-1632771: 403 Client Error: Forbidden for url: https://www.newsweek.com/kyrsten-sinema-faces-no-confidence-threat-arizona-dems-over-filibuster-1632771
URL filtered: https://www.youtube.com/watch?v=br0NW9ufUUw


Processing URLs:  35%|███▌      | 352/1000 [14:19<30:02,  2.78s/it]

Error extracting text from http://www.cnbc.com/2017/02/03/reuters-america-us-house-republicans-exploring-border-tax-design-changes-lawmaker.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/02/03/reuters-america-us-house-republicans-exploring-border-tax-design-changes-lawmaker.html


Processing URLs:  36%|███▌      | 355/1000 [14:22<16:17,  1.52s/it]

Error extracting text from https://theconversation.com/the-conversations-factcheck-granted-accreditation-by-international-fact-checking-network-at-poynter-74363: 403 Client Error: Forbidden for url: https://theconversation.com/the-conversations-factcheck-granted-accreditation-by-international-fact-checking-network-at-poynter-74363
Error extracting text from http://www.nytimes.com/2017/01/05/world/middleeast/netanyahu-corruption-investigation.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/01/05/world/middleeast/netanyahu-corruption-investigation.html?_r=0


Processing URLs:  36%|███▌      | 357/1000 [14:28<26:18,  2.45s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/kognitivnyy-podhod-v-upravlenii&amp;usg=ALkJrhj8FyQK0bB2q-HOzN7iFM5KuOVlNw: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/kognitivnyy-podhod-v-upravlenii&amp;usg=ALkJrhj8FyQK0bB2q-HOzN7iFM5KuOVlNw


Processing URLs:  36%|███▌      | 359/1000 [14:31<20:21,  1.91s/it]

URL filtered: https://twitter.com/AHoweBlogger/status/938084955242065920


Processing URLs:  36%|███▌      | 362/1000 [14:32<12:13,  1.15s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=es&amp;tl=en&amp;u=http%3A%2F%2Fwww.2001.com.ve%2Fen-la-agenda%2F123879%2Fjose-guerra--gobierno-debera-pagar-la-deuda-externa.html&amp;sandbox=1: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=es&amp;tl=en&amp;u=http%3A%2F%2Fwww.2001.com.ve%2Fen-la-agenda%2F123879%2Fjose-guerra--gobierno-debera-pagar-la-deuda-externa.html&amp;sandbox=1


Processing URLs:  36%|███▋      | 363/1000 [14:34<12:07,  1.14s/it]

Error extracting text from http://www.caam.org.cn/zhengceyanjiu/20160816/1005197357.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/zhengceyanjiu/20160816/1005197357.html


Processing URLs:  37%|███▋      | 366/1000 [14:36<09:57,  1.06it/s]

Error extracting text from https://www.nytimes.com/2018/08/05/technology/amazon-headquarters-hq2.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/08/05/technology/amazon-headquarters-hq2.html


Processing URLs:  37%|███▋      | 368/1000 [14:39<12:45,  1.21s/it]

Error extracting text from https://www.iea.org/oilmarketreport/omrpublic/: 404 Client Error: Not Found for url: https://www.iea.org/oilmarketreport/omrpublic/


Processing URLs:  37%|███▋      | 372/1000 [14:43<09:14,  1.13it/s]

Error extracting text from https://larswericson.wordpress.com/2016/03/30/gitrep-29mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/30/gitrep-29mar16pm/


Processing URLs:  37%|███▋      | 374/1000 [15:23<1:34:12,  9.03s/it]

Error extracting text from http://atimes.com/2016/07/why-china-will-hold-its-fire-in-the-south-china-sea-until-september/: 404 Client Error: Not Found for url: https://atimes.com/2016/07/why-china-will-hold-its-fire-in-the-south-china-sea-until-september/
URL filtered: https://www.youtube.com/watch?v=TnkaZFxO3zY#t=10.678229


Processing URLs:  38%|███▊      | 376/1000 [15:23<51:45,  4.98s/it]  

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/02/10/Grandson-of-Iran-s-khomeini-fails-election-appeal-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/02/10/Grandson-of-Iran-s-khomeini-fails-election-appeal-.html


Processing URLs:  38%|███▊      | 379/1000 [15:27<27:55,  2.70s/it]

Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=8599: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=8599


Processing URLs:  38%|███▊      | 381/1000 [15:29<19:08,  1.86s/it]

Error extracting text from http://www.sj-r.com/news/20170118/new-tax-on-sodas-sugary-drinks-it-might-be-part-of-illinois-budget-deal: 404 Client Error: OK for url: https://www.sj-r.com/news/20170118/new-tax-on-sodas-sugary-drinks-it-might-be-part-of-illinois-budget-deal


Processing URLs:  38%|███▊      | 385/1000 [15:35<15:49,  1.54s/it]

Error extracting text from https://www.reuters.com/article/us-afghanistan-politics/afghanistan-political-turmoil-deepens-as-regional-leader-ousted-idUSKBN1EE16A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-politics/afghanistan-political-turmoil-deepens-as-regional-leader-ousted-idUSKBN1EE16A


Processing URLs:  40%|████      | 400/1000 [15:59<15:17,  1.53s/it]

Error extracting text from http://post.understandingwar.org/report/jabhat-al-nusra-and-isis-sources-strength%20: HTTPConnectionPool(host='post.understandingwar.org', port=80): Max retries exceeded with url: /report/jabhat-al-nusra-and-isis-sources-strength%20 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300eba840>: Failed to resolve 'post.understandingwar.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|████      | 403/1000 [16:05<19:00,  1.91s/it]

Error extracting text from http://www.ibtimes.com/kim-jong-un-seeks-modern-precise-rockets-think-tank-says-north-korea-has-ballistic-2166354: 403 Client Error: Forbidden for url: https://www.ibtimes.com/kim-jong-un-seeks-modern-precise-rockets-think-tank-says-north-korea-has-ballistic-2166354
URL filtered: http://www.bloomberg.com/news/articles/2015-10-04/brazil-s-justice-minister-says-audit-no-grounds-for-impeachment


Processing URLs:  41%|████      | 407/1000 [16:09<12:51,  1.30s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/will-not-rush-to-negotiate-nafta-with-donald-trump-mexico/articleshow/57322473.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/will-not-rush-to-negotiate-nafta-with-donald-trump-mexico/articleshow/57322473.cms


Processing URLs:  41%|████      | 408/1000 [16:09<09:55,  1.01s/it]

Error extracting text from http://www.consilium.europa.eu/press-releases-pdf/2015/12/40802207128_en_635859964800000000.pdf: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/press-releases-pdf/2015/12/40802207128_en_635859964800000000.pdf


Processing URLs:  41%|████▏     | 414/1000 [16:17<10:05,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN18H0A6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN18H0A6


Processing URLs:  42%|████▏     | 415/1000 [16:19<12:09,  1.25s/it]

Error extracting text from http://www.ibtimes.com/palestinian-authority-official-aide-erekat-arrested-reportedly-spying-israel-2268447: 403 Client Error: Forbidden for url: https://www.ibtimes.com/palestinian-authority-official-aide-erekat-arrested-reportedly-spying-israel-2268447
Error extracting text from https://www.reuters.com/article/us-germany-politics/germanys-spd-vows-to-clash-with-down-for-the-count-merkel-idUSKCN1FY1U6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/germanys-spd-vows-to-clash-with-down-for-the-count-merkel-idUSKCN1FY1U6


Processing URLs:  42%|████▏     | 417/1000 [16:20<10:05,  1.04s/it]

Error extracting text from https://www.fire.ca.gov/incidents/2020/8/17/lnu-lightning-complex-includes-hennessey-gamble-15-10-spanish-markley-13-4-11-16-walbridge/: 403 Client Error: Forbidden for url: https://www.fire.ca.gov/incidents/2020/8/17/lnu-lightning-complex-includes-hennessey-gamble-15-10-spanish-markley-13-4-11-16-walbridge/


Processing URLs:  42%|████▏     | 424/1000 [16:36<20:17,  2.11s/it]

Error extracting text from http://bit.ly/2vNogdS: 404 Client Error: Not Found for url: https://kenyannews.co.ke/tuko-news/raila-odingas-worst-fear-in-nairobi-could-be-coming-true-as-western-kenya-bound-buses-are-booked-full-13199/


Processing URLs:  42%|████▎     | 425/1000 [16:37<15:20,  1.60s/it]

Error extracting text from http://www.marinetraffic.com/en/ais/home/centerx:17/centery:37/zoom:7: 403 Client Error: Forbidden for url: https://www.marinetraffic.com/en/ais/home/centerx:17/centery:37/zoom:7


Processing URLs:  43%|████▎     | 426/1000 [16:38<13:41,  1.43s/it]

URL filtered: https://www.youtube.com/watch?v=FLE6pEeBGBw


Processing URLs:  43%|████▎     | 430/1000 [16:41<11:13,  1.18s/it]

Error extracting text from http://www.stripes.com/news/japan-scrambles-jets-after-chinese-planes-fly-over-miyako-strait-1.431096: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/japan-scrambles-jets-after-chinese-planes-fly-over-miyako-strait-1.431096


Processing URLs:  43%|████▎     | 432/1000 [16:42<08:14,  1.15it/s]

Error extracting text from http://www.latimes.com/world/mexico-americas/la-fg-colombia-farc-20160324-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/mexico-americas/la-fg-colombia-farc-20160324-story.html


Processing URLs:  44%|████▎     | 436/1000 [16:46<06:45,  1.39it/s]

Error extracting text from https://bit.ly/3e3y5MO: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/documents-publications/treaties-agreements/agreement/?id=2020025&DocLanguage=en
Error extracting text from https://southfront.org/us-combat-adviser-mission-in-iraq-expands-to-battalion-level-amid-preparations-for-mosul-advance/: HTTPSConnectionPool(host='southfront.org', port=443): Max retries exceeded with url: /us-combat-adviser-mission-in-iraq-expands-to-battalion-level-amid-preparations-for-mosul-advance/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe843830>: Failed to resolve 'southfront.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▎     | 437/1000 [16:46<05:56,  1.58it/s]

Error extracting text from http://www.chinapost.com.tw/china/national-news/2016/11/28/485183/Top-firms.htm: HTTPConnectionPool(host='www.chinapost.com.tw', port=80): Max retries exceeded with url: /china/national-news/2016/11/28/485183/Top-firms.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe842060>: Failed to resolve 'www.chinapost.com.tw' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▍     | 438/1000 [16:47<05:59,  1.57it/s]

Error extracting text from https://reut.rs/3b7DArU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/british-irish-govts-agree-june-meeting-northern-ireland-2021-05-05/


Processing URLs:  44%|████▍     | 439/1000 [16:48<06:17,  1.48it/s]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/01/06/0200000000AEN20160106005000315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  45%|████▍     | 447/1000 [17:01<16:32,  1.80s/it]

Error extracting text from http://ec.europa.eu/taxation_customs/customs/customs_controls/cash_controls/index_en.htm: 404 Client Error: Not Found for url: https://taxation-customs.ec.europa.eu/customs/customs_controls/cash_controls/index_en.htm


Processing URLs:  45%|████▍     | 448/1000 [17:02<13:23,  1.46s/it]

Error extracting text from https://blogs.intralinks.com/2017/03/latin-america-ma-rollercoaster-hits-latest-barrel-roll/#: HTTPSConnectionPool(host='blogs.intralinks.com', port=443): Max retries exceeded with url: /2017/03/latin-america-ma-rollercoaster-hits-latest-barrel-roll/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'blogs.intralinks.com'. (_ssl.c:1000)")))


Processing URLs:  46%|████▌     | 455/1000 [17:13<13:29,  1.49s/it]

Error extracting text from http://www.wkyc.com/news/nation-now/how-sailors-and-marines-got-an-assault-ship-ready-to-take-on-isis/255861896: 404 Client Error: Not Found for url: https://www.wkyc.com/news/nation-now/how-sailors-and-marines-got-an-assault-ship-ready-to-take-on-isis/255861896


Processing URLs:  46%|████▌     | 461/1000 [17:22<09:24,  1.05s/it]

Error extracting text from http://www.cameroon-concord.com/daily-news/item/5458-revealed-prince-charles-to-visit-iran: 404 Client Error: Not Found for url: https://www.cameroon-concord.com/daily-news/item/5458-revealed-prince-charles-to-visit-iran
URL filtered: https://twitter.com/DeputySecState/status/1423026455844298754
URL filtered: https://www.youtube.com/watch?v=9_chcr16C_Q
Error extracting text from http://www.nytimes.com/2015/11/17/world/middleeast/us-strikes-syria-oil.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/17/world/middleeast/us-strikes-syria-oil.html


Processing URLs:  46%|████▌     | 462/1000 [17:23<09:18,  1.04s/it]

Error extracting text from http://www.globalintelligencetrust.com/single-post/2016/08/17/WTO-Membership-for-Iran-A-Catalyst-for-Economic-Growth-and-Decentralization-of-Power: 404 Client Error: Not Found for url: http://www.globalintelligencetrust.com/single-post/2016/08/17/WTO-Membership-for-Iran-A-Catalyst-for-Economic-Growth-and-Decentralization-of-Power


Processing URLs:  47%|████▋     | 474/1000 [17:40<10:18,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-medics-idUSKBN13H23Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-medics-idUSKBN13H23Q


Processing URLs:  48%|████▊     | 475/1000 [17:41<08:42,  1.01it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/latest_polls/elections/: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/latest_polls/elections/


Processing URLs:  48%|████▊     | 477/1000 [17:42<07:05,  1.23it/s]

Error extracting text from http://www.basnews.com/index.php/en/news/kurdistan/271883: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/kurdistan/271883


Processing URLs:  48%|████▊     | 479/1000 [17:45<08:58,  1.03s/it]

URL filtered: https://www.businessinsider.com/new-trump-ad-biden-cognitive-decline-youtube-fox-news-2020-8?r=US&IR=T


Processing URLs:  49%|████▉     | 488/1000 [18:04<26:52,  3.15s/it]

URL filtered: http://www.politico.com/story/2017/10/13/twitter-russia-data-deleted-investigation-243730


Processing URLs:  49%|████▉     | 493/1000 [18:09<09:58,  1.18s/it]

Error extracting text from https://www.researchgate.net/profile/Maria_Cabrera2/publication/259010176_LHC_and_dark_matter_phenomenology_of_the_NUGHM/links/547de2a70cf2cfe203c2250d.pdf?disableCoverPage=true&amp;inViewer=true&amp;origin=publication_detail&amp;pdfJsDownload=true: 403 Client Error: Forbidden for url: https://www.researchgate.net/profile/Maria_Cabrera2/publication/259010176_LHC_and_dark_matter_phenomenology_of_the_NUGHM/links/547de2a70cf2cfe203c2250d.pdf?disableCoverPage=true&amp;inViewer=true&amp;origin=publication_detail&amp;pdfJsDownload=true
Error extracting text from http://www.straitstimes.com/asia/se-asia/talks-with-beijing-hinge-on-tribunal-ruling-manila: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  49%|████▉     | 494/1000 [18:09<08:20,  1.01it/s]

Error extracting text from https://thehill.com/homenews/house/534001-scalise-labels-capitol-rioting-domestic-terrorism: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/534001-scalise-labels-capitol-rioting-domestic-terrorism/


Processing URLs:  50%|████▉     | 495/1000 [18:09<06:31,  1.29it/s]

Error extracting text from https://www.nytimes.com/2017/03/06/us/politics/affordable-care-act-obamacare-health.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/06/us/politics/affordable-care-act-obamacare-health.html


Processing URLs:  50%|████▉     | 497/1000 [18:14<12:47,  1.53s/it]

Error extracting text from https://kitup.military.com/2017/09/m3e1.html: HTTPSConnectionPool(host='kitup.military.com', port=443): Max retries exceeded with url: /2017/09/m3e1.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  50%|█████     | 501/1000 [18:22<13:00,  1.56s/it]

Error extracting text from http://www.wsj.com/articles/gop-lawmakers-advance-bipartisan-effort-to-reauthorize-ex-im-bank-1444406397: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gop-lawmakers-advance-bipartisan-effort-to-reauthorize-ex-im-bank-1444406397


Processing URLs:  50%|█████     | 502/1000 [18:25<14:44,  1.78s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/06/26/762417/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/06/26/762417/story.html


Processing URLs:  50%|█████     | 505/1000 [18:26<07:21,  1.12it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-24/eu-said-to-search-for-google-solution-that-stands-test-of-time
Error extracting text from http://www.reuters.com/article/us-china-corruption-statistics-idUSKCN11112X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-corruption-statistics-idUSKCN11112X


Processing URLs:  51%|█████     | 509/1000 [18:43<19:39,  2.40s/it]

Error extracting text from https://news.usni.org/2017/09/26/report-russia-continues-use-nuclear-threats-intimidate-neighbors: 403 Client Error: Forbidden for url: https://news.usni.org/2017/09/26/report-russia-continues-use-nuclear-threats-intimidate-neighbors


Processing URLs:  51%|█████     | 511/1000 [18:46<14:31,  1.78s/it]

Error extracting text from http://www.wsj.com/articles/syria-hospital-hit-in-airstrike-blamed-on-russia-1461841686: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syria-hospital-hit-in-airstrike-blamed-on-russia-1461841686
URL filtered: http://www.bloomberg.com/news/articles/2016-01-19/south-africa-vows-to-avoid-downgrade-to-junk-business-day-says


Processing URLs:  52%|█████▏    | 515/1000 [18:49<09:46,  1.21s/it]

Error extracting text from http://www.ndb.int/medias/brics-bank-launch-twin-bonds/: 403 Client Error: Forbidden for url: https://www.ndb.int/medias/brics-bank-launch-twin-bonds/


Processing URLs:  52%|█████▏    | 524/1000 [19:01<06:34,  1.21it/s]

Error extracting text from http://www.wsj.com/articles/bank-of-japan-takes-fresh-action-1450412916: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bank-of-japan-takes-fresh-action-1450412916
Error extracting text from http://www.reuters.com/article/2015/09/24/us-mideast-crisis-syria-jets-idUSKCN0RO15V20150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/us-mideast-crisis-syria-jets-idUSKCN0RO15V20150924


Processing URLs:  53%|█████▎    | 530/1000 [19:08<09:13,  1.18s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-03/brazil-real-rises-as-traders-bet-impeachment-support-is-mounting


Processing URLs:  53%|█████▎    | 532/1000 [19:09<07:24,  1.05it/s]

Error extracting text from http://www.newsweek.com/wake-nuclear-deal-power-struggle-iran-371987: 403 Client Error: Forbidden for url: https://www.newsweek.com/wake-nuclear-deal-power-struggle-iran-371987
Error extracting text from https://www.reuters.com/world/us-issues-nord-stream-2-related-sanctions-russians-blinken-2021-08-20/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us-issues-nord-stream-2-related-sanctions-russians-blinken-2021-08-20/


Processing URLs:  54%|█████▎    | 535/1000 [19:19<16:47,  2.17s/it]

Error extracting text from http://www.consumerwatchdog.org/newsrelease/consumer-watchdog-calls-bonilla-restore-privacy-provisions-self-driving-bus-bill-says-as: 500 Server Error: Internal Server Error for url: https://consumerwatchdog.org/uncategorized/consumer-watchdog-calls-bonilla-restore-privacy-provisions-self-driving-bus-bill-says-as/


Processing URLs:  54%|█████▎    | 536/1000 [19:20<14:16,  1.85s/it]

Error extracting text from https://www.predictit.org/Contract/1565/Will-Republicans-have-a-brokered-convention-in-2016#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/1565/Will-Republicans-have-a-brokered-convention-in-2016#data


Processing URLs:  54%|█████▍    | 540/1000 [19:54<1:15:11,  9.81s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18407-nld-says-forming-a-government-not-easy.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18407-nld-says-forming-a-government-not-easy.html


Processing URLs:  54%|█████▍    | 544/1000 [19:59<24:35,  3.24s/it]  

Error extracting text from https://www.wsj.com/articles/goldman-sachs-under-fire-for-venezuela-bond-deal-1496100583: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/goldman-sachs-under-fire-for-venezuela-bond-deal-1496100583


Processing URLs:  55%|█████▍    | 545/1000 [20:00<19:17,  2.54s/it]

Error extracting text from https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=en&amp;reference=2018/0427(NLE): 404 Client Error: Not Found for url: https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=en&amp;reference=2018/0427(NLE)


Processing URLs:  55%|█████▍    | 546/1000 [20:01<14:07,  1.87s/it]

Error extracting text from https://www.wsj.com/articles/pressure-builds-on-venezuela-with-big-payments-due-this-week-1509010203: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pressure-builds-on-venezuela-with-big-payments-due-this-week-1509010203


Processing URLs:  55%|█████▍    | 547/1000 [20:04<18:37,  2.47s/it]

Error extracting text from http://www.thingsrelevant.com/magazine-synopsis/: 404 Client Error: Not Found for url: https://www.thingsrelevant.com/magazine-synopsis/


Processing URLs:  55%|█████▍    | 548/1000 [20:06<16:38,  2.21s/it]

URL filtered: https://www.metaculus.com/questions/4775/will-richard-spencer-receive-a-long-term-twitter-ban-before-2021/


Processing URLs:  56%|█████▌    | 557/1000 [20:16<06:35,  1.12it/s]

Error extracting text from https://www.timesofisrael.com/as-israel-gears-up-for-jerusalem-march-hamas-signals-it-may-fire-rockets-again/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/as-israel-gears-up-for-jerusalem-march-hamas-signals-it-may-fire-rockets-again/
URL filtered: http://www.bloomberg.com/news/articles/2016-09-12/china-s-infrastructure-planners-are-on-a-road-to-nowhere


Processing URLs:  56%|█████▌    | 562/1000 [20:25<13:13,  1.81s/it]

Error extracting text from http://thebulletin.org/will-south-korea-go-nuclear9778: 404 Client Error: Not Found for url: https://thebulletin.org/will-south-korea-go-nuclear9778/


Processing URLs:  56%|█████▋    | 565/1000 [20:29<10:44,  1.48s/it]

Error extracting text from https://www.amnesty.org/en/documents/afr16/3337/2016/en/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/documents/afr16/3337/2016/en/


Processing URLs:  57%|█████▋    | 568/1000 [20:36<12:03,  1.67s/it]

Error extracting text from http://www.cdm.me/english/kacin-negotiations-with-nato-will-take-up-to-three-months: 403 Client Error: Forbidden for url: https://www.cdm.me/english/kacin-negotiations-with-nato-will-take-up-to-three-months


Processing URLs:  57%|█████▋    | 573/1000 [20:45<11:26,  1.61s/it]

Error extracting text from http://www.amazon.com/Best-Sellers-Kindle-Store-Nonfiction/zgbs/digital-text/157325011#4: 503 Server Error: Service Unavailable for url: https://www.amazon.com/Best-Sellers-Kindle-Store-Nonfiction/zgbs/digital-text/157325011#4


Processing URLs:  57%|█████▊    | 575/1000 [20:47<09:18,  1.31s/it]

Error extracting text from https://ndews.umd.edu/sites/ndews.umd.edu/files/u1424/Dr.DanCiccaroneNDEWSwebinarFINAL033116.pdf: HTTPSConnectionPool(host='ndews.umd.edu', port=443): Max retries exceeded with url: /sites/ndews.umd.edu/files/u1424/Dr.DanCiccaroneNDEWSwebinarFINAL033116.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  58%|█████▊    | 580/1000 [21:05<20:18,  2.90s/it]

Error extracting text from https://www.reuters.com/article/britain-eu-withdrawal-agreement/uk-drops-threat-to-break-eu-exit-treaty-after-irish-border-agreement-idUSKBN28I1U3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-eu-withdrawal-agreement/uk-drops-threat-to-break-eu-exit-treaty-after-irish-border-agreement-idUSKBN28I1U3


Processing URLs:  58%|█████▊    | 582/1000 [21:06<12:21,  1.77s/it]

Error extracting text from https://medium.com/search?q=autonomous%20machines: 403 Client Error: Forbidden for url: https://medium.com/search?q=autonomous%20machines


Processing URLs:  59%|█████▊    | 586/1000 [21:15<11:19,  1.64s/it]

Error extracting text from https://theconversation.com/more-than-1-in-3-new-zealanders-remain-hesitant-or-sceptical-about-covid-19-vaccines-heres-how-to-reach-them-156489: 403 Client Error: Forbidden for url: https://theconversation.com/more-than-1-in-3-new-zealanders-remain-hesitant-or-sceptical-about-covid-19-vaccines-heres-how-to-reach-them-156489


Processing URLs:  59%|█████▊    | 587/1000 [21:17<12:06,  1.76s/it]

Error extracting text from http://www.tv5monde.com/cms/chaine-francophone/Revoir-nos-emissions/Et-si-vous-me-disiez-toute-la-verite/Episodes/p-31428-SOS-pour-le-Burundi.htm: 404 Client Error: Not Found for url: https://www.tv5monde.com/tv/videos/252-et-si-vous-me-disiez-toute-la-verite
Error extracting text from http://www.chinapost.com.tw/china/national-news/2016/04/08/462883/p1/China-Politburo.htm: HTTPConnectionPool(host='www.chinapost.com.tw', port=80): Max retries exceeded with url: /china/national-news/2016/04/08/462883/p1/China-Politburo.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff3dae70>: Failed to resolve 'www.chinapost.com.tw' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  59%|█████▉    | 590/1000 [21:20<09:48,  1.44s/it]

Error extracting text from https://csglobalpartners.com/visa-free-travel-mean/: 403 Client Error: Forbidden for url: https://csglobalpartners.com/visa-free-travel-mean/


Processing URLs:  59%|█████▉    | 591/1000 [21:20<07:43,  1.13s/it]

Error extracting text from https://www.nytimes.com/2021/06/08/us/politics/filibuster-pay-equity.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/08/us/politics/filibuster-pay-equity.html
Error extracting text from https://www.reuters.com/world/us/biden-says-hopes-meet-putin-during-june-trip-europe-2021-05-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/biden-says-hopes-meet-putin-during-june-trip-europe-2021-05-04/


Processing URLs:  59%|█████▉    | 594/1000 [21:22<05:08,  1.32it/s]

Error extracting text from http://thehill.com/policy/finance/260118-week-ahead-crunch-time-for-highway-talks: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/260118-week-ahead-crunch-time-for-highway-talks/
URL filtered: https://www.bloomberg.com/news/articles/2017-06-22/amazon-vision-of-deliveries-by-drone-gets-boost-in-faa-measure


Processing URLs:  60%|█████▉    | 597/1000 [21:23<04:11,  1.60it/s]

Error extracting text from https://www.nytimes.com/2014/08/28/world/europe/ukraine-russia-novoazovsk-crimea.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2014/08/28/world/europe/ukraine-russia-novoazovsk-crimea.html?_r=0
Error extracting text from http://www.reuters.com/article/us-britain-eu-poll-idUSKCN0XP0PO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-poll-idUSKCN0XP0PO


Processing URLs:  60%|██████    | 601/1000 [21:26<04:55,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/fed-almost-certain-to-keep-interest-rates-unchanged-at-next-meeting-1452882343: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-almost-certain-to-keep-interest-rates-unchanged-at-next-meeting-1452882343


Processing URLs:  60%|██████    | 604/1000 [21:30<06:40,  1.01s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-02/oil-trades-near-one-month-low-as-u-s-fuel-supplies-seen-rising


Processing URLs:  61%|██████    | 607/1000 [21:36<11:18,  1.73s/it]

URL filtered: https://twitter.com/ActualidadRT/status/928087428539219968


Processing URLs:  61%|██████    | 611/1000 [21:50<19:28,  3.00s/it]

Error extracting text from https://raddingtonreport.com/djibouti-struggle-against-terrorism/: 503 Server Error: Service Temporarily Unavailable for url: https://raddingtonreport.com/djibouti-struggle-against-terrorism/


Processing URLs:  61%|██████    | 612/1000 [21:52<17:14,  2.67s/it]

Error extracting text from http://uk.reuters.com/article/uk-gulf-qatar-idUKKBN19A1BS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-russia-idUSKBN15P1VE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-russia-idUSKBN15P1VE


Processing URLs:  61%|██████▏   | 614/1000 [21:52<10:11,  1.58s/it]

Error extracting text from http://www.reuters.com/article/us-un-election-idUSKCN10G1Y2?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-election-idUSKCN10G1Y2?il=0


Processing URLs:  62%|██████▏   | 620/1000 [22:00<07:47,  1.23s/it]

Error extracting text from http://thehill.com/opinion/cybersecurity/357044-russian-hacking-highlights-need-for-greater-mobile-device-security: 403 Client Error: Forbidden for url: https://thehill.com/opinion/cybersecurity/357044-russian-hacking-highlights-need-for-greater-mobile-device-security/


Processing URLs:  62%|██████▏   | 622/1000 [22:01<05:47,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-iran-oil-production-idUSKBN15X0IH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-production-idUSKBN15X0IH


Processing URLs:  62%|██████▏   | 624/1000 [22:04<06:24,  1.02s/it]

Error extracting text from http://www.bakermckenzie.com/sanctionsnews/blog.aspx?topic=343: 403 Client Error: Forbidden for url: http://www.bakermckenzie.com/sanctionsnews/blog.aspx?topic=343
URL filtered: http://www.bloomberg.com/news/articles/2016-01-22/venezuela-s-pdvsa-says-debt-fell-by-2-billion-last-year


Processing URLs:  63%|██████▎   | 632/1000 [22:11<04:57,  1.24it/s]

Error extracting text from http://www.ayyaantuu.net/rubio-colleagues-condemn-ethiopias-crackdown-on-civil-society/: HTTPConnectionPool(host='www.ayyaantuu.net', port=80): Max retries exceeded with url: /rubio-colleagues-condemn-ethiopias-crackdown-on-civil-society/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300db3230>: Failed to resolve 'www.ayyaantuu.net' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africa-schedules-new-no-confidence-vote-against-zuma-idUSKBN1FM1B5?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africa-schedules-new-no-confidence-vote-against-zuma-idUSKBN1FM1B5?il=0


Processing URLs:  63%|██████▎   | 633/1000 [22:11<03:44,  1.63it/s]

Error extracting text from https://www.reuters.com/world/asia-pacific/taliban-fighters-capture-eighth-provincial-capital-six-days-2021-08-11/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/taliban-fighters-capture-eighth-provincial-capital-six-days-2021-08-11/


Processing URLs:  63%|██████▎   | 634/1000 [22:12<04:01,  1.52it/s]

Error extracting text from https://finance.yahoo.com/quote/FB/history?p=FB: 404 Client Error: Not Found for url: https://finance.yahoo.com/quote/FB/history?p=FB


Processing URLs:  64%|██████▍   | 638/1000 [22:15<04:31,  1.33it/s]

Error extracting text from http://www.nytimes.com/2016/05/05/business/tesla-says-it-will-sharply-ramp-up-production-of-model-3.html?emc=edit_th_20160505&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/05/business/tesla-says-it-will-sharply-ramp-up-production-of-model-3.html?emc=edit_th_20160505&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  64%|██████▍   | 639/1000 [22:18<07:47,  1.30s/it]

Error extracting text from http://toyotanews.pressroom.toyota.com/releases/mirai+orders+2015.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/mirai+orders+2015/


Processing URLs:  64%|██████▍   | 643/1000 [22:26<11:02,  1.86s/it]

Error extracting text from http://atimes.com/2016/03/the-strategy-behind-chinas-adiz-in-the-east-china-sea/: 404 Client Error: Not Found for url: https://atimes.com/2016/03/the-strategy-behind-chinas-adiz-in-the-east-china-sea/


Processing URLs:  64%|██████▍   | 644/1000 [22:27<08:58,  1.51s/it]

Error extracting text from https://www.marketscreener.com/news/latest/Stocks-set-for-worst-day-in-1-month-as-virus-fears-resurface--33022541/: 403 Client Error: Forbidden. for url: https://www.marketscreener.com/news/latest/Stocks-set-for-worst-day-in-1-month-as-virus-fears-resurface--33022541/


Processing URLs:  65%|██████▍   | 646/1000 [23:29<1:52:51, 19.13s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-04-13/disappearances-blamed-on-the-police-deepen-fear-in-burundi: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  65%|██████▍   | 649/1000 [23:35<46:00,  7.87s/it]

Error extracting text from http://www.ibtimes.co.uk/amazon-drone-approved-by-faa-already-obsolete-we-want-out-sight-flights-1493481: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/amazon-drone-approved-by-faa-already-obsolete-we-want-out-sight-flights-1493481


Processing URLs:  65%|██████▌   | 651/1000 [23:37<24:29,  4.21s/it]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-usa/iran-will-scale-back-its-nuclear-commitments-if-2015-obligations-not-revived-idUSKBN2AF0HW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa/iran-will-scale-back-its-nuclear-commitments-if-2015-obligations-not-revived-idUSKBN2AF0HW


Processing URLs:  65%|██████▌   | 653/1000 [23:37<13:05,  2.26s/it]

Error extracting text from https://aminewswire.com/stories/510959242-terrorists-panic-as-citizens-unfurl-iraqi-flags-in-mosul: 404 Client Error: Not Found for url: https://aminewswire.com/stories/510959242-terrorists-panic-as-citizens-unfurl-iraqi-flags-in-mosul


Processing URLs:  66%|██████▌   | 657/1000 [23:42<07:44,  1.35s/it]

Error extracting text from http://www.barrons.com/articles/BL-231B-10828: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/BL-231B-10828


Processing URLs:  66%|██████▌   | 658/1000 [23:43<06:22,  1.12s/it]

Error extracting text from http://thehill.com/policy/healthcare/341748-fda-votes-to-recommend-gene-therapy-leukemia-treatment: 403 Client Error: Forbidden for url: https://thehill.com/policy/healthcare/341748-fda-votes-to-recommend-gene-therapy-leukemia-treatment/


Processing URLs:  66%|██████▌   | 660/1000 [23:45<05:18,  1.07it/s]

Error extracting text from https://medium.com/waymo/apply-to-be-part-of-waymos-early-rider-program-5fd996c7a86f: 403 Client Error: Forbidden for url: https://medium.com/waymo/apply-to-be-part-of-waymos-early-rider-program-5fd996c7a86f


Processing URLs:  66%|██████▌   | 662/1000 [23:55<14:35,  2.59s/it]

Error extracting text from https://www.jnj.com/johnson-johnson-initiates-pivotal-global-phase-3-clinical-trial-of-janssens-covid-19-vaccine-candidate: 403 Client Error: Forbidden for url: https://www.jnj.com/johnson-johnson-initiates-pivotal-global-phase-3-clinical-trial-of-janssens-covid-19-vaccine-candidate


Processing URLs:  66%|██████▋   | 663/1000 [23:57<14:20,  2.55s/it]

URL filtered: https://twitter.com/gemderosadiaz/status/929021514162278402
URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/saudi-arabia-lowers-crude-pricing-to-u-s-before-opec-meeting


Processing URLs:  67%|██████▋   | 669/1000 [24:03<07:42,  1.40s/it]

URL filtered: http://www.bloomberg.com/quote/BCOM:IND


Processing URLs:  67%|██████▋   | 672/1000 [24:16<14:37,  2.67s/it]

Error extracting text from https://finance.yahoo.com/news/oil-price-fundamental-daily-forecast-135059871.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/oil-price-fundamental-daily-forecast-135059871.html


Processing URLs:  68%|██████▊   | 677/1000 [24:25<10:38,  1.98s/it]

Error extracting text from https://www.wsj.com/articles/trump-moves-toward-backing-nato-candidate-over-russian-objections-1486775499: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-moves-toward-backing-nato-candidate-over-russian-objections-1486775499


Processing URLs:  68%|██████▊   | 678/1000 [24:26<08:10,  1.52s/it]

Error extracting text from http://gawker.com/how-is-donald-trump-going-to-quit-1782312998: 404 Client Error: Not Found for url: https://gawker.com/how-is-donald-trump-going-to-quit-1782312998


Processing URLs:  68%|██████▊   | 680/1000 [24:30<09:59,  1.87s/it]

Error extracting text from http://www.state.gov/secretary/remarks/2015/12/250876.htm: 404 Client Error: Not Found for url: https://www.state.gov/remarks-secretary-pompeo/


Processing URLs:  68%|██████▊   | 682/1000 [24:38<14:42,  2.78s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;ie=UTF8&amp;prev=_t&amp;rurl=translate.google.com&amp;sl=auto&amp;tl=en&amp;u=http://www.agenzianova.com/a/5696b0b8b96913.95301958/1278979/2016-01-13/difesa-immigrazione-fregata-aliseo-e-pattugliatore-spica-soccorrono-453-migranti-in-mare&amp;usg=ALkJrhgTRs3ziRSDr0gTDUi5aDv72Mdyww: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;ie=UTF8&amp;prev=_t&amp;rurl=translate.google.com&amp;sl=auto&amp;tl=en&amp;u=http://www.agenzianova.com/a/5696b0b8b96913.95301958/1278979/2016-01-13/difesa-immigrazione-fregata-aliseo-e-pattugliatore-spica-soccorrono-453-migranti-in-mare&amp;usg=ALkJrhgTRs3ziRSDr0gTDUi5aDv72Mdyww


Processing URLs:  69%|██████▉   | 689/1000 [24:57<13:13,  2.55s/it]

Error extracting text from http://www.nytimes.com/2015/11/17/world/middleeast/us-strikes-syria-oil.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/17/world/middleeast/us-strikes-syria-oil.html?_r=0


Processing URLs:  69%|██████▉   | 692/1000 [25:04<11:46,  2.30s/it]

Error extracting text from http://m.nzherald.co.nz/world/news/article.cfm?c_id=2&amp;objectid=11707736: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/world/news/article.cfm?c_id=2&amp;objectid=11707736


Processing URLs:  70%|██████▉   | 697/1000 [25:10<06:13,  1.23s/it]

Error extracting text from http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?em_pos=large&amp;emc=edit_nn_20160527&amp;nl=morning-briefing&amp;nlid=52725637: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?em_pos=large&amp;emc=edit_nn_20160527&amp;nl=morning-briefing&amp;nlid=52725637


Processing URLs:  70%|██████▉   | 698/1000 [25:13<08:52,  1.76s/it]

URL filtered: http://www.recode.net/2017/3/4/14816254/facebook-fake-news-disputed-trump-snopes-politifact-seattle-tribune


Processing URLs:  70%|███████   | 700/1000 [25:15<07:06,  1.42s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3876262/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3876262/


Processing URLs:  70%|███████   | 703/1000 [25:20<07:55,  1.60s/it]

Error extracting text from http://www.thenational.ae/uae/government/only-arab-nations-can-fix-syria-says-nicolas-sarkozy: 404 Client Error: Not Found for url: https://www.thenationalnews.com/uae/government/only-arab-nations-can-fix-syria-says-nicolas-sarkozy/


Processing URLs:  70%|███████   | 705/1000 [25:23<07:11,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-north-korea-commentary-idUSKBN16F1TJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-korea-commentary-idUSKBN16F1TJ


Processing URLs:  71%|███████   | 707/1000 [25:24<05:16,  1.08s/it]

Error extracting text from http://www.tradingeconomics.com/canada/inflation-cpi: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/canada/inflation-cpi


Processing URLs:  71%|███████   | 708/1000 [25:25<04:14,  1.15it/s]

Error extracting text from https://www.scientificamerican.com/article/republican-platform-rejects-paris-climate-agreement/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/republican-platform-rejects-paris-climate-agreement/


Processing URLs:  71%|███████▏  | 713/1000 [25:35<08:56,  1.87s/it]

Error extracting text from http://europe.newsweek.com/why-putin-could-support-brexit-435617?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/why-putin-could-support-brexit-435617


Processing URLs:  71%|███████▏  | 714/1000 [25:36<06:54,  1.45s/it]

Error extracting text from https://www.oaktreecapital.com/docs/default-source/memos/there-they-go-again-again.pdf: PyCryptodome is required for AES algorithm


Processing URLs:  72%|███████▏  | 715/1000 [26:36<1:30:40, 19.09s/it]

Error extracting text from http://www.miamiherald.com/news/politics-government/article78460847.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  72%|███████▏  | 718/1000 [26:38<32:29,  6.91s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/18/gitrep-17mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/18/gitrep-17mar16pm/
Error extracting text from https://www.middleeastmonitor.com/news/middle-east/24750-french-website-pa-held-80-secret-security-meetings-with-israel-last-year: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/news/middle-east/24750-french-website-pa-held-80-secret-security-meetings-with-israel-last-year


Processing URLs:  72%|███████▏  | 719/1000 [26:38<22:48,  4.87s/it]

Error extracting text from http://www.khaama.com/afghan-mps-approve-stanikzai-and-habibi-as-nds-and-defense-minister-01306: 403 Client Error: Forbidden for url: http://www.khaama.com/afghan-mps-approve-stanikzai-and-habibi-as-nds-and-defense-minister-01306
URL filtered: https://www.youtube.com/watch?v=mEUKcN8dQ7o


Processing URLs:  72%|███████▏  | 721/1000 [26:39<12:26,  2.67s/it]

Error extracting text from http://www.washingtontimes.com/news/2015/dec/9/house-leaders-introduce-short-term-bill-keep-gover/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/dec/9/house-leaders-introduce-short-term-bill-keep-gover/


Processing URLs:  72%|███████▏  | 722/1000 [26:40<11:11,  2.42s/it]

Error extracting text from http://www.themoscowtimes.com/business/article/russian-ministry-predicts-more-recession-lower-incomes-and-less-employment/555849.html: 500 Server Error: Internal Server Error for url: https://www.themoscowtimes.com/business/article/russian-ministry-predicts-more-recession-lower-incomes-and-less-employment/555849.html


Processing URLs:  73%|███████▎  | 727/1000 [26:47<06:28,  1.42s/it]

Error extracting text from http://www.nytimes.com/2016/06/09/opinion/the-senates-confirmation-shutdown.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/09/opinion/the-senates-confirmation-shutdown.html


Processing URLs:  73%|███████▎  | 729/1000 [26:48<03:48,  1.19it/s]

Error extracting text from https://www.nytimes.com/2017/11/29/us/doug-jones-roy-moore-black-voters.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/29/us/doug-jones-roy-moore-black-voters.html


Processing URLs:  74%|███████▎  | 735/1000 [26:54<04:29,  1.02s/it]

Error extracting text from http://www.financialexpress.com/article/economy/janet-yellen-must-hike-rates-for-fed-up-markets-this-week/178657/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/economy/janet-yellen-must-hike-rates-for-fed-up-markets-this-week/178657/
URL filtered: https://www.bloomberg.com/politics/articles/2016-12-19/china-rejects-trump-s-comment-that-it-stole-u-s-naval-drone


Processing URLs:  74%|███████▍  | 742/1000 [27:10<06:00,  1.40s/it]

Error extracting text from http://www.cdm.me/english/putins-request-he-demands-recognition-of-dominance-across-eastern-europe-including-montenegro: 403 Client Error: Forbidden for url: https://www.cdm.me/english/putins-request-he-demands-recognition-of-dominance-across-eastern-europe-including-montenegro
Error extracting text from http://www.nytimes.com/2016/04/10/us/politics/primary-process-is-seen-as-in-conflict-with-democracy.html?emc=edit_th_20160410&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/10/us/politics/primary-process-is-seen-as-in-conflict-with-democracy.html?emc=edit_th_20160410&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  74%|███████▍  | 744/1000 [27:16<09:34,  2.24s/it]

Error extracting text from http://www.acleddata.com/wp-content/uploads/2017/01/ACLED_Codebook_2017.pdf: 404 Client Error: Not Found for url: https://acleddata.com/wp-content/uploads/2017/01/ACLED_Codebook_2017.pdf


Processing URLs:  74%|███████▍  | 745/1000 [27:19<10:50,  2.55s/it]

URL filtered: https://twitter.com/planet4589/status/1389062285482762240


Processing URLs:  75%|███████▍  | 747/1000 [27:22<08:28,  2.01s/it]

URL filtered: https://twitter.com/Newsphotog72/status/666411282485133312


Processing URLs:  75%|███████▍  | 749/1000 [28:22<54:47, 13.10s/it]

Error extracting text from https://www.cmegroup.com/markets/metals/ferrous/us-midwest-domestic-steel-premium-cru.quotes.html: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  75%|███████▌  | 752/1000 [28:24<25:26,  6.15s/it]

Error extracting text from http://www.tradingeconomics.com/malaysia/gdp-growth-annual: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/malaysia/gdp-growth-annual
Error extracting text from http://www.khaosodenglish.com/politics/2016/12/27/thailand-2017-elections-bigger-protests/: 403 Client Error: Forbidden for url: https://www.khaosodenglish.com/politics/2016/12/27/thailand-2017-elections-bigger-protests/


Processing URLs:  75%|███████▌  | 753/1000 [28:24<18:49,  4.57s/it]

Error extracting text from https://www.breakingtravelnews.com/news/article/expo-2020-welcomes-11-million-guests-to-date/: 403 Client Error: Forbidden for url: https://www.breakingtravelnews.com/news/article/expo-2020-welcomes-11-million-guests-to-date/


Processing URLs:  76%|███████▌  | 755/1000 [28:28<12:47,  3.13s/it]

Error extracting text from https://www.google.ca/amp/www.bbc.co.uk/news/amp/37487149?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: https://www.bbc.co.uk/news/37487149.amp


Processing URLs:  76%|███████▌  | 756/1000 [28:29<10:33,  2.59s/it]

Error extracting text from https://www.reuters.com/article/us-usa-security-kaspersky-russia/kaspersky-lab-to-open-software-to-review-says-nothing-to-hide-idUSKBN1CS0Y1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-security-kaspersky-russia/kaspersky-lab-to-open-software-to-review-says-nothing-to-hide-idUSKBN1CS0Y1


Processing URLs:  76%|███████▌  | 759/1000 [28:31<05:40,  1.41s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=hr&amp;u=http://politika24.net/zapad-na-nogama-putin-posjeduje-tajno-oruzje/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=hr&amp;u=http://politika24.net/zapad-na-nogama-putin-posjeduje-tajno-oruzje/&amp;prev=search


Processing URLs:  76%|███████▌  | 761/1000 [28:35<06:35,  1.65s/it]

Error extracting text from http://www.euroinvestor.com/exchanges/gtis-energy/brent-oil/2327059/chart: 404 Client Error: Not Found for url: https://www.euroinvestor.dk/exchanges/gtis-energy/brent-oil/2327059/chart


Processing URLs:  76%|███████▋  | 764/1000 [28:39<05:31,  1.40s/it]

URL filtered: http://www.economist.com/news/europe/21715031-kremlin-backed-network-inflates-its-viewership-youtube-disaster-videos-rts-propaganda


Processing URLs:  77%|███████▋  | 770/1000 [28:46<05:05,  1.33s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-11/business-leaders-urge-zuma-to-sell-south-africa-state-assets


Processing URLs:  77%|███████▋  | 773/1000 [28:49<03:57,  1.05s/it]

Error extracting text from http://www.bt.com.bn/news-national/2016/02/16/negotiators-optimistic-over-rcep-final-deal: HTTPConnectionPool(host='www.bt.com.bn', port=80): Max retries exceeded with url: /news-national/2016/02/16/negotiators-optimistic-over-rcep-final-deal (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3040eb2c0>: Failed to resolve 'www.bt.com.bn' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  78%|███████▊  | 782/1000 [29:12<07:43,  2.13s/it]

Error extracting text from http://sci-techuniverse.blogspot.com/2015/12/lockheed-martins-new-compact-fusion.html: 404 Client Error: Not Found for url: https://sci-techuniverse.blogspot.com/2015/12/lockheed-martins-new-compact-fusion.html


Processing URLs:  79%|███████▊  | 787/1000 [29:19<05:01,  1.41s/it]

Error extracting text from http://www.emergingmarketsmonitor.com/market-strategy-sovereign-default-likely-2017-14-nov-2016: HTTPConnectionPool(host='www.emergingmarketsmonitor.com', port=80): Max retries exceeded with url: /market-strategy-sovereign-default-likely-2017-14-nov-2016 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303322ff0>: Failed to resolve 'www.emergingmarketsmonitor.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▉  | 791/1000 [29:20<02:03,  1.69it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/top-us-general-iraq-warns-mosul-dam-collapse-36569327: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/top-us-general-iraq-warns-mosul-dam-collapse-36569327
URL filtered: https://socialblade.com/youtube/user/pewdiepie/realtime
URL filtered: https://www.bloomberg.com/quote/CO1:COM
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.noticiasaominuto.com.br/politica/193950/aliado-de-temer-diz-que-impeachment-de-dilma-depende-das-ruas&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.noticiasaominuto.com.br/politica/193950/aliado-de-temer-diz-que-impeachment-de-dilma-depende-das-ruas&amp;prev=search


Processing URLs:  80%|████████  | 805/1000 [30:26<25:40,  7.90s/it]

Error extracting text from http://allafrica.com/stories/201705230247.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201705230247.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30398a3f0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  81%|████████  | 811/1000 [30:42<09:53,  3.14s/it]

Error extracting text from http://www.jerusalemonline.com/news/world-news/around-the-globe/russophilia-and-russian-expansionism-in-the-middle-east-19717: HTTPConnectionPool(host='www.jerusalemonline.com', port=80): Max retries exceeded with url: /news/world-news/around-the-globe/russophilia-and-russian-expansionism-in-the-middle-east-19717 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301867080>: Failed to resolve 'www.jerusalemonline.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  82%|████████▏ | 820/1000 [30:58<03:45,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-ross-idUSKBN1522BH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-ross-idUSKBN1522BH


Processing URLs:  82%|████████▏ | 822/1000 [31:01<04:10,  1.40s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-12/hammond-says-u-k-is-seeking-status-quo-brexit-transition


Processing URLs:  83%|████████▎ | 829/1000 [31:15<04:52,  1.71s/it]

Error extracting text from http://news.yahoo.com/air-strike-iraqs-mosul-targets-millions-cash-us-203832907.html: 404 Client Error: Not Found for url: http://news.yahoo.com/air-strike-iraqs-mosul-targets-millions-cash-us-203832907.html


Processing URLs:  83%|████████▎ | 832/1000 [31:21<04:58,  1.77s/it]

Error extracting text from https://www.bbc.com/news/world-europe-59712020.: 404 Client Error: Not Found for url: https://www.bbc.com/news/world-europe-59712020.


Processing URLs:  84%|████████▎ | 836/1000 [31:26<03:33,  1.30s/it]

Error extracting text from http://www.defense.gov/Portals/1/features/2014/0814_iraq/docs/20151228-01_CENTCOM_News_Release_--_Statement_by_the_U.S._Central_Command_Commander_on_ISF_Progress_in_Ramadi.pdf: 404 Client Error: Not Found for url: https://www.defense.gov/Portals/1/features/2014/0814_iraq/docs/20151228-01_CENTCOM_News_Release_--_Statement_by_the_U.S._Central_Command_Commander_on_ISF_Progress_in_Ramadi.pdf
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13X1EI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13X1EI


Processing URLs:  84%|████████▎ | 837/1000 [31:27<02:49,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/mosul-offensive-to-begin-within-a-month-u-s-general-says-1473339951?mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/mosul-offensive-to-begin-within-a-month-u-s-general-says-1473339951?mg=id-wsj


Processing URLs:  84%|████████▍ | 838/1000 [31:27<02:42,  1.00s/it]

Error extracting text from http://www.dailytrust.com.ng/news/general/plateau-monarch-s-killing-sparks-violence/156105.html: 404 Client Error: Not Found for url: https://www.dailytrust.com.ng/news/general/plateau-monarch-s-killing-sparks-violence/156105.html


Processing URLs:  84%|████████▍ | 842/1000 [31:32<02:41,  1.02s/it]

Error extracting text from http://www.familysecuritymatters.org/publications/detail/russian-naval-expansion-threatens-us-influence-in-the-western-hemisphere?f=must_reads#ixzz46ar8D7uI: 403 Client Error: Forbidden for url: https://www.familysecuritymatters.org/publications/detail/russian-naval-expansion-threatens-us-influence-in-the-western-hemisphere?f=must_reads#ixzz46ar8D7uI
Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-talks-idUSKBN1AM0BI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-talks-idUSKBN1AM0BI


Processing URLs:  85%|████████▍ | 846/1000 [31:37<02:23,  1.07it/s]

Error extracting text from http://en.iranwire.com/features/7194/: 403 Client Error: Forbidden for url: https://en.iranwire.com/features/7194/
Error extracting text from http://www.reuters.com/article/us-northkorea-usa-idUSKBN17W04T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-idUSKBN17W04T


Processing URLs:  85%|████████▍ | 847/1000 [31:37<01:45,  1.44it/s]

Error extracting text from http://www.nytimes.com/2015/10/14/business/economy/a-2nd-fed-governor-opposes-raising-rates-this-year-breaking-with-yellen.html?partner=msft_msn: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/14/business/economy/a-2nd-fed-governor-opposes-raising-rates-this-year-breaking-with-yellen.html?partner=msft_msn


Processing URLs:  85%|████████▌ | 854/1000 [31:46<02:38,  1.08s/it]

Error extracting text from http://www.nytimes.com/2015/11/15/world/europe/attacks-in-paris-add-urgency-to-talks-on-ending-syria-war.html?smid=nytcore-ipad-share&amp;smprod=nytcore-ipad: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/15/world/europe/attacks-in-paris-add-urgency-to-talks-on-ending-syria-war.html?smid=nytcore-ipad-share&amp;smprod=nytcore-ipad


Processing URLs:  86%|████████▌ | 857/1000 [31:50<02:25,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/07/12/health/fda-novartis-leukemia-gene-medicine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/12/health/fda-novartis-leukemia-gene-medicine.html


Processing URLs:  86%|████████▌ | 860/1000 [31:59<05:20,  2.29s/it]

URL filtered: https://www.youtube.com/watch?v=SH2bbxp46CQ


Processing URLs:  86%|████████▌ | 862/1000 [31:59<03:05,  1.35s/it]

Error extracting text from http://blogs.reuters.com/great-debate/2015/10/28/what-a-russian-win-in-syria-would-look-like/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2015/10/28/what-a-russian-win-in-syria-would-look-like/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d9220>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  87%|████████▋ | 866/1000 [32:04<02:11,  1.02it/s]

Error extracting text from http://www.army.mil: 403 Client Error: Forbidden for url: http://www.army.mil/


Processing URLs:  87%|████████▋ | 874/1000 [32:16<03:10,  1.51s/it]

Error extracting text from https://www.stripes.com/mattis-camp-david-meeting-will-move-trump-toward-afghanistan-war-decision-1.483415#.WZX_nDMfl-U: 404 Client Error: Not Found for url: https://www.stripes.com/mattis-camp-david-meeting-will-move-trump-toward-afghanistan-war-decision-1.483415#.WZX_nDMfl-U


Processing URLs:  88%|████████▊ | 879/1000 [32:24<03:04,  1.52s/it]

Error extracting text from https://www.consilium.europa.eu/en/documents-publications/treaties-agreements/agreement/?id=2020025&amp;DocLanguage=en: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/documents-publications/treaties-agreements/agreement/?id=2020025&amp;DocLanguage=en


Processing URLs:  88%|████████▊ | 880/1000 [32:25<02:51,  1.43s/it]

URL filtered: https://m.facebook.com/MosulEyee/posts/925082790946557
URL filtered: https://mobile.twitter.com/nickeardleybbc/status/841303874417917952?ref_src=twsrc%5Etfw


Processing URLs:  88%|████████▊ | 884/1000 [32:30<02:46,  1.43s/it]

Error extracting text from https://sentientmedia.org/which-countries-produce-the-most-meat/: 403 Client Error: Forbidden for url: https://sentientmedia.org/which-countries-produce-the-most-meat/
URL filtered: http://www.bloomberg.com/news/articles/2017-08-09/pimco-and-t-rowe-price-warn-investors-it-s-time-to-reduce-risk


Processing URLs:  89%|████████▉ | 889/1000 [32:35<02:14,  1.22s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045642/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045642/


Processing URLs:  89%|████████▉ | 894/1000 [32:49<04:35,  2.60s/it]

Error extracting text from https://gcaptain.com/u-s-concerned-chinas-new-coast-guard-law-could-escalate-maritime-disputes/: 403 Client Error: Forbidden for url: https://gcaptain.com/u-s-concerned-chinas-new-coast-guard-law-could-escalate-maritime-disputes/


Processing URLs:  90%|████████▉ | 895/1000 [32:50<03:28,  1.99s/it]

Error extracting text from http://www.ndb.int/BRICS-Bank-Very-Likely-to-Issue-Bonds-in-Russian-Ruble-Before-End-of-2016%20.php: 403 Client Error: Forbidden for url: https://www.ndb.int/BRICS-Bank-Very-Likely-to-Issue-Bonds-in-Russian-Ruble-Before-End-of-2016%20.php


Processing URLs:  90%|████████▉ | 899/1000 [32:54<01:50,  1.09s/it]

Error extracting text from http://eprints.lse.ac.uk/56289/1/democraticaudit.com-The_weather_does_not_affect_voter_turnout_but_only_if_voting_is_convenient_for_the_public.pdf: 404 Client Error: Not Found for url: http://eprints.lse.ac.uk/56289/1/democraticaudit.com-The_weather_does_not_affect_voter_turnout_but_only_if_voting_is_convenient_for_the_public.pdf


Processing URLs:  90%|█████████ | 902/1000 [32:57<01:44,  1.07s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/337268-gingrich-congress-should-abolish-special-counsel-after-comey: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/337268-gingrich-congress-should-abolish-special-counsel-after-comey/


Processing URLs:  91%|█████████ | 908/1000 [33:03<01:17,  1.18it/s]

Error extracting text from http://www.nytimes.com/2016/04/22/world/europe/obama-urges-britain-to-remain-in-the-eu.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/22/world/europe/obama-urges-britain-to-remain-in-the-eu.html
Error extracting text from https://www.reuters.com/article/uk-iran-nuclear-usa-white-house/us-disappointed-by-iran-move-on-nuclear-talks-remains-ready-to-engage-white-house-idUSKCN2AS0O8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-iran-nuclear-usa-white-house/us-disappointed-by-iran-move-on-nuclear-talks-remains-ready-to-engage-white-house-idUSKCN2AS0O8


Processing URLs:  91%|█████████ | 909/1000 [34:04<28:27, 18.76s/it]

Error extracting text from http://en.kremlin.ru/events/president/news/54021: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  91%|█████████▏| 913/1000 [38:11<1:21:39, 56.32s/it]

Error extracting text from http://www.nytimes.com/2016/12/22/world/middleeast/aleppo-syria-evacuation.html?ref=middleeast: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/22/world/middleeast/aleppo-syria-evacuation.html?ref=middleeast


Processing URLs:  91%|█████████▏| 914/1000 [38:12<56:53, 39.69s/it]  

Error extracting text from https://www.bankofengland.co.uk/monetary-policy: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/monetary-policy


Processing URLs:  92%|█████████▏| 916/1000 [38:14<27:45, 19.83s/it]

Error extracting text from https://ir.loews.com/news-releases/news-release-details/loews-corporation-announces-plan-spin-lorillard: 403 Client Error: Forbidden for url: https://ir.loews.com/news-releases/news-release-details/loews-corporation-announces-plan-spin-lorillard


Processing URLs:  92%|█████████▏| 922/1000 [38:28<05:59,  4.61s/it]

Error extracting text from http://www.fec.gov/press/resources/2016presidential_form2dt.shtml: 403 Client Error: Forbidden for url: https://www.fec.gov/press/resources/2016presidential_form2dt.shtml


Processing URLs:  93%|█████████▎| 926/1000 [38:33<02:21,  1.91s/it]

Error extracting text from http://www.nytimes.com/2015/09/19/opinion/joe-nocera-republican-job-killers-and-the-export-import-bank.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/19/opinion/joe-nocera-republican-job-killers-and-the-export-import-bank.html
URL filtered: https://www.bloomberg.com/news/articles/2016-01-29/zambia-said-to-raise-power-prices-for-mines-as-plants-crippled


Processing URLs:  93%|█████████▎| 930/1000 [38:39<02:06,  1.81s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=daily&amp;view=chart&amp;id=jurassicpark4.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=daily&amp;view=chart&amp;id=jurassicpark4.htm


Processing URLs:  93%|█████████▎| 934/1000 [38:42<00:55,  1.19it/s]

Error extracting text from http://www.nytimes.com/2015/12/02/business/international/volkswagens-software-use-was-illegal-german-regulator-rules.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/02/business/international/volkswagens-software-use-was-illegal-german-regulator-rules.html
Error extracting text from http://www.reuters.com/article/us-navy-iran-commentary-idUSKCN1151SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-navy-iran-commentary-idUSKCN1151SB


Processing URLs:  94%|█████████▎| 935/1000 [38:42<00:42,  1.53it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2017/02/02/UN-chief-backs-plan-to-pick-Syria-delegates-to-Geneva-talks-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2017/02/02/UN-chief-backs-plan-to-pick-Syria-delegates-to-Geneva-talks-.html


Processing URLs:  94%|█████████▎| 937/1000 [38:45<00:59,  1.06it/s]

Error extracting text from http://www.iol.co.za/news/africa/deadly-political-violence-rocks-burundi-1981518: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/africa/deadly-political-violence-rocks-burundi-1981518
URL filtered: https://www.bloomberg.com/news/articles/2020-11-17/early-johnson-call-on-biden-win-smart-for-u-k-key-aide-says


Processing URLs:  94%|█████████▍| 941/1000 [38:50<01:09,  1.18s/it]

Error extracting text from http://news.abs-cbn.com/news/03/17/17/china-eyeing-monitoring-station-in-scarborough-report: 403 Client Error: Forbidden for url: http://news.abs-cbn.com/news/03/17/17/china-eyeing-monitoring-station-in-scarborough-report


Processing URLs:  94%|█████████▍| 943/1000 [38:51<00:53,  1.07it/s]

Error extracting text from https://www.nytimes.com/2017/11/14/world/middleeast/saudi-arabia-mohammed-bin-salman.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/14/world/middleeast/saudi-arabia-mohammed-bin-salman.html
URL filtered: http://www.bloomberg.com/politics/articles/2015-10-23/jeb-bush-orders-across-the-board-pay-cuts-for-struggling-campaign


Processing URLs:  95%|█████████▌| 953/1000 [39:02<00:36,  1.30it/s]

Error extracting text from http://www.businessinsider.com/iraq-poised-to-launch-anti-isis-offensive-on-mosul-2016-10: 404 Client Error: Not Found for url: https://www.businessinsider.com/iraq-poised-to-launch-anti-isis-offensive-on-mosul-2016-10
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0WX0ZF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0WX0ZF
URL filtered: https://www.bloomberg.com/quote/GBPEUR:CUR
URL filtered: https://www.bloomberg.com/news/articles/2017-09-14/boko-haram-defies-buhari-with-more-attacks-in-northeast-nigeria


Processing URLs:  96%|█████████▌| 956/1000 [39:04<00:27,  1.62it/s]

URL filtered: https://www.youtube.com/watch?v=NcWGycNyb_Q
URL filtered: https://www.bloomberg.com/gadfly/articles/2017-01-22/why-saudi-arabia-may-walk-away-from-opec-deal-by-june


Processing URLs:  96%|█████████▌| 959/1000 [39:07<00:33,  1.22it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-19/dollar-holds-two-day-gain-before-china-data-that-s-key-for-fed


Processing URLs:  96%|█████████▋| 963/1000 [39:19<01:16,  2.06s/it]

Error extracting text from http://www.reuters.com/article/2015/11/17/us-usa-economy-idUSKCN0T61LK20151117: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/17/us-usa-economy-idUSKCN0T61LK20151117


Processing URLs:  97%|█████████▋| 973/1000 [39:38<00:39,  1.48s/it]

Error extracting text from https://www.economist.com/interactive/france-2022).: 404 Client Error: Not Found for url: https://www.economist.com/interactive/france-2022).


Processing URLs:  97%|█████████▋| 974/1000 [39:38<00:29,  1.12s/it]

Error extracting text from http://drones.fsd.ch/en/homepage/: HTTPConnectionPool(host='drones.fsd.ch', port=80): Max retries exceeded with url: /en/homepage/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30175f1d0>: Failed to resolve 'drones.fsd.ch' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  98%|█████████▊| 975/1000 [39:43<00:51,  2.08s/it]

Error extracting text from http://www.joc.com/port-news/us-ports/west-coast-ports-set-keep-market-share-despite-panama-canal-expansion_20160120.html: 404 Client Error: Not Found for url: https://www.joc.com/article/west-coast-ports-set-keep-market-share-despite-panama-canal-expansion_20160120.html


Processing URLs:  98%|█████████▊| 976/1000 [39:43<00:40,  1.69s/it]

Error extracting text from https://finance.yahoo.com/news/time-inc-ceo-hadnt-even-120405230.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/time-inc-ceo-hadnt-even-120405230.html
URL filtered: https://www.youtube.com/watch?v=bgiQD56eWDk


Processing URLs:  98%|█████████▊| 978/1000 [39:44<00:25,  1.16s/it]

Error extracting text from http://asia.nikkei.com/Features/Myanmar-s-power-shift/Constitutional-change-first-item-for-likely-NLD-government: 404 Client Error: Not Found for url: https://asia.nikkei.com/Features/Myanmar-s-power-shift/Constitutional-change-first-item-for-likely-NLD-government


Processing URLs:  98%|█████████▊| 985/1000 [39:51<00:13,  1.11it/s]

Error extracting text from http://www.accuweather.com/en/weather-news/election-2016-iowa-caucus-snow-central-northern-plains-monday-night/55036213: 403 Client Error: Forbidden for url: http://www.accuweather.com/en/weather-news/election-2016-iowa-caucus-snow-central-northern-plains-monday-night/55036213


Processing URLs:  99%|█████████▉| 990/1000 [40:13<00:25,  2.54s/it]

Error extracting text from http://allafrica.com/stories/201603090325.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201603090325.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30175c800>: Failed to establish a new connection: [Errno 61] Connection refused'))
URL filtered: http://www.niemanlab.org/2016/12/slates-chrome-extension-helps-identify-fake-news-on-facebook-and-let-readers-flag-it-themselves/
URL filtered: https://twitter.com/navalny/status/956816662992482309


Processing URLs: 100%|█████████▉| 997/1000 [40:15<00:01,  1.52it/s]

Error extracting text from http://www.nytimes.com/2001/12/13/international/bush-pulls-out-of-abm-treaty-putin-calls-move-a-mistake.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2001/12/13/international/bush-pulls-out-of-abm-treaty-putin-calls-move-a-mistake.html
Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950511000734: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950511000734 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30175ff20>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-global-markets-idUSKBN19C020: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-markets-idUSKBN19C020


Processing URLs: 100%|██████████| 1000/1000 [40:17<00:00,  2.42s/it]
Processing URLs:   0%|          | 1/1000 [00:02<35:51,  2.15s/it]

Error extracting text from https://racine.craigslist.org/rvs/d/kenosha-1995-sunnybrook-33ft-rv-1500-obo/7252343531.html: 404 Client Error: Not Found for url: https://racine.craigslist.org/rvs/d/kenosha-1995-sunnybrook-33ft-rv-1500-obo/7252343531.html


Processing URLs:   1%|          | 7/1000 [00:11<22:32,  1.36s/it]

Error extracting text from https://www.un.org/press/en/2021/sc14554.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2021/sc14554.doc.htm


Processing URLs:   1%|          | 8/1000 [00:12<19:11,  1.16s/it]

Error extracting text from https://www.nytimes.com/2020/12/22/business/amazon-union-vote-bessemer-alabama.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/22/business/amazon-union-vote-bessemer-alabama.html


Processing URLs:   1%|          | 11/1000 [00:17<21:31,  1.31s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/01/08/4/0301000000AEN20160108010800315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:   1%|▏         | 14/1000 [00:36<1:37:27,  5.93s/it]

Error extracting text from https://www.almasdarnews.com/article/ankara-halts-airstrikes-northern-aleppo-syria-vows-turkish-planes/: 522 Server Error:  for url: https://www.almasdarnews.com/article/ankara-halts-airstrikes-northern-aleppo-syria-vows-turkish-planes/


Processing URLs:   2%|▏         | 16/1000 [00:43<1:12:06,  4.40s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://wp.clicrbs.com.br/cenariopolitico/lasier-contraria-pdt-e-apoia-impeachment-de-dilma/%3Ftopo%3D52,1,1,,171,e171&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://wp.clicrbs.com.br/cenariopolitico/lasier-contraria-pdt-e-apoia-impeachment-de-dilma/%3Ftopo%3D52,1,1,,171,e171&amp;prev=search


Processing URLs:   2%|▏         | 19/1000 [00:45<33:17,  2.04s/it]  

Error extracting text from https://www.timesunion.com/news/article/Cuomo-announces-major-reopening-of-economy-16147336.php: 403 Client Error: Forbidden for url: https://www.timesunion.com/news/article/Cuomo-announces-major-reopening-of-economy-16147336.php


Processing URLs:   2%|▎         | 25/1000 [01:04<29:17,  1.80s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2015-06-08/america-s-shale-oil-boom-grinding-to-halt-as-u-s-forecasts-drop
Error extracting text from http://www.reuters.com/article/us-usa-congress-republicans-idUSKBN159174?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-republicans-idUSKBN159174?il=0


Processing URLs:   3%|▎         | 28/1000 [01:07<19:43,  1.22s/it]

Error extracting text from http://www.reuters.com/article/us-cyber-summit-padan/irans-hacking-ability-improving-israeli-general-idUSKBN1D02O0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-summit-padan/irans-hacking-ability-improving-israeli-general-idUSKBN1D02O0


Processing URLs:   4%|▎         | 36/1000 [01:19<22:47,  1.42s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O3UHOH6JIJUU01-12V69ACMFAE026JJE84TU6R8JC
Error extracting text from https://www.arabnews.com/node/1782421/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1782421/middle-east


Processing URLs:   4%|▎         | 37/1000 [01:26<45:20,  2.83s/it]

URL filtered: https://www.metaculus.com/questions/4776/will-charles-murray-receive-a-long-term-twitter-ban-before-2021/


Processing URLs:   4%|▍         | 39/1000 [01:26<27:36,  1.72s/it]

Error extracting text from http://www.gatewayhouse.in/indias-act-east-policy-far-beyond/: 403 Client Error: Forbidden for url: https://www.gatewayhouse.in/indias-act-east-policy-far-beyond/


Processing URLs:   4%|▍         | 43/1000 [01:32<20:12,  1.27s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0360544217310629: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0360544217310629
Error extracting text from http://www.nytimes.com/2016/03/03/world/middleeast/iran-elections.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/03/world/middleeast/iran-elections.html


Processing URLs:   4%|▍         | 44/1000 [01:33<17:15,  1.08s/it]

Error extracting text from http://www.hybridcars.com/august-2016-dashboard/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/august-2016-dashboard/


Processing URLs:   6%|▌         | 56/1000 [01:51<10:39,  1.48it/s]

Error extracting text from https://www.reuters.com/article/us-italy-politics-idUSKBN29I1YG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics-idUSKBN29I1YG
URL filtered: https://twitter.com/GeorginaEWright/status/1342072545047162882?s=20
Error extracting text from http://finance.yahoo.com/video/time-inc-turns-page-digital-131809471.html: 400 Client Error: Invalid HTTP Request for url: https://finance.yahoo.com/video/time-inc-turns-page-digital-131809471.html


Processing URLs:   6%|▌         | 57/1000 [01:52<10:58,  1.43it/s]

Error extracting text from http://aranews.net/2016/07/isis-relocates-its-strongholds-in-mosul-to-avoid-airstrikes/: 404 Client Error: Not Found for url: http://aranews.net/2016/07/isis-relocates-its-strongholds-in-mosul-to-avoid-airstrikes/


Processing URLs:   6%|▌         | 58/1000 [01:52<08:57,  1.75it/s]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-turkey-minister-idUSKCN0Y30TA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-turkey-minister-idUSKCN0Y30TA
URL filtered: https://twitter.com/yasufumisaito


Processing URLs:   6%|▌         | 62/1000 [01:56<11:23,  1.37it/s]

Error extracting text from http://www.nytimes.com/2015/12/05/business/energy-environment/opec-meeting-oil-production-price.html?emc=edit_th_20151205&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/05/business/energy-environment/opec-meeting-oil-production-price.html?emc=edit_th_20151205&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:   6%|▋         | 63/1000 [01:56<10:36,  1.47it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/359221-fancy-bear-capitalizes-on-new-york-terror-attacks-to-lure-new-victims: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/359221-fancy-bear-capitalizes-on-new-york-terror-attacks-to-lure-new-victims/


Processing URLs:   6%|▋         | 65/1000 [02:07<51:39,  3.31s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/registration-starts-for-iran-parliamentary-election/2015/12/19/f368f9b8-a659-11e5-8318-bd8caed8c588_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/registration-starts-for-iran-parliamentary-election/2015/12/19/f368f9b8-a659-11e5-8318-bd8caed8c588_story.html


Processing URLs:   7%|▋         | 68/1000 [02:14<39:19,  2.53s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN18401Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN18401Z


Processing URLs:   7%|▋         | 72/1000 [02:19<23:14,  1.50s/it]

Error extracting text from https://theconversation.com/how-to-get-ready-for-the-economic-recession-coming-in-2017-70638: 403 Client Error: Forbidden for url: https://theconversation.com/how-to-get-ready-for-the-economic-recession-coming-in-2017-70638


Processing URLs:   7%|▋         | 73/1000 [02:21<22:29,  1.46s/it]

URL filtered: https://mobile.twitter.com/weird_sci/status/561693052075798529/photo/1


Processing URLs:   8%|▊         | 78/1000 [02:22<08:58,  1.71it/s]

Error extracting text from http://www.balkaninsight.com/en/article/nato-visit-highlights-serbia-s-strategic-balancing-act-11-17-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/nato-visit-highlights-serbia-s-strategic-balancing-act-11-17-2015
URL filtered: https://www.bloomberg.com/news/articles/2020-12-05/johnson-von-der-leyen-brexit-call-is-under-way-official
Error extracting text from http://www.wsj.com/articles/nato-rejects-russian-explanations-for-incursions-into-turkish-air-space-1444128156: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nato-rejects-russian-explanations-for-incursions-into-turkish-air-space-1444128156


Processing URLs:   8%|▊         | 82/1000 [02:40<48:20,  3.16s/it]

Error extracting text from http://www.nasdaq.com/markets/crude-oil-brent.aspx?timeframe=10y: 403 Client Error: Forbidden for url: http://www.nasdaq.com/markets/crude-oil-brent.aspx?timeframe=10y


Processing URLs:   9%|▊         | 86/1000 [02:46<36:19,  2.38s/it]

Error extracting text from http://news.trust.org/item/20160711185340-m8p4w: 404 Client Error:  for url: https://news.trust.org:443/item/20160711185340-m8p4w
Error extracting text from http://www.timesofisrael.com/kerry-implementation-of-iran-nuclear-deal-days-away/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/kerry-implementation-of-iran-nuclear-deal-days-away/


Processing URLs:   9%|▉         | 91/1000 [02:54<25:15,  1.67s/it]

Error extracting text from http://www.chicagotribune.com/bluesky/originals/ct-bsi-cwi-cybersecurity-20171019-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/bluesky/originals/ct-bsi-cwi-cybersecurity-20171019-story.html


Processing URLs:   9%|▉         | 92/1000 [02:58<34:31,  2.28s/it]

Error extracting text from https://www.marketnews.com/content/schaeuble-plan-greece-timeout-was-widely-backed-press: 404 Client Error: Not Found for url: https://marketnews.com/content/schaeuble-plan-greece-timeout-was-widely-backed-press


Processing URLs:   9%|▉         | 93/1000 [02:59<32:55,  2.18s/it]

Error extracting text from http://en.rcipt.ir/index.aspx?fkeyid=&amp;siteid=4&amp;pageid=433&amp;newsview=286: HTTPConnectionPool(host='en.rcipt.ir', port=80): Max retries exceeded with url: /index.aspx?fkeyid=&amp;siteid=4&amp;pageid=433&amp;newsview=286 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe9133b0>: Failed to resolve 'en.rcipt.ir' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 94/1000 [03:01<27:59,  1.85s/it]

URL filtered: http://mobile.abc.net.au/news/2016-02-03/raaf-patrol-flights-facing-more-regular-resistance-from-chinese/7138100?utm_content=buffer999c2&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  11%|█         | 107/1000 [03:21<27:53,  1.87s/it]

URL filtered: https://twitter.com/faisalislam/status/938739138341359617?ref_src=twsrc%5Etfw&amp;ref_url=https%3A%2F%2Fwww.theguardian.com%2Fpolitics%2Fblog%2Flive%2F2017%2Fdec%2F07%2Fbrexit-deal-may-varadkar-eu-less-hospitable-for-foreign-talent-after-brexit-says-banking-chief-politics-live%3Fpage%3Dwith%253Ablock-5a2934375483ae06cb81c81


Processing URLs:  11%|█         | 109/1000 [03:23<22:03,  1.49s/it]

Error extracting text from http://www.theglobeandmail.com/report-on-business/international-business/latin-american-business/brazil-downgraded-to-junk-rating-by-sp-deepening-woes/article26302975/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/report-on-business/international-business/latin-american-business/brazil-downgraded-to-junk-rating-by-sp-deepening-woes/article26302975/


Processing URLs:  12%|█▏        | 115/1000 [03:31<21:28,  1.46s/it]

URL filtered: https://www.youtube.com/watch?v=3Q3j-i7GLr0


Processing URLs:  12%|█▏        | 120/1000 [03:37<22:59,  1.57s/it]

Error extracting text from http://www.timesofisrael.com/liveblog_entry/erdogans-office-confirms-russia-apology/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/liveblog_entry/erdogans-office-confirms-russia-apology/


Processing URLs:  12%|█▏        | 123/1000 [03:40<19:01,  1.30s/it]

URL filtered: https://twitter.com/georgegalloway


Processing URLs:  12%|█▎        | 125/1000 [03:43<20:01,  1.37s/it]

Error extracting text from http://www.oddschecker.com/politics/british-politics/eu-referendum/referendum-on-eu-membership-result: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/british-politics/eu-referendum/referendum-on-eu-membership-result
URL filtered: https://www.bloomberg.com/news/articles/2017-06-10/google-execs-hunker-down-for-summer-fight-with-eu-as-fines-loom


Processing URLs:  13%|█▎        | 130/1000 [03:45<10:37,  1.36it/s]

Error extracting text from https://www.autoevolution.com/news/russias-fierce-sarmat-missile-closer-to-deployment-with-three-tests-this-year-160966.html: 403 Client Error: Forbidden for url: https://www.autoevolution.com/news/russias-fierce-sarmat-missile-closer-to-deployment-with-three-tests-this-year-160966.html
URL filtered: http://www.bloomberg.com/news/articles/2015-12-01/iran-calls-on-oil-states-to-cut-excess-crude-opec-reality-check


Processing URLs:  13%|█▎        | 132/1000 [11:46<21:32:00, 89.31s/it]

Error extracting text from https://www.thespainreport.com/articles/853-160818171659-rajoy-u-turns-says-he-s-ready-for-a-confidence-vote: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/853-160818171659-rajoy-u-turns-says-he-s-ready-for-a-confidence-vote (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30364e0f0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  13%|█▎        | 134/1000 [11:49<13:14:10, 55.02s/it]

Error extracting text from http://ctovision.com/feed/: 403 Client Error: Forbidden for url: http://ctovision.com/feed/


Processing URLs:  14%|█▎        | 135/1000 [11:51<10:06:23, 42.06s/it]

Error extracting text from http://blogs.wsj.com/moneybeat/2015/10/06/tpp-could-give-bank-of-japan-cover-for-action/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2015/10/06/tpp-could-give-bank-of-japan-cover-for-action/


Processing URLs:  14%|█▎        | 137/1000 [11:52<5:56:45, 24.80s/it] 

Error extracting text from https://messengerafrica.com/2017/06/22/factbox-understanding-the-eritrea-djibouti-border-dispute/: HTTPSConnectionPool(host='messengerafrica.com', port=443): Max retries exceeded with url: /2017/06/22/factbox-understanding-the-eritrea-djibouti-border-dispute/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  14%|█▍        | 139/1000 [11:55<3:36:19, 15.07s/it]

Error extracting text from http://www.dailystar.com.lb/News/World/2016/Mar-02/340242-farc-rebels-shy-from-colombia-peace-deadline.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/World/2016/Mar-02/340242-farc-rebels-shy-from-colombia-peace-deadline.ashx


Processing URLs:  14%|█▍        | 144/1000 [12:07<1:07:58,  4.76s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/283237-north-korean-hackers-steal-fighter-jet-plans-seoul-says: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/283237-north-korean-hackers-steal-fighter-jet-plans-seoul-says/


Processing URLs:  15%|█▍        | 147/1000 [12:18<46:06,  3.24s/it]  

Error extracting text from https://www.nytimes.com/2022/01/26/us/politics/biden-scotus-nominee-filibuster.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/26/us/politics/biden-scotus-nominee-filibuster.html
Error extracting text from https://www.reuters.com/article/us-yemen-security-hodeidah-idUSKCN1AW1YK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-hodeidah-idUSKCN1AW1YK


Processing URLs:  15%|█▌        | 151/1000 [12:25<33:35,  2.37s/it]

Error extracting text from http://38north.org/2015/09/sohae090315/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  15%|█▌        | 153/1000 [12:29<31:17,  2.22s/it]

Error extracting text from https://www.reuters.com/article/virginia-protests-idINKCN1AV0WP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/virginia-protests-idINKCN1AV0WP


Processing URLs:  16%|█▌        | 156/1000 [12:32<18:49,  1.34s/it]

Error extracting text from https://www.doe.virginia.gov/news/news_releases/2021/index.shtml: 503 Server Error: Service Unavailable for url: https://www.doe.virginia.gov/news/news_releases/2021/index.shtml
Error extracting text from http://www.reuters.com/article/us-southkorea-china-idUSKBN14P2I8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-china-idUSKBN14P2I8


Processing URLs:  16%|█▌        | 158/1000 [12:33<16:11,  1.15s/it]

Error extracting text from https://www.icrc.org/ihl-nat.nsf/162d151af444ded44125673e00508141/aba339f342ad7493c1256bc8004c2772/$file/constitution%20-%20korea%20-%20en.pdf: 403 Client Error: Forbidden for url: https://www.icrc.org/ihl-nat.nsf/162d151af444ded44125673e00508141/aba339f342ad7493c1256bc8004c2772/$file/constitution%20-%20korea%20-%20en.pdf


Processing URLs:  16%|█▌        | 161/1000 [13:43<4:35:41, 19.72s/it]

Error extracting text from http://aa.com.tr/en/europe/messy-nomination-scandal-rocks-spain-s-ruling-party/644120: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  16%|█▋        | 163/1000 [13:46<2:27:42, 10.59s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/north-koreas-secret-strategy-war-america-go-underground-20525: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/north-koreas-secret-strategy-war-america-go-underground-20525


Processing URLs:  17%|█▋        | 169/1000 [13:56<42:31,  3.07s/it]  

Error extracting text from http://www.unionleader.com/Half_of_NH_mayors_endorse_Clinton: 404 Client Error: Not Found for url: https://www.unionleader.com/half_of_nh_mayors_endorse_clinton/


Processing URLs:  17%|█▋        | 170/1000 [14:01<49:37,  3.59s/it]

Error extracting text from https://www.unicef.cn/en/figure-13-population-density-province-2017: 404 Client Error: Not Found for url: https://www.unicef.cn/en/figure-13-population-density-province-2017


Processing URLs:  17%|█▋        | 172/1000 [14:04<31:56,  2.31s/it]

Error extracting text from http://www.imaeil.com/sub_news/sub_news_view.php?news_id=4307&amp;yy=2017: 404 Client Error: Not Found for url: https://www.imaeil.com:443/sub_news/sub_news_view.php?news_id=4307&amp;yy=2017


Processing URLs:  17%|█▋        | 174/1000 [14:04<18:29,  1.34s/it]

Error extracting text from http://www.accuweather.com/en/weather-news/winter-storm-to-bring-rain-snow-gusty-winds-in-turkey-istanbul/55217153: 403 Client Error: Forbidden for url: http://www.accuweather.com/en/weather-news/winter-storm-to-bring-rain-snow-gusty-winds-in-turkey-istanbul/55217153


Processing URLs:  18%|█▊        | 177/1000 [14:10<19:30,  1.42s/it]

Error extracting text from http://www.jstor.org/stable/2600447?seq=1#page_scan_tab_contents: 420 Client Error: Enhance Your Calm for url: http://www.jstor.org/stable/2600447?seq=1#page_scan_tab_contents


Processing URLs:  18%|█▊        | 179/1000 [14:11<12:28,  1.10it/s]

Error extracting text from http://www.latimes.com/local/orangecounty/tn-dpt-me-wozniak-20160923-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/local/orangecounty/tn-dpt-me-wozniak-20160923-story.html


Processing URLs:  18%|█▊        | 181/1000 [14:16<22:20,  1.64s/it]

Error extracting text from http://www.americanshipper.com/Main/News/Varela_Panama_Canal_expansion_expected_to_be_compl_62579.aspx?taxonomy=Ocean1#hide: 404 Client Error: Not Found for url: http://www.americanshipper.com/Main/News/Varela_Panama_Canal_expansion_expected_to_be_compl_62579.aspx?taxonomy=Ocean1#hide


Processing URLs:  19%|█▉        | 189/1000 [14:40<59:59,  4.44s/it]

Error extracting text from https://www.washingtonpost.com/world/national-security/marines-see-afghan-forces-improve-in-helmand-battles/2018/02/01/c6284db0-079f-11e8-aa61-f3391373867e_story.html?utm_term=.2c77c1585873: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/national-security/marines-see-afghan-forces-improve-in-helmand-battles/2018/02/01/c6284db0-079f-11e8-aa61-f3391373867e_story.html?utm_term=.2c77c1585873


Processing URLs:  19%|█▉        | 193/1000 [15:47<4:31:32, 20.19s/it]

Error extracting text from https://www.usnews.com/news/best-states/north-dakota/articles/2017-03-24/company-us-want-dakota-access-pipeline-lake-crossing-upheld: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://www.bloomberg.com/news/articles/2018-12-22/trump-said-to-discuss-firing-fed-s-powell-after-latest-rate-hike


Processing URLs:  20%|█▉        | 197/1000 [15:53<1:36:35,  7.22s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-12-03/ryan-says-u-s-spending-agreement-may-not-be-reached-by-dec-11


Processing URLs:  20%|█▉        | 199/1000 [15:54<1:00:20,  4.52s/it]

URL filtered: https://www.youtube.com/watch?v=JXy0XnzTQu
URL filtered: https://www.bloomberg.com/news/live-blog/2017-09-18/u-k-s-may-speaks-on-brexit


Processing URLs:  21%|██        | 211/1000 [16:36<1:05:23,  4.97s/it]

Error extracting text from http://ec.europa.eu/internal_market/iprenforcement/docs/trade-secrets/120113_study_en.pdf: 404 Client Error: (Not Found) for url: https://ec.europa.eu/internal_market/iprenforcement/docs/trade-secrets/120113_study_en.pdf


Processing URLs:  21%|██        | 212/1000 [16:36<47:27,  3.61s/it]  

Error extracting text from http://www.wsj.com/articles/britain-is-europes-reverse-domino-1471210632: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/britain-is-europes-reverse-domino-1471210632


Processing URLs:  22%|██▏       | 217/1000 [16:48<33:25,  2.56s/it]

Error extracting text from http://www.brainpreservation.org/small-mammal-announcement/: 406 Client Error: Not Acceptable for url: http://www.brainpreservation.org/small-mammal-announcement/


Processing URLs:  22%|██▏       | 220/1000 [17:02<59:42,  4.59s/it]

Error extracting text from https://www.washingtonpost.com/national/us-set-to-release-jonathan-pollard-who-spied-for-israel/2015/11/19/96b33d00-8f39-11e5-934c-a369c80822c2_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/national/us-set-to-release-jonathan-pollard-who-spied-for-israel/2015/11/19/96b33d00-8f39-11e5-934c-a369c80822c2_story.html


Processing URLs:  22%|██▏       | 222/1000 [17:03<33:14,  2.56s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-un-rights-idUSKCN1BA0ZN?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-08-30&amp;utm_term=US%20Reuters%20News%20Now: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-un-rights-idUSKCN1BA0ZN?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-08-30&amp;utm_term=US%20Reuters%20News%20Now


Processing URLs:  22%|██▏       | 224/1000 [17:08<32:31,  2.51s/it]

Error extracting text from http://pro.boxoffice.com/long-range-forecast-captain-america-civil-war/: 403 Client Error: Forbidden for url: http://www.boxofficepro.com/long-range-forecast-captain-america-civil-war/


Processing URLs:  22%|██▎       | 225/1000 [17:09<24:52,  1.93s/it]

Error extracting text from http://www.vocativ.com/290321/inside-scalias-very-very-weird-secret-hunting-society/: 404 Client Error: Not Found for url: http://www.vocativ.com/290321/inside-scalias-very-very-weird-secret-hunting-society/


Processing URLs:  23%|██▎       | 227/1000 [17:15<28:23,  2.20s/it]

Error extracting text from https://english.alarabiya.net/en/webtv/reports/2017/01/17/CEO-Saudi-Aramco-s-stake-sale-planned-for-second-half-of-2018-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/webtv/reports/2017/01/17/CEO-Saudi-Aramco-s-stake-sale-planned-for-second-half-of-2018-.html


Processing URLs:  23%|██▎       | 228/1000 [17:16<24:28,  1.90s/it]

Error extracting text from http://www.ibtimes.co.uk/nextev-1m-electric-hypercar-spotted-testing-uk-ahead-official-launch-1586220: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/nextev-1m-electric-hypercar-spotted-testing-uk-ahead-official-launch-1586220


Processing URLs:  23%|██▎       | 229/1000 [17:18<27:05,  2.11s/it]

Error extracting text from http://www.argusmedia.com/news/article/?id=1563607: 404 Client Error: Not Found for url: https://www.argusmedia.com/not-found


Processing URLs:  23%|██▎       | 231/1000 [17:20<18:05,  1.41s/it]

Error extracting text from https://reut.rs/3p57ZeM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-politics-scotland/scottish-support-for-independence-drops-poll-shows-idUSKBN2AB142
Error extracting text from http://www.nytimes.com/2016/02/21/world/europe/british-prime-minister-announces-eu-referendum-date.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/21/world/europe/british-prime-minister-announces-eu-referendum-date.html


Processing URLs:  23%|██▎       | 232/1000 [17:20<13:16,  1.04s/it]

Error extracting text from http://splash247.com/panama-canal-authority-says-expansion-will-be-complete-by-end-of-june/: 403 Client Error: Forbidden for url: https://splash247.com/panama-canal-authority-says-expansion-will-be-complete-by-end-of-june/


Processing URLs:  24%|██▎       | 236/1000 [17:32<29:50,  2.34s/it]

Error extracting text from http://www.pscp.tv/w/bIVrZjFQbUtxT1ZYTlpWRW98MU1ueG5tdldQVndKTwP1ifT67sPcbyYVCjx9O-JsYknd4P5D1aweYT5-7FKq: 404 Client Error: Not Found for url: https://www.pscp.tv/w/bIVrZjFQbUtxT1ZYTlpWRW98MU1ueG5tdldQVndKTwP1ifT67sPcbyYVCjx9O-JsYknd4P5D1aweYT5-7FKq


Processing URLs:  24%|██▍       | 238/1000 [17:36<25:34,  2.01s/it]

Error extracting text from http://www.businessinsider.com/us-confirms-iran-has-met-the-requirements-of-landmark-nuclear-deal-2016-1: 404 Client Error: Not Found for url: https://www.businessinsider.com/us-confirms-iran-has-met-the-requirements-of-landmark-nuclear-deal-2016-1


Processing URLs:  24%|██▍       | 239/1000 [17:36<18:50,  1.49s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-24/wheat-destined-for-ethiopia-s-hungry-gets-stuck-in-port-logjams


Processing URLs:  24%|██▍       | 241/1000 [18:37<3:06:49, 14.77s/it]

Error extracting text from http://www.aa.com.tr/en/middle-east/with-russian-air-support-pyd-strives-to-make-gains-in-nw-syria/483583: HTTPConnectionPool(host='www.aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  24%|██▍       | 244/1000 [18:55<2:13:39, 10.61s/it]

Error extracting text from http://www.recode.net/2016/11/9/13573926/donald-trump-amazon-jeff-bezos-antitrust-taxes: Exceeded 30 redirects.


Processing URLs:  25%|██▌       | 253/1000 [19:30<55:23,  4.45s/it]  

Error extracting text from http://www.reuters.com/finance/stocks/002783.SZ/key-developments#3XLBAf7LY213veTW.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/finance/stocks/002783.SZ/key-developments#3XLBAf7LY213veTW.97


Processing URLs:  26%|██▌       | 255/1000 [19:31<29:36,  2.38s/it]

Error extracting text from http://www.basnews.com/index.php/en/reports/261718#.Vum2jA5aXC4.mailto: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/reports/261718#.Vum2jA5aXC4.mailto


Processing URLs:  26%|██▌       | 257/1000 [21:33<7:46:22, 37.66s/it]

Error extracting text from https://rationalground.com/rhode-island-crosses-the-threshold-of-covid-19-testing-transparency/: HTTPSConnectionPool(host='rationalground.com', port=443): Max retries exceeded with url: /rhode-island-crosses-the-threshold-of-covid-19-testing-transparency/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30398a6f0>, 'Connection to rationalground.com timed out. (connect timeout=60)'))


Processing URLs:  26%|██▋       | 265/1000 [21:50<43:38,  3.56s/it]  

Error extracting text from https://www.reuters.com/business/energy/nord-stream-2-ceo-says-construction-work-be-finished-august-2021-07-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/nord-stream-2-ceo-says-construction-work-be-finished-august-2021-07-12/


Processing URLs:  27%|██▋       | 268/1000 [21:53<21:29,  1.76s/it]

Error extracting text from http://europe.newsweek.com/kurds-aim-take-mosul-isis-490245?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/kurds-aim-take-mosul-isis-490245
Error extracting text from http://www.reuters.com/article/us-illinois-ratings-fitch/illinois-credit-rating-stays-investment-grade-with-fitch-idUSKBN1A22C0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-illinois-ratings-fitch/illinois-credit-rating-stays-investment-grade-with-fitch-idUSKBN1A22C0


Processing URLs:  27%|██▋       | 270/1000 [21:56<19:14,  1.58s/it]

Error extracting text from https://www.nytimes.com/2018/01/18/technology/cities-amazon-headquarters.html?action=click&contentCollection=Technology&module=RelatedCoverage&region=EndOfArticle&pgtype=article: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/18/technology/cities-amazon-headquarters.html?action=click&contentCollection=Technology&module=RelatedCoverage&region=EndOfArticle&pgtype=article


Processing URLs:  27%|██▋       | 274/1000 [22:07<25:04,  2.07s/it]

Error extracting text from http://www.nytimes.com/2015/06/01/world/asia/china-says-it-could-set-up-air-defense-zone-in-south-china-sea.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/06/01/world/asia/china-says-it-could-set-up-air-defense-zone-in-south-china-sea.html


Processing URLs:  28%|██▊       | 276/1000 [22:12<24:41,  2.05s/it]

Error extracting text from http://www.rollcall.com/news/politics/north-carolina-richard-burr-competitive: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/north-carolina-richard-burr-competitive
Error extracting text from http://www.reuters.com/article/us-apple-iphone-idUSKBN0UJ1WE20160105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apple-iphone-idUSKBN0UJ1WE20160105


Processing URLs:  28%|██▊       | 278/1000 [22:18<29:49,  2.48s/it]

Error extracting text from http://eng.chinamil.com.cn/view/2017-11/03/content_7812092.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/view/2017-11/03/content_7812092.htm


Processing URLs:  28%|██▊       | 280/1000 [22:24<30:15,  2.52s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/china-military-news/2016-01/29/content_6881301.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/china-military-news/2016-01/29/content_6881301.htm


Processing URLs:  28%|██▊       | 285/1000 [22:32<18:17,  1.53s/it]

Error extracting text from https://cybermap.kaspersky.com/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  29%|██▉       | 289/1000 [22:47<29:36,  2.50s/it]

Error extracting text from http://www.nytimes.com/2016/09/22/world/middleeast/obama-syria-kurds-isis-turkey-military-commandos.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/22/world/middleeast/obama-syria-kurds-isis-turkey-military-commandos.html


Processing URLs:  29%|██▉       | 293/1000 [22:50<13:07,  1.11s/it]

Error extracting text from http://www.nytimes.com/2015/09/16/world/asia/cyberthreat-posed-by-china-and-iran-confounds-white-house.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/16/world/asia/cyberthreat-posed-by-china-and-iran-confounds-white-house.html


Processing URLs:  30%|██▉       | 296/1000 [22:53<10:06,  1.16it/s]

Error extracting text from https://www.nytimes.com/2017/03/24/us/politics/health-care-affordable-care-act.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=span-ab-top-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/24/us/politics/health-care-affordable-care-act.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=span-ab-top-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  30%|██▉       | 297/1000 [22:54<10:29,  1.12it/s]

Error extracting text from https://www.mfa.gov.il/mfa/foreignpolicy/peace/guide/pages/declaration%20of%20establishment%20of%20state%20of%20israel.aspx: HTTPSConnectionPool(host='www.mfa.gov.il', port=443): Max retries exceeded with url: /mfa/foreignpolicy/peace/guide/pages/declaration%20of%20establishment%20of%20state%20of%20israel.aspx (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)')))


Processing URLs:  30%|██▉       | 298/1000 [23:54<3:38:11, 18.65s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2015/08/28/rousseff-faces-political-suicide-bomber-removal-threat: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  30%|███       | 303/1000 [24:05<1:05:50,  5.67s/it]

Error extracting text from https://micanaldepanama.com/expansion/press-releases/: 403 Client Error: Forbidden for url: https://pancanal.com/expansion/press-releases/


Processing URLs:  30%|███       | 304/1000 [24:05<50:10,  4.33s/it]  

Error extracting text from http://www.policyforum.net/can-china-disarm-japans-moves-south-china-sea/: 403 Client Error: Forbidden for url: http://www.policyforum.net/can-china-disarm-japans-moves-south-china-sea/


Processing URLs:  31%|███       | 309/1000 [24:12<20:05,  1.74s/it]

Error extracting text from http://www.newsweek.com/frances-le-pen-promises-eu-referendum-495557: 403 Client Error: Forbidden for url: https://www.newsweek.com/frances-le-pen-promises-eu-referendum-495557
Error extracting text from http://www.reuters.com/article/us-ukraine-cybersecurity-malware-idUSKCN0UW0R0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-cybersecurity-malware-idUSKCN0UW0R0


Processing URLs:  31%|███       | 310/1000 [24:14<18:31,  1.61s/it]

Error extracting text from http://uk.reuters.com/article/2015/11/27/uk-panama-canal-idUKKBN0TG04P20151127: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  31%|███▏      | 314/1000 [24:41<1:01:52,  5.41s/it]

URL filtered: https://www.youtube.com/watch?v=jxQ7gDtOydo


Processing URLs:  32%|███▏      | 316/1000 [24:42<35:36,  3.12s/it]  

Error extracting text from http://www.straitstimes.com/asia/east-asia/hks-chief-executive-rewarded-with-new-post: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  32%|███▏      | 317/1000 [24:47<39:13,  3.45s/it]

Error extracting text from http://www.hindustantimes.com/world-news/appointment-of-saarc-secretary-general-hit-by-india-pakistan-tensions/story-1b1wlC1CrEbquDvQRuQZkN.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/world-news/appointment-of-saarc-secretary-general-hit-by-india-pakistan-tensions/story-1b1wlC1CrEbquDvQRuQZkN.html


Processing URLs:  32%|███▏      | 322/1000 [24:50<16:56,  1.50s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096727/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096727/


Processing URLs:  32%|███▎      | 325/1000 [24:55<17:59,  1.60s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/burundi-official-members-ruling-party-killed-38461370: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/burundi-official-members-ruling-party-killed-38461370
Error extracting text from https://www.reuters.com/article/us-egypt-russia/putin-egypts-sisi-discuss-restart-of-flights-sign-nuclear-deal-idUSKBN1E51BR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-egypt-russia/putin-egypts-sisi-discuss-restart-of-flights-sign-nuclear-deal-idUSKBN1E51BR


Processing URLs:  33%|███▎      | 327/1000 [24:56<11:06,  1.01it/s]

URL filtered: https://twitter.com/ballotboxscot/status/1386953196623519744?s=21
Error extracting text from http://www.nasdaq.com/author/global-traders: 403 Client Error: Forbidden for url: http://www.nasdaq.com/author/global-traders
URL filtered: http://www.bloomberg.com/news/articles/2016-06-10/more-than-400-000-sought-brexit-vote-after-deadline-extended


Processing URLs:  33%|███▎      | 332/1000 [25:00<10:11,  1.09it/s]

URL filtered: https://twitter.com/IpsosMORI/status/744950884736577536


Processing URLs:  34%|███▎      | 336/1000 [25:03<09:35,  1.15it/s]

Error extracting text from http://thehill.com/homenews/senate/360970-senate-gop-running-out-of-options-to-stop-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/360970-senate-gop-running-out-of-options-to-stop-moore/


Processing URLs:  34%|███▎      | 337/1000 [25:05<12:20,  1.12s/it]

Error extracting text from https://gandhara.rferl.org/a/russia-china-us-navies-pakistan-aman-2021/31099814.htmlX: 404 Client Error: Not Found for url: https://gandhara.rferl.org/a/russia-china-us-navies-pakistan-aman-2021/31099814.htmlX


Processing URLs:  34%|███▍      | 345/1000 [25:15<13:59,  1.28s/it]

Error extracting text from http://nationalinterest.org/blog/the-skeptics/north-koreas-nuclear-weapons-test-12-lessons-kim-jong-uns-17655: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-skeptics/north-koreas-nuclear-weapons-test-12-lessons-kim-jong-uns-17655


Processing URLs:  35%|███▌      | 350/1000 [25:24<16:56,  1.56s/it]

Error extracting text from http://www.worldoil.com/news/2016/4/26/russia-sees-no-moves-to-cap-oil-output-before-june-opec-meeting: 500 Server Error: Internal Server Error for url: https://worldoil.com/news/2016/4/26/russia-sees-no-moves-to-cap-oil-output-before-june-opec-meeting


Processing URLs:  36%|███▌      | 357/1000 [25:42<15:53,  1.48s/it]

Error extracting text from http://www.wsj.com/articles/pentagon-warns-assad-regime-to-avoid-action-near-u-s-and-allied-forces-1471633476: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pentagon-warns-assad-regime-to-avoid-action-near-u-s-and-allied-forces-1471633476


Processing URLs:  36%|███▌      | 358/1000 [25:43<13:11,  1.23s/it]

Error extracting text from https://www.weforum.org/agenda/2017/01/why-we-should-all-have-a-basic-income: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2017/01/why-we-should-all-have-a-basic-income


Processing URLs:  36%|███▌      | 362/1000 [25:50<18:16,  1.72s/it]

Error extracting text from http://www.foxbusiness.com/economy-policy/2015/09/29/congress-moves-on-spending-bill-as-shutdown-looms/: 404 Client Error: Not Found for url: https://www.foxbusiness.com/economy-policy/2015/09/29/congress-moves-on-spending-bill-as-shutdown-looms/


Processing URLs:  37%|███▋      | 366/1000 [25:54<11:23,  1.08s/it]

Error extracting text from http://www.parliament.uk/about/faqs/house-of-commons-faqs/business-faq-page/recess-dates/: 403 Client Error: Forbidden for url: http://www.parliament.uk/about/faqs/house-of-commons-faqs/business-faq-page/recess-dates/


Processing URLs:  37%|███▋      | 368/1000 [25:56<11:16,  1.07s/it]

Error extracting text from http://www.agprofessional.com/news/industry/government-agency-us-commercial-drone-use-expand-tenfold-2021: HTTPConnectionPool(host='www.agprofessional.com', port=80): Max retries exceeded with url: /news/industry/government-agency-us-commercial-drone-use-expand-tenfold-2021 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe910dd0>: Failed to resolve 'www.agprofessional.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  37%|███▋      | 371/1000 [26:00<10:42,  1.02s/it]

Error extracting text from https://finviz.com/screener.ashx?v=111&amp;f=cap_midover,sec_technology: 403 Client Error: Forbidden for url: https://finviz.com/screener.ashx?v=111&amp;f=cap_midover,sec_technology


Processing URLs:  37%|███▋      | 372/1000 [26:02<13:35,  1.30s/it]

Error extracting text from http://en.trend.az/iran/nuclearp/2437030.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/nuclearp/2437030.html


Processing URLs:  37%|███▋      | 373/1000 [26:03<13:49,  1.32s/it]

Error extracting text from https://reut.rs/3m0wls8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/who-seeks-take-political-heat-out-virus-origins-debate-2021-08-13/


Processing URLs:  38%|███▊      | 375/1000 [26:11<24:53,  2.39s/it]

Error extracting text from https://www.nytimes.com/2017/06/16/us/politics/rick-gates-russia.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/16/us/politics/rick-gates-russia.html?_r=0


Processing URLs:  38%|███▊      | 376/1000 [26:13<25:13,  2.43s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-turkey/turkeys-erdogan-calls-syrias-assad-a-terrorist-says-impossible-to-continue-with-him-idUSKBN1EL0W5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey/turkeys-erdogan-calls-syrias-assad-a-terrorist-says-impossible-to-continue-with-him-idUSKBN1EL0W5


Processing URLs:  38%|███▊      | 379/1000 [26:22<27:13,  2.63s/it]

Error extracting text from http://beta.philstar.com/headlines/2017/01/14/1662145/japan-join-phl-us-balikatan: HTTPConnectionPool(host='beta.philstar.com', port=80): Max retries exceeded with url: /headlines/2017/01/14/1662145/japan-join-phl-us-balikatan (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30300a870>: Failed to resolve 'beta.philstar.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 382/1000 [26:23<14:08,  1.37s/it]

Error extracting text from http://news.yahoo.com/brazil-supreme-court-scraps-impeachment-commission-win-president-230021460.html: 404 Client Error: Not Found for url: http://news.yahoo.com/brazil-supreme-court-scraps-impeachment-commission-win-president-230021460.html


Processing URLs:  39%|███▊      | 387/1000 [26:40<22:10,  2.17s/it]

Error extracting text from https://amp.tvnz.co.nz/news/story/JTJGY29udGVudCUyRnR2bnolMkZvbmVuZXdzJTJGc3RvcnklMkYyMDIxJTJGMDclMkYxNCUyRmNvdmlkLTE5LXZhY2NpbmU=: HTTPSConnectionPool(host='amp.tvnz.co.nz', port=443): Max retries exceeded with url: /news/story/JTJGY29udGVudCUyRnR2bnolMkZvbmVuZXdzJTJGc3RvcnklMkYyMDIxJTJGMDclMkYxNCUyRmNvdmlkLTE5LXZhY2NpbmU= (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303008a70>: Failed to resolve 'amp.tvnz.co.nz' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  39%|███▉      | 389/1000 [26:43<18:55,  1.86s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-03-04/iran-and-u-s-get-time-for-diplomacy-as-atomic-censure-withdrawn


Processing URLs:  39%|███▉      | 391/1000 [26:45<16:19,  1.61s/it]

Error extracting text from http://www.focus-economics.com/countries/south-africa/news/politics/south-african-government-delivers-austere-2015-budget-aimed-at: 404 Client Error: Not Found for url: https://www.focus-economics.com/countries/south-africa/news/politics/south-african-government-delivers-austere-2015-budget-aimed-at/


Processing URLs:  39%|███▉      | 392/1000 [26:46<14:51,  1.47s/it]

Error extracting text from http://uk.reuters.com/article/us-health-zika-brazil-idUKKCN0Y22U4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  39%|███▉      | 394/1000 [26:47<10:58,  1.09s/it]

Error extracting text from https://www.raps.org/news-and-articles/news-articles/2021/6/recon-us-signs-for-17m-courses-of-mercks-experimen: 403 Client Error: Forbidden for url: https://www.raps.org/news-and-articles/news-articles/2021/6/recon-us-signs-for-17m-courses-of-mercks-experimen


Processing URLs:  40%|███▉      | 395/1000 [26:50<16:33,  1.64s/it]

URL filtered: https://www.youtube.com/watch?v=VJboSby7nW0
URL filtered: http://www.bloomberg.com/news/articles/2016-02-12/venezuela-supreme-court-upholds-maduro-emergency-economic-decree


Processing URLs:  40%|████      | 401/1000 [27:53<2:32:00, 15.23s/it]

Error extracting text from https://www.miamiherald.com/news/nation-world/world/americas/haiti/article255199321.html: HTTPSConnectionPool(host='www.miamiherald.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  40%|████      | 403/1000 [27:55<1:27:02,  8.75s/it]

Error extracting text from http://www.reuters.com/article/2015/10/19/us-iran-nuclear-idUSKCN0SD1CT20151019#wTjWUhhqmz4OylHv.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/19/us-iran-nuclear-idUSKCN0SD1CT20151019#wTjWUhhqmz4OylHv.97


Processing URLs:  41%|████      | 406/1000 [28:02<44:48,  4.53s/it]  

URL filtered: https://www.google.ca/amp/s/digiday.com/media/german-publishers-facebook/amp/
URL filtered: http://www.bloomberg.com/news/articles/2021-08-04/understanding-the-shadow-war-between-israel-and-iran-quicktake&amp;ved=2ahUKEwj_rteRz67yAhULkxQKHX9PBNoQxfQBMAZ6BAgDEAM&amp;usg=AOvVaw18IswYPr8WpfSXfVzLxjcV
URL filtered: https://www.bloomberg.com/news/articles/2020-02-10/airbnb-freezes-beijing-check-ins-till-march-to-curb-coronavirus
URL filtered: https://www.youtube.com/watch?v=uYTJGBBjkGo


Processing URLs:  41%|████      | 412/1000 [28:06<18:04,  1.84s/it]

Error extracting text from http://www.infrastructurereportcard.org/: 403 Client Error: Forbidden for url: https://www.infrastructurereportcard.org/


Processing URLs:  41%|████▏     | 413/1000 [28:07<16:22,  1.67s/it]

Error extracting text from https://lawnewz.com/high-profile/fbi-investigating-russian-state-run-news-agency-sputnik-report-says/: HTTPSConnectionPool(host='lawnewz.com', port=443): Max retries exceeded with url: /high-profile/fbi-investigating-russian-state-run-news-agency-sputnik-report-says/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  41%|████▏     | 414/1000 [28:08<14:01,  1.44s/it]

Error extracting text from http://thehill.com/opinion/white-house/375219-the-steps-to-making-trumps-bump-stock-regulation-a-reality: 403 Client Error: Forbidden for url: https://thehill.com/opinion/white-house/375219-the-steps-to-making-trumps-bump-stock-regulation-a-reality/


Processing URLs:  42%|████▏     | 416/1000 [28:08<09:28,  1.03it/s]

Error extracting text from http://www.reuters.com/article/2015/12/03/us-usa-fiscal-ryan-idUSKBN0TM1QL20151203: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/03/us-usa-fiscal-ryan-idUSKBN0TM1QL20151203


Processing URLs:  42%|████▏     | 418/1000 [28:09<06:35,  1.47it/s]

Error extracting text from http://seekingalpha.com/article/3735426-opec-meets-production-free-for-all?ifp=0: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3735426-opec-meets-production-free-for-all?ifp=0
Error extracting text from http://www.nytimes.com/2015/10/22/us/politics/assad-finds-chilly-embrace-in-moscow-trip.html?emc=edit_th_20151022&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/22/us/politics/assad-finds-chilly-embrace-in-moscow-trip.html?emc=edit_th_20151022&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  42%|████▏     | 419/1000 [28:12<13:42,  1.41s/it]

Error extracting text from https://www.state.gov/documents/organization/273987.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/


Processing URLs:  42%|████▏     | 420/1000 [28:15<16:42,  1.73s/it]

Error extracting text from http://paktribune.com/news/Afghanistan-ready-to-talk-unconditionally-with-Taliban-278136.html: HTTPSConnectionPool(host='paktribune.com', port=443): Max retries exceeded with url: /news/Afghanistan-ready-to-talk-unconditionally-with-Taliban-278136.html (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'paktribune.com'. (_ssl.c:1000)")))


Processing URLs:  42%|████▏     | 421/1000 [28:18<21:45,  2.25s/it]

URL filtered: https://www.youtube.com/watch?v=8d16RYRS7IE


Processing URLs:  42%|████▏     | 424/1000 [28:24<18:29,  1.93s/it]

Error extracting text from http://www.ibtimes.co.uk/jr-subbed-philippines-drug-dealers-found-dead-president-duterte-vows-wipe-out-crime-graphic-1568504: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/jr-subbed-philippines-drug-dealers-found-dead-president-duterte-vows-wipe-out-crime-graphic-1568504


Processing URLs:  42%|████▎     | 425/1000 [28:25<16:11,  1.69s/it]

Error extracting text from http://www.andrewerickson.com/2016/07/pca-press-release-pca-case-no-2013-19-the-south-china-sea-arbitration-the-republic-of-the-philippines-v-the-peoples-republic-of-china/: 403 Client Error: Forbidden for url: http://www.andrewerickson.com/2016/07/pca-press-release-pca-case-no-2013-19-the-south-china-sea-arbitration-the-republic-of-the-philippines-v-the-peoples-republic-of-china/


Processing URLs:  43%|████▎     | 427/1000 [28:28<15:24,  1.61s/it]

Error extracting text from http://www.dailystar.com.lb/News/World/2015/Dec-02/325515-venezuelas-harried-opposition-eyes-landmark-win.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/World/2015/Dec-02/325515-venezuelas-harried-opposition-eyes-landmark-win.ashx


Processing URLs:  43%|████▎     | 431/1000 [28:35<14:07,  1.49s/it]

Error extracting text from http://www.acwa.com/news/state-budget-fees/governor-brown%E2%80%99s-budget-proposal-includes-323-million-drought-response: 403 Client Error: Forbidden for url: http://www.acwa.com/news/state-budget-fees/governor-brown%E2%80%99s-budget-proposal-includes-323-million-drought-response


Processing URLs:  44%|████▎     | 436/1000 [28:41<12:24,  1.32s/it]

Error extracting text from http://www.salon.com/2016/03/12/sexual_transmission_of_zika_virus_more_common_than_previously_believed_partner/: 404 Client Error: Not Found for url: https://www.salon.com/2016/03/12/sexual_transmission_of_zika_virus_more_common_than_previously_believed_partner/


Processing URLs:  44%|████▍     | 439/1000 [28:52<21:29,  2.30s/it]

Error extracting text from https://medium.com/@gavrilodavid/why-derek-chauvin-may-get-off-his-murder-charge-2e2ad8d0911: 403 Client Error: Forbidden for url: https://medium.com/@gavrilodavid/why-derek-chauvin-may-get-off-his-murder-charge-2e2ad8d0911


Processing URLs:  44%|████▍     | 441/1000 [29:01<33:57,  3.65s/it]

Error extracting text from http://sos.alabama.gov/newsroom/inactive-voters-voter-record-refresh-information: 403 Client Error: Forbidden for url: https://www.sos.alabama.gov/newsroom/inactive-voters-voter-record-refresh-information


Processing URLs:  44%|████▍     | 442/1000 [29:02<24:24,  2.62s/it]

Error extracting text from https://www.nytimes.com/2021/03/01/health/covid-19-coronavirus-brazil-variant.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/01/health/covid-19-coronavirus-brazil-variant.html


Processing URLs:  44%|████▍     | 444/1000 [29:07<23:54,  2.58s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=weekly&amp;id=avengers11.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=weekly&amp;id=avengers11.htm


Processing URLs:  44%|████▍     | 445/1000 [29:09<23:00,  2.49s/it]

Error extracting text from http://data.unhcr.org/mediterranean/country.php?id=83/: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/country.php?id=83/


Processing URLs:  45%|████▍     | 448/1000 [29:12<13:18,  1.45s/it]

Error extracting text from http://www.ibtimes.com/trump-vs-isis-us-russia-alliance-against-terrorism-only-way-win-syria-war-assad-says-2489852: 403 Client Error: Forbidden for url: https://www.ibtimes.com/trump-vs-isis-us-russia-alliance-against-terrorism-only-way-win-syria-war-assad-says-2489852
Error extracting text from http://www.nytimes.com/2016/03/05/opinion/how-irans-reformistsfound-their-center.html?emc=edit_th_20160305&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/05/opinion/how-irans-reformistsfound-their-center.html?emc=edit_th_20160305&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  45%|████▌     | 452/1000 [29:17<10:18,  1.13s/it]

Error extracting text from https://43alumniforjoebiden.com/: 403 Client Error: Forbidden for url: https://43alumniforamerica.com/
Error extracting text from https://www.reuters.com/article/us-myanmar-politics/eleven-killed-as-myanmar-protesters-fight-troops-with-handmade-guns-firebombs-media-idUSKBN2BV0EH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics/eleven-killed-as-myanmar-protesters-fight-troops-with-handmade-guns-firebombs-media-idUSKBN2BV0EH


Processing URLs:  46%|████▌     | 458/1000 [29:29<16:56,  1.88s/it]

Error extracting text from http://predictwise.com/politics/: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/


Processing URLs:  46%|████▌     | 461/1000 [29:32<12:08,  1.35s/it]

Error extracting text from http://www.trtworld.com/asia/fighters-attack-two-iraqi-gas-facilities-154603: 404 Client Error: Not Found for url: https://www.trtworld.com:443/asia/fighters-attack-two-iraqi-gas-facilities-154603
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-bailout-idUSKBN1871I9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-bailout-idUSKBN1871I9


Processing URLs:  46%|████▌     | 462/1000 [29:33<09:35,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/house-plans-vote-on-bill-to-lift-ban-on-oil-exports-1442329626: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-plans-vote-on-bill-to-lift-ban-on-oil-exports-1442329626


Processing URLs:  47%|████▋     | 467/1000 [29:38<08:11,  1.08it/s]

Error extracting text from https://www.irrawaddy.com/news/burma/vice-president-harris-says-us-committed-to-supporting-people-of-myanmar.html: 403 Client Error: Forbidden for url: https://www.irrawaddy.com/news/burma/vice-president-harris-says-us-committed-to-supporting-people-of-myanmar.html


Processing URLs:  47%|████▋     | 470/1000 [29:42<07:42,  1.15it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/295385-dems-request-ethics-probe-of-johnson: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/295385-dems-request-ethics-probe-of-johnson/
Error extracting text from http://news.yahoo.com/two-iran-ex-presidents-urge-voters-back-pro-083241096.html: 404 Client Error: Not Found for url: http://news.yahoo.com/two-iran-ex-presidents-urge-voters-back-pro-083241096.html


Processing URLs:  47%|████▋     | 474/1000 [29:47<08:31,  1.03it/s]

Error extracting text from http://www.worldbulletin.net/nato-montenegro-begin-membership-talks/169470/nato-montenegro-begin-membership-talks: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/nato-montenegro-begin-membership-talks/169470/nato-montenegro-begin-membership-talks


Processing URLs:  48%|████▊     | 477/1000 [29:48<05:07,  1.70it/s]

URL filtered: https://www.rferl.org/a/facebook-russia-advertisement-us-presidential-election-trump/28768268.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0YI12E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0YI12E


Processing URLs:  48%|████▊     | 478/1000 [29:48<04:08,  2.10it/s]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement/#: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement/


Processing URLs:  49%|████▊     | 487/1000 [30:11<14:00,  1.64s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-idUKKBN16Z2A2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  50%|████▉     | 495/1000 [30:28<14:28,  1.72s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-anc-to-force-zuma-to-quit-as-president-reports-idUSKBN1F90A5?feedType=RSS&amp;feedName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-anc-to-force-zuma-to-quit-as-president-reports-idUSKBN1F90A5?feedType=RSS&amp;feedName=worldNews
URL filtered: http://www.bloomberg.com/news/articles/2016-04-01/saudi-arabia-to-sell-stake-in-parent-of-state-oil-giant-by-2018
Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-social-democrats-seek-clarity-on-coalition-talks-idUSKBN1E50TP?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-social-democrats-seek-clarity-on-coalition-talks-idUSKBN1E50TP?il=0


Processing URLs:  50%|████▉     | 497/1000 [30:42<28:45,  3.43s/it]

Error extracting text from http://www.trtworld.com/mea/burundi-peace-talks-resume-in-tanzania-without-opposition-111095: 404 Client Error: Not Found for url: https://www.trtworld.com:443/mea/burundi-peace-talks-resume-in-tanzania-without-opposition-111095


Processing URLs:  50%|█████     | 500/1000 [30:47<18:38,  2.24s/it]

URL filtered: https://www.youtube.com/watch?v=E_gAeHlIrWc#t=15.87601
URL filtered: https://www.bloomberg.com/news/articles/2017-08-22/tillerson-pushes-pakistan-to-prod-taliban-to-negotiating-table


Processing URLs:  51%|█████     | 510/1000 [30:56<08:46,  1.07s/it]

Error extracting text from http://www.lse.co.uk/AllNews.asp?code=3qb6eahc&amp;headline=OPEC_Holds_Off_On_Lowering_Production_Target: 403 Client Error: Forbidden for url: https://www.lse.co.uk/AllNews.asp?code=3qb6eahc&amp;headline=OPEC_Holds_Off_On_Lowering_Production_Target


Processing URLs:  51%|█████     | 511/1000 [30:56<06:43,  1.21it/s]

Error extracting text from https://www.wsj.com/articles/cost-to-insure-against-a-venezuela-default-hits-record-1510165600: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/cost-to-insure-against-a-venezuela-default-hits-record-1510165600


Processing URLs:  51%|█████▏    | 513/1000 [30:59<08:03,  1.01it/s]

Error extracting text from https://s22.q4cdn.com/826641620/files/doc_financials/2021/q1/Q1&#39;21-Selected-Metrics-and-Financials.pdf: 451 Client Error:  for url: https://s22.q4cdn.com/826641620/files/doc_financials/2021/q1/Q1&#39;21-Selected-Metrics-and-Financials.pdf


Processing URLs:  52%|█████▏    | 519/1000 [31:13<13:39,  1.70s/it]

Error extracting text from http://www.conservativehome.com/platform/2015/12/theresa-villiers-mp-an-agreement-that-brings-new-hope-to-northern-ireland.html: 403 Client Error: Forbidden for url: http://conservativehome.com/platform/2015/12/theresa-villiers-mp-an-agreement-that-brings-new-hope-to-northern-ireland.html


Processing URLs:  52%|█████▏    | 524/1000 [31:19<08:44,  1.10s/it]

Error extracting text from http://www.wsj.com/articles/a-trump-strategy-to-end-syrias-nightmare-1481847575: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-trump-strategy-to-end-syrias-nightmare-1481847575
Error extracting text from http://www.reuters.com/article/us-usa-trump-tax-exclusive-idUSKBN1622J5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-tax-exclusive-idUSKBN1622J5


Processing URLs:  53%|█████▎    | 526/1000 [31:19<05:44,  1.38it/s]

Error extracting text from https://www.scotsman.com/news/weather/scottish-election-2021-bad-weather-and-unseasonal-snowfall-causes-chaos-across-scotland-on-polling-day-3226817: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/weather/scottish-election-2021-bad-weather-and-unseasonal-snowfall-causes-chaos-across-scotland-on-polling-day-3226817


Processing URLs:  53%|█████▎    | 527/1000 [31:32<33:42,  4.28s/it]

URL filtered: https://www.cnbc.com/2019/12/05/amazon-faces-us-antitrust-scrutiny-on-cloud-business-bloomberg.html


Processing URLs:  53%|█████▎    | 531/1000 [31:39<19:00,  2.43s/it]

Error extracting text from http://www.morningstar.com/news/market-watch/TDJNMW_20170206269/update-oil-falls-as-concerns-over-us-output-persist.html: 500 Server Error: Internal Server Error for url: https://www.morningstar.com/news/marketwatch/20170206269/update-oil-falls-as-concerns-over-us-output-persist


Processing URLs:  53%|█████▎    | 534/1000 [31:44<13:34,  1.75s/it]

Error extracting text from http://www.wsj.com/articles/paris-attacks-prompt-geopolitical-shift-in-west-1447623348: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/paris-attacks-prompt-geopolitical-shift-in-west-1447623348


Processing URLs:  54%|█████▎    | 537/1000 [31:47<07:50,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/congress-seen-likely-to-lift-u-s-oil-export-ban-1449874465: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/congress-seen-likely-to-lift-u-s-oil-export-ban-1449874465
Error extracting text from http://www.reuters.com/article/2015/10/14/us-montenegro-nato-idUSKCN0S80WD20151014: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/14/us-montenegro-nato-idUSKCN0S80WD20151014


Processing URLs:  54%|█████▍    | 540/1000 [31:48<04:42,  1.63it/s]

Error extracting text from http://www.wsj.com/articles/aramco-deals-face-uncertainty-under-trump-administration-1481917544: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/aramco-deals-face-uncertainty-under-trump-administration-1481917544
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-china-idUSKBN1AE02S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-china-idUSKBN1AE02S


Processing URLs:  54%|█████▍    | 542/1000 [31:55<14:59,  1.96s/it]

Error extracting text from http://thebogotapost.com/2016/02/14/peace-process-latest-8/: 404 Client Error: Not Found for url: https://thebogotapost.com/2016/02/14/peace-process-latest-8/


Processing URLs:  54%|█████▍    | 543/1000 [31:55<11:33,  1.52s/it]

Error extracting text from https://www.theranos.com/news/posts/custom/theranos-facts: HTTPSConnectionPool(host='www.theranos.com', port=443): Max retries exceeded with url: /news/posts/custom/theranos-facts (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x3039892e0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  55%|█████▍    | 549/1000 [32:12<12:49,  1.71s/it]

Error extracting text from https://www.wsj.com/articles/feds-harker-supports-start-of-bond-buying-pullback-later-this-year-11625162443: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-harker-supports-start-of-bond-buying-pullback-later-this-year-11625162443
URL filtered: http://www.bloomberg.com/news/articles/2016-02-09/greece-bonds-punished-as-eu-politics-is-back-on-investor-radars


Processing URLs:  56%|█████▌    | 558/1000 [32:25<12:03,  1.64s/it]

Error extracting text from https://www.nytimes.com/2017/08/29/magazine/the-new-front-in-the-gerrymandering-wars-democracy-vs-math.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/29/magazine/the-new-front-in-the-gerrymandering-wars-democracy-vs-math.html


Processing URLs:  56%|█████▌    | 559/1000 [32:26<10:25,  1.42s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=22285&amp;Kw1=Polisario+Front&amp;Kw2=Morocco&amp;Kw3=: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=22285&amp;Kw1=Polisario+Front&amp;Kw2=Morocco&amp;Kw3=


Processing URLs:  56%|█████▌    | 562/1000 [32:29<07:44,  1.06s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/ia/iowa_republican_presidential_caucus-3194.html#: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/ia/iowa_republican_presidential_caucus-3194.html


Processing URLs:  57%|█████▋    | 566/1000 [32:35<09:14,  1.28s/it]

Error extracting text from http://www.wsj.com/articles/venezuelas-pdvsa-bonds-rally-after-new-swap-proposal-1475008769: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelas-pdvsa-bonds-rally-after-new-swap-proposal-1475008769


Processing URLs:  57%|█████▋    | 570/1000 [32:41<10:21,  1.45s/it]

Error extracting text from http://www.ibtimes.com/russia-turkey-relations-cyberattack-turkish-hackers-hits-kremlin-communications-2247655: 403 Client Error: Forbidden for url: https://www.ibtimes.com/russia-turkey-relations-cyberattack-turkish-hackers-hits-kremlin-communications-2247655


Processing URLs:  57%|█████▋    | 571/1000 [33:41<2:16:04, 19.03s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-10-04/the-latest-un-official-decries-calamity-in-syrias-aleppo: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  57%|█████▋    | 574/1000 [33:45<52:30,  7.39s/it]

Error extracting text from http://cs.cl/: 404 Client Error: Not Found for url: http://cs.cl/
URL filtered: https://twitter.com/relevantorgans/status/785591179379257344


Processing URLs:  58%|█████▊    | 576/1000 [33:47<31:00,  4.39s/it]

URL filtered: https://www.youtube.com/watch?v=iQpuYUhMiYY


Processing URLs:  58%|█████▊    | 579/1000 [33:49<17:21,  2.47s/it]

Error extracting text from http://in.reuters.com/article/us-nigeria-polio-idINKCN10M249: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  58%|█████▊    | 585/1000 [33:55<07:59,  1.15s/it]

Error extracting text from http://www.laprensasa.com/309_america-in-english/3346834_china-malaysia-begin-joint-military-drills-in-strait-of-malacca.html: 404 Client Error: Not Found for url: http://www.laprensasa.com/309_america-in-english/3346834_china-malaysia-begin-joint-military-drills-in-strait-of-malacca.html


Processing URLs:  59%|█████▉    | 588/1000 [33:59<07:15,  1.06s/it]

Error extracting text from http://www.pakistantoday.com.pk/2016/04/14/comment/panama-leaks-cyber-war-fare/: 403 Client Error: Forbidden for url: http://www.pakistantoday.com.pk/2016/04/14/comment/panama-leaks-cyber-war-fare/


Processing URLs:  59%|█████▉    | 590/1000 [34:03<11:13,  1.64s/it]

Error extracting text from https://www.janssen.com/johnson-johnson-announces-submission-application-us-fda-emergency-use-authorization-its: 403 Client Error: Forbidden for url: https://www.janssen.com/johnson-johnson-announces-submission-application-us-fda-emergency-use-authorization-its


Processing URLs:  59%|█████▉    | 591/1000 [34:05<12:21,  1.81s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-27/pound-falls-as-may-said-to-prepare-for-new-scottish-referendum


Processing URLs:  60%|██████    | 603/1000 [34:32<16:53,  2.55s/it]

Error extracting text from http://www.pollingreport.com/cong_rep.htm: 406 Client Error: Not Acceptable for url: http://www.pollingreport.com/cong_rep.htm


Processing URLs:  60%|██████    | 604/1000 [34:33<13:49,  2.09s/it]

Error extracting text from http://www.asahi.com/ajw/articles/AJ201704180038.html: 404 Client Error: Not Found for url: https://www.asahi.com/ajw/articles/AJ201704180038.html


Processing URLs:  60%|██████    | 605/1000 [34:34<11:08,  1.69s/it]

Error extracting text from https://www.presstv.com/Detail/2021/03/19/647621/Zarif-Biden-JCPOA: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2021/03/19/647621/Zarif-Biden-JCPOA


Processing URLs:  61%|██████    | 607/1000 [34:35<07:09,  1.09s/it]

Error extracting text from https://www.reuters.com/article/saudi-budget-int-idUSKBN28P2MJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/saudi-budget-int-idUSKBN28P2MJ
Error extracting text from http://ir.tesla.com/releasedetail.cfm?ReleaseID=978031: 403 Client Error: Forbidden for url: http://ir.tesla.com/releasedetail.cfm?ReleaseID=978031


Processing URLs:  61%|██████    | 608/1000 [34:37<07:52,  1.20s/it]

Error extracting text from https://edmontonjournal.com/news/local-news/qa-with-university-of-alberta-professor-whose-lab-uncovered-how-an-oral-drug-attacks-the-covid-19-virus: 403 Client Error: Forbidden for url: https://edmontonjournal.com/news/local-news/qa-with-university-of-alberta-professor-whose-lab-uncovered-how-an-oral-drug-attacks-the-covid-19-virus


Processing URLs:  61%|██████    | 610/1000 [34:39<07:54,  1.22s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5338263/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5338263/


Processing URLs:  61%|██████    | 611/1000 [34:41<09:41,  1.50s/it]

Error extracting text from https://lucidmotors.com/car/reserve: 404 Client Error: Not Found for url: https://lucidmotors.com/car/reserve


Processing URLs:  61%|██████    | 612/1000 [34:42<08:32,  1.32s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-07-21/in-trump-convention-shadow-ohio-s-portman-faces-trade-backlash
URL filtered: http://www.bloomberg.com/news/articles/2016-10-07/u-s-confirms-russia-behind-hacking-attacks-to-disrupt-elections


Processing URLs:  62%|██████▏   | 615/1000 [34:43<04:27,  1.44it/s]

Error extracting text from http://www.economiccalendar.com/2017/03/20/iran-looks-to-swiftly-boost-crude-oil-export-to-eu-in-coming-months/: 404 Client Error: Not Found for url: http://www.economiccalendar.com/2017/03/20/iran-looks-to-swiftly-boost-crude-oil-export-to-eu-in-coming-months/


Processing URLs:  62%|██████▏   | 619/1000 [34:49<08:46,  1.38s/it]

Error extracting text from https://camargopharma.com/resources/blog/emergency-use-authorizations-what-is-an-eua-and-does-your-product-qualify/: 403 Client Error: Forbidden for url: https://premierconsulting.com/resources/blog/emergency-use-authorizations-what-is-an-eua-and-does-your-product-qualify/


Processing URLs:  62%|██████▏   | 621/1000 [34:51<07:16,  1.15s/it]

URL filtered: https://www.youtube.com/watch?v=3oEA6zK_8u8#t=44s


Processing URLs:  62%|██████▏   | 624/1000 [34:53<05:12,  1.21it/s]

Error extracting text from http://www.rand.org/: 403 Client Error: Forbidden for url: https://www.rand.org/
URL filtered: https://www.bloomberg.com/news/articles/2017-05-09/top-oil-trader-warns-shaky-demand-risks-scuppering-opec-mission


Processing URLs:  63%|██████▎   | 629/1000 [35:01<10:42,  1.73s/it]

Error extracting text from http://www.38north.org/2017/12/nampo120117/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  63%|██████▎   | 630/1000 [35:02<09:55,  1.61s/it]

Error extracting text from http://wpo.st/oWAl0: 503 Server Error: Service Unavailable: Back-end server is at capacity for url: http://wpo.st/oWAl0


Processing URLs:  64%|██████▎   | 636/1000 [35:10<07:11,  1.19s/it]

Error extracting text from https://www.google.ca/amp/mobile.reuters.com/article/amp/idUSKCN1201X5?client=ms-android-rogers-ca: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKCN1201X5
Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-phillips-idUSKBN15I2H1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-phillips-idUSKBN15I2H1


Processing URLs:  64%|██████▍   | 642/1000 [35:17<07:28,  1.25s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-26/saudi-aramco-sees-oil-demand-steady-as-supply-growth-slows
URL filtered: https://www.youtube.com/watch?v=cD_AejDc1fA


Processing URLs:  65%|██████▍   | 646/1000 [35:18<04:10,  1.41it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/jul/29/iran-threatens-close-strait-hormuz-again/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jul/29/iran-threatens-close-strait-hormuz-again/


Processing URLs:  65%|██████▍   | 648/1000 [35:21<05:50,  1.01it/s]

Error extracting text from https://af.reuters.com/article/topNews/idAFKBN1FS1B9-OZATP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  65%|██████▌   | 650/1000 [35:22<04:34,  1.28it/s]

Error extracting text from http://www.sciencemag.org/news/2016/05/us-advisers-sign-plan-reviewing-risky-virus-studies: 403 Client Error: Forbidden for url: https://www.science.org/news/2016/05/us-advisers-sign-plan-reviewing-risky-virus-studies
Error extracting text from http://www.nytimes.com/2015/06/23/world/asia/taliban-attack-afghan-parliament-and-seize-a-2nd-district-in-north.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/06/23/world/asia/taliban-attack-afghan-parliament-and-seize-a-2nd-district-in-north.html


Processing URLs:  65%|██████▌   | 653/1000 [35:28<08:29,  1.47s/it]

Error extracting text from http://www.un.org/en/ga/sessions/emergency.shtml: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/sessions/emergency.shtml


Processing URLs:  66%|██████▌   | 656/1000 [35:36<13:10,  2.30s/it]

URL filtered: http://www.al-monitor.com/pulse/originals/2016/07/iran-saudi-arabia-twitter-social-media-psychological-war.html


Processing URLs:  66%|██████▌   | 660/1000 [35:45<12:39,  2.23s/it]

Error extracting text from https://www.thenation.com/article/did-moscow-get-help-from-the-trump-campaign-in-its-social-media-trolling/: 404 Client Error: Not Found for url: https://www.thenation.com/article/did-moscow-get-help-from-the-trump-campaign-in-its-social-media-trolling/


Processing URLs:  66%|██████▌   | 662/1000 [35:45<07:36,  1.35s/it]

Error extracting text from http://www.el-nacional.com/economia/Venezuela-riesgos-inversiones-China_0_767923241.html: 403 Client Error: Forbidden for url: https://www.elnacional.com/economia/Venezuela-riesgos-inversiones-China_0_767923241.html


Processing URLs:  66%|██████▋   | 663/1000 [35:49<10:48,  1.92s/it]

Error extracting text from http://www.hellenicshippingnews.com/iran-sees-oil-output-rising-to-4-million-bpd-by-year-end/: 404 Client Error: Not Found for url: https://www.hellenicshippingnews.com/iran-sees-oil-output-rising-to-4-million-bpd-by-year-end/


Processing URLs:  66%|██████▋   | 665/1000 [35:53<09:43,  1.74s/it]

Error extracting text from http://www.reuters.com/article/2015/10/15/us-usa-fed-dudley-idUSKCN0S91WU20151015: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/15/us-usa-fed-dudley-idUSKCN0S91WU20151015


Processing URLs:  67%|██████▋   | 666/1000 [35:55<10:04,  1.81s/it]

Error extracting text from http://www.channelnewsasia.com/news/singapore/singapore-china-complete/1871162.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/singapore/singapore-china-complete/1871162.html


Processing URLs:  67%|██████▋   | 670/1000 [35:57<04:13,  1.30it/s]

Error extracting text from http://www.nytimes.com/2015/12/03/business/economy/janet-yellen-federal-reserve-interest-rates.html?emc=edit_th_20151203&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/business/economy/janet-yellen-federal-reserve-interest-rates.html?emc=edit_th_20151203&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from http://www.israeldefense.co.il/en/node/28401: 403 Client Error: Forbidden for url: http://www.israeldefense.co.il/en/node/28401
Error extracting text from https://ycharts.com/companies/AAPL/market_cap: 403 Client Error: Forbidden for url: https://ycharts.com/companies/AAPL/market_cap


Processing URLs:  67%|██████▋   | 673/1000 [36:00<04:10,  1.30it/s]

Error extracting text from http://www.thepoliticalinsider.com/breaking-fbi-is-ready-to-indict-hillary-rodham-clinton/: 403 Client Error: Forbidden for url: https://www.thepoliticalinsider.com/breaking-fbi-is-ready-to-indict-hillary-rodham-clinton/
Error extracting text from https://shape.nato.int/news-archive/2017/exercise-ample-strike-takes-place-for-4th-consecutive-year: 403 Client Error: Forbidden for url: https://shape.nato.int/news-archive/2017/exercise-ample-strike-takes-place-for-4th-consecutive-year


Processing URLs:  67%|██████▋   | 674/1000 [36:00<03:30,  1.55it/s]

Error extracting text from https://www.wsj.com/articles/fed-debated-timing-mechanics-of-stimulus-pullback-at-july-meeting-11629309648?mod=hp_lead_pos1: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-debated-timing-mechanics-of-stimulus-pullback-at-july-meeting-11629309648?mod=hp_lead_pos1


Processing URLs:  68%|██████▊   | 675/1000 [36:02<05:56,  1.10s/it]

Error extracting text from http://G.R.R.Martin: HTTPConnectionPool(host='g.r.r.martin', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023dc0b0>: Failed to resolve 'g.r.r.martin' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 680/1000 [36:07<05:43,  1.07s/it]

Error extracting text from http://www.scottaaronson.com/blog/: 406 Client Error: Not Acceptable for url: http://www.scottaaronson.com/blog/


Processing URLs:  68%|██████▊   | 681/1000 [36:08<05:45,  1.08s/it]

Error extracting text from https://www.dogpile.com/serp?qc=images&amp;q=michigan+stadium+full: 403 Client Error: Forbidden for url: https://www.dogpile.com/captcha?url=https%3A%2F%2Fwww.dogpile.com%2Fserp%3Fqc%3Dimages%26amp%3Bq%3Dmichigan%2Bstadium%2Bfull


Processing URLs:  68%|██████▊   | 683/1000 [36:10<04:40,  1.13it/s]

Error extracting text from http://fox6now.com/2016/04/12/ron-johnson-outraised-by-russ-feingold-emboldened-by-primary-results/: 403 Client Error: Forbidden for url: http://fox6now.com/2016/04/12/ron-johnson-outraised-by-russ-feingold-emboldened-by-primary-results/


Processing URLs:  68%|██████▊   | 684/1000 [36:20<18:55,  3.59s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-trade-idUSKBN168397: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-trade-idUSKBN168397


Processing URLs:  69%|██████▊   | 686/1000 [36:21<11:16,  2.15s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-08-28/bolsonaro-says-outcomes-at-elections-include-victory-or-death


Processing URLs:  69%|██████▉   | 689/1000 [36:32<17:38,  3.40s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-01/aramco-cuts-asia-oil-pricing-as-saudis-seen-losing-market-share


Processing URLs:  69%|██████▉   | 691/1000 [36:33<11:03,  2.15s/it]

Error extracting text from https://www.jnj.com/johnson-johnson-expands-phase-2a-clinical-trial-of-covid-19-vaccine-candidate-to-include-adolescents: 403 Client Error: Forbidden for url: https://www.jnj.com/johnson-johnson-expands-phase-2a-clinical-trial-of-covid-19-vaccine-candidate-to-include-adolescents


Processing URLs:  69%|██████▉   | 694/1000 [36:36<07:59,  1.57s/it]

Error extracting text from http://www.wsj.com/articles/peruvian-presidential-candidate-keiko-fujimori-extends-her-lead-in-poll-1463967678: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/peruvian-presidential-candidate-keiko-fujimori-extends-her-lead-in-poll-1463967678


Processing URLs:  70%|██████▉   | 698/1000 [36:51<15:22,  3.05s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-18/superforecasters-see-24-chance-of-brexit-as-economy-wins-out


Processing URLs:  70%|███████   | 702/1000 [36:55<08:20,  1.68s/it]

URL filtered: https://twitter.com/afp/status/670152360576987136


Processing URLs:  70%|███████   | 705/1000 [36:57<06:14,  1.27s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-01-25/trump-masters-the-art-of-the-non-endorsement-


Processing URLs:  71%|███████   | 708/1000 [37:01<06:29,  1.33s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/syrian-monitor-jets-bel/2238918.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/syrian-monitor-jets-bel/2238918.html


Processing URLs:  71%|███████▏  | 713/1000 [37:10<07:01,  1.47s/it]

Error extracting text from http://www.reuters.com/article/2015/11/24/us-iran-nuclear-idUSKBN0TD23R20151124: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/24/us-iran-nuclear-idUSKBN0TD23R20151124


Processing URLs:  71%|███████▏  | 714/1000 [37:13<08:31,  1.79s/it]

Error extracting text from http://www.eluniversal.com/noticias/daily-news/venezuelan-revenues-usd-147-million-january-february-2016_246629: 404 Client Error: Not Found for url: https://www.eluniversal.com/noticias/daily-news/venezuelan-revenues-usd-147-million-january-february-2016_246629


Processing URLs:  72%|███████▏  | 719/1000 [37:20<06:10,  1.32s/it]

Error extracting text from http://www.dispatchtimes.com/s-korea-says-it-agrees-with-china-to-seek-summit-with-japan/76125/: 404 Client Error: Not Found for url: http://www.dispatchtimes.com/s-korea-says-it-agrees-with-china-to-seek-summit-with-japan/76125/


Processing URLs:  72%|███████▏  | 721/1000 [37:22<05:39,  1.22s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/09/15/UN-235-000-migrants-ready-to-head-to-italy.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/09/15/UN-235-000-migrants-ready-to-head-to-italy.html


Processing URLs:  72%|███████▎  | 725/1000 [37:29<08:14,  1.80s/it]

Error extracting text from https://www.reuters.com/world/africa/nine-ethiopian-groups-form-anti-government-alliance-2021-11-05/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/nine-ethiopian-groups-form-anti-government-alliance-2021-11-05/


Processing URLs:  73%|███████▎  | 730/1000 [37:52<20:02,  4.46s/it]

Error extracting text from https://www.nytimes.com/2017/05/23/us/politics/drone-surveillance-policy.html?module=WatchingPortal&amp;region=c-column-middle-span-region&amp;pgType=Homepage&amp;action=click&amp;mediaId=thumb_square&amp;state=standard&amp;contentPlacement=12&amp;version=internal&amp;contentCollection=www.nytimes.com&amp;contentId=https%3A%2F%2Fwww.nytimes.com%2F2017%2F05%2F23%2Fus%2Fpolitics%2Fdrone-surveillance-policy.html&amp;eventName=Watching-article-click&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/23/us/politics/drone-surveillance-policy.html?module=WatchingPortal&amp;region=c-column-middle-span-region&amp;pgType=Homepage&amp;action=click&amp;mediaId=thumb_square&amp;state=standard&amp;contentPlacement=12&amp;version=internal&amp;contentCollection=www.nytimes.com&amp;contentId=https%3A%2F%2Fwww.nytimes.com%2F2017%2F05%2F23%2Fus%2Fpolitics%2Fdrone-surveillance-policy.html&amp;eventName=Watching-article-click&amp;_r=0


Processing URLs:  73%|███████▎  | 733/1000 [38:55<1:03:04, 14.17s/it]

Error extracting text from https://www.cmegroup.com/trading/interest-rates/countdown-to-fomc.html#: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)
Error extracting text from https://uk.news.yahoo.com/brazilian-president-dilma-rousseff-faces-090349027.html#l4zuEJp: 404 Client Error: Not Found for url: https://uk.news.yahoo.com/brazilian-president-dilma-rousseff-faces-090349027.html#l4zuEJp


Processing URLs:  74%|███████▎  | 735/1000 [39:00<35:58,  8.14s/it]  

Error extracting text from http://www.diariodopoder.com.br/noticia.php?i=42089395162: 403 Client Error: Forbidden for url: https://diariodopoder.com.br/noticia.php?i=42089395162
Error extracting text from http://www.reuters.com/article/us-peru-election-survey-idUSKCN0YO2PR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-survey-idUSKCN0YO2PR


Processing URLs:  74%|███████▎  | 736/1000 [39:02<27:49,  6.32s/it]

Error extracting text from https://bit.ly/3cOjrsu: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/monetary-policy-summary-and-minutes/2021/february-2021


Processing URLs:  74%|███████▍  | 738/1000 [39:04<15:39,  3.59s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-19/fed-still-concerned-by-low-inflation-as-rate-liftoff-approaches


Processing URLs:  74%|███████▍  | 741/1000 [39:05<07:33,  1.75s/it]

Error extracting text from https://www.france24.com/en/20190524-high-security-ebola-burials-spark-dismay-anger-dr-congo: 403 Client Error: Forbidden for url: https://www.france24.com/en/20190524-high-security-ebola-burials-spark-dismay-anger-dr-congo


Processing URLs:  74%|███████▍  | 744/1000 [39:10<06:37,  1.55s/it]

Error extracting text from https://www.nytimes.com/2020/09/25/us/politics/rbg-retirement-obama.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/09/25/us/politics/rbg-retirement-obama.html


Processing URLs:  75%|███████▌  | 751/1000 [39:19<05:32,  1.34s/it]

Error extracting text from http://asianjournal.com/news/ambassador-kim-ph-us-to-maintain-2017-joint-exercises/: 404 Client Error: Not Found for url: https://asianjournal.com/news/ambassador-kim-ph-us-to-maintain-2017-joint-exercises/


Processing URLs:  76%|███████▌  | 755/1000 [39:35<13:06,  3.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-23/erdogan-doesn-t-care-at-all-if-turkey-gets-downgraded-to-junk


Processing URLs:  76%|███████▌  | 757/1000 [39:35<07:31,  1.86s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/262710-poll-sanders-holds-double-digit-lead-in-new-hampshire: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/262710-poll-sanders-holds-double-digit-lead-in-new-hampshire/


Processing URLs:  76%|███████▌  | 760/1000 [39:39<06:01,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-vietnam-idUSKCN10K2NE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-vietnam-idUSKCN10K2NE


Processing URLs:  76%|███████▋  | 765/1000 [39:45<03:56,  1.00s/it]

Error extracting text from https://www.business-standard.com/article/international/biden-says-us-will-replenish-israel-s-iron-dome-air-defense-systems-121052100102_1.html: 403 Client Error: Forbidden for url: https://www.business-standard.com/article/international/biden-says-us-will-replenish-israel-s-iron-dome-air-defense-systems-121052100102_1.html


Processing URLs:  77%|███████▋  | 767/1000 [39:48<05:20,  1.37s/it]

Error extracting text from http://atimes.com/2015/09/north-korea-may-launch-long-range-missile-around-its-anniversary-seoul/: 404 Client Error: Not Found for url: https://atimes.com/2015/09/north-korea-may-launch-long-range-missile-around-its-anniversary-seoul/


Processing URLs:  77%|███████▋  | 771/1000 [40:03<10:55,  2.86s/it]

Error extracting text from https://www.google.ca/amp/www.iraqinews.com/iraq-war/isis-military-wali-killed-mosul/amp/?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/isis-military-wali-killed-mosul/amp/


Processing URLs:  77%|███████▋  | 772/1000 [40:04<08:16,  2.18s/it]

Error extracting text from http://www.nato.int/nato_static_fl2014/assets/pdf/pdf_2016_01/20160128_SG_AnnualReport_2015_en.pdf: 403 Client Error: Forbidden for url: http://www.nato.int/nato_static_fl2014/assets/pdf/pdf_2016_01/20160128_SG_AnnualReport_2015_en.pdf


Processing URLs:  78%|███████▊  | 775/1000 [40:12<07:53,  2.11s/it]

Error extracting text from http://lgbc-scotland.gov.uk/: HTTPConnectionPool(host='lgbc-scotland.gov.uk', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe843ec0>: Failed to resolve 'lgbc-scotland.gov.uk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  78%|███████▊  | 776/1000 [40:21<15:36,  4.18s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/iran-asks-un-nuclear-chief-to-confirm-it-still-follows-deal/2017/10/29/5cbfab88-bcaa-11e7-9294-705f80164f6e_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/iran-asks-un-nuclear-chief-to-confirm-it-still-follows-deal/2017/10/29/5cbfab88-bcaa-11e7-9294-705f80164f6e_story.html


Processing URLs:  78%|███████▊  | 781/1000 [40:32<10:07,  2.78s/it]

Error extracting text from https://www.justsecurity.org/46531/cyber-sovereignty-north-korea-risk-inaction/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/46531/cyber-sovereignty-north-korea-risk-inaction/


Processing URLs:  79%|███████▊  | 786/1000 [40:39<04:49,  1.35s/it]

URL filtered: https://www.youtube.com/watch?v=FJlAN4jQUbs
Error extracting text from https://www.reuters.com/article/us-ethiopia-conflict/ethiopias-regional-tigray-forces-name-conditions-for-peace-with-government-idUSKBN2AJ2AX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ethiopia-conflict/ethiopias-regional-tigray-forces-name-conditions-for-peace-with-government-idUSKBN2AJ2AX


Processing URLs:  79%|███████▊  | 787/1000 [40:40<04:27,  1.26s/it]

Error extracting text from http://finance.yahoo.com/news/forget-model-3-tesla-needs-185936849.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/forget-model-3-tesla-needs-185936849.html


Processing URLs:  79%|███████▉  | 788/1000 [40:41<04:20,  1.23s/it]

Error extracting text from https://www.ipredict.co.nz/app.php?do=contract_detail&amp;contract=EURO.DEP.2016: 404 Client Error: Not Found for url: http://www.ipredict.co.nz/app.php?do=contract_detail&amp;contract=EURO.DEP.2016


Processing URLs:  79%|███████▉  | 789/1000 [40:42<04:17,  1.22s/it]

Error extracting text from http://www.huffingtonpost.co.za/2018/02/13/vote-of-no-confidence-this-is-how-the-numbers-stack-up_a_23360024/: 404 Client Error: Not Found for url: https://www.huffingtonpost.co.za/2018/02/13/vote-of-no-confidence-this-is-how-the-numbers-stack-up_a_23360024/


Processing URLs:  79%|███████▉  | 790/1000 [40:45<06:05,  1.74s/it]

Error extracting text from http://m.rasmussenreports.com/public_cont/election_2016/trump_41_clinton_39: 404 Client Error: Not Found for url: https://www.rasmussenreports.com:443/public_cont/election_2016/trump_41_clinton_39


Processing URLs:  79%|███████▉  | 794/1000 [41:14<16:27,  4.79s/it]

Error extracting text from http://www.powerlineblog.com/archives/2016/01/the-fall-of-ramadi-and-the-outlook-in-iraq.php: 403 Client Error: Forbidden for url: https://www.powerlineblog.com/archives/2016/01/the-fall-of-ramadi-and-the-outlook-in-iraq.php


Processing URLs:  80%|███████▉  | 795/1000 [41:14<12:21,  3.62s/it]

Error extracting text from http://www.dailypress.com/news/politics/dp-nws-ga-era-shenanigans-20150203-story.html: 404 Client Error: Not Found for url: https://www.dailypress.com/news/politics/dp-nws-ga-era-shenanigans-20150203-story.html


Processing URLs:  80%|███████▉  | 797/1000 [41:15<06:36,  1.95s/it]

Error extracting text from http://www.nytimes.com/2016/01/21/world/middleeast/syria-peace-talks-john-kerry-sergey-lavrov.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/21/world/middleeast/syria-peace-talks-john-kerry-sergey-lavrov.html


Processing URLs:  80%|████████  | 804/1000 [42:32<1:05:47, 20.14s/it]

Error extracting text from http://www.spa.gov.sa/viewstory.php?lang=en&amp;newsid=1700594: HTTPSConnectionPool(host='oportal.spa.gov.sa', port=443): Max retries exceeded with url: /?lang=en&amp;newsid=1700594 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ffc6e690>, 'Connection to oportal.spa.gov.sa timed out. (connect timeout=60)'))


Processing URLs:  81%|████████  | 807/1000 [42:34<23:08,  7.19s/it]  

Error extracting text from http://www.nytimes.com/2016/01/11/us/politics/ted-cruz-rises-in-iowa-on-tide-of-evangelical-support.html?emc=edit_th_20160111&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/11/us/politics/ted-cruz-rises-in-iowa-on-tide-of-evangelical-support.html?emc=edit_th_20160111&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from http://www.latimes.com/world/la-fg-air-war-syria-20161028-story,amp.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-air-war-syria-20161028-story,amp.html


Processing URLs:  82%|████████▏ | 815/1000 [42:53<05:46,  1.87s/it]

Error extracting text from http://www.reuters.com/article/us-usa-markets-tesla-idUSKBN16624D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-markets-tesla-idUSKBN16624D


Processing URLs:  82%|████████▏ | 817/1000 [42:59<06:40,  2.19s/it]

Error extracting text from http://www.wsj.com/articles/u-s-moves-to-give-iran-limited-access-to-dollars-1459468597: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-moves-to-give-iran-limited-access-to-dollars-1459468597


Processing URLs:  83%|████████▎ | 828/1000 [43:17<03:56,  1.38s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/267363-trump-maintains-iowa-lead-new-poll: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/267363-trump-maintains-iowa-lead-new-poll/


Processing URLs:  83%|████████▎ | 830/1000 [43:32<10:44,  3.79s/it]

Error extracting text from http://english.aawsat.com/2016/07/article55354302/saudi-minister-energy-ipo-saudi-aramco-depends-oil-stock-market: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/07/article55354302/saudi-minister-energy-ipo-saudi-aramco-depends-oil-stock-market


Processing URLs:  83%|████████▎ | 832/1000 [43:35<07:03,  2.52s/it]

Error extracting text from http://quotes.wsj.com/fx/USDGBP/advanced-chart: 403 Client Error: Forbidden for url: https://quotes.wsj.com/fx/USDGBP/advanced-chart


Processing URLs:  83%|████████▎ | 834/1000 [43:37<04:37,  1.67s/it]

Error extracting text from https://www.freightwaves.com/news/driver-employment-market-may-be-improving-amid-historic-pay-increases: 403 Client Error: Forbidden for url: https://www.freightwaves.com/news/driver-employment-market-may-be-improving-amid-historic-pay-increases


Processing URLs:  84%|████████▎ | 837/1000 [43:43<04:31,  1.67s/it]

Error extracting text from https://www.reuters.com/article/us-trade-nafta-canada-trump/trump-could-use-nafta-withdrawal-letter-as-negotiating-leverage-idUSKBN1F000B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-nafta-canada-trump/trump-could-use-nafta-withdrawal-letter-as-negotiating-leverage-idUSKBN1F000B


Processing URLs:  84%|████████▍ | 841/1000 [43:49<03:54,  1.48s/it]

Error extracting text from https://www.hpcwire.com/2016/11/04/intel-cern-support-lhc-experiments/: 403 Client Error: Forbidden for url: https://www.hpcwire.com/2016/11/04/intel-cern-support-lhc-experiments/


Processing URLs:  84%|████████▍ | 843/1000 [43:52<03:52,  1.48s/it]

Error extracting text from https://legiscan.com/AR/legislation/2017: 403 Client Error: Forbidden for url: https://legiscan.com/AR/legislation/2017


Processing URLs:  84%|████████▍ | 845/1000 [43:54<03:32,  1.37s/it]

Error extracting text from http://www.mfa.gov.tr/no_-33_-30-january-2016_-press-release-regarding-the-violation-of-turkish-airspace-on-29-january-2016-by-a-rf-aircraft.en.mfa: HTTPSConnectionPool(host='www.mfa.gov.tr', port=443): Max retries exceeded with url: /no_-33_-30-january-2016_-press-release-regarding-the-violation-of-turkish-airspace-on-29-january-2016-by-a-rf-aircraft.en.mfa (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  85%|████████▍ | 846/1000 [43:55<02:59,  1.17s/it]

Error extracting text from http://forexreportdaily.com/2015/10/13/5712-house-oks-lifting-ban-on-crude-oil-exports/: 404 Client Error: Not Found for url: http://forexreportdaily.com/2015/10/13/5712-house-oks-lifting-ban-on-crude-oil-exports/


Processing URLs:  85%|████████▍ | 849/1000 [43:57<02:12,  1.14it/s]

Error extracting text from http://sports.yahoo.com/news/dhoni-lauds-india-crickets-turnaround-kings-051711707--spt.html;_ylt=AwrXgiI3wfJWkTIALoVNbK5_;_ylu=X3oDMTEyOWExdDM5BGNvbG8DZ3ExBHBvcwM0BHZ0aWQDQjAzMDNfMQRzZWMDc2M-: 404 Client Error: Not Found for url: https://sports.yahoo.com/news/dhoni-lauds-india-crickets-turnaround-kings-051711707--spt.html;_ylt=AwrXgiI3wfJWkTIALoVNbK5_;_ylu=X3oDMTEyOWExdDM5BGNvbG8DZ3ExBHBvcwM0BHZ0aWQDQjAzMDNfMQRzZWMDc2M-


Processing URLs:  85%|████████▌ | 852/1000 [44:02<03:49,  1.55s/it]

Error extracting text from http://www.merics.org/fileadmin/user_upload/Externe_Publikationen/Connectivity_Wars_ECFR_2016.pdf: 404 Client Error: Not Found for url: https://merics.org/fileadmin/user_upload/Externe_Publikationen/Connectivity_Wars_ECFR_2016.pdf


Processing URLs:  85%|████████▌ | 854/1000 [44:17<09:52,  4.06s/it]

Error extracting text from http://www.fredericksburg.com/news/news-wire/who-s-next-a-look-at-south-korean-presidential-contenders/article_e98e4ef3-d58d-582b-bb8c-a278396d70af.html: 404 Client Error: Not Found for url: https://fredericksburg.com/news/news-wire/who-s-next-a-look-at-south-korean-presidential-contenders/article_e98e4ef3-d58d-582b-bb8c-a278396d70af.html


Processing URLs:  86%|████████▌ | 858/1000 [44:23<04:27,  1.89s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1AC1S9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1AC1S9


Processing URLs:  86%|████████▌ | 862/1000 [44:28<03:16,  1.42s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O0K4OM6S972D01-3GKDKJN03C9R3BQU63T1FRRKKB


Processing URLs:  86%|████████▋ | 864/1000 [44:28<01:51,  1.22it/s]

Error extracting text from https://www.nytimes.com/2017/03/29/us/politics/what-cold-war-intrigue-can-tell-us-about-the-trump-russia-inquiry.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/29/us/politics/what-cold-war-intrigue-can-tell-us-about-the-trump-russia-inquiry.html?_r=0


Processing URLs:  87%|████████▋ | 866/1000 [44:30<01:45,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN11Z1Y9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN11Z1Y9


Processing URLs:  87%|████████▋ | 869/1000 [44:38<04:17,  1.97s/it]

Error extracting text from http://www.reuters.com/article/us-china-usa-southchinasea-exclusive-idUSKBN161029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-usa-southchinasea-exclusive-idUSKBN161029
Error extracting text from https://www.reuters.com/business/energy/nord-stream-2-ceo-says-construction-work-be-finished-august-2021-07-12/#:~:text=FRANKFURT%2C%20July%2012%20(Reuters),chief%20executive%20officer%20Matthias%20Warnig: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/nord-stream-2-ceo-says-construction-work-be-finished-august-2021-07-12/#:~:text=FRANKFURT%2C%20July%2012%20(Reuters),chief%20executive%20officer%20Matthias%20Warnig


Processing URLs:  87%|████████▋ | 872/1000 [45:40<33:15, 15.59s/it]

Error extracting text from http://www.usnews.com/opinion/articles/2015/10/13/reauthorize-the-export-import-bank-to-save-americas-jobs: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  88%|████████▊ | 875/1000 [45:48<16:21,  7.85s/it]

Error extracting text from http://www.iranpolitik.com/2012/05/07/iran-election-watch/iran-election-watch-2012-main-principalist-groups-emerge-weak-majority/: 404 Client Error: Not Found for url: http://www.iranpolitik.com/2012/05/07/iran-election-watch/iran-election-watch-2012-main-principalist-groups-emerge-weak-majority/


Processing URLs:  88%|████████▊ | 878/1000 [45:54<08:22,  4.12s/it]

Error extracting text from https://markgregoryeconomics.ey.com/2016/02/09/reasons-for-strong-mergers-acquisitions-growth/: HTTPSConnectionPool(host='markgregoryeconomics.ey.com', port=443): Max retries exceeded with url: /2016/02/09/reasons-for-strong-mergers-acquisitions-growth/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303887b60>: Failed to resolve 'markgregoryeconomics.ey.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  88%|████████▊ | 880/1000 [45:55<04:53,  2.44s/it]

URL filtered: https://twitter.com/JZarif/status/1367526457254309888


Processing URLs:  88%|████████▊ | 882/1000 [45:56<03:23,  1.72s/it]

Error extracting text from https://pbs.twimg.com/media/E9RsQdwVEAACi-z?format=jpg&amp;name=medium: 404 Client Error: Not Found for url: https://pbs.twimg.com/media/E9RsQdwVEAACi-z?format=jpg&amp;name=medium


Processing URLs:  89%|████████▉ | 889/1000 [46:19<04:30,  2.44s/it]

Error extracting text from http://www.ibtimes.com.au/china-building-south-china-sea-islands-nation-likely-declare-adiz-spurring-military-preparations: 403 Client Error: Forbidden for url: https://www.ibtimes.com.au/china-building-south-china-sea-islands-nation-likely-declare-adiz-spurring-military-preparations


Processing URLs:  89%|████████▉ | 891/1000 [46:24<04:58,  2.74s/it]

Error extracting text from http://www.timera-energy.com/content/uploads/2015/03/Timera-LNG-glut-asset-value-170308.pdf: 404 Client Error: Not Found for url: https://timera-energy.com/content/uploads/2015/03/Timera-LNG-glut-asset-value-170308.pdf


Processing URLs:  89%|████████▉ | 892/1000 [46:26<04:05,  2.28s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-syria-safezones-idUSKBN1592O8?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-syria-safezones-idUSKBN1592O8?il=0


Processing URLs:  89%|████████▉ | 893/1000 [46:26<02:58,  1.67s/it]

Error extracting text from http://www.reuters.com/article/us-ukraine-crisis-poroshenko-idUSKCN11C0PV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-crisis-poroshenko-idUSKCN11C0PV


Processing URLs:  90%|████████▉ | 895/1000 [46:28<02:10,  1.24s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/senate-races/290792-koch-network-launches-2m-ad-blitz-in-pennsylvania-senate-race: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/senate-races/290792-koch-network-launches-2m-ad-blitz-in-pennsylvania-senate-race/


Processing URLs:  90%|████████▉ | 896/1000 [46:28<01:41,  1.03it/s]

Error extracting text from http://allafrica.com/stories/201605061219.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201605061219.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x300c90e00>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  90%|█████████ | 901/1000 [46:38<02:44,  1.67s/it]

Error extracting text from http://allafrica.com/stories/201608030143.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201608030143.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x300c926c0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  90%|█████████ | 905/1000 [46:43<02:05,  1.32s/it]

Error extracting text from http://www.wsj.com/articles/fed-holds-rates-near-zero-but-signals-possible-hike-at-its-next-meeting-1446055373?tesla=y: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-holds-rates-near-zero-but-signals-possible-hike-at-its-next-meeting-1446055373?tesla=y


Processing URLs:  91%|█████████ | 906/1000 [47:43<29:42, 18.97s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-07-21/number-of-private-military-contractors-in-afghanistan-drops-precipitously-as-biden-pushes-withdrawal-plan: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  91%|█████████ | 907/1000 [48:43<48:30, 31.30s/it]

Error extracting text from http://www.usnews.com/news/articles/2015/06/29/ghost-fleet-depicts-war-between-china-us: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  91%|█████████ | 909/1000 [48:45<23:51, 15.73s/it]

Error extracting text from http://www.wsj.com/articles/russia-seeks-investigation-of-lithuanian-red-army-deserter-1410191086: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-seeks-investigation-of-lithuanian-red-army-deserter-1410191086


Processing URLs:  91%|█████████ | 911/1000 [48:46<11:52,  8.01s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-new-hampshire-ayotte-vs-hassan: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-new-hampshire-ayotte-vs-hassan
Error extracting text from https://www.wsj.com/articles/asian-shares-broadly-weaker-japan-stocks-fall-for-fourth-session-1488942309: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/asian-shares-broadly-weaker-japan-stocks-fall-for-fourth-session-1488942309


Processing URLs:  92%|█████████▏| 918/1000 [49:05<04:56,  3.61s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/gorsuch-willing-to-limit-environmental-groups-in-land-cases/articleshow/57479387.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/gorsuch-willing-to-limit-environmental-groups-in-land-cases/articleshow/57479387.cms


Processing URLs:  92%|█████████▏| 920/1000 [49:08<03:04,  2.31s/it]

Error extracting text from https://www.reuters.com/article/us-iran-oil-exports/iran-crude-exports-hit-five-year-high-near-pre-sanctions-levels-source-idUSKCN11M0XL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-exports/iran-crude-exports-hit-five-year-high-near-pre-sanctions-levels-source-idUSKCN11M0XL


Processing URLs:  92%|█████████▏| 922/1000 [49:08<01:35,  1.23s/it]

Error extracting text from http://www.bt.com.bn/frontpage-news-national/2015/06/03/russia-brunei-plan-1st-naval-drill-next-year: HTTPConnectionPool(host='www.bt.com.bn', port=80): Max retries exceeded with url: /frontpage-news-national/2015/06/03/russia-brunei-plan-1st-naval-drill-next-year (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feeb6840>: Failed to resolve 'www.bt.com.bn' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.wsj.com/articles/no-political-deal-likely-in-syria-despite-shaky-cease-fire-1457006507: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/no-political-deal-likely-in-syria-despite-shaky-cease-fire-1457006507


Processing URLs:  93%|█████████▎| 926/1000 [49:17<02:12,  1.79s/it]

Error extracting text from https://bit.ly/3lPFd1E: 404 Client Error: Not Found for url: https://www.spectator.co.uk/article/sturgeon-fights-on-%C2%ADbut-at-what-cost


Processing URLs:  93%|█████████▎| 928/1000 [49:19<01:51,  1.54s/it]

Error extracting text from http://www.ibtimes.co.uk/malaysia-1mdb-scandal-foreign-investors-take-fright-walls-close-pm-najib-razak-1542282: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/malaysia-1mdb-scandal-foreign-investors-take-fright-walls-close-pm-najib-razak-1542282


Processing URLs:  93%|█████████▎| 929/1000 [49:21<02:06,  1.78s/it]

Error extracting text from http://www.newsfultoncounty.com/world/news/0629084-update-2-refugee-boat-sinks-off-turkeys-aegean-coast-25-dead: 403 Client Error: Forbidden for url: https://www.newsfultoncounty.com/world/news/0629084-update-2-refugee-boat-sinks-off-turkeys-aegean-coast-25-dead


Processing URLs:  93%|█████████▎| 932/1000 [49:28<02:13,  1.96s/it]

Error extracting text from http://www.ibtimes.co.uk/germany-warns-mps-not-travel-turkey-after-armenian-genocide-resolution-1564996: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/germany-warns-mps-not-travel-turkey-after-armenian-genocide-resolution-1564996


Processing URLs:  93%|█████████▎| 933/1000 [49:32<03:03,  2.73s/it]

URL filtered: https://twitter.com/thekarami/status/720576584617443328/photo/1


Processing URLs:  94%|█████████▎| 935/1000 [49:34<02:07,  1.97s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges/news/rcep-countries-conclude-auckland-round-eye-next-steps: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges/news/rcep-countries-conclude-auckland-round-eye-next-steps


Processing URLs:  94%|█████████▍| 940/1000 [49:39<00:59,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-exemptions-exclusive-idUSKCN1173LA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-exemptions-exclusive-idUSKCN1173LA
URL filtered: https://twitter.com/aliostad/status/1497519061554630658


Processing URLs:  94%|█████████▍| 945/1000 [49:49<01:41,  1.84s/it]

Error extracting text from https://reut.rs/2T1jslz: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/us-no-longer-sees-taiwan-problem-china-ties-official-says-2021-06-24/


Processing URLs:  95%|█████████▍| 948/1000 [49:53<01:14,  1.43s/it]

Error extracting text from http://caracaschronicles.com/2015/10/27/presenting-the-caracas-chronicles-legislative-elections-forecasting-app-now-in-color/: 403 Client Error: Forbidden for url: http://caracaschronicles.com/2015/10/27/presenting-the-caracas-chronicles-legislative-elections-forecasting-app-now-in-color/


Processing URLs:  95%|█████████▍| 949/1000 [49:54<01:07,  1.33s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/trumps-stance-iran-emboldens-hard-liners-iran-46510147: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/trumps-stance-iran-emboldens-hard-liners-iran-46510147


Processing URLs:  95%|█████████▌| 950/1000 [49:54<00:56,  1.13s/it]

Error extracting text from http://www.weeklystandard.com/too-complicated/article/2006775#: 404 Client Error: Not Found for url: http://www.weeklystandard.com/too-complicated/article/2006775


Processing URLs:  95%|█████████▌| 951/1000 [49:56<01:04,  1.31s/it]

Error extracting text from http://www.atimes.com/article/china-establish-court-obor-disputes/: 404 Client Error: Not Found for url: https://atimes.com/article/china-establish-court-obor-disputes/


Processing URLs:  96%|█████████▌| 956/1000 [50:10<01:37,  2.21s/it]

Error extracting text from http://www.wsj.com/articles/battle-lines-form-on-impeachment-of-brazilian-president-dilma-rousseff-1449259960: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/battle-lines-form-on-impeachment-of-brazilian-president-dilma-rousseff-1449259960


Processing URLs:  96%|█████████▌| 959/1000 [50:11<00:43,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/03/16/world/asia/rex-tillerson-asia-trump-us-japan.html?rref=collection%2Fsectioncollection%2Fasia: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/16/world/asia/rex-tillerson-asia-trump-us-japan.html?rref=collection%2Fsectioncollection%2Fasia


Processing URLs:  96%|█████████▋| 963/1000 [51:19<12:00, 19.47s/it]

Error extracting text from https://www.teamusa.org/Media/News/USOPC/092421-Audio-USOPC-Leadership-Press-Briefing: HTTPSConnectionPool(host='www.teamusa.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  96%|█████████▋| 964/1000 [51:24<09:01, 15.05s/it]

Error extracting text from http://www.portfolio-adviser.com/analysis/1025653/pa-analysis-extra-qe-japan-verdict: 404 Client Error: Not Found for url: https://portfolio-adviser.com/analysis/1025653/pa-analysis-extra-qe-japan-verdict


Processing URLs:  97%|█████████▋| 969/1000 [51:29<01:40,  3.25s/it]

Error extracting text from http://www.wsj.com/articles/u-n-court-rules-it-can-arbitrate-south-china-sea-dispute-1446175002: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-n-court-rules-it-can-arbitrate-south-china-sea-dispute-1446175002
URL filtered: https://www.youtube.com/watch?v=oZzpZVUfWBc


Processing URLs:  97%|█████████▋| 972/1000 [51:31<00:45,  1.62s/it]

Error extracting text from http://www.nytimes.com/2016/07/20/business/dealbook/monsanto-rejects-bayers-revised-takeover-bid.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/20/business/dealbook/monsanto-rejects-bayers-revised-takeover-bid.html


Processing URLs:  97%|█████████▋| 973/1000 [51:31<00:36,  1.36s/it]

Error extracting text from http://thehill.com/homenews/administration/320943-report-kushner-ivanka-pushed-to-strike-climate-deal-criticism-from: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/320943-report-kushner-ivanka-pushed-to-strike-climate-deal-criticism-from/


Processing URLs:  98%|█████████▊| 976/1000 [51:36<00:33,  1.41s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-17/fed-leaves-interest-rates-unchanged-at-zero-0-25-target-range


Processing URLs:  98%|█████████▊| 984/1000 [51:45<00:18,  1.16s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/opinions_143400.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/opinions_143400.htm?selectedLocale=en


Processing URLs:  99%|█████████▊| 986/1000 [51:52<00:33,  2.36s/it]

URL filtered: http://www.bloombergview.com/articles/2016-02-28/china-s-welcome-action-against-north-korea


Processing URLs:  99%|█████████▉| 988/1000 [51:53<00:18,  1.56s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/colombia-peace-sides-signing-accord-43500076: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/colombia-peace-sides-signing-accord-43500076


Processing URLs:  99%|█████████▉| 990/1000 [52:06<00:41,  4.19s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/time-runs-out-for-venezuela-to-elect-new-president/2017/01/09/6fdfdc28-d67e-11e6-a0e6-d502d6751bc8_story.html?utm_term=.a4d328b77dec: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/time-runs-out-for-venezuela-to-elect-new-president/2017/01/09/6fdfdc28-d67e-11e6-a0e6-d502d6751bc8_story.html?utm_term=.a4d328b77dec


Processing URLs:  99%|█████████▉| 992/1000 [52:09<00:23,  2.97s/it]

Error extracting text from http://www.hardnewsmedia.com/2016/07/south-china-sea-backyard-bully: 404 Client Error: Not Found for url: https://hardnewsmedia.com/2016/07/south-china-sea-backyard-bully


Processing URLs: 100%|█████████▉| 997/1000 [52:17<00:05,  1.73s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/military-boys-set-timer-device-on-nawaz-sharif-government/articleshow/54876629.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/military-boys-set-timer-device-on-nawaz-sharif-government/articleshow/54876629.cms


Processing URLs: 100%|█████████▉| 999/1000 [52:18<00:01,  1.20s/it]

Error extracting text from http://www.tradingeconomics.com/commodity/brent-crude-oil: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/commodity/brent-crude-oil


Processing URLs: 100%|██████████| 1000/1000 [52:23<00:00,  3.14s/it]


Error extracting text from https://www.iatanews.com/2021/03/17/the-space-launch-system-nasas-last-rocket/: 404 Client Error: Not Found for url: https://iatanews.com/2021/03/17/the-space-launch-system-nasas-last-rocket/


Processing URLs:   0%|          | 3/1000 [00:04<18:00,  1.08s/it]

Error extracting text from http://news.yahoo.com/rouhani-enters-iran-election-row-over-barred-candidates-100337039.html: 404 Client Error: Not Found for url: http://news.yahoo.com/rouhani-enters-iran-election-row-over-barred-candidates-100337039.html


Processing URLs:   1%|          | 6/1000 [00:14<47:12,  2.85s/it]  

Error extracting text from http://www.amazon.com/The-Court-World-American-Realities/dp/1101946199: 500 Server Error:  for url: https://www.amazon.com/The-Court-World-American-Realities/dp/1101946199


Processing URLs:   1%|          | 12/1000 [00:29<35:17,  2.14s/it]  

Error extracting text from http://www.wsj.com/article_email/donald-trump-forges-new-blue-collar-coalition-among-republicans-1449272326-lMyQjAxMTA1MjAyNTMwMjUyWj: 403 Client Error: Forbidden for url: https://www.wsj.com/article_email/donald-trump-forges-new-blue-collar-coalition-among-republicans-1449272326-lMyQjAxMTA1MjAyNTMwMjUyWj


Processing URLs:   2%|▏         | 17/1000 [01:40<1:58:21,  7.22s/it]

Error extracting text from http://thebulletin.org/status-us-nuclear-weapons-turkey: 404 Client Error: Not Found for url: https://thebulletin.org/status-us-nuclear-weapons-turkey/


Processing URLs:   2%|▏         | 20/1000 [01:43<51:57,  3.18s/it]  

Error extracting text from https://pastebin.com/G86BiXPU: 404 Client Error: Not Found for url: https://pastebin.com/G86BiXPU


Processing URLs:   2%|▏         | 21/1000 [01:44<40:52,  2.51s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57675#.WdQ112hSyM8: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57675#.WdQ112hSyM8


Processing URLs:   2%|▏         | 22/1000 [01:50<57:48,  3.55s/it]

Error extracting text from http://famagusta-gazette.com/south-korea-expects-north-to-conduct-nuclear-test-next-year-p31553-69.htm: 404 Client Error: Not Found for url: https://www.famagusta-gazette.com/south-korea-expects-north-to-conduct-nuclear-test-next-year-p31553-69.htm


Processing URLs:   2%|▏         | 23/1000 [01:51<41:48,  2.57s/it]

Error extracting text from http://csat.au.af.mil/2025/volume3/vol3ch15.pdf: HTTPConnectionPool(host='csat.au.af.mil', port=80): Max retries exceeded with url: /2025/volume3/vol3ch15.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303200440>: Failed to resolve 'csat.au.af.mil' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   2%|▏         | 24/1000 [01:52<37:25,  2.30s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-03/dakota-access-pipeline-seen-operational-in-second-quarter


Processing URLs:   3%|▎         | 26/1000 [01:53<21:12,  1.31s/it]

Error extracting text from https://www.yahoo.com/news/us-strikes-kill-fallujahs-commander-dozens-more-fighters-182854138.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/us-strikes-kill-fallujahs-commander-dozens-more-fighters-182854138.html


Processing URLs:   3%|▎         | 30/1000 [01:57<17:58,  1.11s/it]

Error extracting text from http://www.reuters.com/article/new-york-times-results-idUSL3N15J4WG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/new-york-times-results-idUSL3N15J4WG


Processing URLs:   3%|▎         | 32/1000 [01:59<19:55,  1.24s/it]

Error extracting text from http://www.orlandosentinel.com/news/space/os-space-florida-spacex-20161019-story.html: 404 Client Error: Not Found for url: https://www.orlandosentinel.com/news/space/os-space-florida-spacex-20161019-story.html


Processing URLs:   3%|▎         | 34/1000 [02:01<15:39,  1.03it/s]

Error extracting text from http://www.nytimes.com/2015/11/16/world/middleeast/tensions-in-iran-after-nuclear-deal-grow-in-hostility.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/16/world/middleeast/tensions-in-iran-after-nuclear-deal-grow-in-hostility.html?_r=0


Processing URLs:   4%|▎         | 35/1000 [02:01<12:04,  1.33it/s]

Error extracting text from http://english.alarabiya.net/en/webtv/reports/2016/07/21/Saudi-FM-Adel-Al-Jubeir-responds-to-Iran-criticism-in-Brussels-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/webtv/reports/2016/07/21/Saudi-FM-Adel-Al-Jubeir-responds-to-Iran-criticism-in-Brussels-.html


Processing URLs:   4%|▍         | 38/1000 [02:15<1:05:45,  4.10s/it]

URL filtered: https://twitter.com/lnaguilar/status/689862372694085633/photo/1


Processing URLs:   4%|▍         | 44/1000 [03:21<4:51:32, 18.30s/it]

Error extracting text from http://www.scotlandvotes.com/news/poll-of-polls-snp-set-to-win-another-majority: HTTPConnectionPool(host='www.scotlandvotes.com', port=80): Max retries exceeded with url: /news/poll-of-polls-snp-set-to-win-another-majority (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ffc6d220>, 'Connection to www.scotlandvotes.com timed out. (connect timeout=60)'))


Processing URLs:   5%|▍         | 49/1000 [03:26<1:06:12,  4.18s/it]

Error extracting text from https://osesgy.unmissions.org/press-statement-un-special-envoy-yemen-hans-grundberg-two-month-truce: HTTPSConnectionPool(host='osesgy.unmissions.org', port=443): Max retries exceeded with url: /press-statement-un-special-envoy-yemen-hans-grundberg-two-month-truce (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: https://www.youtube.com/watch?v=YXg-NVB56pk
Error extracting text from http://www.washingtontimes.com/news/2015/sep/22/donald-trump-leading-ben-carson-carly-fiorina-iowa/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/sep/22/donald-trump-leading-ben-carson-carly-fiorina-iowa/
URL filtered: http://www.bloomberg.com/news/articles/2015-12-04/payrolls-in-u-s-increased-more-than-forecast-in-november


Processing URLs:   5%|▌         | 51/1000 [03:29<46:36,  2.95s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-12-11/fed-hikes-seen-starting-with-yield-curve-flattest-in-generation


Processing URLs:   5%|▌         | 54/1000 [03:31<30:15,  1.92s/it]

Error extracting text from https://www.fiercebiotech.com/biotech/merck-has-better-luck-second-covid-drug-attempt-as-it-sees-a-positive-early-molnupiravir: 403 Client Error: Forbidden for url: https://www.fiercebiotech.com/biotech/merck-has-better-luck-second-covid-drug-attempt-as-it-sees-a-positive-early-molnupiravir


Processing URLs:   6%|▌         | 55/1000 [03:32<27:31,  1.75s/it]

Error extracting text from http://www.ibtimes.co.uk/british-sas-capture-3-isis-chiefs-daring-lightning-raids-net-closes-around-mosul-1558793: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/british-sas-capture-3-isis-chiefs-daring-lightning-raids-net-closes-around-mosul-1558793


Processing URLs:   6%|▌         | 56/1000 [03:43<1:01:09,  3.89s/it]

URL filtered: http://www.al-monitor.com/pulse/originals/2017/04/us-turkey-agreement-syria-assad-tillerson-ankara.html?utm_source=dlvr.it&amp;utm_medium=twitter


Processing URLs:   6%|▌         | 58/1000 [03:43<36:39,  2.34s/it]  

Error extracting text from http://www.wsj.com/video/how-drone-delivery-is-saving-lives-in-rural-rwanda/665024AF-058F-4CB5-A1E1-C93BC67BAE3E.html?mod=trending_now_video_4: 403 Client Error: Forbidden for url: https://www.wsj.com/video/how-drone-delivery-is-saving-lives-in-rural-rwanda/665024AF-058F-4CB5-A1E1-C93BC67BAE3E.html?mod=trending_now_video_4


Processing URLs:   6%|▌         | 59/1000 [03:44<33:55,  2.16s/it]

Error extracting text from https://www.stripes.com/news/massive-russian-border-drill-has-us-nato-s-attention-1.484417#.WaNZdyiGPIU: 404 Client Error: Not Found for url: https://www.stripes.com/news/massive-russian-border-drill-has-us-nato-s-attention-1.484417#.WaNZdyiGPIU


Processing URLs:   6%|▌         | 60/1000 [03:45<27:41,  1.77s/it]

Error extracting text from https://www.un.org/press/en/2015/ga11732.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2015/ga11732.doc.htm


Processing URLs:   6%|▋         | 64/1000 [03:50<19:45,  1.27s/it]

Error extracting text from http://www.nytimes.com/2016/09/11/technology/no-driver-bring-it-on-how-pittsburgh-became-ubers-testing-ground.html?emc=edit_th_20160911&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/11/technology/no-driver-bring-it-on-how-pittsburgh-became-ubers-testing-ground.html?emc=edit_th_20160911&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:   7%|▋         | 67/1000 [03:53<14:07,  1.10it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/361263-dems-say-congress-should-send-400m-to-states-for-election-cyber-upgrades: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/361263-dems-say-congress-should-send-400m-to-states-for-election-cyber-upgrades/
Error extracting text from http://www.nytimes.com/2016/03/23/world/middleeast/bashar-al-assad-syria-russia-west.html?emc=edit_th_20160326&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/23/world/middleeast/bashar-al-assad-syria-russia-west.html?emc=edit_th_20160326&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=1


Processing URLs:   7%|▋         | 69/1000 [03:57<23:03,  1.49s/it]

URL filtered: https://oversightboard.com/news/226612455899839-oversight-board-upholds-former-president-trump-s-suspension-finds-facebook-failed-to-impose-proper-penalty/


Processing URLs:   7%|▋         | 72/1000 [03:59<13:37,  1.13it/s]

Error extracting text from https://www.france24.com/en/live-news/20210908-blinken-warns-us-getting-closer-to-giving-up-on-iran-nuclear-deal: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210908-blinken-warns-us-getting-closer-to-giving-up-on-iran-nuclear-deal


Processing URLs:   8%|▊         | 75/1000 [04:03<21:22,  1.39s/it]

Error extracting text from http://www.iec.ch/smartgrid/standards/: 404 Client Error: Not Found for url: https://www.iec.ch/smartgrid/standards/


Processing URLs:   8%|▊         | 81/1000 [04:12<22:35,  1.47s/it]

URL filtered: http://www.bloombergview.com/articles/2016-04-04/time-is-running-out-again-for-greece


Processing URLs:   9%|▊         | 86/1000 [04:16<13:48,  1.10it/s]

Error extracting text from http://www.behavioraleconomics.com/BEGuide2016.pdf: 403 Client Error: Forbidden for url: http://www.behavioraleconomics.com/BEGuide2016.pdf
Error extracting text from http://www.reuters.com/article/us-southchinasea-ruling-philippines-idUSKCN10M0KL?mod=related&amp;channelName=south-china-sea: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-ruling-philippines-idUSKCN10M0KL?mod=related&amp;channelName=south-china-sea


Processing URLs:   9%|▉         | 89/1000 [04:19<12:19,  1.23it/s]

Error extracting text from http://www.reuters.com/article/us-usa-selfdriving-idUSKCN11Q001?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-selfdriving-idUSKCN11Q001?il=0


Processing URLs:   9%|▉         | 91/1000 [04:21<12:28,  1.21it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdoneylopes.com.br/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdoneylopes.com.br/&amp;prev=search


Processing URLs:   9%|▉         | 92/1000 [04:22<11:55,  1.27it/s]

Error extracting text from http://truckyeah.jalopnik.com/new-ev-company-claims-to-have-2-3-billion-in-pre-order-1781909567: 404 Client Error: Not Found for url: https://truckyeah.jalopnik.com/new-ev-company-claims-to-have-2-3-billion-in-pre-order-1781909567
URL filtered: https://www.youtube.com/watch?v=CpbXWguQtRo
Error extracting text from http://www.reuters.com/article/us-russia-us-relations-idUSKBN1670VT?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-us-relations-idUSKBN1670VT?il=0


Processing URLs:  10%|█         | 103/1000 [04:33<15:07,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-brexit-idUSKBN1A10YE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-brexit-idUSKBN1A10YE


Processing URLs:  11%|█         | 106/1000 [04:38<21:37,  1.45s/it]

URL filtered: https://twitter.com/cbsmornings/status/1451899857069817859


Processing URLs:  11%|█         | 109/1000 [04:40<13:06,  1.13it/s]

Error extracting text from http://www.huewire.com/technologies/us-files-civil-suit-against-volkswagen-for-environment-violations/58381/: 404 Client Error: Not Found for url: http://www.huewire.com/technologies/us-files-civil-suit-against-volkswagen-for-environment-violations/58381/


Processing URLs:  11%|█         | 112/1000 [04:43<12:28,  1.19it/s]

Error extracting text from https://www.reuters.com/business/energy/russian-gas-flows-germany-rise-despite-belarus-threat-2021-11-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/russian-gas-flows-germany-rise-despite-belarus-threat-2021-11-15/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-saudi-idUSKCN0VH1YX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-saudi-idUSKCN0VH1YX


Processing URLs:  12%|█▏        | 119/1000 [04:54<15:22,  1.05s/it]

Error extracting text from https://www.predictit.org/Market/1321/Which-party-will-win-the-US-Senate-race-in-Ohio-in-2016: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1321/Which-party-will-win-the-US-Senate-race-in-Ohio-in-2016
Error extracting text from http://www.reuters.com/article/us-turkey-security-kurds-idUSKBN12Y2XA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-kurds-idUSKBN12Y2XA


Processing URLs:  12%|█▎        | 125/1000 [05:04<18:05,  1.24s/it]

Error extracting text from https://www.zdf.de/nachrichten/politik/corona-warn-app-neue-funktion-100.html: 404 Client Error: Not Found for url: https://www.zdf.de/nachrichten/politik/corona-warn-app-neue-funktion-100.html
Error extracting text from http://www.nytimes.com/2015/12/13/us/politics/ted-cruz-surges-past-donald-trump-to-lead-in-iowa-poll.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/13/us/politics/ted-cruz-surges-past-donald-trump-to-lead-in-iowa-poll.html


Processing URLs:  13%|█▎        | 127/1000 [05:05<12:33,  1.16it/s]

Error extracting text from https://hillreporter.com/judge-orders-cruella-devos-to-testify-in-lawsuit-over-loan-forgiveness-101529: 404 Client Error: Not Found for url: https://hillreporter.com/judge-orders-cruella-devos-to-testify-in-lawsuit-over-loan-forgiveness-101529
Error extracting text from http://www.nytimes.com/2016/05/16/world/americas/dying-infants-and-no-medicine-inside-venezuelas-failing-hospitals.html?&amp;moduleDetail=section-news-0&amp;action=click&amp;contentCollection=Americas&amp;region=Footer&amp;module=MoreInSection&amp;version=WhatsNext&amp;contentID=WhatsNext&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/16/world/americas/dying-infants-and-no-medicine-inside-venezuelas-failing-hospitals.html?&amp;moduleDetail=section-news-0&amp;action=click&amp;contentCollection=Americas&amp;region=Footer&amp;module=MoreInSection&amp;version=WhatsNext&amp;contentID=WhatsNext&amp;pgtype=article


Processing URLs:  13%|█▎        | 132/1000 [05:10<12:55,  1.12it/s]

Error extracting text from http://nuclearfutures.princeton.edu/wp-content/uploads/2014/10/vonhippel-glaser-memo-oct2014.pdf: 504 Server Error: Target in maintenance. for url: https://nuclearfutures.princeton.edu/wp-content/uploads/2014/10/vonhippel-glaser-memo-oct2014.pdf


Processing URLs:  13%|█▎        | 133/1000 [05:10<10:35,  1.36it/s]

Error extracting text from https://www.bls.gov/charts/employment-situation/persons-not-in-the-labor-force-who-want-a-job.htm#: 403 Client Error: Forbidden for url: https://www.bls.gov/charts/employment-situation/persons-not-in-the-labor-force-who-want-a-job.htm


Processing URLs:  14%|█▎        | 135/1000 [05:11<10:18,  1.40it/s]

Error extracting text from https://apps.fcc.gov/edocs_public/attachmatch/DOC-347870A1.pdf: 403 Client Error: Forbidden for url: https://apps.fcc.gov/edocs_public/attachmatch/DOC-347870A1.pdf


Processing URLs:  14%|█▎        | 136/1000 [06:12<4:27:28, 18.57s/it]

Error extracting text from http://wasabi-now.com/article/84350b0cc54b71e005c8bc1638b93043: HTTPConnectionPool(host='wasabi-now.com', port=80): Max retries exceeded with url: /article/84350b0cc54b71e005c8bc1638b93043 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fece3260>, 'Connection to wasabi-now.com timed out. (connect timeout=60)'))


Processing URLs:  14%|█▍        | 138/1000 [06:22<2:42:06, 11.28s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-security-kurds-idUSKCN11H065: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-kurds-idUSKCN11H065
URL filtered: https://twitter.com/MeGovernment/status/740476386847166464


Processing URLs:  14%|█▍        | 145/1000 [06:43<46:20,  3.25s/it]  

Error extracting text from https://www.38north.org/2017/10/sinpo101117/: 403 Client Error: Forbidden for url: https://www.38north.org/2017/10/sinpo101117/


Processing URLs:  15%|█▌        | 151/1000 [06:48<14:46,  1.04s/it]

Error extracting text from http://www.governing.com/topics/politics/gov-north-carolina-southern-progressivism.html: 403 Client Error: Forbidden for url: https://www.governing.com/topics/politics/gov-north-carolina-southern-progressivism.html
Error extracting text from http://www.nytimes.com/2015/09/21/opinion/paul-krugman-the-rage-of-the-bankers.html?emc=edit_th_20150921&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/21/opinion/paul-krugman-the-rage-of-the-bankers.html?emc=edit_th_20150921&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  16%|█▌        | 157/1000 [07:00<29:54,  2.13s/it]

Error extracting text from http://www.theglobeandmail.com/news/news-video/video-russia-and-rebels-cast-doubt-over-syria-ceasefire/article31945635/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/news-video/video-russia-and-rebels-cast-doubt-over-syria-ceasefire/article31945635/
URL filtered: https://twitter.com/terryglavin/status/1431680318331834368


Processing URLs:  16%|█▌        | 160/1000 [08:03<3:47:03, 16.22s/it]

Error extracting text from http://aa.com.tr/en/middle-east/4-iran-linked-afghan-shia-militiamen-killed-in-syria/1064275: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  16%|█▋        | 164/1000 [08:05<1:23:37,  6.00s/it]

Error extracting text from http://www.tradingeconomics.com/turkey/rating: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/turkey/rating


Processing URLs:  17%|█▋        | 166/1000 [08:12<1:03:52,  4.60s/it]

Error extracting text from https://medium.com/incerto/the-most-intolerant-wins-the-dictatorship-of-the-small-minority-3f1f83ce4e15#.cq32okfgf: 403 Client Error: Forbidden for url: https://medium.com/incerto/the-most-intolerant-wins-the-dictatorship-of-the-small-minority-3f1f83ce4e15#.cq32okfgf
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/reinaldo/tag/impeachment-de-dilma/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/reinaldo/tag/impeachment-de-dilma/&amp;prev=search


Processing URLs:  17%|█▋        | 171/1000 [08:17<25:49,  1.87s/it]  

Error extracting text from https://simpleflying.com/emirates-triples-summer-passengers/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  17%|█▋        | 174/1000 [08:18<12:53,  1.07it/s]

Error extracting text from https://www.bls.gov/cpi/: 403 Client Error: Forbidden for url: https://www.bls.gov/cpi/
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-scarborough-exclu-idUSKCN0WK01B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-scarborough-exclu-idUSKCN0WK01B


Processing URLs:  18%|█▊        | 175/1000 [08:19<12:50,  1.07it/s]

Error extracting text from https://www.espn.com/mlb/story/_/id/33322364/mlb-delays-start-spring-training-march-5-cba-negotiations-resume-monday: 403 Client Error: Forbidden for url: https://www.espn.com/mlb/story/_/id/33322364/mlb-delays-start-spring-training-march-5-cba-negotiations-resume-monday


Processing URLs:  18%|█▊        | 177/1000 [08:21<12:29,  1.10it/s]

Error extracting text from https://www.nytimes.com/2017/01/30/us/politics/donald-trump-administration.html?mtrref=undefined&amp;gwh=0BD1B3BFC170B7E3D56F840F57088912&amp;gwt=pay&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/30/us/politics/donald-trump-administration.html?mtrref=undefined&amp;gwh=0BD1B3BFC170B7E3D56F840F57088912&amp;gwt=pay&amp;_r=0


Processing URLs:  18%|█▊        | 183/1000 [08:26<11:32,  1.18it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/11/11/Iran-has-stopped-dismantling-nuclear-centrifuges-senior-official.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/11/11/Iran-has-stopped-dismantling-nuclear-centrifuges-senior-official.html


Processing URLs:  18%|█▊        | 185/1000 [08:28<09:31,  1.43it/s]

Error extracting text from https://www.wsj.com/articles/breakaway-regions-in-ukraine-issue-call-up-orders-as-russia-tests-missiles-11645277438: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/breakaway-regions-in-ukraine-issue-call-up-orders-as-russia-tests-missiles-11645277438


Processing URLs:  19%|█▊        | 186/1000 [08:28<07:30,  1.81it/s]

Error extracting text from http://english.alarabiya.net/en/business/energy/2016/07/13/Saudi-energy-minister-Aramco-IPO-depends-on-oil-stock-market.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/energy/2016/07/13/Saudi-energy-minister-Aramco-IPO-depends-on-oil-stock-market.html


Processing URLs:  19%|█▉        | 190/1000 [08:46<45:18,  3.36s/it]

URL filtered: https://www.bloomberg.com/news/articles/2022-03-09/ukraine-open-to-neutrality-but-won-t-yield-territory-aide-says?srnd=premium


Processing URLs:  19%|█▉        | 193/1000 [08:47<23:15,  1.73s/it]

Error extracting text from https://www.numbersusa.com/news/obama-admin-petitions-supreme-court-rehear-amnesty-case: 404 Client Error: Not Found for url: https://www.numbersusa.com/news/obama-admin-petitions-supreme-court-rehear-amnesty-case


Processing URLs:  20%|█▉        | 195/1000 [08:49<17:11,  1.28s/it]

Error extracting text from http://www.channelnewsasia.com/news/business/malaysia-expected-to-save/2465756.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/business/malaysia-expected-to-save/2465756.html
Error extracting text from http://www.reuters.com/article/us-turkey-referendum-germany-gabriel-idUSKBN16P0EA?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-germany-gabriel-idUSKBN16P0EA?il=0


Processing URLs:  20%|█▉        | 196/1000 [08:51<18:04,  1.35s/it]

Error extracting text from https://bit.ly/3cEdkWl: 404 Client Error: Not Found for url: https://www.spectator.co.uk/article/sturgeon-is-making-salmond-s-mistake-in-her-fight-for-scottish-independence/amp


Processing URLs:  20%|██        | 204/1000 [09:07<24:48,  1.87s/it]

Error extracting text from https://www.health.govt.nz/our-work/diseases-and-conditions/covid-19-novel-coronavirus/covid-19-vaccines/covid-19-vaccine-strategy-planning-insights/covid-19-purchasing-vaccines: 403 Client Error: Forbidden for url: https://www.health.govt.nz/our-work/diseases-and-conditions/covid-19-novel-coronavirus/covid-19-vaccines/covid-19-vaccines-available-new-zealand/covid-19-purchasing-vaccines


Processing URLs:  20%|██        | 205/1000 [09:08<19:40,  1.48s/it]

URL filtered: https://www.youtube.com/watch?v=WSwbfwy56n8


Processing URLs:  21%|██        | 207/1000 [09:10<16:53,  1.28s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/nigerian-senate-refuses-30-billion-loan-plan-43222409: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/nigerian-senate-refuses-30-billion-loan-plan-43222409


Processing URLs:  21%|██        | 209/1000 [09:15<22:54,  1.74s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/359827-ceo-of-data-firm-that-aided-trump-reached-out-to-assange-in-june-2016: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/359827-ceo-of-data-firm-that-aided-trump-reached-out-to-assange-in-june-2016/


Processing URLs:  21%|██        | 210/1000 [09:16<19:00,  1.44s/it]

Error extracting text from https://blogs.intralinks.com/2017/03/q2-2017-ma-remain-strong-continental-europe-uk-stalls/: HTTPSConnectionPool(host='blogs.intralinks.com', port=443): Max retries exceeded with url: /2017/03/q2-2017-ma-remain-strong-continental-europe-uk-stalls/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'blogs.intralinks.com'. (_ssl.c:1000)")))


Processing URLs:  21%|██        | 212/1000 [09:16<12:06,  1.09it/s]

Error extracting text from http://www.nytimes.com/2015/10/09/business/dealbook/imf-economies-lima-china-turkey-brazil.html?ref=business: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/09/business/dealbook/imf-economies-lima-china-turkey-brazil.html?ref=business


Processing URLs:  21%|██▏       | 213/1000 [09:34<1:13:52,  5.63s/it]

Error extracting text from http://www.ew.com/article/2015/04/03/george-rr-martin-winds-date: 406 Client Error: Not Acceptable for url: https://www.ew.com/article/2015/04/03/george-rr-martin-winds-date


Processing URLs:  22%|██▏       | 216/1000 [09:41<46:01,  3.52s/it]  

Error extracting text from http://cherna.gora.me/news/washington-supports-montenegros-membership-in-nato/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/washington-supports-montenegros-membership-in-nato/


Processing URLs:  22%|██▏       | 219/1000 [10:08<1:11:45,  5.51s/it]

Error extracting text from https://www.nytimes.com/2017/04/06/world/asia/rodrigo-duterte-south-china-sea.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/06/world/asia/rodrigo-duterte-south-china-sea.html?_r=0


Processing URLs:  22%|██▏       | 220/1000 [10:08<53:46,  4.14s/it]  

URL filtered: https://www.bustle.com/p/how-many-times-has-trump-tweeted-as-president-twitter-is-his-best-friend-8011368


Processing URLs:  22%|██▏       | 222/1000 [11:09<3:31:05, 16.28s/it]

Error extracting text from http://en.kremlin.ru/acts/news/54640: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  22%|██▏       | 223/1000 [11:11<2:43:47, 12.65s/it]

Error extracting text from https://hypermind.com/hypermind/app.html?fwd=#welcome: 404 Client Error: Not Found for url: https://www.hypermind.com/hypermind/app.html?fwd=#welcome


Processing URLs:  23%|██▎       | 227/1000 [11:14<49:22,  3.83s/it]  

Error extracting text from http://english.alarabiya.net/en/business/energy/2016/04/06/Iran-expects-4-mbpd-oil-output-by-March-2017.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/energy/2016/04/06/Iran-expects-4-mbpd-oil-output-by-March-2017.html
Error extracting text from http://www.reuters.com/article/us-germany-politics/jamaica-or-bust-merkel-launching-crunch-german-coalition-talks-idUSKBN1CP14A?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/jamaica-or-bust-merkel-launching-crunch-german-coalition-talks-idUSKBN1CP14A?il=0


Processing URLs:  23%|██▎       | 232/1000 [11:20<18:54,  1.48s/it]

Error extracting text from https://www.leclairryan.com/files/Uploads/Documents/Environmental%20Enforcement%20Webinar%20Presentation%20final%2010%2024%2012.pdf: HTTPSConnectionPool(host='www.leclairryan.com', port=443): Max retries exceeded with url: /files/Uploads/Documents/Environmental%20Enforcement%20Webinar%20Presentation%20final%2010%2024%2012.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3032026c0>: Failed to resolve 'www.leclairryan.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  24%|██▍       | 239/1000 [11:33<18:39,  1.47s/it]

Error extracting text from http://www.reuters.com/article/us-nato-hungary-idUSKCN0VY119: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-hungary-idUSKCN0VY119


Processing URLs:  24%|██▍       | 241/1000 [11:36<19:00,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-afghanistan-airforce-exclus-idUSKCN1B22GY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-afghanistan-airforce-exclus-idUSKCN1B22GY


Processing URLs:  25%|██▍       | 249/1000 [12:46<3:52:19, 18.56s/it]

Error extracting text from http://www.thecountrycaller.com/44489-tesla-motors-inc-tsla-how-are-preparations-for-model-3-production-coming-along/: HTTPConnectionPool(host='www.thecountrycaller.com', port=80): Max retries exceeded with url: /44489-tesla-motors-inc-tsla-how-are-preparations-for-model-3-production-coming-along/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3021da450>, 'Connection to www.thecountrycaller.com timed out. (connect timeout=60)'))


Processing URLs:  25%|██▌       | 250/1000 [12:46<2:45:33, 13.25s/it]

Error extracting text from https://www.chathamhouse.org/2021/04/what-next-insurgency-cabo-delgado: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/2021/04/what-next-insurgency-cabo-delgado
URL filtered: https://www.youtube.com/watch?v=oXc4uspb8J0


Processing URLs:  26%|██▌       | 256/1000 [12:53<36:32,  2.95s/it]  

Error extracting text from https://www.geekwire.com/2020/virgin-galactic-unveils-supersonic-airplane-concept-boeing-rolls-royce-supporters/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2020/virgin-galactic-unveils-supersonic-airplane-concept-boeing-rolls-royce-supporters/


Processing URLs:  26%|██▌       | 259/1000 [12:57<25:34,  2.07s/it]

Error extracting text from https://www.spacex.com/reusability-key-making-human-life-multi-planetary: 404 Client Error: Not Found for url: https://www.spacex.com/reusability-key-making-human-life-multi-planetary


Processing URLs:  26%|██▌       | 260/1000 [12:58<22:50,  1.85s/it]

URL filtered: https://twitter.com/SOccultus/status/1376500211061825546


Processing URLs:  27%|██▋       | 267/1000 [13:08<15:39,  1.28s/it]

Error extracting text from https://www.reuters.com/article/us-uber-britain/unfit-uber-stripped-of-london-license-idUSKCN1BX151: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-uber-britain/unfit-uber-stripped-of-london-license-idUSKCN1BX151


Processing URLs:  27%|██▋       | 269/1000 [13:12<20:10,  1.66s/it]

Error extracting text from http://www.khl.com/magazines/international-construction/detail/item115584/May-opening-confirmed-for-Panama-Canal: 404 Client Error: Not Found for url: https://www.khl.com/magazines/international-construction/detail/item115584/May-opening-confirmed-for-Panama-Canal


Processing URLs:  27%|██▋       | 270/1000 [14:15<3:59:36, 19.69s/it]

Error extracting text from http://government.ru/docs/19758/: HTTPConnectionPool(host='government.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 271/1000 [14:15<2:50:01, 13.99s/it]

Error extracting text from http://www.cdm.me/english/polyakova-by-joining-nato-you-will-exit-from-the-gray-zone-between-the-eu-and-russia: 403 Client Error: Forbidden for url: https://www.cdm.me/english/polyakova-by-joining-nato-you-will-exit-from-the-gray-zone-between-the-eu-and-russia


Processing URLs:  28%|██▊       | 277/1000 [14:25<34:46,  2.89s/it]  

Error extracting text from http://www.nti.org/analysis/atomic-pulse/outpacing-cyber-hackers-preventing-catastrophic-cyberattacks-nuclear-facilities/: 403 Client Error: Forbidden for url: https://www.nti.org/analysis/atomic-pulse/outpacing-cyber-hackers-preventing-catastrophic-cyberattacks-nuclear-facilities/


Processing URLs:  28%|██▊       | 279/1000 [14:27<22:34,  1.88s/it]

Error extracting text from http://www.peruviantimes.com/05/late-left-wing-surge-in-peru-pre-election-polls/26295/: 406 Client Error: Not Acceptable for url: http://www.peruviantimes.com/05/late-left-wing-surge-in-peru-pre-election-polls/26295/


Processing URLs:  29%|██▊       | 286/1000 [14:42<23:25,  1.97s/it]

Error extracting text from https://reut.rs/3eOLXui: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/mozambiques-gas-ambitions-rest-distant-hope-peace-2021-07-22/
URL filtered: https://www.bloomberg.com/news/articles/2017-09-11/venezuela-bond-bull-says-sanctions-may-keep-maduro-paying-debt


Processing URLs:  29%|██▉       | 294/1000 [14:50<09:57,  1.18it/s]

Error extracting text from https://www.reuters.com/article/us-china-parliament-defence-idUSKBN2AX07Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-parliament-defence-idUSKBN2AX07Z
Error extracting text from http://bigstory.ap.org/article/albania-prosecutor-confrontation-us-ambassador: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/albania-prosecutor-confrontation-us-ambassador (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe55d1f0>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 295/1000 [14:52<14:25,  1.23s/it]

Error extracting text from https://ec.europa.eu/info/sites/info/files/brexit_files/com_831_1_en_act_part1_v2.pdf: 404 Client Error: Not Found for url: https://commission.europa.eu/sites/default/files/brexit_files/com_831_1_en_act_part1_v2.pdf


Processing URLs:  30%|██▉       | 299/1000 [14:57<13:02,  1.12s/it]

Error extracting text from http://www.chicagotribune.com/news/local/politics/ct-bruce-rauner-amazon-st-louis-met-0919-20170918-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/local/politics/ct-bruce-rauner-amazon-st-louis-met-0919-20170918-story.html


Processing URLs:  30%|███       | 301/1000 [14:58<10:48,  1.08it/s]

Error extracting text from http://www.energyintel.com/pages/login.aspx?fid=art&amp;DocId=908286&amp;ts=1: 403 Client Error: Forbidden for url: https://www.energyintel.com/pages/login.aspx?fid=art&amp;DocId=908286&amp;ts=1


Processing URLs:  30%|███       | 305/1000 [15:06<19:33,  1.69s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-19/trump-un-envoy-russia-s-election-interference-is-warfare


Processing URLs:  31%|███▏      | 314/1000 [15:15<11:07,  1.03it/s]

Error extracting text from https://www.wsj.com/articles/how-europe-tripped-in-covid-19-vaccine-race-11612293218: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/how-europe-tripped-in-covid-19-vaccine-race-11612293218


Processing URLs:  32%|███▏      | 316/1000 [15:16<09:32,  1.20it/s]

Error extracting text from http://www.readingeagle.com/berks-country/article/opinion-proactive-approach-stems-spread-of-avian-flu: 404 Client Error: Not Found for url: https://www.readingeagle.com/berks-country/article/opinion-proactive-approach-stems-spread-of-avian-flu
Error extracting text from http://blogs.wsj.com/briefly/2015/10/07/5-things-to-watch-in-the-fed-minutes/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/briefly/2015/10/07/5-things-to-watch-in-the-fed-minutes/


Processing URLs:  32%|███▏      | 324/1000 [15:29<11:07,  1.01it/s]

Error extracting text from https://translate.google.nl/translate?hl=nl&amp;sl=de&amp;tl=en&amp;u=http%3A%2F%2Fwww.tagesanzeiger.ch%2Fausland%2Fnaher-osten-und-afrika%2FNun-beginnt-der-Wettlauf-um-Auftraege-aus-dem-Iran%2Fstory%2F17824008: 400 Client Error: Bad Request for url: https://translate.google.nl/translate?hl=nl&amp;sl=de&amp;tl=en&amp;u=http%3A%2F%2Fwww.tagesanzeiger.ch%2Fausland%2Fnaher-osten-und-afrika%2FNun-beginnt-der-Wettlauf-um-Auftraege-aus-dem-Iran%2Fstory%2F17824008


Processing URLs:  33%|███▎      | 326/1000 [15:34<20:02,  1.78s/it]

Error extracting text from https://uk.reuters.com/article/uk-afghanistan-blast-trump/trump-rejects-peace-talks-with-taliban-in-departure-from-afghan-strategy-idUKKBN1FI2BR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  33%|███▎      | 328/1000 [15:36<16:08,  1.44s/it]

Error extracting text from https://m.tvxs.gr/mo/i/333816/f/news/kosmos/den-tha-erthei-stin-athina-o-poytin-gia-tin-parelasi-tis-25is-martioy.html: 403 Client Error: Forbidden for url: https://m.tvxs.gr/mo/i/333816/f/news/kosmos/den-tha-erthei-stin-athina-o-poytin-gia-tin-parelasi-tis-25is-martioy.html


Processing URLs:  33%|███▎      | 331/1000 [15:48<38:36,  3.46s/it]

Error extracting text from http://en.abna24.com/cultural/archive/2016/08/27/774756/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/cultural/archive/2016/08/27/774756/story.html


Processing URLs:  33%|███▎      | 334/1000 [15:55<29:35,  2.67s/it]

Error extracting text from http://www.avim.org.tr/bulten/en/116233: 404 Client Error: Not Found for url: http://www.avim.org.tr/bulten/en/116233


Processing URLs:  34%|███▍      | 341/1000 [16:12<25:43,  2.34s/it]

Error extracting text from http://www.stop-djihadisme.gouv.fr/#xtor=AD-300: 403 Client Error: Forbidden for url: http://www.stop-djihadisme.gouv.fr/#xtor=AD-300


Processing URLs:  35%|███▍      | 347/1000 [17:23<3:31:48, 19.46s/it]

Error extracting text from http://www.fas.org:8080/sgp/crs/mideast/IN10474.pdf: HTTPConnectionPool(host='www.fas.org', port=8080): Max retries exceeded with url: /sgp/crs/mideast/IN10474.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3029d5b20>, 'Connection to www.fas.org timed out. (connect timeout=60)'))


Processing URLs:  35%|███▌      | 353/1000 [17:30<33:37,  3.12s/it]  

Error extracting text from http://www.newsletter.co.uk/news/quitting-stormont-would-leave-us-in-a-worse-system-1-7049088#ixzz3qvXT2Mp6: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/quitting-stormont-would-leave-us-in-a-worse-system-1-7049088#ixzz3qvXT2Mp6


Processing URLs:  35%|███▌      | 354/1000 [17:31<27:23,  2.54s/it]

Error extracting text from http://www.payvand.com/news/15/dec/1134.html: 404 Client Error: Not Found for url: http://www.payvand.com/news/15/dec/1134.html


Processing URLs:  36%|███▌      | 356/1000 [17:31<14:26,  1.34s/it]

Error extracting text from http://www.wsj.com/articles/vietnam-seizes-chinese-ship-state-media-reports-1459659194: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/vietnam-seizes-chinese-ship-state-media-reports-1459659194
Error extracting text from http://www.basnews.com/index.php/en/news/iraq/300288: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/300288


Processing URLs:  36%|███▌      | 358/1000 [17:33<12:31,  1.17s/it]

Error extracting text from https://en.oxforddictionaries.com/definition/downgrade: 403 Client Error: Forbidden for url: https://languages.oup.com/


Processing URLs:  36%|███▌      | 359/1000 [17:34<11:31,  1.08s/it]

Error extracting text from https://www.newsweek.com/proud-boys-intended-kill-mike-pence-nancy-pelosi-fbi-witness-says-1562062: 403 Client Error: Forbidden for url: https://www.newsweek.com/proud-boys-intended-kill-mike-pence-nancy-pelosi-fbi-witness-says-1562062


Processing URLs:  36%|███▌      | 362/1000 [17:42<18:34,  1.75s/it]

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2140244: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2140244


Processing URLs:  37%|███▋      | 366/1000 [17:47<11:44,  1.11s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/10/11/Putin-Russia-does-not-want-to-get-involved-in-inter-religious-war-in-Syria-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/10/11/Putin-Russia-does-not-want-to-get-involved-in-inter-religious-war-in-Syria-.html


Processing URLs:  37%|███▋      | 367/1000 [18:03<59:07,  5.60s/it]

Error extracting text from https://www.almasdarnews.com/article/syrian-army-amasses-11000-soldiers-aleppo-offensive/: 522 Server Error:  for url: https://www.almasdarnews.com/article/syrian-army-amasses-11000-soldiers-aleppo-offensive/


Processing URLs:  37%|███▋      | 368/1000 [18:03<43:04,  4.09s/it]

Error extracting text from http://thehill.com/policy/defense/263021-obama-to-get-isis-war-update-at-pentagon: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/263021-obama-to-get-isis-war-update-at-pentagon/


Processing URLs:  37%|███▋      | 370/1000 [18:05<25:17,  2.41s/it]

Error extracting text from http://afghanistantimes.af/running-parliamentary-district-council-elections-impossible-this-year/: 403 Client Error: Forbidden for url: https://afghanistantimes.af/running-parliamentary-district-council-elections-impossible-this-year/


Processing URLs:  37%|███▋      | 371/1000 [18:06<19:52,  1.90s/it]

Error extracting text from http://aranews.net/2016/08/iraqi-government-warns-kurdish-peshmerga-not-enter-mosul-barzani-promises-supportive-role/: 404 Client Error: Not Found for url: http://aranews.net/2016/08/iraqi-government-warns-kurdish-peshmerga-not-enter-mosul-barzani-promises-supportive-role/


Processing URLs:  37%|███▋      | 372/1000 [18:07<16:49,  1.61s/it]

Error extracting text from http://killedbypolice.net/: HTTPConnectionPool(host='killedbypolice.net', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5eab0>: Failed to resolve 'killedbypolice.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 375/1000 [18:07<07:59,  1.30it/s]

Error extracting text from http://www.sfgate.com/business/article/Lucid-Motors-unveils-new-electric-car-to-10796760.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/business/article/Lucid-Motors-unveils-new-electric-car-to-10796760.php
Error extracting text from http://www.nytimes.com/2016/04/22/us/politics/with-uncertainty-at-top-of-ticket-republicans-back-off-in-some-states.html?emc=edit_th_20160422&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/22/us/politics/with-uncertainty-at-top-of-ticket-republicans-back-off-in-some-states.html?emc=edit_th_20160422&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  38%|███▊      | 378/1000 [18:09<06:20,  1.63it/s]

Error extracting text from http://allafrica.com/stories/201709120043.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201709120043.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3013076b0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.straitstimes.com/world/middle-east/turkeys-erdogan-expects-parliament-to-restore-capital-punishment: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  38%|███▊      | 384/1000 [18:15<09:33,  1.07it/s]

Error extracting text from http://seekingalpha.com/article/1328821-these-3-defense-contractors-should-benefit-from-north-korean-saber-rattling: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/1328821-these-3-defense-contractors-should-benefit-from-north-korean-saber-rattling


Processing URLs:  39%|███▉      | 390/1000 [18:29<17:54,  1.76s/it]

Error extracting text from http://www.al-monitor.com/pulse/contents/afp/2015/12/syria-conflict-opposition-saudi-assad.html: 404 Client Error: Not Found for url: https://www.al-monitor.com/contents/afp/2015/12/syria-conflict-opposition-saudi-assad.html


Processing URLs:  39%|███▉      | 394/1000 [18:34<13:57,  1.38s/it]

Error extracting text from http://www.realcleardefense.com/video/2016/08/29/is_the_iraqi_army_ready_to_liberate_mosul.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/video/2016/08/29/is_the_iraqi_army_ready_to_liberate_mosul.html


Processing URLs:  40%|███▉      | 396/1000 [18:36<11:03,  1.10s/it]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2015/05/28-syria-sanctions/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2015/05/28-syria-sanctions/


Processing URLs:  40%|███▉      | 397/1000 [18:37<10:31,  1.05s/it]

Error extracting text from https://reut.rs/3EWsTpH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/haitis-elections-postponed-after-electoral-council-dismissed-2021-09-28/


Processing URLs:  40%|████      | 400/1000 [18:43<15:52,  1.59s/it]

Error extracting text from https://www.reuters.com/world/europe/daughter-jailed-kremlin-critic-alexei-navalny-says-he-needs-doctor-2021-04-18/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/daughter-jailed-kremlin-critic-alexei-navalny-says-he-needs-doctor-2021-04-18/


Processing URLs:  40%|████      | 404/1000 [18:47<11:55,  1.20s/it]

URL filtered: https://www.youtube.com/watch?v=niJtpcgtUQs


Processing URLs:  41%|████      | 408/1000 [18:52<11:56,  1.21s/it]

Error extracting text from http://www.wsj.com/articles/china-russia-plan-naval-drills-in-south-china-sea-1469707620: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-russia-plan-naval-drills-in-south-china-sea-1469707620


Processing URLs:  41%|████      | 411/1000 [18:57<14:08,  1.44s/it]

Error extracting text from https://globalguessing.com/metaculus-mondays-vol10/#2022-olympic-boycott: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /metaculus-mondays-vol10/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  41%|████▏     | 413/1000 [18:59<11:30,  1.18s/it]

Error extracting text from https://www.dogpile.com/serp?qc=images&amp;q=royal+albert+hall+full: 403 Client Error: Forbidden for url: https://www.dogpile.com/captcha?url=https%3A%2F%2Fwww.dogpile.com%2Fserp%3Fqc%3Dimages%26amp%3Bq%3Droyal%2Balbert%2Bhall%2Bfull


Processing URLs:  42%|████▏     | 415/1000 [19:03<13:41,  1.40s/it]

Error extracting text from http://www.reuters.com/article/us-iran-opec-oil-idUSKBN1820I1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-opec-oil-idUSKBN1820I1
URL filtered: https://twitter.com/USAmbUN/status/1377388955860140034


Processing URLs:  42%|████▏     | 417/1000 [19:04<10:56,  1.13s/it]

Error extracting text from https://uk.reuters.com/article/uk-usa-russia-diplomacy-idUKKCN1BB2I5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  42%|████▏     | 421/1000 [19:19<37:41,  3.91s/it]

Error extracting text from https://www.washingtonpost.com/business/question-mark-for-nov-jobs-report-did-pay-growth-continue/2015/12/04/062a799e-9a45-11e5-aca6-1ae3be6f06d2_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/business/question-mark-for-nov-jobs-report-did-pay-growth-continue/2015/12/04/062a799e-9a45-11e5-aca6-1ae3be6f06d2_story.html


Processing URLs:  42%|████▎     | 425/1000 [19:29<20:32,  2.14s/it]

Error extracting text from http://www.nytimes.com/2015/10/10/us/politics/donald-trump-presidential-race.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/10/us/politics/donald-trump-presidential-race.html
Error extracting text from https://uk.news.yahoo.com/japan-factory-output-shrinks-again-024156979.html#TTnhxa5: 404 Client Error: Not Found for url: https://uk.news.yahoo.com/japan-factory-output-shrinks-again-024156979.html#TTnhxa5


Processing URLs:  43%|████▎     | 426/1000 [19:31<20:33,  2.15s/it]

Error extracting text from https://www.yardeni.com/pub/atatruck.pdf: 403 Client Error: Forbidden for url: https://yardeni.com/our-charts/


Processing URLs:  43%|████▎     | 428/1000 [19:36<21:47,  2.28s/it]

Error extracting text from http://www.institutionalinvestor.com/article/3589012/banking-and-capital-markets-emerging-markets/how-prince-mohammed-aims-to-wean-saudi-arabia-off-of-oil.html#.WATkw8Ka3mQ: 404 Client Error: Not Found for url: https://www.institutionalinvestor.com/article/3589012/banking-and-capital-markets-emerging-markets/how-prince-mohammed-aims-to-wean-saudi-arabia-off-of-oil.html#.WATkw8Ka3mQ


Processing URLs:  43%|████▎     | 431/1000 [19:41<16:12,  1.71s/it]

Error extracting text from https://www.nytimes.com/2017/06/19/world/middleeast/russia-syria.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/19/world/middleeast/russia-syria.html
Error extracting text from http://www.nytimes.com/2016/09/19/world/middleeast/syria-civil-war-bashar-al-assad-refugees-islamic-state.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/19/world/middleeast/syria-civil-war-bashar-al-assad-refugees-islamic-state.html


Processing URLs:  43%|████▎     | 433/1000 [19:42<09:42,  1.03s/it]

Error extracting text from http://www.imdb.com/title/tt0062153/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0062153/


Processing URLs:  44%|████▎     | 435/1000 [19:48<17:03,  1.81s/it]

Error extracting text from https://af.reuters.com/article/topNews/idAFKBN1AJ1LR-OZATP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  44%|████▎     | 436/1000 [19:48<12:55,  1.38s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2845.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2845.htm


Processing URLs:  44%|████▍     | 440/1000 [19:53<12:29,  1.34s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XF2C3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XF2C3


Processing URLs:  44%|████▍     | 444/1000 [20:01<15:41,  1.69s/it]

Error extracting text from http://e.cfr.org: HTTPConnectionPool(host='e.cfr.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5e180>: Failed to resolve 'e.cfr.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  45%|████▍     | 446/1000 [20:02<10:52,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-montenegro-election-idUSKCN12P2HY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-montenegro-election-idUSKCN12P2HY?il=0


Processing URLs:  45%|████▍     | 448/1000 [20:06<12:24,  1.35s/it]

Error extracting text from http://www.isro.gov.in/technology-development-programmes/reusable-launch-vehicle-technology-demonstration-program-rlv-td: 404 Client Error: Not Found for url: https://www.isro.gov.in/technology-development-programmes/reusable-launch-vehicle-technology-demonstration-program-rlv-td


Processing URLs:  45%|████▌     | 453/1000 [20:38<57:06,  6.26s/it]

Error extracting text from http://www.investopedia.com/terms/q/quarter.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/q/quarter.asp


Processing URLs:  46%|████▌     | 456/1000 [20:43<29:35,  3.26s/it]

Error extracting text from https://www.nasdaq.com/articles/blockchain-bites%3A-mastercard-bny-mellon-embrace-crypto-amazon-floats-digital-currency: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/blockchain-bites%3A-mastercard-bny-mellon-embrace-crypto-amazon-floats-digital-currency


Processing URLs:  46%|████▌     | 457/1000 [20:45<26:05,  2.88s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-may-19-2016: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-may-19-2016


Processing URLs:  46%|████▌     | 458/1000 [20:46<20:24,  2.26s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Putin-likely-to-visit-Japan-at-year-s-end-report: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Putin-likely-to-visit-Japan-at-year-s-end-report


Processing URLs:  46%|████▌     | 460/1000 [20:49<16:30,  1.83s/it]

Error extracting text from http://www.brecorder.com/world/north-america/279815-un-security-council-to-meet-on-syria-crisis-at-2000-gmt-diplomats.html: 404 Client Error: Not Found for url: https://www.brecorder.com/world/north-america/279815-un-security-council-to-meet-on-syria-crisis-at-2000-gmt-diplomats.html


Processing URLs:  47%|████▋     | 467/1000 [21:00<14:46,  1.66s/it]

URL filtered: https://www.youtube.com/watch?v=2dxQW4qh5Oc


Processing URLs:  47%|████▋     | 470/1000 [21:00<06:46,  1.30it/s]

Error extracting text from http://www.wsj.com/articles/time-inc-shuts-down-all-you-first-print-magazine-closure-under-ceo-joe-ripp-1445276923: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-shuts-down-all-you-first-print-magazine-closure-under-ceo-joe-ripp-1445276923
Error extracting text from http://www.wsj.com/articles/feeding-greeces-tax-addiction-is-starving-its-economy-1470336614: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feeding-greeces-tax-addiction-is-starving-its-economy-1470336614


Processing URLs:  47%|████▋     | 473/1000 [21:06<11:03,  1.26s/it]

Error extracting text from http://www.state.gov/j/inl/rls/nrcrpt/2016/vol1/253257.htm: 404 Client Error: Not Found for url: https://www.state.gov/j/inl/rls/nrcrpt/2016/vol1/253257.htm


Processing URLs:  48%|████▊     | 476/1000 [21:09<11:09,  1.28s/it]



Processing URLs:  48%|████▊     | 478/1000 [22:10<1:55:40, 13.30s/it]

Error extracting text from http://www.carlisle.army.mil/banner/article.cfm?id=54450: HTTPConnectionPool(host='www.carlisle.army.mil', port=80): Max retries exceeded with url: /banner/article.cfm?id=54450 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3013079e0>, 'Connection to www.carlisle.army.mil timed out. (connect timeout=60)'))
Error extracting text from https://www.hindustantimes.com/world-news/not-aware-china-on-reports-of-its-soldiers-detention-in-arunachal-pradesh-101633691888935-amp.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/not-aware-china-on-reports-of-its-soldiers-detention-in-arunachal-pradesh-101633691888935-amp.html


Processing URLs:  48%|████▊     | 481/1000 [22:13<44:36,  5.16s/it]  

Error extracting text from https://www.americanpress.com/news/informer/the-informer-billions-of-dollars-at-stake-with-olympics/article_df38ef38-d1ba-11eb-9b53-8b0d43a1eb4e.html: 404 Client Error: Not Found for url: https://www.americanpress.com/news/informer/the-informer-billions-of-dollars-at-stake-with-olympics/article_df38ef38-d1ba-11eb-9b53-8b0d43a1eb4e.html
Error extracting text from http://www.nato.int/cps/en/natohq/news_125370.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_125370.htm


Processing URLs:  48%|████▊     | 484/1000 [22:17<20:39,  2.40s/it]

Error extracting text from http://www.wsj.com/articles/house-votes-to-lift-oil-export-ban-1444411778: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-votes-to-lift-oil-export-ban-1444411778


Processing URLs:  49%|████▊     | 486/1000 [23:19<2:49:30, 19.79s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2020-07-13/elon-musk-s-tesla-won-t-ride-the-big-tech-bubble-forever


Processing URLs:  49%|████▉     | 488/1000 [23:21<1:34:00, 11.02s/it]

Error extracting text from http://thecipherbrief.com/article/caliphate-crime: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/caliphate-crime


Processing URLs:  49%|████▉     | 490/1000 [23:24<57:28,  6.76s/it]  

Error extracting text from https://www.trucknews.com/business-management/u-s-truck-tonnage-shrinks-in-april-but-outlook-is-strong/1003151179/: 403 Client Error: Forbidden for url: https://www.trucknews.com/business-management/u-s-truck-tonnage-shrinks-in-april-but-outlook-is-strong/1003151179/


Processing URLs:  49%|████▉     | 491/1000 [23:25<43:01,  5.07s/it]

Error extracting text from http://www.nzherald.co.nz/business/news/article.cfm?c_id=3&amp;objectid=11728702: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/business/news/article.cfm?c_id=3&amp;objectid=11728702


Processing URLs:  49%|████▉     | 493/1000 [23:27<26:24,  3.12s/it]

Error extracting text from https://www.rigzone.com/news/usa_eia_releases_new_oil_price_forecast-09-sep-2021-166398-article/: 403 Client Error: Forbidden for url: https://www.rigzone.com/news/usa_eia_releases_new_oil_price_forecast-09-sep-2021-166398-article/


Processing URLs:  50%|████▉     | 495/1000 [23:40<36:43,  4.36s/it]

Error extracting text from https://news.antiwar.com/2021/08/11/report-us-considering-possible-evacuation-kabul-embassy/: 403 Client Error: Forbidden for url: https://news.antiwar.com/2021/08/11/report-us-considering-possible-evacuation-kabul-embassy/


Processing URLs:  50%|████▉     | 499/1000 [23:50<23:21,  2.80s/it]

Error extracting text from http://www.reuters.com/article/2015/07/14/china-ipofreeze-idUSL4N0ZT4CP20150714: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/07/14/china-ipofreeze-idUSL4N0ZT4CP20150714


Processing URLs:  50%|█████     | 502/1000 [23:55<14:15,  1.72s/it]

Error extracting text from https://www.scotsman.com/health/coronavirus/scottish-voters-rewarding-nicola-sturgeons-handling-pandemic-covid-19-sends-votes-snp-3095664: 403 Client Error: Forbidden for url: https://www.scotsman.com/health/coronavirus/scottish-voters-rewarding-nicola-sturgeons-handling-pandemic-covid-19-sends-votes-snp-3095664


Processing URLs:  51%|█████     | 506/1000 [24:05<15:05,  1.83s/it]

Error extracting text from http://www.chem.info/news/2015/12/opec-likely-wont-move-boost-oil-price-amid-infighting: 500 Server Error: Internal Server Error for url: http://www.chem.info/news/2015/12/opec-likely-wont-move-boost-oil-price-amid-infighting


Processing URLs:  51%|█████     | 508/1000 [24:07<10:44,  1.31s/it]

Error extracting text from http://www.nytimes.com/reuters/2015/10/12/business/12reuters-opec-oil-kuwait.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2015/10/12/business/12reuters-opec-oil-kuwait.html


Processing URLs:  51%|█████     | 512/1000 [24:12<09:41,  1.19s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/polls/258081-poll-carson-opens-up-14-point-lead-over-trump-in-iowa: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/polls/258081-poll-carson-opens-up-14-point-lead-over-trump-in-iowa/
URL filtered: https://www.youtube.com/watch?v=tzM5Ar55KPE


Processing URLs:  52%|█████▏    | 515/1000 [24:14<07:13,  1.12it/s]

Error extracting text from https://chinapower.csis.org/maritime-forces-destabilizing-asia/: 403 Client Error: Forbidden for url: https://chinapower.csis.org/maritime-forces-destabilizing-asia/


Processing URLs:  52%|█████▏    | 516/1000 [24:14<06:28,  1.25it/s]

Error extracting text from http://thehill.com/policy/national-security/349858-mueller-seeks-to-interview-spicer-priebus-others-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/349858-mueller-seeks-to-interview-spicer-priebus-others-report/


Processing URLs:  52%|█████▏    | 517/1000 [24:15<05:15,  1.53it/s]

Error extracting text from https://www.nytimes.com/2017/07/14/world/asia/isis-afghanistan-leader-abu-sayed.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/14/world/asia/isis-afghanistan-leader-abu-sayed.html


Processing URLs:  52%|█████▏    | 518/1000 [24:16<05:49,  1.38it/s]

Error extracting text from https://www.reuters.com/article/us-venezuela-politics/venezuelan-opposition-pins-hopes-on-elections-as-protests-falter-idUSKCN1BG25K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics/venezuelan-opposition-pins-hopes-on-elections-as-protests-falter-idUSKCN1BG25K
URL filtered: http://www.bloomberg.com/news/articles/2015-09-01/iran-s-oil-market-ambitions-may-have-already-killed-opec-talks


Processing URLs:  52%|█████▏    | 524/1000 [24:17<02:51,  2.78it/s]

URL filtered: https://www.bloomberg.com/news/articles/2016-03-01/sugar-cane-fuel-wins-in-brazil-as-cheap-ethanol-beats-gasoline
URL filtered: https://twitter.com/hashtag/iranelection
Error extracting text from https://www.predictit.org/markets/detail/7164/Will-the-Senate-end-filibuster-on-any-bill-with-less-than-3-5-support-in-2021).: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/7164/Will-the-Senate-end-filibuster-on-any-bill-with-less-than-3-5-support-in-2021).


Processing URLs:  53%|█████▎    | 529/1000 [24:23<07:44,  1.01it/s]

Error extracting text from https://theconversation.com/two-governments-claim-to-run-myanmar-so-who-gets-the-countrys-seat-at-the-un-167885: 403 Client Error: Forbidden for url: https://theconversation.com/two-governments-claim-to-run-myanmar-so-who-gets-the-countrys-seat-at-the-un-167885


Processing URLs:  53%|█████▎    | 530/1000 [24:26<11:26,  1.46s/it]

Error extracting text from https://reut.rs/2KxNYPm: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  53%|█████▎    | 531/1000 [24:27<09:24,  1.20s/it]

Error extracting text from http://thehill.com/policy/healthcare/322864-right-revolts-on-obamacare-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/healthcare/322864-right-revolts-on-obamacare-bill/


Processing URLs:  54%|█████▎    | 537/1000 [24:45<12:26,  1.61s/it]

Error extracting text from https://www.yahoo.com/news/anti-fighters-syria-dig-prevent-surprises-194458914.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/anti-fighters-syria-dig-prevent-surprises-194458914.html


Processing URLs:  54%|█████▍    | 538/1000 [24:45<09:19,  1.21s/it]

Error extracting text from https://www.nytimes.com/2017/03/12/us/politics/trump-loosen-counterterrorism-rules.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/12/us/politics/trump-loosen-counterterrorism-rules.html
URL filtered: http://www.bloomberg.com/news/articles/2015-09-25/post-boehner-december-shutdown-more-likely-ex-im-s-chances-dim


Processing URLs:  54%|█████▍    | 543/1000 [24:52<08:21,  1.10s/it]

Error extracting text from https://www.wsj.com/articles/theresa-may-pours-cold-water-on-second-scottish-referendum-before-brexit-1489675704: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/theresa-may-pours-cold-water-on-second-scottish-referendum-before-brexit-1489675704
Error extracting text from https://www.hindustantimes.com/world-news/pla-modernises-xinjiang-s-military-units-in-reaction-to-india-china-lac-row-101621231048385.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/pla-modernises-xinjiang-s-military-units-in-reaction-to-india-china-lac-row-101621231048385.html


Processing URLs:  54%|█████▍    | 544/1000 [24:53<08:01,  1.06s/it]

Error extracting text from https://www.confidencial.com.ni/politica/opositores-lamentan-que-regimen-desperdicia-una-salida-a-la-crisis-sociopolitica/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/politica/opositores-lamentan-que-regimen-desperdicia-una-salida-a-la-crisis-sociopolitica/


Processing URLs:  55%|█████▍    | 549/1000 [25:03<13:03,  1.74s/it]

Error extracting text from http://www.mfa.gov.pl/en/news/mfa_statement_on_the_polish_government_s_response_to_commission_recommendation_of_27_07_2016: HTTPConnectionPool(host='www.mfa.gov.pl', port=80): Max retries exceeded with url: /en/news/mfa_statement_on_the_polish_government_s_response_to_commission_recommendation_of_27_07_2016 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb5d00>: Failed to resolve 'www.mfa.gov.pl' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  55%|█████▌    | 551/1000 [25:04<08:28,  1.13s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/0001850391/000119312521077493/d122075ds1.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/0001850391/000119312521077493/d122075ds1.htm


Processing URLs:  55%|█████▌    | 554/1000 [25:11<12:22,  1.67s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-britain-idUSKCN0VF09X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-britain-idUSKCN0VF09X


Processing URLs:  56%|█████▌    | 556/1000 [25:12<08:03,  1.09s/it]

Error extracting text from https://www.kickstarter.com/: 403 Client Error: Forbidden for url: https://www.kickstarter.com/


Processing URLs:  56%|█████▌    | 557/1000 [25:13<08:15,  1.12s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/turkey-extends-mandate-for-troops-in-iraq-syria/3172960.html?cx_tag=morestories4ucna&amp;cid=tg:recos:morestories4ucna:standard#cxrecs_s: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/turkey-extends-mandate-for-troops-in-iraq-syria/3172960.html?cx_tag=morestories4ucna&amp;cid=tg:recos:morestories4ucna:standard#cxrecs_s


Processing URLs:  56%|█████▌    | 558/1000 [25:14<07:00,  1.05it/s]

Error extracting text from http://thehill.com/blogs/pundits-blog/defense/322479-how-us-should-respond-to-russias-missile-treaty-violation: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/defense/322479-how-us-should-respond-to-russias-missile-treaty-violation/


Processing URLs:  56%|█████▌    | 559/1000 [25:15<07:37,  1.04s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20170224/1405205543.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20170224/1405205543.html


Processing URLs:  56%|█████▋    | 563/1000 [25:20<07:45,  1.07s/it]

URL filtered: https://www.instagram.com/syrianpresidency/


Processing URLs:  57%|█████▋    | 568/1000 [25:23<03:58,  1.81it/s]

Error extracting text from http://www.reuters.com/article/us-poland-russia-nato-idUSKCN0XG0UB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-russia-nato-idUSKCN0XG0UB
Error extracting text from http://www.reuters.com/article/2015/11/30/brazil-corruption-cunha-idUSE6N12L00Q20151130: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/brazil-corruption-cunha-idUSE6N12L00Q20151130


Processing URLs:  57%|█████▋    | 571/1000 [25:26<06:47,  1.05it/s]

URL filtered: https://twitter.com/chikabenjamin27


Processing URLs:  58%|█████▊    | 578/1000 [25:40<11:34,  1.64s/it]

Error extracting text from http://pakobserver.net/2016/05/20/will-nawaz-sail-through-the-ongoing-crisis/: 403 Client Error: Forbidden for url: http://pakobserver.net/2016/05/20/will-nawaz-sail-through-the-ongoing-crisis/


Processing URLs:  58%|█████▊    | 580/1000 [25:40<06:29,  1.08it/s]

Error extracting text from https://news.pn/en/RussiaInvadedUkraine/121719: 403 Client Error: Forbidden for url: https://news.pn/en/RussiaInvadedUkraine/121719


Processing URLs:  58%|█████▊    | 581/1000 [25:41<05:56,  1.18it/s]

Error extracting text from https://reut.rs/3nIu7N2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/australian-pm-meets-with-former-govt-employee-who-alleges-rape-parliament-house-2021-04-30/


Processing URLs:  58%|█████▊    | 583/1000 [25:42<05:03,  1.37it/s]

Error extracting text from http://pure.ltu.se/portal/files/102425643/Vol_5_3_8.pdf: HTTPConnectionPool(host='pure.ltu.se', port=80): Max retries exceeded with url: /portal/files/102425643/Vol_5_3_8.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3013046e0>: Failed to resolve 'pure.ltu.se' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 584/1000 [25:44<07:43,  1.11s/it]

Error extracting text from http://www.freemalaysiatoday.com/category/business/2013/03/18/gold-as-a-currency-in-islamic-finance/: 404 Client Error: Not Found for url: https://www.freemalaysiatoday.com/category/business/2013/03/18/gold-as-a-currency-in-islamic-finance/


Processing URLs:  59%|█████▉    | 589/1000 [25:52<07:37,  1.11s/it]

Error extracting text from http://www.nytimes.com/2016/01/07/opinion/campaign-stops/how-donald-trump-loses.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/07/opinion/campaign-stops/how-donald-trump-loses.html
Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1C30R3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1C30R3


Processing URLs:  59%|█████▉    | 591/1000 [25:57<12:33,  1.84s/it]

URL filtered: http://www.reuters.com/article/2015/10/29/us-southchinasea-usa-china-idUSKCN0SM2ER20151029?feedType=RSS&amp;feedName=topNews&amp;utm_source=twitter


Processing URLs:  60%|█████▉    | 595/1000 [26:07<13:49,  2.05s/it]

Error extracting text from http://www.nytimes.com/2015/12/23/world/europe/election-results-in-spain-are-a-stinging-end-to-europes-year.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/23/world/europe/election-results-in-spain-are-a-stinging-end-to-europes-year.html?_r=0


Processing URLs:  60%|█████▉    | 597/1000 [26:09<11:30,  1.71s/it]

Error extracting text from http://aftindia.org/wp-content/uploads/2016/02/NAM_2016_Special_301_Comments.pdf: 404 Client Error: Not Found for url: https://aftindia.org/wp-content/uploads/2016/02/NAM_2016_Special_301_Comments.pdf


Processing URLs:  60%|██████    | 600/1000 [26:11<06:41,  1.00s/it]

URL filtered: http://carnegieendowment.org/2015/12/20/sectarian-twitter-wars-sunni-shia-conflict-and-cooperation-in-digital-age/in6n
Error extracting text from http://www.reuters.com/article/2015/12/02/us-markets-stocks-idUSKBN0TL18X20151202#qERICodoUBkeMyTI.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/02/us-markets-stocks-idUSKBN0TL18X20151202#qERICodoUBkeMyTI.97


Processing URLs:  60%|██████    | 602/1000 [26:13<06:21,  1.04it/s]

Error extracting text from https://medium.com/@elias.brockman/britains-divorce-bill-when-will-they-pay-1bc800ce970c: 403 Client Error: Forbidden for url: https://medium.com/@elias.brockman/britains-divorce-bill-when-will-they-pay-1bc800ce970c


Processing URLs:  60%|██████    | 604/1000 [26:16<07:49,  1.19s/it]

Error extracting text from http://www.ibtimes.co.uk/tories-tell-cameron-name-date-when-he-going-go-anger-over-eu-referendum-hits-breaking-point-1561615: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/tories-tell-cameron-name-date-when-he-going-go-anger-over-eu-referendum-hits-breaking-point-1561615


Processing URLs:  61%|██████    | 606/1000 [27:18<1:59:24, 18.18s/it]

Error extracting text from http://www.cmegroup.com/trading/interest-rates/fed-funds.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  61%|██████    | 610/1000 [27:24<37:42,  5.80s/it]

Error extracting text from http://www.elpais.cr/2015/12/08/argumentos-basados-en-la-resolucion-2249-del-consejo-de-seguridad-en-el-reciente-informe-del-primer-ministro-britanico-sobre-siria-una-necesaria-clarificacion/: 404 Client Error: Not Found for url: https://www.elpais.cr/2015/12/08/argumentos-basados-en-la-resolucion-2249-del-consejo-de-seguridad-en-el-reciente-informe-del-primer-ministro-britanico-sobre-siria-una-necesaria-clarificacion/


Processing URLs:  61%|██████    | 612/1000 [27:27<23:19,  3.61s/it]

Error extracting text from http://www.isie.tn/communiques/2014/12/08/decision-de-linstance-superieure-independante-pour-les-elections-relatives-la-proclamation-des-resultats-definitifs-du-premier-tour-des-elections-presidentielles-2014/: HTTPSConnectionPool(host='www.isie.tn', port=443): Max retries exceeded with url: /communiques/2014/12/08/decision-de-linstance-superieure-independante-pour-les-elections-relatives-la-proclamation-des-resultats-definitifs-du-premier-tour-des-elections-presidentielles-2014/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  62%|██████▏   | 616/1000 [27:33<12:23,  1.94s/it]

Error extracting text from http://www2.politicalbetting.com/index.php/archives/2016/03/28/the-eu-referendum-a-battle-between-the-social-classes/: 404 Client Error: Not Found for url: http://www2.politicalbetting.com/index.php/archives/2016/03/28/the-eu-referendum-a-battle-between-the-social-classes/


Processing URLs:  62%|██████▏   | 620/1000 [27:38<08:26,  1.33s/it]

URL filtered: https://www.youtube.com/watch?v=lQwJQkEh2QY


Processing URLs:  62%|██████▏   | 624/1000 [27:40<04:54,  1.28it/s]

Error extracting text from http://www.nytimes.com/2016/07/29/us/politics/donald-trump-russia-obama-putin.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/29/us/politics/donald-trump-russia-obama-putin.html


Processing URLs:  63%|██████▎   | 628/1000 [27:57<15:07,  2.44s/it]

Error extracting text from http://www.businessinsider.com.au/stocks-in-tokyo-are-under-pressure-after-the-bank-of-japan-surprised-markets-with-an-easing-today-2015-12: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/stocks-in-tokyo-are-under-pressure-after-the-bank-of-japan-surprised-markets-with-an-easing-today-2015-12


Processing URLs:  63%|██████▎   | 629/1000 [27:58<14:03,  2.27s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-14/saudis-tell-opec-they-eased-cuts-by-pumping-10-million-barrels


Processing URLs:  64%|██████▎   | 637/1000 [28:06<06:07,  1.01s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/myanmar-junta-seeks-international-cooperation-over-covid-19-crisis-2021-07-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/myanmar-junta-seeks-international-cooperation-over-covid-19-crisis-2021-07-28/


Processing URLs:  64%|██████▍   | 639/1000 [28:10<08:55,  1.48s/it]

Error extracting text from http://www.nytimes.com/2015/09/15/opinion/david-brooks-the-biden-formation-story.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/15/opinion/david-brooks-the-biden-formation-story.html?_r=0


Processing URLs:  64%|██████▍   | 643/1000 [28:19<09:52,  1.66s/it]

Error extracting text from https://www.wsj.com/articles/defiant-governor-in-northern-afghanistan-tests-president-ashraf-ghani-1516050446: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/defiant-governor-in-northern-afghanistan-tests-president-ashraf-ghani-1516050446


Processing URLs:  65%|██████▌   | 652/1000 [29:06<12:10,  2.10s/it]

Error extracting text from http://mobile.nytimes.com/2016/03/09/world/asia/south-china-sea-militarization.html?referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/03/09/world/asia/south-china-sea-militarization.html?referer=https://www.google.com/


Processing URLs:  65%|██████▌   | 654/1000 [29:08<08:38,  1.50s/it]

Error extracting text from https://www.wsj.com/articles/covid-19-variant-in-brazil-overwhelms-local-hospitals-hits-younger-patients-11614705337: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/covid-19-variant-in-brazil-overwhelms-local-hospitals-hits-younger-patients-11614705337


Processing URLs:  66%|██████▌   | 656/1000 [29:09<04:52,  1.18it/s]

Error extracting text from https://www.publications.parliament.uk/pa/ld201617/ldselect/ldeucom/125/12507.htm: 403 Client Error: Forbidden for url: https://publications.parliament.uk/pa/ld201617/ldselect/ldeucom/125/12507.htm
Error extracting text from https://www.neweurope.eu/article/polands-continues-quest-to-delay-the-start-of-nord-stream-2/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/polands-continues-quest-to-delay-the-start-of-nord-stream-2/


Processing URLs:  66%|██████▌   | 657/1000 [29:09<03:47,  1.51it/s]

Error extracting text from https://www.nytimes.com/2016/02/16/world/asia/afghanistan-opium-heroin-taliban-helmand.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/02/16/world/asia/afghanistan-opium-heroin-taliban-helmand.html


Processing URLs:  66%|██████▌   | 661/1000 [29:13<04:53,  1.16it/s]

Error extracting text from http://www.sigmalive.com/en/news/greece/138837/tsipras-could-have-been-braver-in-eu-negotiations-on-greece: 403 Client Error: Forbidden for url: https://www.sigmalive.com/en/news/greece/138837/tsipras-could-have-been-braver-in-eu-negotiations-on-greece


Processing URLs:  66%|██████▋   | 663/1000 [29:16<07:20,  1.31s/it]

Error extracting text from http://www.mti.gov.eg/English/Pages/default.aspx: 403 Client Error: Forbidden for url: http://www.mti.gov.eg/English/Pages/default.aspx


Processing URLs:  66%|██████▋   | 665/1000 [29:19<06:57,  1.25s/it]

Error extracting text from https://seekingalpha.com/symbol/CO1:COM: 403 Client Error: Forbidden for url: https://seekingalpha.com/symbol/CO1:COM


Processing URLs:  68%|██████▊   | 676/1000 [29:45<18:59,  3.52s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/services-paralyzed-greeks-strike-pension-reform-36705841: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/services-paralyzed-greeks-strike-pension-reform-36705841


Processing URLs:  68%|██████▊   | 679/1000 [29:48<09:38,  1.80s/it]

Error extracting text from http://uk.reuters.com/article/iaea-report-on-irans-compliance-with-nuc-idUKL8N14Z40Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from https://www.reuters.com/article/turkey-crypto-currency-idUSL1N2M90AO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/turkey-crypto-currency-idUSL1N2M90AO


Processing URLs:  68%|██████▊   | 683/1000 [29:51<05:53,  1.11s/it]

Error extracting text from http://www.imdb.com/title/tt6679360/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt6679360/


Processing URLs:  69%|██████▊   | 686/1000 [29:58<08:52,  1.70s/it]

Error extracting text from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.281.7793&rep=rep1&type=pdf: 401 Client Error: Unauthorized for url: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.281.7793&rep=rep1&type=pdf


Processing URLs:  69%|██████▊   | 687/1000 [30:00<09:06,  1.75s/it]

Error extracting text from http://blogs.spectator.co.uk/2017/02/nicola-sturgeons-neverendum-hammering-scottish-economy/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2017/02/nicola-sturgeons-neverendum-hammering-scottish-economy/


Processing URLs:  69%|██████▉   | 688/1000 [31:00<1:37:52, 18.82s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-07-07/us-iran-violence-poised-to-escalate-after-new-strikes-on-american-bases-in-iraq: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  69%|██████▉   | 692/1000 [31:03<32:15,  6.29s/it]

Error extracting text from http://comment-news.com/source/www.nytimes.com/2018/01/19/world/europe/angela-merkel-germany-spd.html/: 404 Client Error: Not Found for url: https://comment-news.com/source/www.nytimes.com/2018/01/19/world/europe/angela-merkel-germany-spd.html/


Processing URLs:  70%|██████▉   | 697/1000 [31:12<10:55,  2.16s/it]

Error extracting text from https://www.iranhumanrights.org/2015/12/bill-to-end-death-penalty-for-drug-crimes/: 403 Client Error: Forbidden for url: https://www.iranhumanrights.org/2015/12/bill-to-end-death-penalty-for-drug-crimes/


Processing URLs:  70%|██████▉   | 698/1000 [31:13<07:52,  1.56s/it]

Error extracting text from http://www.reuters.com/article/2015/10/09/us-usa-oilexports-house-idUSKCN0S31TG20151009#KbiqgBfbfIboXdyd.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/09/us-usa-oilexports-house-idUSKCN0S31TG20151009#KbiqgBfbfIboXdyd.97


Processing URLs:  70%|███████   | 701/1000 [31:13<03:31,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0XG1W4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0XG1W4
Error extracting text from https://www.oddschecker.com/us/soccer/uefa-champions-league: 403 Client Error: Forbidden for url: https://www.oddschecker.com/us/soccer/uefa-champions-league


Processing URLs:  70%|███████   | 703/1000 [31:16<04:58,  1.01s/it]

Error extracting text from http://www.tradingeconomics.com/united-states/gdp-growth: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-states/gdp-growth


Processing URLs:  72%|███████▏  | 715/1000 [31:45<07:10,  1.51s/it]

Error extracting text from https://www.al-monitor.com/pulse/afp/2017/10/israel-palestinians-conflict-politics-gaza-abbas.html: 404 Client Error: Not Found for url: https://www.al-monitor.com/afp/2017/10/israel-palestinians-conflict-politics-gaza-abbas.html


Processing URLs:  72%|███████▏  | 717/1000 [31:46<04:41,  1.01it/s]

Error extracting text from https://www.france24.com/en/live-news/20210706-scenarios-for-afghanistan-with-foreign-troops-all-but-gone: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210706-scenarios-for-afghanistan-with-foreign-troops-all-but-gone
Error extracting text from http://www.reuters.com/article/us-iran-missiles-un-idUSKCN0WG1NG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-un-idUSKCN0WG1NG


Processing URLs:  72%|███████▏  | 719/1000 [31:46<02:45,  1.69it/s]

Error extracting text from http://www.nytimes.com/2016/06/05/world/middleeast/iraqi-army-seen-as-ill-equipped-to-retake-mosul-from-isis-despite-us-aid.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/05/world/middleeast/iraqi-army-seen-as-ill-equipped-to-retake-mosul-from-isis-despite-us-aid.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  72%|███████▏  | 720/1000 [31:47<02:19,  2.01it/s]

Error extracting text from https://www.nytimes.com/live/2021/04/29/business/stock-market-today: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/04/29/business/stock-market-today


Processing URLs:  72%|███████▏  | 722/1000 [31:49<03:29,  1.32it/s]

Error extracting text from https://www.wsj.com/articles/north-korea-launches-suspected-ballistic-missile-off-its-east-coast-south-korea-says-11643502959: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-launches-suspected-ballistic-missile-off-its-east-coast-south-korea-says-11643502959


Processing URLs:  72%|███████▏  | 724/1000 [31:51<03:45,  1.23it/s]

Error extracting text from http://www.realclearpolitics.com/articles/2016/06/09/obama_endorses_clinton_sanders_continues_campaign_130837.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2016/06/09/obama_endorses_clinton_sanders_continues_campaign_130837.html


Processing URLs:  73%|███████▎  | 730/1000 [31:59<06:18,  1.40s/it]

Error extracting text from http://abcnews.go.com/Business/wireStory/feds-seek-autopilot-data-tesla-crash-probe-40515954: 404 Client Error: Not Found for url: https://abcnews.go.com/Business/wireStory/feds-seek-autopilot-data-tesla-crash-probe-40515954


Processing URLs:  73%|███████▎  | 732/1000 [32:02<05:50,  1.31s/it]

Error extracting text from https://www.reuters.com/business/environment/magnitude-7-quake-strikes-western-haiti-usgs-2021-08-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/environment/magnitude-7-quake-strikes-western-haiti-usgs-2021-08-14/


Processing URLs:  73%|███████▎  | 734/1000 [32:04<05:31,  1.25s/it]

Error extracting text from http://www.fairewinds.org/nuclear-energy-education//anticipating-the-unthinkable?rq=indian%20point: 404 Client Error: Not Found for url: https://www.fairewinds.org/nuclear-energy-education//anticipating-the-unthinkable?rq=indian%20point


Processing URLs:  74%|███████▎  | 737/1000 [32:09<06:35,  1.50s/it]

Error extracting text from http://breakingnews.sy/en/article/7771.html?m=0: 404 Client Error: Not Found for url: http://breakingnews.sy/en/article/7771.html?m=0


Processing URLs:  74%|███████▍  | 738/1000 [32:10<05:50,  1.34s/it]

Error extracting text from https://www.reuters.com/world/europe/jailed-kremlin-critic-navalny-growing-risk-kidney-failure-medics-union-2021-04-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/jailed-kremlin-critic-navalny-growing-risk-kidney-failure-medics-union-2021-04-17/


Processing URLs:  74%|███████▍  | 742/1000 [32:12<03:39,  1.18it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html


Processing URLs:  74%|███████▍  | 745/1000 [32:16<05:01,  1.18s/it]

Error extracting text from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.644.5605&amp;rep=rep1&amp;type=pdf: 401 Client Error: Unauthorized for url: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.644.5605&amp;rep=rep1&amp;type=pdf


Processing URLs:  75%|███████▍  | 746/1000 [32:19<07:08,  1.69s/it]

URL filtered: https://twitter.com/BorisJohnson/status/1416764592043315204


Processing URLs:  75%|███████▍  | 748/1000 [32:20<04:36,  1.10s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/commentary/ct-samuelson-trump-impeach-kelly-perspec-0804-jm-20170803-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/commentary/ct-samuelson-trump-impeach-kelly-perspec-0804-jm-20170803-story.html


Processing URLs:  75%|███████▌  | 752/1000 [32:27<05:30,  1.33s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=54682: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=54682
Error extracting text from https://news.usni.org/2017/09/25/navy-marine-corps-providing-around-clock-hurricane-maria-relief?utm_source=USNI+News&amp;utm_campaign=9f6a4b1075-USNI_NEWS_DAILY&amp;utm_med: 403 Client Error: Forbidden for url: https://news.usni.org/2017/09/25/navy-marine-corps-providing-around-clock-hurricane-maria-relief?utm_source=USNI+News&amp;utm_campaign=9f6a4b1075-USNI_NEWS_DAILY&amp;utm_med


Processing URLs:  75%|███████▌  | 754/1000 [32:28<03:16,  1.25it/s]

Error extracting text from https://www.middleeastmonitor.com/news/middle-east/23173-abbas-ousts-opposition-members-from-the-plo: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/news/middle-east/23173-abbas-ousts-opposition-members-from-the-plo


Processing URLs:  76%|███████▌  | 757/1000 [32:31<03:54,  1.04it/s]

Error extracting text from http://www.iran-daily.com/News/136392.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  76%|███████▌  | 762/1000 [32:40<04:35,  1.16s/it]

URL filtered: https://twitter.com/GJ_Analytics/status/737963706554953728
Error extracting text from http://www.reuters.com/article/us-saudi-prince-arrests-idUSKBN1A51QW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-prince-arrests-idUSKBN1A51QW?il=0


Processing URLs:  76%|███████▋  | 765/1000 [32:42<03:39,  1.07it/s]

Error extracting text from https://www.iranhumanrights.org/2017/11/irgc-hackers-target-iranian-journalists-based-abroad-with-malware-campaign/: 403 Client Error: Forbidden for url: https://www.iranhumanrights.org/2017/11/irgc-hackers-target-iranian-journalists-based-abroad-with-malware-campaign/


Processing URLs:  77%|███████▋  | 766/1000 [32:43<03:48,  1.02it/s]

Error extracting text from https://apple.news/ABlRqnCvnRR2Hh4gMCnKESQ: 404 Client Error: Not Found for url: https://apple.news/ABlRqnCvnRR2Hh4gMCnKESQ


Processing URLs:  77%|███████▋  | 767/1000 [33:44<1:10:04, 18.04s/it]

Error extracting text from http://www.jag.navy.mil/documents/NWP_1-14M_Commanders_Handbook.pdf: HTTPConnectionPool(host='www.jag.navy.mil', port=80): Max retries exceeded with url: /documents/NWP_1-14M_Commanders_Handbook.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x301cd0c80>, 'Connection to www.jag.navy.mil timed out. (connect timeout=60)'))


Processing URLs:  77%|███████▋  | 769/1000 [33:51<41:08, 10.69s/it]

Error extracting text from http://www.dinarupdates.com/showthread.php?30297-Special-Forces-Daash-leaders-began-to-flee-from-Mosul-2-20: 403 Client Error: Forbidden for url: https://www.dinarupdates.com/showthread.php?30297-Special-Forces-Daash-leaders-began-to-flee-from-Mosul-2-20


Processing URLs:  77%|███████▋  | 770/1000 [33:52<29:58,  7.82s/it]

Error extracting text from https://jamanetwork.com/journals/jama/fullarticle/2773108: 403 Client Error: Forbidden for url: https://jamanetwork.com/journals/jama/fullarticle/2773108
URL filtered: https://twitter.com/markus_pausch/status/1396518490526396416?s=20


Processing URLs:  78%|███████▊  | 778/1000 [34:05<09:24,  2.54s/it]

Error extracting text from https://www.vdh.virginia.gov/coronavirus/covid-19-data-insights/weekly-health-district-case-data/: 404 Client Error: Not Found for url: https://www.vdh.virginia.gov/coronavirus/covid-19-data-insights/weekly-health-district-case-data/


Processing URLs:  78%|███████▊  | 785/1000 [34:13<03:08,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-oil-opec-algeria-idUSKCN1252CN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-opec-algeria-idUSKCN1252CN
Error extracting text from http://www.reuters.com/article/us-usa-security-iran-idUSKCN0YR0EA?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-security-iran-idUSKCN0YR0EA?il=0


Processing URLs:  79%|███████▊  | 786/1000 [34:15<04:32,  1.27s/it]

Error extracting text from http://www.reuters.com/article/eurozone-greece-imf-idUSL5N1833PE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/eurozone-greece-imf-idUSL5N1833PE


Processing URLs:  79%|███████▉  | 793/1000 [34:31<08:58,  2.60s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-economy-analysis-idUSKBN16S1ZL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-economy-analysis-idUSKBN16S1ZL


Processing URLs:  79%|███████▉  | 794/1000 [34:31<06:35,  1.92s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-government-idUSKBN13N1ZQ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-government-idUSKBN13N1ZQ?il=0


Processing URLs:  80%|███████▉  | 797/1000 [34:35<04:42,  1.39s/it]

Error extracting text from https://www.nytimes.com/2016/07/10/world/middleeast/iran-once-quiet-about-its-casualties-in-syria-and-iraq-now-glorifies-them.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/07/10/world/middleeast/iran-once-quiet-about-its-casualties-in-syria-and-iraq-now-glorifies-them.html?_r=0


Processing URLs:  80%|████████  | 800/1000 [35:38<45:12, 13.56s/it]

Error extracting text from http://www.u.tv/News/2015/10/06/No-real-progress-on-welfare---Villiers-46394: HTTPSConnectionPool(host='www.itv.com', port=443): Read timed out. (read timeout=60)
Error extracting text from https://www.axios.com/current-status-of-the-waymo-uber-lawsuit-2466532911.html: 403 Client Error: Forbidden for url: https://www.axios.com/current-status-of-the-waymo-uber-lawsuit-2466532911.html


Processing URLs:  80%|████████  | 803/1000 [35:39<16:28,  5.02s/it]

Error extracting text from https://www.occ.org.nz/assets/Uploads/AgesEthnicityMarch2018.pdf: 404 Client Error: Not Found for url: https://www.manamokopuna.org.nz/assets/Uploads/AgesEthnicityMarch2018.pdf


Processing URLs:  80%|████████  | 805/1000 [35:46<13:08,  4.04s/it]

Error extracting text from http://europe.newsweek.com/barack-obama-says-recapture-mosul-isis-set-end-2016-449637: 403 Client Error: Forbidden for url: https://www.newsweek.com/barack-obama-says-recapture-mosul-isis-set-end-2016-449637


Processing URLs:  81%|████████  | 806/1000 [35:46<09:39,  2.99s/it]

Error extracting text from http://thehill.com/business-a-lobbying/334377-white-house-aides-prepare-for-hefty-legal-bills-report: 403 Client Error: Forbidden for url: https://thehill.com/business-a-lobbying/334377-white-house-aides-prepare-for-hefty-legal-bills-report/


Processing URLs:  81%|████████  | 811/1000 [35:59<08:43,  2.77s/it]

Error extracting text from http://www.nato.int/: 403 Client Error: Forbidden for url: http://www.nato.int/


Processing URLs:  81%|████████  | 812/1000 [36:01<07:08,  2.28s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0XJ1YM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0XJ1YM


Processing URLs:  81%|████████▏ | 813/1000 [36:01<05:36,  1.80s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-bonds-advisers/venezuela-creditors-recoil-at-proposed-caracas-bondholder-meeting-idUSKBN1D8343?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds-advisers/venezuela-creditors-recoil-at-proposed-caracas-bondholder-meeting-idUSKBN1D8343?il=0


Processing URLs:  82%|████████▏ | 818/1000 [36:09<05:05,  1.68s/it]

Error extracting text from https://www.presstv.com/Detail/2021/06/14/659036/Israel-Hezbollah-Iran-Hamas: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2021/06/14/659036/Israel-Hezbollah-Iran-Hamas


Processing URLs:  82%|████████▏ | 819/1000 [36:10<04:49,  1.60s/it]

Error extracting text from https://amti.csis.org/chinas-sam-shelters-spratlys/: 403 Client Error: Forbidden for url: https://amti.csis.org/chinas-sam-shelters-spratlys/


Processing URLs:  82%|████████▏ | 821/1000 [36:14<05:01,  1.68s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-28/spanish-socialists-turn-on-sanchez-edging-rajoy-closer-to-power


Processing URLs:  82%|████████▏ | 823/1000 [36:16<04:13,  1.43s/it]

Error extracting text from http://en.abna24.com/service/africa/archive/2016/04/12/746785/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/africa/archive/2016/04/12/746785/story.html


Processing URLs:  82%|████████▎ | 825/1000 [36:18<03:53,  1.34s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-16/facebook-is-said-to-seek-staff-with-national-security-clearance


Processing URLs:  83%|████████▎ | 827/1000 [36:20<03:07,  1.08s/it]

URL filtered: http://www.bloombergview.com/articles/2015-12-16/fed-increase-is-the-most-important-thing-ever-oh-wait-


Processing URLs:  83%|████████▎ | 831/1000 [36:21<01:32,  1.83it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/oh/ohio_senate_portman_vs_strickland-5386.html#: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/oh/ohio_senate_portman_vs_strickland-5386.html


Processing URLs:  83%|████████▎ | 832/1000 [36:23<02:31,  1.11it/s]

URL filtered: https://twitter.com/emilylmullin/status/885212890814451712/photo/1?ref_src=twsrc%5Etfw&amp;ref_url=https%3A%2F%2Fstat.liveblog.pro%2Flb-stat%2Fblogs%2F5953d2838ef0c40120e61610%2Findex.html


Processing URLs:  84%|████████▍ | 839/1000 [36:31<03:06,  1.16s/it]

Error extracting text from https://www.reuters.com/article/us-usa-terrorism/u-s-report-warns-of-threats-from-white-supremacists-militias-idUSKBN2B92SG?feedType=mktg&amp;feedName=topNews&amp;WT.mc_id=Partner-Google: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-terrorism/u-s-report-warns-of-threats-from-white-supremacists-militias-idUSKBN2B92SG?feedType=mktg&amp;feedName=topNews&amp;WT.mc_id=Partner-Google


Processing URLs:  84%|████████▍ | 843/1000 [36:34<02:08,  1.23it/s]

Error extracting text from http://www.wsj.com: 403 Client Error: Forbidden for url: https://www.wsj.com/


Processing URLs:  85%|████████▍ | 846/1000 [36:36<01:59,  1.28it/s]

URL filtered: http://www.bloombergview.com/articles/2015-12-17/new-russian-air-defenses-in-syria-keep-u-s-grounded?cmpid=wsdemand


Processing URLs:  85%|████████▍ | 848/1000 [37:36<35:18, 13.94s/it]

Error extracting text from http://www.aa.com.tr/en/politics/putin-uk-russia-relations-not-at-their-best-/474363: HTTPConnectionPool(host='www.aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  85%|████████▌ | 850/1000 [37:39<20:43,  8.29s/it]

Error extracting text from https://www.axios.com/russia-war-israel-bennett-zelensky-told-to-surrender-d5c53a0b-5940-4b09-85e4-ede244a2f5a1.html: 403 Client Error: Forbidden for url: https://www.axios.com/russia-war-israel-bennett-zelensky-told-to-surrender-d5c53a0b-5940-4b09-85e4-ede244a2f5a1.html
URL filtered: http://www.forbes.com/forbes/welcome/?toURL=http://www.forbes.com/sites/melikkaylan/2017/02/02/new-reasons-to-remember-the-lurid-russia-dossier-on-trump/&amp;refURL=https://www.linkedin.com/&amp;referrer=https://www.linkedin.com/#3a6f917361e6


Processing URLs:  86%|████████▌ | 855/1000 [38:44<45:02, 18.64s/it]

Error extracting text from http://www.idigitaltimes.com/target-black-friday-2015-ads-leaked-iphone-ipad-apple-watch-deals-beats-headphones-96-488878: HTTPConnectionPool(host='www.idigitaltimes.com', port=80): Max retries exceeded with url: /target-black-friday-2015-ads-leaked-iphone-ipad-apple-watch-deals-beats-headphones-96-488878 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3021da870>, 'Connection to www.idigitaltimes.com timed out. (connect timeout=60)'))


Processing URLs:  86%|████████▌ | 857/1000 [38:52<27:01, 11.34s/it]

Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15O1Z3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15O1Z3


Processing URLs:  86%|████████▌ | 859/1000 [38:58<16:24,  6.98s/it]

Error extracting text from http://amazon.co.uk/: 503 Server Error: Service Unavailable for url: https://www.amazon.co.uk/


Processing URLs:  86%|████████▌ | 861/1000 [38:58<08:22,  3.62s/it]

Error extracting text from http://www.france24.com/en/20160216-turkey-syria-ground-offensive-nato: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160216-turkey-syria-ground-offensive-nato


Processing URLs:  86%|████████▌ | 862/1000 [38:59<06:30,  2.83s/it]

Error extracting text from http://english.farsnews.com/newstext.aspx?nn=13940803001273: HTTPConnectionPool(host='english.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940803001273 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3027eba70>: Failed to resolve 'english.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  87%|████████▋ | 869/1000 [39:14<04:03,  1.86s/it]

Error extracting text from http://uk.reuters.com/article/uk-russia-iran-zarif-idUKKCN0QF1XA20150810: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://www.youtube.com/watch?v=FscIgtDJFXg
Error extracting text from http://www.latimes.com/business/la-fi-jobs-report-20151204-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-jobs-report-20151204-story.html


Processing URLs:  87%|████████▋ | 874/1000 [39:19<02:20,  1.11s/it]

Error extracting text from https://core.ac.uk/download/pdf/211327106.pdf: 403 Client Error: Forbidden for url: https://core.ac.uk/download/pdf/211327106.pdf


Processing URLs:  88%|████████▊ | 876/1000 [39:20<01:29,  1.39it/s]

Error extracting text from http://web.scps.nyu.edu/global.affairs/msga/people/faculty/galeotti.html: HTTPConnectionPool(host='web.scps.nyu.edu', port=80): Max retries exceeded with url: /global.affairs/msga/people/faculty/galeotti.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fefc5100>: Failed to resolve 'web.scps.nyu.edu' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-asean-philippines-idUSKBN1AL05R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-philippines-idUSKBN1AL05R


Processing URLs:  88%|████████▊ | 877/1000 [39:22<02:04,  1.02s/it]

URL filtered: https://www.youtube.com/watch?v=sfIxGADjzv4


Processing URLs:  88%|████████▊ | 882/1000 [39:26<01:39,  1.19it/s]

Error extracting text from http://www.journals.uchicago.edu/doi/full/10.1086/686141: 403 Client Error: Forbidden for url: https://www.journals.uchicago.edu/doi/full/10.1086/686141
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKCN0VK0WQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKCN0VK0WQ


Processing URLs:  89%|████████▊ | 886/1000 [40:30<34:45, 18.29s/it]

Error extracting text from https://archive.is/JcAS2: HTTPSConnectionPool(host='archive.is', port=443): Read timed out. (read timeout=60)


Processing URLs:  89%|████████▊ | 887/1000 [40:31<25:15, 13.41s/it]

Error extracting text from http://www.lockheedmartin.com/us/products/HybridAirship.html: 404 Client Error: Not Found for url: https://www.lockheedmartin.com/en-us/products/hybrid-airship.html
URL filtered: https://twitter.com/Josiensor/status/789209510871195648
URL filtered: https://www.youtube.com/results?search_query=syria+gopro


Processing URLs:  89%|████████▉ | 890/1000 [41:31<31:21, 17.10s/it]

Error extracting text from http://www.usnews.com/news/articles/2015/12/11/british-defense-minister-fallon-path-to-syrian-peace-brightened-by-unity-on-assads-future: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  89%|████████▉ | 891/1000 [41:32<24:37, 13.56s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/02/19/EU-to-hold-migration-summit-with-Turkey-in-early-March.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/02/19/EU-to-hold-migration-summit-with-Turkey-in-early-March.html


Processing URLs:  90%|████████▉ | 895/1000 [41:37<08:40,  4.96s/it]

Error extracting text from http://enenews.com/attack-campaign-against-u-s-nuclear-workers-malware-installs-broader-and-more-ambitious-than-previously-thought-campaign-began-nearly-2-months-ago: 404 Client Error: Not Found for url: http://enenews.com/attack-campaign-against-u-s-nuclear-workers-malware-installs-broader-and-more-ambitious-than-previously-thought-campaign-began-nearly-2-months-ago
Error extracting text from http://www.reuters.com/article/us-russia-turkey-agriculture-idUSKBN17Z1PT?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-turkey-agriculture-idUSKBN17Z1PT?il=0


Processing URLs:  90%|████████▉ | 899/1000 [41:45<04:33,  2.71s/it]

URL filtered: https://www.youtube.com/watch?v=TYeVQzTVyLk


Processing URLs:  90%|█████████ | 903/1000 [41:47<01:50,  1.14s/it]

Error extracting text from http://webcache.googleusercontent.com/search?q=cache:ct75-R2jfooJ:iprnewswire.com/devalued-yuan-set-to-take-bite-out-of-apple-give-boost-to-chinese-rivals/+&amp;cd=2&amp;hl=en&amp;ct=clnk&amp;gl=us: 404 Client Error: Not Found for url: http://webcache.googleusercontent.com/search?q=cache:ct75-R2jfooJ:iprnewswire.com/devalued-yuan-set-to-take-bite-out-of-apple-give-boost-to-chinese-rivals/+&amp;cd=2&amp;hl=en&amp;ct=clnk&amp;gl=us
Error extracting text from http://www.extremetech.com/extreme/235572-with-chevrolet-bolt-the-new-normal-for-evs-will-be-200-miles-range-30k-base-price: 403 Client Error: Forbidden for url: http://www.extremetech.com/extreme/235572-with-chevrolet-bolt-the-new-normal-for-evs-will-be-200-miles-range-30k-base-price


Processing URLs:  90%|█████████ | 904/1000 [41:48<01:58,  1.24s/it]

Error extracting text from http://ca.reuters.com/article/technologyNews/idCAKCN0VY30K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  90%|█████████ | 905/1000 [41:53<03:28,  2.19s/it]

Error extracting text from http://www.providingforpeacekeeping.org/2015/06/26/peacekeeping-contributor-profile-burundi/: HTTPSConnectionPool(host='www.providingforpeacekeeping.org', port=443): Max retries exceeded with url: /2015/06/26/peacekeeping-contributor-profile-burundi/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  91%|█████████ | 906/1000 [41:53<02:41,  1.72s/it]

Error extracting text from http://thehill.com/policy/finance/254514-house-dems-bullish-on-ex-im: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/254514-house-dems-bullish-on-ex-im/


Processing URLs:  91%|█████████ | 907/1000 [41:54<02:04,  1.34s/it]

Error extracting text from https://www.valuepenguin.com/average-student-loan-debt: 403 Client Error: Forbidden for url: https://www.valuepenguin.com/average-student-loan-debt


Processing URLs:  91%|█████████ | 911/1000 [41:57<01:08,  1.30it/s]

Error extracting text from http://postimg.org/image/4ig97z2p5/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/4ig97z2p5/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300c91cd0>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  91%|█████████ | 912/1000 [42:28<14:23,  9.82s/it]

Error extracting text from http://www.todayszaman.com/index/rosatom: 522 Server Error:  for url: http://www.todayszaman.com/index/rosatom


Processing URLs:  92%|█████████▏| 915/1000 [42:41<08:45,  6.18s/it]

Error extracting text from https://www.nytimes.com/2017/07/31/world/americas/venezuelas-opposition-riding-high-not-long-ago-suffers-a-crippling-blow.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/31/world/americas/venezuelas-opposition-riding-high-not-long-ago-suffers-a-crippling-blow.html


Processing URLs:  92%|█████████▏| 916/1000 [42:44<07:19,  5.23s/it]

Error extracting text from http://www.unionleader.com/Mark_Connolly_hires_Colin_Pio_to_manage_governors_campaign: 404 Client Error: Not Found for url: https://www.unionleader.com/mark_connolly_hires_colin_pio_to_manage_governors_campaign/


Processing URLs:  92%|█████████▏| 917/1000 [42:46<05:37,  4.06s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5827737/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5827737/


Processing URLs:  92%|█████████▏| 919/1000 [42:57<05:58,  4.42s/it]

Error extracting text from http://www.ibtimes.com/chinas-president-xi-says-internet-must-be-governed-order-stresses-cyber-sovereignty-2227533: 403 Client Error: Forbidden for url: https://www.ibtimes.com/chinas-president-xi-says-internet-must-be-governed-order-stresses-cyber-sovereignty-2227533


Processing URLs:  92%|█████████▏| 921/1000 [42:57<03:04,  2.34s/it]

Error extracting text from https://www.weforum.org/events/the-davos-agenda-2021/about: 403 Client Error: Forbidden for url: https://www.weforum.org/events/the-davos-agenda-2021/about
Error extracting text from http://www.reuters.com/article/us-china-yuan-midpoint-idUSKBN0UL07Z20160107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-yuan-midpoint-idUSKBN0UL07Z20160107


Processing URLs:  92%|█████████▏| 922/1000 [42:58<02:12,  1.70s/it]

Error extracting text from http://www.wsj.com/articles/china-car-sales-hit-the-brakes-in-february-1457602662: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-car-sales-hit-the-brakes-in-february-1457602662


Processing URLs:  93%|█████████▎| 926/1000 [43:01<01:09,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-libya-security-islamic-state-un-idUSKCN0WC2LI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-libya-security-islamic-state-un-idUSKCN0WC2LI
Error extracting text from http://www.reuters.com/article/us-turkey-economy-bankruptcy-idUSKCN0XV0EF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-economy-bankruptcy-idUSKCN0XV0EF


Processing URLs:  94%|█████████▎| 937/1000 [43:18<01:31,  1.45s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-politics-idUSKBN13B1BK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-politics-idUSKBN13B1BK


Processing URLs:  95%|█████████▍| 948/1000 [43:42<00:58,  1.12s/it]

Error extracting text from https://www.nytimes.com/2021/12/07/us/politics/debt-ceiling-deal-congress.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/12/07/us/politics/debt-ceiling-deal-congress.html
Error extracting text from http://www.nato.int/cps/en/natohq/news_132047.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_132047.htm


Processing URLs:  95%|█████████▌| 951/1000 [44:46<15:51, 19.42s/it]

Error extracting text from http://en.kremlin.ru/events/president/news/54022: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  95%|█████████▌| 952/1000 [44:47<11:00, 13.75s/it]

Error extracting text from https://theconversation.com/the-uks-speedy-covid-19-vaccine-rollout-surprise-success-or-planned-perfection-155922: 403 Client Error: Forbidden for url: https://theconversation.com/the-uks-speedy-covid-19-vaccine-rollout-surprise-success-or-planned-perfection-155922


Processing URLs:  95%|█████████▌| 953/1000 [44:47<07:44,  9.87s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-27/emerging-stocks-to-currencies-gain-before-fed-concludes-review


Processing URLs:  96%|█████████▌| 958/1000 [44:53<02:08,  3.05s/it]

Error extracting text from http://www.therepublic.com/view/story/142f0b0d79d5426cb3e6a7f7f1a52f54/EU--Europe-Migrants-The-Latest: 403 Client Error: Forbidden for url: https://www.therepublic.com/view/story/142f0b0d79d5426cb3e6a7f7f1a52f54/EU--Europe-Migrants-The-Latest


Processing URLs:  96%|█████████▌| 962/1000 [45:00<01:15,  2.00s/it]

Error extracting text from http://www.taek.gov.tr/en/belgeler-formlar/documents/: HTTPSConnectionPool(host='www.taek.gov.tr', port=443): Max retries exceeded with url: /en/belgeler-formlar/documents/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  96%|█████████▋| 965/1000 [45:05<00:57,  1.64s/it]

Error extracting text from https://uk.reuters.com/article/uk-northkorea-missile-internet/russian-firm-provides-new-internet-connection-to-north-korea-idUKKCN1C70D0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  97%|█████████▋| 967/1000 [45:08<00:59,  1.79s/it]

URL filtered: https://www.linkedin.com/vsearch/f?type=all&amp;keywords=superforecaster&amp;orig=GLHD&amp;rsid=&amp;pageKey=oz-winner&amp;trkInfo=tarId%3A1463657311517&amp;search=Search


Processing URLs:  97%|█████████▋| 969/1000 [45:09<00:36,  1.17s/it]

Error extracting text from http://finance.yahoo.com/news/report-sec-investigating-tesla-possible-202458823.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/report-sec-investigating-tesla-possible-202458823.html


Processing URLs:  97%|█████████▋| 972/1000 [45:15<00:49,  1.77s/it]

Error extracting text from http://www.aaai.org/Conferences/AAAI/aaai16.php: 403 Client Error: Forbidden for url: http://aaai.org/Conferences/AAAI/aaai16.php


Processing URLs:  97%|█████████▋| 973/1000 [45:16<00:42,  1.59s/it]

URL filtered: https://www.npr.org/2021/05/05/987679590/facebook-justified-in-banning-donald-trump-social-medias-oversight-board-rules
Error extracting text from http://www.reuters.com/article/us-cuba-energy-russia-idUSKBN17Z2B8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cuba-energy-russia-idUSKBN17Z2B8


Processing URLs:  98%|█████████▊| 978/1000 [45:20<00:20,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/un-marking-70th-anniversary-but-facing-crisis-opens-annual-meeting-1443442065: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/un-marking-70th-anniversary-but-facing-crisis-opens-annual-meeting-1443442065


Processing URLs:  98%|█████████▊| 981/1000 [45:30<00:43,  2.27s/it]

Error extracting text from http://finance.yahoo.com/news/crude-oil-spiking-141800736.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/crude-oil-spiking-141800736.html


Processing URLs:  98%|█████████▊| 983/1000 [45:33<00:33,  1.94s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-17/brazil-party-vote-boosts-rousseff-in-impeachment-fight


Processing URLs:  99%|█████████▊| 987/1000 [45:36<00:13,  1.01s/it]

Error extracting text from https://www.reuters.com/article/us-florida-shooting-guns/firearms-debate-rages-as-florida-rally-coincides-with-gun-show-idUSKCN1G10WB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-florida-shooting-guns/firearms-debate-rages-as-florida-rally-coincides-with-gun-show-idUSKCN1G10WB
Error extracting text from https://www.nasdaq.com/articles/walmarts-flipkart-to-spin-off-digital-payments-business-2020-12-03: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/walmarts-flipkart-to-spin-off-digital-payments-business-2020-12-03
URL filtered: https://www.engadget.com/2017/02/06/facebook-and-google-tackle-fake-news-ahead-of-french-elections/


Processing URLs:  99%|█████████▉| 991/1000 [45:37<00:05,  1.57it/s]

Error extracting text from http://thehill.com/opinion/cybersecurity/356377-a-plan-for-defending-us-manufacturers-from-cyberattacks: 403 Client Error: Forbidden for url: https://thehill.com/opinion/cybersecurity/356377-a-plan-for-defending-us-manufacturers-from-cyberattacks/
URL filtered: https://www.youtube.com/watch?v=HhunRo5FybQ


Processing URLs:  99%|█████████▉| 993/1000 [45:38<00:03,  2.26it/s]

Error extracting text from http://www.latino-review.com/news/captain-america-civil-war-breaks-new-fandango-record-as-top-pre-seller-among-superhero-movies: 404 Client Error: Not Found for url: http://www.latino-review.com/news/captain-america-civil-war-breaks-new-fandango-record-as-top-pre-seller-among-superhero-movies


Processing URLs:  99%|█████████▉| 994/1000 [45:39<00:04,  1.44it/s]

Error extracting text from http://mobile.nytimes.com/2016/01/07/us/politics/increasingly-iowans-say-their-caucuses-are-ted-cruzs-to-lose.html?referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/01/07/us/politics/increasingly-iowans-say-their-caucuses-are-ted-cruzs-to-lose.html?referer=https://www.google.com/


Processing URLs: 100%|█████████▉| 995/1000 [45:41<00:04,  1.05it/s]

Error extracting text from http://brillouinenergy.com/about/: 404 Client Error: Not Found for url: https://brillouinenergy.com/about/


Processing URLs: 100%|█████████▉| 996/1000 [45:42<00:03,  1.06it/s]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20161116/0905200985.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20161116/0905200985.html


Processing URLs: 100%|██████████| 1000/1000 [45:48<00:00,  2.75s/it]
Processing URLs:   0%|          | 2/1000 [00:00<06:39,  2.50it/s]

Error extracting text from http://www.nytimes.com/2015/11/04/world/asia/leaders-of-china-and-taiwan-to-meet-for-first-time-since-1949.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/04/world/asia/leaders-of-china-and-taiwan-to-meet-for-first-time-since-1949.html


Processing URLs:   0%|          | 4/1000 [00:05<28:04,  1.69s/it]

Error extracting text from http://reut.rs/1VAyN5T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   1%|          | 11/1000 [00:14<19:45,  1.20s/it]

URL filtered: https://www.bitrates.com/news/p/facebooks-diem-will-the-global-cryptocurrency-go-live-in-2021


Processing URLs:   2%|▏         | 15/1000 [00:19<20:41,  1.26s/it]

URL filtered: https://twitter.com/nugmyanmar/status/1441065841974521864


Processing URLs:   2%|▏         | 17/1000 [00:20<16:45,  1.02s/it]

Error extracting text from http://www.opb.org/news/article/npr-the-fight-for-mosul-underway-for-a-month-is-only-just-beginning/: 404 Client Error: Not Found for url: https://www.opb.org/news/article/npr-the-fight-for-mosul-underway-for-a-month-is-only-just-beginning/


Processing URLs:   2%|▏         | 23/1000 [00:31<26:33,  1.63s/it]

URL filtered: https://twitter.com/scribblercat/status/1497338480061526018


Processing URLs:   3%|▎         | 26/1000 [00:32<16:36,  1.02s/it]

Error extracting text from https://www.thecipherbrief.com/article/trump-russia-and-cia-allies-and-adversaries-confused-1091: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/trump-russia-and-cia-allies-and-adversaries-confused-1091


Processing URLs:   3%|▎         | 30/1000 [00:38<18:09,  1.12s/it]

Error extracting text from https://www.wsj.com/articles/venezuela-to-make-1-1-billion-payment-for-pdvsa-bond-friday-1509664294: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuela-to-make-1-1-billion-payment-for-pdvsa-bond-friday-1509664294


Processing URLs:   3%|▎         | 32/1000 [00:39<12:18,  1.31it/s]

Error extracting text from http://www.mercurynews.com/business/ci_29920605/visa-involved-tesla-factory-expansion-sparks-debate: 404 Client Error: Not Found for url: https://www.mercurynews.com/business/ci_29920605/visa-involved-tesla-factory-expansion-sparks-debate
Error extracting text from http://www.nytimes.com/2016/02/03/opinion/to-end-syrias-war-help-assads-officers-defect.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/03/opinion/to-end-syrias-war-help-assads-officers-defect.html


Processing URLs:   4%|▎         | 36/1000 [00:42<13:22,  1.20it/s]

Error extracting text from http://www.governor.ny.gov/news/governor-cuomo-announces-first-successful-autonomous-vehicle-demonstrations-new-york-state: 403 Client Error: Forbidden for url: https://www.governor.ny.gov/news/governor-cuomo-announces-first-successful-autonomous-vehicle-demonstrations-new-york-state


Processing URLs:   4%|▍         | 45/1000 [01:04<19:52,  1.25s/it]  

Error extracting text from http://www.nytimes.com/2015/11/16/business/international/japan-economy-contracts-0-8-returning-to-recession.html?emc=edit_th_20151116&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/16/business/international/japan-economy-contracts-0-8-returning-to-recession.html?emc=edit_th_20151116&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:   5%|▍         | 46/1000 [01:06<21:59,  1.38s/it]

Error extracting text from http://www.londonmapper.org.uk/analysis/poverty-and-wealth-1980-2010/: HTTPSConnectionPool(host='london.worldmapper.org', port=443): Max retries exceeded with url: /analysis/poverty-and-wealth-1980-2010/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'london.worldmapper.org'. (_ssl.c:1000)")))


Processing URLs:   5%|▍         | 47/1000 [01:10<33:46,  2.13s/it]

Error extracting text from http://www.cfr.org/nigeria/nigeria-security-tracker/p29483&quot: 404 Client Error: Not Found for url: https://www.cfr.org/nigeria/nigeria-security-tracker/p29483&quot


Processing URLs:   5%|▍         | 48/1000 [01:10<26:00,  1.64s/it]

Error extracting text from https://theconversation.com/europe-wades-into-debate-over-polands-constitutional-crisis-53575: 403 Client Error: Forbidden for url: https://theconversation.com/europe-wades-into-debate-over-polands-constitutional-crisis-53575


Processing URLs:   5%|▌         | 53/1000 [01:19<25:35,  1.62s/it]

Error extracting text from http://archive.mid.ru//bdomp/brp_4.nsf/english/FDB1C2C6F7427FE4C3257D88004155B5: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Error extracting text from http://www.nytimes.com/2016/10/17/world/middleeast/in-isis-held-mosul-beheadings-and-hints-of-resistance-as-battle-nears.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/17/world/middleeast/in-isis-held-mosul-beheadings-and-hints-of-resistance-as-battle-nears.html?_r=0


Processing URLs:   5%|▌         | 54/1000 [01:21<28:30,  1.81s/it]

Error extracting text from https://www.nbc12.com/2021/01/22/russian-welcomes-us-proposal-extend-nuclear-treaty/: 404 Client Error: Not Found for url: https://www.12onyourside.com/2021/01/22/russian-welcomes-us-proposal-extend-nuclear-treaty/


Processing URLs:   6%|▌         | 57/1000 [01:25<17:59,  1.15s/it]

Error extracting text from http://www.iol.co.za/news/politics/scopa-blasts-anti-corruption-task-team-2068219: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/scopa-blasts-anti-corruption-task-team-2068219
Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/venezuela-offers-chocolates-but-little-else-to-creditors-idUSKBN1DD0IG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/venezuela-offers-chocolates-but-little-else-to-creditors-idUSKBN1DD0IG?il=0


Processing URLs:   6%|▌         | 60/1000 [01:45<1:14:25,  4.75s/it]

Error extracting text from https://uk.reuters.com/article/us-britain-eu/ten-days-to-crack-brexit-deal-eu-tells-may-idUKKBN1DO0PJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   6%|▋         | 65/1000 [01:51<22:54,  1.47s/it]  

Error extracting text from http://www.wsj.com/articles/congressional-leaders-agree-to-lift-40-year-ban-on-oil-exports-1450242995: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/congressional-leaders-agree-to-lift-40-year-ban-on-oil-exports-1450242995
Error extracting text from http://www.hindustantimes.com/india-news/india-used-goa-brics-summit-to-outmanoeuvre-pakistan-chinese-media/story-c4vOGR2L37mboPnGk8R4QP.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/india-news/india-used-goa-brics-summit-to-outmanoeuvre-pakistan-chinese-media/story-c4vOGR2L37mboPnGk8R4QP.html


Processing URLs:   7%|▋         | 66/1000 [01:51<19:43,  1.27s/it]

URL filtered: https://twitter.com/Charles_Lister/status/1396887935132241920


Processing URLs:   7%|▋         | 71/1000 [02:01<24:17,  1.57s/it]

Error extracting text from http://www.cio.com/article/3088566/security/researchers-steal-data-from-a-pc-by-controlling-noise-from-fans.html: 404 Client Error: Not Found for url: https://www.cio.com/article/3088566/security/researchers-steal-data-from-a-pc-by-controlling-noise-from-fans.html


Processing URLs:   7%|▋         | 73/1000 [02:03<22:15,  1.44s/it]

Error extracting text from https://af.reuters.com/article/commoditiesNews/idAFL2N1N20IT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:   8%|▊         | 76/1000 [02:07<17:55,  1.16s/it]

Error extracting text from https://www.fbi.gov/news/pressrel/press-releases/statement-by-fbi-director-james-b.-comey-on-the-investigation-of-secretary-hillary-clintons-use-of-a-personal-e-mail-system: 403 Client Error: Forbidden for url: https://www.fbi.gov/news/pressrel/press-releases/statement-by-fbi-director-james-b.-comey-on-the-investigation-of-secretary-hillary-clintons-use-of-a-personal-e-mail-system
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16M01B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16M01B


Processing URLs:   8%|▊         | 81/1000 [02:12<14:14,  1.08it/s]

Error extracting text from https://www.reuters.com/article/us-china-coastguard-law/china-authorises-coast-guard-to-fire-on-foreign-vessels-if-needed-idUSKBN29R1ER: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-coastguard-law/china-authorises-coast-guard-to-fire-on-foreign-vessels-if-needed-idUSKBN29R1ER
Error extracting text from https://www.nato.int/cps/en/natohq/topics_49187.htm: 403 Client Error: Forbidden for url: https://www.nato.int/cps/en/natohq/topics_49187.htm


Processing URLs:   9%|▉         | 88/1000 [02:20<14:09,  1.07it/s]

Error extracting text from http://www.amazon.com/Dark-Territory-Secret-History-Cyber-ebook/dp/B010MHABUY/ref=sr_1_sc_1?s=digital-text&amp;ie=UTF8&amp;qid=1457643750&amp;sr=1-1-spell&amp;keywords=Dark+Terriorty: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Dark-Territory-Secret-History-Cyber-ebook/dp/B010MHABUY/ref=sr_1_sc_1?s=digital-text&amp;ie=UTF8&amp;qid=1457643750&amp;sr=1-1-spell&amp;keywords=Dark+Terriorty
Error extracting text from http://www.reuters.com/article/2015/11/10/us-venezuela-election-idUSKCN0SZ33U20151110#WczPj0eczrYPf9oy.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/10/us-venezuela-election-idUSKCN0SZ33U20151110#WczPj0eczrYPf9oy.99
Error extracting text from http://www.washingtontimes.com/news/2017/may/4/congress-approves-1-trillion-spending-bill/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/may/4/congress-approves-1-trillion-spending-bill/


Processing URLs:   9%|▉         | 92/1000 [02:25<16:00,  1.06s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-12-05/chicago-tests-bond-market-alchemy-with-debut-of-aaa-rated-debt


Processing URLs:  10%|▉         | 99/1000 [03:30<54:20,  3.62s/it]  

Error extracting text from http://www.wsj.com/articles/yemen-bombing-kills-at-least-23-people-1482046988: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/yemen-bombing-kills-at-least-23-people-1482046988


Processing URLs:  10%|█         | 101/1000 [03:33<38:30,  2.57s/it]

Error extracting text from http://engineering.tamu.edu/hyperloop/teams: 404 Client Error: Not Found for url: https://engineering.tamu.edu/hyperloop/teams/index.html
URL filtered: https://www.bloomberg.com/news/articles/2017-11-08/russian-hackers-fueled-catalan-separatism-madrid-institute-says?cmpid=socialflow-twitter-business&amp;utm_content=business&amp;utm_campaign=socialflow-organic&amp;utm_source=twitter&amp;utm_medium=social


Processing URLs:  10%|█         | 105/1000 [03:38<21:07,  1.42s/it]

Error extracting text from https://www.wsj.com/articles/joe-manchins-voting-compromise-11624401491: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/joe-manchins-voting-compromise-11624401491


Processing URLs:  11%|█         | 107/1000 [03:56<1:03:57,  4.30s/it]

Error extracting text from http://www.roadandtrack.com/new-cars/future-cars/videos/a31252/nextev-electric-supercar-nurburgring/: 403 Client Error: Forbidden for url: http://www.roadandtrack.com/new-cars/future-cars/videos/a31252/nextev-electric-supercar-nurburgring/


Processing URLs:  11%|█         | 109/1000 [03:59<41:26,  2.79s/it]  

Error extracting text from http://www.wsj.com/articles/in-syria-crisis-russia-expands-alliance-with-iran-increases-missile-presence-1475784644: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-syria-crisis-russia-expands-alliance-with-iran-increases-missile-presence-1475784644


Processing URLs:  11%|█         | 111/1000 [04:01<29:02,  1.96s/it]

Error extracting text from http://finance.yahoo.com/news/japan-scrambles-jets-china-makes-025813597.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/japan-scrambles-jets-china-makes-025813597.html


Processing URLs:  11%|█▏        | 113/1000 [04:15<1:11:45,  4.85s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/vietnam-warns-china-over-oil-rig-activities/2016/01/19/e83c37ba-bf15-11e5-98c8-7fab78677d51_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/vietnam-warns-china-over-oil-rig-activities/2016/01/19/e83c37ba-bf15-11e5-98c8-7fab78677d51_story.html


Processing URLs:  11%|█▏        | 114/1000 [04:17<55:25,  3.75s/it]  

URL filtered: https://www.bloomberg.com/politics/articles/2017-02-28/ryan-said-to-forge-unexpected-alliance-with-bannon-on-border-tax


Processing URLs:  12%|█▏        | 123/1000 [04:24<12:55,  1.13it/s]

Error extracting text from http://finance.yahoo.com/news/30-oil-irrational-rebound-inevitable-194048472.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/30-oil-irrational-rebound-inevitable-194048472.html
Error extracting text from https://www.iol.co.za/business-report/economy/breaking-anc-will-speak-to-zuma-about-exit-this-week-13028001: 403 Client Error: Forbidden for url: https://www.iol.co.za/business-report/economy/breaking-anc-will-speak-to-zuma-about-exit-this-week-13028001
URL filtered: https://www.rt.com/viral/379469-facebook-disputed-news-crackdown/


Processing URLs:  13%|█▎        | 126/1000 [04:28<16:19,  1.12s/it]

Error extracting text from http://www.latimes.com/politics/la-na-pol-court-guns-20180220-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-court-guns-20180220-story.html


Processing URLs:  14%|█▎        | 136/1000 [04:54<19:05,  1.33s/it]  

Error extracting text from http://www.latimes.com/politics/la-na-pol-trump-special-counsel-20170616-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-trump-special-counsel-20170616-story.html
Error extracting text from http://www.wsj.com/articles/russia-to-continue-military-support-to-syria-says-vladimir-putin-1442313043: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-to-continue-military-support-to-syria-says-vladimir-putin-1442313043


Processing URLs:  14%|█▍        | 139/1000 [04:57<13:28,  1.07it/s]

Error extracting text from http://www.wsj.com/articles/calls-mount-for-ouster-of-brazil-president-dilma-rousseff-1457996676: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/calls-mount-for-ouster-of-brazil-president-dilma-rousseff-1457996676
Error extracting text from https://www.wsj.com/articles/north-koreas-missiles-and-nuclear-weapons-everything-you-need-to-know-11610712018#:~:text=North%20Korea&: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-koreas-missiles-and-nuclear-weapons-everything-you-need-to-know-11610712018#:~:text=North%20Korea&


Processing URLs:  14%|█▍        | 143/1000 [05:04<18:01,  1.26s/it]

Error extracting text from http://www.nytimes.com/2016/03/01/world/middleeast/iran-elections.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/01/world/middleeast/iran-elections.html?_r=0


Processing URLs:  14%|█▍        | 144/1000 [05:08<28:38,  2.01s/it]

Error extracting text from https://www.thetimes.co.uk/article/nicola-sturgeon-launches-snp-election-campaign-with-shot-at-alex-salmonds-self-interest-3lggsmzjg: 404 Client Error: Not Found for url: https://www.thetimes.co.uk/article/nicola-sturgeon-launches-snp-election-campaign-with-shot-at-alex-salmonds-self-interest-3lggsmzjg


Processing URLs:  15%|█▍        | 146/1000 [05:12<27:03,  1.90s/it]

Error extracting text from http://thehill.com/homenews/house/322903-the-hills-whip-list-where-republicans-stand-on-obamacare-repeal-plan: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/322903-the-hills-whip-list-where-republicans-stand-on-obamacare-repeal-plan/


Processing URLs:  15%|█▍        | 147/1000 [05:12<19:54,  1.40s/it]

Error extracting text from https://www.nytimes.com/2021/02/05/world/asia/myanmar-coup-china-united-states.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/05/world/asia/myanmar-coup-china-united-states.html


Processing URLs:  15%|█▍        | 148/1000 [05:12<14:56,  1.05s/it]

Error extracting text from https://www.nytimes.com/live/2022/03/16/world/ukraine-russia-war/us-officials-say-russian-troop-deaths-are-climbing-threatening-its-militarys-morale: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/03/16/world/ukraine-russia-war/us-officials-say-russian-troop-deaths-are-climbing-threatening-its-militarys-morale


Processing URLs:  15%|█▌        | 152/1000 [05:19<21:05,  1.49s/it]

URL filtered: https://www.youtube.com/watch?v=aJ7apo19fDM


Processing URLs:  16%|█▌        | 156/1000 [05:21<10:04,  1.40it/s]

Error extracting text from http://www.worldbulletin.net/balkans/165196/growing-support-for-montenegro-nato-membership-bid: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/balkans/165196/growing-support-for-montenegro-nato-membership-bid
Error extracting text from http://www.reuters.com/article/idUSKCN0ZK06E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0ZK06E


Processing URLs:  16%|█▌        | 160/1000 [05:24<07:52,  1.78it/s]

Error extracting text from https://theconversation.com/scotland-heads-towards-a-second-independence-referendum-74491: 403 Client Error: Forbidden for url: https://theconversation.com/scotland-heads-towards-a-second-independence-referendum-74491
Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN14S0QH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN14S0QH


Processing URLs:  17%|█▋        | 167/1000 [05:38<27:20,  1.97s/it]

Error extracting text from http://www.newsweek.com/russia-sends-1-million-bullets-kurds-mosul-offensive-isis-490458: 403 Client Error: Forbidden for url: https://www.newsweek.com/russia-sends-1-million-bullets-kurds-mosul-offensive-isis-490458
URL filtered: http://www.bloomberg.com/news/articles/2016-05-09/elon-musk-s-tesla-strategy-win-big-by-falling-short


Processing URLs:  17%|█▋        | 173/1000 [05:51<27:46,  2.01s/it]

Error extracting text from http://www.foxnews.com/leisure/2016/09/09/bad-electrical-connection-blamed-for-tesla-model-s-fire-in-france/: 503 Server Error: Service Unavailable for url: https://www.foxnews.com/leisure/2016/09/09/bad-electrical-connection-blamed-for-tesla-model-s-fire-in-france/


Processing URLs:  18%|█▊        | 176/1000 [05:53<15:31,  1.13s/it]

Error extracting text from http://www.nytimes.com/2015/11/14/world/middleeast/sinjar-iraq-islamic-state.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/14/world/middleeast/sinjar-iraq-islamic-state.html


Processing URLs:  18%|█▊        | 177/1000 [05:54<15:44,  1.15s/it]

Error extracting text from http://in.reuters.com/article/us-brazil-corruption-appeals-idINKBN0TN0ZY20151204: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  18%|█▊        | 178/1000 [05:56<17:13,  1.26s/it]

Error extracting text from http://newsdaily.com/2016/02/china-says-really-needs-south-china-sea-defenses-in-face-of-united-states/#0HJMK3fg6p5rTDgY.99: 403 Client Error: Forbidden for url: https://www.sciencedaily.com/2016/02/china-says-really-needs-south-china-sea-defenses-in-face-of-united-states/#0HJMK3fg6p5rTDgY.99


Processing URLs:  18%|█▊        | 181/1000 [05:59<16:04,  1.18s/it]

URL filtered: http://www.bloomberg.com/quote/SHCOMP:IND


Processing URLs:  18%|█▊        | 183/1000 [05:59<10:30,  1.30it/s]

Error extracting text from http://www.heraldnet.com/article/20151002/NEWS02/151009816&gt: 403 Client Error: Forbidden for url: http://www.heraldnet.com/article/20151002/NEWS02/151009816&gt


Processing URLs:  18%|█▊        | 185/1000 [06:01<12:13,  1.11it/s]

Error extracting text from https://www.reuters.com/world/europe/romania-remains-vaccine-sceptical-despite-surge-covid-19-cases-2021-10-11/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/romania-remains-vaccine-sceptical-despite-surge-covid-19-cases-2021-10-11/


Processing URLs:  19%|█▉        | 189/1000 [06:13<30:26,  2.25s/it]

Error extracting text from http://www.reuters.com/article/us-spain-politics-idUSKCN0ZT177?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-idUSKCN0ZT177?il=0


Processing URLs:  19%|█▉        | 190/1000 [06:14<26:45,  1.98s/it]

Error extracting text from http://www.scotsman.com/news/politics/independent-scotland-would-have-to-reapply-to-eu-1-3232221: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/independent-scotland-would-have-to-reapply-to-eu-1-3232221


Processing URLs:  19%|█▉        | 193/1000 [06:36<1:07:37,  5.03s/it]

Error extracting text from https://www.afghanistan-analysts.org/afghanistan-election-conundrum-2-a-tight-date-and-a-debate-about-technology/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/afghanistan-election-conundrum-2-a-tight-date-and-a-debate-about-technology/


Processing URLs:  20%|█▉        | 195/1000 [06:46<1:14:12,  5.53s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/japan-us-s-korea-agree-to-step-up-pressure-on-n-korea/2016/10/27/e4a02946-9c04-11e6-b552-b1f85e484086_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/japan-us-s-korea-agree-to-step-up-pressure-on-n-korea/2016/10/27/e4a02946-9c04-11e6-b552-b1f85e484086_story.html


Processing URLs:  20%|█▉        | 197/1000 [06:49<44:13,  3.30s/it]  

Error extracting text from http://www.reuters.com/article/2015/09/17/us-usa-oilexports-house-idUSKCN0RH26Z20150917: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/17/us-usa-oilexports-house-idUSKCN0RH26Z20150917


Processing URLs:  20%|██        | 200/1000 [07:11<1:15:51,  5.69s/it]

Error extracting text from http://www.wsj.com/articles/hilsenrath-analysis-jobs-report-clears-the-way-for-fed-rate-increase-1449237755: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/hilsenrath-analysis-jobs-report-clears-the-way-for-fed-rate-increase-1449237755


Processing URLs:  21%|██        | 207/1000 [07:27<39:40,  3.00s/it]  

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKCN0X800I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKCN0X800I


Processing URLs:  21%|██        | 209/1000 [07:37<51:19,  3.89s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/police-ask-canada-government-to-postpone-legal-marijuana/2017/09/12/3f46af38-97fd-11e7-af6a-6555caaeb8dc_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/police-ask-canada-government-to-postpone-legal-marijuana/2017/09/12/3f46af38-97fd-11e7-af6a-6555caaeb8dc_story.html


Processing URLs:  21%|██        | 211/1000 [07:39<35:09,  2.67s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_120085.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_120085.htm
Error extracting text from http://ajw.asahi.com/article/asia/korean_peninsula/AJ201509280034: HTTPConnectionPool(host='ajw.asahi.com', port=80): Max retries exceeded with url: /article/asia/korean_peninsula/AJ201509280034 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3032039b0>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  22%|██▏       | 215/1000 [07:42<16:29,  1.26s/it]

Error extracting text from http://www.wsj.com/articles/kuczynski-maintains-thin-lead-in-peru-presidential-election-1465305666: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/kuczynski-maintains-thin-lead-in-peru-presidential-election-1465305666


Processing URLs:  22%|██▏       | 218/1000 [07:47<20:34,  1.58s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-10-17/iran-to-boost-oil-output-to-4-million-barrels-as-opec-plans-cut


Processing URLs:  22%|██▏       | 222/1000 [07:50<10:41,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-iran-military-gulf-idUSKCN1101SC?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-military-gulf-idUSKCN1101SC?mod=related&amp;channelName=worldNews
Error extracting text from http://www.reuters.com/article/2015/11/01/us-southkorea-japan-china-idUSKCN0SQ1GV20151101#ga357hG12V5G7Ejk.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/01/us-southkorea-japan-china-idUSKCN0SQ1GV20151101#ga357hG12V5G7Ejk.97


Processing URLs:  22%|██▏       | 224/1000 [07:51<08:16,  1.56it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=fa&amp;u=http://www.tasnimnews.com/fa/news/1394/12/08/1012869/: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=fa&amp;u=http://www.tasnimnews.com/fa/news/1394/12/08/1012869/


Processing URLs:  23%|██▎       | 226/1000 [07:56<19:05,  1.48s/it]

Error extracting text from https://www.cairn.info/les-chocs-du-futur--9782100779413-page-248.htm: 403 Client Error: Forbidden for url: https://www.cairn.info/les-chocs-du-futur--9782100779413-page-248.htm


Processing URLs:  23%|██▎       | 234/1000 [08:17<17:53,  1.40s/it]

Error extracting text from https://www.nytimes.com/2017/04/19/business/energy-environment/exxon-mobil-russia-sanctions-waiver-oil.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/19/business/energy-environment/exxon-mobil-russia-sanctions-waiver-oil.html


Processing URLs:  24%|██▎       | 235/1000 [08:18<14:19,  1.12s/it]

Error extracting text from https://github.com/Rochester-NRT/AlphaGo: 404 Client Error: Not Found for url: https://github.com/Rochester-NRT/AlphaGo


Processing URLs:  24%|██▍       | 238/1000 [08:24<16:42,  1.32s/it]

Error extracting text from http://www.wsj.com/articles/nasa-advisory-group-raises-concerns-about-spacex-rocket-fueling-plans-1477955860?mod=e2tw: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nasa-advisory-group-raises-concerns-about-spacex-rocket-fueling-plans-1477955860?mod=e2tw
Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0WB235: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0WB235


Processing URLs:  24%|██▍       | 240/1000 [08:25<11:41,  1.08it/s]

Error extracting text from http://www.wsj.com/articles/south-korean-president-seeks-irans-help-on-pyongyang-sanctions-1462182852: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korean-president-seeks-irans-help-on-pyongyang-sanctions-1462182852


Processing URLs:  24%|██▍       | 241/1000 [08:30<28:53,  2.28s/it]

Error extracting text from http://wallstreetwindow.com/node/13082: 404 Client Error: Not Found for url: https://wallstreetwindow.com/node/13082


Processing URLs:  24%|██▍       | 244/1000 [08:34<18:01,  1.43s/it]

Error extracting text from https://www.nytimes.com/2021/01/01/us/politics/iran-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/01/us/politics/iran-trump.html


Processing URLs:  25%|██▍       | 246/1000 [08:36<14:34,  1.16s/it]

URL filtered: https://www.youtube.com/watch?v=zVatBy_4GpM


Processing URLs:  25%|██▍       | 249/1000 [08:56<57:15,  4.57s/it]

Error extracting text from https://www.washingtonpost.com/politics/whitehouse/key-members-of-trumps-circle-under-scrutiny-for-russia-ties/2017/03/13/7d88ec1c-0837-11e7-bd19-fd3afa0f7e2a_story.html?utm_term=.998ed2985b0c: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/whitehouse/key-members-of-trumps-circle-under-scrutiny-for-russia-ties/2017/03/13/7d88ec1c-0837-11e7-bd19-fd3afa0f7e2a_story.html?utm_term=.998ed2985b0c


Processing URLs:  25%|██▌       | 252/1000 [08:59<29:43,  2.38s/it]

Error extracting text from https://www.middleeastmonitor.com/20210130-ethiopia-accuses-sudan-of-occupying-its-lands-fighting-by-proxy-for-third-party/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20210130-ethiopia-accuses-sudan-of-occupying-its-lands-fighting-by-proxy-for-third-party/


Processing URLs:  25%|██▌       | 253/1000 [09:03<32:50,  2.64s/it]

Error extracting text from http://38north.org/2015/09/schilling092815/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  25%|██▌       | 254/1000 [09:03<24:10,  1.94s/it]

Error extracting text from https://www.yahoo.com/news/eu-ministers-scramble-salvage-us-trade-deal-040001311.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/eu-ministers-scramble-salvage-us-trade-deal-040001311.html


Processing URLs:  26%|██▌       | 256/1000 [09:07<27:33,  2.22s/it]

Error extracting text from http://www.kiro7.com/news/trending-now/obamacare-immigration-canceled-executive-orders-what-donald-trump-says-he-will-do-the-first-100_/481233001: 404 Client Error: Not Found for url: https://www.kiro7.com/news/trending-now/obamacare-immigration-canceled-executive-orders-what-donald-trump-says-he-will-do-the-first-100_/481233001/


Processing URLs:  26%|██▌       | 257/1000 [09:08<20:14,  1.63s/it]

Error extracting text from https://www.nytimes.com/live/2021/04/08/business/stock-market-today#amazon-union-vote: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/04/08/business/stock-market-today#amazon-union-vote


Processing URLs:  26%|██▌       | 258/1000 [09:08<15:59,  1.29s/it]

Error extracting text from https://www.insightonconflict.org/conflicts/kashmir/conflict-profile/: HTTPSConnectionPool(host='www.insightonconflict.org', port=443): Max retries exceeded with url: /conflicts/kashmir/conflict-profile/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30364f320>: Failed to resolve 'www.insightonconflict.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  26%|██▌       | 261/1000 [10:28<4:24:09, 21.45s/it]

Error extracting text from http://aa.com.tr/en/middle-east/4-iran-linked-afghan-shia-militiamen-killed-in-syria/1064275 : HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 267/1000 [10:35<42:47,  3.50s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-01-12/putin-says-sheltering-assad-would-be-easier-than-snowden-asylum


Processing URLs:  27%|██▋       | 273/1000 [10:41<17:49,  1.47s/it]

Error extracting text from https://www.nytimes.com/2017/05/02/business/greece-debt-crisis-eu-deal.html?emc=edit_mbe_20170502&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/02/business/greece-debt-crisis-eu-deal.html?emc=edit_mbe_20170502&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1


Processing URLs:  28%|██▊       | 277/1000 [10:45<12:24,  1.03s/it]

Error extracting text from https://www.timesofisrael.com/poll-most-israelis-want-netanyahu-to-resign-if-police-recommend-indictment/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/poll-most-israelis-want-netanyahu-to-resign-if-police-recommend-indictment/


Processing URLs:  28%|██▊       | 280/1000 [10:56<24:23,  2.03s/it]

Error extracting text from http://zikavirusnet.com/: HTTPConnectionPool(host='zikavirusnet.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3033203e0>: Failed to resolve 'zikavirusnet.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  28%|██▊       | 283/1000 [10:59<16:17,  1.36s/it]

Error extracting text from https://theconversation.com/why-uk-could-be-doomed-to-years-without-proper-access-to-world-trade-61782: 403 Client Error: Forbidden for url: https://theconversation.com/why-uk-could-be-doomed-to-years-without-proper-access-to-world-trade-61782


Processing URLs:  28%|██▊       | 284/1000 [11:00<17:00,  1.43s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-04/brazil-assets-poised-to-rally-as-lula-targeted-in-police-raid


Processing URLs:  29%|██▊       | 287/1000 [11:03<12:43,  1.07s/it]

Error extracting text from http://nationalinterest.org/feature/deadly-lessons-the-last-time-china-america-went-war-11558: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/deadly-lessons-the-last-time-china-america-went-war-11558


Processing URLs:  29%|██▉       | 289/1000 [11:04<10:19,  1.15it/s]

Error extracting text from http://www.channelnewsasia.com/news/world/poland-crisis-escalates/2671990.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/poland-crisis-escalates/2671990.html
Error extracting text from http://www.reuters.com/article/us-britain-regulation-ipos-idUSKBN19Y0VG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-regulation-ipos-idUSKBN19Y0VG


Processing URLs:  29%|██▉       | 290/1000 [11:06<12:22,  1.05s/it]

Error extracting text from http://uk.reuters.com/article/uk-iran-total-southpars-idUKKBN19O1A2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  29%|██▉       | 291/1000 [11:07<12:48,  1.08s/it]

URL filtered: https://www.bloomberg.com/news/features/2021-06-27/did-covid-come-from-a-lab-scientist-at-wuhan-institute-speaks-out


Processing URLs:  29%|██▉       | 293/1000 [11:07<08:11,  1.44it/s]

Error extracting text from http://www.chron.com/news/article/Top-Massachusetts-Democrats-lining-up-behind-6633938.php: 403 Client Error: Forbidden for url: https://www.chron.com/news/article/Top-Massachusetts-Democrats-lining-up-behind-6633938.php
URL filtered: https://twitter.com/EJ_Burrows/status/1497253014960324614


Processing URLs:  30%|███       | 300/1000 [11:20<14:10,  1.22s/it]

Error extracting text from http://www.thesun.co.uk/sol/homepage/news/politics/6882592/Norway-urges-Britain-to-leave-EU.html: 404 Client Error: Not Found for url: https://www.thesun.co.uk/sol/homepage/news/politics/6882592/Norway-urges-Britain-to-leave-EU.html


Processing URLs:  31%|███       | 306/1000 [11:28<12:57,  1.12s/it]

Error extracting text from http://www.therepublic.com/2017/02/01/un-united-nations-syria-2/: 403 Client Error: Forbidden for url: https://www.therepublic.com/2017/02/01/un-united-nations-syria-2/


Processing URLs:  31%|███       | 309/1000 [11:33<18:25,  1.60s/it]

Error extracting text from http://taskandpurpose.com/afghanistan-helmand-marines-trump-strategy/: 404 Client Error: Not Found for url: https://taskandpurpose.com/afghanistan-helmand-marines-trump-strategy/


Processing URLs:  31%|███       | 311/1000 [11:37<17:12,  1.50s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://agenciabrasil.ebc.com.br/politica/noticia/2016-03/comeca-contagem-de-prazo-para-dilma-apresentar-defesa-comissao-do&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://agenciabrasil.ebc.com.br/politica/noticia/2016-03/comeca-contagem-de-prazo-para-dilma-apresentar-defesa-comissao-do&amp;prev=search


Processing URLs:  31%|███▏      | 314/1000 [11:43<22:55,  2.00s/it]

Error extracting text from http://seenandoverheard.blog.dayton.com/2015/08/12/what-you-need-to-know-about-mike-turners-bride-to-be/: 404 Client Error: Not Found for url: https://www.dayton.com/blog/seen-and-overheard/2015/08/12/what-you-need-to-know-about-mike-turners-bride-to-be/


Processing URLs:  32%|███▏      | 315/1000 [11:45<23:50,  2.09s/it]

Error extracting text from http://www.ibtimes.co.uk/just-world-bias-has-twisted-media-coverage-donald-trump-campaign-1547151: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/just-world-bias-has-twisted-media-coverage-donald-trump-campaign-1547151


Processing URLs:  32%|███▏      | 316/1000 [11:46<18:40,  1.64s/it]

Error extracting text from http://www.palestine-studies.org/jps/fulltext/41405: 403 Client Error: Forbidden for url: https://www.palestine-studies.org/jps/fulltext/41405


Processing URLs:  32%|███▏      | 317/1000 [11:47<15:21,  1.35s/it]

Error extracting text from http://globalriskinsights.com/2016/04/missed-deadline-farc-deal-colombian-government/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2016/04/missed-deadline-farc-deal-colombian-government/
URL filtered: https://twitter.com/ChristopherJM/status/1473274123681992706?s=20


Processing URLs:  32%|███▏      | 321/1000 [11:50<10:37,  1.07it/s]

Error extracting text from https://www.nytimes.com/2017/10/03/world/middleeast/mattis-iran-deal-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;amp: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/03/world/middleeast/mattis-iran-deal-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;amp


Processing URLs:  32%|███▏      | 322/1000 [11:53<18:06,  1.60s/it]

Error extracting text from http://www.theweek.co.uk/eu-referendum/63710/eu-referendum-david-cameron-criticised-for-lack-of-brexit-plan: 404 Client Error: Not Found for url: https://theweek.com/eu-referendum/63710/eu-referendum-david-cameron-criticised-for-lack-of-brexit-plan


Processing URLs:  32%|███▎      | 325/1000 [12:05<28:12,  2.51s/it]

Error extracting text from https://asean.org/asean-china-reaffirm-commitment-strong-partnership/: 403 Client Error: Forbidden for url: https://asean.org/asean-china-reaffirm-commitment-strong-partnership/


Processing URLs:  33%|███▎      | 334/1000 [12:19<16:43,  1.51s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2016/01/15/0401000000AEN20160115010900315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  34%|███▎      | 336/1000 [12:22<17:13,  1.56s/it]

Error extracting text from https://www2.helsinki.fi/en/verifin-finnish-institute-for-verification-of-the-chemical-weapons-convention/finnish-actions-implementing-united-nations-resolution-1540: 404 Client Error: Not Found for url: https://www.helsinki.fi/en/verifin-finnish-institute-for-verification-of-the-chemical-weapons-convention/finnish-actions-implementing-united-nations-resolution-1540
URL filtered: https://twitter.com/potus


Processing URLs:  34%|███▍      | 338/1000 [12:23<13:01,  1.18s/it]

Error extracting text from http://www.channelnewsasia.com/news/business/euro-barely-changed-ahead/2209660.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/business/euro-barely-changed-ahead/2209660.html


Processing URLs:  34%|███▍      | 341/1000 [12:31<26:35,  2.42s/it]

Error extracting text from http://www.dtt-net.com/en/news/3648/39/Census-launched-in-North-Macedonia-after-19-years/: 404 Client Error: Not Found for url: https://dtt-net.com/en/news/3648/39/Census-launched-in-North-Macedonia-after-19-years/


Processing URLs:  34%|███▍      | 343/1000 [12:34<20:10,  1.84s/it]

URL filtered: https://www.youtube.com/watch?v=bY3uW-QYfj4


Processing URLs:  34%|███▍      | 345/1000 [12:34<11:48,  1.08s/it]

Error extracting text from https://www3.nhk.or.jp/nhkworld/en/news/20180211_11/: 404 Client Error: Not Found for url: https://www3.nhk.or.jp/nhkworld/en/news/20180211_11/


Processing URLs:  35%|███▌      | 350/1000 [12:44<17:25,  1.61s/it]

Error extracting text from http://finance.yahoo.com/news/north-korea-playing-clever-game-040416804.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/north-korea-playing-clever-game-040416804.html


Processing URLs:  36%|███▌      | 356/1000 [12:54<15:24,  1.44s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-casualties-idUSKCN1050EN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-casualties-idUSKCN1050EN


Processing URLs:  36%|███▌      | 358/1000 [12:58<18:42,  1.75s/it]

URL filtered: http://www.tripwire.com/state-of-security/latest-security-news/blackenergy-involved-in-targeted-attack-against-boryspil-airport-says-ukraine/#.VpzbafJKuv8.twitter


Processing URLs:  36%|███▌      | 361/1000 [13:03<18:12,  1.71s/it]

Error extracting text from https://www.reuters.com/business/energy/certifying-nord-stream-2-is-no-threat-gas-supply-eu-german-ministry-2021-10-26/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/certifying-nord-stream-2-is-no-threat-gas-supply-eu-german-ministry-2021-10-26/


Processing URLs:  36%|███▋      | 365/1000 [13:06<11:54,  1.13s/it]

Error extracting text from http://cdec.water.ca.gov/cdecapp/resapp/getResGraphsMain.action: 404 Client Error: Not Found for url: http://cdec.water.ca.gov/cdecapp/resapp/getResGraphsMain.action


Processing URLs:  37%|███▋      | 367/1000 [13:09<14:06,  1.34s/it]

Error extracting text from http://www.ibtimes.com/south-china-sea-dispute-timeline-history-chinese-us-involvement-contested-region-2158499: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-dispute-timeline-history-chinese-us-involvement-contested-region-2158499
Error extracting text from https://www.reuters.com/world/us/kremlin-says-date-location-putin-biden-summit-not-yet-decided-2021-04-26/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/kremlin-says-date-location-putin-biden-summit-not-yet-decided-2021-04-26/


Processing URLs:  37%|███▋      | 370/1000 [13:14<15:59,  1.52s/it]

Error extracting text from http://www.newsweek.com/us-russia-nuclear-arms-race-over-and-russia-has-won-581704: 403 Client Error: Forbidden for url: https://www.newsweek.com/us-russia-nuclear-arms-race-over-and-russia-has-won-581704


Processing URLs:  38%|███▊      | 375/1000 [13:21<16:31,  1.59s/it]

URL filtered: https://twitter.com/azarijahromi/status/1221879096592031755


Processing URLs:  38%|███▊      | 377/1000 [13:24<16:12,  1.56s/it]

Error extracting text from http://www.thenational.ae/opinion/comment/mosul-victory-will-heighten-displaced-persons-crisis: 404 Client Error: Not Found for url: https://www.thenationalnews.com/opinion/comment/mosul-victory-will-heighten-displaced-persons-crisis/


Processing URLs:  38%|███▊      | 380/1000 [13:28<14:35,  1.41s/it]

Error extracting text from https://dailystar.com.lb/News/Middle-East/2018/Feb-21/438802-netanyahu-confidant-agrees-to-testify-against-him-reports.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2018/Feb-21/438802-netanyahu-confidant-agrees-to-testify-against-him-reports.ashx


Processing URLs:  38%|███▊      | 382/1000 [13:31<15:00,  1.46s/it]

Error extracting text from https://www.espn.com/college-sports/story/_/id/31172473/supreme-court-questions-validity-amateurism-ncaa-business-model: 403 Client Error: Forbidden for url: https://www.espn.com/college-sports/story/_/id/31172473/supreme-court-questions-validity-amateurism-ncaa-business-model


Processing URLs:  39%|███▊      | 387/1000 [13:44<18:51,  1.85s/it]

Error extracting text from http://www.straitstimes.com/world/europe/whats-next-for-germany-after-collapse-of-coalition-talks: 403 Client Error: Forbidden for url: https://www.straitstimes.com/world/europe/whats-next-for-germany-after-collapse-of-coalition-talks


Processing URLs:  39%|███▉      | 389/1000 [13:47<17:55,  1.76s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VE1R8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VE1R8


Processing URLs:  39%|███▉      | 390/1000 [13:47<13:19,  1.31s/it]

Error extracting text from http://www.wsj.com/articles/moderates-leading-in-tehran-in-iranian-elections-1456658601: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/moderates-leading-in-tehran-in-iranian-elections-1456658601


Processing URLs:  39%|███▉      | 393/1000 [13:50<10:29,  1.04s/it]

Error extracting text from http://www.nytimes.com/2015/12/29/world/middleeast/iran-hands-over-stockpile-of-enriched-uranium-to-russia.html?emc=edit_au_20151228&amp;nl=afternoonupdate&amp;nlid=42208600&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/29/world/middleeast/iran-hands-over-stockpile-of-enriched-uranium-to-russia.html?emc=edit_au_20151228&amp;nl=afternoonupdate&amp;nlid=42208600&amp;_r=0


Processing URLs:  39%|███▉      | 394/1000 [13:51<11:52,  1.18s/it]

Error extracting text from https://www.justsecurity.org/44293/20-questions-answered-russia-investigations: 403 Client Error: Forbidden for url: https://www.justsecurity.org/44293/20-questions-answered-russia-investigations


Processing URLs:  40%|████      | 401/1000 [14:58<3:09:28, 18.98s/it]

Error extracting text from http://aa.com.tr/en/world/foreign-powers-controlling-syrias-assad-family/559410: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  40%|████      | 402/1000 [15:00<2:18:05, 13.86s/it]

Error extracting text from https://in.reuters.com/article/britain-politics-cyber/iran-was-behind-cyber-attack-on-british-lawmakers-in-june-the-times-idINKBN1CJ0AD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  41%|████      | 406/1000 [15:04<37:37,  3.80s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-syria/kremlin-syria-peoples-congress-being-actively-discussed-idUSKBN1CP198?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-syria/kremlin-syria-peoples-congress-being-actively-discussed-idUSKBN1CP198?il=0


Processing URLs:  41%|████▏     | 413/1000 [15:10<12:30,  1.28s/it]

Error extracting text from http://www.jsonline.com/blogs/news/378686971.html: 404 Client Error: OK for url: https://www.jsonline.com/blogs/news/378686971.html/


Processing URLs:  42%|████▏     | 416/1000 [15:16<14:12,  1.46s/it]

Error extracting text from http://tricorder.xprize.org/about/schedule: 404 Client Error: Not Found for url: http://tricorder.xprize.org/about/schedule
URL filtered: https://www.bloomberg.com/news/articles/2018-02-14/nafta-breakup-would-leave-a-bitter-taste-on-valentine-s-day


Processing URLs:  42%|████▏     | 421/1000 [15:24<14:33,  1.51s/it]

Error extracting text from https://www.reuters.com/article/illinois-bonds/illinois-keeps-bbb-minus-sp-rating-for-6-bln-bond-sale-idUSL2N1MK0U7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/illinois-bonds/illinois-keeps-bbb-minus-sp-rating-for-6-bln-bond-sale-idUSL2N1MK0U7


Processing URLs:  42%|████▏     | 422/1000 [15:25<12:00,  1.25s/it]

Error extracting text from https://biocomplexity.virginia.edu/project/covid-19-pandemic-response: 403 Client Error: Forbidden for url: https://biocomplexity.virginia.edu/project/covid-19-pandemic-response


Processing URLs:  42%|████▎     | 425/1000 [15:28<09:20,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-usa-shale-permian-insight-idUSKCN18D0CN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-shale-permian-insight-idUSKCN18D0CN


Processing URLs:  43%|████▎     | 429/1000 [15:32<08:17,  1.15it/s]

Error extracting text from http://www.wsj.com/articles/the-feds-big-december-rates-problem-1446056910: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-feds-big-december-rates-problem-1446056910


Processing URLs:  43%|████▎     | 431/1000 [15:32<05:18,  1.79it/s]

Error extracting text from http://ir.tesla.com/releasedetail.cfm?releaseid=978031: 403 Client Error: Forbidden for url: http://ir.tesla.com/releasedetail.cfm?releaseid=978031


Processing URLs:  43%|████▎     | 432/1000 [15:33<05:17,  1.79it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/272170-poll-trump-loses-head-to-head-vs-cruz-rubio: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/272170-poll-trump-loses-head-to-head-vs-cruz-rubio/


Processing URLs:  43%|████▎     | 434/1000 [15:35<08:21,  1.13it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-08/renzi-says-he-s-betting-on-cameron-sees-anti-brexit-deal-soon


Processing URLs:  44%|████▍     | 440/1000 [15:42<10:14,  1.10s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/03/01/738213/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/03/01/738213/story.html


Processing URLs:  44%|████▍     | 443/1000 [15:45<09:59,  1.08s/it]

Error extracting text from http://www.reuters.com/article/us-thailand-politics-idUSKBN1330UW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-politics-idUSKBN1330UW


Processing URLs:  45%|████▍     | 447/1000 [15:54<17:13,  1.87s/it]

Error extracting text from http://www.mofa.go.jp/me_a/me2/ir/page4e_000238.html: 403 Client Error: Forbidden for url: http://www.mofa.go.jp/me_a/me2/ir/page4e_000238.html


Processing URLs:  45%|████▍     | 448/1000 [15:55<17:14,  1.87s/it]

Error extracting text from http://www.ibtimes.com/north-korea-continues-evade-un-sanctions-get-material-nuclear-ballistic-missiles-2301165: 403 Client Error: Forbidden for url: https://www.ibtimes.com/north-korea-continues-evade-un-sanctions-get-material-nuclear-ballistic-missiles-2301165


Processing URLs:  45%|████▌     | 453/1000 [16:04<12:38,  1.39s/it]

Error extracting text from http://www.nytimes.com/2016/03/23/world/middleeast/bashar-al-assad-syria-russia-west.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/23/world/middleeast/bashar-al-assad-syria-russia-west.html?_r=0


Processing URLs:  46%|████▌     | 458/1000 [16:11<12:09,  1.35s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/iran-starts-taking-nuclea/2233874.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/iran-starts-taking-nuclea/2233874.html


Processing URLs:  46%|████▌     | 461/1000 [16:13<08:59,  1.00s/it]

Error extracting text from http://www.reuters.com/article/2015/09/11/us-mideast-syria-russia-idUSKCN0RB0ZD20150911: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/11/us-mideast-syria-russia-idUSKCN0RB0ZD20150911


Processing URLs:  46%|████▋     | 465/1000 [16:16<06:25,  1.39it/s]

Error extracting text from http://www.reuters.com/article/2015/09/03/us-iran-nuclear-khamenei-idUSKCN0R310720150903: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/03/us-iran-nuclear-khamenei-idUSKCN0R310720150903
Error extracting text from http://www.reuters.com/article/us-brazil-corruption-santana-idUSKCN0VZ2EH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-santana-idUSKCN0VZ2EH


Processing URLs:  47%|████▋     | 468/1000 [16:24<17:03,  1.92s/it]

Error extracting text from http://andrewgelman.com/wp-content/uploads/2014/03/EBMA_conditions6.pdf: 403 Client Error: Forbidden for url: https://statmodeling.stat.columbia.edu/wp-content/uploads/2014/03/EBMA_conditions6.pdf


Processing URLs:  47%|████▋     | 469/1000 [16:25<14:05,  1.59s/it]

URL filtered: http://america.aljazeera.com/opinions/2014/4/iran-twitter-rouhaniinternetcensorship.html


Processing URLs:  48%|████▊     | 476/1000 [16:35<11:59,  1.37s/it]

Error extracting text from https://bit.ly/2K7B4I2: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-report/2020/august/monetary-policy-report-august-2020
Error extracting text from http://www.reuters.com/article/us-usa-fiscal-idUSKBN0TR1WW20151208: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fiscal-idUSKBN0TR1WW20151208


Processing URLs:  48%|████▊     | 478/1000 [16:37<10:43,  1.23s/it]

Error extracting text from http://oklo.org/2014/12/18/dead-voices-on-air/: 406 Client Error: Not Acceptable for url: http://oklo.org/2014/12/18/dead-voices-on-air/


Processing URLs:  48%|████▊     | 481/1000 [16:44<14:12,  1.64s/it]

Error extracting text from http://articles.chicagotribune.com/2010-05-14/opinion/ct-oped-0514-british-20100514_1_campaign-spending-candidates-election-day: 404 Client Error: Not Found for url: https://www.chicagotribune.com/ct-xpm-2010-05-14-ct-oped-0514-british-20100514-story.html


Processing URLs:  48%|████▊     | 483/1000 [16:45<08:53,  1.03s/it]

Error extracting text from http://in.reuters.com/article/iran-energy-china-contract-idINKBN15220Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
Error extracting text from http://www.nato.int/cps/en/natolive/topics_48891.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_48891.htm


Processing URLs:  49%|████▊     | 487/1000 [16:50<12:10,  1.42s/it]

URL filtered: https://www.paymentssource.com/news/facebooks-diem-project-lays-groundwork-for-merchant-acceptance


Processing URLs:  49%|████▉     | 492/1000 [16:57<10:36,  1.25s/it]

Error extracting text from https://www.thelocal.se/20210621/sweden-stefan-lofven-loses-no-confidence-motion/: 403 Client Error: Forbidden for url: https://www.thelocal.se/20210621/sweden-stefan-lofven-loses-no-confidence-motion


Processing URLs:  49%|████▉     | 493/1000 [17:01<17:15,  2.04s/it]

URL filtered: https://www.thecipherbrief.com/article/exclusive/preventing-zombie-intel-officers-stealing-any-secrets-they-can-1091?utm_content=buffer9c44b&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  50%|████▉     | 496/1000 [17:03<10:30,  1.25s/it]

Error extracting text from https://www.barchart.com/futures/quotes/TG*1: 403 Client Error: Forbidden for url: https://www.barchart.com/futures/quotes/TG*1


Processing URLs:  50%|████▉     | 497/1000 [17:05<13:18,  1.59s/it]

URL filtered: https://www.youtube.com/watch?v=8d-4Agf3nRE


Processing URLs:  50%|█████     | 500/1000 [17:16<21:01,  2.52s/it]

Error extracting text from http://www.australianetworknews.com/russian-president-vladimir-putin-threatens-nato/: HTTPConnectionPool(host='www.australianetworknews.com', port=80): Max retries exceeded with url: /russian-president-vladimir-putin-threatens-nato/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300eb82f0>: Failed to resolve 'www.australianetworknews.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.metacritic.com/movie/belfast: 403 Client Error: Forbidden for url: https://www.metacritic.com/movie/belfast
URL filtered: https://www.youtube.com/watch?v=hluT4oprkmA


Processing URLs:  50%|█████     | 503/1000 [17:21<15:18,  1.85s/it]

Error extracting text from http://www.reuters.com/article/2015/10/11/opec-oil-indonesia-idUKL5N1221K820151011: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/11/opec-oil-indonesia-idUKL5N1221K820151011


Processing URLs:  50%|█████     | 504/1000 [17:22<13:43,  1.66s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-OW4CHT6JTSFH01-32VFHQMFC9H5UVFIM1GQ0QAOSF


Processing URLs:  51%|█████     | 506/1000 [17:22<08:47,  1.07s/it]

Error extracting text from http://www.japantimes.co.jp/news/2017/04/28/asia-pacific/china-revises-mapping-law-bolster-claims-south-china-sea-land-taiwan/#.WQOYMVK-Lq0: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2017/04/28/asia-pacific/china-revises-mapping-law-bolster-claims-south-china-sea-land-taiwan/#.WQOYMVK-Lq0


Processing URLs:  51%|█████     | 507/1000 [17:23<07:46,  1.06it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/356413-week-ahead-lawmakers-zero-in-on-kaspersky: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/356413-week-ahead-lawmakers-zero-in-on-kaspersky/


Processing URLs:  51%|█████     | 512/1000 [17:28<07:00,  1.16it/s]

Error extracting text from http://www.nytimes.com/2016/07/12/world/middleeast/us-iraq-mosul.html?ref=world: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/12/world/middleeast/us-iraq-mosul.html?ref=world


Processing URLs:  52%|█████▏    | 519/1000 [17:44<09:01,  1.12s/it]

Error extracting text from https://www.middleeastmonitor.com/news/middle-east/22126-poll-64-of-palestinians-support-abolishment-of-oslo-accords: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/news/middle-east/22126-poll-64-of-palestinians-support-abolishment-of-oslo-accords
Error extracting text from http://www.nytimes.com/2016/06/21/world/asia/indonesia-south-china-sea-fishing.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/21/world/asia/indonesia-south-china-sea-fishing.html


Processing URLs:  52%|█████▎    | 525/1000 [18:08<33:13,  4.20s/it]

Error extracting text from https://www.senate.gov/legislative/LIS/roll_call_votes/vote1171/vote_117_1_00231.htm),: 403 Client Error: Forbidden for url: https://www.senate.gov/legislative/LIS/roll_call_votes/vote1171/vote_117_1_00231.htm),


Processing URLs:  53%|█████▎    | 526/1000 [18:09<24:51,  3.15s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-08/what-does-erdogan-want-unlimited-powers-to-tailor-a-new-turkey


Processing URLs:  53%|█████▎    | 532/1000 [18:16<11:34,  1.48s/it]

Error extracting text from https://www.reuters.com/article/us-brazil-economy-pension/brazils-temer-unveils-pension-reform-sets-retirement-age-at-65-idUSKBN13U1VT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-economy-pension/brazils-temer-unveils-pension-reform-sets-retirement-age-at-65-idUSKBN13U1VT


Processing URLs:  53%|█████▎    | 533/1000 [18:18<12:48,  1.65s/it]

Error extracting text from http://www.globalpost.com/article/6720096/2016/01/14/s-korea-may-seek-criminal-charges-against-volkswagen-over-data: 404 Client Error: Not Found for url: https://theworld.org/article/6720096/2016/01/14/s-korea-may-seek-criminal-charges-against-volkswagen-over-data


Processing URLs:  54%|█████▎    | 535/1000 [18:20<10:54,  1.41s/it]

Error extracting text from http://www.fao.org/news/story/en/item/471251/icode/: 404 Client Error: Not Found for url: https://www.fao.org/news/story/en/item/471251/icode/


Processing URLs:  54%|█████▎    | 537/1000 [18:23<10:12,  1.32s/it]

Error extracting text from http://shop.jada.or.jp/i-shop/top.pasp?to=tp: HTTPConnectionPool(host='shop.jada.or.jp', port=80): Max retries exceeded with url: /i-shop/top.pasp?to=tp (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30398bb00>: Failed to resolve 'shop.jada.or.jp' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  54%|█████▍    | 540/1000 [18:25<07:20,  1.05it/s]

Error extracting text from http://www.dhakatribune.com/world/2016/08/06/hezbollah-partition-iraq-syria-possible-outcome-war/: 403 Client Error: Forbidden for url: https://www.dhakatribune.com/world/2016/08/06/hezbollah-partition-iraq-syria-possible-outcome-war/


Processing URLs:  54%|█████▍    | 542/1000 [18:27<06:11,  1.23it/s]

Error extracting text from http://www.reuters.com/article/us-california-drought-brown-idUSKCN0US2UR20160114: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-california-drought-brown-idUSKCN0US2UR20160114


Processing URLs:  54%|█████▍    | 543/1000 [18:28<06:00,  1.27it/s]

Error extracting text from https://www.governor.ny.gov/news/governor-cuomo-announces-approval-first-application-autonomous-vehicle-demonstration-new-york: 403 Client Error: Forbidden for url: https://www.governor.ny.gov/news/governor-cuomo-announces-approval-first-application-autonomous-vehicle-demonstration-new-york
URL filtered: https://www.youtube.com/watch?v=nhyakhoE4CM


Processing URLs:  55%|█████▍    | 545/1000 [18:30<06:54,  1.10it/s]

Error extracting text from http://www.modernmeadow.com/about-us/: 404 Client Error: Not Found for url: https://www.modernmeadow.com/about-us/


Processing URLs:  55%|█████▌    | 550/1000 [18:38<08:17,  1.10s/it]

Error extracting text from http://www.timesofisrael.com/boosted-by-nuke-deal-iran-ups-funding-to-hezbollah-hamas/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/boosted-by-nuke-deal-iran-ups-funding-to-hezbollah-hamas/


Processing URLs:  55%|█████▌    | 554/1000 [18:42<05:39,  1.31it/s]

Error extracting text from https://www.nytimes.com/reuters/2017/11/09/business/09reuters-venezuela-bonds.html?_r=0tx: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/11/09/business/09reuters-venezuela-bonds.html?_r=0tx


Processing URLs:  56%|█████▌    | 555/1000 [18:43<07:29,  1.01s/it]

Error extracting text from https://www.dni.gov/index.php/newsroom/reports-publications/reports-publications-2021/item/2204-2021-annual-threat-assessment-of-the-u-s-intelligence-community: 404 Client Error: Not Found for url: https://www.dni.gov/index.php/newsroom/reports-publications/reports-publications-2021/item/2204-2021-annual-threat-assessment-of-the-u-s-intelligence-community


Processing URLs:  56%|█████▌    | 557/1000 [18:46<09:46,  1.32s/it]

Error extracting text from http://www.esrl.noaa.gov/research/themes/o3/: 404 Client Error: Not Found for url: https://www.esrl.noaa.gov/research/themes/o3/


Processing URLs:  56%|█████▌    | 561/1000 [18:52<09:22,  1.28s/it]

Error extracting text from https://aibirds.org/man-vs-machine-challenge/previous-results.html: HTTPSConnectionPool(host='aibirds.org', port=443): Max retries exceeded with url: /man-vs-machine-challenge/previous-results.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  56%|█████▌    | 562/1000 [19:53<2:20:17, 19.22s/it]

Error extracting text from http://www.aa.com.tr/en/turkey/russia-bombs-turkish-aid-agency-bakery-in-syria/483161-: HTTPConnectionPool(host='www.aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  56%|█████▋    | 563/1000 [19:53<1:38:32, 13.53s/it]

Error extracting text from http://www.wsj.com/articles/brazils-supreme-court-declines-to-halt-dilma-rousseffs-impeachment-1462985229: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-supreme-court-declines-to-halt-dilma-rousseffs-impeachment-1462985229


Processing URLs:  57%|█████▋    | 567/1000 [19:55<25:02,  3.47s/it]

Error extracting text from http://www.highpoint.edu/blog/2016/09/hpu-poll-in-nc-clinton-leads-trump-and-cooper-leads-mccrory/: 403 Client Error: Forbidden for url: http://www.highpoint.edu/blog/2016/09/hpu-poll-in-nc-clinton-leads-trump-and-cooper-leads-mccrory/


Processing URLs:  57%|█████▋    | 568/1000 [19:56<18:56,  2.63s/it]

Error extracting text from http://warontherocks.com/2016/06/natos-open-door-leads-to-an-identity-crisis/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/06/natos-open-door-leads-to-an-identity-crisis/


Processing URLs:  57%|█████▋    | 573/1000 [20:04<14:21,  2.02s/it]

Error extracting text from http://www.godominicanrepublic.com/events/upcoming/: 404 Client Error: Not Found for url: https://www.godominicanrepublic.com/events/upcoming/


Processing URLs:  58%|█████▊    | 576/1000 [20:08<10:18,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-security-idUSKCN10S0EQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-idUSKCN10S0EQ


Processing URLs:  58%|█████▊    | 579/1000 [20:11<07:52,  1.12s/it]

Error extracting text from http://www.icmunlimited.com/data/media/pdf/Voting_29thFeb16_pv.pdf: 403 Client Error: Forbidden for url: http://www.icmunlimited.com/data/media/pdf/Voting_29thFeb16_pv.pdf


Processing URLs:  58%|█████▊    | 582/1000 [20:15<08:11,  1.18s/it]

Error extracting text from http://scadastrangelove.blogspot.com/search/label/Releases: 404 Client Error: Not Found for url: http://scadastrangelove.blogspot.com/search/label/Releases


Processing URLs:  58%|█████▊    | 584/1000 [20:19<11:12,  1.62s/it]

Error extracting text from http://www.mfa.gov.tr/turkey_s-commercial-and-economic-relations-with-russian-federation.en.mfa: HTTPSConnectionPool(host='www.mfa.gov.tr', port=443): Max retries exceeded with url: /turkey_s-commercial-and-economic-relations-with-russian-federation.en.mfa (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  59%|█████▊    | 586/1000 [20:21<09:46,  1.42s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-21/venezuela-goes-on-oil-world-tour-as-deadline-for-bond-swap-looms


Processing URLs:  59%|█████▉    | 588/1000 [20:22<05:52,  1.17it/s]

Error extracting text from https://thehill.com/policy/finance/478636-here-are-the-10-senators-who-voted-against-trumps-north-american-trade-deal: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/478636-here-are-the-10-senators-who-voted-against-trumps-north-american-trade-deal/
URL filtered: https://www.bloomberg.com/news/articles/2021-07-02/breyer-hires-four-law-clerks-for-next-term-high-court-confirms


Processing URLs:  59%|█████▉    | 593/1000 [20:25<04:03,  1.67it/s]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/10/30/64/0401000000AEN20151030002800315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  59%|█████▉    | 594/1000 [20:26<04:58,  1.36it/s]

Error extracting text from https://newswwc.com/united-states/mediators-push-to-restore-cease-fire-in-gaza-after-overnight-strikes/: 404 Client Error: Not Found for url: https://newswwc.com/united-states/mediators-push-to-restore-cease-fire-in-gaza-after-overnight-strikes/


Processing URLs:  60%|█████▉    | 597/1000 [20:32<09:33,  1.42s/it]

Error extracting text from http://www.parliament.scot/parliamentarybusiness/28877.aspx?SearchType=Advance&amp;ReferenceNumbers=S5M-04710&amp;ResultsPerPage=10: 403 Client Error: Forbidden for url: https://www.parliament.scot/parliamentarybusiness/28877.aspx?SearchType=Advance&amp;ReferenceNumbers=S5M-04710&amp;ResultsPerPage=10


Processing URLs:  60%|█████▉    | 598/1000 [20:33<08:02,  1.20s/it]

Error extracting text from https://www.independent.co.uk/news/uk/politics/matt-hancock-affair-latest-news-b1872558.html: 404 Client Error: Not Found for url: https://www.independent.co.uk/news/uk/politics/matt-hancock-affair-latest-news-b1872558.html


Processing URLs:  60%|██████    | 601/1000 [20:35<06:12,  1.07it/s]

Error extracting text from http://www.kdmid.ru/docs.aspx?lst=country_wiki&amp;it=/%D0%A1%D0%BE%D0%B3%D0%BB%D0%B0%D1%88%D0%B5%D0%BD%D0%B8%D0%B5%20%D0%BE%D0%B1%20%D1%83%D1%81%D0%BB%D0%BE%D0%B2%D0%B8%D1%8F%D1%85%20%D0%B2%D0%B7%D0%B0%D0%B8%D0%BC%D0%BD%D1%8B%D1%85%20%D0%BF%D0%BE%D0%B5%D0%B7%D0%B4%D0%BE%D0%BA%20(%D0%90%D0%BD%D0%BA%D0%B0%D1%80%D0%B0,%2012%20%D0%BC%D0%B0%D1%8F%202010%20%D0%B3%D0%BE%D0%B4%D0%B0).aspx: 404 Client Error: Not Found for url: https://www.kdmid.ru/docs.aspx?lst=country_wiki&amp;it=/%D0%A1%D0%BE%D0%B3%D0%BB%D0%B0%D1%88%D0%B5%D0%BD%D0%B8%D0%B5%20%D0%BE%D0%B1%20%D1%83%D1%81%D0%BB%D0%BE%D0%B2%D0%B8%D1%8F%D1%85%20%D0%B2%D0%B7%D0%B0%D0%B8%D0%BC%D0%BD%D1%8B%D1%85%20%D0%BF%D0%BE%D0%B5%D0%B7%D0%B4%D0%BE%D0%BA%20(%D0%90%D0%BD%D0%BA%D0%B0%D1%80%D0%B0,%2012%20%D0%BC%D0%B0%D1%8F%202010%20%D0%B3%D0%BE%D0%B4%D0%B0).aspx
URL filtered: https://twitter.com/BreakingNews/status/704303375509114880
Error extracting text from http://www.wsj.com/articles/senior-chinese-military-officer-visi

Processing URLs:  61%|██████    | 606/1000 [20:42<06:56,  1.06s/it]

Error extracting text from http://www.reuters.com/article/us-china-politics-league-exclusive-idUSKCN1200OL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-politics-league-exclusive-idUSKCN1200OL


Processing URLs:  61%|██████    | 610/1000 [21:47<2:02:20, 18.82s/it]

Error extracting text from https://betting.betfair.com/politics/uk-politics/brexit-betting-update-eu-referendum-whos-right---the-polls-or-the-markets-030316-204.html: HTTPSConnectionPool(host='betting.betfair.com', port=443): Max retries exceeded with url: /politics/uk-politics/brexit-betting-update-eu-referendum-whos-right---the-polls-or-the-markets-030316-204.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fe270aa0>, 'Connection to betting.betfair.com timed out. (connect timeout=60)'))
URL filtered: https://twitter.com/markknoller/status/656863798901501953


Processing URLs:  62%|██████▏   | 615/1000 [21:51<32:37,  5.08s/it]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN1AP2AR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN1AP2AR


Processing URLs:  62%|██████▏   | 620/1000 [22:01<15:04,  2.38s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382922/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382922/


Processing URLs:  62%|██████▏   | 623/1000 [22:06<10:58,  1.75s/it]

Error extracting text from http://www.reuters.com/article/us-hongkong-china-politics-idUSKBN13204P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-china-politics-idUSKBN13204P


Processing URLs:  62%|██████▏   | 624/1000 [22:07<08:46,  1.40s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/324661-report-russian-individuals-invested-nearly-100m-in-trump: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/324661-report-russian-individuals-invested-nearly-100m-in-trump/


Processing URLs:  63%|██████▎   | 627/1000 [22:13<12:46,  2.05s/it]

URL filtered: https://twitter.com/katyafimava/status/1456169806638497794
Error extracting text from https://www.uefa.com/uefachampionsleague/news/0275-151e9aea97a2-343677ff5a8f-1000--champions-league-final-possible-line-ups-selection-dilemmas-tea/: 403 Client Error: Forbidden for url: https://www.uefa.com/uefachampionsleague/news/0275-151e9aea97a2-343677ff5a8f-1000--champions-league-final-possible-line-ups-selection-dilemmas-tea/


Processing URLs:  63%|██████▎   | 630/1000 [22:14<07:04,  1.15s/it]

Error extracting text from https://amzn.to/3fJw4WX: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Snark-Mean-Personal-Ruining-conversation/dp/1416599452


Processing URLs:  63%|██████▎   | 633/1000 [22:16<05:02,  1.21it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=de&amp;u=http://www.sueddeutsche.de/wirtschaft/volkswagen-ein-kronzeuge-packt-aus-1.2829840&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=de&amp;u=http://www.sueddeutsche.de/wirtschaft/volkswagen-ein-kronzeuge-packt-aus-1.2829840&amp;prev=search
Error extracting text from http://www.wsj.com/articles/effort-to-force-vote-on-ex-im-bank-reauthorization-gains-some-republican-support-1443718801: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/effort-to-force-vote-on-ex-im-bank-reauthorization-gains-some-republican-support-1443718801


Processing URLs:  64%|██████▎   | 636/1000 [22:33<19:33,  3.22s/it]

Error extracting text from https://globalguessing.com/metaculus-mondays-vol15/: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /metaculus-mondays-vol15/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  64%|██████▍   | 638/1000 [22:36<14:26,  2.39s/it]

Error extracting text from http://af.reuters.com/article/topNews/idAFKCN0YW17H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  64%|██████▍   | 639/1000 [22:37<12:46,  2.12s/it]

URL filtered: https://twitter.com/hashtag/Brexit?src=hash&amp;amp;ref_src=twsrc%5Etfw&quot;&gt;#Brexit&lt;/a&gt


Processing URLs:  64%|██████▍   | 642/1000 [22:40<08:05,  1.36s/it]

Error extracting text from http://www.newsweek.com/burundi-345-cases-torture-and-ill-treatment-2016-says-un-449061: 403 Client Error: Forbidden for url: https://www.newsweek.com/burundi-345-cases-torture-and-ill-treatment-2016-says-un-449061
Error extracting text from http://www.opec.org/opec_web/en/press_room/2189.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2189.htm


Processing URLs:  65%|██████▍   | 646/1000 [22:44<06:25,  1.09s/it]

Error extracting text from http://www.latimes.com/nation/la-na-clinton-email-probe-20160327-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-na-clinton-email-probe-20160327-story.html


Processing URLs:  65%|██████▌   | 650/1000 [22:49<07:14,  1.24s/it]

Error extracting text from https://news.yahoo.com/malaysia-says-100-china-boats-intrude-waters-060648630.html: 404 Client Error: Not Found for url: https://news.yahoo.com/malaysia-says-100-china-boats-intrude-waters-060648630.html


Processing URLs:  65%|██████▌   | 654/1000 [22:51<03:32,  1.63it/s]

Error extracting text from https://strana.ua/news/322516-nastuplenie-vsu-na-donbasse-cheho-zhdat-ot-obostrenija-v-zone-oos.html: HTTPSConnectionPool(host='strana.ua', port=443): Max retries exceeded with url: /news/322516-nastuplenie-vsu-na-donbasse-cheho-zhdat-ot-obostrenija-v-zone-oos.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3040e9040>: Failed to resolve 'strana.ua' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-germany-election-merkel-idUSKBN1A00BZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-merkel-idUSKBN1A00BZ


Processing URLs:  66%|██████▌   | 657/1000 [22:53<02:44,  2.08it/s]

Error extracting text from https://www.nytimes.com/2017/03/06/opinion/the-case-for-a-border-adjusted-tax.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/06/opinion/the-case-for-a-border-adjusted-tax.html


Processing URLs:  66%|██████▌   | 658/1000 [22:53<02:09,  2.65it/s]

Error extracting text from http://www.nytimes.com/2016/08/17/us/shadow-brokers-leak-raises-alarming-question-was-the-nsa-hacked.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/17/us/shadow-brokers-leak-raises-alarming-question-was-the-nsa-hacked.html


Processing URLs:  66%|██████▌   | 659/1000 [22:54<03:09,  1.80it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/germany-17-now-investigation-volkswagen-probe-37482912: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/germany-17-now-investigation-volkswagen-probe-37482912


Processing URLs:  66%|██████▌   | 662/1000 [22:58<05:15,  1.07it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-06/payrolls-in-u-s-climb-most-this-year-as-jobless-rate-reaches-5-


Processing URLs:  66%|██████▋   | 665/1000 [23:02<05:45,  1.03s/it]

Error extracting text from http://greece.greekreporter.com/2016/04/01/greece-rushes-to-complete-bailout-program-review-by-april-13/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/04/01/greece-rushes-to-complete-bailout-program-review-by-april-13/


Processing URLs:  67%|██████▋   | 666/1000 [23:02<04:53,  1.14it/s]

Error extracting text from http://www.businessinsider.com.au/heres-what-we-think-is-going-to-happen-in-2016-v2-2016-1: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/heres-what-we-think-is-going-to-happen-in-2016-v2-2016-1


Processing URLs:  67%|██████▋   | 671/1000 [23:19<11:52,  2.17s/it]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-usa/top-u-s-general-says-exiting-iran-nuclear-pact-would-make-future-deals-tough-idUSKCN1C12OF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa/top-u-s-general-says-exiting-iran-nuclear-pact-would-make-future-deals-tough-idUSKCN1C12OF


Processing URLs:  67%|██████▋   | 674/1000 [23:22<08:14,  1.52s/it]

Error extracting text from https://investors.modernatx.com/news-releases/news-release-details/moderna-announces-publication-results-pivotal-phase-3-trial: 403 Client Error: Forbidden for url: https://investors.modernatx.com/news-releases/news-release-details/moderna-announces-publication-results-pivotal-phase-3-trial


Processing URLs:  68%|██████▊   | 677/1000 [23:29<09:05,  1.69s/it]

Error extracting text from http://postimg.org/image/igp3f2zc5/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/igp3f2zc5/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3027e9970>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/view/articles/2016-11-02/saudi-arabia-s-bond-success-hides-its-financial-peril


Processing URLs:  69%|██████▊   | 687/1000 [23:42<09:09,  1.75s/it]

Error extracting text from http://www.reuters.com/article/us-usa-oil-funds-analysis-idUSKBN1812DK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-oil-funds-analysis-idUSKBN1812DK


Processing URLs:  69%|██████▉   | 694/1000 [23:52<08:25,  1.65s/it]

Error extracting text from https://www.quandl.com/data/OECD/MEI_FIN_XF_TUR_A-Turkey-Reserve-Assets-Sdr-Millions-Annual: 404 Client Error: Not Found for url: https://data.nasdaq.com/data/OECD/MEI_FIN_XF_TUR_A-Turkey-Reserve-Assets-Sdr-Millions-Annual
URL filtered: https://twitter.com/chrischirp/status/1473362907471699979


Processing URLs:  70%|██████▉   | 698/1000 [23:57<07:16,  1.45s/it]

Error extracting text from http://blogs.wsj.com/washwire/2016/10/16/pence-acknowledges-evidence-russia-behind-cyber-attacks/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2016/10/16/pence-acknowledges-evidence-russia-behind-cyber-attacks/


Processing URLs:  71%|███████   | 707/1000 [24:08<05:46,  1.18s/it]

URL filtered: https://www.youtube.com/watch?v=-vgIIs0CGqw


Processing URLs:  71%|███████   | 711/1000 [24:16<08:46,  1.82s/it]

Error extracting text from https://www.bankofengland.co.uk/working-paper/2018/overnight-index-swap-market-based-measures-of-monetary-policy-expectations: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/working-paper/2018/overnight-index-swap-market-based-measures-of-monetary-policy-expectations


Processing URLs:  71%|███████▏  | 713/1000 [24:17<06:13,  1.30s/it]

Error extracting text from https://www.reuters.com/article/us-yemen-security-timeline/timeline-yemens-slide-into-political-crisis-and-war-idUSKCN1R20HO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-timeline/timeline-yemens-slide-into-political-crisis-and-war-idUSKCN1R20HO?il=0


Processing URLs:  72%|███████▏  | 716/1000 [24:24<08:10,  1.73s/it]

Error extracting text from https://www.transparency.org/country/#ITA: 404 Client Error: Not Found for url: https://www.transparency.org/en/country/#ITA


Processing URLs:  72%|███████▏  | 722/1000 [24:34<06:31,  1.41s/it]

Error extracting text from http://nyti.ms/1RLab4S: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/11/world/europe/bloody-sunday-massacre-arrest-northern-ireland.html?smid=pl-share


Processing URLs:  72%|███████▏  | 724/1000 [24:36<06:10,  1.34s/it]

Error extracting text from http://www.leginfo.ca.gov/pub/dailyfile/asm/assembly_Regular_Session.pdf: 404 Client Error: Not found for url: http://www.leginfo.ca.gov/pub/dailyfile/asm/assembly_Regular_Session.pdf


Processing URLs:  72%|███████▎  | 725/1000 [24:37<05:20,  1.16s/it]

URL filtered: https://twitter.com/BarakRavid/status/1497214423756197889


Processing URLs:  73%|███████▎  | 729/1000 [25:02<28:41,  6.35s/it]

Error extracting text from https://www.recode.net/2017/2/9/14462390/trump-freeze-regulation-faa-drone-delivery: Exceeded 30 redirects.


Processing URLs:  73%|███████▎  | 732/1000 [25:08<17:57,  4.02s/it]

Error extracting text from http://en.granma.cu/mundo/2016-01-22/farc-ep-and-colombian-government-begin-talks-on-paramilitarism: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  73%|███████▎  | 734/1000 [25:10<10:37,  2.40s/it]

Error extracting text from http://www.wsj.com/articles/cameron-pushes-bid-to-redefine-u-k-european-union-ties-on-brussels-visit-1454080533: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/cameron-pushes-bid-to-redefine-u-k-european-union-ties-on-brussels-visit-1454080533


Processing URLs:  74%|███████▍  | 738/1000 [25:20<09:32,  2.19s/it]

Error extracting text from http://election.princeton.edu/2016/02/06/the-post-iowa-bounce-goes-to-hillary-clinton/: HTTPSConnectionPool(host='election.princeton.edu2016', port=443): Max retries exceeded with url: /02/06/the-post-iowa-bounce-goes-to-hillary-clinton/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30153ae10>: Failed to resolve 'election.princeton.edu2016' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  74%|███████▍  | 740/1000 [25:21<05:08,  1.19s/it]

Error extracting text from http://www.wsj.com/articles/iraqi-troops-ordered-to-pause-mosul-advance-1476871308: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-troops-ordered-to-pause-mosul-advance-1476871308
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=fa&amp;u=http://www.tasnimnews.com/fa/news/1394/12/11/1014786/: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=fa&amp;u=http://www.tasnimnews.com/fa/news/1394/12/11/1014786/


Processing URLs:  74%|███████▍  | 742/1000 [25:23<04:40,  1.09s/it]

Error extracting text from http://www.theregister.co.uk/2016/04/12/sweden_suspects_russian_hackers_hit_air_traffic_control/: 403 Client Error: Forbidden for url: https://www.theregister.com/2016/04/12/sweden_suspects_russian_hackers_hit_air_traffic_control/


Processing URLs:  74%|███████▍  | 745/1000 [25:26<04:52,  1.15s/it]

Error extracting text from http://emarketalerts.forecast1.com/mic/eabstract.cfm?recno=238130: HTTPConnectionPool(host='emarketalerts.forecast1.com', port=80): Max retries exceeded with url: /mic/eabstract.cfm?recno=238130 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303ec9910>: Failed to resolve 'emarketalerts.forecast1.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  75%|███████▌  | 751/1000 [25:39<08:28,  2.04s/it]

URL filtered: https://twitter.com/tconnellyRTE/status/1336309588858187778


Processing URLs:  76%|███████▌  | 755/1000 [25:44<06:05,  1.49s/it]

Error extracting text from https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=prc_hicp_manr&amp;lang=en: 403 Client Error: Forbidden for url: https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=prc_hicp_manr&amp;lang=en


Processing URLs:  76%|███████▌  | 757/1000 [25:51<08:24,  2.07s/it]

Error extracting text from https://www.yahoo.com/news/iraqi-troops-face-stiff-resistance-eastern-mosul-072125392.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/iraqi-troops-face-stiff-resistance-eastern-mosul-072125392.html


Processing URLs:  76%|███████▌  | 759/1000 [25:52<05:35,  1.39s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-02/american-jobs-machine-sputters-as-global-woes-chip-at-growth


Processing URLs:  76%|███████▌  | 761/1000 [25:53<03:51,  1.03it/s]

Error extracting text from http://www.financialexpress.com/article/economy/three-years-on-rcep-trade-talks-yet-to-cross-first-stage/248785/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/economy/three-years-on-rcep-trade-talks-yet-to-cross-first-stage/248785/


Processing URLs:  76%|███████▋  | 764/1000 [25:56<03:51,  1.02it/s]

Error extracting text from http://ac.els-cdn.com/S1877705814011357/1-s2.0-S1877705814011357-main.pdf?_tid=e5798050-6e06-11e6-84ae-00000aab0f02&amp;acdnat=1472488838_fe7f0e3cf2de6662864698607d69b1f4: HTTPConnectionPool(host='ac.els-cdn.com', port=80): Max retries exceeded with url: /S1877705814011357/1-s2.0-S1877705814011357-main.pdf?_tid=e5798050-6e06-11e6-84ae-00000aab0f02&amp;acdnat=1472488838_fe7f0e3cf2de6662864698607d69b1f4 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30153bf20>: Failed to resolve 'ac.els-cdn.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  77%|███████▋  | 768/1000 [26:00<03:30,  1.10it/s]

Error extracting text from http://schrts.co/lTZMlf: 404 Client Error: Not Found for url: https://schrts.co:443/lTZMlf
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-violinist-idUSKBN1AE01M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-violinist-idUSKBN1AE01M


Processing URLs:  77%|███████▋  | 770/1000 [26:05<06:38,  1.73s/it]

Error extracting text from https://www.healthnavigator.org.nz/healthy-living/c/covid-19-vaccine-misconceptions/: 404 Client Error: Not Found for url: https://healthify.nz/hauora-wellbeing/c/covid-19-vaccine-misconceptions


Processing URLs:  77%|███████▋  | 771/1000 [26:06<05:57,  1.56s/it]

Error extracting text from http://www.reuters.com/article/2015/11/11/us-mideast-crisis-syria-opposition-idUSKCN0T01AI20151111: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/11/us-mideast-crisis-syria-opposition-idUSKCN0T01AI20151111


Processing URLs:  78%|███████▊  | 775/1000 [26:10<03:30,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-petrobras-assets-onshore-idUSKCN0W7008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-petrobras-assets-onshore-idUSKCN0W7008
Error extracting text from http://www.oddschecker.com/tips/tv-and-specials/20160614-brexit-what-does-the-data-say: 403 Client Error: Forbidden for url: http://www.oddschecker.com/tips/tv-and-specials/20160614-brexit-what-does-the-data-say


Processing URLs:  78%|███████▊  | 777/1000 [26:11<02:28,  1.51it/s]

Error extracting text from http://www.reuters.com/article/2015/09/28/us-usa-fed-evans-idUSKCN0RS26H20150928: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/28/us-usa-fed-evans-idUSKCN0RS26H20150928
URL filtered: http://www.bbc.com/persian/iran/2015/12/151213_u08_saudi_new_ambo_tehran?ocid=socialflow_twitter


Processing URLs:  78%|███████▊  | 781/1000 [26:14<02:51,  1.27it/s]

Error extracting text from http://www.wsj.com/articles/house-votes-to-lift-oil-export-ban-1444411778?mod=djemalertMARKET: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-votes-to-lift-oil-export-ban-1444411778?mod=djemalertMARKET


Processing URLs:  78%|███████▊  | 784/1000 [26:18<03:20,  1.08it/s]

URL filtered: https://en.wikipedia.org/wiki/Twitter_suspensions
URL filtered: https://www.bloomberg.com/news/articles/2017-09-13/uber-s-chief-legal-officer-is-said-to-be-leaving-the-company


Processing URLs:  79%|███████▉  | 788/1000 [26:19<01:42,  2.07it/s]

Error extracting text from http://www.nytimes.com/2016/02/04/world/middleeast/syria-peace-talks-geneva-de-mistura.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/04/world/middleeast/syria-peace-talks-geneva-de-mistura.html?_r=0
Error extracting text from http://www.reuters.com/article/us-iraq-mosul-idUSKBN12G0Z1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iraq-mosul-idUSKBN12G0Z1


Processing URLs:  79%|███████▉  | 792/1000 [26:23<03:09,  1.10it/s]

Error extracting text from http://www.autoevolution.com/news/nextev-1360-hp-electric-hypercar-laps-nurburgring-a-chinese-ev-record-attempt-112326.html#: 403 Client Error: Forbidden for url: https://www.autoevolution.com/news/nextev-1360-hp-electric-hypercar-laps-nurburgring-a-chinese-ev-record-attempt-112326.html


Processing URLs:  80%|███████▉  | 797/1000 [26:31<03:36,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/02/02/us/politics/the-campaign-to-destroy-obamacare-hits-a-wall.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/02/us/politics/the-campaign-to-destroy-obamacare-hits-a-wall.html?_r=0


Processing URLs:  80%|███████▉  | 799/1000 [26:37<05:48,  1.74s/it]

Error extracting text from https://www.nytimes.com/2017/05/24/world/asia/south-china-sea-us-navy-warship-spratly-islands.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/24/world/asia/south-china-sea-us-navy-warship-spratly-islands.html?_r=0


Processing URLs:  80%|████████  | 800/1000 [26:41<07:48,  2.34s/it]

Error extracting text from http://www.start.umd.edu/baad/narratives/haqqani-network: 404 Client Error: Not Found for url: https://www.start.umd.edu/baad/narratives/haqqani-network


Processing URLs:  80%|████████  | 801/1000 [26:51<16:04,  4.85s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/un-to-vote-wednesday-on-new-north-korea-sanctions/2016/03/02/f64d41d0-e046-11e5-8c00-8aa03741dced_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/un-to-vote-wednesday-on-new-north-korea-sanctions/2016/03/02/f64d41d0-e046-11e5-8c00-8aa03741dced_story.html


Processing URLs:  80%|████████  | 802/1000 [26:52<11:43,  3.55s/it]

Error extracting text from https://www.c-span.org/2016-White-House-Correspondents-Association-Dinner/: 403 Client Error: Forbidden for url: https://www.c-span.org/2016-White-House-Correspondents-Association-Dinner/


Processing URLs:  80%|████████  | 803/1000 [26:53<09:41,  2.95s/it]

URL filtered: https://www.youtube.com/watch?v=sIooFGRBZJY


Processing URLs:  81%|████████  | 810/1000 [26:59<03:51,  1.22s/it]

Error extracting text from http://features.foreignpolicy.com/first-helmand-then-afghanistan/: 404 Client Error: Not Found for url: http://features.foreignpolicy.com/first-helmand-then-afghanistan/


Processing URLs:  82%|████████▏ | 815/1000 [27:16<06:20,  2.06s/it]

Error extracting text from https://www.oecd.org/economic-outlook/: 403 Client Error: Forbidden for url: https://www.oecd.org/economic-outlook/


Processing URLs:  82%|████████▏ | 819/1000 [27:23<05:26,  1.80s/it]

Error extracting text from http://www.amazon.com/Playing-Edge-American-Intelligence-Terror-ebook/dp/B00Y9HIMG4: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Playing-Edge-American-Intelligence-Terror-ebook/dp/B00Y9HIMG4


Processing URLs:  82%|████████▏ | 820/1000 [27:23<03:59,  1.33s/it]

Error extracting text from https://www.nytimes.com/2017/11/20/us/politics/north-korea-trump-terror.html?ref=todayspaper: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/20/us/politics/north-korea-trump-terror.html?ref=todayspaper


Processing URLs:  82%|████████▏ | 822/1000 [27:26<03:44,  1.26s/it]

Error extracting text from http://news.softpedia.com/news/cisco-starts-reviewing-code-after-juniper-finds-hidden-backdoor-497971.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/cisco-starts-reviewing-code-after-juniper-finds-hidden-backdoor-497971.shtml


Processing URLs:  83%|████████▎ | 826/1000 [27:48<10:49,  3.73s/it]

Error extracting text from http://www.businessinsider.com.au/warren-buffett-recommends-sp-500-index-2014-3: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/warren-buffett-recommends-sp-500-index-2014-3


Processing URLs:  83%|████████▎ | 829/1000 [27:54<06:51,  2.40s/it]

Error extracting text from http://asia.nikkei.com/Markets/Equities/IPOs-resuming-soon-as-market-regains-calm: 404 Client Error: Not Found for url: https://asia.nikkei.com/Markets/Equities/IPOs-resuming-soon-as-market-regains-calm


Processing URLs:  83%|████████▎ | 834/1000 [28:08<06:14,  2.25s/it]

Error extracting text from http://www.wsj.com/articles/what-happens-after-isis-falls-1473435007: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/what-happens-after-isis-falls-1473435007


Processing URLs:  84%|████████▎ | 837/1000 [28:10<03:24,  1.26s/it]

Error extracting text from http://www.defenddemocracy.org/media-hit/emanuele-ottolenghi-hardliners-set-to-dominate-irans-february-elections/: 403 Client Error: Forbidden for url: http://www.fdd.org/media-hit/emanuele-ottolenghi-hardliners-set-to-dominate-irans-february-elections/


Processing URLs:  84%|████████▍ | 842/1000 [28:17<03:01,  1.15s/it]

Error extracting text from https://www.timesofisrael.com/liveblog-july-13-2021/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/liveblog-july-13-2021/


Processing URLs:  84%|████████▍ | 843/1000 [28:17<02:20,  1.12it/s]

Error extracting text from https://www.nytimes.com/2017/01/25/health/bird-flu-china.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/25/health/bird-flu-china.html


Processing URLs:  85%|████████▌ | 851/1000 [28:34<04:27,  1.80s/it]

Error extracting text from https://english.alarabiya.net/News/world/2022/02/24/Ukraine-says-at-least-seven-killed-and-nine-wounded-by-Russian-shelling: 403 Client Error: Forbidden for url: https://english.alarabiya.net/News/world/2022/02/24/Ukraine-says-at-least-seven-killed-and-nine-wounded-by-Russian-shelling


Processing URLs:  85%|████████▌ | 854/1000 [28:38<03:20,  1.37s/it]

Error extracting text from http://www.wsj.com/articles/italian-finance-minister-calls-for-caution-on-eurozone-financial-reforms-1453627257: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/italian-finance-minister-calls-for-caution-on-eurozone-financial-reforms-1453627257


Processing URLs:  86%|████████▌ | 856/1000 [28:43<04:50,  2.02s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/iaea-to-close-iran-weapon-probe-in-key-step-for-ending-sanctions


Processing URLs:  87%|████████▋ | 866/1000 [28:53<01:39,  1.35it/s]

Error extracting text from https://www.nytimes.com/2017/06/14/world/asia/isis-captures-tora-bora-afghanistan.html?rref=collection%2Ftimestopic%2FTaliban&amp;action=click&amp;contentCollection=timestopics&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=9&amp;pgtype=collection: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/14/world/asia/isis-captures-tora-bora-afghanistan.html?rref=collection%2Ftimestopic%2FTaliban&amp;action=click&amp;contentCollection=timestopics&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=9&amp;pgtype=collection
Error extracting text from http://www.wsj.com/articles/donald-trump-sets-a-bar-for-russia-and-china-1484360380?mod=e2tw: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/donald-trump-sets-a-bar-for-russia-and-china-1484360380?mod=e2tw


Processing URLs:  87%|████████▋ | 871/1000 [28:59<02:14,  1.04s/it]

Error extracting text from https://www.predictit.org/markets/detail/2721/Which-party-will-win-the-2020-US-presidential-election: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/2721/Which-party-will-win-the-2020-US-presidential-election


Processing URLs:  88%|████████▊ | 877/1000 [29:34<07:34,  3.70s/it]

Error extracting text from http://www.reuters.com/article/us-opec-oil-idUSKBN12Z1FM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-oil-idUSKBN12Z1FM
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0WP320: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0WP320


Processing URLs:  88%|████████▊ | 879/1000 [29:36<04:51,  2.41s/it]

Error extracting text from http://www.themoscowtimes.com/opinion/article/bogged-down-in-the-middle-east-russia-loses-honest-broker-image/555715.html: 500 Server Error: Internal Server Error for url: https://www.themoscowtimes.com/opinion/article/bogged-down-in-the-middle-east-russia-loses-honest-broker-image/555715.html


Processing URLs:  88%|████████▊ | 880/1000 [29:37<04:15,  2.13s/it]

Error extracting text from http://af.reuters.com/article/chadNews/idAFL3N1HM2VH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  88%|████████▊ | 883/1000 [29:43<03:38,  1.87s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/22/double-median-filter/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/22/double-median-filter/
URL filtered: http://www.wired.com/2016/01/apple-sold-a-record-number-of-iphones-but-just-barely/?mbid=social_twitter


Processing URLs:  88%|████████▊ | 885/1000 [29:47<03:38,  1.90s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-18/amazon-s-delivery-drone-research-focuses-on-avoiding-birds


Processing URLs:  89%|████████▉ | 891/1000 [29:56<02:29,  1.37s/it]

Error extracting text from http://www.wsj.com/articles/saudi-arabias-powerful-oil-minister-ali-al-naimi-is-fired-1462632035: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-arabias-powerful-oil-minister-ali-al-naimi-is-fired-1462632035


Processing URLs:  89%|████████▉ | 892/1000 [29:58<02:24,  1.34s/it]



Processing URLs:  90%|████████▉ | 898/1000 [30:08<03:18,  1.95s/it]

Error extracting text from http://thebulletin.org/north-koreas-nuclear-weapons-what-now/basis-breakthrough-pyongyang-statement: 404 Client Error: Not Found for url: https://thebulletin.org/north-koreas-nuclear-weapons-what-now/basis-breakthrough-pyongyang-statement/


Processing URLs:  90%|█████████ | 903/1000 [30:22<03:06,  1.92s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-17/how-the-brexit-summit-will-unfold-and-why-you-should-care-q-a
Error extracting text from https://www.piie.com/blogs/trade-and-investment-policy-watch/biden-and-europe-remove-trumps-steel-and-aluminum-tariffs: 403 Client Error: Forbidden for url: https://www.piie.com/blogs/trade-and-investment-policy-watch/biden-and-europe-remove-trumps-steel-and-aluminum-tariffs


Processing URLs:  90%|█████████ | 905/1000 [30:25<02:53,  1.82s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-newzealand-cases/new-zealand-reports-13-new-confirmed-cases-of-coronavirus-idUSKCN25E04Y?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-newzealand-cases/new-zealand-reports-13-new-confirmed-cases-of-coronavirus-idUSKCN25E04Y?il=0


Processing URLs:  91%|█████████ | 909/1000 [30:38<04:20,  2.86s/it]

Error extracting text from http://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary: 403 Client Error: Forbidden for url: https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary


Processing URLs:  91%|█████████ | 911/1000 [30:39<02:30,  1.69s/it]

Error extracting text from https://www.axios.com/how-companies-are-bringing-veterans-into-the-tech-industry-2508243875.html: 403 Client Error: Forbidden for url: https://www.axios.com/how-companies-are-bringing-veterans-into-the-tech-industry-2508243875.html


Processing URLs:  91%|█████████ | 912/1000 [30:49<06:08,  4.19s/it]

Error extracting text from https://www.washingtonpost.com/politics/key-senate-race-in-ohio-showing-increasing-promise-for-gop/2016/08/03/827f3c78-5980-11e6-8b48-0cb344221131_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/key-senate-race-in-ohio-showing-increasing-promise-for-gop/2016/08/03/827f3c78-5980-11e6-8b48-0cb344221131_story.html


Processing URLs:  92%|█████████▏| 916/1000 [30:56<02:55,  2.09s/it]

Error extracting text from http://www.un.org/press/en/2006/sc8792.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2006/sc8792.doc.htm


Processing URLs:  92%|█████████▏| 918/1000 [31:00<03:04,  2.25s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-12-03/house-passes-five-year-305-billion-u-s-highway-funding-plan
URL filtered: https://www.statista.com/statistics/1029918/facebook-users-russia-age-gender/


Processing URLs:  92%|█████████▏| 921/1000 [31:01<01:28,  1.12s/it]

Error extracting text from https://www.science.org/content/article/sars-viruses-may-jump-animals-people-hundreds-thousands-times-year: 403 Client Error: Forbidden for url: https://www.science.org/content/article/sars-viruses-may-jump-animals-people-hundreds-thousands-times-year
URL filtered: https://www.bloomberg.com/news/articles/2021-03-18/scottish-nationalists-may-fall-short-of-majority-poll-shows


Processing URLs:  92%|█████████▏| 924/1000 [31:14<03:59,  3.15s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-06-08/trump-says-no-reason-to-raise-1-billion-for-campaign


Processing URLs:  93%|█████████▎| 927/1000 [31:15<02:08,  1.75s/it]

Error extracting text from http://www.nytimes.com/2016/07/08/us/politics/donald-trump-president.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/08/us/politics/donald-trump-president.html?_r=0


Processing URLs:  93%|█████████▎| 929/1000 [31:17<01:33,  1.32s/it]

Error extracting text from https://publications.parliament.uk/pa/cm5802/cmselect/cmfaff/203/20305.htm: 403 Client Error: Forbidden for url: https://publications.parliament.uk/pa/cm5802/cmselect/cmfaff/203/20305.htm


Processing URLs:  93%|█████████▎| 932/1000 [31:24<02:10,  1.91s/it]

URL filtered: https://www.youtube.com/watch?v=W6KHSNKpmdg


Processing URLs:  94%|█████████▍| 938/1000 [31:33<01:53,  1.84s/it]

Error extracting text from https://www.reuters.com/article/israel-gulf-usa/in-break-with-past-uae-and-bahrain-forge-ties-with-israel-at-white-house-idUSKBN2660L1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/israel-gulf-usa/in-break-with-past-uae-and-bahrain-forge-ties-with-israel-at-white-house-idUSKBN2660L1


Processing URLs:  94%|█████████▍| 942/1000 [31:45<02:17,  2.37s/it]

Error extracting text from http://www.defensenews.com/longform/defense/policy-budget/warfare/2016/01/08/republican-democratic-hopes-for-anti-isis-coalition/78342876/: 404 Client Error: Not Found for url: https://www.defensenews.com/longform/defense/policy-budget/warfare/2016/01/08/republican-democratic-hopes-for-anti-isis-coalition/78342876/


Processing URLs:  94%|█████████▍| 943/1000 [31:46<01:46,  1.87s/it]

Error extracting text from https://theconversation.com/after-its-president-was-assassinated-haiti-needs-international-help-more-than-ever-164285: 403 Client Error: Forbidden for url: https://theconversation.com/after-its-president-was-assassinated-haiti-needs-international-help-more-than-ever-164285


Processing URLs:  94%|█████████▍| 945/1000 [31:47<01:11,  1.30s/it]

Error extracting text from http://www.iiss.org/en/iiss%20voices/blogsections/iiss-voices-2017-adeb/july-eb75/three-strikes-against-claims-that-iran-is-violating-the-nuclear-accord-f965: 404 Client Error: Not Found for url: https://www.iiss.org/online-analysis/online-analysis//2017/07/iran-violate-nuclear-accord
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN13T0LZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN13T0LZ?il=0


Processing URLs:  95%|█████████▍| 946/1000 [31:48<00:52,  1.03it/s]

Error extracting text from http://www.nytimes.com/aponline/2016/02/18/world/europe/ap-eu-europe-migrants-the-latest.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/02/18/world/europe/ap-eu-europe-migrants-the-latest.html?_r=0


Processing URLs:  95%|█████████▍| 947/1000 [32:48<16:04, 18.21s/it]

Error extracting text from http://www.seattletimes.com/nation-world/spains-regional-elections-reaffirm-rajoy-in-galicia/Acting: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  95%|█████████▍| 948/1000 [32:49<11:25, 13.18s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu/parting-is-such-sweet-sorrow-eu-and-uk-clinch-narrow-brexit-accord-idUSKBN28Y02R?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/parting-is-such-sweet-sorrow-eu-and-uk-clinch-narrow-brexit-accord-idUSKBN28Y02R?il=0
URL filtered: https://twitter.com/raveenaujmaya/status/787789302558257153


Processing URLs:  95%|█████████▌| 950/1000 [32:50<06:07,  7.35s/it]

Error extracting text from https://www.newsweek.com/arizona-west-virginia-gop-voters-back-democrats-election-bill-conservative-opposition-mounts-1591695: 403 Client Error: Forbidden for url: https://www.newsweek.com/arizona-west-virginia-gop-voters-back-democrats-election-bill-conservative-opposition-mounts-1591695


Processing URLs:  95%|█████████▌| 953/1000 [32:53<02:49,  3.61s/it]

Error extracting text from http://en.abna24.com/cultural/archive/2016/07/11/765261/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/cultural/archive/2016/07/11/765261/story.html
Error extracting text from http://www.latimes.com/politics/la-fg-trump-putin-20171111-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-fg-trump-putin-20171111-story.html


Processing URLs:  95%|█████████▌| 954/1000 [32:53<02:03,  2.69s/it]

Error extracting text from http://m.hindustantimes.com/world-news/nawaz-sharif-orders-action-over-news-story-on-national-security/story-hUizPziP8U344V30WXnUyL.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/nawaz-sharif-orders-action-over-news-story-on-national-security/story-hUizPziP8U344V30WXnUyL.html


Processing URLs:  96%|█████████▌| 958/1000 [33:01<01:24,  2.02s/it]

Error extracting text from https://www.cia.gov/library/publications/intelligence-history/richard-helms-collection/richard-helms.pdf: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/intelligence-history/richard-helms-collection/richard-helms.pdf


Processing URLs:  96%|█████████▌| 960/1000 [33:01<00:43,  1.10s/it]

Error extracting text from https://www.nytimes.com/2017/02/14/us/politics/mike-flynn-resign-pence-russia.html?ribbon-ad-idx=4&amp;rref=politics&amp;module=Ribbon&amp;version=context&amp;region=Header&amp;action=click&amp;contentCollection=Politics&amp;pgtype=article&amp;pagewanted=all: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/14/us/politics/mike-flynn-resign-pence-russia.html?ribbon-ad-idx=4&amp;rref=politics&amp;module=Ribbon&amp;version=context&amp;region=Header&amp;action=click&amp;contentCollection=Politics&amp;pgtype=article&amp;pagewanted=all
Error extracting text from http://www.reuters.com/article/us-russia-poland-sanctions/russia-preparing-retaliation-against-poland-over-war-memorial-row-agencies-idUSKBN1A41VB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-poland-sanctions/russia-preparing-retaliation-against-poland-over-war-memorial-row-agencies-idUSKBN1A41VB


Processing URLs:  96%|█████████▌| 962/1000 [33:01<00:23,  1.64it/s]

Error extracting text from http://www.reuters.com/article/us-usa-court-gorsuch-idUSKBN15V250?feedType=RSS&amp;feedName=politicsNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-gorsuch-idUSKBN15V250?feedType=RSS&amp;feedName=politicsNews
Error extracting text from http://www.nytimes.com/2016/08/21/world/europe/turkey-wedding-bombing.html?ref=europe: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/21/world/europe/turkey-wedding-bombing.html?ref=europe


Processing URLs:  97%|█████████▋| 968/1000 [33:15<01:04,  2.01s/it]

Error extracting text from http://www.oddschecker.com/politics/world-politics/united-nations/female-secretary-general: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/world-politics/united-nations/female-secretary-general


Processing URLs:  97%|█████████▋| 972/1000 [33:21<00:46,  1.65s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/burundi-bans-three-un-human-rights-investigators/3195622.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/burundi-bans-three-un-human-rights-investigators/3195622.html


Processing URLs:  98%|█████████▊| 975/1000 [33:23<00:24,  1.03it/s]

Error extracting text from https://www.un.org/en/ga/credentials/credentials.shtml: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/credentials/credentials.shtml


Processing URLs:  98%|█████████▊| 976/1000 [34:23<07:30, 18.75s/it]

Error extracting text from http://intelgame.acera.unimelb.edu.au/?body=about: HTTPConnectionPool(host='intelgame.acera.unimelb.edu.au', port=80): Max retries exceeded with url: /?body=about (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe8429c0>, 'Connection to intelgame.acera.unimelb.edu.au timed out. (connect timeout=60)'))


Processing URLs:  98%|█████████▊| 978/1000 [34:27<03:37,  9.91s/it]

Error extracting text from https://www.reuters.com/business/energy/eu-court-adviser-says-nord-stream-2-can-challenge-eu-rules-2021-10-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/eu-court-adviser-says-nord-stream-2-can-challenge-eu-rules-2021-10-06/


Processing URLs:  98%|█████████▊| 983/1000 [34:38<00:51,  3.06s/it]

Error extracting text from http://bangkok.coconuts.co/2017/02/09/thailands-next-election-exactly-one-year-deputy-pm: 403 Client Error: Forbidden for url: https://coconuts.co/bangkok/2017/02/09/thailands-next-election-exactly-one-year-deputy-pm


Processing URLs:  98%|█████████▊| 984/1000 [34:39<00:36,  2.31s/it]

Error extracting text from http://thehill.com/policy/energy-environment/323959-trumps-defense-secretary-calls-climate-change-a-national-security: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/323959-trumps-defense-secretary-calls-climate-change-a-national-security/
Error extracting text from http://www.nigeriatoday.ng/2016/08/fulani-herdsmen-kill-6-in-kaduna/: HTTPConnectionPool(host='www.nigeriatoday.ng', port=80): Max retries exceeded with url: /2016/08/fulani-herdsmen-kill-6-in-kaduna/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304723f50>: Failed to resolve 'www.nigeriatoday.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  99%|█████████▉| 988/1000 [34:42<00:16,  1.34s/it]

Error extracting text from https://www.nytimes.com/2018/05/02/business/tesla-earnings-model-3.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/05/02/business/tesla-earnings-model-3.html
URL filtered: https://www.bloomberg.com/graphics/2017-arctic/the-political-arctic/


Processing URLs:  99%|█████████▉| 990/1000 [34:44<00:10,  1.08s/it]

URL filtered: https://www.youtube.com/watch?v=rF5z4oUBrmo


Processing URLs:  99%|█████████▉| 992/1000 [34:44<00:05,  1.40it/s]

Error extracting text from https://financefeeds.com/is-amazon-about-to-announce-a-proprietary-cryptocurrency-project/: 403 Client Error: Forbidden for url: https://financefeeds.com/is-amazon-about-to-announce-a-proprietary-cryptocurrency-project/


Processing URLs: 100%|█████████▉| 996/1000 [35:00<00:12,  3.02s/it]

URL filtered: https://twitter.com/GebeilyM/status/826400993298567168


Processing URLs: 100%|██████████| 1000/1000 [35:07<00:00,  2.11s/it]


Error extracting text from http://nyti.ms/1iWEucI: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/13/world/middleeast/sinjar-isis-iraq-syria.html?smid=pl-share


Processing URLs:   0%|          | 1/1000 [00:00<07:22,  2.26it/s]

Error extracting text from https://thehill.com/homenews/administration/534561-biden-taps-wendy-sherman-for-no-2-state-department-post: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/534561-biden-taps-wendy-sherman-for-no-2-state-department-post/


Processing URLs:   0%|          | 2/1000 [00:03<36:01,  2.17s/it]

URL filtered: https://twitter.com/KelseyTuoc


Processing URLs:   0%|          | 4/1000 [00:04<16:46,  1.01s/it]

Error extracting text from http://www.techeye.net/news/string-theory-might-be-about-to-finally-be-killed-off: 403 Client Error: Forbidden for url: http://www.techeye.net/news/string-theory-might-be-about-to-finally-be-killed-off


Processing URLs:   1%|          | 7/1000 [00:08<18:36,  1.12s/it]

Error extracting text from http://ndb.int/BRICSbankNDBtosellGreenrenminbibonds.php: HTTPConnectionPool(host='ndb.int', port=80): Max retries exceeded with url: /BRICSbankNDBtosellGreenrenminbibonds.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe74d0d0>: Failed to resolve 'ndb.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   1%|          | 9/1000 [00:11<18:57,  1.15s/it]

Error extracting text from http://www.policyexchange.org.uk/images/WolfsonPrize/wolfson%20economics%20prize%20winning%20entry.pdf: 403 Client Error: Forbidden for url: http://www.policyexchange.org.uk/images/WolfsonPrize/wolfson%20economics%20prize%20winning%20entry.pdf


Processing URLs:   1%|          | 10/1000 [00:12<21:07,  1.28s/it]

Error extracting text from https://larswericson.wordpress.com/2015/10/13/our-hostages-versus-their-hostages/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/10/13/our-hostages-versus-their-hostages/


Processing URLs:   1%|          | 12/1000 [00:16<28:35,  1.74s/it]

Error extracting text from http://afrobarometer.org/press/presidents-approval-rating-remains-high-despite-rising-pessimism-about-zimbabwes-economic-conditions: 404 Client Error: Not Found for url: https://www.afrobarometer.org/press/presidents-approval-rating-remains-high-despite-rising-pessimism-about-zimbabwes-economic-conditions


Processing URLs:   1%|▏         | 13/1000 [00:20<35:54,  2.18s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-12-13/fed-seen-delivering-one-of-the-most-hawkish-pivots-in-years


Processing URLs:   2%|▏         | 17/1000 [00:22<17:37,  1.08s/it]

Error extracting text from https://www.wsj.com/articles/trump-administration-sanctions-25-iranian-entities-1486135896: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-administration-sanctions-25-iranian-entities-1486135896


Processing URLs:   2%|▏         | 19/1000 [00:26<21:20,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-iraq-oil-pipeline-turkey-idUSKCN0VW24U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iraq-oil-pipeline-turkey-idUSKCN0VW24U


Processing URLs:   2%|▏         | 20/1000 [00:28<22:25,  1.37s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-07-05/comey-says-no-clinton-charges-recommended-despite-carelessness?cmpid=wsdemand


Processing URLs:   2%|▏         | 22/1000 [00:28<14:08,  1.15it/s]

Error extracting text from http://www.sfgate.com/drought/article/California-water-limits-might-be-eased-slightly-6762776.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/drought/article/California-water-limits-might-be-eased-slightly-6762776.php


Processing URLs:   3%|▎         | 28/1000 [00:47<31:38,  1.95s/it]  

Error extracting text from http://www.gov.me/en/News/147367/PM-dukanovic-NATO-bombing-1999-most-powerful-argument-for-Montenegro-s-accession-to-NATO-Alliance.html: 404 Client Error: not found for url: https://www.gov.me/en/News/147367/PM-dukanovic-NATO-bombing-1999-most-powerful-argument-for-Montenegro-s-accession-to-NATO-Alliance.html


Processing URLs:   3%|▎         | 30/1000 [00:57<52:35,  3.25s/it]  

Error extracting text from http://www.reconsidermedia.com/podcast/brexit: 406 Client Error: Not Acceptable for url: http://www.reconsidermedia.com/podcast/brexit


Processing URLs:   3%|▎         | 32/1000 [01:03<46:14,  2.87s/it]

URL filtered: https://www.youtube.com/watch?v=cj8ZNgnzSSU


Processing URLs:   4%|▎         | 35/1000 [01:04<23:21,  1.45s/it]



Processing URLs:   4%|▎         | 36/1000 [01:06<22:49,  1.42s/it]

Error extracting text from http://www.pewforum.org/files/2016/01/PF_2016-01-27_religion-politics_FINAL.pdf: 404 Client Error: Not Found for url: https://www.pewresearch.org/religion/files/2016/01/PF_2016-01-27_religion-politics_FINAL.pdf


Processing URLs:   4%|▍         | 44/1000 [01:17<20:46,  1.30s/it]

Error extracting text from http://www.newsweek.com/2015/12/25/diabetes-drug-could-be-anti-aging-miracle-404370.html: 403 Client Error: Forbidden for url: https://www.newsweek.com/2015/12/25/diabetes-drug-could-be-anti-aging-miracle-404370.html
URL filtered: http://www.bloomberg.com/news/articles/2016-02-08/zuma-meets-south-african-ceos-as-investor-confidence-fades


Processing URLs:   5%|▍         | 46/1000 [01:19<20:36,  1.30s/it]

Error extracting text from http://shsucj.blogspot.ca/2016/08/real-talk-wcj-mike-coates-stp-nuclear.html?m=1: 404 Client Error: Not Found for url: http://shsucj.blogspot.com/2016/08/real-talk-wcj-mike-coates-stp-nuclear.html?m=1


Processing URLs:   5%|▍         | 48/1000 [01:20<14:02,  1.13it/s]

Error extracting text from http://www.icc-cricket.com/team-rankings/t20i: 404 Client Error: Not Found for url: https://www.icc-cricket.com/team-rankings/t20i
Error extracting text from https://www.axios.com/us-embassy-citizens-afghanistan-taliban-attack-f4e63b65-dbfb-4901-abd0-09ee4cedac67.html: 403 Client Error: Forbidden for url: https://www.axios.com/us-embassy-citizens-afghanistan-taliban-attack-f4e63b65-dbfb-4901-abd0-09ee4cedac67.html


Processing URLs:   5%|▌         | 50/1000 [01:24<20:17,  1.28s/it]

Error extracting text from http://blogs.reuters.com/breakingviews/2015/08/21/asian-capital-controls-are-a-real-risk-once-again/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /breakingviews/2015/08/21/asian-capital-controls-are-a-real-risk-once-again/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302e0b740>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   6%|▌         | 55/1000 [01:30<15:42,  1.00it/s]

URL filtered: https://animalcharityevaluators.org/research/other-topics/trends-in-meat-production/?utm_sq=fybuv0h8j3&utm_source=Twitter&utm_medium=social&utm_campaign=Effect-Altruism&utm_content=Educate-Animals


Processing URLs:   6%|▌         | 60/1000 [01:35<18:52,  1.20s/it]

Error extracting text from http://www.foxnews.com/world/2015/10/14/cuban-military-forces-deployed-to-syria-to-operate-russian-tanks-say-sources/: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/10/14/cuban-military-forces-deployed-to-syria-to-operate-russian-tanks-say-sources/


Processing URLs:   6%|▌         | 62/1000 [01:38<21:55,  1.40s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/01/28/89/0301000000AEN20160128005352315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:   6%|▋         | 65/1000 [01:42<24:29,  1.57s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-29/china-sends-japan-a-don-t-meddle-message-via-an-ex-navy-ship


Processing URLs:   7%|▋         | 67/1000 [01:45<23:12,  1.49s/it]

Error extracting text from http://www.interpretermag.com/wp-content/uploads/2014/11/The_Menace_of_Unreality_Final.pdf: 404 Client Error: Not Found for url: http://www.interpretermag.com/wp-content/uploads/2014/11/The_Menace_of_Unreality_Final.pdf


Processing URLs:   7%|▋         | 69/1000 [02:48<4:24:32, 17.05s/it]

Error extracting text from http://aa.com.tr/en/middle-east/violence-kills-11-in-southern-yemen/883417: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:   7%|▋         | 70/1000 [02:49<3:15:05, 12.59s/it]

Error extracting text from https://nationalinterest.org/blog/politics/why-joe-biden-should-boycott-china%E2%80%99s-2022-winter-olympics-175292: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/politics/why-joe-biden-should-boycott-china%E2%80%99s-2022-winter-olympics-175292


Processing URLs:   7%|▋         | 71/1000 [02:51<2:30:36,  9.73s/it]

Error extracting text from http://in.reuters.com/article/germany-russia-idINKBN13X16C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:   7%|▋         | 72/1000 [02:52<1:48:44,  7.03s/it]

Error extracting text from https://www.nytimes.com/2015/04/17/world/americas/mexicos-president-rolls-out-plan-to-save-endangered-porpoise.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/04/17/world/americas/mexicos-president-rolls-out-plan-to-save-endangered-porpoise.html


Processing URLs:   7%|▋         | 73/1000 [03:52<5:45:48, 22.38s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-10-03/the-latest-eu-party-leader-urges-sacking-of-uks-johnson: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://www.cnbc.com/2017/10/01/facebook-plans-to-give-russia-tied-ad-data-to-congress.html


Processing URLs:   8%|▊         | 75/1000 [03:53<3:12:14, 12.47s/it]

URL filtered: https://www.wsj.com/articles/elon-musk-offers-to-buy-rest-of-twitter-for-54-20-a-share-11649932296


Processing URLs:   9%|▊         | 87/1000 [04:06<20:53,  1.37s/it]  

Error extracting text from http://finance.yahoo.com/news/oil-tumbles-saudi-arabia-says-121851545.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/oil-tumbles-saudi-arabia-says-121851545.html


Processing URLs:   9%|▉         | 90/1000 [04:11<20:45,  1.37s/it]

Error extracting text from http://www.nytimes.com/2016/03/03/us/politics/white-house-vetting-jane-kelly-judge-supreme-court.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/03/us/politics/white-house-vetting-jane-kelly-judge-supreme-court.html?_r=0


Processing URLs:   9%|▉         | 92/1000 [04:14<20:40,  1.37s/it]

Error extracting text from http://www.reuters.com/article/us-russia-usa-treaty-idUSKBN17V0KQ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-treaty-idUSKBN17V0KQ?il=0


Processing URLs:   9%|▉         | 93/1000 [04:16<21:42,  1.44s/it]

Error extracting text from http://uk.reuters.com/article/uk-turkey-politics-minister-idUKKBN17M0Q5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  10%|▉         | 95/1000 [04:19<22:20,  1.48s/it]

Error extracting text from http://uk.reuters.com/article/us-southchinasea-china-kerry-idUKKCN0ZM2GU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  10%|▉         | 98/1000 [04:31<44:32,  2.96s/it]  

Error extracting text from http://news.yahoo.com/eu-leaders-agree-brexit-deal-lithuania-president-211603383.html: 404 Client Error: Not Found for url: http://news.yahoo.com/eu-leaders-agree-brexit-deal-lithuania-president-211603383.html


Processing URLs:  10%|█         | 101/1000 [04:33<23:34,  1.57s/it]

Error extracting text from https://www.tcmb.gov.tr/wps/wcm/connect/EN/TCMB+EN/Main+Menu/Statistics/Balance+of+Payments+and+Related+Statistics/International+Reserves-Foreign+Currency+Liquidity/Data/: 404 Client Error: Not Found for url: https://www.tcmb.gov.tr/wps/wcm/connect/EN/TCMB+EN/Main+Menu/Statistics/Balance+of+Payments+and+Related+Statistics/International+Reserves-Foreign+Currency+Liquidity/Data/


Processing URLs:  10%|█         | 102/1000 [04:35<24:23,  1.63s/it]

Error extracting text from http://www.nationalreview.com/corner/426270/what-ben-carsons-mannatech-answer-tells-us-jim-geraghty: 404 Client Error: Not Found for url: https://www.nationalreview.com/corner/426270/what-ben-carsons-mannatech-answer-tells-us-jim-geraghty/


Processing URLs:  10%|█         | 105/1000 [04:39<20:00,  1.34s/it]

Error extracting text from https://www.nytimes.com/2017/12/27/world/middleeast/syria-evacuations.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/27/world/middleeast/syria-evacuations.html


Processing URLs:  11%|█         | 106/1000 [04:40<17:59,  1.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-11/moody-s-cuts-malaysia-credit-rating-outlook-on-weaker-finances


Processing URLs:  11%|█         | 110/1000 [04:43<13:29,  1.10it/s]

Error extracting text from http://www.newson6.com/story/32203965/oklahoma-national-guard-battalion-deploys-to-middle-east-to-fight-isis: 403 Client Error: Forbidden for url: http://www.newson6.com/story/32203965/oklahoma-national-guard-battalion-deploys-to-middle-east-to-fight-isis


Processing URLs:  11%|█         | 111/1000 [04:45<15:31,  1.05s/it]

Error extracting text from http://www.newsweek.com/bds-movement-accuses-israel-series-cyber-attacks-466113: 403 Client Error: Forbidden for url: https://www.newsweek.com/bds-movement-accuses-israel-series-cyber-attacks-466113


Processing URLs:  11%|█         | 112/1000 [04:48<25:54,  1.75s/it]

Error extracting text from http://www.frontiermyanmar.net/en/constitutional-suspension-raised-possible-fix-daw-suu-kyi-presidency: 404 Client Error: Not Found for url: https://www.frontiermyanmar.net/en/constitutional-suspension-raised-possible-fix-daw-suu-kyi-presidency


Processing URLs:  12%|█▏        | 115/1000 [04:53<22:42,  1.54s/it]

Error extracting text from http://www.tradingeconomics.com/venezuela/government-budget: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/venezuela/government-budget


Processing URLs:  12%|█▏        | 116/1000 [04:53<16:53,  1.15s/it]

Error extracting text from https://www.nytimes.com/2020/12/13/nyregion/nyc-shooting-cathedral.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/13/nyregion/nyc-shooting-cathedral.html


Processing URLs:  12%|█▏        | 118/1000 [04:54<10:28,  1.40it/s]

Error extracting text from http://www.reuters.com/article/us-usa-russia-election-exclusive-idUSKBN17L2N3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-election-exclusive-idUSKBN17L2N3


Processing URLs:  12%|█▏        | 119/1000 [04:57<20:51,  1.42s/it]

Error extracting text from http://english.aawsat.com/2016/01/article55346549/arab-and-regional-condemnation-of-iranian-interference-in-5-countries: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/01/article55346549/arab-and-regional-condemnation-of-iranian-interference-in-5-countries


Processing URLs:  12%|█▏        | 121/1000 [04:58<13:47,  1.06it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-07/dollar-traders-give-yellen-green-light-as-fed-rate-meeting-nears


Processing URLs:  12%|█▎        | 125/1000 [05:03<16:10,  1.11s/it]

Error extracting text from http://www.reuters.com/article/2015/11/27/us-election-trump-idUSKBN0TG2AN20151127: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/27/us-election-trump-idUSKBN0TG2AN20151127


Processing URLs:  13%|█▎        | 131/1000 [05:11<17:06,  1.18s/it]

Error extracting text from http://www.reuters.com/article/2015/09/11/us-iran-nuclear-parchin-exclusive-idUSKCN0RB2D420150911: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/11/us-iran-nuclear-parchin-exclusive-idUSKCN0RB2D420150911


Processing URLs:  13%|█▎        | 134/1000 [05:17<20:26,  1.42s/it]

Error extracting text from http://www.turkishweekly.net/2015/12/16/news/france-uses-long-range-missiles-to-strike-is-targets-in-iraq-ministry/: 404 Client Error: Not Found for url: https://turkishweekly.net/2015/12/16/news/france-uses-long-range-missiles-to-strike-is-targets-in-iraq-ministry/
Error extracting text from http://www.reuters.com/article/2015/09/16/us-mideast-crisis-syria-assad-idUSKCN0RG0LX20150916: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/16/us-mideast-crisis-syria-assad-idUSKCN0RG0LX20150916


Processing URLs:  14%|█▎        | 135/1000 [05:18<16:53,  1.17s/it]

Error extracting text from http://thehill.com/policy/energy-environment/263371-spending-and-tax-deal-ends-crude-oil-export-ban-extends-renewable: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/263371-spending-and-tax-deal-ends-crude-oil-export-ban-extends-renewable/


Processing URLs:  14%|█▍        | 140/1000 [05:28<21:33,  1.50s/it]

Error extracting text from https://uk.finance.yahoo.com/news/venezuela-creditors-committee-again-postpones-194210953.html: 404 Client Error: Not Found for url: https://uk.finance.yahoo.com/news/venezuela-creditors-committee-again-postpones-194210953.html


Processing URLs:  14%|█▍        | 144/1000 [05:34<20:42,  1.45s/it]

Error extracting text from http://www.fpri.org/articles/2016/01/foreseeable-foreseen-ignored-iran-advancing-its-missile-program-home-while-offshoring-its-nuclear-program-north-korea: 403 Client Error: Forbidden for url: https://www.fpri.org/articles/2016/01/foreseeable-foreseen-ignored-iran-advancing-its-missile-program-home-while-offshoring-its-nuclear-program-north-korea
URL filtered: https://www.youtube.com/watch?v=ikW8D4fncoI


Processing URLs:  15%|█▍        | 146/1000 [05:35<15:07,  1.06s/it]

Error extracting text from https://www.eiu.com/n/campaigns/global-liveability-index-2021/: 403 Client Error: Forbidden for url: https://www.eiu.com/n/campaigns/global-liveability-index-2021/


Processing URLs:  15%|█▍        | 149/1000 [06:38<4:03:52, 17.19s/it]

Error extracting text from https://www.dfat.gov.au/trade/agreements/in-force/cptpp/comprehensive-and-progressive-agreement-for-trans-pacific-partnership: HTTPSConnectionPool(host='www.dfat.gov.au', port=443): Read timed out. (read timeout=60)


Processing URLs:  15%|█▌        | 152/1000 [06:45<1:48:38,  7.69s/it]

Error extracting text from http://www.world-nuclear.org/info/current-and-future-generation/nuclear-power-in-the-world-today/: 404 Client Error: Not Found for url: https://www.world-nuclear.org/info/Current-and-Future-Generation/Nuclear-Power-in-the-World-Today/


Processing URLs:  16%|█▌        | 155/1000 [06:48<46:03,  3.27s/it]  

Error extracting text from http://business.financialpost.com/news/economy/trump-team-sets-timetable-for-possible-nafta-exit: 403 Client Error: Forbidden for url: https://financialpost.com/news/economy/trump-team-sets-timetable-for-possible-nafta-exit
Error extracting text from http://news.yahoo.com/trump-still-not-doing-well-enough-guarantee-nomination-143140132--election.html: 404 Client Error: Not Found for url: http://news.yahoo.com/trump-still-not-doing-well-enough-guarantee-nomination-143140132--election.html


Processing URLs:  16%|█▌        | 158/1000 [06:52<27:18,  1.95s/it]

URL filtered: http://fortune.com/2017/03/05/facebook-disputed-news-tag/


Processing URLs:  16%|█▌        | 161/1000 [06:54<18:30,  1.32s/it]

Error extracting text from http://www.dailysabah.com/money/2016/04/19/russia-expands-visa-requirements-for-turkish-citizens: 404 Client Error: Not Found for url: https://www.dailysabah.com/money/2016/04/19/russia-expands-visa-requirements-for-turkish-citizens


Processing URLs:  16%|█▋        | 164/1000 [07:02<31:02,  2.23s/it]

Error extracting text from http://nation.com.pk/national/09-Jan-2016/imran-wants-govt-to-mediate-in-saudi-iran-conflict: 503 Server Error: Backend fetch failed for url: https://www.nation.com.pk/national/09-Jan-2016/imran-wants-govt-to-mediate-in-saudi-iran-conflict


Processing URLs:  17%|█▋        | 166/1000 [07:08<35:48,  2.58s/it]

Error extracting text from http://tass.ru/en/world/846542: 404 Client Error: Not Found for url: https://tass.ru/en/world/846542


Processing URLs:  17%|█▋        | 170/1000 [07:11<18:21,  1.33s/it]

URL filtered: https://twitter.com/AFSpace/lists/experts/members


Processing URLs:  18%|█▊        | 175/1000 [07:15<14:38,  1.06s/it]

URL filtered: https://twitter.com/apmassaro3/status/1497255556347834375).


Processing URLs:  18%|█▊        | 177/1000 [07:16<09:36,  1.43it/s]

Error extracting text from http://lgbc-scotland.gov.uk/about-us/commission: HTTPConnectionPool(host='lgbc-scotland.gov.uk', port=80): Max retries exceeded with url: /about-us/commission (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301993260>: Failed to resolve 'lgbc-scotland.gov.uk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  18%|█▊        | 178/1000 [07:18<15:27,  1.13s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/681899/dunford-counter-isil-shaping-operations-have-begun-in-mosul: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/681899/dunford-counter-isil-shaping-operations-have-begun-in-mosul


Processing URLs:  18%|█▊        | 180/1000 [07:20<15:34,  1.14s/it]

Error extracting text from http://www.brennancenter.org/sites/default/files/legacy/CGR%20Reprint: 404 Client Error: Not Found for url: https://www.brennancenter.org/sites/default/files/legacy/CGR%20Reprint


Processing URLs:  18%|█▊        | 181/1000 [07:21<13:30,  1.01it/s]

Error extracting text from http://www.realclearpolitics.com/articles/2015/08/26/demographics_and_the_2016_election_scenarios.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2015/08/26/demographics_and_the_2016_election_scenarios.html


Processing URLs:  18%|█▊        | 184/1000 [07:24<10:17,  1.32it/s]

Error extracting text from http://www.travelweekly.com/Travel-News/Hotel-News/Airbnb-may-be-preparing-for-IPO: 405 Client Error: Not Allowed for url: https://www.travelweekly.com/Travel-News/Hotel-News/Airbnb-may-be-preparing-for-IPO
Error extracting text from https://www.france24.com/en/live-news/20220329-french-far-right-leader-le-pen-closing-gap-on-macron-polls: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20220329-french-far-right-leader-le-pen-closing-gap-on-macron-polls


Processing URLs:  18%|█▊        | 185/1000 [07:24<07:59,  1.70it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-russia-idUSKBN13A0Z7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-russia-idUSKBN13A0Z7


Processing URLs:  19%|█▊        | 186/1000 [07:25<10:55,  1.24it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-12/bullard-says-zero-rates-no-longer-needed-as-inflation-near-goal
URL filtered: http://www.bloomberg.com/gadfly/articles/2016-07-24/asian-corporate-defaults-are-just-getting-started


Processing URLs:  19%|█▉        | 190/1000 [07:28<11:10,  1.21it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-16/china-s-xi-defends-web-controls-in-call-for-cybersovereignty-


Processing URLs:  19%|█▉        | 194/1000 [07:30<08:02,  1.67it/s]

Error extracting text from http://www.nytimes.com/2015/12/28/opinion/doubling-down-on-w.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/28/opinion/doubling-down-on-w.html
Error extracting text from https://www.reuters.com/article/us-tesla-truck-research/teslas-unfettered-ambition-to-drain-finances-analysts-idUSKBN1DH1M4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-truck-research/teslas-unfettered-ambition-to-drain-finances-analysts-idUSKBN1DH1M4


Processing URLs:  20%|█▉        | 196/1000 [07:34<15:20,  1.14s/it]

Error extracting text from https://www.statology.org/probability-of-at-least-one-success/: 403 Client Error: Forbidden for url: https://www.statology.org/probability-of-at-least-one-success/


Processing URLs:  20%|██        | 200/1000 [07:40<16:40,  1.25s/it]

Error extracting text from http://www.telegraph.co.uk/news/worldnews/middleeast/syria/12144248/Up-to-70000-Syrian-refugees-flee-to-Turkeys-closed-border.html: 404 Client Error: Not Found for url: https://www.telegraph.co.uk/news/worldnews/middleeast/syria/12144248/Up-to-70000-Syrian-refugees-flee-to-Turkeys-closed-border.html


Processing URLs:  20%|██        | 202/1000 [07:42<15:36,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-village-idUSKCN0XX0C6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-village-idUSKCN0XX0C6


Processing URLs:  20%|██        | 204/1000 [07:43<10:30,  1.26it/s]

Error extracting text from http://www.radioiowa.com/2016/01/18/surveying-undecided-iowa-gop-caucus-goers/: 403 Client Error: Forbidden for url: http://www.radioiowa.com/2016/01/18/surveying-undecided-iowa-gop-caucus-goers/


Processing URLs:  21%|██        | 208/1000 [07:51<22:38,  1.72s/it]

Error extracting text from https://bit.ly/3vQDzk0: 403 Client Error: Forbidden for url: https://www.electoralcommission.org.uk/who-we-are-and-what-we-do/elections-and-referendums


Processing URLs:  21%|██        | 211/1000 [07:56<20:06,  1.53s/it]

Error extracting text from http://www.brainpreservation.org/overview/: 406 Client Error: Not Acceptable for url: http://www.brainpreservation.org/overview/


Processing URLs:  22%|██▏       | 215/1000 [08:00<14:34,  1.11s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/yellen-signals-confidence-in-economy-ahead-of-fed-rate-decision


Processing URLs:  22%|██▏       | 222/1000 [08:05<07:42,  1.68it/s]

URL filtered: https://twitter.com/demishassabis/status/695379217870008321
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-exclusive-idUSKBN19Z1EN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-exclusive-idUSKBN19Z1EN


Processing URLs:  22%|██▎       | 225/1000 [08:09<14:20,  1.11s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/30/751023/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/30/751023/story.html


Processing URLs:  23%|██▎       | 228/1000 [08:14<19:09,  1.49s/it]

Error extracting text from http://nationalinterest.org/feature/china-wants-turn-water-territory-the-south-china-sea-beyond-19578: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/china-wants-turn-water-territory-the-south-china-sea-beyond-19578


Processing URLs:  23%|██▎       | 229/1000 [08:16<18:32,  1.44s/it]

Error extracting text from http://www.wsj.com/articles/china-issuing-strict-controls-on-overseas-investment-1480071529: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-issuing-strict-controls-on-overseas-investment-1480071529


Processing URLs:  23%|██▎       | 231/1000 [09:17<3:57:41, 18.55s/it]

Error extracting text from https://www.gambling.com/news/no-confidence-in-boris-johnson-vote-odds-hint-at-tory-rebellion-2366900: HTTPSConnectionPool(host='www.gambling.com', port=443): Max retries exceeded with url: /news/no-confidence-in-boris-johnson-vote-odds-hint-at-tory-rebellion-2366900 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303203f50>, 'Connection to www.gambling.com timed out. (connect timeout=60)'))


Processing URLs:  23%|██▎       | 232/1000 [10:17<6:34:30, 30.82s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/haiti/article70110512.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  24%|██▎       | 235/1000 [10:20<2:25:34, 11.42s/it]

URL filtered: https://twitter.com/ScottWapnerCNBC


Processing URLs:  24%|██▍       | 240/1000 [10:24<37:36,  2.97s/it]  

Error extracting text from http://jsb.cs.uec.ac.jp/~igo/eng/participant.html: HTTPConnectionPool(host='jsb.cs.uec.ac.jp', port=80): Max retries exceeded with url: /~igo/eng/participant.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30364da90>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.washingtontimes.com/news/2016/jan/20/l-todd-wood-wwii-animosities-flare-south-china-sea/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/20/l-todd-wood-wwii-animosities-flare-south-china-sea/


Processing URLs:  24%|██▍       | 242/1000 [10:32<41:34,  3.29s/it]

URL filtered: https://www.youtube.com/watch?v=myVzaR8cmDA


Processing URLs:  25%|██▍       | 246/1000 [10:35<19:41,  1.57s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-carter-idUSKCN12L1ZW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-carter-idUSKCN12L1ZW?il=0


Processing URLs:  25%|██▍       | 248/1000 [10:38<17:35,  1.40s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/49516007.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/49516007.cms


Processing URLs:  25%|██▍       | 249/1000 [10:38<13:24,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/gop-realizes-risk-of-clinton-picking-a-supreme-court-nominee-1457642549: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gop-realizes-risk-of-clinton-picking-a-supreme-court-nominee-1457642549


Processing URLs:  25%|██▌       | 254/1000 [10:47<20:35,  1.66s/it]

Error extracting text from http://www.pcacases.com/web/sendAttach/1524: 406 Client Error: Not Acceptable for url: http://www.pcacases.com/web/sendAttach/1524


Processing URLs:  26%|██▌       | 257/1000 [10:54<28:21,  2.29s/it]

Error extracting text from https://undocs.org/en/A/RES/75/287: HTTPSConnectionPool(host='daccess-ods.un.org', port=443): Max retries exceeded with url: /access.nsf/Get?OpenAgent&DS=A/RES/75/287&Lang=E (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  26%|██▌       | 260/1000 [10:55<13:11,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-brokering-analys-idUSKBN15V2QO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-brokering-analys-idUSKBN15V2QO


Processing URLs:  27%|██▋       | 266/1000 [11:28<1:44:05,  8.51s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2016-12-29/u-s-hits-russian-officials-with-sanctions-over-election-hacks


Processing URLs:  27%|██▋       | 269/1000 [11:30<44:45,  3.67s/it]  

Error extracting text from http://www.reuters.com/article/us-global-taxavoidance-cameron-idUSKCN0Y11NW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-taxavoidance-cameron-idUSKCN0Y11NW


Processing URLs:  28%|██▊       | 279/1000 [11:52<22:19,  1.86s/it]

Error extracting text from http://www.reuters.com/article/us-asean-philippines-idUSKBN1AL05R?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-philippines-idUSKBN1AL05R?il=0


Processing URLs:  28%|██▊       | 280/1000 [11:56<27:46,  2.31s/it]

Error extracting text from http://www.water.ca.gov/news/newsreleases/2015/123015.pdf: 404 Client Error: Not Found for url: https://water.ca.gov/news/newsreleases/2015/123015.pdf


Processing URLs:  29%|██▉       | 288/1000 [12:14<26:04,  2.20s/it]

Error extracting text from http://english.aawsat.com/2016/11/article55361422/aramcos-shares-set-ipo-2018: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/11/article55361422/aramcos-shares-set-ipo-2018


Processing URLs:  29%|██▉       | 293/1000 [12:17<12:38,  1.07s/it]

URL filtered: https://www.blueorigin.com/#youtubexYYTuZCjZcE


Processing URLs:  30%|██▉       | 295/1000 [13:17<2:41:24, 13.74s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-01-10/brazils-rousseff-focuses-on-economy-in-face-of-impeachment: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  30%|██▉       | 297/1000 [13:19<1:35:47,  8.18s/it]

Error extracting text from https://www.reuters.com/world/china/chinas-power-crunch-dwarfs-evergrandes-troubles-investors-eyes-2021-09-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/china/chinas-power-crunch-dwarfs-evergrandes-troubles-investors-eyes-2021-09-28/


Processing URLs:  30%|██▉       | 298/1000 [13:22<1:19:25,  6.79s/it]

Error extracting text from http://rusplt.ru/news/rossiya-speshit-otmenyat-650037.html: 404 Client Error: Not Found for url: https://rusplt.ru/news/rossiya-speshit-otmenyat-650037.html


Processing URLs:  30%|███       | 305/1000 [13:34<25:03,  2.16s/it]  

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/263545#.Vum18WWKStU.mailto: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/263545#.Vum18WWKStU.mailto


Processing URLs:  31%|███       | 307/1000 [13:36<18:38,  1.61s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/268836-senate-to-sanction-north-korea-in-rebuke-of-obama-policy: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/268836-senate-to-sanction-north-korea-in-rebuke-of-obama-policy/


Processing URLs:  31%|███       | 309/1000 [13:57<1:18:47,  6.84s/it]

Error extracting text from https://www.thebalance.com/government-shutdown-3305683: 406 Client Error: Not Acceptable for url: https://www.thebalancemoney.com:443/government-shutdown-3305683


Processing URLs:  32%|███▏      | 315/1000 [14:13<26:58,  2.36s/it]  

Error extracting text from http://thehill.com/policy/cybersecurity/356189-mccain-says-white-house-blocked-cyber-czar-from-testifying: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/356189-mccain-says-white-house-blocked-cyber-czar-from-testifying/


Processing URLs:  32%|███▏      | 317/1000 [14:18<28:55,  2.54s/it]

Error extracting text from http://inserbia.info/today/2015/11/experts-serbia-will-have-to-choose-between-russia-and-nato-soon/: 404 Client Error: Not Found for url: https://inserbia.info/today/2015/11/experts-serbia-will-have-to-choose-between-russia-and-nato-soon/


Processing URLs:  32%|███▏      | 321/1000 [14:23<16:57,  1.50s/it]

Error extracting text from http://www.theprovince.com/business/fp/senate+banking+panel+backs+bill+lift+crude+export+despite+white/11406710/story.html: 403 Client Error: Forbidden for url: https://theprovince.com/


Processing URLs:  33%|███▎      | 326/1000 [14:28<10:26,  1.08it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-22/venezuela-s-pdvsa-has-8-billion-of-u-s-assets-at-risk-in-probest


Processing URLs:  33%|███▎      | 329/1000 [14:30<10:01,  1.12it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-14/tillerson-signals-easing-policy-toward-russia-on-ukraine-accord


Processing URLs:  33%|███▎      | 332/1000 [14:30<05:14,  2.13it/s]

Error extracting text from http://www.regulationtomorrow.com/eu/progress-report-on-edis-regulation-and-eu-banking-union/: 403 Client Error: Forbidden for url: https://www.regulationtomorrow.com/eu/progress-report-on-edis-regulation-and-eu-banking-union/


Processing URLs:  33%|███▎      | 333/1000 [14:47<50:04,  4.50s/it]

Error extracting text from http://www.investopedia.com/terms/j/junkbond.asp?layout=orig: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/j/junkbond.asp?layout=orig


Processing URLs:  34%|███▎      | 335/1000 [14:51<38:20,  3.46s/it]

Error extracting text from http://rbth.com/news/2016/01/13/ministry-russia-to-take-no-part-in-nuclear-security-summit-in-washington_558903: 404 Client Error: Not Found for url: https://www.rbth.com/news/2016/01/13/ministry-russia-to-take-no-part-in-nuclear-security-summit-in-washington_558903


Processing URLs:  34%|███▎      | 337/1000 [15:00<43:54,  3.97s/it]

Error extracting text from http://www.acleddata.com/wp-content/uploads/2016/05/ACLED-Country-Report-Burundi-May-2016.pd: 404 Client Error: Not Found for url: https://acleddata.com/wp-content/uploads/2016/05/ACLED-Country-Report-Burundi-May-2016.pd


Processing URLs:  34%|███▍      | 338/1000 [15:00<31:59,  2.90s/it]

Error extracting text from https://www.nytimes.com/2017/10/03/world/middleeast/mattis-iran-deal-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/03/world/middleeast/mattis-iran-deal-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp


Processing URLs:  34%|███▍      | 341/1000 [15:02<15:46,  1.44s/it]

Error extracting text from http://thehill.com/blogs/congress-blog/foreign-policy/283421-doubling-down-on-irans-deep-state: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/foreign-policy/283421-doubling-down-on-irans-deep-state/


Processing URLs:  35%|███▌      | 352/1000 [15:22<10:21,  1.04it/s]

Error extracting text from http://www.kurdistan24.net/en/news/33c97c35-dd2f-4dd5-acfa-f3ffdfa95508/-Kurdistan-Region--Iraq-agree-over-Mosul-: 403 Client Error: Forbidden for url: https://www.kurdistan24.net/en/news/33c97c35-dd2f-4dd5-acfa-f3ffdfa95508/-Kurdistan-Region--Iraq-agree-over-Mosul-


Processing URLs:  35%|███▌      | 353/1000 [15:24<14:48,  1.37s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN13O0PF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN13O0PF


Processing URLs:  35%|███▌      | 354/1000 [20:32<16:43:26, 93.20s/it]

Error extracting text from http://www.japantoday.com/category/politics/view/s-korea-says-3-way-summit-with-china-japan-may-happen-in-late-oct: 404 Client Error: Not Found for url: https://japantoday.com/category/politics/s-korea-says-3-way-summit-with-china-japan-may-happen-in-late-oct


Processing URLs:  36%|███▌      | 358/1000 [20:38<4:10:28, 23.41s/it] 

Error extracting text from http://www.wsj.com/articles/fears-of-venezuela-default-grow-amid-drop-in-oil-prices-1453422852: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fears-of-venezuela-default-grow-amid-drop-in-oil-prices-1453422852


Processing URLs:  37%|███▋      | 368/1000 [20:58<28:46,  2.73s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2016-12-09/china-championed-asia-trade-pact-gains-traction-in-jakarta-talks


Processing URLs:  37%|███▋      | 370/1000 [21:00<20:02,  1.91s/it]

Error extracting text from https://publisher.websays.com/tower-hamlets-elections-2015/: 403 Client Error: Forbidden for url: https://publisher.websays.com/tower-hamlets-elections-2015/


Processing URLs:  37%|███▋      | 374/1000 [22:07<3:10:44, 18.28s/it]

Error extracting text from https://www.usnews.com/news/best-states/articles/2021-07-29/these-governors-are-mandating-the-covid-19-vaccine-for-government-employees: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  38%|███▊      | 378/1000 [22:11<54:50,  5.29s/it]  

Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/exclusive-wto-chief-says-vaccine-answer-close-facing-effort-block-it-2021-12-16/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/exclusive-wto-chief-says-vaccine-answer-close-facing-effort-block-it-2021-12-16/


Processing URLs:  38%|███▊      | 382/1000 [22:18<27:15,  2.65s/it]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-unhcr-idUSKCN0YZ29V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-unhcr-idUSKCN0YZ29V


Processing URLs:  39%|███▉      | 389/1000 [22:31<24:02,  2.36s/it]

Error extracting text from http://www.trust.org/item/20151216123320-7nbdz/?source=search: 404 Client Error:  for url: https://www.trust.org:443/item/20151216123320-7nbdz/?source=search


Processing URLs:  39%|███▉      | 390/1000 [22:32<20:12,  1.99s/it]

Error extracting text from http://google.com/newsstand/s/CBIwscyY_yM: 404 Client Error: Not Found for url: http://google.com/newsstand/s/CBIwscyY_yM


Processing URLs:  39%|███▉      | 391/1000 [22:33<18:05,  1.78s/it]

Error extracting text from http://www.reuters.com/article/us-opec-meeting-idUSKBN0TM30B20151205#LrQt9HRSZkr4I7AU.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting-idUSKBN0TM30B20151205#LrQt9HRSZkr4I7AU.97


Processing URLs:  40%|███▉      | 399/1000 [22:54<20:32,  2.05s/it]

URL filtered: https://www.facebook.com/FuglsangEP19/posts/853704582105130


Processing URLs:  40%|████      | 401/1000 [22:55<12:51,  1.29s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/Economy/Gov.-Kuroda-says-BOJ-will-make-necessary-policy-adjustments-while-examining-risks: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/Economy/Gov.-Kuroda-says-BOJ-will-make-necessary-policy-adjustments-while-examining-risks


Processing URLs:  40%|████      | 403/1000 [22:59<16:05,  1.62s/it]

Error extracting text from http://asirt.org/initiatives/informing-road-users/road-safety-facts/road-crash-statistics: 403 Client Error: Forbidden for url: http://asirt.org/initiatives/informing-road-users/road-safety-facts/road-crash-statistics


Processing URLs:  40%|████      | 405/1000 [23:02<13:10,  1.33s/it]

Error extracting text from https://www.nytimes.com/2017/08/17/world/europe/european-monuments-statues-communism.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/17/world/europe/european-monuments-statues-communism.html


Processing URLs:  41%|████      | 410/1000 [23:07<08:53,  1.11it/s]

Error extracting text from http://www.nytimes.com/2016/04/19/world/europe/european-union-brexit.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/19/world/europe/european-union-brexit.html


Processing URLs:  41%|████▏     | 414/1000 [23:16<15:12,  1.56s/it]

Error extracting text from http://webcache.googleusercontent.com/search?q=cache:XJNBN8B4RRsJ:www.ft.com/cms/s/0/d65d817c-c0ee-11e5-846f-79b0e3d20eaf.html+&amp;cd=2&amp;hl=en&amp;ct=clnk&amp;gl=us: 404 Client Error: Not Found for url: http://webcache.googleusercontent.com/search?q=cache:XJNBN8B4RRsJ:www.ft.com/cms/s/0/d65d817c-c0ee-11e5-846f-79b0e3d20eaf.html+&amp;cd=2&amp;hl=en&amp;ct=clnk&amp;gl=us


Processing URLs:  42%|████▏     | 415/1000 [23:18<14:58,  1.54s/it]

Error extracting text from http://af.reuters.com/article/investingNews/idAFKCN0WK0K2?pageNumber=2&amp;virtualBrandChannel=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  42%|████▏     | 416/1000 [23:20<18:36,  1.91s/it]

Error extracting text from http://www.theweek.co.uk/eu-referendum/65461/eu-referendum-poll-will-scotland-swing-the-vote: 404 Client Error: Not Found for url: https://theweek.com/eu-referendum/65461/eu-referendum-poll-will-scotland-swing-the-vote


Processing URLs:  42%|████▏     | 417/1000 [23:21<15:41,  1.61s/it]

Error extracting text from http://nationalinterest.org/feature/north-koreas-nuclear-program-irreversible-15537?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/north-koreas-nuclear-program-irreversible-15537?page=2
URL filtered: https://twitter.com/davidmanheim
URL filtered: https://www.youtube.com/watch?v=BUPrpXiF7aE
URL filtered: https://www.youtube.com/watch?v=HL6v8nzFiUk


Processing URLs:  43%|████▎     | 426/1000 [23:33<18:05,  1.89s/it]

URL filtered: https://www.youtube.com/watch?v=u7s-H4EqP4I


Processing URLs:  43%|████▎     | 430/1000 [23:38<13:23,  1.41s/it]

Error extracting text from https://gcaptain.com/china-authorizes-coast-guard-to-fire-on-foreign-vessels-if-needed/?subscriber=true&amp;goal=0_f50174ef03-f0ce50f602-170102337&amp;mc_cid=f0ce50f602&amp;mc_eid=c74873c672: 403 Client Error: Forbidden for url: https://gcaptain.com/china-authorizes-coast-guard-to-fire-on-foreign-vessels-if-needed/?subscriber=true&amp;goal=0_f50174ef03-f0ce50f602-170102337&amp;mc_cid=f0ce50f602&amp;mc_eid=c74873c672
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.folhapolitica.org/2016/02/parte-de-mudanca-de-lula-foi-entregue.html&amp;usg=ALkJrhj_PTEOiXswswYHXOvhhkEoSVGm8g: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.folhapolitica.org/2016/02/parte-de-mudanca-de-lula-foi-entregue.html&amp;usg=ALkJrhj

Processing URLs:  43%|████▎     | 432/1000 [23:39<08:51,  1.07it/s]

Error extracting text from http://www.nytimes.com/2016/01/10/opinion/sunday/myanmars-peace-prize-winner-and-crimes-against-humanity.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/10/opinion/sunday/myanmars-peace-prize-winner-and-crimes-against-humanity.html


Processing URLs:  43%|████▎     | 433/1000 [23:40<08:37,  1.09it/s]

Error extracting text from http://www.financialexpress.com/india-news/india-replies-strongly-to-pakistans-kashmir-jibe-will-free-pok-gilgit-baltistan-bring-jk-to-its-original-form/599337/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/india-news/india-replies-strongly-to-pakistans-kashmir-jibe-will-free-pok-gilgit-baltistan-bring-jk-to-its-original-form/599337/


Processing URLs:  44%|████▎     | 435/1000 [23:42<08:13,  1.15it/s]

Error extracting text from http://worldmaritimenews.com/archives/177580/gupc-panama-canal-leaks-to-be-repaired-in-january/: HTTPConnectionPool(host='worldmaritimenews.com', port=80): Max retries exceeded with url: /archives/177580/gupc-panama-canal-leaks-to-be-repaired-in-january/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff8226f0>: Failed to resolve 'worldmaritimenews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▎     | 437/1000 [23:43<07:46,  1.21it/s]

Error extracting text from http://cherna.gora.me/news/ratification-in-france-a-matter-of-days/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/ratification-in-france-a-matter-of-days/


Processing URLs:  44%|████▍     | 440/1000 [23:47<08:33,  1.09it/s]

Error extracting text from http://www.autonews.com/article/20160627/OEM11/306279987/zev-mandates-get-harder-to-ignore: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20160627/OEM11/306279987/zev-mandates-get-harder-to-ignore


Processing URLs:  46%|████▌     | 455/1000 [24:20<15:12,  1.67s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN1072QI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN1072QI


Processing URLs:  46%|████▌     | 459/1000 [24:35<21:45,  2.41s/it]

Error extracting text from https://glpost.com/brig-gen-innocent-kabandana-is-decimating-southern-kivu/: 406 Client Error: Not Acceptable for url: https://glpost.com/brig-gen-innocent-kabandana-is-decimating-southern-kivu/


Processing URLs:  46%|████▋     | 464/1000 [24:40<08:42,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-britain-politics-scotland-may-idUSKBN16A13E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-politics-scotland-may-idUSKBN16A13E


Processing URLs:  47%|████▋     | 471/1000 [24:47<08:40,  1.02it/s]

Error extracting text from http://www.reuters.com/article/2015/09/24/us-usa-fed-yellen-idUSKCN0RO2GR20150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/us-usa-fed-yellen-idUSKCN0RO2GR20150924


Processing URLs:  47%|████▋     | 472/1000 [24:50<12:22,  1.41s/it]

Error extracting text from http://atimes.com/2016/02/south-china-sea-face-off-the-mystery-of-woody-island/: 404 Client Error: Not Found for url: https://atimes.com/2016/02/south-china-sea-face-off-the-mystery-of-woody-island/


Processing URLs:  47%|████▋     | 474/1000 [24:52<11:38,  1.33s/it]

Error extracting text from http://www.hybridcars.com/eight-states-pledge-to-reduce-new-gas-car-sales-to-zero-by-2050/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/eight-states-pledge-to-reduce-new-gas-car-sales-to-zero-by-2050/


Processing URLs:  48%|████▊     | 475/1000 [24:53<10:56,  1.25s/it]

Error extracting text from https://www.reuters.com/world/middle-east/us-will-weigh-all-options-if-iran-will-not-resume-nuclear-deal-2021-10-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/us-will-weigh-all-options-if-iran-will-not-resume-nuclear-deal-2021-10-13/


Processing URLs:  48%|████▊     | 482/1000 [25:02<12:24,  1.44s/it]

Error extracting text from http://journal-neo.org/2016/05/05/is-there-hope-for-relaunching-six-party-talks/: HTTPConnectionPool(host='journal-neo.org', port=80): Max retries exceeded with url: /2016/05/05/is-there-hope-for-relaunching-six-party-talks/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d92e0>: Failed to resolve 'journal-neo.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 483/1000 [25:03<11:22,  1.32s/it]

Error extracting text from http://mobile.reuters.com/article/worldNews/idUSKCN11Q01Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/worldNews/idUSKCN11Q01Q


Processing URLs:  49%|████▊     | 487/1000 [25:07<08:29,  1.01it/s]

Error extracting text from http://www.amazon.com/Human-Thermal-Environments-Moderate-Performance/dp/146659599X/: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Human-Thermal-Environments-Moderate-Performance/dp/146659599X/


Processing URLs:  49%|████▉     | 492/1000 [25:15<11:40,  1.38s/it]

URL filtered: http://www.bloomberg.com/gadfly/articles/2016-03-18/betting-on-a-brazil-impeachment-may-be-premature


Processing URLs:  49%|████▉     | 494/1000 [26:15<2:03:15, 14.62s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/article38240874.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  50%|████▉     | 497/1000 [26:17<52:17,  6.24s/it]  

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/03/14/Engineers-race-to-stop-collapse-of-massive-Mosul-dam.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/03/14/Engineers-race-to-stop-collapse-of-massive-Mosul-dam.html


Processing URLs:  50%|████▉     | 498/1000 [26:19<41:16,  4.93s/it]

Error extracting text from http://thehill.com/homenews/campaign/361455-poll-moore-and-jones-neck-and-neck-in-senate-race: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361455-poll-moore-and-jones-neck-and-neck-in-senate-race/
URL filtered: http://www.businessinsider.sg/facebook-will-fact-check-label-fake-news-in-news-feed-2016-12/?r=US&amp;IR=T#tke8G9c2LJR6vMSg.97


Processing URLs:  50%|█████     | 500/1000 [26:26<36:23,  4.37s/it]

Error extracting text from https://sahara-question.com/en/opinions/messahel-attack-morocco-undermine-horst-kohler-visit: HTTPSConnectionPool(host='sahara-question.com', port=443): Max retries exceeded with url: /en/opinions/messahel-attack-morocco-undermine-horst-kohler-visit (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ffe6a8a0>: Failed to resolve 'sahara-question.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|█████     | 505/1000 [26:45<30:56,  3.75s/it]

URL filtered: https://www.linkedin.com/in/benjamin-yeoh-445133/?originalSubdomain=uk


Processing URLs:  51%|█████     | 507/1000 [26:48<22:05,  2.69s/it]

Error extracting text from http://theworldweekly.com/magazine/reader/talks-in-vienna-look-for-a-perilous-path-to-peace-in-syria/5497/14: HTTPConnectionPool(host='theworldweekly.com', port=80): Max retries exceeded with url: /magazine/reader/talks-in-vienna-look-for-a-perilous-path-to-peace-in-syria/5497/14 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304401670>: Failed to resolve 'theworldweekly.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  51%|█████▏    | 514/1000 [27:01<14:48,  1.83s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-02-24/brazil-s-bolsonaro-starts-a-populist-death-spiral


Processing URLs:  52%|█████▏    | 518/1000 [34:10<7:22:17, 55.06s/it] 

Error extracting text from http://www.wsj.com/articles/u-s-military-seeks-more-troops-for-iraq-1474502709: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-military-seeks-more-troops-for-iraq-1474502709
Error extracting text from https://www.congress.gov/bill/116th-congress/senate-bill/3561: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/116th-congress/senate-bill/3561


Processing URLs:  52%|█████▏    | 520/1000 [34:12<3:56:10, 29.52s/it]

URL filtered: https://m.youtube.com/watch?v=yhQql-ZbZmg


Processing URLs:  52%|█████▏    | 523/1000 [34:14<1:41:51, 12.81s/it]

Error extracting text from http://www.reuters.com/article/2015/09/24/us-mideast-crisis-russia-airstrikes-idUSKCN0RO01320150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/us-mideast-crisis-russia-airstrikes-idUSKCN0RO01320150924


Processing URLs:  53%|█████▎    | 526/1000 [34:19<46:49,  5.93s/it]  

Error extracting text from http://iran-times.com/iran-trying-to-sell-oil-in-eu-at-discount/: 406 Client Error: Not Acceptable for url: http://iran-times.com/iran-trying-to-sell-oil-in-eu-at-discount/


Processing URLs:  53%|█████▎    | 530/1000 [34:22<16:41,  2.13s/it]

Error extracting text from https://www.dia.mil/Portals/27/Documents/News/Military%20Power%20Publications/Iran_Military_Power_LR.pdf: 404 Client Error: Not Found for url: https://www.dia.mil/Portals/27/Documents/News/Military%20Power%20Publications/Iran_Military_Power_LR.pdf
Error extracting text from https://www.barrons.com/news/north-macedonia-set-for-first-census-in-two-decades-01630594808?refsec=afp-news: 403 Client Error: Forbidden for url: https://www.barrons.com/news/north-macedonia-set-for-first-census-in-two-decades-01630594808?refsec=afp-news


Processing URLs:  53%|█████▎    | 534/1000 [34:26<08:20,  1.08s/it]

Error extracting text from https://www.nytimes.com/reuters/2017/03/23/us/politics/23reuters-usa-trump-trade.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/03/23/us/politics/23reuters-usa-trump-trade.html


Processing URLs:  54%|█████▍    | 540/1000 [34:49<16:02,  2.09s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-results-idUSKCN10E2J1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-results-idUSKCN10E2J1


Processing URLs:  54%|█████▍    | 541/1000 [34:52<18:17,  2.39s/it]

Error extracting text from http://in.reuters.com/article/mideast-crisis-turkey-relations-idINKCN0Z02C0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  54%|█████▍    | 543/1000 [34:56<15:55,  2.09s/it]

Error extracting text from http://www.laht.com/article.asp?ArticleId=323794&amp;CategoryId=14510: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=323794&amp;CategoryId=14510


Processing URLs:  55%|█████▍    | 547/1000 [35:05<16:14,  2.15s/it]

Error extracting text from https://bit.ly/3ri1bfR: 403 Client Error: Forbidden for url: https://capx.co/nicola-sturgeon-offers-a-masterclass-in-dodging-direct-questions/


Processing URLs:  55%|█████▍    | 549/1000 [35:08<14:04,  1.87s/it]

Error extracting text from http://www.theepochtimes.com/n3/2188618-analysis-behind-the-xi-jinping-leung-chun-ying-meeting-at-apec/?utm_expid=21082672-18.dmPUWQm0QDq0pZBHW0FDfA.0&amp;utm_referrer=https%3A%2F%2Fduckduckgo.com%2F: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2188618-analysis-behind-the-xi-jinping-leung-chun-ying-meeting-at-apec/?utm_expid=21082672-18.dmPUWQm0QDq0pZBHW0FDfA.0&amp;utm_referrer=https%3A%2F%2Fduckduckgo.com%2F


Processing URLs:  55%|█████▌    | 552/1000 [35:15<14:25,  1.93s/it]

URL filtered: https://www.youtube.com/watch?v=AaJc09SCTpQ


Processing URLs:  56%|█████▌    | 555/1000 [35:19<11:29,  1.55s/it]

Error extracting text from http://www.economiccalendar.com/2016/04/30/us-economy-us-gdp-growth-shows-signs-of-stagnation-in-second-quarter/: 404 Client Error: Not Found for url: http://www.economiccalendar.com/2016/04/30/us-economy-us-gdp-growth-shows-signs-of-stagnation-in-second-quarter/
URL filtered: https://www.bloomberg.com/news/articles/2016-12-19/why-a-russian-s-killing-in-turkey-was-about-syria-quicktake-q-a


Processing URLs:  56%|█████▌    | 559/1000 [36:23<1:56:14, 15.81s/it]

Error extracting text from http://extras.mercurynews.com/silicon-valley-imported-labor/: HTTPConnectionPool(host='extras.mercurynews.com', port=80): Max retries exceeded with url: /silicon-valley-imported-labor/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x301663e00>, 'Connection to extras.mercurynews.com timed out. (connect timeout=60)'))


Processing URLs:  56%|█████▌    | 562/1000 [36:37<1:00:56,  8.35s/it]

URL filtered: https://twitter.com/ink__pad/status/1039352666999214082


Processing URLs:  56%|█████▋    | 564/1000 [36:38<35:04,  4.83s/it]  

Error extracting text from https://www.newsweek.com/iran-gulf-oman-germany-trump-administration-1444112: 403 Client Error: Forbidden for url: https://www.newsweek.com/iran-gulf-oman-germany-trump-administration-1444112


Processing URLs:  57%|█████▋    | 566/1000 [36:41<24:05,  3.33s/it]

URL filtered: https://www.youtube.com/watch?v=So02TBi7R3w


Processing URLs:  57%|█████▋    | 572/1000 [36:47<09:46,  1.37s/it]

Error extracting text from https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30418-9/fulltext: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30418-9/fulltext


Processing URLs:  57%|█████▋    | 573/1000 [36:48<09:27,  1.33s/it]

Error extracting text from http://uk.reuters.com/article/uk-china-corruption-idUKKCN12K0BQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  57%|█████▊    | 575/1000 [36:53<11:14,  1.59s/it]

Error extracting text from http://www.ntd.tv/en/programs/news-politics/china-forbidden-news/20141006/230457-liu-yunshan-involved-in-cctv-corruption-case-.html: 404 Client Error: Not Found for url: http://www.ntd.tv/en/programs/news-politics/china-forbidden-news/20141006/230457-liu-yunshan-involved-in-cctv-corruption-case-.html


Processing URLs:  58%|█████▊    | 580/1000 [37:09<16:09,  2.31s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-corruption-appeals-idUSKBN0TN0ZY20151204: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-appeals-idUSKBN0TN0ZY20151204
URL filtered: https://www.youtube.com/watch?v=-Wtzs4yFyus


Processing URLs:  58%|█████▊    | 583/1000 [37:09<07:31,  1.08s/it]

Error extracting text from http://iranhr.net/en/articles/2454/: 403 Client Error: Forbidden for url: https://iranhr.net/en/articles/2454/


Processing URLs:  59%|█████▊    | 586/1000 [37:14<07:52,  1.14s/it]

Error extracting text from http://www.wsj.com/articles/saudi-aramco-ipo-wall-streets-white-whale-1465464606: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-aramco-ipo-wall-streets-white-whale-1465464606


Processing URLs:  59%|█████▊    | 587/1000 [37:16<09:16,  1.35s/it]

Error extracting text from http://www.unog.ch/80256EE600585943/(httpPages)/0D4B67A1E11A22BCC1257A410052DE38?OpenDocument: 403 Client Error: Forbidden for url: https://www.un.org/disarmament/


Processing URLs:  59%|█████▉    | 588/1000 [37:16<07:50,  1.14s/it]

URL filtered: http://www.reuters.com/article/us-turkey-security-kurds-idUSKBN13109Y?feedType=RSS&amp;feedName=worldNews&amp;utm_source=Twitter&amp;utm_medium=Social&amp;utm_campaign=Feed%3A+Reuters%2FworldNews+%28Reuters+World+News%29


Processing URLs:  59%|█████▉    | 590/1000 [37:17<05:05,  1.34it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton_vs_johnson_vs_stein-5952.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton_vs_johnson_vs_stein-5952.html


Processing URLs:  59%|█████▉    | 591/1000 [37:21<10:28,  1.54s/it]

URL filtered: https://www.bloomberg.com/graphics/covid-vaccine-tracker-global-distribution/


Processing URLs:  59%|█████▉    | 594/1000 [38:21<1:08:57, 10.19s/it]

Error extracting text from http://tesla.com/: HTTPConnectionPool(host='tesla.com', port=80): Read timed out. (read timeout=60)
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://jornalggn.com.br/noticia/com-volta-das-pautas-no-congresso-juizes-defendem-impeachment-de-dilma&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://jornalggn.com.br/noticia/com-volta-das-pautas-no-congresso-juizes-defendem-impeachment-de-dilma&amp;prev=search


Processing URLs:  60%|█████▉    | 595/1000 [38:21<52:17,  7.75s/it]  

Error extracting text from https://www.nejm.org/doi/full/10.1056/NEJMp2005630: 403 Client Error: Forbidden for url: https://www.nejm.org/doi/full/10.1056/NEJMp2005630


Processing URLs:  60%|██████    | 601/1000 [38:29<13:36,  2.05s/it]

Error extracting text from http://www.globalcapital.com/article/yn9qs9q1h6dp/ndb-eyes-g3-as-its-kicks-off-funding-in-rmb: 404 Client Error: Not Found for url: https://www.globalcapital.com/article/yn9qs9q1h6dp/ndb-eyes-g3-as-its-kicks-off-funding-in-rmb
Error extracting text from http://www.reuters.com/article/us-northdakota-pipeline-idUSKBN15G357: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northdakota-pipeline-idUSKBN15G357


Processing URLs:  60%|██████    | 603/1000 [38:30<08:36,  1.30s/it]

Error extracting text from http://www.adweek.com/fishbowlny/time-inc-to-cut-dozens-of-staffers/383674: 403 Client Error: Forbidden for url: https://www.adweek.com/fishbowlny/time-inc-to-cut-dozens-of-staffers/383674


Processing URLs:  61%|██████    | 609/1000 [38:47<20:41,  3.18s/it]

Error extracting text from https://www.crunchbase.com/organization/atai-life-sciences-ag: 403 Client Error: Forbidden for url: https://www.crunchbase.com/organization/atai-life-sciences-ag


Processing URLs:  61%|██████    | 611/1000 [39:59<2:22:39, 22.00s/it]

Error extracting text from https://www.cmegroup.com/trading/metals/ferrous/steel-futures.html: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▏   | 617/1000 [40:07<22:25,  3.51s/it]  

Error extracting text from https://www.accuweather.com/en/weather-news/asia-winter-forecast-life-threatening-flooding-may-unfold-in-southeast-as-snow-buries-northern-areas/70002973: 403 Client Error: Forbidden for url: https://www.accuweather.com/en/weather-news/asia-winter-forecast-life-threatening-flooding-may-unfold-in-southeast-as-snow-buries-northern-areas/70002973


Processing URLs:  62%|██████▏   | 620/1000 [40:10<11:27,  1.81s/it]

Error extracting text from http://thehill.com/policy/finance/263475-fed-raises-rates-ending-era-of-stimulus: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/263475-fed-raises-rates-ending-era-of-stimulus/


Processing URLs:  63%|██████▎   | 628/1000 [40:26<10:42,  1.73s/it]

Error extracting text from http://www.erasmusprogramme.com/the_erasmus.php: 404 Client Error: Not Found for url: https://erasmusprogramme.com/the_erasmus.php


Processing URLs:  63%|██████▎   | 630/1000 [40:29<08:20,  1.35s/it]

Error extracting text from http://www.reuters.com/article/us-usa-fiscal-ryan-idUSKBN0TM1QL20151203: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fiscal-ryan-idUSKBN0TM1QL20151203


Processing URLs:  63%|██████▎   | 634/1000 [40:34<07:43,  1.27s/it]

Error extracting text from http://www.latimes.com/politics/washington/la-na-essential-washington-updates-report-trump-s-business-sought-to-1503925135-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/washington/la-na-essential-washington-updates-report-trump-s-business-sought-to-1503925135-htmlstory.html


Processing URLs:  64%|██████▎   | 635/1000 [40:50<35:03,  5.76s/it]

Error extracting text from https://www.investopedia.com/new-stablecoin-bill-raises-hackles-of-crypto-community-5090337: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/new-stablecoin-bill-raises-hackles-of-crypto-community-5090337


Processing URLs:  64%|██████▍   | 638/1000 [41:08<42:47,  7.09s/it]

Error extracting text from https://mobile.almasdarnews.com/article/iran-reiterates-support-syria/: 522 Server Error:  for url: https://www.almasdarnews.com/article/iran-reiterates-support-syria/


Processing URLs:  64%|██████▍   | 644/1000 [41:24<18:13,  3.07s/it]

URL filtered: https://twitter.com/waymo
Error extracting text from https://www.reuters.com/article/oman-ratings-sp/sp-downgrades-omans-ratings-idUSL3N1NG5XT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/oman-ratings-sp/sp-downgrades-omans-ratings-idUSL3N1NG5XT
URL filtered: http://www.sciencemag.org/news/2017/02/world-s-most-endangered-marine-mammal-down-30-individuals?utm_source=sciencemagazine&utm_medium=facebook-text&utm_campaign=30vaquitas-10887
Error extracting text from http://www.timesofisrael.com/iran-reformists-demand-review-after-candidates-rejected/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/iran-reformists-demand-review-after-candidates-rejected/


Processing URLs:  65%|██████▍   | 646/1000 [41:25<12:41,  2.15s/it]

Error extracting text from http://www.dtic.mil/doctrine/new_pubs/jointpub_operations.htm: HTTPSConnectionPool(host='www.dtic.mil', port=443): Max retries exceeded with url: /doctrine/new_pubs/jointpub_operations.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
Error extracting text from http://www.reuters.com/article/2015/10/08/us-mideast-crisis-intelligence-exclusive-idUSKCN0S20CZ20151008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/08/us-mideast-crisis-intelligence-exclusive-idUSKCN0S20CZ20151008


Processing URLs:  65%|██████▌   | 651/1000 [41:32<07:17,  1.25s/it]

Error extracting text from http://www.nytimes.com/2016/10/08/us/politics/us-formally-accuses-russia-of-stealing-dnc-emails.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/08/us/politics/us-formally-accuses-russia-of-stealing-dnc-emails.html


Processing URLs:  66%|██████▌   | 655/1000 [41:43<12:34,  2.19s/it]

Error extracting text from http://inhomelandsecurity.com/us-official-says-marines-expanding-combat-role-in-iraq/?utm_source=IHS: 403 Client Error: Forbidden for url: https://amuedge.com/us-official-says-marines-expanding-combat-role-in-iraq/?utm_source=IHS


Processing URLs:  66%|██████▌   | 657/1000 [41:45<08:17,  1.45s/it]

Error extracting text from http://www.latimes.com/world/europe/la-fg-us-russia-20151218-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/europe/la-fg-us-russia-20151218-story.html


Processing URLs:  66%|██████▌   | 660/1000 [41:52<11:20,  2.00s/it]

Error extracting text from http://www.mid.ru/en/foreign_policy/news/-/asset_publisher/cKNonkJE02Bw/content/id/3054726: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  66%|██████▌   | 661/1000 [41:54<11:44,  2.08s/it]

Error extracting text from https://www.un.int/djibouti/news/his-excellency-mr-mohamed-siad-doualeh-addresses-security-council-situation-somalia-and-0: HTTPSConnectionPool(host='www.un.int', port=443): Max retries exceeded with url: /djibouti/news/his-excellency-mr-mohamed-siad-doualeh-addresses-security-council-situation-somalia-and-0 (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  67%|██████▋   | 666/1000 [41:59<06:06,  1.10s/it]

Error extracting text from http://uk.reuters.com/article/uk-eurozone-greece-bailout-deal-idUKKBN0TU2I920151211: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  67%|██████▋   | 670/1000 [42:07<09:07,  1.66s/it]

Error extracting text from http://allafrica.com/stories/201607090218.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607090218.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fe910650>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  67%|██████▋   | 672/1000 [42:11<09:45,  1.78s/it]

Error extracting text from http://www.nzherald.co.nz/world/news/article.cfm?c_id=2&amp;objectid=11865751: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/world/news/article.cfm?c_id=2&amp;objectid=11865751


Processing URLs:  67%|██████▋   | 673/1000 [42:11<07:13,  1.32s/it]

Error extracting text from https://allafrica.com/stories/202104280099.html: HTTPSConnectionPool(host='allafrica.com', port=443): Max retries exceeded with url: /stories/202104280099.html (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2fe913dd0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  68%|██████▊   | 675/1000 [42:13<05:49,  1.08s/it]

Error extracting text from https://www.wsj.com/articles/house-gop-leaders-move-to-tweak-health-care-bill-1490048076: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-gop-leaders-move-to-tweak-health-care-bill-1490048076


Processing URLs:  68%|██████▊   | 676/1000 [50:14<13:02:35, 144.92s/it]

Error extracting text from https://www.thespainreport.com/articles/773-160622115058-spanish-home-secretary-caught-on-tape-plotting-against-catalan-separatists: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/773-160622115058-spanish-home-secretary-caught-on-tape-plotting-against-catalan-separatists (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3038863c0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  68%|██████▊   | 678/1000 [50:17<6:24:45, 71.69s/it]  

Error extracting text from http://www.dailypress.com/news/science/dp-nws-cyber-summit-jeff-lab-three-20160923-story.html: 404 Client Error: Not Found for url: https://www.dailypress.com/news/science/dp-nws-cyber-summit-jeff-lab-three-20160923-story.html


Processing URLs:  68%|██████▊   | 684/1000 [50:34<55:27, 10.53s/it]  

Error extracting text from https://worldcrunch.com/business-finance/biden-on-trade-trump-like-protectionism-with-a-smile: 403 Client Error: Forbidden for url: https://worldcrunch.com/business-finance/biden-on-trade-trump-like-protectionism-with-a-smile


Processing URLs:  68%|██████▊   | 685/1000 [50:34<40:05,  7.64s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56223#.WR6f_2jyuUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56223#.WR6f_2jyuUk


Processing URLs:  69%|██████▊   | 686/1000 [50:35<28:20,  5.41s/it]

Error extracting text from http://english.alarabiya.net/en/views/news/middle-east/2015/12/06/Has-Iran-offered-Assad-asylum-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/views/news/middle-east/2015/12/06/Has-Iran-offered-Assad-asylum-.html


Processing URLs:  69%|██████▊   | 687/1000 [50:38<25:17,  4.85s/it]

Error extracting text from http://www.dailynk.com/english/m/read.php?cataId=nk02500&amp;num=14108: 500 Server Error: Internal Server Error for url: https://www.dailynk.com/english/m/read.php?cataId=nk02500&amp;num=14108


Processing URLs:  69%|██████▉   | 691/1000 [50:44<10:33,  2.05s/it]

Error extracting text from http://thehill.com/homenews/senate/261246-obstacles-imperil-year-end-budget-dea: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/261246-obstacles-imperil-year-end-budget-dea/


Processing URLs:  69%|██████▉   | 694/1000 [50:49<08:45,  1.72s/it]

Error extracting text from https://www.business-standard.com/article/companies/tv-viewership-rose-9-in-2020-on-covid-19-pandemic-shows-barc-data-121030100974_1.html: 403 Client Error: Forbidden for url: https://www.business-standard.com/article/companies/tv-viewership-rose-9-in-2020-on-covid-19-pandemic-shows-barc-data-121030100974_1.html


Processing URLs:  70%|██████▉   | 697/1000 [50:50<04:35,  1.10it/s]

Error extracting text from https://www.wsj.com/articles/cbo-estimates-14-million-more-uninsured-next-year-under-gop-plan-1489436927: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/cbo-estimates-14-million-more-uninsured-next-year-under-gop-plan-1489436927


Processing URLs:  70%|███████   | 704/1000 [51:06<05:42,  1.16s/it]

Error extracting text from https://www.france24.com/en/20110530-german-coalition-end-civil-nuclear-energy-reactors-offline-2022-merkel-cdu: 403 Client Error: Forbidden for url: https://www.france24.com/en/20110530-german-coalition-end-civil-nuclear-energy-reactors-offline-2022-merkel-cdu


Processing URLs:  71%|███████   | 706/1000 [51:08<05:11,  1.06s/it]

Error extracting text from https://www.covid.is/statistical-information-on-vaccination: 404 Client Error: Not Found for url: https://www.covid.is/statistical-information-on-vaccination


Processing URLs:  71%|███████▏  | 713/1000 [51:17<04:55,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/11/13/us-volkswagen-emissions-whistleblower-idUSKCN0T11WW20151113#TFVs3VIHZhOgFqXl.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/13/us-volkswagen-emissions-whistleblower-idUSKCN0T11WW20151113#TFVs3VIHZhOgFqXl.97


Processing URLs:  72%|███████▏  | 716/1000 [51:21<05:22,  1.13s/it]

Error extracting text from http://www.washingtontimes.com/news/2016/nov/14/iran-backed-shiites-join-iraqi-troops-in-mosul-fig/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/nov/14/iran-backed-shiites-join-iraqi-troops-in-mosul-fig/


Processing URLs:  72%|███████▏  | 720/1000 [51:25<04:38,  1.01it/s]

Error extracting text from https://www.yahoo.com/news/syria-talks-resume-chances-seen-very-slim-amid-174959918.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/syria-talks-resume-chances-seen-very-slim-amid-174959918.html


Processing URLs:  72%|███████▏  | 723/1000 [51:30<07:00,  1.52s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-19/lithuanian-president-says-eu-has-reached-deal-with-u-k


Processing URLs:  74%|███████▎  | 735/1000 [51:49<05:45,  1.30s/it]

Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)31048-0/abstract: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)31048-0/abstract


Processing URLs:  74%|███████▎  | 736/1000 [51:50<05:55,  1.35s/it]

URL filtered: https://twitter.com/JournoStephen/status/742470985098625024


Processing URLs:  74%|███████▍  | 742/1000 [51:59<05:22,  1.25s/it]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-pm-eyes-benefits-of-early-election-01-22-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-pm-eyes-benefits-of-early-election-01-22-2016
Error extracting text from https://www.reuters.com/article/us-britain-eu/no-deal-on-brexit-trade-very-very-likely-british-pm-johnson-says-idUSKBN28L0NR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/no-deal-on-brexit-trade-very-very-likely-british-pm-johnson-says-idUSKBN28L0NR


Processing URLs:  74%|███████▍  | 744/1000 [52:02<05:15,  1.23s/it]

Error extracting text from https://www.predictit.org/markets/1/Dem-Nomination: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/1/Dem-Nomination


Processing URLs:  74%|███████▍  | 745/1000 [52:02<04:02,  1.05it/s]

Error extracting text from https://www.nytimes.com/2017/05/17/world/americas/venezuela-police-protests.html?rref=collection%2Fsectioncollection%2Famericas: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/17/world/americas/venezuela-police-protests.html?rref=collection%2Fsectioncollection%2Famericas


Processing URLs:  75%|███████▍  | 747/1000 [52:04<03:46,  1.12it/s]

Error extracting text from http://www.wsj.com/articles/tables-have-turned-for-some-media-in-turkish-crackdown-1473170306: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tables-have-turned-for-some-media-in-turkish-crackdown-1473170306


Processing URLs:  75%|███████▍  | 748/1000 [52:08<07:40,  1.83s/it]

URL filtered: http://thehill.com/policy/cybersecurity/356835-uk-lawmakers-ask-facebook-about-russian-linked-brexit-activity


Processing URLs:  75%|███████▌  | 752/1000 [52:13<05:19,  1.29s/it]

Error extracting text from http://www.reuters.com/article/us-usa-nato-montenegro-exclusive-idUSKBN16S2VH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nato-montenegro-exclusive-idUSKBN16S2VH?il=0


Processing URLs:  76%|███████▌  | 757/1000 [52:18<03:09,  1.28it/s]

Error extracting text from http://www.nytimes.com/2016/03/03/us/politics/white-house-vetting-jane-kelly-judge-supreme-court.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/03/us/politics/white-house-vetting-jane-kelly-judge-supreme-court.html


Processing URLs:  76%|███████▌  | 761/1000 [52:29<05:44,  1.44s/it]

Error extracting text from http://www.reuters.com/article/us-iran-oil-exports-idUSKCN12I0KQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-exports-idUSKCN12I0KQ
URL filtered: http://www.bloomberg.com/news/articles/2016-06-16/venezuela-says-oil-at-50-will-be-enough-to-avoid-default
Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16L1VF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16L1VF?il=0


Processing URLs:  76%|███████▋  | 764/1000 [53:32<49:01, 12.47s/it]  

Error extracting text from http://aa.com.tr/en/turkey/akkuyu-nuke-plant-in-turkey-to-be-decided-by-firms-putin/492860: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)
Error extracting text from https://www.reuters.com/world/europe/russias-putin-tries-give-ruling-party-pre-election-boost-with-spending-promises-2021-06-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/russias-putin-tries-give-ruling-party-pre-election-boost-with-spending-promises-2021-06-19/


Processing URLs:  77%|███████▋  | 766/1000 [53:36<29:36,  7.59s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3564302/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3564302/


Processing URLs:  77%|███████▋  | 769/1000 [53:40<13:52,  3.60s/it]

Error extracting text from http://www.ibtimes.com/microsoft-signs-deal-provide-windows-10-chinese-government-agencies-2231643: 403 Client Error: Forbidden for url: https://www.ibtimes.com/microsoft-signs-deal-provide-windows-10-chinese-government-agencies-2231643


Processing URLs:  77%|███████▋  | 771/1000 [54:42<1:16:45, 20.11s/it]

Error extracting text from http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=13&f=G&l=50&co1=AND&d=PTXT&s1=%22Zhang,+Feng%22.INNM.&s2=CRISPR.ABTX.&OS=IN/%22Zhang,+Feng%22+AND+ABST/CRISPR&RS=IN/%22Zhang,+Feng%22+AND+ABST/CRISPR: HTTPConnectionPool(host='patft.uspto.gov', port=80): Max retries exceeded with url: /netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=13&f=G&l=50&co1=AND&d=PTXT&s1=%22Zhang,+Feng%22.INNM.&s2=CRISPR.ABTX.&OS=IN/%22Zhang,+Feng%22+AND+ABST/CRISPR&RS=IN/%22Zhang,+Feng%22+AND+ABST/CRISPR (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3051d9460>, 'Connection to patft.uspto.gov timed out. (connect timeout=60)'))


Processing URLs:  77%|███████▋  | 774/1000 [54:45<28:33,  7.58s/it]  

Error extracting text from https://www.afghanistan-analysts.org/old-names-for-the-nds-and-defence-ministry-nug-proposes-stanakzai-and-abdullah-khan-again/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/old-names-for-the-nds-and-defence-ministry-nug-proposes-stanakzai-and-abdullah-khan-again/


Processing URLs:  78%|███████▊  | 776/1000 [54:49<17:13,  4.62s/it]

Error extracting text from https://www.semiconductors.org/chips/: 404 Client Error: Not Found for url: https://www.semiconductors.org/chips/


Processing URLs:  78%|███████▊  | 783/1000 [56:02<51:53, 14.35s/it]  

Error extracting text from http://www.reuters.com/article/us-britain-election-scotland-idUSKBN18O0N4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-scotland-idUSKBN18O0N4


Processing URLs:  78%|███████▊  | 785/1000 [56:06<28:06,  7.84s/it]

Error extracting text from https://www.rescue.org/article/battle-mosul-could-displace-more-1-million-iraqis: 404 Client Error: Not Found for url: https://www.rescue.org/article/battle-mosul-could-displace-more-1-million-iraqis


Processing URLs:  79%|███████▊  | 786/1000 [56:08<22:19,  6.26s/it]

Error extracting text from http://blogs.spectator.co.uk/2016/11/emily-thornberry-sparks-brexit-chaos-labour/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2016/11/emily-thornberry-sparks-brexit-chaos-labour/


Processing URLs:  79%|███████▉  | 789/1000 [56:13<10:51,  3.09s/it]

Error extracting text from https://www.reuters.com/article/usa-bonds/treasuries-u-s-two-year-yields-touch-9-year-high-curve-flattening-resumes-idUSL1N1NJ0RR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/usa-bonds/treasuries-u-s-two-year-yields-touch-9-year-high-curve-flattening-resumes-idUSL1N1NJ0RR


Processing URLs:  79%|███████▉  | 792/1000 [56:15<05:40,  1.64s/it]

Error extracting text from https://theconversation.com/swift-ejecting-russia-is-largely-symbolic-heres-why-178065): 403 Client Error: Forbidden for url: https://theconversation.com/swift-ejecting-russia-is-largely-symbolic-heres-why-178065)


Processing URLs:  79%|███████▉  | 793/1000 [56:17<06:12,  1.80s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/ipsos-todos-escenarios-eventual-segunda-vuelta-noticia-1889683?flsm=1: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/ipsos-todos-escenarios-eventual-segunda-vuelta-noticia-1889683/?flsm=1


Processing URLs:  79%|███████▉  | 794/1000 [56:18<05:41,  1.66s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/california-farmers-brace-water-shortage-el-nino-36344244: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/california-farmers-brace-water-shortage-el-nino-36344244


Processing URLs:  80%|████████  | 801/1000 [56:38<07:10,  2.16s/it]

Error extracting text from http://www.scout.com/military/warrior/story/1735956-pentagon-war-report-isis-surrounded-in-mosul: 403 Client Error: Forbidden for url: https://247sports.com/


Processing URLs:  80%|████████  | 804/1000 [56:40<04:00,  1.23s/it]

Error extracting text from http://www.wsj.com/articles/from-uganda-but-live-in-gibraltar-come-vote-in-the-brexit-referendum-1466612978?mod=e2tw: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/from-uganda-but-live-in-gibraltar-come-vote-in-the-brexit-referendum-1466612978?mod=e2tw


Processing URLs:  81%|████████  | 811/1000 [56:49<03:34,  1.13s/it]

Error extracting text from http://stocknewsusa.com/2016/07/27/president-bashar-al-assads-perspective/: 403 Client Error: Forbidden for url: https://stocknewsusa.com/2016/07/27/president-bashar-al-assads-perspective/


Processing URLs:  82%|████████▏ | 815/1000 [56:59<05:44,  1.86s/it]

Error extracting text from http://a.msn.com/r/2/AAegYRF?a=1&amp;m=EN-US: 404 Client Error: Not Found for url: http://a.msn.com/r/2/AAegYRF?a=1&amp;m=EN-US
URL filtered: https://www.bloomberg.com/news/articles/2018-01-31/waymo-robot-cars-need-little-human-help-as-gm-s-make-big-leap


Processing URLs:  82%|████████▏ | 818/1000 [57:02<03:55,  1.29s/it]

Error extracting text from http://www.reuters.com/article/2015/10/23/us-trade-ttip-idUSKCN0SH1XR20151023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/us-trade-ttip-idUSKCN0SH1XR20151023


Processing URLs:  82%|████████▏ | 822/1000 [57:06<03:21,  1.13s/it]

Error extracting text from https://www.governor.ny.gov/news/no-202108-continuing-temporary-suspension-and-modification-laws-relating-disaster-emergency: 403 Client Error: Forbidden for url: https://www.governor.ny.gov/news/no-202108-continuing-temporary-suspension-and-modification-laws-relating-disaster-emergency


Processing URLs:  82%|████████▏ | 823/1000 [57:06<02:57,  1.00s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-declassified-a22f66de-dee6-11e5-8c00-8aa03741dced-20160229-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-declassified-a22f66de-dee6-11e5-8c00-8aa03741dced-20160229-story.html


Processing URLs:  83%|████████▎ | 826/1000 [57:11<04:30,  1.55s/it]

Error extracting text from http://www.iran-daily.com/News/188946.html?catid=3&amp;title=Iran-to-become-self-sufficient-in-oil-industry-commodities: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  83%|████████▎ | 827/1000 [57:12<03:26,  1.19s/it]

Error extracting text from http://www.wsj.com/articles/time-inc-reshuffles-top-ranks-in-reorganization-1468422823: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-reshuffles-top-ranks-in-reorganization-1468422823


Processing URLs:  83%|████████▎ | 829/1000 [57:13<02:41,  1.06it/s]

Error extracting text from http://www.rp-online.de/politik/deutschland/groko-verhandlungen-stimmen-zur-einigung-von-union-und-spd-aid-1.7374969: 410 Client Error: Gone for url: http://www.rp-online.de/politik/deutschland/groko-verhandlungen-stimmen-zur-einigung-von-union-und-spd-aid-1.7374969
Error extracting text from http://www.nytimes.com/2016/09/06/world/africa/ancs-combative-response-to-election-losses-startles-south-africa.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/06/world/africa/ancs-combative-response-to-election-losses-startles-south-africa.html


Processing URLs:  83%|████████▎ | 832/1000 [57:17<03:09,  1.13s/it]

Error extracting text from http://allafrica.com/stories/201608081276.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201608081276.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303fdcd40>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  84%|████████▎ | 836/1000 [57:23<03:26,  1.26s/it]

Error extracting text from http://www.nytimes.com/2015/11/07/business/economy/jobs-report-hiring-unemployment-october.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/07/business/economy/jobs-report-hiring-unemployment-october.html?_r=0


Processing URLs:  84%|████████▍ | 838/1000 [57:27<03:56,  1.46s/it]

Error extracting text from https://www.reuters.com/article/us-usa-biden-state-arms-idUSKBN29O2QC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-biden-state-arms-idUSKBN29O2QC


Processing URLs:  84%|████████▍ | 839/1000 [57:28<03:39,  1.36s/it]

Error extracting text from http://www.sciencemag.org/news/2017/01/tracker-we-re-letting-you-know-when-trump-s-cabinet-nominees-talk-about-science-and: 403 Client Error: Forbidden for url: https://www.science.org/news/2017/01/tracker-we-re-letting-you-know-when-trump-s-cabinet-nominees-talk-about-science-and


Processing URLs:  84%|████████▍ | 844/1000 [57:36<02:57,  1.14s/it]

Error extracting text from http://www8.austlii.edu.au/cgi-bin/viewdoc/au/legis/nsw/consol_act/ca1902188/s24b.html: 410 Client Error: Gone for url: http://www8.austlii.edu.au/cgi-bin/viewdoc/au/legis/nsw/consol_act/ca1902188/s24b.html


Processing URLs:  84%|████████▍ | 845/1000 [57:37<02:55,  1.13s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Abe-eyes-Russia-visit-in-hopes-of-breakthrough?page=1: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Abe-eyes-Russia-visit-in-hopes-of-breakthrough?page=1


Processing URLs:  85%|████████▍ | 849/1000 [57:43<03:45,  1.49s/it]

Error extracting text from http://americanfolklore.net/folklore/2010/07/john_henry.html: 403 Client Error: Forbidden for url: https://www.americanfolklore.net/john-henry-the-steel-driving-man/


Processing URLs:  85%|████████▌ | 850/1000 [57:44<03:17,  1.32s/it]

URL filtered: https://www.linkedin.com/in/attila-kaplan/


Processing URLs:  85%|████████▌ | 854/1000 [58:01<07:00,  2.88s/it]

Error extracting text from https://www.reuters.com/article/us-russia-protests/russias-putin-signs-anti-protest-law-before-rally-idUSBRE8570ZH20120608: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-protests/russias-putin-signs-anti-protest-law-before-rally-idUSBRE8570ZH20120608


Processing URLs:  86%|████████▌ | 856/1000 [58:02<04:30,  1.88s/it]

Error extracting text from http://stabilityinstitute.com/: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=stabilityinstitute.com


Processing URLs:  86%|████████▌ | 859/1000 [58:37<22:34,  9.61s/it]

Error extracting text from http://inflationdata.com/Inflation/Inflation_Rate/Long_Term_Inflation.asp: 520 Server Error: status code 520 for url: http://inflationdata.com/Inflation/Inflation_Rate/Long_Term_Inflation.asp


Processing URLs:  86%|████████▌ | 862/1000 [58:41<11:39,  5.07s/it]

Error extracting text from http://m.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=11570088: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=11570088


Processing URLs:  87%|████████▋ | 866/1000 [58:46<05:00,  2.24s/it]

Error extracting text from https://news.lift.co/when-will-canada-legalize-marijuana/: HTTPSConnectionPool(host='news.lift.co', port=443): Max retries exceeded with url: /when-will-canada-legalize-marijuana/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  87%|████████▋ | 870/1000 [58:50<02:26,  1.13s/it]

Error extracting text from http://www.wsj.com/articles/joe-biden-seeks-to-reassure-baltic-states-wary-of-u-s-commitment-to-nato-1471954625: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/joe-biden-seeks-to-reassure-baltic-states-wary-of-u-s-commitment-to-nato-1471954625
URL filtered: https://www.youtube.com/watch?v=ggUHOqq4dhw
Error extracting text from http://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y2WB20151209: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y2WB20151209


Processing URLs:  87%|████████▋ | 871/1000 [58:50<01:59,  1.08it/s]

URL filtered: http://www.bloombergview.com/articles/2015-10-31/no-one-but-u-s-believes-russia-will-abandon-assad
Error extracting text from http://www.reuters.com/article/us-usa-court-pence-idUSKBN15A2RR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-pence-idUSKBN15A2RR


Processing URLs:  88%|████████▊ | 878/1000 [58:56<01:43,  1.18it/s]

Error extracting text from https://www.timesofisrael.com/netanyahu-will-not-resign-if-indicted-coalition-chair-says/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/netanyahu-will-not-resign-if-indicted-coalition-chair-says/


Processing URLs:  88%|████████▊ | 880/1000 [59:01<02:54,  1.45s/it]

Error extracting text from http://www.payvand.com/news/16/nov/1076.html: 404 Client Error: Not Found for url: http://www.payvand.com/news/16/nov/1076.html
Error extracting text from http://www.reuters.com/article/us-usa-sanctions-idUSKBN1962AU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-sanctions-idUSKBN1962AU


Processing URLs:  88%|████████▊ | 882/1000 [59:02<02:07,  1.08s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/rousseff-s-trial-to-run-o/2804094.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/rousseff-s-trial-to-run-o/2804094.html


Processing URLs:  88%|████████▊ | 883/1000 [59:03<02:07,  1.09s/it]

Error extracting text from http://presstv.com/Detail/2015/09/20/429998/Iran-IAEA-Amano--Rouhani: 403 Client Error: Forbidden for url: https://presstv.com/Detail/2015/09/20/429998/Iran-IAEA-Amano--Rouhani


Processing URLs:  88%|████████▊ | 884/1000 [59:04<02:05,  1.08s/it]

Error extracting text from https://en.dailypakistan.com.pk/24-Jan-2021/pakistan-navy-releases-first-promo-for-aman-2021-maritime-peace-exercises: 503 Server Error: Backend fetch failed for url: https://en.dailypakistan.com.pk/24-Jan-2021/pakistan-navy-releases-first-promo-for-aman-2021-maritime-peace-exercises


Processing URLs:  89%|████████▊ | 886/1000 [59:06<01:45,  1.08it/s]

Error extracting text from https://www.thecipherbrief.com/article/widespread-disquiet-amongst-allies-trump-cia-flap-1091: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/widespread-disquiet-amongst-allies-trump-cia-flap-1091


Processing URLs:  89%|████████▉ | 890/1000 [59:27<05:57,  3.25s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/polls/cbs-yougov-23212: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/polls/cbs-yougov-23212
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://noticias.r7.com/brasil/cardozo-reafirma-conviccao-de-que-campanha-de-dilma-nao-teve-caixa-2-13022016&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://noticias.r7.com/brasil/cardozo-reafirma-conviccao-de-que-campanha-de-dilma-nao-teve-caixa-2-13022016&amp;prev=search


Processing URLs:  89%|████████▉ | 892/1000 [59:29<03:38,  2.03s/it]

Error extracting text from https://www.nytimes.com/2017/08/05/world/europe/vladimir-putin-russia-summer-vacation.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/05/world/europe/vladimir-putin-russia-summer-vacation.html


Processing URLs:  90%|████████▉ | 898/1000 [59:43<04:51,  2.86s/it]

Error extracting text from http://mof.gov.af/en/news/56458: 404 Client Error: Not Found for url: https://mof.gov.af/en/news/56458


Processing URLs:  90%|█████████ | 900/1000 [59:45<03:01,  1.82s/it]

Error extracting text from http://ir.teslamotors.com/releasedetail.cfm?releaseid=963460: HTTPConnectionPool(host='ir.teslamotors.com', port=80): Max retries exceeded with url: /releasedetail.cfm?releaseid=963460 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fea0e7e0>: Failed to resolve 'ir.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/news/articles/2021-01-28/italy-s-renzi-ready-to-support-new-government-to-avoid-snap-vote


Processing URLs:  90%|█████████ | 904/1000 [1:00:46<21:38, 13.53s/it]

Error extracting text from http://en.special.kremlin.ru/events/president/news/52673: HTTPConnectionPool(host='en.special.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  90%|█████████ | 905/1000 [1:00:48<17:05, 10.80s/it]

Error extracting text from http://the-japan-news.com/news/article/0002891777: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002891777


Processing URLs:  91%|█████████ | 907/1000 [1:00:50<09:54,  6.39s/it]

Error extracting text from https://www.oecd.org/sti/broadband/broadband-statistics-update.htm: 403 Client Error: Forbidden for url: https://www.oecd.org/sti/broadband/broadband-statistics-update.htm


Processing URLs:  91%|█████████ | 910/1000 [1:00:51<03:54,  2.61s/it]

Error extracting text from http://www.tradingeconomics.com/egypt/tourist-arrivals/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/egypt/tourist-arrivals/forecast
Error extracting text from http://www.reuters.com/article/us-iran-navy-wargames-idUSKBN165094: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-navy-wargames-idUSKBN165094


Processing URLs:  91%|█████████ | 912/1000 [1:00:53<02:37,  1.79s/it]

Error extracting text from https://en.dailypakistan.com.pk/06-Feb-2021/pakistan-navy-releases-promo-in-connection-with-aman-naval-exercise: 503 Server Error: Backend fetch failed for url: https://en.dailypakistan.com.pk/06-Feb-2021/pakistan-navy-releases-promo-in-connection-with-aman-naval-exercise
URL filtered: https://www.youtube.com/watch?v=-Ivt2NmbyGg


Processing URLs:  91%|█████████▏| 914/1000 [1:00:54<01:30,  1.05s/it]

Error extracting text from http://www.japantimes.co.jp/news/2015/11/17/national/politics-diplomacy/abe-may-visit-russia-before-putin-visits-japan-russian-official/#.VkvDDoTQb0c: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/11/17/national/politics-diplomacy/abe-may-visit-russia-before-putin-visits-japan-russian-official/#.VkvDDoTQb0c


Processing URLs:  92%|█████████▏| 915/1000 [1:00:58<02:30,  1.78s/it]

URL filtered: https://twitter.com/canadianpolling/status/1432529911877709827


Processing URLs:  92%|█████████▏| 917/1000 [1:00:58<01:37,  1.18s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-06-03/early-retirement-surge-exacerbates-u-s-baby-boomer-inequalities#:~:text=The%20surge%20in%20early%20retirements,market%20prematurely%2C%20a%20study%20showed


Processing URLs:  92%|█████████▏| 920/1000 [1:01:32<10:04,  7.56s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18258-immunity-bill-causes-waves-confusion.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18258-immunity-bill-causes-waves-confusion.html


Processing URLs:  92%|█████████▏| 922/1000 [1:01:32<05:47,  4.45s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN0TU2F920151211#wedtZQoUuUgBlt9X.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN0TU2F920151211#wedtZQoUuUgBlt9X.97


Processing URLs:  93%|█████████▎| 926/1000 [1:01:34<01:51,  1.50s/it]

Error extracting text from http://www.washingtontimes.com/news/2016/feb/15/obama-could-fill-supreme-court-vacancy-with-recess/?page=all: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/feb/15/obama-could-fill-supreme-court-vacancy-with-recess/?page=all


Processing URLs:  93%|█████████▎| 928/1000 [1:01:40<02:27,  2.04s/it]

Error extracting text from https://www.nytimes.com/2017/09/18/world/middleeast/iran-houthis-fifth-fleet-admiral.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/18/world/middleeast/iran-houthis-fifth-fleet-admiral.html


Processing URLs:  93%|█████████▎| 932/1000 [1:01:44<01:13,  1.07s/it]

Error extracting text from http://www.cdm.me/english/membership-in-nato-supported-by-45-percent-of-population: 403 Client Error: Forbidden for url: https://www.cdm.me/english/membership-in-nato-supported-by-45-percent-of-population
URL filtered: https://www.youtube.com/watch?v=ZLDjQr8OrYQ


Processing URLs:  94%|█████████▍| 938/1000 [1:01:51<01:04,  1.05s/it]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2445907&amp;CategoryId=10717: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2445907&amp;CategoryId=10717


Processing URLs:  94%|█████████▍| 940/1000 [1:01:53<00:52,  1.14it/s]

Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/exclusive-wto-chief-says-vaccine-answer-close-facing-effort-block-it-2021-12-16/:: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/exclusive-wto-chief-says-vaccine-answer-close-facing-effort-block-it-2021-12-16/:


Processing URLs:  95%|█████████▌| 951/1000 [1:02:10<00:54,  1.12s/it]

Error extracting text from https://www.wsj.com/articles/ukraine-russia-voznesensk-town-battle-11647444734: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/ukraine-russia-voznesensk-town-battle-11647444734


Processing URLs:  95%|█████████▌| 952/1000 [1:02:11<00:42,  1.14it/s]

Error extracting text from https://www.predictit.org/Market/1318/Which-party-will-win-the-US-Senate-race-in-Wisconsin-in-2016: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1318/Which-party-will-win-the-US-Senate-race-in-Wisconsin-in-2016
URL filtered: https://www.bloomberg.com/politics/articles/2017-02-10/china-u-s-warplanes-had-unsafe-encounter-in-south-china-sea


Processing URLs:  96%|█████████▌| 956/1000 [1:02:17<01:11,  1.62s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-18/spanish-liberal-says-rajoy-accepted-conditions-for-support


Processing URLs:  96%|█████████▌| 959/1000 [1:02:19<00:41,  1.00s/it]

Error extracting text from http://www.latimes.com/politics/la-na-russia-poll-20170401-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-russia-poll-20170401-story.html
Error extracting text from https://icezone.uk/rentberrys-controversial-property-bid-site-expands-in-us/: HTTPSConnectionPool(host='icezone.uk', port=443): Max retries exceeded with url: /rentberrys-controversial-property-bid-site-expands-in-us/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe657a70>: Failed to resolve 'icezone.uk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  96%|█████████▋| 963/1000 [1:02:25<00:46,  1.25s/it]

Error extracting text from http://www.latimes.com/opinion/readersreact/la-ol-le-roy-moore-baptists-20171119-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/readersreact/la-ol-le-roy-moore-baptists-20171119-story.html


Processing URLs:  96%|█████████▋| 965/1000 [1:02:25<00:28,  1.24it/s]

Error extracting text from https://www.nytimes.com/2017/10/01/world/asia/afghan-airstrike-helmand-province.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/01/world/asia/afghan-airstrike-helmand-province.html


Processing URLs:  97%|█████████▋| 969/1000 [1:02:31<00:38,  1.25s/it]

Error extracting text from http://www.amazon.com/Unlikeable-Problem-Hillary-Edward-Klein-ebook/dp/B011H5178A/ref=zg_bs_157325011_43: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Unlikeable-Problem-Hillary-Edward-Klein-ebook/dp/B011H5178A/ref=zg_bs_157325011_43


Processing URLs:  97%|█████████▋| 970/1000 [1:02:33<00:37,  1.26s/it]

Error extracting text from http://www.debka.com/newsupdatepopup/15108/S-P-lowers-Saudi-credit-rating-one-notch: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /newsupdatepopup/15108/S-P-lowers-Saudi-credit-rating-one-notch (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  98%|█████████▊| 975/1000 [1:03:39<05:38, 13.53s/it]

Error extracting text from http://aa.com.tr/en/politics/haiti-to-get-election-dates-at-end-of-may/558066: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.scientificamerican.com/article/worlds-most-powerful-laser-facility-shifts-focus-to-warheads/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/worlds-most-powerful-laser-facility-shifts-focus-to-warheads/


Processing URLs:  98%|█████████▊| 979/1000 [1:03:42<01:20,  3.85s/it]

Error extracting text from http://www.latimes.com/politics/la-na-live-updates-democr-sanders-if-elected-id-ask-obama-to-withdraw-gar-1460689474-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-live-updates-democr-sanders-if-elected-id-ask-obama-to-withdraw-gar-1460689474-htmlstory.html
URL filtered: https://www.theguardian.com/us-news/2021/oct/02/donald-trump-asks-florida-judge-to-force-twitter-to-reinstate-account


Processing URLs:  98%|█████████▊| 981/1000 [1:03:45<00:48,  2.56s/it]

Error extracting text from http://www.hellenicshippingnews.com/uk-leave-a-final-nail-in-the-coffin-for-stalled-u-s-eu-trade-deal/: 404 Client Error: Not Found for url: https://www.hellenicshippingnews.com/uk-leave-a-final-nail-in-the-coffin-for-stalled-u-s-eu-trade-deal/


Processing URLs:  98%|█████████▊| 982/1000 [1:03:45<00:37,  2.06s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-06-20/venezuela-2016-default-likely-pdvsa-may-go-first-moody-s-says


Processing URLs:  99%|█████████▊| 987/1000 [1:03:50<00:16,  1.23s/it]

Error extracting text from http://thehill.com/policy/finance/257929-week-ahead-ex-im-highways-debt-limit-top-agenda: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/257929-week-ahead-ex-im-highways-debt-limit-top-agenda/


Processing URLs:  99%|█████████▉| 988/1000 [1:03:52<00:16,  1.38s/it]

Error extracting text from http://www.taek.gov.tr/en/belgeler-formlar/documents/Regulations/nuclear-safety/Decree-on-Licensing-of-Nuclear-Installations/: HTTPSConnectionPool(host='www.taek.gov.tr', port=443): Max retries exceeded with url: /en/belgeler-formlar/documents/Regulations/nuclear-safety/Decree-on-Licensing-of-Nuclear-Installations/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
Error extracting text from https://www.gardner.senate.gov/newsroom/press-releases/gardner-pressures-president-obama-on-north-korea-sanctions-implementation: HTTPSConnectionPool(host='www.gardner.senate.gov', port=443): Max retries exceeded with url: /newsroom/press-releases/gardner-pressures-president-obama-on-north-korea-sanctions-implementation (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3018647d0>: Failed to resolve 'www.gardner.senate.gov' ([Errn

Processing URLs:  99%|█████████▉| 991/1000 [1:03:57<00:14,  1.58s/it]

Error extracting text from http://www.al-monitor.com/pulse/contents/afp/2015/11/iran-politics-rouhani-judiciary.html: 404 Client Error: Not Found for url: https://www.al-monitor.com/contents/afp/2015/11/iran-politics-rouhani-judiciary.html


Processing URLs:  99%|█████████▉| 993/1000 [1:03:59<00:08,  1.16s/it]

Error extracting text from http://www.nytimes.com/2015/08/26/us/politics/donald-trump-builds-team-to-bolster-ground-game-in-iowa.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/26/us/politics/donald-trump-builds-team-to-bolster-ground-game-in-iowa.html


Processing URLs:  99%|█████████▉| 994/1000 [1:03:59<00:05,  1.10it/s]

Error extracting text from http://news.yahoo.com/african-leaders-burundi-push-talks-081650049.html: 404 Client Error: Not Found for url: http://news.yahoo.com/african-leaders-burundi-push-talks-081650049.html


Processing URLs: 100%|█████████▉| 995/1000 [1:03:59<00:03,  1.41it/s]

Error extracting text from https://www.wsj.com/articles/south-africa-postpones-key-speech-as-standoff-with-zuma-continues-1517925985: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-africa-postpones-key-speech-as-standoff-with-zuma-continues-1517925985


Processing URLs: 100%|█████████▉| 998/1000 [1:04:02<00:01,  1.43it/s]

Error extracting text from http://news.yahoo.com/venezuelan-struggling-socialists-hold-legislative-primaries-130806386.html: 404 Client Error: Not Found for url: http://news.yahoo.com/venezuelan-struggling-socialists-hold-legislative-primaries-130806386.html


Processing URLs: 100%|██████████| 1000/1000 [1:04:03<00:00,  3.84s/it]


Error extracting text from https://www.reuters.com/article/us-germany-politics/cornered-merkel-launches-talks-with-spd-in-bid-to-secure-fourth-term-idUSKBN1EV0RB?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/cornered-merkel-launches-talks-with-spd-in-bid-to-secure-fourth-term-idUSKBN1EV0RB?il=0


Processing URLs:   0%|          | 0/1000 [00:00<?, ?it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-17/merkel-throws-political-muscle-behind-deal-to-avoid-brexit


Processing URLs:   1%|          | 7/1000 [00:07<17:48,  1.08s/it]

Error extracting text from https://www.opensecrets.org/races/summary.php?cycle=2016&amp;id=NHS1: 403 Client Error: Forbidden for url: https://www.opensecrets.org/races/summary.php?cycle=2016&amp;id=NHS1


Processing URLs:   1%|          | 8/1000 [00:07<14:57,  1.11it/s]

Error extracting text from http://www.kkw-gundremmingen.de/presse.php?id=571: 404 Client Error: Site Not Found for url: http://www.kkw-gundremmingen.de/presse.php?id=571


Processing URLs:   1%|          | 10/1000 [00:11<22:18,  1.35s/it]

Error extracting text from http://www.reuters.com/article/eu-banks-deposits-guarantees-idUSL8N1D25O7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/eu-banks-deposits-guarantees-idUSL8N1D25O7


Processing URLs:   1%|          | 12/1000 [00:12<17:41,  1.07s/it]

Error extracting text from http://uk.businessinsider.com/r-brazil-senate-planning-impeachment-vote-for-last-days-of-olympics-2016-6?r=US&amp;IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/(null)/r-brazil-senate-planning-impeachment-vote-for-last-days-of-olympics-2016-6?amp;IR=T&IR=T


Processing URLs:   1%|▏         | 13/1000 [01:12<4:19:17, 15.76s/it]

Error extracting text from https://hub.united.com/2021-06-03-united-adding-supersonic-speeds-with-new-agreement-to-buy-aircraft-from-boom-supersonic-2653216403.html: HTTPSConnectionPool(host='hub.united.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   2%|▏         | 15/1000 [01:17<2:35:31,  9.47s/it]

Error extracting text from http://gfs.eiu.com/Article.aspx?articleType=rf&amp;articleId=864230070&amp;secId=0: 403 Client Error: Forbidden for url: https://www.eiu.com/n/global-themes/global-forecasting-hub


Processing URLs:   2%|▏         | 16/1000 [01:17<1:52:54,  6.88s/it]

Error extracting text from http://www.todayonline.com/world/iraq-gears-late-year-push-retake-mosul-islamic-state?cx_tag=similartd&amp;cid=tg:recos:similartd:standard#cxrecs_s: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/iraq-gears-late-year-push-retake-mosul-islamic-state?cx_tag=similartd&amp;cid=tg:recos:similartd:standard#cxrecs_s


Processing URLs:   2%|▏         | 17/1000 [01:19<1:28:13,  5.39s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-11/kaspersky-lab-has-been-working-with-russian-intelligence


Processing URLs:   2%|▏         | 20/1000 [01:21<45:04,  2.76s/it]  

Error extracting text from http://mobile.reuters.com/article/idUSKCN12J1L2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN12J1L2


Processing URLs:   2%|▏         | 21/1000 [01:23<43:03,  2.64s/it]

Error extracting text from http://www.appeal-democrat.com/news/regional_news/asia/indonesia-vows-more-decisive-action-after-chinese-ship-spat/article_5f533bd0-15aa-5c6d-a822-f1575bb3c75f.html: 404 Client Error: Not Found for url: https://www.appeal-democrat.com/news/regional_news/asia/indonesia-vows-more-decisive-action-after-chinese-ship-spat/article_5f533bd0-15aa-5c6d-a822-f1575bb3c75f.html


Processing URLs:   2%|▎         | 25/1000 [01:34<43:15,  2.66s/it]

Error extracting text from https://ajw.asahi.com/article/behind_news/politics/AJ201510140049: HTTPSConnectionPool(host='ajw.asahi.com', port=443): Max retries exceeded with url: /article/behind_news/politics/AJ201510140049 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300402fc0>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   3%|▎         | 29/1000 [01:41<32:10,  1.99s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-analysis/in-syria-russia-securing-position-as-assad-presses-war-idUSKBN1E91UU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-analysis/in-syria-russia-securing-position-as-assad-presses-war-idUSKBN1E91UU


Processing URLs:   3%|▎         | 33/1000 [01:44<17:40,  1.10s/it]

Error extracting text from http://www.intelligenceonline.com/corporate-intelligence/the-red-line/2016/03/16/israeli-cyber-investigators-arrive-in-europe%2C108134626-ART-SUM?did=108132865&amp;eid=%3C_$ena_i_id$_%3E: 403 Client Error: Forbidden for url: https://www.intelligenceonline.com/corporate-intelligence/the-red-line/2016/03/16/israeli-cyber-investigators-arrive-in-europe%2C108134626-ART-SUM?did=108132865&amp;eid=%3C_$ena_i_id$_%3E


Processing URLs:   4%|▎         | 37/1000 [01:52<30:28,  1.90s/it]

Error extracting text from http://www.business-standard.com/article/economy-policy/india-up-for-tough-fight-at-rcep-116091000780_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/economy-policy/india-up-for-tough-fight-at-rcep-116091000780_1.html


Processing URLs:   4%|▍         | 40/1000 [01:56<27:35,  1.72s/it]

Error extracting text from https://www.total-croatia-news.com/item/12988-no-final-session-of-parliament-before-dissolution-on-friday: 404 Client Error: Not Found for url: https://total-croatia-news.com/item/12988-no-final-session-of-parliament-before-dissolution-on-friday


Processing URLs:   5%|▍         | 49/1000 [02:46<45:27,  2.87s/it]  

Error extracting text from http://www.waterboards.ca.gov/water_issues/programs/conservation_portal/docs/emergency_reg_fs_011516.pdf: 404 Client Error: Not Found for url: https://www.waterboards.ca.gov/water_issues/programs/conservation_portal/docs/emergency_reg_fs_011516.pdf


Processing URLs:   5%|▌         | 52/1000 [02:51<30:37,  1.94s/it]

Error extracting text from http://www.paddypower.com/bet/politics/other-politics/european-politics?ev_oc_grp_ids=2904253: HTTPConnectionPool(host='www.paddypower.com', port=80): Max retries exceeded with url: /bet/politics/other-politics/european-politics?ev_oc_grp_ids=2904253 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3027ebce0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:   6%|▌         | 55/1000 [02:56<28:11,  1.79s/it]

Error extracting text from http://phys.org/news/2016-01-evidence-bad.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-01-evidence-bad.html


Processing URLs:   6%|▌         | 61/1000 [03:03<16:28,  1.05s/it]

Error extracting text from http://www.wsj.com/articles/challenging-the-u-s-moscow-pushes-into-afghanistan-1485513002?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/challenging-the-u-s-moscow-pushes-into-afghanistan-1485513002?mod=e2fb


Processing URLs:   6%|▋         | 65/1000 [03:09<16:30,  1.06s/it]

Error extracting text from https://thehill.com/homenews/campaign/413572-hillary-clinton-leaves-door-open-for-2020-run-id-like-to-be-president: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/413572-hillary-clinton-leaves-door-open-for-2020-run-id-like-to-be-president/


Processing URLs:   7%|▋         | 68/1000 [03:10<09:09,  1.69it/s]

Error extracting text from http://www.latimes.com/nation/politics/la-na-obama-analysis-20150121-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/politics/la-na-obama-analysis-20150121-story.html
Error extracting text from http://www.reuters.com/article/2015/10/18/bank-of-china-ipo-idUSL3N12I05220151018: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/18/bank-of-china-ipo-idUSL3N12I05220151018


Processing URLs:   7%|▋         | 69/1000 [03:11<10:48,  1.44it/s]

Error extracting text from http://news.xinhuanet.com/english/2015-11/18/c_134830635.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-11/18/c_134830635.htm


Processing URLs:   7%|▋         | 71/1000 [03:12<10:52,  1.42it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2015/12/31/russia-ukraine-sanctions-cyber-attacks/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2015/12/31/russia-ukraine-sanctions-cyber-attacks/


Processing URLs:   8%|▊         | 77/1000 [03:22<21:23,  1.39s/it]

Error extracting text from http://www.reuters.com/article/us-usa-election-portman-idUSKCN10Y0VC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-portman-idUSKCN10Y0VC


Processing URLs:   8%|▊         | 79/1000 [03:27<31:07,  2.03s/it]

Error extracting text from http://www.eso.org/public/usa/teles-instr/paranal/: 404 Client Error: Not Found for url: https://www.eso.org/public/usa/teles-instr/paranal-observatory/vlt/


Processing URLs:   8%|▊         | 80/1000 [03:28<28:28,  1.86s/it]

Error extracting text from http://sec.gov/: 403 Client Error: Forbidden for url: http://sec.gov/


Processing URLs:   8%|▊         | 81/1000 [03:29<23:53,  1.56s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/


Processing URLs:   8%|▊         | 82/1000 [03:32<28:11,  1.84s/it]

Error extracting text from http://www.ibtimes.com/russia-announces-syria-troop-pullout-putin-says-main-goals-achieved-2336168: 403 Client Error: Forbidden for url: https://www.ibtimes.com/russia-announces-syria-troop-pullout-putin-says-main-goals-achieved-2336168


Processing URLs:   8%|▊         | 83/1000 [03:33<23:04,  1.51s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-17/tillerson-doesn-t-rule-out-preemptive-strike-against-north-korea


Processing URLs:   8%|▊         | 85/1000 [03:34<18:00,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-opec-meeting-idUSKBN0TQ00520151207: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting-idUSKBN0TQ00520151207


Processing URLs:   9%|▉         | 89/1000 [03:37<12:21,  1.23it/s]

Error extracting text from http://www.nytimes.com/2016/11/28/world/africa/jacob-zuma-south-africa-anc.html?emc=edit_ae_20161128&amp;nl=todaysheadlines-asia&amp;nlid=77825025: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/28/world/africa/jacob-zuma-south-africa-anc.html?emc=edit_ae_20161128&amp;nl=todaysheadlines-asia&amp;nlid=77825025


Processing URLs:   9%|▉         | 92/1000 [03:39<10:41,  1.42it/s]

Error extracting text from http://focustaiwan.tw/news/aipl/201512270019.aspx: 403 Client Error: Forbidden for url: https://focustaiwan.tw:443/news/aipl/201512270019.aspx


Processing URLs:   9%|▉         | 94/1000 [03:40<10:17,  1.47it/s]

Error extracting text from http://www.tradingeconomics.com/montenegro/rating: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/montenegro/rating


Processing URLs:  10%|▉         | 97/1000 [03:44<16:29,  1.10s/it]

Error extracting text from http://www.joc.com/port-news/panama-canal-news/panama%E2%80%99s-new-locks-seen-slowly-gaining-new-container-services_20160125.html: 404 Client Error: Not Found for url: https://www.joc.com/article/panamas-new-locks-seen-slowly-gaining-new-container-services_20160125.html


Processing URLs:  10%|▉         | 98/1000 [03:47<25:54,  1.72s/it]

Error extracting text from http://www.religiousfreedomcoalition.org/2016/02/04/a-human-rights-threat-alert-on-genocide-in-west-africa/: 404 Client Error: Not Found for url: https://religiousfreedomcoalition.org/2016/02/04/a-human-rights-threat-alert-on-genocide-in-west-africa/


Processing URLs:  10%|█         | 104/1000 [03:56<20:30,  1.37s/it]

Error extracting text from http://www.scotsman.com/scottish-independence/key-topic/currency/: 403 Client Error: Forbidden for url: https://www.scotsman.com/scottish-independence/key-topic/currency/


Processing URLs:  12%|█▏        | 118/1000 [04:32<16:27,  1.12s/it]  

Error extracting text from http://www.reuters.com/article/2015/09/22/panama-canal-idUSL1N11S1DE20150922#1qIckSpak1FU0MwF.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/22/panama-canal-idUSL1N11S1DE20150922#1qIckSpak1FU0MwF.97


Processing URLs:  12%|█▏        | 123/1000 [04:42<28:50,  1.97s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/795226/walter-reed-scientists-test-zika-vaccine-candidate: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/795226/walter-reed-scientists-test-zika-vaccine-candidate


Processing URLs:  13%|█▎        | 126/1000 [04:54<43:06,  2.96s/it]

Error extracting text from https://www.sfgate.com/news/article/Putin-ally-warns-of-arms-race-as-Russia-considers-12603984.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/news/article/Putin-ally-warns-of-arms-race-as-Russia-considers-12603984.php
URL filtered: https://www.youtube.com/watch?v=ol_KL69MihQ


Processing URLs:  13%|█▎        | 132/1000 [05:08<44:45,  3.09s/it]

Error extracting text from http://www.reuters.com/article/us-asean-philippines-idUSKBN1600I3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-philippines-idUSKBN1600I3


Processing URLs:  14%|█▎        | 136/1000 [05:13<30:50,  2.14s/it]

Error extracting text from http://www.asmscience.org/content/book/10.1128/9781555815899.ch04: HTTPSConnectionPool(host='asmscience.org', port=443): Max retries exceeded with url: /content/book/10.1128/9781555815899.ch04 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  14%|█▍        | 138/1000 [05:15<21:31,  1.50s/it]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/geos/sf.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook/geos/sf.html


Processing URLs:  14%|█▍        | 145/1000 [05:22<13:45,  1.04it/s]

Error extracting text from http://www.cavatoyota.com/blog/toyota-mirai-has-surpassed-sales-projections/: 403 Client Error: Forbidden for url: http://www.cavatoyota.com/blog/toyota-mirai-has-surpassed-sales-projections/


Processing URLs:  15%|█▍        | 149/1000 [05:25<10:24,  1.36it/s]

Error extracting text from http://www.reuters.com/article/2015/11/23/us-saudi-oil-cabinet-idUSKBN0TC16N20151123: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/23/us-saudi-oil-cabinet-idUSKBN0TC16N20151123
Error extracting text from http://blog.toyota.co.uk/toyota-mirai-production-increased: HTTPConnectionPool(host='blog.toyota.co.uk', port=80): Max retries exceeded with url: /toyota-mirai-production-increased (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb70b0>: Failed to resolve 'blog.toyota.co.uk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 154/1000 [05:42<29:40,  2.10s/it]

Error extracting text from http://www.boerse-berlin.com/index.php/Bonds?isin=USP7807HAK16: 403 Client Error: Forbidden for url: https://www.boerse-berlin.com/index.php/Bonds?isin=USP7807HAK16


Processing URLs:  16%|█▌        | 155/1000 [05:44<29:46,  2.11s/it]

Error extracting text from http://www.volkswagenag.com/content/vwcorp/content/en/investor_relations/fixec_income/debt_issuance_programs.html: 404 Client Error: Not Found for url: https://www.volkswagen-group.com/content/vwcorp/content/en/investor_relations/fixec_income/debt_issuance_programs.html


Processing URLs:  16%|█▌        | 156/1000 [05:46<32:23,  2.30s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-07/saudi-aramco-chairman-al-falih-replaces-al-naimi-as-oil-minister


Processing URLs:  16%|█▌        | 159/1000 [05:49<22:09,  1.58s/it]

Error extracting text from http://insideevs.com/behind-scenes-toyota-mirai-production-3-made-per-day-videos-images/: 404 Client Error: Not Found for url: https://insideevs.com:443/behind-scenes-toyota-mirai-production-3-made-per-day-videos-images/


Processing URLs:  16%|█▌        | 161/1000 [05:53<24:33,  1.76s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-01/rousseff-poised-to-win-on-impeachment-lose-on-brazilian-economy


Processing URLs:  16%|█▋        | 163/1000 [05:57<24:34,  1.76s/it]

Error extracting text from https://www.northkoreatech.org/2016/02/08/heres-what-we-know-about-kwangmyongsong-4-so-far/: 403 Client Error: Forbidden for url: https://www.northkoreatech.org/2016/02/08/heres-what-we-know-about-kwangmyongsong-4-so-far/


Processing URLs:  17%|█▋        | 169/1000 [06:08<21:55,  1.58s/it]

Error extracting text from http://srbin.info/2015/08/15/kineska-vojska-intezivno-vezba-napad-na-tajvan-video/: 403 Client Error: Forbidden for url: http://srbin.info/2015/08/15/kineska-vojska-intezivno-vezba-napad-na-tajvan-video/
Error extracting text from http://www.kaleme.com/1395/01/21/klm-240733/: HTTPConnectionPool(host='www.kaleme.com', port=80): Max retries exceeded with url: /1395/01/21/klm-240733/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5cd70>: Failed to resolve 'www.kaleme.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  17%|█▋        | 171/1000 [06:10<15:37,  1.13s/it]

Error extracting text from http://trade.ec.europa.eu/doclib/docs/2014/october/tradoc_152859.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2014/october/tradoc_152859.pdf
Error extracting text from http://news.discovery.com/human/health/zika-virus-may-cause-more-problems-in-fetuses-160306.htm: HTTPConnectionPool(host='news.discovery.com', port=80): Max retries exceeded with url: /human/health/zika-virus-may-cause-more-problems-in-fetuses-160306.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5f140>: Failed to resolve 'news.discovery.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  17%|█▋        | 173/1000 [06:10<10:15,  1.34it/s]

Error extracting text from https://www.wionews.com/opinions-blogs/evidence-suggests-genetic-manipulation-in-covid-19-303189: 403 Client Error: Forbidden for url: https://www.wionews.com/opinions-blogs/evidence-suggests-genetic-manipulation-in-covid-19-303189


Processing URLs:  17%|█▋        | 174/1000 [06:10<08:38,  1.59it/s]

Error extracting text from https://www.parliament.uk/about/faqs/house-of-commons-faqs/business-faq-page/recess-dates/): 403 Client Error: Forbidden for url: https://www.parliament.uk/about/faqs/house-of-commons-faqs/business-faq-page/recess-dates/)


Processing URLs:  18%|█▊        | 177/1000 [06:15<17:34,  1.28s/it]

Error extracting text from http://www.argusmedia.com/news/article/?id=1545263: 404 Client Error: Not Found for url: https://www.argusmedia.com/not-found


Processing URLs:  19%|█▊        | 187/1000 [07:38<4:21:37, 19.31s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2015/09/11/russia-calls-on-other-nations-to-help-arm-syrian-government: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  19%|█▉        | 192/1000 [07:43<52:18,  3.88s/it]  

Error extracting text from http://bigstory.ap.org/article/b0179525526e48f69289b6fcb31de9e4/down-another-key-minister-iraq-continues-mosul-push?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=Defense%20EBB%2008-29-16&amp;utm_term=Editorial%20-%20Early%20Bird%20Brief: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/b0179525526e48f69289b6fcb31de9e4/down-another-key-minister-iraq-continues-mosul-push?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=Defense%20EBB%2008-29-16&amp;utm_term=Editorial%20-%20Early%20Bird%20Brief (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fea0dd00>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  19%|█▉        | 194/1000 [07:44<29:02,  2.16s/it]

Error extracting text from https://www.ipsos.com/ipsos-mori/en-uk/boris-johnson-seen-less-trustworthy-keir-starmer-david-cameron-receives-lowest-ratings-all: 403 Client Error: Forbidden for url: https://www.ipsos.com/ipsos-mori/en-uk/boris-johnson-seen-less-trustworthy-keir-starmer-david-cameron-receives-lowest-ratings-all


Processing URLs:  20%|█▉        | 195/1000 [07:45<24:42,  1.84s/it]

URL filtered: https://twitter.com/DrSasa22222/status/1438433948607004673?s=20


Processing URLs:  20%|█▉        | 199/1000 [07:49<14:39,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-asia-oil-analysis-idUSKBN16O07Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asia-oil-analysis-idUSKBN16O07Z


Processing URLs:  20%|██        | 202/1000 [07:50<07:55,  1.68it/s]

Error extracting text from http://www.reuters.com/article/us-yemen-cholera-who-idUSKBN1A61GY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-cholera-who-idUSKBN1A61GY
Error extracting text from https://www.straitstimes.com/sport/olympics-cancel-the-tokyo-games-huge-consequences-and-a-financial-quagmire: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  20%|██        | 203/1000 [07:51<09:45,  1.36it/s]

URL filtered: http://www.nydailynews.com/news/world/russian-front-fake-facebooks-promote-anti-refugee-event-article-1.3489163


Processing URLs:  21%|██        | 206/1000 [07:53<09:15,  1.43it/s]

Error extracting text from https://www.reuters.com/video/2017/10/10/dow-hits-record-high?videoId=372708621&amp;videoChannel=5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2017/10/10/dow-hits-record-high?videoId=372708621&amp;videoChannel=5


Processing URLs:  21%|██        | 208/1000 [07:53<07:01,  1.88it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/264748-trump-we-made-iran-a-power: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/264748-trump-we-made-iran-a-power/


Processing URLs:  21%|██        | 209/1000 [07:54<08:21,  1.58it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-10/assad-regime-woos-asian-powers-to-bolster-position-before-talks


Processing URLs:  21%|██▏       | 213/1000 [07:56<06:42,  1.95it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/ia/iowa_republican_presidential_caucus-3194.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/ia/iowa_republican_presidential_caucus-3194.html#polls
Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16U14Q?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16U14Q?il=0


Processing URLs:  22%|██▏       | 215/1000 [07:57<04:40,  2.80it/s]

Error extracting text from https://english.alarabiya.net/News/gulf/2021/03/22/Saudi-Arabia-proposes-new-peace-plan-to-end-Yemen-war-FM: 403 Client Error: Forbidden for url: https://english.alarabiya.net/News/gulf/2021/03/22/Saudi-Arabia-proposes-new-peace-plan-to-end-Yemen-war-FM


Processing URLs:  22%|██▏       | 216/1000 [08:14<55:38,  4.26s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-03/kurds-tighten-grip-on-north-iraq-oil-fields-with-kirkuk-deal


Processing URLs:  22%|██▏       | 222/1000 [08:21<23:25,  1.81s/it]

Error extracting text from http://www.ibtimes.co.uk/oromo-protests-ethiopia-apologises-deaths-vows-crackdown-anti-peace-forces-1548933: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/oromo-protests-ethiopia-apologises-deaths-vows-crackdown-anti-peace-forces-1548933


Processing URLs:  23%|██▎       | 227/1000 [08:28<17:43,  1.38s/it]

Error extracting text from http://www.wsj.com/articles/imf-wants-eurozone-debt-relief-for-greece-until-2040-1463468493: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/imf-wants-eurozone-debt-relief-for-greece-until-2040-1463468493


Processing URLs:  23%|██▎       | 228/1000 [08:30<20:42,  1.61s/it]

Error extracting text from https://uk.reuters.com/article/uk-pakistan-taliban/taliban-leader-approved-islamabad-meeting-on-afghan-peace-talks-sources-idUKKBN1F6222: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  23%|██▎       | 230/1000 [08:31<13:03,  1.02s/it]

Error extracting text from https://www.findlaw.com/litigation/legal-system/how-does-the-u-s-supreme-court-decide-whether-to-hear-a-case.html: 403 Client Error: Forbidden for url: https://www.findlaw.com/litigation/legal-system/how-does-the-u-s-supreme-court-decide-whether-to-hear-a-case.html


Processing URLs:  23%|██▎       | 232/1000 [08:32<09:37,  1.33it/s]

Error extracting text from http://foundersfund.com/search/universal%20basic%20income: 403 Client Error: Forbidden for url: http://foundersfund.com/search/universal%20basic%20income


Processing URLs:  23%|██▎       | 234/1000 [08:36<14:58,  1.17s/it]

Error extracting text from https://www.oddschecker.com/football/world-cup/winner: 403 Client Error: Forbidden for url: https://www.oddschecker.com/football/world-cup/winner


Processing URLs:  24%|██▎       | 237/1000 [08:38<11:11,  1.14it/s]

Error extracting text from http://www.newsletter.co.uk/news/opinion-be-bold-stormont-go-for-a-lower-benefits-cap-than-gb-1-7010577: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/opinion-be-bold-stormont-go-for-a-lower-benefits-cap-than-gb-1-7010577


Processing URLs:  24%|██▍       | 239/1000 [08:43<22:51,  1.80s/it]

URL filtered: https://www.youtube.com/watch?v=-4YrCFz0Kfc


Processing URLs:  24%|██▍       | 244/1000 [08:46<09:40,  1.30it/s]

URL filtered: https://www.facebook.com/permalink.php
Error extracting text from http://www.worldbulletin.net/headlines/176053/turkey-in-pursuit-of-24b-foreign-investment: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/headlines/176053/turkey-in-pursuit-of-24b-foreign-investment


Processing URLs:  25%|██▍       | 247/1000 [08:50<14:02,  1.12s/it]

Error extracting text from https://www.projekt-u5.de/en/museumsinsel/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  25%|██▍       | 249/1000 [08:52<12:57,  1.04s/it]

Error extracting text from http://www.washingtontimes.com/news/2014/jul/15/husain-counting-violent-muslim-deaths-country-2014/?page=all: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2014/jul/15/husain-counting-violent-muslim-deaths-country-2014/?page=all


Processing URLs:  25%|██▌       | 252/1000 [08:53<07:33,  1.65it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-delay-idUSKBN1AC1KW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-delay-idUSKBN1AC1KW?il=0
Error extracting text from http://drones.fsd.ch/wp-content/uploads/2016/11/Drones-in-Humanitarian-Action.pdf: HTTPConnectionPool(host='drones.fsd.ch', port=80): Max retries exceeded with url: /wp-content/uploads/2016/11/Drones-in-Humanitarian-Action.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3037756d0>: Failed to resolve 'drones.fsd.ch' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  26%|██▌       | 257/1000 [09:03<18:21,  1.48s/it]

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2758222: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2758222
Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/novavax-testing-vaccine-that-targets-new-covid-19-variant-2021-11-26/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/novavax-testing-vaccine-that-targets-new-covid-19-variant-2021-11-26/


Processing URLs:  26%|██▌       | 261/1000 [09:10<20:22,  1.65s/it]

Error extracting text from https://m.yenisafak.com/en/world/four-iran-linked-afghan-shia-militiamen-killed-in-syria-3114770: 422 Client Error:  for url: https://www.yenisafak.com/en/world/four-iran-linked-afghan-shia-militiamen-killed-in-syria-3114770


Processing URLs:  26%|██▋       | 265/1000 [09:17<20:38,  1.69s/it]

Error extracting text from http://asia.nikkei.com/Markets/Commodities/Iran-undercuts-Saudi-Arabia-again-in-oil-share-battle: 404 Client Error: Not Found for url: https://asia.nikkei.com/Markets/Commodities/Iran-undercuts-Saudi-Arabia-again-in-oil-share-battle
Error extracting text from http://www.sessions.senate.gov/public/index.cfm/2016/6/at-least-580-individuals-convicted-in-terror-cases-since-9-11-at-least-380-are-foreign-born: HTTPConnectionPool(host='www.sessions.senate.gov', port=80): Max retries exceeded with url: /public/index.cfm/2016/6/at-least-580-individuals-convicted-in-terror-cases-since-9-11-at-least-380-are-foreign-born (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3042e7b30>: Failed to resolve 'www.sessions.senate.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  27%|██▋       | 267/1000 [09:18<16:36,  1.36s/it]

Error extracting text from https://news.umich.edu/u-m-economists-see-us-growth-slowing-until-coronavirus-vaccine-becomes-broadly-available/: 403 Client Error: Forbidden for url: https://news.umich.edu/u-m-economists-see-us-growth-slowing-until-coronavirus-vaccine-becomes-broadly-available/
URL filtered: http://www.mediaite.com/online/facebook-vp-dodges-fake-news-question-by-saying-site-is-a-giant-echo-chamber/


Processing URLs:  27%|██▋       | 272/1000 [09:27<24:24,  2.01s/it]

Error extracting text from http://www.timesofisrael.com/irans-parliament-in-the-balance-in-election-run-off/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/irans-parliament-in-the-balance-in-election-run-off/


Processing URLs:  28%|██▊       | 281/1000 [09:43<21:15,  1.77s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN16D03L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN16D03L


Processing URLs:  28%|██▊       | 285/1000 [09:51<21:29,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-europe-usa-trade-idUSKCN0XI0AT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-usa-trade-idUSKCN0XI0AT


Processing URLs:  29%|██▉       | 290/1000 [09:58<18:01,  1.52s/it]

Error extracting text from http://af.reuters.com/article/worldNews/idAFKBN0TZ22Q20151216: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  29%|██▉       | 294/1000 [10:08<16:48,  1.43s/it]

Error extracting text from https://www.uscis.gov/immigrationaction: 403 Client Error: Forbidden for url: https://www.uscis.gov/immigrationaction
Error extracting text from https://www.opensecrets.org/states/summary.php?state=NC: 403 Client Error: Forbidden for url: https://www.opensecrets.org/states/summary.php?state=NC


Processing URLs:  30%|██▉       | 298/1000 [10:12<13:08,  1.12s/it]

Error extracting text from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/: 403 Client Error: Forbidden for url: https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/


Processing URLs:  30%|██▉       | 299/1000 [10:13<11:04,  1.06it/s]

Error extracting text from http://www.wsj.com/articles/shifting-political-landscape-in-u-s-prompts-saudi-arabia-to-rethink-financial-strategy-1481904434: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/shifting-political-landscape-in-u-s-prompts-saudi-arabia-to-rethink-financial-strategy-1481904434
URL filtered: https://twitter.com/USAmbUN/status/1424051526172090370


Processing URLs:  30%|███       | 301/1000 [10:13<06:45,  1.73it/s]

Error extracting text from http://espn.go.com/mlb/story/_/id/8995605/milwaukee-brewers-missing-italian-sausage-costume-found: 403 Client Error: Forbidden for url: http://espn.go.com/mlb/story/_/id/8995605/milwaukee-brewers-missing-italian-sausage-costume-found


Processing URLs:  30%|███       | 304/1000 [10:16<10:03,  1.15it/s]

URL filtered: https://twitter.com/hashtag/podestaemails32?vertical=news&amp;src=hash


Processing URLs:  31%|███       | 307/1000 [10:17<06:26,  1.79it/s]

Error extracting text from http://www.wsj.com/articles/south-korea-and-u-s-begin-formal-talks-on-missile-shield-1454831176: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korea-and-u-s-begin-formal-talks-on-missile-shield-1454831176


Processing URLs:  31%|███       | 312/1000 [10:28<17:23,  1.52s/it]

Error extracting text from https://www.wsj.com/articles/oil-sinks-to-3-month-low-as-u-s-crude-reserves-swell-1489058059: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-sinks-to-3-month-low-as-u-s-crude-reserves-swell-1489058059


Processing URLs:  32%|███▏      | 315/1000 [11:30<3:31:34, 18.53s/it]

Error extracting text from http://www.itv.com/news/update/2016-06-24/salmond-scottish-referendum-would-be-dictated-by-brexit-negotiations/: HTTPConnectionPool(host='www.itv.com', port=80): Read timed out. (read timeout=60)
URL filtered: https://twitter.com/Sara__Firth/status/1482981864679821313


Processing URLs:  32%|███▏      | 320/1000 [11:42<1:18:56,  6.96s/it]

Error extracting text from https://www.washingtonpost.com/politics/congress/analysis-nuclear-deal-puts-us-between-iran-and-a-hard-place/2016/04/10/6a44ecca-ff17-11e5-8bb1-f124a43f84dc_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/congress/analysis-nuclear-deal-puts-us-between-iran-and-a-hard-place/2016/04/10/6a44ecca-ff17-11e5-8bb1-f124a43f84dc_story.html


Processing URLs:  32%|███▏      | 321/1000 [11:42<57:33,  5.09s/it]  

Error extracting text from https://www.yahoo.com/news/turkey-complaints-pile-european-rights-court-194757430.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/turkey-complaints-pile-european-rights-court-194757430.html


Processing URLs:  33%|███▎      | 326/1000 [11:50<19:52,  1.77s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/259994: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/259994


Processing URLs:  33%|███▎      | 328/1000 [11:52<13:51,  1.24s/it]

URL filtered: https://twitter.com/ioannZH/status/1394910975271317504


Processing URLs:  33%|███▎      | 330/1000 [11:55<14:55,  1.34s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2010-2015_01DEC.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2010-2015_01DEC.pdf


Processing URLs:  34%|███▎      | 336/1000 [12:02<13:45,  1.24s/it]

Error extracting text from https://www.reuters.com/world/europe/dutch-set-announce-findings-omicron-cases-among-safrica-travellers-2021-11-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/dutch-set-announce-findings-omicron-cases-among-safrica-travellers-2021-11-28/


Processing URLs:  34%|███▍      | 345/1000 [12:13<15:17,  1.40s/it]

Error extracting text from http://jen.jiji.com/jc/i?g=eco&amp;k=2015110700260: HTTPSConnectionPool(host='jen.jiji.com', port=443): Max retries exceeded with url: /jc/i?g=eco&amp;k=2015110700260 (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1000)')))


Processing URLs:  35%|███▌      | 353/1000 [12:29<14:35,  1.35s/it]

URL filtered: https://www.google.ch/amp/s/www.bloomberg.com/amp/news/articles/2018-01-10/south-africa-s-zuma-retains-office-as-ramaphosa-bides-his-time


Processing URLs:  36%|███▌      | 362/1000 [12:43<15:42,  1.48s/it]

Error extracting text from http://www.nytimes.com/2016/04/22/us/politics/ted-cruz-campaign.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/22/us/politics/ted-cruz-campaign.html


Processing URLs:  36%|███▋      | 363/1000 [12:48<25:38,  2.42s/it]

URL filtered: http://www.philly.com/philly/news/politics/presidential/russia-fake-twitter-facebook-posts-accounts-trump-election-jenna-abrams-20171103.html


Processing URLs:  36%|███▋      | 365/1000 [13:48<2:39:57, 15.11s/it]

Error extracting text from http://www.star-telegram.com/sports/nfl/dallas-cowboys/article63176942.html: HTTPConnectionPool(host='www.star-telegram.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  37%|███▋      | 366/1000 [13:49<2:01:01, 11.45s/it]

Error extracting text from https://www.nytimes.com/2017/02/19/science/spacex-launch-kennedy-space-center.html?emc=edit_th_20170220&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/19/science/spacex-launch-kennedy-space-center.html?emc=edit_th_20170220&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  37%|███▋      | 371/1000 [13:54<33:16,  3.17s/it]  

Error extracting text from http://phys.org/news/2016-01-cyberattack-ukraine-power-grid.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-01-cyberattack-ukraine-power-grid.html


Processing URLs:  37%|███▋      | 374/1000 [13:56<15:00,  1.44s/it]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/11/30/venezuela-as-pdvsa-makes-late-payment-2017-bond-risk-rises/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/11/30/venezuela-as-pdvsa-makes-late-payment-2017-bond-risk-rises/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.alertatotal.net/2016/03/dilma-corre-risco-de-ser-processada.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.alertatotal.net/2016/03/dilma-corre-risco-de-ser-processada.html&amp;prev=search


Processing URLs:  38%|███▊      | 376/1000 [13:58<11:39,  1.12s/it]

Error extracting text from http://www.reuters.com/article/us-un-election-idUSKCN10E20G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-election-idUSKCN10E20G


Processing URLs:  38%|███▊      | 378/1000 [14:02<16:21,  1.58s/it]

Error extracting text from http://data.unhcr.org/mediterranean/country.php?id=105: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/country.php?id=105


Processing URLs:  38%|███▊      | 380/1000 [14:06<19:11,  1.86s/it]

Error extracting text from http://www.daftar.org/Eng/faq_eng.asp?lang=Eng: 404 Client Error: Not Found for url: https://daftar.org/Eng/faq_eng.asp?lang=Eng
Error extracting text from http://www.reuters.com/article/us-usa-trump-rally-idUSKBN1AB031: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-rally-idUSKBN1AB031


Processing URLs:  38%|███▊      | 382/1000 [14:07<13:12,  1.28s/it]

Error extracting text from http://www.nytimes.com/2016/01/14/us/politics/donald-trumps-iowa-ground-game-seems-to-be-missing-a-coach.html?emc=edit_th_20160114&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/us/politics/donald-trumps-iowa-ground-game-seems-to-be-missing-a-coach.html?emc=edit_th_20160114&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  38%|███▊      | 385/1000 [14:20<24:43,  2.41s/it]

Error extracting text from http://www.homebuyinginstitute.com/news/mortgage-rates-to-rise-slightly-669/#ixzz3lDbbsyPW: 404 Client Error: Not Found for url: https://homebuyinginstitute.com/news/mortgage-rates-to-rise-slightly-669/#ixzz3lDbbsyPW


Processing URLs:  39%|███▉      | 388/1000 [14:24<16:49,  1.65s/it]

Error extracting text from https://www.predictit.org/Contract/554/Will-Greece-declare-a-new-national-currency-in-2015#openoffers1: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/554/Will-Greece-declare-a-new-national-currency-in-2015#openoffers1


Processing URLs:  39%|███▉      | 392/1000 [14:49<39:20,  3.88s/it]  

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-iraq-mosul-idUKKBN13A139: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  39%|███▉      | 394/1000 [15:52<3:26:03, 20.40s/it]

Error extracting text from http://www.usnews.com/news/the-report/articles/2015/09/28/why-public-opinion-polls-are-increasingly-inaccurate: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  40%|███▉      | 396/1000 [15:55<1:47:02, 10.63s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-test-idUSKCN0XE05B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-test-idUSKCN0XE05B


Processing URLs:  40%|███▉      | 397/1000 [15:58<1:25:29,  8.51s/it]

Error extracting text from http://www.qu.edu/news-and-events/quinnipiac-university-poll/2016-presidential-swing-state-polls/release-detail?ReleaseID=2366: 404 Client Error: Not Found for url: https://www.qu.edu/news-and-events/quinnipiac-university-poll/2016-presidential-swing-state-polls/release-detail/?ReleaseID=2366


Processing URLs:  40%|███▉      | 399/1000 [15:59<44:17,  4.42s/it]

Error extracting text from https://www.nytimes.com/2017/05/14/world/europe/german-elections-angela-merkel-martin-schulz.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/14/world/europe/german-elections-angela-merkel-martin-schulz.html?_r=0


Processing URLs:  40%|████      | 402/1000 [16:03<22:12,  2.23s/it]

Error extracting text from http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA396780: HTTPSConnectionPool(host='www.dtic.mil', port=443): Max retries exceeded with url: /cgi-bin/GetTRDoc?AD=ADA396780 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  40%|████      | 405/1000 [16:07<14:44,  1.49s/it]

Error extracting text from https://www.visionofhumanity.org/wp-content/uploads/2020/11/GTI-2020-web-1.pdf: 403 Client Error: Forbidden for url: https://www.visionofhumanity.org/wp-content/uploads/2020/11/GTI-2020-web-1.pdf


Processing URLs:  41%|████      | 408/1000 [16:16<26:57,  2.73s/it]

Error extracting text from http://irrigation.punjab.gov.in/OldVersion/: 404 Client Error: Not Found for url: http://irrigation.punjab.gov.in/OldVersion/
Error extracting text from http://www.reuters.com/article/us-tesla-gigafactory-idUSKCN1062SR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-gigafactory-idUSKCN1062SR


Processing URLs:  41%|████▏     | 413/1000 [16:29<23:15,  2.38s/it]

Error extracting text from http://news.nationalpost.com/sports/rio-2016/not-sounding-alarm-over-zika-shows-world-health-organization-too-close-to-olympic-committee-critics-say: 403 Client Error: Forbidden for url: https://nationalpost.com/category/news//


Processing URLs:  42%|████▏     | 418/1000 [16:47<31:04,  3.20s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-new-hampshire-presidential-democratic-primary: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-new-hampshire-presidential-democratic-primary


Processing URLs:  42%|████▏     | 420/1000 [16:49<18:52,  1.95s/it]

Error extracting text from http://www.cdm.me/english/milic-montenegro-must-be-careful-in-its-collaboration-with-russia: 403 Client Error: Forbidden for url: https://www.cdm.me/english/milic-montenegro-must-be-careful-in-its-collaboration-with-russia


Processing URLs:  42%|████▏     | 423/1000 [16:54<13:50,  1.44s/it]

URL filtered: https://twitter.com/joshtpm/status/748914830816477184


Processing URLs:  42%|████▎     | 425/1000 [16:55<09:57,  1.04s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/02/26/world/middleeast/ap-ml-iran-election-whos-who.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/02/26/world/middleeast/ap-ml-iran-election-whos-who.html


Processing URLs:  43%|████▎     | 426/1000 [16:56<10:35,  1.11s/it]

Error extracting text from http://in.reuters.com/article/us-europe-migrants-turkey-eu-idINKCN0VF0IU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  43%|████▎     | 430/1000 [17:01<10:18,  1.09s/it]

Error extracting text from http://www.fao.org/news/story/en/item/471463/icode/: 404 Client Error: Not Found for url: https://www.fao.org/news/story/en/item/471463/icode/


Processing URLs:  43%|████▎     | 434/1000 [17:08<17:06,  1.81s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-11/oil-majors-face-ratings-cuts-amid-weak-recovery-s-p-global-says


Processing URLs:  44%|████▎     | 437/1000 [17:09<09:43,  1.04s/it]

Error extracting text from http://cleantechnica.com/2015/11/05/self-driving-tesla-ridesharing-1-million-mile-drive-units-really-efficient-capital-efficiency-3-tidbits-from-teslas-financials-call/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2015/11/05/self-driving-tesla-ridesharing-1-million-mile-drive-units-really-efficient-capital-efficiency-3-tidbits-from-teslas-financials-call/


Processing URLs:  44%|████▍     | 440/1000 [17:11<06:56,  1.35it/s]

Error extracting text from https://www.wsj.com/articles/oil-seen-stuck-in-mid-50s-range-1488780001?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-seen-stuck-in-mid-50s-range-1488780001?mod=e2fb


Processing URLs:  44%|████▍     | 442/1000 [17:12<04:58,  1.87it/s]

Error extracting text from https://www.wsj.com/articles/tiny-hard-drive-uses-single-atoms-to-store-data-1468854001: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tiny-hard-drive-uses-single-atoms-to-store-data-1468854001
URL filtered: http://www.bloomberg.com/news/articles/2015-08-11/how-to-impeach-a-brazilian-president-a-step-by-step-guide


Processing URLs:  44%|████▍     | 445/1000 [17:15<08:10,  1.13it/s]

Error extracting text from http://www.kspr.com/news/politics/biden-advisers-eyeing-ballot-deadlines/21051736_35271186: 404 Client Error: Not Found for url: https://www.ky3.com/news/politics/biden-advisers-eyeing-ballot-deadlines/21051736_35271186/


Processing URLs:  45%|████▍     | 449/1000 [17:26<17:17,  1.88s/it]

URL filtered: https://www.youtube.com/watch?v=2iefXdC794I
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-talks-idUSKBN0TT23020151210: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-talks-idUSKBN0TT23020151210


Processing URLs:  45%|████▌     | 451/1000 [17:29<14:39,  1.60s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://economico.sapo.pt/noticias/japao-eua-e-franca-vao-desmantelar-central-nuclear-de-fukushima_244768.html&amp;usg=ALkJrhiWuo1BZmm7ZIx6cetCda3oEleyGw: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://economico.sapo.pt/noticias/japao-eua-e-franca-vao-desmantelar-central-nuclear-de-fukushima_244768.html&amp;usg=ALkJrhiWuo1BZmm7ZIx6cetCda3oEleyGw


Processing URLs:  45%|████▌     | 454/1000 [17:31<09:53,  1.09s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-tillerson-idUSKBN16S04I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-tillerson-idUSKBN16S04I


Processing URLs:  46%|████▌     | 455/1000 [17:34<14:39,  1.61s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/as-support-falls-german-spd-sees-no-plan-b-to-merkel-coalition-idUSKCN1G10GG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/as-support-falls-german-spd-sees-no-plan-b-to-merkel-coalition-idUSKCN1G10GG?il=0


Processing URLs:  46%|████▋     | 464/1000 [17:58<25:11,  2.82s/it]

URL filtered: https://m.youtube.com/watch?v=SoF_UtLoa3U


Processing URLs:  47%|████▋     | 466/1000 [17:59<15:58,  1.80s/it]

Error extracting text from http://www.reuters.tv/v/CFT/2016/07/21/mosul-no-longer-a-long-shot-for-iraqi-army: HTTPConnectionPool(host='www.reuters.tv', port=80): Max retries exceeded with url: /v/CFT/2016/07/21/mosul-no-longer-a-long-shot-for-iraqi-army (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300ebad50>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.youtube.com/watch?v=Av7kpPJqac0


Processing URLs:  48%|████▊     | 481/1000 [18:22<13:31,  1.56s/it]

Error extracting text from http://www.swissinfo.ch/eng/reuters/bulgaria-says-sticking-to-its-u-n--candidate-for-now-after-talk-of-change/42441738: 404 Client Error: Not Found for url: https://www.swissinfo.ch/eng/reuters/bulgaria-says-sticking-to-its-u-n--candidate-for-now-after-talk-of-change/42441738
Error extracting text from http://www.portman.senate.gov/public/index.cfm/2016/11/portman-brown-call-for-immediate-action-to-protect-healthcare-pensions-of-retired-miners: HTTPConnectionPool(host='www.portman.senate.gov', port=80): Max retries exceeded with url: /public/index.cfm/2016/11/portman-brown-call-for-immediate-action-to-protect-healthcare-pensions-of-retired-miners (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe840560>: Failed to resolve 'www.portman.senate.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 485/1000 [18:26<09:33,  1.11s/it]

Error extracting text from https://balkaninsight.com/2021/09/03/north-macedonia-to-start-delayed-census-despite-covid-19-fears/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/09/03/north-macedonia-to-start-delayed-census-despite-covid-19-fears/


Processing URLs:  49%|████▉     | 490/1000 [18:33<10:37,  1.25s/it]

Error extracting text from http://seekingalpha.com/article/3694706-opec-meeting-appears-doomed: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3694706-opec-meeting-appears-doomed


Processing URLs:  49%|████▉     | 492/1000 [18:43<22:54,  2.71s/it]

Error extracting text from http://www.reuters.com/article/us-northkokrea-congress-idUSKCN0XY0QB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkokrea-congress-idUSKCN0XY0QB


Processing URLs:  50%|████▉     | 496/1000 [18:48<11:41,  1.39s/it]

Error extracting text from http://www.rp-online.de/politik/deutschland/grosse-koalition-angela-merkel-wechselt-cdu-minister-aus-aid-1.7375077: 410 Client Error: Gone for url: http://www.rp-online.de/politik/deutschland/grosse-koalition-angela-merkel-wechselt-cdu-minister-aus-aid-1.7375077


Processing URLs:  50%|████▉     | 498/1000 [18:51<12:00,  1.44s/it]

Error extracting text from http://www.cctv-america.com/2016/05/31/protesters-march-against-keiko-fujimori-before-perus-presidential-vote: 403 Client Error: Forbidden for url: http://america.cgtn.com/2016/05/31/protesters-march-against-keiko-fujimori-before-perus-presidential-vote


Processing URLs:  50%|█████     | 502/1000 [18:59<14:23,  1.73s/it]

Error extracting text from http://in.reuters.com/article/myanmar-politics-vote-idINKCN0WH0H5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  51%|█████     | 507/1000 [19:09<17:20,  2.11s/it]

Error extracting text from http://en.trend.az/iran/society/2503436.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/society/2503436.html


Processing URLs:  51%|█████     | 510/1000 [19:13<11:13,  1.37s/it]

Error extracting text from https://www.parliament.uk/about/faqs/house-of-commons-faqs/business-faq-page/recess-dates/: 403 Client Error: Forbidden for url: https://www.parliament.uk/about/faqs/house-of-commons-faqs/business-faq-page/recess-dates/


Processing URLs:  51%|█████     | 511/1000 [19:13<09:23,  1.15s/it]

Error extracting text from https://www.nytimes.com/2017/02/16/us/politics/neil-gorsuch-supreme-court-senate-hearing.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/16/us/politics/neil-gorsuch-supreme-court-senate-hearing.html?_r=0


Processing URLs:  51%|█████▏    | 513/1000 [19:16<10:48,  1.33s/it]

Error extracting text from http://americanfolklore.net/folklore/2010/07/brer_rabbit_meets_a_tar_baby.html: 403 Client Error: Forbidden for url: https://www.americanfolklore.net/brer-rabbit-and-the-tar-baby/


Processing URLs:  51%|█████▏    | 514/1000 [19:18<11:31,  1.42s/it]

Error extracting text from http://atimes.com/2016/08/tsunami-wave-to-wipe-out-islamic-state-if-mosul-dam-collapses/: 404 Client Error: Not Found for url: https://atimes.com/2016/08/tsunami-wave-to-wipe-out-islamic-state-if-mosul-dam-collapses/


Processing URLs:  52%|█████▏    | 515/1000 [19:20<13:30,  1.67s/it]

Error extracting text from http://tass.ru/en/politics/840813: 404 Client Error: Not Found for url: https://tass.ru/en/politics/840813


Processing URLs:  52%|█████▏    | 516/1000 [19:20<10:53,  1.35s/it]

Error extracting text from https://ctc.usma.edu/foreign-technology-or-local-expertise-al-shabaabs-ied-capability/: HTTPSConnectionPool(host='ctc.usma.edu', port=443): Max retries exceeded with url: /foreign-technology-or-local-expertise-al-shabaabs-ied-capability/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'ctc.usma.edu'. (_ssl.c:1000)")))


Processing URLs:  52%|█████▏    | 519/1000 [20:30<2:40:54, 20.07s/it]

Error extracting text from http://www.aa.com.tr/en/world/greece-parliament-passes-bill-on-bank-recapitalization/458643: HTTPConnectionPool(host='www.aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  52%|█████▏    | 522/1000 [20:44<1:12:03,  9.05s/it]

Error extracting text from https://www.nsa.gov/about/leadership/former-directors/: 403 Client Error: Forbidden for url: https://www.nsa.gov/about/leadership/former-directors/


Processing URLs:  52%|█████▏    | 523/1000 [20:46<53:42,  6.75s/it]

Error extracting text from http://2016.ntiindex.org/wp-content/uploads/2016/02/NTI_2016-Index_021116.pdf: HTTPConnectionPool(host='2016.ntiindex.org', port=80): Max retries exceeded with url: /wp-content/uploads/2016/02/NTI_2016-Index_021116.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fea0d0d0>: Failed to resolve '2016.ntiindex.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  52%|█████▏    | 524/1000 [20:46<38:53,  4.90s/it]

URL filtered: https://www.voanews.com/a/unintended-consequences-catching-up-to-facebook/4050800.html


Processing URLs:  53%|█████▎    | 527/1000 [20:48<18:25,  2.34s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-may-idUSKBN1AN15V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-may-idUSKBN1AN15V


Processing URLs:  53%|█████▎    | 530/1000 [20:50<10:24,  1.33s/it]

URL filtered: https://www.facebook.com/media/set/?set=a.1496423803990873.1073741832.1445556315744289&amp;type=3
Error extracting text from https://www.tandfonline.com/doi/abs/10.1080/02684527.2016.1147164: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/abs/10.1080/02684527.2016.1147164


Processing URLs:  53%|█████▎    | 531/1000 [20:51<08:55,  1.14s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/343312-nsa-chief-now-is-not-the-best-time-for-us-russia-cyber-unit: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/343312-nsa-chief-now-is-not-the-best-time-for-us-russia-cyber-unit/
Error extracting text from http://www.reuters.com/article/iran-oil-idUSL3N1BH1QK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/iran-oil-idUSL3N1BH1QK


Processing URLs:  53%|█████▎    | 533/1000 [20:51<05:44,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/hollande-offers-sharp-critique-of-u-s-policy-1476287400: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/hollande-offers-sharp-critique-of-u-s-policy-1476287400


Processing URLs:  54%|█████▎    | 535/1000 [21:19<54:43,  7.06s/it]

Error extracting text from https://www.washingtonpost.com/opinions/increasing-sanctions-is-just-one-part-of-the-north-korean-puzzle/2015/12/25/6b8ff572-a: 404 Client Error: Not Found for url: https://www.washingtonpost.com/opinions/increasing-sanctions-is-just-one-part-of-the-north-korean-puzzle/2015/12/25/6b8ff572-a/


Processing URLs:  54%|█████▎    | 537/1000 [21:24<37:54,  4.91s/it]

Error extracting text from https://www.foxnews.com/politics/north-korea-tests-new-icbm-missile-system-serious-escalation;: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/north-korea-tests-new-icbm-missile-system-serious-escalation;


Processing URLs:  54%|█████▍    | 540/1000 [21:27<17:38,  2.30s/it]

Error extracting text from http://www.jeuneafrique.com/463715/politique/cote-divoire-nouvelle-attaque-de-commissariat-par-des-assaillants-armes-a-adzope/: 403 Client Error: Forbidden for url: https://www.jeuneafrique.com/463715/politique/cote-divoire-nouvelle-attaque-de-commissariat-par-des-assaillants-armes-a-adzope/


Processing URLs:  54%|█████▍    | 543/1000 [21:29<10:03,  1.32s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/rebels-syrian-government-swap-prisoners-aleppo-39176718: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/rebels-syrian-government-swap-prisoners-aleppo-39176718


Processing URLs:  55%|█████▍    | 547/1000 [21:35<12:04,  1.60s/it]

Error extracting text from https://uk.reuters.com/article/uk-ireland-politics-pm/ireland-set-for-december-election-if-crisis-not-averted-by-tuesday-pm-idUKKBN1DO2DI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  56%|█████▌    | 559/1000 [21:49<04:41,  1.57it/s]

Error extracting text from http://www.cdm.me/english/germany-expects-issue-of-nato-invitation-to-montenegro-to-be-positively-resolved-in-december: 403 Client Error: Forbidden for url: https://www.cdm.me/english/germany-expects-issue-of-nato-invitation-to-montenegro-to-be-positively-resolved-in-december
Error extracting text from http://www.reuters.com/article/us-oil-hedgefunds-kemp-column-idUSKBN15L1LZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-hedgefunds-kemp-column-idUSKBN15L1LZ


Processing URLs:  56%|█████▌    | 560/1000 [21:51<06:37,  1.11it/s]

Error extracting text from http://www.ibtimes.co.uk/south-korea-scrambled-fighter-jets-after-3-chinese-military-planes-entered-air-defence-zone-1577311: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/south-korea-scrambled-fighter-jets-after-3-chinese-military-planes-entered-air-defence-zone-1577311


Processing URLs:  57%|█████▋    | 573/1000 [22:11<07:29,  1.05s/it]

Error extracting text from http://ir.sparktx.com/phoenix.zhtml?c=253900&amp;p=irol-newsArticle&amp;ID=2234931: 403 Client Error: Forbidden for url: http://ir.sparktx.com/phoenix.zhtml?c=253900&amp;p=irol-newsArticle&amp;ID=2234931
Error extracting text from http://www.cdm.me/english/qatar-airways-should-introduce-a-direct-airline-to-montenegro: 403 Client Error: Forbidden for url: https://www.cdm.me/english/qatar-airways-should-introduce-a-direct-airline-to-montenegro


Processing URLs:  58%|█████▊    | 582/1000 [22:18<04:56,  1.41it/s]

Error extracting text from http://www.tradingeconomics.com/greece/gdp: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/greece/gdp


Processing URLs:  58%|█████▊    | 583/1000 [22:20<06:46,  1.03it/s]

Error extracting text from http://www.news.com.au/world/breaking-news/us-asks-vw-for-electric-cars/news-story/7907b5f9c856ec70b6ba9ed71310ef82: 404 Client Error: Not Found for url: https://www.news.com.au/404.php
Error extracting text from https://www.reuters.com/world/belarusian-forces-will-not-take-part-ukraine-war-lukashenko-says-2022-03-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/belarusian-forces-will-not-take-part-ukraine-war-lukashenko-says-2022-03-04/
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NY5PXI6K50XU01-7B45E4UD8OPKMMI74QK8BJVOJP


Processing URLs:  59%|█████▊    | 586/1000 [22:21<05:08,  1.34it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-17/battle-lines-over-impeachment-sharpen-in-brazil-s-congress


Processing URLs:  60%|█████▉    | 596/1000 [23:09<1:09:29, 10.32s/it]

Error extracting text from http://m.indiatvnews.com/business/india-pakistan-declares-cyber-war-hackers-deface-ngt-website-350786?utm_source=https://www.google.ca/: 504 Server Error: Gateway Time-out for url: http://m.indiatvnews.com/business/india-pakistan-declares-cyber-war-hackers-deface-ngt-website-350786?utm_source=https://www.google.ca/


Processing URLs:  60%|█████▉    | 599/1000 [23:13<28:29,  4.26s/it]

Error extracting text from https://english.alarabiya.net/en/News/gulf/2017/08/20/Houthi-militias-detain-Panama-flagged-ship-at-Hudaydah-port-no-reasons-given.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/gulf/2017/08/20/Houthi-militias-detain-Panama-flagged-ship-at-Hudaydah-port-no-reasons-given.html


Processing URLs:  61%|██████▏   | 613/1000 [23:39<12:08,  1.88s/it]

URL filtered: https://www.bloomberg.com/view/articles/2016-08-26/waging-peace-in-colombia


Processing URLs:  62%|██████▏   | 615/1000 [23:41<09:44,  1.52s/it]

URL filtered: https://www.linkedin.com/posts/institute-for-the-study-of-war_the-russian-military-has-likely-decided-to-activity-6931027387782488064-xRFu?utm_source=linkedin_share&amp;utm_medium=member_desktop_web


Processing URLs:  62%|██████▏   | 617/1000 [23:42<06:16,  1.02it/s]

Error extracting text from http://www.barrons.com/articles/its-your-default-not-mine-maduros-doublespeak-in-venezuela-1509721529?mod=yahoobarrons&amp;ru=yahoo&amp;yptr=yahoo: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/its-your-default-not-mine-maduros-doublespeak-in-venezuela-1509721529?mod=yahoobarrons&amp;ru=yahoo&amp;yptr=yahoo


Processing URLs:  62%|██████▏   | 621/1000 [23:46<06:00,  1.05it/s]

Error extracting text from https://www.reuters.com/article/healthcoronavirus-brazil/who-warns-on-brazil-covid-19-outbreak-as-bolsonaro-blasts-senate-inquiry-idUSL1N2M21VE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/healthcoronavirus-brazil/who-warns-on-brazil-covid-19-outbreak-as-bolsonaro-blasts-senate-inquiry-idUSL1N2M21VE


Processing URLs:  63%|██████▎   | 626/1000 [23:55<10:04,  1.62s/it]

Error extracting text from http://www.news.com.au/entertainment/tv/game-of-thrones-season-six-may-be-delayed/story-fnmgnour-1227589073211: 404 Client Error: Not Found for url: https://www.news.com.au/entertainment/tv/game-of-thrones-season-six-may-be-delayed/story-fnmgnour-1227589073211


Processing URLs:  63%|██████▎   | 630/1000 [24:02<10:18,  1.67s/it]

Error extracting text from https://www.stripes.com/news/europe/is-the-us-ready-for-russia-s-largest-military-exercises-since-the-cold-war-1.481253#.WYNlxIjyvIU: 404 Client Error: Not Found for url: https://www.stripes.com/theaters/europe/is-the-us-ready-for-russia-s-largest-military-exercises-since-the-cold-war-1.481253#.WYNlxIjyvIU


Processing URLs:  63%|██████▎   | 632/1000 [24:04<07:40,  1.25s/it]

Error extracting text from http://ir.tesla.com/releasedetail.cfm?releaseid=963460: 403 Client Error: Forbidden for url: http://ir.tesla.com/releasedetail.cfm?releaseid=963460


Processing URLs:  64%|██████▍   | 639/1000 [24:31<30:06,  5.00s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-south-thaad-idUSKBN18Q0I3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-south-thaad-idUSKBN18Q0I3


Processing URLs:  64%|██████▍   | 642/1000 [24:32<13:28,  2.26s/it]

Error extracting text from http://missilethreat.com/missiles/musudan-bm-25/: 404 Client Error: Not Found for url: http://missilethreat.com/missiles/musudan-bm-25/


Processing URLs:  64%|██████▍   | 643/1000 [24:33<11:49,  1.99s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-talks-idUSKCN0WM0PS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-talks-idUSKCN0WM0PS


Processing URLs:  64%|██████▍   | 645/1000 [24:34<08:11,  1.38s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Talks-stall-on-China-backed-Asia-Pacific-pact-no-deal-in-2015: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Talks-stall-on-China-backed-Asia-Pacific-pact-no-deal-in-2015


Processing URLs:  65%|██████▍   | 647/1000 [24:40<12:59,  2.21s/it]

URL filtered: http://gadgets.ndtv.com/social-networking/news/facebook-launches-disputed-tag-to-crack-down-on-fake-news-1666499
URL filtered: https://www.bloomberg.com/opinion/articles/2021-08-09/bond-yields-can-t-stay-this-low-forever


Processing URLs:  65%|██████▌   | 653/1000 [25:47<1:30:59, 15.73s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-07-11/us-says-liberating-qayarah-part-of-secret-anti-isis-plan: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  66%|██████▌   | 656/1000 [25:52<41:04,  7.16s/it]

Error extracting text from https://www.polfed.org/herts/news/2021/letter-reveals-officers-anger-at-government/: 403 Client Error: ModSecurity Action for url: https://www.polfed.org/herts/news/2021/letter-reveals-officers-anger-at-government/


Processing URLs:  66%|██████▌   | 660/1000 [25:55<13:29,  2.38s/it]

Error extracting text from https://www.wsj.com/articles/how-congress-might-upend-section-230-the-internet-law-big-tech-is-built-on-11613172368?mod=hp_lead_pos10: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/how-congress-might-upend-section-230-the-internet-law-big-tech-is-built-on-11613172368?mod=hp_lead_pos10


Processing URLs:  66%|██████▌   | 662/1000 [25:57<09:17,  1.65s/it]

Error extracting text from http://thehill.com/homenews/campaign/362912-poll-dem-holds-three-point-lead-over-moore-in-al-senate-race: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/362912-poll-dem-holds-three-point-lead-over-moore-in-al-senate-race/


Processing URLs:  66%|██████▋   | 664/1000 [26:08<21:02,  3.76s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2010-2015_29DEC.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2010-2015_29DEC.pdf


Processing URLs:  67%|██████▋   | 666/1000 [26:11<13:47,  2.48s/it]

Error extracting text from http://www.gov.ph/diplomatic-relations/differentiating-visits/: 403 Client Error: Forbidden for url: http://www.gov.ph/diplomatic-relations/differentiating-visits/


Processing URLs:  67%|██████▋   | 669/1000 [26:13<07:34,  1.37s/it]

Error extracting text from http://www.cnbcafrica.com/news/east-africa/2016/10/15/rwanda-drone-delivery-service/: 404 Client Error: Not Found for url: https://www.cnbcafrica.com/news/east-africa/2016/10/15/rwanda-drone-delivery-service/
Error extracting text from http://www.nytimes.com/2016/09/24/world/asia/south-koreas-president-has-no-easy-options-in-dealing-with-an-aggressive-north.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/24/world/asia/south-koreas-president-has-no-easy-options-in-dealing-with-an-aggressive-north.html


Processing URLs:  67%|██████▋   | 671/1000 [27:17<1:46:11, 19.37s/it]

Error extracting text from https://archive.is/gdsp5: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  67%|██████▋   | 673/1000 [27:18<53:59,  9.91s/it]

URL filtered: https://twitter.com/chris_said/status/1396827966395625474?s=11


Processing URLs:  68%|██████▊   | 683/1000 [27:29<08:38,  1.64s/it]

Error extracting text from http://www.gov.me/en/News/154354/hgj.html: 404 Client Error: not found for url: https://www.gov.me/en/News/154354/hgj.html


Processing URLs:  69%|██████▉   | 692/1000 [27:40<06:08,  1.20s/it]

Error extracting text from https://www.yahoo.com/entertainment/u-crude-production-continues-trail-171700213.html: 404 Client Error: Not Found for url: https://www.yahoo.com/entertainment/u-crude-production-continues-trail-171700213.html


Processing URLs:  70%|██████▉   | 696/1000 [27:46<07:04,  1.40s/it]

URL filtered: https://www.youtube.com/watch?v=N_XJ2eB_lRY


Processing URLs:  70%|██████▉   | 699/1000 [27:47<04:22,  1.15it/s]

Error extracting text from https://riponadvance.com/stories/ratcliffe-introduces-bill-require-sanctions-hackers-ties-iranian-government/: 403 Client Error: Forbidden for url: https://riponadvance.com/stories/ratcliffe-introduces-bill-require-sanctions-hackers-ties-iranian-government/


Processing URLs:  70%|███████   | 700/1000 [27:48<03:54,  1.28it/s]

Error extracting text from http://thehill.com/policy/technology/334743-fcc-opens-public-comment-period-for-net-neutrality: 403 Client Error: Forbidden for url: https://thehill.com/policy/technology/334743-fcc-opens-public-comment-period-for-net-neutrality/


Processing URLs:  70%|███████   | 701/1000 [27:48<03:12,  1.55it/s]

Error extracting text from http://www.garretgalland.com/articles/5-charts-that-scream-this-is-itheres-what-to-do: HTTPConnectionPool(host='www.garretgalland.com', port=80): Max retries exceeded with url: /articles/5-charts-that-scream-this-is-itheres-what-to-do (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30018f680>: Failed to resolve 'www.garretgalland.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  70%|███████   | 703/1000 [27:51<04:21,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-trade-tpp-idUSKBN13629G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-tpp-idUSKBN13629G


Processing URLs:  70%|███████   | 704/1000 [28:51<1:29:16, 18.09s/it]

Error extracting text from http://www.newsobserver.com/news/politics-government/national-politics/article93759922.html: HTTPConnectionPool(host='www.newsobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  71%|███████   | 706/1000 [28:54<47:22,  9.67s/it]

Error extracting text from https://www.gov.ca.gov/2021/07/26/california-implements-first-in-the-nation-measures-to-encourage-state-employees-and-health-care-workers-to-get-vaccinated/: 403 Client Error: Forbidden for url: https://www.gov.ca.gov/2021/07/26/california-implements-first-in-the-nation-measures-to-encourage-state-employees-and-health-care-workers-to-get-vaccinated/


Processing URLs:  71%|███████   | 708/1000 [29:01<30:37,  6.29s/it]

Error extracting text from https://www.reuters.com/world/middle-east/dubai-expo-restrict-entry-those-vaccinated-against-covid-19-or-have-tested-2021-09-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/dubai-expo-restrict-entry-those-vaccinated-against-covid-19-or-have-tested-2021-09-15/


Processing URLs:  71%|███████   | 710/1000 [29:10<28:54,  5.98s/it]

Error extracting text from https://www.washingtonpost.com/national/nato-montenegro-membership-certain-despite-russia-objection/2016/11/03/7e9d1dd4-a1bd-11e6-8864-6f892cad0865_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/national/nato-montenegro-membership-certain-despite-russia-objection/2016/11/03/7e9d1dd4-a1bd-11e6-8864-6f892cad0865_story.html


Processing URLs:  71%|███████   | 711/1000 [29:11<21:41,  4.50s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/after-paris-can-iran-be-counted-help-defeat-isis-14386: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/after-paris-can-iran-be-counted-help-defeat-isis-14386


Processing URLs:  71%|███████   | 712/1000 [29:14<18:08,  3.78s/it]

Error extracting text from http://predictwise.com/politics/2016-congress-senate: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016-congress-senate


Processing URLs:  72%|███████▏  | 719/1000 [29:21<04:38,  1.01it/s]

Error extracting text from http://www.nytimes.com/2015/09/10/world/asia/in-china-a-forceful-crackdown-in-response-to-stock-market-crisis.html?emc=edit_th_20150910&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/10/world/asia/in-china-a-forceful-crackdown-in-response-to-stock-market-crisis.html?emc=edit_th_20150910&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  72%|███████▏  | 720/1000 [29:22<04:43,  1.01s/it]

Error extracting text from http://www.cnbc.com/2015/12/14/: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/12/14/


Processing URLs:  72%|███████▏  | 724/1000 [29:30<06:05,  1.33s/it]

Error extracting text from http://nationalinterest.org/blog/the-skeptics/exclusive-the-coming-battle-mosul-will-be-tougher-you-think-15998?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-skeptics/exclusive-the-coming-battle-mosul-will-be-tougher-you-think-15998?page=2


Processing URLs:  73%|███████▎  | 727/1000 [29:35<06:03,  1.33s/it]

Error extracting text from http://chicagocitywire.com/stories/511131521-illinois-earns-sigh-of-relief-as-moody-s-maintains-credit-rating: 403 Client Error: Forbidden for url: https://chicagocitywire.com/stories/511131521-illinois-earns-sigh-of-relief-as-moody-s-maintains-credit-rating
Error extracting text from http://www.nytimes.com/2015/10/23/opinion/keynes-comes-to-canada.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/23/opinion/keynes-comes-to-canada.html


Processing URLs:  73%|███████▎  | 728/1000 [29:36<05:28,  1.21s/it]

Error extracting text from https://www.flightradar24.com/data/statistics: 451 Client Error: Unavailable For Legal Reasons for url: https://www.flightradar24.com/data/statistics


Processing URLs:  73%|███████▎  | 729/1000 [29:36<04:26,  1.02it/s]

Error extracting text from https://www.theparliamentmagazine.eu/articles/news/us-committed-swift-conclusion-transatlantic-trade-deal-says-senior-trade-negotiator: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  73%|███████▎  | 730/1000 [29:36<03:24,  1.32it/s]

Error extracting text from https://www.nytimes.com/2017/10/04/technology/driverless-cars-testing.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/04/technology/driverless-cars-testing.html


Processing URLs:  73%|███████▎  | 731/1000 [29:38<04:28,  1.00it/s]

Error extracting text from http://europe.newsweek.com/putins-united-russia-drops-polls-ahead-september-elections-494887?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/putins-united-russia-drops-polls-ahead-september-elections-494887


Processing URLs:  73%|███████▎  | 734/1000 [29:42<05:11,  1.17s/it]

Error extracting text from http://www.worldbulletin.net/world/193420/5-killed-in-2-car-bomb-blasts-in-southern-baghdad: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/world/193420/5-killed-in-2-car-bomb-blasts-in-southern-baghdad
Error extracting text from https://www.reuters.com/article/us-hongkong-protests/hong-kong-facing-worst-crisis-since-handover-senior-china-official-idUSKCN1UX089: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-protests/hong-kong-facing-worst-crisis-since-handover-senior-china-official-idUSKCN1UX089


Processing URLs:  74%|███████▍  | 738/1000 [29:47<04:48,  1.10s/it]

Error extracting text from http://www.healthfirst.com/acls-kit.html: 403 Client Error: Forbidden for url: http://www.healthfirst.com/acls-kit.html


Processing URLs:  75%|███████▍  | 746/1000 [30:05<09:05,  2.15s/it]

Error extracting text from http://asia.nikkei.com/Features/Market-turmoil2/Finance-ministers-central-bankers-seen-taking-up-forex-stability: 404 Client Error: Not Found for url: https://asia.nikkei.com/Features/Market-turmoil2/Finance-ministers-central-bankers-seen-taking-up-forex-stability


Processing URLs:  75%|███████▍  | 747/1000 [30:07<08:26,  2.00s/it]

Error extracting text from https://larswericson.wordpress.com/2016/01/05/04jan16-pm-sitrep/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/01/05/04jan16-pm-sitrep/


Processing URLs:  75%|███████▌  | 750/1000 [31:11<1:21:06, 19.47s/it]

Error extracting text from http://www.spaceflightinsider.com/organizations/space-exploration-technologies/spacex-falcon-9-return-flight-pushed-2017/: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /organizations/space-exploration-technologies/spacex-falcon-9-return-flight-pushed-2017/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303fdcd10>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  75%|███████▌  | 751/1000 [31:12<56:50, 13.70s/it]  

Error extracting text from http://www.intelligenceonline.fr/intelligence-economique/2016/03/02/obama-prive-ses-allies-d-outils-de-cybersecurite,108132515-ART-CAN: 403 Client Error: Forbidden for url: https://www.intelligenceonline.fr/intelligence-economique/2016/03/02/obama-prive-ses-allies-d-outils-de-cybersecurite,108132515-ART-CAN


Processing URLs:  75%|███████▌  | 752/1000 [31:17<46:14, 11.19s/it]

Error extracting text from http://newsok.com/beijing-auto-show-showcases-chinas-suv-love-affair/article/feed/1001508?custom_click=headlines_widget&amp;newsletter=business-dynamic-email: 404 Client Error: OK for url: https://www.oklahoman.com/beijing-auto-show-showcases-chinas-suv-love-affair/article/feed/1001508/


Processing URLs:  76%|███████▌  | 760/1000 [31:28<06:13,  1.56s/it]

Error extracting text from http://www.nytimes.com/2014/11/28/business/international/opec-leaves-oil-production-quotas-unchanged-and-prices-fall-further.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2014/11/28/business/international/opec-leaves-oil-production-quotas-unchanged-and-prices-fall-further.html?_r=0


Processing URLs:  76%|███████▋  | 765/1000 [31:47<15:47,  4.03s/it]

Error extracting text from https://globalguessing.com/metaculus-mondays-vol13/: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /metaculus-mondays-vol13/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  77%|███████▋  | 766/1000 [31:48<11:50,  3.04s/it]

Error extracting text from http://money.cnn.com/2017/05/16/technology/ransomware-north-korea-hacking-history/index.htmlBs: 404 Client Error: Not Found for url: https://money.cnn.com/2017/05/16/technology/ransomware-north-korea-hacking-history/index.htmlBs


Processing URLs:  77%|███████▋  | 768/1000 [31:50<07:20,  1.90s/it]

Error extracting text from https://www.neweurope.eu/article/kosovo-ratifies-border-deal-montenegro-amidst-violent-opposition/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/kosovo-ratifies-border-deal-montenegro-amidst-violent-opposition/


Processing URLs:  77%|███████▋  | 769/1000 [31:50<05:24,  1.40s/it]

Error extracting text from https://news.google.com/articles/CAIiEFQ1vZuGaUG1f3Vlh_hTGPYqMwgEKioIACIQpzoRSNLEm6QR--MasMLSAioUCAoiEKc6EUjSxJukEfvjGrDC0gIwpvTQBg?hl=en-US&amp;gl=US&amp;ceid=US%3Aen: 500 Server Error: Internal Server Error for url: https://news.google.com/articles/CAIiEFQ1vZuGaUG1f3Vlh_hTGPYqMwgEKioIACIQpzoRSNLEm6QR--MasMLSAioUCAoiEKc6EUjSxJukEfvjGrDC0gIwpvTQBg?hl=en-US&amp;gl=US&amp;ceid=US:en&gl=US&ceid=US:en


Processing URLs:  77%|███████▋  | 770/1000 [31:50<04:29,  1.17s/it]

Error extracting text from http://mobile.nytimes.com/2015/09/05/world/middleeast/russian-moves-in-syria-pose-concerns-for-us.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/09/05/world/middleeast/russian-moves-in-syria-pose-concerns-for-us.html


Processing URLs:  77%|███████▋  | 771/1000 [31:52<05:11,  1.36s/it]

Error extracting text from http://www.ibtimes.co.uk/donald-trump-not-idiot-he-could-be-next-us-president-1540654: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/donald-trump-not-idiot-he-could-be-next-us-president-1540654


Processing URLs:  77%|███████▋  | 774/1000 [31:56<04:02,  1.07s/it]

Error extracting text from http://www.reuters.com/article/2015/11/17/us-usa-election-rubio-fed-idUSKCN0T605320151117: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/17/us-usa-election-rubio-fed-idUSKCN0T605320151117


Processing URLs:  78%|███████▊  | 776/1000 [31:58<03:28,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKBN16I0AU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKBN16I0AU
Error extracting text from http://www.balkaninsight.com/en/article/montenegro-hires-us-lobbists-to-push-nato-case-05-05-2016#sthash.Y9gI9Hox.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-hires-us-lobbists-to-push-nato-case-05-05-2016#sthash.Y9gI9Hox.dpuf


Processing URLs:  78%|███████▊  | 778/1000 [31:58<02:02,  1.81it/s]

Error extracting text from http://www.latimes.com/world/la-fg-iran-death-penalty-20161006-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-iran-death-penalty-20161006-snap-story.html


Processing URLs:  78%|███████▊  | 780/1000 [32:05<05:51,  1.60s/it]

Error extracting text from https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(17)30171-3#%20: 403 Client Error: Forbidden for url: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(17)30171-3#%20


Processing URLs:  79%|███████▊  | 787/1000 [32:11<03:14,  1.10it/s]

Error extracting text from http://www.reuters.com/article/us-usa-soda-tax-idUSKBN1352Y0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-soda-tax-idUSKBN1352Y0


Processing URLs:  79%|███████▉  | 789/1000 [32:13<02:49,  1.24it/s]

Error extracting text from https://larswericson.wordpress.com/2016/04/26/gitrep-25apr16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/26/gitrep-25apr16pm/


Processing URLs:  79%|███████▉  | 793/1000 [32:23<05:42,  1.66s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/03/gitrep-2may16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/03/gitrep-2may16pm/
URL filtered: https://www.youtube.com/watch?v=Ut0tLsHEoKc


Processing URLs:  80%|███████▉  | 798/1000 [32:53<19:18,  5.74s/it]

Error extracting text from http://www.nytimes.com/2016/08/24/world/middleeast/isis-mosul-iraq.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/24/world/middleeast/isis-mosul-iraq.html


Processing URLs:  80%|████████  | 802/1000 [32:57<07:02,  2.13s/it]

Error extracting text from http://uk.reuters.com/article/uk-usa-iran-cyber-idUKKCN0WP2NQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://thekabultimes.gov.af/index.php/opinions/politics/10635-introduction-of-two-new-key-acting-chiefs-effective-in-fighting-terrorism.html: 404 Client Error: Not Found for url: http://thekabultimes.gov.af/index.php/opinions/politics/10635-introduction-of-two-new-key-acting-chiefs-effective-in-fighting-terrorism.html


Processing URLs:  81%|████████  | 806/1000 [34:11<1:07:39, 20.92s/it]

Error extracting text from https://archive.is/RAOVT: HTTPSConnectionPool(host='archive.is', port=443): Read timed out. (read timeout=60)


Processing URLs:  81%|████████  | 811/1000 [34:17<13:45,  4.37s/it]  

Error extracting text from http://www.latimes.com/world/la-fg-pakistan-protest-20161101-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-pakistan-protest-20161101-story.html


Processing URLs:  81%|████████  | 812/1000 [34:20<12:35,  4.02s/it]

URL filtered: https://twitter.com/realDonaldTrump/status/831846101179314177


Processing URLs:  82%|████████▏ | 816/1000 [34:25<05:41,  1.86s/it]

Error extracting text from https://www.oddschecker.com/tennis/french-open/mens: 403 Client Error: Forbidden for url: https://www.oddschecker.com/tennis/french-open/mens


Processing URLs:  82%|████████▏ | 819/1000 [34:29<04:38,  1.54s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/08/22/773630/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/08/22/773630/story.html


Processing URLs:  82%|████████▏ | 822/1000 [34:33<03:43,  1.26s/it]

Error extracting text from http://www.khaosodenglish.com/politics/2016/12/31/post-election-thailand-return-guided-democracy-scholar-says/: 403 Client Error: Forbidden for url: https://www.khaosodenglish.com/politics/2016/12/31/post-election-thailand-return-guided-democracy-scholar-says/


Processing URLs:  82%|████████▏ | 824/1000 [34:35<03:35,  1.23s/it]

Error extracting text from http://www.brookings.edu/research/opinions/2016/04/01-ending-myanmar-conflict-brennan-zaw-oo: 404 Client Error: Not Found for url: https://www.brookings.edu/articles/opinions/2016/04/01-ending-myanmar-conflict-brennan-zaw-oo


Processing URLs:  82%|████████▎ | 825/1000 [34:37<04:06,  1.41s/it]

Error extracting text from https://www.enca.com/business/dubai-airport-passenger-volumes-slump: 404 Client Error: Not Found for url: https://www.enca.com/business/dubai-airport-passenger-volumes-slump


Processing URLs:  83%|████████▎ | 826/1000 [34:38<03:21,  1.16s/it]

Error extracting text from http://gcaptain.com/a-concrete-sample-was-pulled-from-the-new-panama-canal-locks-and-it-does-not-look-good/#.VkuNO7erTRb: 403 Client Error: Forbidden for url: http://gcaptain.com/a-concrete-sample-was-pulled-from-the-new-panama-canal-locks-and-it-does-not-look-good/#.VkuNO7erTRb


Processing URLs:  83%|████████▎ | 827/1000 [34:39<03:12,  1.11s/it]

Error extracting text from http://ntiindex.org/news-items/cyber-attack-nuclear-facility-unfold/: 404 Client Error: Not Found for url: https://www.ntiindex.org/news-items/cyber-attack-nuclear-facility-unfold/


Processing URLs:  83%|████████▎ | 834/1000 [34:45<02:16,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13J0XC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13J0XC


Processing URLs:  84%|████████▎ | 835/1000 [34:46<02:14,  1.23it/s]

Error extracting text from https://www.ipredict.co.nz/app.php?do=browse&amp;cat=775: 404 Client Error: Not Found for url: http://www.ipredict.co.nz/app.php?do=browse&amp;cat=775


Processing URLs:  84%|████████▎ | 837/1000 [34:47<01:54,  1.42it/s]

Error extracting text from http://mobile.reuters.com/article/amp/idUSKBN1D90L4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKBN1D90L4
Error extracting text from http://www.cdm.me/english/vice-governor-milosevic-is-leaving-the-central-bank: 403 Client Error: Forbidden for url: https://www.cdm.me/english/vice-governor-milosevic-is-leaving-the-central-bank


Processing URLs:  84%|████████▍ | 838/1000 [34:49<03:13,  1.19s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-10-13/amc-theaters-said-to-mull-bankruptcy-after-moviegoers-stay-home


Processing URLs:  84%|████████▍ | 840/1000 [34:53<03:36,  1.35s/it]

Error extracting text from http://mobile.esecurityplanet.com/hackers/hackers-steal-data-from-japanese-nuclear-facility.html: 404 Client Error: Not Found for url: https://www.esecurityplanet.com/threats/hackers-steal-data-from-japanese-nuclear-facility/


Processing URLs:  84%|████████▍ | 844/1000 [34:57<02:48,  1.08s/it]

Error extracting text from http://ca.reuters.com/article/topNews/idCAKCN0SM14C20151028?sp=true: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca
Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/03/14/brazil-protests-signal-rousseff-impeachment/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/03/14/brazil-protests-signal-rousseff-impeachment/


Processing URLs:  85%|████████▍ | 848/1000 [35:03<03:13,  1.27s/it]

Error extracting text from https://www.bls.gov/charts/employment-situation/civilian-labor-force-participation-rate.htm: 403 Client Error: Forbidden for url: https://www.bls.gov/charts/employment-situation/civilian-labor-force-participation-rate.htm
URL filtered: https://twitter.com/FinancialTimes/status/1504098983895080965


Processing URLs:  85%|████████▌ | 850/1000 [35:04<02:20,  1.06it/s]

Error extracting text from http://polling.reuters.com/#poll/CP3_2/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/#poll/CP3_2/


Processing URLs:  85%|████████▌ | 852/1000 [35:08<03:08,  1.27s/it]

Error extracting text from http://www.rtlnieuws.nl/nieuws/politiek/nederland-gaat-terreurgroep-bombarderen-syrie: 404 Client Error: Not Found for url: https://www.rtlnieuws.nl/nieuws/politiek/nederland-gaat-terreurgroep-bombarderen-syrie


Processing URLs:  86%|████████▌ | 861/1000 [35:23<04:47,  2.07s/it]

Error extracting text from http://www.ibtimes.co.uk/uk-block-un-probe-into-yemen-war-after-saudi-arabia-threatens-cut-trade-ties-1641054: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/uk-block-un-probe-into-yemen-war-after-saudi-arabia-threatens-cut-trade-ties-1641054


Processing URLs:  86%|████████▌ | 862/1000 [35:24<04:11,  1.83s/it]

Error extracting text from http://www.japan.go.jp/g7/summit/: 404 Client Error: Not Found for url: https://www.japan.go.jp/g7/summit/index.html


Processing URLs:  86%|████████▋ | 864/1000 [36:25<43:07, 19.02s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-04-26/putin-agrees-to-meet-biden-as-west-seeks-to-deescalate-russian-aggression: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  87%|████████▋ | 869/1000 [36:33<09:24,  4.31s/it]

Error extracting text from http://news.yahoo.com/iraq-deploying-thousands-troops-retake-mosul-114704876.html: 404 Client Error: Not Found for url: http://news.yahoo.com/iraq-deploying-thousands-troops-retake-mosul-114704876.html


Processing URLs:  87%|████████▋ | 871/1000 [36:39<07:05,  3.30s/it]

Error extracting text from http://www.balkaneu.com/french-question-mark-montenegros-membership-nato/: 404 Client Error: Not Found for url: http://www.balkaneu.com/french-question-mark-montenegros-membership-nato/


Processing URLs:  87%|████████▋ | 872/1000 [36:39<05:12,  2.44s/it]

Error extracting text from https://thehill.com/homenews/house/541382-capitol-police-chief-says-threats-to-members-of-congress-have-nearly-doubled: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/541382-capitol-police-chief-says-threats-to-members-of-congress-have-nearly-doubled/
URL filtered: https://www.bloomberg.com/news/articles/2021-09-20/nyc-returns-to-hosting-un-week-as-new-yorkers-fear-covid-spike?srnd=premium&amp;sref=i2Bc5OtW


Processing URLs:  87%|████████▋ | 874/1000 [36:40<03:14,  1.54s/it]

URL filtered: http://www.bloombergview.com/articles/2015-10-05/ending-the-u-s-oil-export-ban-is-an-empty-gesture


Processing URLs:  88%|████████▊ | 876/1000 [36:44<03:25,  1.66s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-12/putin-says-sheltering-assad-would-be-easier-than-snowden-asylum?cmpid=wsdemand
URL filtered: http://www.bloomberg.com/news/articles/2016-05-18/tesla-announces-2-billion-public-offering-to-accelerate-model-3-ramp-up


Processing URLs:  88%|████████▊ | 882/1000 [36:51<03:01,  1.53s/it]

Error extracting text from https://www.pakistantoday.com.pk/2018/01/15/shadow-of-uncertainty-looms-large-over-2018-general-elections-siraj/: 403 Client Error: Forbidden for url: https://www.pakistantoday.com.pk/2018/01/15/shadow-of-uncertainty-looms-large-over-2018-general-elections-siraj/


Processing URLs:  89%|████████▊ | 886/1000 [37:19<12:39,  6.66s/it]

Error extracting text from http://www.almasdarnews.com/article/24877/: 522 Server Error:  for url: https://www.almasdarnews.com/article/24877/


Processing URLs:  89%|████████▉ | 888/1000 [37:20<06:40,  3.57s/it]

Error extracting text from http://www.businessinsider.com/ap-us-philippines-agree-on-locations-covered-by-defense-pact-2016-3: 404 Client Error: Not Found for url: https://www.businessinsider.com/ap-us-philippines-agree-on-locations-covered-by-defense-pact-2016-3
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKBN1441DE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKBN1441DE


Processing URLs:  89%|████████▉ | 891/1000 [37:23<03:27,  1.90s/it]

Error extracting text from https://english.alarabiya.net/en/News/middle-east/2016/10/27/Iranian-arms-shipments-to-Yemen-stopped-US.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/10/27/Iranian-arms-shipments-to-Yemen-stopped-US.html


Processing URLs:  89%|████████▉ | 892/1000 [37:29<05:43,  3.18s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-juncker/they-have-to-pay-eus-juncker-says-of-britain-idUSKBN1CI12L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-juncker/they-have-to-pay-eus-juncker-says-of-britain-idUSKBN1CI12L


Processing URLs:  90%|████████▉ | 899/1000 [37:39<02:25,  1.44s/it]

Error extracting text from http://www.reuters.com/article/us-mobileye-tesla-idUSKCN11K2T8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mobileye-tesla-idUSKCN11K2T8


Processing URLs:  90%|█████████ | 903/1000 [37:48<03:17,  2.03s/it]

Error extracting text from http://www3.nhk.or.jp/nhkworld/english/news/onbusiness/2014122202.html: 404 Client Error: Not Found for url: http://www3.nhk.or.jp/nhkworld/english/news/onbusiness/2014122202.html


Processing URLs:  91%|█████████ | 909/1000 [38:13<07:16,  4.79s/it]

Error extracting text from http://www.reuters.com/article/us-britain-scotland-idUSKBN15K0UA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-scotland-idUSKBN15K0UA


Processing URLs:  92%|█████████▏| 917/1000 [38:23<01:32,  1.12s/it]

Error extracting text from https://www.timesofisrael.com/5-times-israeli-politicians-said-theres-nothing-to-it-and-ended-up-in-jail/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/5-times-israeli-politicians-said-theres-nothing-to-it-and-ended-up-in-jail/
Error extracting text from http://www.nytimes.com/2016/05/18/world/middleeast/isis-bombing-baghdad-iraq-market.html?emc=edit_th_20160518&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/18/world/middleeast/isis-bombing-baghdad-iraq-market.html?emc=edit_th_20160518&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  92%|█████████▏| 922/1000 [38:28<01:18,  1.01s/it]

Error extracting text from http://www.iran-daily.com/News/134115.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
URL filtered: https://twitter.com/Noahpinion/status/1497330120687702018


Processing URLs:  92%|█████████▎| 925/1000 [38:36<02:22,  1.90s/it]

URL filtered: https://www.youtube.com/watch?v=eSTFa8uV6gU&amp;list=PLp9OZuwtlUZZN5ntif4RbAeHQ7WGdfHuf&amp;index=2


Processing URLs:  93%|█████████▎| 927/1000 [38:37<01:46,  1.46s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-15/fed-increase-for-wimps-economists-propose-1-8-point-rate-rise
URL filtered: https://www.youtube.com/watch?v=yaFnL4d7GC8&amp;feature


Processing URLs:  93%|█████████▎| 930/1000 [38:39<01:09,  1.01it/s]

URL filtered: https://finance.yahoo.com/news/facebook-exec-on-launching-digital-wallet-we-plan-to-earn-peoples-trust-173921742.html


Processing URLs:  93%|█████████▎| 934/1000 [38:41<00:50,  1.32it/s]

Error extracting text from http://www.nytimes.com/2015/08/24/opinion/an-opening-for-diplomacy-in-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/24/opinion/an-opening-for-diplomacy-in-syria.html


Processing URLs:  94%|█████████▎| 936/1000 [38:43<00:53,  1.19it/s]

Error extracting text from https://academic.oup.com/chinesejil/article-abstract/14/2/271/391978/The-East-China-Sea-Air-Defense-Identification-Zone: 403 Client Error: Forbidden for url: https://academic.oup.com/chinesejil/article-abstract/14/2/271/391978/The-East-China-Sea-Air-Defense-Identification-Zone


Processing URLs:  94%|█████████▍| 939/1000 [38:48<01:18,  1.29s/it]

Error extracting text from http://www.ibtimes.com/pound-surges-new-brexit-poll-showing-strong-majority-britons-wish-stay-eu-2370762: 403 Client Error: Forbidden for url: https://www.ibtimes.com/pound-surges-new-brexit-poll-showing-strong-majority-britons-wish-stay-eu-2370762


Processing URLs:  94%|█████████▍| 941/1000 [38:50<01:05,  1.11s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-11/30/c_134868871.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-11/30/c_134868871.htm
URL filtered: http://www.bloomberg.com/news/articles/2015-12-16/boj-is-done-boosting-stimulus-in-view-of-half-of-economists
Error extracting text from http://www.iol.co.za/news/world/iran-fm-blasts-call-for-assad-s-ouster-1.1912077#.Ve_M7uk0r04: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/world/iran-fm-blasts-call-for-assad-s-ouster-1.1912077#.Ve_M7uk0r04


Processing URLs:  94%|█████████▍| 944/1000 [38:51<00:41,  1.33it/s]

Error extracting text from http://www.ultimasnoticias.com.ve/noticias/actualidad/economia/en-abril-el-compromiso-de-pago-de-deuda-externa-as.aspx: 403 Client Error: Forbidden for url: http://www.ultimasnoticias.com.ve/noticias/actualidad/economia/en-abril-el-compromiso-de-pago-de-deuda-externa-as.aspx
URL filtered: https://coingeek.com/facebook-diem-announces-us-stablecoin-launch/


Processing URLs:  95%|█████████▍| 949/1000 [39:01<01:36,  1.88s/it]

Error extracting text from http://farc-epeace.org/index.php/point-of-view/item/972-meet-colombia-s-farc-rebels-preparing-for-peace-after-half-century-of-conflict.html: 436 Client Error:  for url: http://ww16.farc-epeace.org/index.php/point-of-view/item/972-meet-colombia-s-farc-rebels-preparing-for-peace-after-half-century-of-conflict.html?sub1=20240202-0615-10de-ac36-859d1339d834


Processing URLs:  95%|█████████▌| 950/1000 [40:02<13:59, 16.80s/it]

Error extracting text from http://en.kremlin.ru/events/president/news/53151: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  96%|█████████▌| 955/1000 [40:08<03:07,  4.16s/it]

Error extracting text from http://www.ibtimes.co.uk/boris-johnson-ditches-old-view-that-assad-must-go-major-shift-uk-policy-syria-1603351: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/boris-johnson-ditches-old-view-that-assad-must-go-major-shift-uk-policy-syria-1603351


Processing URLs:  96%|█████████▌| 956/1000 [40:09<02:18,  3.15s/it]

Error extracting text from http://nerdist.com/giant-robot-vs-wrecking-ball-a-fight-where-everyone-wins/: 403 Client Error: Forbidden for url: http://nerdist.com/giant-robot-vs-wrecking-ball-a-fight-where-everyone-wins/


Processing URLs:  96%|█████████▌| 960/1000 [40:17<01:29,  2.23s/it]

Error extracting text from http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/unsc_elections_2017.pdf: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/unsc_elections_2017.pdf


Processing URLs:  96%|█████████▋| 963/1000 [40:20<00:48,  1.31s/it]

Error extracting text from https://www.reuters.com/article/us-brazil-bolsonaro/brazils-bolsonaro-approval-rating-stays-at-highest-level-during-pandemic-idUSKBN28N0YT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-bolsonaro/brazils-bolsonaro-approval-rating-stays-at-highest-level-during-pandemic-idUSKBN28N0YT


Processing URLs:  96%|█████████▋| 964/1000 [40:20<00:36,  1.02s/it]

Error extracting text from http://www.rand.org/blog/2016/03/rouhani-and-khamenei-are-both-winners-in-irans-elections.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2016/03/rouhani-and-khamenei-are-both-winners-in-irans-elections.html


Processing URLs:  97%|█████████▋| 966/1000 [40:23<00:36,  1.09s/it]

Error extracting text from http://americanresearchgroup.com/pres2016/primary/dem/nhdem.html: 403 Client Error: Forbidden for url: http://americanresearchgroup.com/pres2016/primary/dem/nhdem.html


Processing URLs:  97%|█████████▋| 967/1000 [40:24<00:35,  1.07s/it]

URL filtered: https://twitter.com/hashtag/stoptrump?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Ehashtag


Processing URLs:  97%|█████████▋| 970/1000 [40:28<00:40,  1.33s/it]

Error extracting text from http://www.cruisemapper.com/wiki/761-cruise-ship-passenger-capacity-ratings: 403 Client Error: Forbidden for url: https://www.cruisemapper.com/wiki/761-cruise-ship-passenger-capacity-ratings


Processing URLs:  97%|█████████▋| 972/1000 [40:31<00:36,  1.32s/it]

Error extracting text from http://abcnews.go.com/Technology/wireStory/poland-repelled-3rd-russian-hacking-attack-50457003: 404 Client Error: Not Found for url: https://abcnews.go.com/Technology/wireStory/poland-repelled-3rd-russian-hacking-attack-50457003


Processing URLs:  97%|█████████▋| 974/1000 [40:36<00:45,  1.76s/it]

Error extracting text from https://news.crunchbase.com/news/2017s-ico-market-grew-nearly-100x-q1-q4/: 403 Client Error: Forbidden for url: https://news.crunchbase.com/news/2017s-ico-market-grew-nearly-100x-q1-q4/


Processing URLs:  98%|█████████▊| 975/1000 [40:38<00:46,  1.87s/it]

Error extracting text from http://www.france24.com/en/20171109-opels-2020-vision-profit-finally: 403 Client Error: Forbidden for url: http://www.france24.com/en/20171109-opels-2020-vision-profit-finally


Processing URLs:  98%|█████████▊| 980/1000 [40:44<00:22,  1.10s/it]

Error extracting text from http://interactive.aljazeera.com/aje/2016/un-debate-secretary-general/index.html: 404 Client Error: Not Found for url: https://interactive.aljazeera.com/aje/2016/un-debate-secretary-general/index.html


Processing URLs:  98%|█████████▊| 982/1000 [40:45<00:14,  1.29it/s]

Error extracting text from https://www.realclearpolitics.com/epolls/2017/senate/al/alabama_senate_special_election_moore_vs_jones-6271.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2017/senate/al/alabama_senate_special_election_moore_vs_jones-6271.html#polls


Processing URLs:  99%|█████████▉| 990/1000 [40:57<00:13,  1.37s/it]

Error extracting text from http://www.wsj.com/articles/russia-expands-military-its-presence-in-syria-satellite-photos-show-1442937150?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2ASituation%20Report: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-expands-military-its-presence-in-syria-satellite-photos-show-1442937150?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2ASituation%20Report


Processing URLs:  99%|█████████▉| 994/1000 [41:02<00:08,  1.38s/it]

Error extracting text from https://www.dw.com/en/displaced-oil-and-ruin-the-exodus-of-venezuela/av-52011198: 404 Client Error: Not Found for url: https://www.dw.com/en/displaced-oil-and-ruin-the-exodus-of-venezuela/av-52011198


Processing URLs: 100%|█████████▉| 995/1000 [41:18<00:27,  5.58s/it]

Error extracting text from https://doc.research-and-analytics.csfb.com/docView?language=ENG&amp;format=PDF&amp;source_id=em&amp;document_id=1053681521&amp;serialid=gRAGx5o9KjpeAGBLPq7bpyJRa6r6fj06KjHB6PGBbGU%3d: HTTPSConnectionPool(host='plus2.credit-suisse.com', port=443): Max retries exceeded with url: /docView?language=ENG&amp;format=PDF&amp;source_id=em&amp;document_id=1053681521&amp;serialid=gRAGx5o9KjpeAGBLPq7bpyJRa6r6fj06KjHB6PGBbGU%3D (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ffa50050>: Failed to resolve 'plus2.credit-suisse.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs: 100%|██████████| 1000/1000 [41:27<00:00,  2.49s/it]
Processing URLs:   0%|          | 1/1000 [00:00<02:42,  6.13it/s]

Error extracting text from http://www.nytimes.com/2016/01/05/business/vw-sued-justice-department-emissions-scandal.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/05/business/vw-sued-justice-department-emissions-scandal.html


Processing URLs:   0%|          | 3/1000 [00:02<13:35,  1.22it/s]

Error extracting text from http://www.tax-news.com/news/ASEAN_Reinforces_Commitment_to_RCEP____69829.html: 404 Client Error: Not Found for url: http://www.tax-news.com/news/ASEAN_Reinforces_Commitment_to_RCEP____69829.html


Processing URLs:   1%|          | 6/1000 [00:05<16:47,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKCN0VJ2OS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKCN0VJ2OS


Processing URLs:   1%|▏         | 13/1000 [00:25<39:35,  2.41s/it]  

Error extracting text from https://www.nytimes.com/2017/08/12/us/politics/richard-burr-senate-intelligence-new-washington-podcast.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/12/us/politics/richard-burr-senate-intelligence-new-washington-podcast.html


Processing URLs:   2%|▏         | 17/1000 [00:28<19:23,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16V2KY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16V2KY


Processing URLs:   2%|▏         | 22/1000 [00:43<29:04,  1.78s/it]

Error extracting text from http://www.aninews.in/newsdetail4/story255013/handover-ceremony-remains-at-odd-between-nld-and-govt-.html: 403 Client Error: Forbidden for url: https://www.aninews.in/newsdetail4/story255013/handover-ceremony-remains-at-odd-between-nld-and-govt-.html


Processing URLs:   2%|▏         | 24/1000 [00:57<1:16:49,  4.72s/it]

Error extracting text from https://apps.washingtonpost.com/g/documents/world/full-text-of-the-iran-nuclear-deal/1651/: 502 Server Error: Bad Gateway for url: https://apps.washingtonpost.com/g/documents/world/full-text-of-the-iran-nuclear-deal/1651/


Processing URLs:   2%|▎         | 25/1000 [00:57<57:23,  3.53s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2016-05-16/u-s-discloses-saudi-arabia-s-treasuries-holdings-for-first-time


Processing URLs:   3%|▎         | 27/1000 [00:58<32:29,  2.00s/it]

Error extracting text from http://nationalinterest.org/feature/philippines-v-china-the-adiz-menace-15791: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/philippines-v-china-the-adiz-menace-15791
URL filtered: https://www.bloomberg.com/news/articles/2017-03-19/bullish-bets-on-crude-cut-by-most-ever-as-price-falls-below-50


Processing URLs:   3%|▎         | 30/1000 [00:59<20:14,  1.25s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/angela-merkel-says-aiming-to-resolve-syria-conflict-without-bashar-al-assad/articleshow/50205820.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/angela-merkel-says-aiming-to-resolve-syria-conflict-without-bashar-al-assad/articleshow/50205820.cms


Processing URLs:   3%|▎         | 34/1000 [01:06<29:33,  1.84s/it]

Error extracting text from http://micanaldepanama.com/expansion/documents/third-set-of-locks-contract/: 403 Client Error: Forbidden for url: https://pancanal.com/expansion/documents/third-set-of-locks-contract/


Processing URLs:   4%|▎         | 35/1000 [01:07<23:35,  1.47s/it]

Error extracting text from http://thehill.com/homenews/campaign/256955-carson-campaign-plots-next-phase: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/256955-carson-campaign-plots-next-phase/


Processing URLs:   4%|▎         | 36/1000 [01:07<18:03,  1.12s/it]

Error extracting text from https://www.nytimes.com/2021/12/03/health/coronavirus-omicron-vaccines-contagiousness.html.: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/12/03/health/coronavirus-omicron-vaccines-contagiousness.html.


Processing URLs:   4%|▍         | 38/1000 [01:10<17:44,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-idUSKBN0U208420151219: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-idUSKBN0U208420151219


Processing URLs:   4%|▍         | 40/1000 [01:11<13:06,  1.22it/s]

Error extracting text from http://nyti.ms/20H80UP: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/16/opinion/erdogan-and-merkels-comic-comeuppance.html


Processing URLs:   5%|▍         | 47/1000 [01:17<10:19,  1.54it/s]

Error extracting text from http://www.nytimes.com/2015/09/11/world/americas/brazils-economic-crisis-intensifies-raising-pressure-on-president.html?emc=edit_th_20150911&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/11/world/americas/brazils-economic-crisis-intensifies-raising-pressure-on-president.html?emc=edit_th_20150911&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from http://europe.autonews.com/article/20160503/ANE/160509975/german-car-sales-rose-8-in-april-adding-to-france-italy-spain-gains: 403 Client Error: Forbidden for url: https://europe.autonews.com/article/20160503/ANE/160509975/german-car-sales-rose-8-in-april-adding-to-france-italy-spain-gains
URL filtered: https://www.cnbc.com/2021/04/20/facebook-backed-diem-aims-to-launch-digital-currency-pilot-in-2021.html
URL filtered: https://www.theguardian.com/technology/2016/dec/17/german-officials-say-facebook-is-doing-too-little-to-stop-hate-speech


Processing URLs:   5%|▌         | 51/1000 [01:20<12:44,  1.24it/s]

Error extracting text from http://www.kba.de/DE/Presse/Pressemitteilungen/2011_2015/2015/Fahrzeugzulassungen/pm01_2015_n_12_14_pm_komplett.html: 404 Client Error: Not Found for url: https://www.kba.de/DE/Presse/Pressemitteilungen/2011_2015/2015/Fahrzeugzulassungen/pm01_2015_n_12_14_pm_komplett.html
Error extracting text from http://fusion.net/story/302369/olympics-brazil-fear-zika/: HTTPConnectionPool(host='fusion.net', port=80): Max retries exceeded with url: /story/302369/olympics-brazil-fear-zika/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2febe36e0>: Failed to resolve 'fusion.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 54/1000 [01:22<10:07,  1.56it/s]

Error extracting text from http://uk.reuters.com/article/us-northkorea-nuclear-britain-china-idUKKBN0UK0QQ20160106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   6%|▌         | 56/1000 [01:33<41:19,  2.63s/it]

Error extracting text from https://www.faa.gov/uas/civil_operations/: 500 Server Error: Internal Server Error for url: https://www.faa.gov/uas/civil_operations/


Processing URLs:   6%|▌         | 60/1000 [01:44<40:34,  2.59s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/have-iphone-sales-peaked-analysts-predict-slump-in-fiscal-2016


Processing URLs:   7%|▋         | 67/1000 [01:51<14:53,  1.04it/s]

Error extracting text from https://www.wsj.com/articles/u-k-s-boris-johnson-will-self-isolate-for-10-days-11626626679: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-k-s-boris-johnson-will-self-isolate-for-10-days-11626626679
Error extracting text from http://www.nytimes.com/2016/01/14/us/politics/donald-trumps-iowa-ground-game-seems-to-be-missing-a-coach.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/us/politics/donald-trumps-iowa-ground-game-seems-to-be-missing-a-coach.html?_r=0


Processing URLs:   7%|▋         | 72/1000 [01:59<20:51,  1.35s/it]

Error extracting text from http://www.nytimes.com/2016/06/29/us/politics/supreme-court-term.html?emc=edit_th_20160629&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/29/us/politics/supreme-court-term.html?emc=edit_th_20160629&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:   7%|▋         | 74/1000 [02:02<18:50,  1.22s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-spd-leaders-aim-to-improve-on-coalition-deal-with-merkel-idUSKBN1F30L4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-spd-leaders-aim-to-improve-on-coalition-deal-with-merkel-idUSKBN1F30L4


Processing URLs:   8%|▊         | 81/1000 [02:15<24:33,  1.60s/it]

Error extracting text from https://www.nytimes.com/2017/03/30/business/nafta-trade-deal-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/30/business/nafta-trade-deal-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:   9%|▊         | 87/1000 [02:30<45:15,  2.97s/it]

Error extracting text from http://finance.yahoo.com/news/opec-set-maintain-output-levels-oil-glut-worsens-112215939--finance.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/opec-set-maintain-output-levels-oil-glut-worsens-112215939--finance.html


Processing URLs:   9%|▉         | 90/1000 [02:32<19:51,  1.31s/it]

Error extracting text from http://www.nrc.gov/reading-rm/doc-collections/fact-sheets/nuclear-insurance.html: 403 Client Error: Forbidden for url: http://www.nrc.gov/reading-rm/doc-collections/fact-sheets/nuclear-insurance.html


Processing URLs:   9%|▉         | 92/1000 [02:35<21:28,  1.42s/it]



Processing URLs:  10%|▉         | 95/1000 [02:35<09:26,  1.60it/s]

Error extracting text from http://www.iol.co.za/sundayindependent/dispatch/mbetes-delay-suggests-the-horse-trading-is-complex-10656780: 403 Client Error: Forbidden for url: http://www.iol.co.za/sundayindependent/dispatch/mbetes-delay-suggests-the-horse-trading-is-complex-10656780


Processing URLs:  10%|▉         | 97/1000 [02:38<14:08,  1.06it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-fishingboats-idUSKCN0XS0RS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-fishingboats-idUSKCN0XS0RS


Processing URLs:  10%|█         | 101/1000 [02:41<12:28,  1.20it/s]

Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL8N1CR2VH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL8N1CR2VH


Processing URLs:  10%|█         | 103/1000 [02:42<08:16,  1.81it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-healthcare-idUSKBN16D09K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-healthcare-idUSKBN16D09K


Processing URLs:  10%|█         | 104/1000 [02:44<12:38,  1.18it/s]

URL filtered: https://twitter.com/realDonaldTrump/status/912994446219898880


Processing URLs:  11%|█         | 108/1000 [02:55<47:04,  3.17s/it]

Error extracting text from http://www.morningnewsusa.com/south-china-sea-war-china-hold-exercises-disputed-waters-2387263.html: HTTPConnectionPool(host='www.morningnewsusa.com', port=80): Max retries exceeded with url: /south-china-sea-war-china-hold-exercises-disputed-waters-2387263.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301867f80>: Failed to resolve 'www.morningnewsusa.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  12%|█▏        | 116/1000 [03:19<35:03,  2.38s/it]  

Error extracting text from http://www.wsj.com/articles/iraqi-troops-stoke-sectarian-tensions-in-mosul-fight-1477042201: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-troops-stoke-sectarian-tensions-in-mosul-fight-1477042201


Processing URLs:  12%|█▏        | 120/1000 [03:22<17:31,  1.19s/it]

Error extracting text from https://www.thecipherbrief.com/column/agenda-setter/open-letter-cia-director-nominee-1091: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/column/agenda-setter/open-letter-cia-director-nominee-1091


Processing URLs:  12%|█▏        | 122/1000 [03:25<19:41,  1.35s/it]

Error extracting text from https://reut.rs/3uViuot: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/austrias-kurz-expects-be-charged-cleared-perjury-case-2021-05-16/


Processing URLs:  12%|█▎        | 125/1000 [03:27<13:13,  1.10it/s]

URL filtered: https://www.nytimes.com/2021/06/29/technology/facebook-google-antitrust-tech.html


Processing URLs:  13%|█▎        | 129/1000 [03:34<20:23,  1.40s/it]

Error extracting text from http://www.proatom.ru/modules.php?name=News&amp;file=article&amp;sid=3715: 403 Client Error: Forbidden for url: http://www.proatom.ru/modules.php?name=News&amp;file=article&amp;sid=3715


Processing URLs:  13%|█▎        | 130/1000 [03:34<17:01,  1.17s/it]

Error extracting text from http://www.scottaaronson.com/busybeaver.pdf: 406 Client Error: Not Acceptable for url: http://www.scottaaronson.com/busybeaver.pdf
URL filtered: https://www.cnbc.com/2022/02/03/facebook-shares-plummet-22percent-after-reporting-weak-guidance.html


Processing URLs:  13%|█▎        | 134/1000 [03:37<10:30,  1.37it/s]

Error extracting text from http://www.nytimes.com/2016/05/05/business/tesla-says-it-will-sharply-ramp-up-production-of-model-3.html?emc=edit_th_20160505&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/05/business/tesla-says-it-will-sharply-ramp-up-production-of-model-3.html?emc=edit_th_20160505&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  14%|█▍        | 138/1000 [03:43<19:31,  1.36s/it]

Error extracting text from https://www.tuko.co.ke/247608-confusion-woman-discovers-live-snake-home-door-calls-police.html: 410 Client Error: Gone for url: https://www.tuko.co.ke/247608-confusion-woman-discovers-live-snake-home-door-calls-police.html


Processing URLs:  14%|█▍        | 142/1000 [03:52<21:15,  1.49s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-20/another-bad-sign-for-opec-and-the-oil-bulls


Processing URLs:  14%|█▍        | 144/1000 [03:53<15:26,  1.08s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/african-union-plans-deploy-military-monitors-burundi-37239828: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/african-union-plans-deploy-military-monitors-burundi-37239828
Error extracting text from http://www.reuters.com/article/us-philippines-russia-duterte-idUSKBN14Q0Z0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-russia-duterte-idUSKBN14Q0Z0


Processing URLs:  15%|█▍        | 147/1000 [03:55<12:16,  1.16it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/359806-pentagons-coordinated-disclosure-program-defangs-2800-security-flaws: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/359806-pentagons-coordinated-disclosure-program-defangs-2800-security-flaws/


Processing URLs:  15%|█▍        | 149/1000 [03:57<11:41,  1.21it/s]

Error extracting text from https://thehill.com/policy/international/557603-us-iran-nuclear-talks-to-resume-this-weekend: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/557603-us-iran-nuclear-talks-to-resume-this-weekend/


Processing URLs:  15%|█▌        | 152/1000 [03:58<08:07,  1.74it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.tribunadainternet.com.br/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.tribunadainternet.com.br/&amp;prev=search


Processing URLs:  16%|█▌        | 158/1000 [04:04<12:20,  1.14it/s]

Error extracting text from http://greece.greekreporter.com/2016/09/17/undeclared-labor-painting-a-black-picture-for-greece/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/09/17/undeclared-labor-painting-a-black-picture-for-greece/


Processing URLs:  16%|█▌        | 160/1000 [04:05<09:00,  1.55it/s]

Error extracting text from https://www.wsj.com/articles/bullish-oil-investors-may-be-getting-ahead-of-themselves-1487602859: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bullish-oil-investors-may-be-getting-ahead-of-themselves-1487602859


Processing URLs:  16%|█▋        | 163/1000 [04:06<07:02,  1.98it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/11/04/U-S-says-too-soon-for-Syrian-opposition-to-attend-talks-in-Russia.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/11/04/U-S-says-too-soon-for-Syrian-opposition-to-attend-talks-in-Russia.html
Error extracting text from https://www.reuters.com/article/us-usa-tech-liability/democrats-prefer-scalpel-over-jackhammer-to-reform-key-u-s-internet-law-idUSKBN27E1IA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tech-liability/democrats-prefer-scalpel-over-jackhammer-to-reform-key-u-s-internet-law-idUSKBN27E1IA


Processing URLs:  16%|█▋        | 165/1000 [04:09<11:32,  1.21it/s]

Error extracting text from https://www.nasdaq.com/articles/ethiopias-regional-tigray-forces-name-conditions-for-peace-with-government-2021-02-19: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/ethiopias-regional-tigray-forces-name-conditions-for-peace-with-government-2021-02-19


Processing URLs:  17%|█▋        | 168/1000 [04:12<14:24,  1.04s/it]

Error extracting text from http://www.superforecasting.com/: HTTPSConnectionPool(host='www.superforecasting.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  17%|█▋        | 169/1000 [04:15<21:54,  1.58s/it]

Error extracting text from http://pollyvote.com/en/: 404 Client Error: Not Found for url: https://www.pollyvote.com/en/


Processing URLs:  18%|█▊        | 177/1000 [04:33<33:54,  2.47s/it]

URL filtered: https://www.youtube.com/watch?v=Nk6bqwMfLU8


Processing URLs:  18%|█▊        | 180/1000 [04:36<23:16,  1.70s/it]

Error extracting text from http://www.international.gc.ca/isrop-prisi/index.aspx?lang=eng: 404 Client Error: Not Found for url: https://www.international.gc.ca/isrop-prisi/index.aspx?lang=eng


Processing URLs:  18%|█▊        | 182/1000 [04:38<16:22,  1.20s/it]

Error extracting text from http://www.novayagazeta.ru/politics/70001.html: 404 Client Error: Not Found for url: https://novayagazeta.ru/politics/70001.html
Error extracting text from https://www.straitstimes.com/business/companies-markets/bezos-to-step-down-as-amazon-ceo-and-transition-to-executive-chair-role: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  19%|█▉        | 188/1000 [04:42<11:07,  1.22it/s]

Error extracting text from https://theconversation.com/boycotting-the-next-olympics-in-beijing-will-hurt-athletes-heres-a-better-idea-165451: 403 Client Error: Forbidden for url: https://theconversation.com/boycotting-the-next-olympics-in-beijing-will-hurt-athletes-heres-a-better-idea-165451


Processing URLs:  19%|█▉        | 190/1000 [04:44<12:04,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-burundi-unrest-idUSKCN0W00H9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-unrest-idUSKCN0W00H9


Processing URLs:  20%|█▉        | 196/1000 [04:55<18:11,  1.36s/it]

Error extracting text from http://www.hybridcars.com/gm-ev-battery-cells-down-to-145kwh-and-still-falling/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/gm-ev-battery-cells-down-to-145kwh-and-still-falling/
URL filtered: http://bloomberg.econoday.com/byshoweventfull.asp?fid=467037&amp;cust=bloomberg-us&amp;year=2015&amp;lid=0&amp;prev=/byweek.asp#top


Processing URLs:  20%|██        | 200/1000 [04:58<10:01,  1.33it/s]

Error extracting text from http://opinion.inquirer.net/93802/peru-and-ph-and-dictators-offspring#ixzz436l4OfrX: 403 Client Error: Forbidden for url: https://opinion.inquirer.net/93802/peru-and-ph-and-dictators-offspring#ixzz436l4OfrX


Processing URLs:  20%|██        | 203/1000 [05:04<17:24,  1.31s/it]

Error extracting text from https://finance.yahoo.com/news/august-jobs-report-nonfarm-payrolls-labor-department-coronavirus-194347179.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/august-jobs-report-nonfarm-payrolls-labor-department-coronavirus-194347179.html


Processing URLs:  21%|██        | 206/1000 [05:06<11:50,  1.12it/s]

Error extracting text from http://news.yahoo.com/syrias-assad-stay-only-until-transition-council-saudi-182718758.html: 404 Client Error: Not Found for url: http://news.yahoo.com/syrias-assad-stay-only-until-transition-council-saudi-182718758.html


Processing URLs:  21%|██        | 207/1000 [05:08<17:55,  1.36s/it]

Error extracting text from https://www.nord-stream2.com/media-info/news-events/gas-filling-of-the-first-nord-stream-2-string-started-153/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /media-info/news-events/gas-filling-of-the-first-nord-stream-2-string-started-153/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3042e4350>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  21%|██        | 212/1000 [05:11<07:26,  1.77it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/04/13/national/politics-diplomacy/putin-hints-japan-visit-isle-row-lingers-looks-meet-abe-sochi-may/#.Vw4g08hXeEc: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/04/13/national/politics-diplomacy/putin-hints-japan-visit-isle-row-lingers-looks-meet-abe-sochi-may/#.Vw4g08hXeEc
Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-african-president-zumas-fate-to-be-decided-on-monday-says-anc-head-idUSKBN1FW028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-african-president-zumas-fate-to-be-decided-on-monday-says-anc-head-idUSKBN1FW028
Error extracting text from http://www.timesofisrael.com/zarif-iran-will-never-trust-the-united-states/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/zarif-iran-will-never-trust-the-united-states/


Processing URLs:  21%|██▏       | 214/1000 [06:15<3:28:34, 15.92s/it]

Error extracting text from http://www.icana.ir/En/: HTTPConnectionPool(host='www.icana.ir', port=80): Max retries exceeded with url: /En/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3042e7590>, 'Connection to www.icana.ir timed out. (connect timeout=60)'))


Processing URLs:  22%|██▏       | 221/1000 [06:26<33:52,  2.61s/it]  

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16G0J8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16G0J8


Processing URLs:  22%|██▏       | 223/1000 [06:29<24:28,  1.89s/it]

Error extracting text from http://www.reuters.com/article/us-safrica-politics-anc/south-africas-unruly-anc-branches-kick-off-race-to-succeed-zuma-idUSKBN1D14M3?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics-anc/south-africas-unruly-anc-branches-kick-off-race-to-succeed-zuma-idUSKBN1D14M3?il=0


Processing URLs:  22%|██▏       | 224/1000 [07:29<4:08:48, 19.24s/it]

Error extracting text from http://www.uavexpertnews.com/2017/04/faa-outlines-laanc-automated-drone-tool/: HTTPConnectionPool(host='www.uavexpertnews.com', port=80): Max retries exceeded with url: /2017/04/faa-outlines-laanc-automated-drone-tool/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe910650>, 'Connection to www.uavexpertnews.com timed out. (connect timeout=60)'))


Processing URLs:  23%|██▎       | 232/1000 [08:00<52:04,  4.07s/it]  

URL filtered: https://www.youtube.com/watch?v=cma_xwcI3Ak&amp;feature=youtu.be&amp;t=40s


Processing URLs:  23%|██▎       | 234/1000 [08:02<33:49,  2.65s/it]

Error extracting text from https://uk.reuters.com/article/uk-czech-usa-cybercrime/czech-high-court-says-alleged-russian-hacker-can-be-extradited-to-united-states-idUKKBN1DO1GI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  24%|██▎       | 237/1000 [08:05<20:50,  1.64s/it]

Error extracting text from https://www.congress.gov/bill/110th-congress/senate-bill/494/text/pl: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/110th-congress/senate-bill/494/text/pl


Processing URLs:  24%|██▍       | 245/1000 [08:20<19:37,  1.56s/it]

Error extracting text from https://www.chapters.indigo.ca/en-ca/home/economics/528279-cat.html: 404 Client Error: Not Found for url: https://www.indigo.ca/en-ca/home/economics/528279-cat.html


Processing URLs:  25%|██▍       | 248/1000 [08:41<1:18:07,  6.23s/it]

Error extracting text from https://www.recode.net/2017/7/12/15960704/alphabet-uber-lawsuit-settlement-waymo-self-driving-tech: Exceeded 30 redirects.


Processing URLs:  25%|██▌       | 251/1000 [08:45<33:49,  2.71s/it]  

Error extracting text from https://www.amazon.com/Kill-Switch-Crippling-American-Democracy/dp/1631497774/ref=sr_1_1?dchild=1&amp;keywords=filibuster+book&amp;qid=1614991555&amp;s=books&amp;sr=1-1: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Kill-Switch-Crippling-American-Democracy/dp/1631497774/ref=sr_1_1?dchild=1&amp;keywords=filibuster+book&amp;qid=1614991555&amp;s=books&amp;sr=1-1
Error extracting text from http://www.reuters.com/article/us-global-economy-oecd-idUSKCN0YN46O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-economy-oecd-idUSKCN0YN46O


Processing URLs:  25%|██▌       | 253/1000 [08:47<22:37,  1.82s/it]

Error extracting text from http://www.proatom.ru/modules.php?name=News&amp;file=article&amp;sid=3672&amp;mode=flat&amp;order=1&amp;thold=0: 403 Client Error: Forbidden for url: http://www.proatom.ru/modules.php?name=News&amp;file=article&amp;sid=3672&amp;mode=flat&amp;order=1&amp;thold=0


Processing URLs:  25%|██▌       | 254/1000 [08:47<16:57,  1.36s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2015/12/islamic-state-atrocities-fuelled-by-decades-of-reckless-arms-trading/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2015/12/islamic-state-atrocities-fuelled-by-decades-of-reckless-arms-trading/


Processing URLs:  26%|██▌       | 256/1000 [08:49<14:18,  1.15s/it]

Error extracting text from http://www.eveningtribune.com/article/20160316/NEWS/160319796: 404 Client Error: Not Found for url: https://www.eveningtribune.com/obituaries/story-obituaries-2016-03-16-gerald-joseph-auger-1930-2016-32389492007
Error extracting text from http://www.realclearpolitics.com/articles/2017/01/25/trump_pushes_for_rapid_action_on_obamacare_132889.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2017/01/25/trump_pushes_for_rapid_action_on_obamacare_132889.html


Processing URLs:  26%|██▋       | 263/1000 [09:12<37:26,  3.05s/it]

URL filtered: https://twitter.com/NicolasMaduro/status/928486948364316672


Processing URLs:  26%|██▋       | 265/1000 [09:13<24:31,  2.00s/it]

Error extracting text from https://www.br.de/nachrichten/schwaben/inhalt/kkw-gundremmingen-schadsoftware-akw-100.html: 404 Client Error: Not Found for url: https://www.br.de/nachricht/schwaben/inhalt/kkw-gundremmingen-schadsoftware-akw-100.html


Processing URLs:  27%|██▋       | 268/1000 [09:18<18:00,  1.48s/it]

Error extracting text from http://www.nytimes.com/2016/01/14/opinion/iraq-and-the-kurds-are-going-broke.html?partner=rss&amp;emc=rss&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/opinion/iraq-and-the-kurds-are-going-broke.html?partner=rss&amp;emc=rss&amp;_r=0


Processing URLs:  27%|██▋       | 269/1000 [09:18<14:48,  1.22s/it]

Error extracting text from http://thehill.com/homenews/campaign/265839-chelsea-goes-on-the-attack-dems-ask-why: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/265839-chelsea-goes-on-the-attack-dems-ask-why/


Processing URLs:  27%|██▋       | 270/1000 [09:19<12:20,  1.01s/it]

Error extracting text from http://www.nextev.com/news: HTTPConnectionPool(host='www.nextev.com', port=80): Max retries exceeded with url: /news (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x304fcc6e0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  27%|██▋       | 271/1000 [09:20<13:28,  1.11s/it]

Error extracting text from https://www.fcc.gov/document/fcc-proposes-ending-utility-style-regulation-internet/pai-statement: 403 Client Error: Forbidden for url: https://www.fcc.gov/document/fcc-proposes-ending-utility-style-regulation-internet/pai-statement


Processing URLs:  27%|██▋       | 272/1000 [09:20<10:35,  1.15it/s]

Error extracting text from http://asiafoundation.org/in-asia/2016/04/06/afghanistans-electoral-reform-a-distant-reality/: 403 Client Error: Forbidden for url: http://asiafoundation.org/in-asia/2016/04/06/afghanistans-electoral-reform-a-distant-reality/


Processing URLs:  27%|██▋       | 274/1000 [09:23<12:37,  1.04s/it]

Error extracting text from http://www.mrt.com/business/oil/top_stories/article_16507234-79d8-11e5-a2c1-7f9051699030.html#ixzz3qreD8y8p: 403 Client Error: Forbidden for url: https://www.mrt.com/business/oil/top_stories/article_16507234-79d8-11e5-a2c1-7f9051699030.html#ixzz3qreD8y8p


Processing URLs:  28%|██▊       | 275/1000 [09:25<18:13,  1.51s/it]

Error extracting text from https://getcruise.com/careers/location/nyc: 404 Client Error: Not Found for url: https://getcruise.com/careers/location/nyc


Processing URLs:  28%|██▊       | 276/1000 [09:26<14:18,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0XJ268: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0XJ268


Processing URLs:  28%|██▊       | 279/1000 [09:28<10:05,  1.19it/s]

Error extracting text from http://www.nytimes.com/2016/09/15/technology/how-did-gm-create-teslas-dream-car-first.html?emc=edit_th_20160915&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/15/technology/how-did-gm-create-teslas-dream-car-first.html?emc=edit_th_20160915&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  28%|██▊       | 281/1000 [09:31<15:05,  1.26s/it]

URL filtered: https://www.youtube.com/watch?v=L8EtfwcKX5U


Processing URLs:  28%|██▊       | 285/1000 [09:34<10:29,  1.14it/s]

Error extracting text from http://www.wsj.com/articles/u-s-hunts-for-russian-equipment-stolen-by-islamic-state-in-syria-1481927186: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-hunts-for-russian-equipment-stolen-by-islamic-state-in-syria-1481927186


Processing URLs:  29%|██▊       | 286/1000 [09:35<11:27,  1.04it/s]

Error extracting text from http://www.reuters.com/article/2015/11/27/panama-canal-idUSL1N13M01V20151127?feedType=RSS&amp;feedName=utilitiesSector: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/27/panama-canal-idUSL1N13M01V20151127?feedType=RSS&amp;feedName=utilitiesSector


Processing URLs:  29%|██▉       | 290/1000 [09:41<16:39,  1.41s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-19/homebuilder-confidence-reaches-decade-high-on-u-s-sales-outlook


Processing URLs:  29%|██▉       | 294/1000 [10:01<1:00:24,  5.13s/it]

Error extracting text from http://www.agriculture.com/news/technology/superforecasting-for-the-farm: 406 Client Error: Not Acceptable for url: https://www.agriculture.com/news/technology/superforecasting-for-the-farm


Processing URLs:  30%|██▉       | 297/1000 [10:04<29:21,  2.51s/it]  

Error extracting text from https://www.scientificamerican.com/article/how-chinas-bat-woman-hunted-down-viruses-from-sars-to-the-new-coronavirus1/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/how-chinas-bat-woman-hunted-down-viruses-from-sars-to-the-new-coronavirus1/


Processing URLs:  30%|███       | 301/1000 [10:09<18:44,  1.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-01/venezuelan-credit-dashboard-726-million-comes-due-in-august#media-1


Processing URLs:  30%|███       | 305/1000 [10:17<22:25,  1.94s/it]

URL filtered: https://www.youtube.com/watch?v=l7c5f0oPTLI


Processing URLs:  31%|███       | 310/1000 [10:20<09:49,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-philippines-security-idUSKBN15B0XO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-security-idUSKBN15B0XO


Processing URLs:  31%|███       | 311/1000 [10:20<07:52,  1.46it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/other/congressional_job_approval-903.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/other/congressional_job_approval-903.html


Processing URLs:  31%|███▏      | 314/1000 [10:24<11:18,  1.01it/s]

Error extracting text from http://www.humanosphere.org/world-politics/2017/04/japan-begins-pull-troops-south-sudan-peacekeeping-mission/: 404 Client Error: Not Found for url: http://www.humanosphere.org/world-politics/2017/04/japan-begins-pull-troops-south-sudan-peacekeeping-mission/


Processing URLs:  32%|███▏      | 317/1000 [10:31<16:29,  1.45s/it]

Error extracting text from https://gulfnews.com/expo-2020/pavilions/italy-pavilion-seen-so-far-by-20-per-cent-of-total-expo-2020-dubai-visitors-1.1634641276347): 404 Client Error: Not Found for url: https://gulfnews.com/expo-2020/pavilions/italy-pavilion-seen-so-far-by-20-per-cent-of-total-expo-2020-dubai-visitors-1.1634641276347)


Processing URLs:  32%|███▏      | 322/1000 [10:46<29:34,  2.62s/it]

Error extracting text from http://www.presstv.com/Detail/2016/09/05/483243/Yemen-Saudi-Arabia-cluster-bombs: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2016/09/05/483243/Yemen-Saudi-Arabia-cluster-bombs


Processing URLs:  32%|███▏      | 324/1000 [10:51<30:13,  2.68s/it]

Error extracting text from http://in.reuters.com/article/turkey-referendum-eu-idINKBN16W0RF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  33%|███▎      | 326/1000 [10:54<21:19,  1.90s/it]

Error extracting text from http://www.globaltimes.cn/content/1033687.shtml: 404 Client Error: Not Found for url: https://www.globaltimes.cn/content/1033687.shtml


Processing URLs:  33%|███▎      | 329/1000 [11:08<48:53,  4.37s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/putin-urges-us-business-to-help-normalize-russia-us-ties/2017/06/02/32e80a74-478b-11e7-8de1-cec59a9bf4b1_story.html?utm_term=.0c995bbcaff6: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/putin-urges-us-business-to-help-normalize-russia-us-ties/2017/06/02/32e80a74-478b-11e7-8de1-cec59a9bf4b1_story.html?utm_term=.0c995bbcaff6


Processing URLs:  33%|███▎      | 330/1000 [11:09<36:32,  3.27s/it]

Error extracting text from https://abcnews.go.com/Politics/wireStory/schumer-sets-june-vote-elections-overhaul-bill-77968902: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/schumer-sets-june-vote-elections-overhaul-bill-77968902


Processing URLs:  33%|███▎      | 331/1000 [11:11<32:02,  2.87s/it]

Error extracting text from http://in.rbth.com/news/2015/12/15/iran-to-ship-85-tonnes-of-uranium-materials-to-russia-before-yearend_550995: HTTPConnectionPool(host='in.rbth.com', port=80): Max retries exceeded with url: /news/2015/12/15/iran-to-ship-85-tonnes-of-uranium-materials-to-russia-before-yearend_550995 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303884a70>: Failed to resolve 'in.rbth.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  33%|███▎      | 332/1000 [11:12<26:04,  2.34s/it]

Error extracting text from http://www.business-standard.com/article/news-ians/brazilian-former-minister-bernardo-silva-arrested-on-corruption-charges-116062400083_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/brazilian-former-minister-bernardo-silva-arrested-on-corruption-charges-116062400083_1.html


Processing URLs:  33%|███▎      | 333/1000 [11:42<1:57:34, 10.58s/it]

Error extracting text from https://ir.usembassy.gov/protecting-nation-foreign-terrorist-entry-united-states/: 404 Client Error: Not Found for url: https://ir.usembassy.gov/protecting-nation-foreign-terrorist-entry-united-states/


Processing URLs:  33%|███▎      | 334/1000 [11:43<1:24:41,  7.63s/it]

Error extracting text from http://leginfo.ca.gov/pub/15-16/bill/asm/ab_1551-1600/ab_1592_bill_20160823_status.html: 404 Client Error: Not found for url: http://leginfo.ca.gov/pub/15-16/bill/asm/ab_1551-1600/ab_1592_bill_20160823_status.html


Processing URLs:  34%|███▎      | 336/1000 [11:47<51:30,  4.65s/it]  

Error extracting text from http://en.trend.az/business/energy/2465388.html: 404 Client Error: Not Found for url: https://www.trend.az/business/energy/2465388.html
Error extracting text from http://www.nytimes.com/2016/10/08/us/politics/isis-mosul-iraq-us.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/08/us/politics/isis-mosul-iraq-us.html


Processing URLs:  34%|███▍      | 340/1000 [13:02<4:02:07, 22.01s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-03-03/venezuelas-guaido-calls-for-opposition-input-into-new-electoral-body: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  34%|███▍      | 343/1000 [13:07<1:31:45,  8.38s/it]

Error extracting text from http://www.autoevolution.com/news/ev-branching-puts-financial-strain-on-mercedes-benz-ceo-plans-budget-cuts-112097.html: 403 Client Error: Forbidden for url: https://www.autoevolution.com/news/ev-branching-puts-financial-strain-on-mercedes-benz-ceo-plans-budget-cuts-112097.html


Processing URLs:  35%|███▌      | 351/1000 [13:21<28:36,  2.64s/it]  

Error extracting text from http://www.boxofficemojo.com/movies/?page=weekend&amp;id=avengers2.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=weekend&amp;id=avengers2.htm


Processing URLs:  35%|███▌      | 353/1000 [13:25<24:44,  2.29s/it]

Error extracting text from http://carnegie-mec.org/diwan/68649: 403 Client Error: Forbidden for url: http://carnegie-mec.org/diwan/68649


Processing URLs:  36%|███▌      | 355/1000 [13:26<14:36,  1.36s/it]

Error extracting text from https://pythagorassite.files.wordpress.com/2016/04/first_forecast_for_the_brexit_referendum___elections_etc.png?w=978: 404 Client Error: Not Found for url: https://pythagorassite.files.wordpress.com/2016/04/first_forecast_for_the_brexit_referendum___elections_etc.png?w=978


Processing URLs:  36%|███▌      | 356/1000 [13:26<11:59,  1.12s/it]

Error extracting text from https://theconversation.com/whos-running-haiti-after-presidents-assassination-5-questions-answered-164287: 403 Client Error: Forbidden for url: https://theconversation.com/whos-running-haiti-after-presidents-assassination-5-questions-answered-164287


Processing URLs:  36%|███▌      | 357/1000 [13:27<10:07,  1.06it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/249983-trump-builds-his-political-machine: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/249983-trump-builds-his-political-machine/


Processing URLs:  36%|███▌      | 359/1000 [13:28<08:33,  1.25it/s]

Error extracting text from http://www.nytimes.com/aponline/2016/01/30/world/middleeast/ap-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/01/30/world/middleeast/ap-syria.html


Processing URLs:  36%|███▌      | 360/1000 [13:29<08:07,  1.31it/s]

URL filtered: http://www.bloomberg.com/news/articles/2014-12-17/china-said-to-plan-sweeping-shift-from-foreign-technology-to-own


Processing URLs:  36%|███▋      | 364/1000 [13:33<08:51,  1.20it/s]

Error extracting text from https://www.rand.org/blog/2021/03/russian-mercenaries-in-great-power-competition-strategic.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2021/03/russian-mercenaries-in-great-power-competition-strategic.html


Processing URLs:  38%|███▊      | 375/1000 [13:53<17:04,  1.64s/it]

Error extracting text from http://colombiapeace.org/: 403 Client Error: Forbidden for url: http://colombiapeace.org/


Processing URLs:  38%|███▊      | 376/1000 [13:53<12:52,  1.24s/it]

Error extracting text from http://www.nytimes.com/aponline/2015/11/16/world/middleeast/ap-un-united-nations-syria.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/11/16/world/middleeast/ap-un-united-nations-syria.html?_r=0


Processing URLs:  38%|███▊      | 377/1000 [13:54<12:10,  1.17s/it]

Error extracting text from http://tinyurl.com/jmx4po7: 400 Client Error: Bad Request for url: https://email.gjopen.com/c/eJwVj0tuhDAQRE8Du0Y2_uEFi2xyjai7aYJHYE_4hMnt45FqUXoqqaqmUUJ02KbR-cHqOUQd_fBlzMyRe6sCO6utM2Qaq9gZ3yNNYJAD2MEqQHYaaDYknjyL-G7DtLbLyBORYkUUKQxRC2G0lczKSuDAsV3H5Tyfjflo-s-q-76770d5Su64bBX8XHKcqeSj-sHDndYVeEkZgUueLj4h4y-uIC_ZOR1y1Mi5AOY_wEMwwyYbyQ4kc9kFNDyuLNAr7dt9lP2VN5kS1ldLOd-b37X_dJpWDw


Processing URLs:  38%|███▊      | 380/1000 [13:57<10:12,  1.01it/s]

Error extracting text from http://www.wsj.com/articles/prospect-of-ouster-grows-for-brazil-president-dilma-rousseff-1457393804: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/prospect-of-ouster-grows-for-brazil-president-dilma-rousseff-1457393804


Processing URLs:  38%|███▊      | 383/1000 [14:00<09:26,  1.09it/s]

Error extracting text from http://www.justice.gov/enrd/mobile-sources: 403 Client Error: Forbidden for url: https://www.justice.gov/enrd/mobile-sources
Error extracting text from http://www.nytimes.com/2016/04/14/world/asia/party-of-south-koreas-president-loses-majority-in-parliament.html?emc=edit_th_20160414&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/14/world/asia/party-of-south-koreas-president-loses-majority-in-parliament.html?emc=edit_th_20160414&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  38%|███▊      | 384/1000 [14:01<09:17,  1.11it/s]

Error extracting text from http://uk.reuters.com/article/us-china-autos-green-idUKKCN12A1WY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  39%|███▉      | 389/1000 [14:12<17:22,  1.71s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missile-idUSKCN0V52TZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-idUSKCN0V52TZ


Processing URLs:  40%|███▉      | 396/1000 [14:30<15:07,  1.50s/it]

Error extracting text from http://bit.ly/2zJZPmZ: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2017/senate/al/alabama_senate_special_election_moore_vs_jones-6271.html
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.jb.com.br/pais/noticias/2016/02/21/dilma-pede-no-stf-manutencao-de-decisao-sobre-rito-de-impeachment/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.jb.com.br/pais/noticias/2016/02/21/dilma-pede-no-stf-manutencao-de-decisao-sobre-rito-de-impeachment/&amp;prev=search


Processing URLs:  40%|███▉      | 397/1000 [14:30<11:47,  1.17s/it]

Error extracting text from http://www.bls.gov/opub/ted/2015/employment-up-211000-from-october-2015-to-november-2015.htm: 403 Client Error: Forbidden for url: http://www.bls.gov/opub/ted/2015/employment-up-211000-from-october-2015-to-november-2015.htm


Processing URLs:  40%|████      | 400/1000 [14:35<16:44,  1.67s/it]

Error extracting text from http://dx.doi.org/10.1051/shsconf/20162801048: 403 Client Error: Forbidden for url: https://www.shs-conferences.org/10.1051/shsconf/20162801048
URL filtered: https://www.bloombergquint.com/business/merck-signs-1-2-billion-u-s-supply-pact-for-covid-treatment
Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/march1GOP.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/march1GOP.html


Processing URLs:  40%|████      | 405/1000 [14:42<14:10,  1.43s/it]

Error extracting text from https://ca.investing.com/news/commodities-news/crude-oil-prices-rise-as-imf-raises-growth-outlook2341128: 404 Client Error: Not Found for url: https://ca.investing.com/news/commodities-news/crude-oil-prices-rise-as-imf-raises-growth-outlook%022341128


Processing URLs:  41%|████      | 408/1000 [14:47<12:43,  1.29s/it]

Error extracting text from https://www.csoonline.com/article/3236721/security/homeland-security-team-remotely-hacked-a-boeing-757.html: 404 Client Error: Not Found for url: https://www.csoonline.com/article/3236721/security/homeland-security-team-remotely-hacked-a-boeing-757.html
Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN15Y0FS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN15Y0FS


Processing URLs:  41%|████      | 409/1000 [14:49<16:29,  1.67s/it]

Error extracting text from http://sos.nh.gov/VotePartyPrimFAQ.aspx: 404 Client Error: Not Found for url: https://sos.nh.gov/VotePartyPrimFAQ.aspx
URL filtered: https://www.youtube.com/watch?v=OsxnqOv06xE


Processing URLs:  41%|████      | 412/1000 [14:51<10:57,  1.12s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC213968/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC213968/


Processing URLs:  42%|████▏     | 416/1000 [15:00<21:29,  2.21s/it]

Error extracting text from http://theweek.com/articles/635515/cia-team-clairvoyants: 404 Client Error: Not Found for url: https://theweek.com/articles/635515/cia-team-clairvoyants


Processing URLs:  42%|████▏     | 417/1000 [15:02<21:09,  2.18s/it]

Error extracting text from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.488.9725&amp;rep=rep1&amp;type=pdf: 401 Client Error: Unauthorized for url: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.488.9725&amp;rep=rep1&amp;type=pdf


Processing URLs:  42%|████▏     | 420/1000 [15:07<17:40,  1.83s/it]

Error extracting text from http://www.columbiatribune.com/news/poor-visibility-further-slows-mosul-advance/article_0b0711c5-9f0e-5f80-a392-e20794f3ce63.html: 404 Client Error: OK for url: https://www.columbiatribune.com/news/poor-visibility-further-slows-mosul-advance/article_0b0711c5-9f0e-5f80-a392-e20794f3ce63.html/


Processing URLs:  43%|████▎     | 426/1000 [15:26<25:31,  2.67s/it]

Error extracting text from https://www.reuters.com/article/us-usa-congress-budget-idUSKBN1A30RD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-budget-idUSKBN1A30RD


Processing URLs:  43%|████▎     | 433/1000 [15:35<11:04,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-kidnapping-idUSKCN0UV09X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-kidnapping-idUSKCN0UV09X


Processing URLs:  44%|████▎     | 437/1000 [15:44<16:55,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0VW0TY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0VW0TY


Processing URLs:  44%|████▍     | 438/1000 [15:46<17:42,  1.89s/it]

URL filtered: http://rue89.nouvelobs.com/2015/05/17/liran-teste-systeme-censure-intelligente-instagram-259223


Processing URLs:  44%|████▍     | 443/1000 [15:49<07:49,  1.19it/s]

Error extracting text from http://www.nytimes.com/1977/06/30/archives/senate-vote-forbids-using-federal-funds-for-most-abortions-high.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1977/06/30/archives/senate-vote-forbids-using-federal-funds-for-most-abortions-high.html


Processing URLs:  45%|████▍     | 446/1000 [15:53<10:27,  1.13s/it]

Error extracting text from http://uk.reuters.com/article/uk-turkey-referendum-idUKKBN1733PN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  45%|████▍     | 448/1000 [15:57<13:56,  1.52s/it]

Error extracting text from https://www.fire.ca.gov/media/5511/top20_destruction.pdf: 403 Client Error: Forbidden for url: https://www.fire.ca.gov/media/5511/top20_destruction.pdf


Processing URLs:  45%|████▍     | 449/1000 [15:58<11:20,  1.23s/it]

Error extracting text from http://goodjudgment.com/gjp/index.php/2013/12/31/reflections-on-season-3-thus-far-part-1/: 403 Client Error: Forbidden for url: http://goodjudgment.com/gjp/index.php/2013/12/31/reflections-on-season-3-thus-far-part-1/


Processing URLs:  45%|████▌     | 452/1000 [16:02<12:34,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=nAtcHwMHdPY


Processing URLs:  46%|████▌     | 457/1000 [16:09<12:57,  1.43s/it]

Error extracting text from http://www.abelprize.no/nyheter/vis.html?tid=46013: 404 Client Error: Not Found for url: https://abelprize.no/nyheter/vis.html?tid=46013
Error extracting text from http://www.reuters.com/article/2015/09/18/us-markets-oil-idUSKCN0RH02A20150918: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/18/us-markets-oil-idUSKCN0RH02A20150918


Processing URLs:  46%|████▌     | 460/1000 [16:12<08:01,  1.12it/s]

Error extracting text from http://www.nytimes.com/2016/04/02/us/politics/gop-fears-donald-trump-as-zombie-candidate-damaged-but-unstoppable.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/02/us/politics/gop-fears-donald-trump-as-zombie-candidate-damaged-but-unstoppable.html


Processing URLs:  46%|████▌     | 462/1000 [16:13<08:08,  1.10it/s]

Error extracting text from http://flip.it/to3eK: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html


Processing URLs:  46%|████▋     | 463/1000 [16:14<06:25,  1.39it/s]

Error extracting text from http://www.nationmultimedia.com/business/RCEP-meeting-this-month-to-reveal-extent-of-progre-30283691.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/business/RCEP-meeting-this-month-to-reveal-extent-of-progre-30283691.html


Processing URLs:  46%|████▋     | 465/1000 [16:14<04:29,  1.99it/s]

Error extracting text from http://www.wsj.com/articles/battered-emerging-markets-race-to-stem-outflows-1453410496: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/battered-emerging-markets-race-to-stem-outflows-1453410496
Error extracting text from http://www.wsj.com/articles/feds-yellen-expresses-confidence-in-u-s-economy-ahead-of-december-meeting-1449077125: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-yellen-expresses-confidence-in-u-s-economy-ahead-of-december-meeting-1449077125


Processing URLs:  47%|████▋     | 467/1000 [16:16<06:40,  1.33it/s]

Error extracting text from http://www.barchart.com/opinions/futures/CBK16: 403 Client Error: Forbidden for url: https://www.barchart.com/opinions/futures/CBK16


Processing URLs:  47%|████▋     | 468/1000 [16:17<05:15,  1.68it/s]

Error extracting text from https://www.nytimes.com/2021/06/16/world/middleeast/israel-hamas-gaza-cease-fire.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/16/world/middleeast/israel-hamas-gaza-cease-fire.html


Processing URLs:  47%|████▋     | 469/1000 [16:19<10:31,  1.19s/it]

Error extracting text from http://www.kdmid.ru/docs.aspx?lst=country_wiki&amp;it=/%D0%A1%D0%BE%D0%B3%D0%BB%D0%B0%D1%88%D0%B5%D0%BD%D0%B8%D0%B5%20%D0%BC%D0%B5%D0%B6%D0%B4%D1%83%20%D0%9F%D1%80%D0%B0%D0%B2%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%BE%D0%BC%20%D0%A0%D0%A4%20%D0%B8%20%D0%9F%D1%80%D0%B0%D0%B2%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D1%81%D1%82%D0%B2%D0%BE%D0%BC%20%D0%A2%D1%83%D1%80%D0%B5%D1%86%D0%BA%D0%BE%D0%B9%20%D0%A0%D0%B5%D1%81%D0%BF%D1%83%D0%B1%D0%BB%D0%B8%D0%BA%D0%B8%20%D0%BE%20%D0%B1%D0%B5%D0%B7%D0%B2%D0%B8%D0%B7%D0%BE%D0%B2%D1%8B%D1%85%20%D0%BF%D0%BE%D0%B5%D0%B7%D0%B4%D0%BA%D0%B0%D1%85%20%D0%BF%D0%BE%20%D0%B4%D0%B8%D0%BF%D0%BB%D0%BE%D0%BC%D0%B0%D1%82%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B8%D0%BC%20%D0%BF%D0%B0%D1%81%D0%BF%D0%BE%D1%80%D1%82%D0%B0%D0%BC.aspx: 404 Client Error: Not Found for url: https://www.kdmid.ru/docs.aspx?lst=country_wiki&amp;it=/%D0%A1%D0%BE%D0%B3%D0%BB%D0%B0%D1%88%D0%B5%D0%BD%D0%B8%D0%B5%20%D0%BC%D0%B5%D0%B6%D0%B4%D1%83%20%D0%9F%D1%80%D0%B0%D0%B2%D0%B

Processing URLs:  48%|████▊     | 476/1000 [16:35<14:48,  1.70s/it]

Error extracting text from http://www.dailymotion.com/video/x61lq_aretha-franklin-jumpin-jack-flash_news: 404 Client Error: Not Found for url: https://www.dailymotion.com/video/xyqjuo
Error extracting text from http://www.reuters.com/article/2015/11/11/us-southchinasea-usa-passage-idUSKCN0T02DQ20151111: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/11/us-southchinasea-usa-passage-idUSKCN0T02DQ20151111


Processing URLs:  48%|████▊     | 478/1000 [16:38<13:49,  1.59s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-communications-idUSKCN0XV0XF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-communications-idUSKCN0XV0XF


Processing URLs:  48%|████▊     | 479/1000 [16:39<12:35,  1.45s/it]

URL filtered: https://www.youtube.com/watch?v=EeB_tmdFAdc


Processing URLs:  48%|████▊     | 483/1000 [16:41<07:53,  1.09it/s]

Error extracting text from https://www.newsweek.com/venezuela-defend-start-dialogue-trump-biden-wins-1544542: 403 Client Error: Forbidden for url: https://www.newsweek.com/venezuela-defend-start-dialogue-trump-biden-wins-1544542


Processing URLs:  49%|████▊     | 487/1000 [16:46<09:39,  1.13s/it]

URL filtered: https://www.youtube.com/watch?v=mOD0XCm57d8


Processing URLs:  49%|████▉     | 493/1000 [16:52<08:53,  1.05s/it]

Error extracting text from http://aranews.net/2016/05/us-led-coalition-bombs-isis-financial-center-clashes-intensify-near-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/05/us-led-coalition-bombs-isis-financial-center-clashes-intensify-near-mosul/


Processing URLs:  50%|████▉     | 497/1000 [17:02<14:46,  1.76s/it]

Error extracting text from https://uk.finance.yahoo.com/news/u-default-date-estimate-congress-060001725.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAIeYAAqeImCovUw2m_GdFJL6JF8yRuoxpU-38u6Kt8AwMCkLMYtqzJptao-Pefr0FGuqmkG4-AVOfkyId0WphavybbaFJL7iaDHmGnj98F6HRfhi6GPuBBnjLBsQy50nwnqpHnYwu3q-CK7Vj0BIpPqrdy7t2Ygbn9cEWTwQQjP4: 404 Client Error: Not Found for url: https://uk.finance.yahoo.com/news/u-default-date-estimate-congress-060001725.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAAIeYAAqeImCovUw2m_GdFJL6JF8yRuoxpU-38u6Kt8AwMCkLMYtqzJptao-Pefr0FGuqmkG4-AVOfkyId0WphavybbaFJL7iaDHmGnj98F6HRfhi6GPuBBnjLBsQy50nwnqpHnYwu3q-CK7Vj0BIpPqrdy7t2Ygbn9cEWTwQQjP4
URL filtered: https://www.youtube.com/watch?v=mDYNuD4CwlI


Processing URLs:  50%|█████     | 504/1000 [17:31<20:07,  2.43s/it]

Error extracting text from http://blogs.wsj.com/cfo/2015/10/02/the-morning-ledger-janet-yellen-ally-sees-interest-rate-move-this-year/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/cfo/2015/10/02/the-morning-ledger-janet-yellen-ally-sees-interest-rate-move-this-year/
URL filtered: http://www.businessinsider.com/facebook-russia-election-2017-9
URL filtered: https://www.bloomberg.com/news/articles/2018-11-14/poor-succession-planning-causes-high-ceo-turnover-at-u-s-firms
Error extracting text from http://www.portman.senate.gov/public/index.cfm/press-releases?ID=D6E4A1CE-9C6B-4664-9758-7DEDB895142D: HTTPConnectionPool(host='www.portman.senate.gov', port=80): Max retries exceeded with url: /public/index.cfm/press-releases?ID=D6E4A1CE-9C6B-4664-9758-7DEDB895142D (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303887410>: Failed to resolve 'www.portman.senate.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  51%|█████     | 507/1000 [17:48<29:15,  3.56s/it]

Error extracting text from https://www.aa.com.tr/en/politics/russia-to-take-part-in-joint-military-drill-with-nato/2072488: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  51%|█████     | 510/1000 [17:50<18:01,  2.21s/it]

Error extracting text from http://www.politico.com/story/2015/10/export-import-bank-revive-republican-support-214339&gt: 404 Client Error: Not Found for url: https://www.politico.com/story/2015/10/export-import-bank-revive-republican-support-214339&gt


Processing URLs:  51%|█████▏    | 514/1000 [17:54<09:49,  1.21s/it]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-awaits-nato-top-officials-with-high-hopes-10-09-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-awaits-nato-top-officials-with-high-hopes-10-09-2015


Processing URLs:  52%|█████▏    | 517/1000 [17:56<06:51,  1.17it/s]

Error extracting text from https://www.sec.gov/Archives/edgar/vprr/0804/08046908.pdf: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/vprr/0804/08046908.pdf
Error extracting text from https://www.reuters.com/article/us-russia-venezuela-debt-idUSKBN18Y007: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-venezuela-debt-idUSKBN18Y007


Processing URLs:  52%|█████▏    | 520/1000 [19:00<2:23:01, 17.88s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2015/09/15/white-house-opposes-gop-bill-to-lift-oil-export-ban: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  52%|█████▏    | 524/1000 [19:05<43:04,  5.43s/it]  

Error extracting text from https://www.heritage.org/asia/commentary/north-korea-getting-ready-test-new-icbm-and-more-nuclear-weapons;: 404 Client Error: Not Found for url: https://www.heritage.org/asia/commentary/north-korea-getting-ready-test-new-icbm-and-more-nuclear-weapons;


Processing URLs:  53%|█████▎    | 527/1000 [19:10<21:18,  2.70s/it]

Error extracting text from https://www.reuters.com/article/us-russia-usa-security-kremlin/russia-us-extend-new-start-arms-pact-kremlin-says-idUSKBN29V10N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-security-kremlin/russia-us-extend-new-start-arms-pact-kremlin-says-idUSKBN29V10N


Processing URLs:  53%|█████▎    | 530/1000 [19:11<10:33,  1.35s/it]

Error extracting text from http://www.lejdd.fr/International/Moyen-Orient/Daech-en-Syrie-56-des-Francais-sont-pour-la-guerre-terrestre-750715: 404 Client Error: Not Found for url: https://www.lejdd.fr/International/Moyen-Orient/Daech-en-Syrie-56-des-Francais-sont-pour-la-guerre-terrestre-750715
Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN1860F8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN1860F8


Processing URLs:  53%|█████▎    | 531/1000 [19:27<39:22,  5.04s/it]

Error extracting text from http://www.huanqiumil.com/a/54635.html: HTTPConnectionPool(host='www.huanqiumil.com', port=80): Max retries exceeded with url: /a/54635.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30175f410>: Failed to resolve 'www.huanqiumil.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  54%|█████▍    | 538/1000 [19:45<25:12,  3.27s/it]

Error extracting text from http://www.newsfultoncounty.com/politics/news/1029360-farc-open-to-extending-peace-talks-in-colombia: 403 Client Error: Forbidden for url: https://www.newsfultoncounty.com/politics/news/1029360-farc-open-to-extending-peace-talks-in-colombia


Processing URLs:  54%|█████▍    | 543/1000 [19:48<09:16,  1.22s/it]

Error extracting text from http://news.sciencemag.org/health/2015/09/polio-resurfaces-mali-and-ukraine: 403 Client Error: Forbidden for url: https://www.science.org/content/article/polio-resurfaces-mali-and-ukraine


Processing URLs:  55%|█████▍    | 548/1000 [19:55<11:05,  1.47s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/illinois-gop-governor-urges-unity-ahead-divisive-election-52751711: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/illinois-gop-governor-urges-unity-ahead-divisive-election-52751711


Processing URLs:  55%|█████▌    | 550/1000 [19:56<07:10,  1.05it/s]

Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=11141: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=11141


Processing URLs:  55%|█████▌    | 554/1000 [20:02<08:09,  1.10s/it]

Error extracting text from https://www.wsj.com/graphics/the-threat-from-north-koreas-missiles/nuclear-armed: 403 Client Error: Forbidden for url: https://www.wsj.com/graphics/the-threat-from-north-koreas-missiles/nuclear-armed


Processing URLs:  56%|█████▌    | 555/1000 [20:04<10:32,  1.42s/it]

Error extracting text from http://www.marinecorpstimes.com/story/military/2016/07/31/retaking-iraqs-isis-held-mosul-likely-prove-tricky-costly/87883640/: 404 Client Error: Not Found for url: https://www.marinecorpstimes.com/story/military/2016/07/31/retaking-iraqs-isis-held-mosul-likely-prove-tricky-costly/87883640/


Processing URLs:  56%|█████▌    | 557/1000 [20:05<06:46,  1.09it/s]

Error extracting text from http://www.reuters.com/article/2015/10/02/japan-economy-spending-idUSL3N12124I20151002: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/02/japan-economy-spending-idUSL3N12124I20151002


Processing URLs:  56%|█████▌    | 558/1000 [20:08<11:01,  1.50s/it]

Error extracting text from http://www.trtworld.com/asia/afghanistan-announces-new-spy-chief-101000: 404 Client Error: Not Found for url: https://www.trtworld.com:443/asia/afghanistan-announces-new-spy-chief-101000
URL filtered: https://www.cbsnews.com/news/twitter-trolls-russia-donald-trump-bad-news-ap-investigation/


Processing URLs:  56%|█████▌    | 560/1000 [20:09<08:14,  1.12s/it]

Error extracting text from http://press.ihs.com/press-release/aerospace-defense-security/islamic-state-caliphate-shrinks-16-percent-2016-ihs-markit-: 403 Client Error: Forbidden for url: https://investor.spglobal.com/news-releases/default.aspx


Processing URLs:  56%|█████▌    | 562/1000 [20:10<06:48,  1.07it/s]

Error extracting text from http://www.mb.com.ph/china-slams-g7-statement-on-maritime-disputes/: 403 Client Error: Forbidden for url: https://mb.com.ph/china-slams-g7-statement-on-maritime-disputes/


Processing URLs:  56%|█████▋    | 564/1000 [20:33<35:45,  4.92s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-21/opec-sees-moderate-oil-rebound-even-if-iran-won-t-join-freeze
Error extracting text from http://www.reuters.com/article/us-britain-eu-may-vote/uk-parliament-to-vote-on-brexit-deal-before-european-parliament-may-idUSKBN1CS23C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-may-vote/uk-parliament-to-vote-on-brexit-deal-before-european-parliament-may-idUSKBN1CS23C


Processing URLs:  57%|█████▋    | 567/1000 [20:33<16:57,  2.35s/it]

Error extracting text from https://www.russiamatters.org/analysis/strategic-response-russian-hacking-affair: 403 Client Error: Forbidden for url: https://www.russiamatters.org/analysis/strategic-response-russian-hacking-affair


Processing URLs:  57%|█████▋    | 571/1000 [20:35<08:09,  1.14s/it]

Error extracting text from https://www.nord-stream2.com/en/pdf/document/480/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /en/pdf/document/480/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30018f170>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nato.int/cps/en/natohq/topics_50115.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/topics_50115.htm?selectedLocale=en
Error extracting text from https://scontent-mia1-1.xx.fbcdn.net/v/t1.0-9/13327522_10153736153045945_6084666742540737063_n.jpg?oh=68a259b48a1204535c231945ac836320&amp;oe=57D35509: HTTPSConnectionPool(host='scontent-mia1-1.xx.fbcdn.net', port=443): Max retries exceeded with url: /v/t1.0-9/13327522_10153736153045945_6084666742540737063_n.jpg?oh=68a259b48a1204535c231945ac836320&amp;oe=57D35509 (Caused by NameResol

Processing URLs:  57%|█████▋    | 573/1000 [20:38<09:05,  1.28s/it]

Error extracting text from https://asia.nikkei.com/print/article/284940: 404 Client Error: Not Found for url: https://asia.nikkei.com/print/article/284940


Processing URLs:  58%|█████▊    | 576/1000 [20:39<05:08,  1.37it/s]

Error extracting text from http://www.wsj.com/articles/chinas-cicc-prices-hong-kong-ipo-at-top-of-range-1446290476: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chinas-cicc-prices-hong-kong-ipo-at-top-of-range-1446290476
Error extracting text from http://www.nytimes.com/2016/06/02/world/asia/treasury-imposes-sanctions-on-north-korea.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/02/world/asia/treasury-imposes-sanctions-on-north-korea.html?_r=0


Processing URLs:  58%|█████▊    | 584/1000 [20:55<13:10,  1.90s/it]

Error extracting text from http://investor.apple.com/results.cfm: 403 Client Error: Forbidden for url: http://investor.apple.com/results.cfm


Processing URLs:  59%|█████▉    | 588/1000 [21:01<10:36,  1.55s/it]

URL filtered: https://www.youtube.com/watch?v=xI_QAfkgyw4&amp;feature=youtu.be


Processing URLs:  60%|█████▉    | 595/1000 [21:10<09:14,  1.37s/it]

Error extracting text from http://www.iranwatch.org/our-publications/weapon-program-background-report/table-irans-ballistic-missile-arsenal: 403 Client Error: Forbidden for url: https://www.iranwatch.org/our-publications/weapon-program-background-report/table-irans-ballistic-missile-arsenal


Processing URLs:  60%|█████▉    | 598/1000 [21:24<22:37,  3.38s/it]

Error extracting text from http://www.dailysabah.com/energy/2016/04/28/rosatom-to-sell-49-pct-of-akkuyu-nuclear-plant-due-to-financial-issue: 404 Client Error: Not Found for url: https://www.dailysabah.com/energy/2016/04/28/rosatom-to-sell-49-pct-of-akkuyu-nuclear-plant-due-to-financial-issue


Processing URLs:  60%|██████    | 601/1000 [21:37<30:12,  4.54s/it]

URL filtered: https://www.youtube.com/watch?v=oWFIvqn-ZXo


Processing URLs:  60%|██████    | 603/1000 [21:37<16:32,  2.50s/it]

Error extracting text from https://www.nytimes.com/2017/03/14/world/europe/scotland-uk-independence-referendum.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/14/world/europe/scotland-uk-independence-referendum.html


Processing URLs:  60%|██████    | 605/1000 [21:38<09:55,  1.51s/it]

Error extracting text from http://seekingalpha.com/article/4047600-aramco-ipo-doubtful: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4047600-aramco-ipo-doubtful
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14C1RW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14C1RW


Processing URLs:  61%|██████    | 606/1000 [21:39<09:02,  1.38s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/peru-opens-vote-buying-probe-fujimori-37767239: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/peru-opens-vote-buying-probe-fujimori-37767239


Processing URLs:  61%|██████    | 609/1000 [21:41<05:44,  1.13it/s]

Error extracting text from http://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/pm21_2016_n_06_16_pm_komplett.html;jsessionid=3ACBA18D62CE124EC9781D69B342A780.live11291?nn=716864: 404 Client Error: Not Found for url: https://www.kba.de/DE/Presse/Pressemitteilungen/2016/Fahrzeugzulassungen/pm21_2016_n_06_16_pm_komplett.html;jsessionid=3ACBA18D62CE124EC9781D69B342A780.live11291?nn=716864
URL filtered: http://www.bloomberg.com/news/articles/2016-07-08/china-blasts-u-s-south-korea-missile-defense-deployment
Error extracting text from http://www.nytimes.com/2016/05/15/world/americas/nicolas-maduro-tightens-hold-on-venezuela-as-us-fears-further-tumult.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/15/world/americas/nicolas-maduro-tightens-hold-on-venezuela-as-us-fears-further-tumult.html?_r=0


Processing URLs:  61%|██████▏   | 614/1000 [21:43<03:06,  2.07it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKCN0ZY2PC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKCN0ZY2PC
Error extracting text from https://www.donaldjtrump.com/media/donald-trump-releases-immigration-reform-plan-designed-to-get-americans-bac: 403 Client Error: Forbidden for url: https://www.donaldjtrump.com/media/donald-trump-releases-immigration-reform-plan-designed-to-get-americans-bac


Processing URLs:  62%|██████▏   | 619/1000 [21:52<08:35,  1.35s/it]

Error extracting text from http://www.deepspace.ucsb.edu/wp-content/uploads/2015/04/A-Roadmap-to-Interstellar-Flight-15-h.pdf: 404 Client Error: Not Found for url: https://www.deepspace.ucsb.edu/wp-content/uploads/2015/04/A-Roadmap-to-Interstellar-Flight-15-h.pdf


Processing URLs:  62%|██████▏   | 621/1000 [21:56<10:26,  1.65s/it]

Error extracting text from http://www.futuresmag.com/2016/01/15/it%E2%80%99s-not-just-what-you-forecast-it%E2%80%99s-how-you-forecast: 404 Client Error: Not Found for url: https://www.futuresmag.com/2016/01/15/it%E2%80%99s-not-just-what-you-forecast-it%E2%80%99s-how-you-forecast


Processing URLs:  62%|██████▎   | 625/1000 [22:08<14:12,  2.27s/it]

Error extracting text from https://www.reuters.com/article/us-alphabet-uber-ruling/uber-lawyer-says-board-ex-ceo-knew-of-evidence-withheld-from-waymo-case-idUSKBN1DT2XT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-alphabet-uber-ruling/uber-lawyer-says-board-ex-ceo-knew-of-evidence-withheld-from-waymo-case-idUSKBN1DT2XT
Error extracting text from http://www.reuters.com/article/us-zimbabwe-drought-children-idUSKCN10103C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-zimbabwe-drought-children-idUSKCN10103C


Processing URLs:  63%|██████▎   | 629/1000 [22:17<16:35,  2.68s/it]

URL filtered: https://www.youtube.com/watch?v=itgu-1SgkII


Processing URLs:  64%|██████▎   | 635/1000 [22:27<10:51,  1.78s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/pentagon-leaders-killed-airstrike-iraqs-mosul-40297894: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/pentagon-leaders-killed-airstrike-iraqs-mosul-40297894


Processing URLs:  64%|██████▎   | 637/1000 [22:29<07:59,  1.32s/it]

Error extracting text from https://global.handelsblatt.com/edition/388/ressort/politics/article/germany-squares-off-with-imf-over-greece: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/edition/388/ressort/politics/article/germany-squares-off-with-imf-over-greece


Processing URLs:  64%|██████▍   | 639/1000 [22:36<14:11,  2.36s/it]

Error extracting text from https://reut.rs/3sa107k: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/haiti-elections-replace-slain-president-postponed-nov-7-media-2021-08-12/
URL filtered: https://www.youtube.com/watch?v=M2wWzTJ1zw8


Processing URLs:  64%|██████▍   | 642/1000 [22:37<06:46,  1.14s/it]

Error extracting text from http://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118#r3XDWEOjpD6E8JE3.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118#r3XDWEOjpD6E8JE3.97
Error extracting text from http://www.reuters.com/article/us-peru-election-kuczynski-idUSKCN0XW24M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-kuczynski-idUSKCN0XW24M


Processing URLs:  64%|██████▍   | 644/1000 [23:38<1:18:22, 13.21s/it]

Error extracting text from http://english.irib.ir/news/iran1/item/220795-rouhani-orders-iran’s-missile-program-accelerated: HTTPConnectionPool(host='english.irib.ir', port=80): Max retries exceeded with url: /news/iran1/item/220795-rouhani-orders-iran%E2%80%99s-missile-program-accelerated (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303774140>, 'Connection to english.irib.ir timed out. (connect timeout=60)'))


Processing URLs:  65%|██████▍   | 647/1000 [23:40<35:20,  6.01s/it]  

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN17622G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN17622G


Processing URLs:  65%|██████▍   | 648/1000 [23:41<26:48,  4.57s/it]

Error extracting text from http://www.amazon.com/Thriving-Chaos-Handbook-Management-Revolution/dp/0060971843: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Thriving-Chaos-Handbook-Management-Revolution/dp/0060971843


Processing URLs:  65%|██████▌   | 650/1000 [23:44<18:31,  3.18s/it]

Error extracting text from http://thenextweb.com/google/2015/04/01/roundup-all-of-googles-jokes-for-april-fools-day-2015/#gref: 403 Client Error: Forbidden for url: http://thenextweb.com/google/2015/04/01/roundup-all-of-googles-jokes-for-april-fools-day-2015/#gref


Processing URLs:  65%|██████▌   | 654/1000 [23:45<07:28,  1.30s/it]

Error extracting text from http://world.kbs.co.kr/english/news/news_In_detail.htm?No=117219: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_In_detail.htm?No=117219


Processing URLs:  66%|██████▌   | 655/1000 [23:47<07:36,  1.32s/it]

Error extracting text from http://europe.newsweek.com/should-we-trust-iran-ally-fight-against-isis-397448?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/should-we-trust-iran-ally-fight-against-isis-397448


Processing URLs:  66%|██████▌   | 659/1000 [23:52<07:51,  1.38s/it]

Error extracting text from https://www.reuters.com/places/myanmar: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/places/myanmar


Processing URLs:  66%|██████▌   | 662/1000 [23:57<07:55,  1.41s/it]

Error extracting text from http://syriadirect.org/news/activist-ypg-practices-%E2%80%98collective%E2%80%99-punishment-against-al-hasakah-arabs/: 404 Client Error: Not Found for url: http://syriadirect.org/news/activist-ypg-practices-%E2%80%98collective%E2%80%99-punishment-against-al-hasakah-arabs/


Processing URLs:  66%|██████▋   | 663/1000 [23:57<06:34,  1.17s/it]

Error extracting text from http://bigstory.ap.org/article/8525438a8ca94624934a8020f445ff41/faces-budget-crunch-killing-perks-and-slashing-salaries: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/8525438a8ca94624934a8020f445ff41/faces-budget-crunch-killing-perks-and-slashing-salaries (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303775d00>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  66%|██████▋   | 664/1000 [24:00<09:27,  1.69s/it]

Error extracting text from https://www.nass.usda.gov/Publications/Todays_Reports/reports/land0816.pdf: 404 Client Error: Not Found for url: https://www.nass.usda.gov/Publications/Todays_Reports/reports/land0816.pdf


Processing URLs:  67%|██████▋   | 666/1000 [24:01<05:52,  1.06s/it]

Error extracting text from http://blog.dilbert.com/2016/01/28/the-fake-because/: HTTPConnectionPool(host='blog.dilbert.com', port=80): Max retries exceeded with url: /2016/01/28/the-fake-because/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303774920>: Failed to resolve 'blog.dilbert.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 679/1000 [24:18<05:04,  1.05it/s]

Error extracting text from http://www.nytimes.com/2015/11/07/business/dealbook/china-stock-market-ipo.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/07/business/dealbook/china-stock-market-ipo.html
URL filtered: http://adage.com/article/digital/verizon-chases-digital-duopoly-facebook-google/305258/
Error extracting text from http://www.reuters.com/article/2015/11/26/saudi-aramco-indonesia-pertamina-idUSL3N13L42X20151126#uRy4UCM2tJgwEIxD.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/26/saudi-aramco-indonesia-pertamina-idUSL3N13L42X20151126#uRy4UCM2tJgwEIxD.97
Error extracting text from http://www.reuters.com/article/us-burundi-politics-idUSKCN0WI1KU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-politics-idUSKCN0WI1KU


Processing URLs:  68%|██████▊   | 681/1000 [24:18<03:22,  1.57it/s]

Error extracting text from https://www.nytimes.com/2017/07/08/us/politics/trump-russia-kushner-manafort.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/08/us/politics/trump-russia-kushner-manafort.html


Processing URLs:  68%|██████▊   | 682/1000 [24:21<05:30,  1.04s/it]

Error extracting text from http://www.ibtimes.com/europe-refugee-crisis-german-minister-threatens-eu-countries-legal-action-over-2233623: 403 Client Error: Forbidden for url: https://www.ibtimes.com/europe-refugee-crisis-german-minister-threatens-eu-countries-legal-action-over-2233623


Processing URLs:  68%|██████▊   | 684/1000 [24:22<04:46,  1.10it/s]

Error extracting text from https://www.justice.gov/sites/default/files/olc/opinions/attachments/2014/11/20/2014-11-19-auth-prioritize-removal.pdf: 404 Client Error: Not Found for url: https://www.justice.gov/sites/default/files/olc/opinions/attachments/2014/11/20/2014-11-19-auth-prioritize-removal.pdf
URL filtered: https://www.youtube.com/watch?v=0fkDVNkFr40


Processing URLs:  69%|██████▉   | 690/1000 [24:28<05:53,  1.14s/it]
error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from https://www.berlin.de/ausstellungen/nachrichten/6466235-3041403-humboldt-forum-sucht-datum-fuer-echte-er.html: Document is empty


Processing URLs:  69%|██████▉   | 693/1000 [24:33<07:24,  1.45s/it]

Error extracting text from http://www.reuters.com/article/us-thailand-politics-idUSKBN15N19M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-politics-idUSKBN15N19M


Processing URLs:  70%|███████   | 700/1000 [24:46<07:03,  1.41s/it]

Error extracting text from https://www.nytimes.com/2021/06/17/opinion/joe-manchin-filibuster-voting-rights.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/17/opinion/joe-manchin-filibuster-voting-rights.html
Error extracting text from https://www.digitaltveurope.com/2021/02/24/cgtn-turns-to-france-following-uk-ban/: 403 Client Error: Forbidden for url: https://www.digitaltveurope.com/2021/02/24/cgtn-turns-to-france-following-uk-ban/


Processing URLs:  71%|███████   | 706/1000 [24:51<03:36,  1.36it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-usa-china-idUSKCN0Z02UN?feedType=RSS&amp;feedName=topNews&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed:%20reuters/topNews%20(News%20/%20US%20/%20Top%20News)&amp;utm_content=Netvibes: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-usa-china-idUSKCN0Z02UN?feedType=RSS&amp;feedName=topNews&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed:%20reuters/topNews%20(News%20/%20US%20/%20Top%20News)&amp;utm_content=Netvibes


Processing URLs:  71%|███████   | 707/1000 [24:52<03:08,  1.56it/s]

Error extracting text from https://www.consilium.europa.eu/en/policies/eu-uk-negotiations-on-the-future-relationship/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/policies/eu-uk-negotiations-on-the-future-relationship/


Processing URLs:  71%|███████   | 710/1000 [24:53<02:29,  1.94it/s]

Error extracting text from http://www.wsj.com/articles/paul-ryans-new-leadership-style-to-begin-with-highway-bill-1446571080: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/paul-ryans-new-leadership-style-to-begin-with-highway-bill-1446571080


Processing URLs:  71%|███████▏  | 713/1000 [24:56<03:11,  1.50it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-putin-idUSKBN15E0SF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-putin-idUSKBN15E0SF


Processing URLs:  71%|███████▏  | 714/1000 [24:58<05:02,  1.06s/it]

Error extracting text from http://bigstory.ap.org/article/19177c00c77441499d5ddbddaec5f609/nations-face-tough-question-who-are-syrias-terrorists: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/19177c00c77441499d5ddbddaec5f609/nations-face-tough-question-who-are-syrias-terrorists (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3021d9a30>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/criticalthreats/status/1360344363348480002


Processing URLs:  72%|███████▏  | 717/1000 [25:02<06:19,  1.34s/it]

Error extracting text from http://www.cfr.org/public-health-threats-and-pandemics/zika-virus/p37527: 404 Client Error: Not Found for url: https://www.cfr.org/public-health-threats-and-pandemics/zika-virus/p37527
Error extracting text from https://www.reuters.com/world/middle-east/oil-prices-extend-losses-investors-brace-more-supplies-2021-07-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/oil-prices-extend-losses-investors-brace-more-supplies-2021-07-15/


Processing URLs:  72%|███████▏  | 720/1000 [25:05<05:13,  1.12s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/05/11/national/politics-diplomacy/putin-may-pay-visit-abes-home-turf-yamaguchi/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/05/11/national/politics-diplomacy/putin-may-pay-visit-abes-home-turf-yamaguchi/


Processing URLs:  72%|███████▏  | 723/1000 [25:08<04:26,  1.04it/s]

Error extracting text from http://www.africanmedias.com/cote-divoire-reprise-du-travail-a-lhopital-de-bouake/: 403 Client Error: Forbidden for url: http://www.africanmedias.com/cote-divoire-reprise-du-travail-a-lhopital-de-bouake/


Processing URLs:  72%|███████▎  | 725/1000 [25:12<06:49,  1.49s/it]

Error extracting text from https://www.timesca.com/index.php/news/26-opinion-head/18059-the-taliban-s-spring-offensive-afghanistan-faces-a-crucial-year: 403 Client Error: Forbidden for url: https://www.timesca.com/index.php/news/26-opinion-head/18059-the-taliban-s-spring-offensive-afghanistan-faces-a-crucial-year


Processing URLs:  73%|███████▎  | 726/1000 [25:13<05:49,  1.27s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/06/25/0301000000AEN20160625000400315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  73%|███████▎  | 728/1000 [25:15<04:33,  1.01s/it]

Error extracting text from http://www.straitstimes.com/world/isis-jihadists-pull-out-of-several-iraq-towns-officers: 403 Client Error: Forbidden for url: https://www.straitstimes.com/world/isis-jihadists-pull-out-of-several-iraq-towns-officers


Processing URLs:  73%|███████▎  | 729/1000 [25:21<10:58,  2.43s/it]

Error extracting text from http://www.bunkerworld.com/news/Use-of-Citgo-in-PDVSA-loan-just-to-secure-financial-interests-Rosneft-147861: 403 Client Error: Forbidden for url: https://plattsconnect.spglobal.com:443/news/Use-of-Citgo-in-PDVSA-loan-just-to-secure-financial-interests-Rosneft-147861


Processing URLs:  73%|███████▎  | 731/1000 [25:23<07:42,  1.72s/it]

Error extracting text from http://finviz.com/forex_charts.ashx?t=EURGBP&amp;tf=d1: 403 Client Error: Forbidden for url: https://finviz.com/forex_charts.ashx?t=EURGBP&amp;tf=d1
Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN13S088: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN13S088


Processing URLs:  74%|███████▎  | 736/1000 [25:30<07:27,  1.70s/it]

Error extracting text from https://us.spindices.com/indices/real-estate/sp-corelogic-case-shiller-20-city-composite-home-price-nsa-index: 404 Client Error: Not Found for url: https://www.spglobal.com/spdji/en/indices/real-estate/sp-corelogic-case-shiller-20-city-composite-home-price-nsa-index/
URL filtered: https://twitter.com/DaniellaMicaela/status/1504444695736463362
URL filtered: https://www.youtube.com/watch?v=r3f8-WNriSw
Error extracting text from http://www.reuters.com/article/us-europe-migrants-turkey-eu-idUSKCN10Q0JB?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-turkey-eu-idUSKCN10Q0JB?il=0


Processing URLs:  74%|███████▍  | 741/1000 [25:41<08:28,  1.96s/it]

Error extracting text from http://www.wsj.com/articles/a-hint-of-trouble-in-european-debt-1453155026: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-hint-of-trouble-in-european-debt-1453155026


Processing URLs:  75%|███████▍  | 746/1000 [25:46<04:31,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-esm-idUSKBN19S1ZT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-esm-idUSKBN19S1ZT


Processing URLs:  75%|███████▌  | 750/1000 [25:57<10:15,  2.46s/it]

Error extracting text from http://www.stltoday.com/news/national/govt-and-politics/fact-check-man-dissed-by-trump-has-put-felons-in/article_e7382323-8cc8-5316-940f-fd290ec8a630.html: 404 Client Error: Not Found for url: https://www.stltoday.com/news/nation-world/govt-and-politics/fact-check-man-dissed-by-trump-has-put-felons-in/article_e7382323-8cc8-5316-940f-fd290ec8a630.html
Error extracting text from http://www.oddschecker.com/politics/european-politics/french-election/next-president: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/european-politics/french-election/next-president


Processing URLs:  75%|███████▌  | 753/1000 [25:59<05:43,  1.39s/it]

Error extracting text from http://www.bbc.co.uk/weather/2635167: 404 Client Error: Not Found for url: https://www.bbc.co.uk/weather/2635167


Processing URLs:  75%|███████▌  | 754/1000 [26:02<07:03,  1.72s/it]

Error extracting text from http://www.ibtimes.com/aung-san-suu-kyi-aide-favored-myanmar-presidency-2331349: 403 Client Error: Forbidden for url: https://www.ibtimes.com/aung-san-suu-kyi-aide-favored-myanmar-presidency-2331349


Processing URLs:  76%|███████▌  | 757/1000 [26:07<07:34,  1.87s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-20/le-drian-hints-french-special-forces-are-fighting-islamic-state


Processing URLs:  76%|███████▌  | 762/1000 [26:12<04:12,  1.06s/it]

Error extracting text from http://www.asahi.com/ajw/articles/AJ201709210042.html: 404 Client Error: Not Found for url: https://www.asahi.com/ajw/articles/AJ201709210042.html


Processing URLs:  76%|███████▋  | 763/1000 [28:12<2:14:47, 34.12s/it]

Error extracting text from https://www.yang2020.com/policies/legalization-of-marijuana/: HTTPSConnectionPool(host='www.yang2020.com', port=443): Max retries exceeded with url: /policies/legalization-of-marijuana/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3015396a0>, 'Connection to www.yang2020.com timed out. (connect timeout=60)'))


Processing URLs:  77%|███████▋  | 767/1000 [28:18<37:56,  9.77s/it]  

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-usa-un-idUSKCN18F230: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-usa-un-idUSKCN18F230


Processing URLs:  77%|███████▋  | 771/1000 [28:22<11:05,  2.91s/it]

Error extracting text from https://www.yahoo.com/news/tutsi-general-killed-burundi-attack-091540989.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/tutsi-general-killed-burundi-attack-091540989.html
Error extracting text from http://www.reuters.com/article/us-poland-eu-rule-of-law-idUSKBN12E1P7?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-eu-rule-of-law-idUSKBN12E1P7?il=0


Processing URLs:  77%|███████▋  | 773/1000 [28:39<19:57,  5.27s/it]

Error extracting text from http://www.foxnews.com/politics/2015/09/10/russian-troops-reportedly-join-syria-fight-prop-up-assad/: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/09/10/russian-troops-reportedly-join-syria-fight-prop-up-assad/


Processing URLs:  77%|███████▋  | 774/1000 [28:40<14:12,  3.77s/it]

Error extracting text from https://www.nytimes.com/2017/07/23/world/asia/taliban-seize-two-more-afghan-districts-in-sustained-fighting.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/23/world/asia/taliban-seize-two-more-afghan-districts-in-sustained-fighting.html


Processing URLs:  78%|███████▊  | 776/1000 [29:42<1:15:48, 20.31s/it]

Error extracting text from http://www.satsentinel.org/reports-and-imagery: HTTPConnectionPool(host='www.satsentinel.org', port=80): Max retries exceeded with url: /reports-and-imagery (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x300083d40>, 'Connection to www.satsentinel.org timed out. (connect timeout=60)'))
URL filtered: http://mobile.reuters.com/article/idUSKCN1250H9?feedType=RSS&amp;feedName=worldNews&amp;utm_source=Twitter&amp;utm_medium=Social&amp;utm_campaign=Feed%253A+Reuters%252FworldNews+%2528Reuters+World+News%2529


Processing URLs:  78%|███████▊  | 778/1000 [29:43<41:44, 11.28s/it]  

URL filtered: https://www.cnbc.com/2020/12/14/ftc-orders-amazon-facebook-and-others-to-explain-how-they-use-personal-data.html


Processing URLs:  78%|███████▊  | 783/1000 [29:48<13:26,  3.72s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://jovempan.uol.com.br/programas/jornal-da-manha/com-comissao-debate-sobre-impeachment-se-fortalecera-diz-lider-do-psdb.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://jovempan.uol.com.br/programas/jornal-da-manha/com-comissao-debate-sobre-impeachment-se-fortalecera-diz-lider-do-psdb.html&amp;prev=search


Processing URLs:  79%|███████▊  | 786/1000 [29:50<06:10,  1.73s/it]

Error extracting text from https://www.nytimes.com/2017/09/01/world/africa/kenya-election-kenyatta-odinga.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/01/world/africa/kenya-election-kenyatta-odinga.html?_r=0


Processing URLs:  79%|███████▉  | 788/1000 [29:51<04:05,  1.16s/it]

Error extracting text from http://www.nationalinterest.org/feature/irans-false-choice-rebranding-hard-liners-‘moderates’-15397: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/irans-false-choice-rebranding-hard-liners-%E2%80%98moderates%E2%80%99-15397


Processing URLs:  79%|███████▉  | 789/1000 [29:53<04:43,  1.34s/it]

Error extracting text from https://intpolicydigest.org/2017/08/11/china-military-base-djibouti/: 403 Client Error: Forbidden for url: https://intpolicydigest.org/china-military-base-djibouti


Processing URLs:  79%|███████▉  | 793/1000 [29:55<02:43,  1.26it/s]

Error extracting text from https://www.reuters.com/article/us-hd-supply-holdgs-m-a-home-depot-idUKKBN27W1MO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hd-supply-holdgs-m-a-home-depot-idUKKBN27W1MO


Processing URLs:  80%|███████▉  | 795/1000 [30:00<04:23,  1.28s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/japans-maritime-self-defence-force-to-cover-more-of-south-china-sea-sources: 403 Client Error: Forbidden for url: https://www.straitstimes.com/asia/east-asia/japans-maritime-self-defence-force-to-cover-more-of-south-china-sea-sources


Processing URLs:  80%|███████▉  | 796/1000 [30:00<03:17,  1.03it/s]

Error extracting text from http://mosaicmagazine.com/essay/2016/07/the-great-arab-implosion-and-its-consequences/: 403 Client Error: Forbidden for url: https://mosaicmagazine.com/essay/2016/07/the-great-arab-implosion-and-its-consequences/


Processing URLs:  80%|███████▉  | 798/1000 [30:02<03:02,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/10/31/world/middleeast/israel-rivlin-netanyahu-democracy.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/31/world/middleeast/israel-rivlin-netanyahu-democracy.html


Processing URLs:  80%|████████  | 804/1000 [30:25<13:00,  3.98s/it]

Error extracting text from http://www.parl.gc.ca/SenatorsBio/standings_senate.aspx?Language=E: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  81%|████████  | 806/1000 [30:26<07:22,  2.28s/it]

Error extracting text from https://www.realclearpolitics.com/video/2017/11/16/sarah_sanders_on_roy_moore_president_trump_is_not_a_voter_in_alabama.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/video/2017/11/16/sarah_sanders_on_roy_moore_president_trump_is_not_a_voter_in_alabama.html


Processing URLs:  81%|████████  | 807/1000 [30:27<05:59,  1.86s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53522#.VvMBGIEXbqA: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53522#.VvMBGIEXbqA


Processing URLs:  81%|████████  | 810/1000 [30:33<06:03,  1.91s/it]

URL filtered: https://www.stratfor.com/weekly/ruthless-and-sober-syria?utm_source=LinkedIn&amp;utm_medium=Official&amp;utm_campaign=Link


Processing URLs:  81%|████████  | 812/1000 [30:35<04:35,  1.46s/it]

Error extracting text from http://www.boerse-berlin.com/index.php/Bonds?isin=USP97475AF73: 403 Client Error: Forbidden for url: https://www.boerse-berlin.com/index.php/Bonds?isin=USP97475AF73


Processing URLs:  82%|████████▏ | 816/1000 [30:42<04:48,  1.57s/it]

Error extracting text from http://in.reuters.com/article/europe-migrants-turkey-idINKCN0Y215Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  82%|████████▏ | 818/1000 [30:45<04:20,  1.43s/it]

Error extracting text from http://www.wsj.com/articles/u-s-official-ratchets-down-expectations-on-retaking-mosul-1455042631: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-official-ratchets-down-expectations-on-retaking-mosul-1455042631


Processing URLs:  82%|████████▏ | 824/1000 [30:55<04:16,  1.45s/it]

Error extracting text from https://www.espn.com/mlb/story/_/id/33167564/nearly-two-months-mlb-lockout-to-worry-spring-training-opening-day: 403 Client Error: Forbidden for url: https://www.espn.com/mlb/story/_/id/33167564/nearly-two-months-mlb-lockout-to-worry-spring-training-opening-day


Processing URLs:  83%|████████▎ | 830/1000 [31:06<03:46,  1.33s/it]

Error extracting text from https://www.nytimes.com/2017/01/15/world/europe/kompromat-donald-trump-russia-democracy.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/15/world/europe/kompromat-donald-trump-russia-democracy.html?_r=0
Error extracting text from http://www.nytimes.com/2016/09/29/world/kristalina-georgieva-candidate-secretary-general.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/29/world/kristalina-georgieva-candidate-secretary-general.html?_r=0


Processing URLs:  84%|████████▍ | 838/1000 [31:26<07:22,  2.73s/it]

Error extracting text from http://www.foxnews.com/us/2015/10/30/budget-passes-rubio-surges-trump-momentum-stalls/: 404 Client Error: Not Found for url: https://www.foxnews.com/us/2015/10/30/budget-passes-rubio-surges-trump-momentum-stalls/


Processing URLs:  84%|████████▍ | 839/1000 [31:27<06:23,  2.38s/it]

Error extracting text from http://thehill.com/homenews/house/250237-gop-embassy-security-cuts-draw-democrats-scrutiny: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/250237-gop-embassy-security-cuts-draw-democrats-scrutiny/


Processing URLs:  84%|████████▍ | 844/1000 [31:44<07:06,  2.73s/it]

Error extracting text from http://heritageaction.com/ex-im/: 404 Client Error: Not Found for url: https://heritageaction.com/ex-im/


Processing URLs:  85%|████████▍ | 849/1000 [32:01<06:59,  2.78s/it]

Error extracting text from https://www.electionbettingodds.com/conv.html: 404 Client Error: Not Found for url: https://www.electionbettingodds.com/conv.html
Error extracting text from https://scontent-mia1-1.xx.fbcdn.net/v/t1.0-9/13335622_10154214850349591_9034166558947734733_n.jpg?oh=827cc5199b0103ed38f56c42af3c2222&amp;oe=57DABCFD: HTTPSConnectionPool(host='scontent-mia1-1.xx.fbcdn.net', port=443): Max retries exceeded with url: /v/t1.0-9/13335622_10154214850349591_9034166558947734733_n.jpg?oh=827cc5199b0103ed38f56c42af3c2222&amp;oe=57DABCFD (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2feeb6360>: Failed to resolve 'scontent-mia1-1.xx.fbcdn.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  85%|████████▌ | 853/1000 [32:24<11:54,  4.86s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16217C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16217C


Processing URLs:  86%|████████▌ | 856/1000 [33:29<47:32, 19.81s/it]

Error extracting text from http://english.irib.ir/news/iran1/item/220835-tightened-security-measures-in-bahrain-concerning-iran: HTTPConnectionPool(host='english.irib.ir', port=80): Max retries exceeded with url: /news/iran1/item/220835-tightened-security-measures-in-bahrain-concerning-iran (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303203080>, 'Connection to english.irib.ir timed out. (connect timeout=60)'))


Processing URLs:  86%|████████▌ | 860/1000 [33:34<13:26,  5.76s/it]

Error extracting text from http://carnegieeurope.eu/strategiceurope/?fa=62197: 403 Client Error: Forbidden for url: http://carnegieeurope.eu/strategiceurope/?fa=62197


Processing URLs:  86%|████████▌ | 861/1000 [33:35<10:00,  4.32s/it]

Error extracting text from https://www.newsweek.com/iran-lawmaker-admits-nuclear-expansion-leverage-joe-biden-jcpoa-talks-1577381: 403 Client Error: Forbidden for url: https://www.newsweek.com/iran-lawmaker-admits-nuclear-expansion-leverage-joe-biden-jcpoa-talks-1577381


Processing URLs:  86%|████████▋ | 864/1000 [33:39<05:01,  2.22s/it]

Error extracting text from http://www.theepochtimes.com/n3/2093009-is-chinas-propaganda-chief-headed-for-a-fall/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2093009-is-chinas-propaganda-chief-headed-for-a-fall/
Error extracting text from https://www.reuters.com/world/asia-pacific/un-fears-civilians-myanmar-after-army-build-up-2021-10-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/un-fears-civilians-myanmar-after-army-build-up-2021-10-08/


Processing URLs:  87%|████████▋ | 866/1000 [33:40<03:00,  1.35s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu/deadlock-as-brexit-trade-deal-faces-make-or-break-day-idUSKBN28M0ZY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/deadlock-as-brexit-trade-deal-faces-make-or-break-day-idUSKBN28M0ZY


Processing URLs:  87%|████████▋ | 868/1000 [33:45<04:13,  1.92s/it]

URL filtered: https://www.reuters.com/article/us-imf-worldbank-facebook/facebook-open-to-currency-pegged-stablecoins-for-libra-project-idUSKBN1WZ0NX


Processing URLs:  87%|████████▋ | 870/1000 [33:46<02:23,  1.11s/it]

Error extracting text from http://www.realclearpolitics.com/articles/2016/03/16/a_dangerous_showdown_looming_with_china_129989.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2016/03/16/a_dangerous_showdown_looming_with_china_129989.html


Processing URLs:  87%|████████▋ | 873/1000 [33:53<04:35,  2.17s/it]

Error extracting text from http://www.afcea.org/content/?q=Article-cyber-ethics-vex-online-warfighters: 404 Client Error: Not Found for url: https://www.afcea.org/signal-media/q=Article-cyber-ethics-vex-online-warfighters


Processing URLs:  87%|████████▋ | 874/1000 [33:54<04:09,  1.98s/it]

Error extracting text from http://www.newsweek.com/repression-grows-china-xi-becoming-second-mao-396400: 403 Client Error: Forbidden for url: https://www.newsweek.com/repression-grows-china-xi-becoming-second-mao-396400


Processing URLs:  88%|████████▊ | 876/1000 [34:04<07:08,  3.46s/it]

Error extracting text from https://www.reuters.com/lifestyle/sports/us-lawmakers-urge-ioc-delay-or-move-chinas-2022-winter-olympics-2021-07-23/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/us-lawmakers-urge-ioc-delay-or-move-chinas-2022-winter-olympics-2021-07-23/
URL filtered: https://www.youtube.com/watch?v=GbYQQE5LZ2E


Processing URLs:  88%|████████▊ | 881/1000 [34:15<05:12,  2.63s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-indonesia-idUSKCN1240O9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-indonesia-idUSKCN1240O9


Processing URLs:  88%|████████▊ | 882/1000 [34:18<05:34,  2.84s/it]

Error extracting text from https://www.bna.com/atttime-warner-deal-n57982079076/: 403 Client Error: Forbidden for url: https://www.bloombergindustry.com/


Processing URLs:  89%|████████▉ | 889/1000 [34:40<04:02,  2.18s/it]

Error extracting text from https://www.congress.gov/bill/114th-congress/house-bill/757/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/house-bill/757/text
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://g1.globo.com/politica/operacao-lava-jato/noticia/2016/03/dilma-se-diz-indignada-com-tentativa-de-envolve-la-em-ato-de-mercadante.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://g1.globo.com/politica/operacao-lava-jato/noticia/2016/03/dilma-se-diz-indignada-com-tentativa-de-envolve-la-em-ato-de-mercadante.html&amp;prev=search


Processing URLs:  89%|████████▉ | 890/1000 [34:41<03:12,  1.75s/it]

Error extracting text from http://www.reuters.com/article/us-usa-utilities-cybersecurity-idUSKBN0UK2MM20160106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-utilities-cybersecurity-idUSKBN0UK2MM20160106


Processing URLs:  89%|████████▉ | 894/1000 [34:45<02:09,  1.22s/it]

Error extracting text from http://nationalinterest.org/feature/iraqs-shia-militias-arent-bad-you-think-16291: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/iraqs-shia-militias-arent-bad-you-think-16291


Processing URLs:  90%|████████▉ | 896/1000 [34:49<02:26,  1.41s/it]

Error extracting text from http://theiowarepublican.com/2015/trumps-impact-on-caucus-turnout-could-be-yuuuuge/: 404 Client Error: Not Found for url: http://theiowarepublican.com/2015/trumps-impact-on-caucus-turnout-could-be-yuuuuge/


Processing URLs:  90%|████████▉ | 897/1000 [34:49<01:50,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/09/25/technology/wooing-amazon-second-headquarters.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/25/technology/wooing-amazon-second-headquarters.html


Processing URLs:  90%|████████▉ | 898/1000 [34:50<01:35,  1.07it/s]

Error extracting text from http://www.scout.com/military/warrior/story/1679499-2-sm-6s-attack-missile-target-at-same-time: 403 Client Error: Forbidden for url: https://247sports.com/


Processing URLs:  90%|█████████ | 901/1000 [35:04<05:14,  3.17s/it]

Error extracting text from http://seekingalpha.com/article/3978923-peruvian-runoff-sunday-keiko-leads: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3978923-peruvian-runoff-sunday-keiko-leads


Processing URLs:  90%|█████████ | 905/1000 [35:09<02:29,  1.57s/it]

Error extracting text from https://www.un.org/en/about-us/about-un-membership: 403 Client Error: Forbidden for url: https://www.un.org/en/about-us/about-un-membership
URL filtered: https://twitter.com/ariane5/status/1474201765977509892


Processing URLs:  91%|█████████ | 907/1000 [35:10<01:50,  1.19s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/south-china-sea-tensions/2399082.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/south-china-sea-tensions/2399082.html


Processing URLs:  91%|█████████▏| 913/1000 [36:18<27:05, 18.68s/it]

Error extracting text from http://aa.com.tr/en/middle-east/israeli-warplanes-strike-targets-in-syria/773658: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  92%|█████████▏| 918/1000 [36:26<05:47,  4.24s/it]

Error extracting text from https://science.sciencemag.org/content/372/6543/694.1: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.abj0016
Error extracting text from http://www.nytimes.com/2016/11/22/world/middleeast/iraq-civilians-flee-mosul.html?emc=edit_ae_20161122&amp;nl=todaysheadlines-asia&amp;nlid=77825025: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/22/world/middleeast/iraq-civilians-flee-mosul.html?emc=edit_ae_20161122&amp;nl=todaysheadlines-asia&amp;nlid=77825025


Processing URLs:  92%|█████████▏| 921/1000 [36:29<02:45,  2.10s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-jpmorgan-idUSKCN0Z327W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-jpmorgan-idUSKCN0Z327W


Processing URLs:  92%|█████████▏| 922/1000 [36:32<03:15,  2.50s/it]

Error extracting text from http://buenosairesherald.com/article/204072/as-venezuela-parliamentary-campaigns-enter-final-week-division-reflected-in-polls-: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/204072/as-venezuela-parliamentary-campaigns-enter-final-week-division-reflected-in-polls-


Processing URLs:  92%|█████████▏| 924/1000 [36:34<02:00,  1.58s/it]

Error extracting text from http://www.reuters.com/article/us-nato-expansion-idUSKCN0XJ1GM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-expansion-idUSKCN0XJ1GM
Error extracting text from https://www.rfi.fr/en/international/20210507-french-senate-s-taiwan-vote-triggers-beijing-s-anger-again: 403 Client Error: Forbidden for url: https://www.rfi.fr/en/international/20210507-french-senate-s-taiwan-vote-triggers-beijing-s-anger-again


Processing URLs:  93%|█████████▎| 928/1000 [36:58<07:39,  6.38s/it]

Error extracting text from https://www.thebalance.com/how-the-dollar-impacts-commodity-prices-809294: 406 Client Error: Not Acceptable for url: https://www.thebalancemoney.com:443/how-the-dollar-impacts-commodity-prices-809294


Processing URLs:  93%|█████████▎| 929/1000 [36:59<05:39,  4.77s/it]

Error extracting text from https://navalnews.net/russian-black-sea-fleet-to-participate-in-aman-2021-naval-exercise-in-pakistan/: HTTPSConnectionPool(host='navalnews.net', port=443): Max retries exceeded with url: /russian-black-sea-fleet-to-participate-in-aman-2021-naval-exercise-in-pakistan/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'navalnews.net'. (_ssl.c:1000)")))


Processing URLs:  93%|█████████▎| 932/1000 [38:01<22:10, 19.57s/it]

Error extracting text from http://investcorrectly.com/20160111/growing-number-analysts-trim-estimates-apple-inc-nasdaqaapl/: HTTPConnectionPool(host='investcorrectly.com', port=80): Max retries exceeded with url: /20160111/growing-number-analysts-trim-estimates-apple-inc-nasdaqaapl/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ffb58d70>, 'Connection to investcorrectly.com timed out. (connect timeout=60)'))


Processing URLs:  93%|█████████▎| 933/1000 [38:04<16:24, 14.70s/it]

Error extracting text from http://shippingtribune.com/panama-canal-expansion-96pc-complete-transit-trials-begin-in-april/: 404 Client Error: Not Found for url: https://shippingtribune.com/index.php/panama-canal-expansion-96pc-complete-transit-trials-begin-in-april/


Processing URLs:  94%|█████████▎| 935/1000 [38:06<08:18,  7.67s/it]

Error extracting text from https://www.nytimes.com/2017/09/12/us/politics/trump-tax-code-republicans-votes.html?emc=edit_th_20170913&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/12/us/politics/trump-tax-code-republicans-votes.html?emc=edit_th_20170913&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  94%|█████████▍| 941/1000 [38:15<02:21,  2.40s/it]

Error extracting text from https://tradingeconomics.com/commodity/steel: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/steel


Processing URLs:  94%|█████████▍| 943/1000 [38:17<01:36,  1.69s/it]

Error extracting text from http://www.reuters.com/article/us-germany-tesla-idUSKCN0ZO1ZZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-tesla-idUSKCN0ZO1ZZ?il=0


Processing URLs:  94%|█████████▍| 945/1000 [38:27<02:40,  2.92s/it]

Error extracting text from http://www.reuters.com/article/us-britain-election-scotland-idUSKBN17L16X?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-scotland-idUSKBN17L16X?mod=related&amp;channelName=worldNews


Processing URLs:  95%|█████████▌| 952/1000 [38:38<00:55,  1.15s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/07/11/world/middleeast/ap-ml-united-states-iraq.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/07/11/world/middleeast/ap-ml-united-states-iraq.html


Processing URLs:  95%|█████████▌| 953/1000 [38:39<00:49,  1.05s/it]

Error extracting text from http://oklo.org/2009/03/12/too-cheap-to-meter/: 406 Client Error: Not Acceptable for url: http://oklo.org/2009/03/12/too-cheap-to-meter/


Processing URLs:  95%|█████████▌| 954/1000 [38:41<00:58,  1.27s/it]

URL filtered: https://twitter.com/SpaceX/status/833327256239878145


Processing URLs:  96%|█████████▌| 957/1000 [38:44<00:44,  1.03s/it]

Error extracting text from http://www.iran-bn.com/2017/01/11/oil-production-approaching-4m-bpd: HTTPConnectionPool(host='www.iran-bn.com', port=80): Max retries exceeded with url: /2017/01/11/oil-production-approaching-4m-bpd (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3040eafc0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  96%|█████████▋| 965/1000 [39:00<01:16,  2.18s/it]

Error extracting text from http://www.cfr.org/proliferation/six-party-talks-north-koreas-nuclear-program/p13593: 404 Client Error: Not Found for url: https://www.cfr.org/proliferation/six-party-talks-north-koreas-nuclear-program/p13593


Processing URLs:  97%|█████████▋| 967/1000 [39:02<00:56,  1.71s/it]

Error extracting text from http://in.reuters.com/article/us-usa-cyber-idINKCN1061KT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  97%|█████████▋| 969/1000 [39:04<00:36,  1.19s/it]

Error extracting text from http://on.wsj.com/1J2HRJW: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/supreme-court-revives-challenge-to-north-carolina-redistricting-1429541094


Processing URLs:  97%|█████████▋| 972/1000 [39:07<00:29,  1.06s/it]

URL filtered: https://twitter.com/VeskoGarcevic
Error extracting text from https://www.oddschecker.com/football/champions-league/nationality-of-winner: 403 Client Error: Forbidden for url: https://www.oddschecker.com/football/champions-league/nationality-of-winner


Processing URLs:  98%|█████████▊| 979/1000 [39:16<00:19,  1.06it/s]

Error extracting text from http://www.gatesnotes.com/Health/XKCD-Marks-the-Spot: 403 Client Error: Forbidden for url: http://www.gatesnotes.com/Health/XKCD-Marks-the-Spot


Processing URLs:  98%|█████████▊| 983/1000 [39:20<00:19,  1.15s/it]

URL filtered: https://twitter.com/annamerlan/status/837804702255431680/photo/1


Processing URLs:  99%|█████████▉| 991/1000 [47:28<19:45, 131.69s/it]

Error extracting text from https://www.thespainreport.com/articles/831-160810135544-rivera-satisfied-rajoy-in-no-rush-after-meeting: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/831-160810135544-rivera-satisfied-rajoy-in-no-rush-after-meeting (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fe911430>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs: 100%|█████████▉| 997/1000 [47:35<00:53, 17.72s/it]

Error extracting text from http://www.nytimes.com/2016/05/18/business/japans-economy-posts-growth-for-first-quarter.html?emc=edit_th_20160518&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/18/business/japans-economy-posts-growth-for-first-quarter.html?emc=edit_th_20160518&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs: 100%|█████████▉| 998/1000 [47:37<00:25, 12.87s/it]

Error extracting text from http://speakingofjustice.libsyn.com/hillary-clintons-email-investigation-attorney-mark-zaid-sorts-the-facts: 404 Client Error: Not Found for url: https://speakingofjustice.libsyn.com/hillary-clintons-email-investigation-attorney-mark-zaid-sorts-the-facts


Processing URLs: 100%|█████████▉| 999/1000 [47:46<00:11, 11.96s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/israeli-knesset-begins-passing-pro-netanyahu-legislation/2017/11/28/245d83c0-d413-11e7-9ad9-ca0619edfa05_story.html?utm_term=.09d6e575f2d4: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/israeli-knesset-begins-passing-pro-netanyahu-legislation/2017/11/28/245d83c0-d413-11e7-9ad9-ca0619edfa05_story.html?utm_term=.09d6e575f2d4


Processing URLs: 100%|██████████| 1000/1000 [47:48<00:00,  2.87s/it]
Processing URLs:   0%|          | 1/1000 [00:00<02:32,  6.56it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-syria-idUSKBN1A42KC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-syria-idUSKBN1A42KC


Processing URLs:   0%|          | 3/1000 [00:01<06:25,  2.58it/s]

Error extracting text from http://www.reuters.com/article/us-usa-mideast-biden-idUSKCN0V62R9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-mideast-biden-idUSKCN0V62R9


Processing URLs:   0%|          | 4/1000 [00:01<05:36,  2.96it/s]

Error extracting text from http://www.wsj.com/articles/in-shift-u-s-invites-iran-to-join-syria-talks-1445991928: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-shift-u-s-invites-iran-to-join-syria-talks-1445991928


Processing URLs:   1%|          | 7/1000 [00:06<21:50,  1.32s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/352902-california-says-dhs-gave-bad-info-on-russian-targeting: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/352902-california-says-dhs-gave-bad-info-on-russian-targeting/


Processing URLs:   1%|          | 10/1000 [00:18<50:29,  3.06s/it]

URL filtered: https://twitter.com/popis2021/status/1443634094320070663
Error extracting text from http://www.reuters.com/article/us-nigeria-oil-delta-idUSKCN0Y41VI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-oil-delta-idUSKCN0Y41VI


Processing URLs:   2%|▏         | 15/1000 [00:22<22:35,  1.38s/it]

Error extracting text from http://www.theage.com.au/business/markets/feds-janet-yellen-sees-us-economy-ripe-for-interest-rate-hike-20151202-gle1fa.html: 404 Client Error: Not Found for url: https://www.theage.com.au/business/markets/feds-janet-yellen-sees-us-economy-ripe-for-interest-rate-hike-20151202-gle1fa.html


Processing URLs:   2%|▏         | 20/1000 [00:31<25:27,  1.56s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481591/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481591/


Processing URLs:   2%|▏         | 22/1000 [00:32<18:30,  1.14s/it]

Error extracting text from https://www.nytimes.com/2018/02/13/world/middleeast/netanyahu-israel-corruption.html?smid=tw-nytimes&amp;smtyp=cur: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/13/world/middleeast/netanyahu-israel-corruption.html?smid=tw-nytimes&amp;smtyp=cur


Processing URLs:   3%|▎         | 29/1000 [01:02<1:04:01,  3.96s/it]

URL filtered: https://www.youtube.com/watch?v=uI4fVgVVpiw&amp;index=4&amp;list=PLCFAF235C611CA84D


Processing URLs:   3%|▎         | 31/1000 [01:03<36:47,  2.28s/it]

Error extracting text from https://www.weforum.org/agenda/2020/03/working-from-home-coronavirus-workers-future-of-work/: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2020/03/working-from-home-coronavirus-workers-future-of-work/


Processing URLs:   3%|▎         | 34/1000 [01:06<23:16,  1.45s/it]

Error extracting text from https://www.timesofisrael.com/trump-says-us-will-dump-iran-deal-if-watchdog-doesnt-bare-teeth/?utm_source=The+Times+of+Israel+Daily+Edition&amp;utm_campaign=d0957fffa7-EMAIL_CAMPAIGN_2017_09_18&amp;utm_medium=email&amp;utm_term=0_adb46cec92-d0957fffa7-54464953: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/trump-says-us-will-dump-iran-deal-if-watchdog-doesnt-bare-teeth/?utm_source=The+Times+of+Israel+Daily+Edition&amp;utm_campaign=d0957fffa7-EMAIL_CAMPAIGN_2017_09_18&amp;utm_medium=email&amp;utm_term=0_adb46cec92-d0957fffa7-54464953


Processing URLs:   4%|▎         | 36/1000 [01:07<15:10,  1.06it/s]

Error extracting text from https://www.genomeweb.com/mdx/genentech-partners-xenon-discover-develop-genetically-targeted-pain-drugs-compan: 403 Client Error: Forbidden for url: https://www.genomeweb.com/mdx/genentech-partners-xenon-discover-develop-genetically-targeted-pain-drugs-compan


Processing URLs:   4%|▍         | 40/1000 [01:13<20:26,  1.28s/it]

Error extracting text from http://aranews.net/2016/03/isis-removes-local-leaders-power-iraqi-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/isis-removes-local-leaders-power-iraqi-mosul/


Processing URLs:   5%|▍         | 46/1000 [01:41<1:25:11,  5.36s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-10/nobel-economist-thaler-says-he-s-nervous-about-stock-market


Processing URLs:   5%|▌         | 50/1000 [01:43<32:30,  2.05s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-05/as-world-watches-kim-china-quietly-builds-south-china-sea-clout
Error extracting text from http://www.nytimes.com/2015/09/10/world/middleeast/russia-syria-military-advisers.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/10/world/middleeast/russia-syria-military-advisers.html?_r=0


Processing URLs:   5%|▌         | 51/1000 [01:43<25:54,  1.64s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-radar-idUSKCN0VW092: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-radar-idUSKCN0VW092


Processing URLs:   5%|▌         | 53/1000 [01:45<21:44,  1.38s/it]

Error extracting text from http://www.powerlineblog.com/archives/2015/11/is-iran-already-violating-the-nuclear-deal.php: 403 Client Error: Forbidden for url: https://www.powerlineblog.com/archives/2015/11/is-iran-already-violating-the-nuclear-deal.php


Processing URLs:   6%|▌         | 56/1000 [01:48<17:27,  1.11s/it]

Error extracting text from http://finance.yahoo.com/q?s=hyg: 404 Client Error: Not Found for url: https://finance.yahoo.com/q?s=hyg


Processing URLs:   6%|▌         | 58/1000 [01:50<16:51,  1.07s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3353596/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3353596/


Processing URLs:   6%|▌         | 61/1000 [01:52<16:07,  1.03s/it]

Error extracting text from https://www.politicshome.com/news/uk/foreign-affairs/news/73031/alex-salmond-helps-broker-moves-end-death-penalty-iran: 404 Client Error: Page Not Found for url: https://www.politicshome.com/news/uk/foreign-affairs/news/73031/alex-salmond-helps-broker-moves-end-death-penalty-iran


Processing URLs:   6%|▌         | 62/1000 [01:53<14:16,  1.10it/s]

Error extracting text from https://www.espn.com/olympics/story/_/id/31482937/ioc-vp-says-tokyo-olympics-take-place-even-state-emergency-place-covid-19: 403 Client Error: Forbidden for url: https://www.espn.com/olympics/story/_/id/31482937/ioc-vp-says-tokyo-olympics-take-place-even-state-emergency-place-covid-19


Processing URLs:   6%|▋         | 63/1000 [01:55<17:08,  1.10s/it]

Error extracting text from https://www.carrickflynnfororegon.com/: HTTPSConnectionPool(host='www.carrickflynnfororegon.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x304fcd1c0>: Failed to resolve 'www.carrickflynnfororegon.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   7%|▋         | 67/1000 [02:01<23:41,  1.52s/it]

Error extracting text from https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/60c7cfb63fd8004686a2308a/1623707617072/REINZ+Monthly+Property+Report+-+May+2021.pdf: 403 Client Error: Forbidden for url: https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/60c7cfb63fd8004686a2308a/1623707617072/REINZ+Monthly+Property+Report+-+May+2021.pdf


Processing URLs:   7%|▋         | 70/1000 [02:08<32:43,  2.11s/it]

Error extracting text from http://elcomercio.pe/elecciones-2016-resultados-flash: 404 Client Error: Not Found for url: https://elcomercio.pe/elecciones-2016-resultados-flash/


Processing URLs:   7%|▋         | 71/1000 [02:08<23:56,  1.55s/it]

Error extracting text from https://www.nytimes.com/2017/03/14/us/politics/paul-ryan-health-care.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/14/us/politics/paul-ryan-health-care.html


Processing URLs:   8%|▊         | 78/1000 [02:24<25:13,  1.64s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/new-zealand-covid-19-cases-jump-21-origin-outbreak-identified-2021-08-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/new-zealand-covid-19-cases-jump-21-origin-outbreak-identified-2021-08-19/


Processing URLs:   8%|▊         | 83/1000 [02:28<10:49,  1.41it/s]

URL filtered: https://www.bloomberg.com/politics/articles/2017-04-02/go-slow-india-may-water-down-china-championed-asia-trade-pact
Error extracting text from http://www.latimes.com/world/middleeast/la-fg-israel-labor-party-20170704-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-israel-labor-party-20170704-story.html


Processing URLs:   8%|▊         | 84/1000 [02:29<11:06,  1.38it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-may-idUSKBN17W09N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-may-idUSKBN17W09N
URL filtered: https://www.youtube.com/watch?v=hIvRkjOd1f8
URL filtered: https://www.reuters.com/article/us-usa-trump-russia-alphabet/google-uncovered-russia-backed-ads-on-youtube-gmail-source-says-idUSKBN1CE192
URL filtered: http://www.bloomberg.com/news/articles/2015-09-30/rousseff-s-approval-at-record-low-in-brazil-as-economy-slips


Processing URLs:   9%|▉         | 90/1000 [02:32<10:06,  1.50it/s]

Error extracting text from http://www.reuters.com/article/us-eu-germany-merkel-idUSKBN16V05P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-germany-merkel-idUSKBN16V05P


Processing URLs:   9%|▉         | 93/1000 [02:36<14:17,  1.06it/s]

Error extracting text from http://www.fayobserver.com/opinion/myron_pitts/myron-b-pitts-government-shutdown-deal-only-delays-the-pain/article_5c76838b-c098-5030-8b89-ace516422eaa.html?mode=story: 404 Client Error: OK for url: https://www.fayobserver.com/opinion/myron_pitts/myron-b-pitts-government-shutdown-deal-only-delays-the-pain/article_5c76838b-c098-5030-8b89-ace516422eaa.html/?mode=story
URL filtered: http://www.bloomberg.com/quote/USDRUB:CUR


Processing URLs:  10%|▉         | 95/1000 [02:38<14:37,  1.03it/s]

Error extracting text from http://www.qatar.cmu.edu/iliano/papers/sebd93.pdf: 403 Client Error: Forbidden for url: http://www.qatar.cmu.edu/iliano/papers/sebd93.pdf


Processing URLs:  10%|▉         | 97/1000 [03:06<1:14:20,  4.94s/it]

Error extracting text from https://www.topspeed.com/cars/car-news/vw-bmw-and-daimler-also-gas-chambered-humans-in-diesel-emission-study-ar179569.html: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  10%|▉         | 98/1000 [03:08<1:02:06,  4.13s/it]

Error extracting text from http://www.caraotadigital.net/nacionales/las-106-muertes-violentas-en-117-dias-de-protestas-contra-regimen-de-maduro/: 404 Client Error: Not Found for url: http://www.caraotadigital.net/nacionales/las-106-muertes-violentas-en-117-dias-de-protestas-contra-regimen-de-maduro/


Processing URLs:  10%|█         | 104/1000 [04:13<4:42:12, 18.90s/it]

Error extracting text from https://www.seattletimes.com/nation-world/world/syria-based-breakaway-palestinian-faction-elects-new-leader/: HTTPSConnectionPool(host='www.seattletimes.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  11%|█         | 106/1000 [04:19<2:41:46, 10.86s/it]

Error extracting text from http://www.ibtimes.com/france-talks-allies-form-military-action-plan-libya-2016-report-2237893: 403 Client Error: Forbidden for url: https://www.ibtimes.com/france-talks-allies-form-military-action-plan-libya-2016-report-2237893


Processing URLs:  11%|█         | 109/1000 [04:23<1:07:39,  4.56s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-poll-yougov-idUKKCN0YR103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  11%|█▏        | 113/1000 [04:29<33:00,  2.23s/it]  

Error extracting text from https://www.reuters.com/article/us-nigeria-security/suspected-boko-haram-militants-kill-eight-soldiers-one-civilian-in-nigerias-northeast-police-idUSKBN1CU2XL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-security/suspected-boko-haram-militants-kill-eight-soldiers-one-civilian-in-nigerias-northeast-police-idUSKBN1CU2XL


Processing URLs:  12%|█▏        | 116/1000 [04:30<18:00,  1.22s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2018/02/04/54/0401000000AEN20180204005853315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  12%|█▏        | 120/1000 [04:34<13:55,  1.05it/s]

Error extracting text from https://www.airbnb.com/: 429 Client Error: Too Many Requests for url: https://www.airbnb.com/


Processing URLs:  12%|█▏        | 123/1000 [04:38<15:01,  1.03s/it]

Error extracting text from http://www.bls.gov/news.release/cpi.nr0.htm: 403 Client Error: Forbidden for url: http://www.bls.gov/news.release/cpi.nr0.htm
Error extracting text from https://i.guim.co.uk/img/media/c283d5762772c56fb4675612e5f9e8ef620b80ac/0_173_5178_3107/master/5178.jpg?width=1200&amp;height=900&amp;quality=85&amp;auto=format&amp;fit=crop&amp;s=bae40cab3db7b30a5f0c4763c9abb69d: 401 Client Error: Unauthorized - missing signature for url: https://i.guim.co.uk/img/media/c283d5762772c56fb4675612e5f9e8ef620b80ac/0_173_5178_3107/master/5178.jpg?width=1200&amp;height=900&amp;quality=85&amp;auto=format&amp;fit=crop&amp;s=bae40cab3db7b30a5f0c4763c9abb69d


Processing URLs:  13%|█▎        | 127/1000 [04:42<14:20,  1.01it/s]

Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=8157: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=8157


Processing URLs:  13%|█▎        | 130/1000 [05:44<4:17:41, 17.77s/it]

Error extracting text from http://www.usnews.com/news/articles/2015/11/05/us-china-locked-in-contest-to-earn-vietnams-favor: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  14%|█▎        | 136/1000 [05:53<50:03,  3.48s/it]  

Error extracting text from http://www.player.one/where-westworld-what-finale-revealed-about-parks-location-573358: 403 Client Error: Forbidden for url: https://www.player.one/where-westworld-what-finale-revealed-about-parks-location-573358
Error extracting text from https://www.madamasr.com/en/2017/12/11/news/u/egypt-left-empty-handed-after-putin-visit/: 403 Client Error: Forbidden for url: https://www.madamasr.com/en/2017/12/11/news/u/egypt-left-empty-handed-after-putin-visit/


Processing URLs:  14%|█▎        | 137/1000 [05:55<43:10,  3.00s/it]

Error extracting text from http://summit.africacncl.org/attendees: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  14%|█▍        | 139/1000 [05:59<34:55,  2.43s/it]

Error extracting text from http://www.brecorder.com/general-news/172/9229/: 404 Client Error: Not Found for url: https://www.brecorder.com/general-news/172/9229/


Processing URLs:  14%|█▍        | 143/1000 [06:04<18:24,  1.29s/it]

Error extracting text from http://thehill.com/homenews/administration/352565-mueller-may-begin-interviews-with-white-house-staff-later-this-week: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/352565-mueller-may-begin-interviews-with-white-house-staff-later-this-week/


Processing URLs:  15%|█▍        | 147/1000 [06:07<13:19,  1.07it/s]

Error extracting text from https://theconversation.com/we-finally-have-the-rulebook-for-the-paris-agreement-but-global-climate-action-is-still-inadequate-108918: 403 Client Error: Forbidden for url: https://theconversation.com/we-finally-have-the-rulebook-for-the-paris-agreement-but-global-climate-action-is-still-inadequate-108918


Processing URLs:  15%|█▍        | 149/1000 [06:10<19:13,  1.36s/it]

Error extracting text from http://www.ibtimes.com/irans-foreign-minister-denies-missiles-violate-un-resolution-says-they-are-self-2336539: 403 Client Error: Forbidden for url: https://www.ibtimes.com/irans-foreign-minister-denies-missiles-violate-un-resolution-says-they-are-self-2336539


Processing URLs:  15%|█▌        | 150/1000 [06:11<18:40,  1.32s/it]

Error extracting text from https://www.uaslawblog.com/2016/10/06/amazon-google-push-drone-delivery-services-take-flight/: HTTPSConnectionPool(host='www.uaslawblog.com', port=443): Max retries exceeded with url: /2016/10/06/amazon-google-push-drone-delivery-services-take-flight/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.uaslawblog.com'. (_ssl.c:1000)")))


Processing URLs:  15%|█▌        | 152/1000 [06:14<18:25,  1.30s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/russia-deploys-nuclear-capable-missiles-on-nato-doorstep/articleshow/54758730.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/russia-deploys-nuclear-capable-missiles-on-nato-doorstep/articleshow/54758730.cms


Processing URLs:  15%|█▌        | 154/1000 [06:17<20:04,  1.42s/it]

Error extracting text from https://reut.rs/2UTmNEf: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/mozambique-president-nyusi-says-army-gaining-ground-insurgency-hit-region-2021-07-25/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=es&amp;u=http://www.infotep.gov.do/pdf_prog_form/nomina_gral2013.pdf&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=es&amp;u=http://www.infotep.gov.do/pdf_prog_form/nomina_gral2013.pdf&amp;prev=search
URL filtered: https://twitter.com/washingtonpost/status/1497221123850883075


Processing URLs:  17%|█▋        | 168/1000 [06:59<1:37:30,  7.03s/it]

Error extracting text from http://www.paddypower.com/bet?action=go_event&category=SPECIALS&ev_class_id=45&ev_type_id=22711&ev_id=13023353&force_racing_css=&ev_desc=Where+will+Amazon+build+their+Second+Headquarters%3f&AFF_ID=8531: HTTPConnectionPool(host='www.paddypower.com', port=80): Max retries exceeded with url: /bet?action=go_event&category=SPECIALS&ev_class_id=45&ev_type_id=22711&ev_id=13023353&force_racing_css=&ev_desc=Where+will+Amazon+build+their+Second+Headquarters%3F&AFF_ID=8531 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3020e3aa0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  17%|█▋        | 173/1000 [07:05<28:23,  2.06s/it]  

Error extracting text from http://seekingalpha.com/article/3751226-why-the-fed-will-not-be-raising-rates: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3751226-why-the-fed-will-not-be-raising-rates


Processing URLs:  18%|█▊        | 175/1000 [07:06<18:15,  1.33s/it]

Error extracting text from http://www.latimes.com/world/middleeast/82156403-157.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/82156403-157.html


Processing URLs:  18%|█▊        | 181/1000 [07:16<19:55,  1.46s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-26/u-s-said-to-allow-navy-patrol-by-disputed-chinese-made-islands


Processing URLs:  19%|█▊        | 187/1000 [07:25<24:32,  1.81s/it]

Error extracting text from http://tass.ru/en/economy/870056: 404 Client Error: Not Found for url: https://tass.ru/en/economy/870056


Processing URLs:  19%|█▉        | 188/1000 [07:26<23:14,  1.72s/it]

Error extracting text from http://www.worldlifeexpectancy.com/country-health-profile/syria: 403 Client Error: Forbidden for url: https://www.worldlifeexpectancy.com/403.shtml


Processing URLs:  19%|█▉        | 193/1000 [07:34<16:54,  1.26s/it]

Error extracting text from https://jsis.washington.edu/news/cybersecurity-implications-chinese-undersea-cable-investment/: HTTPSConnectionPool(host='jsis.washington.edu', port=443): Max retries exceeded with url: /news/cybersecurity-implications-chinese-undersea-cable-investment/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=mk&amp;u=http://www.president.gov.mk/en.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=mk&amp;u=http://www.president.gov.mk/en.html&amp;prev=search


Processing URLs:  19%|█▉        | 194/1000 [07:35<17:53,  1.33s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-27/rousseff-s-approval-hovers-near-record-low-amid-impeachment-talk


Processing URLs:  20%|█▉        | 198/1000 [07:41<17:50,  1.33s/it]

Error extracting text from http://www.odessatalk.com/2016/07/ukraine-and-the-imf-where-are-we-at/: 404 Client Error: Not Found for url: https://www.odessatalk.com/2016/07/ukraine-and-the-imf-where-are-we-at/


Processing URLs:  20%|██        | 200/1000 [07:51<37:44,  2.83s/it]

Error extracting text from https://thehill.com/homenews/senate/576039-in-blistering-letter-to-biden-mcconnell-vows-gop-wont-help-raise-debt-ceiling-in: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/576039-in-blistering-letter-to-biden-mcconnell-vows-gop-wont-help-raise-debt-ceiling-in/


Processing URLs:  20%|██        | 202/1000 [07:52<21:17,  1.60s/it]

Error extracting text from http://aranews.net/2016/03/isis-governor-mosul-killed-coalition-airstrike/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/isis-governor-mosul-killed-coalition-airstrike/
Error extracting text from http://www.nytimes.com/2013/12/02/business/media/long-on-cutting-edge-of-print-new-york-magazine-cuts-back.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2013/12/02/business/media/long-on-cutting-edge-of-print-new-york-magazine-cuts-back.html


Processing URLs:  20%|██        | 203/1000 [07:52<15:35,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-gdp-idUSKCN1140WT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-gdp-idUSKCN1140WT


Processing URLs:  20%|██        | 205/1000 [07:56<21:11,  1.60s/it]

Error extracting text from http://www.ibtimes.co.uk/china-mobilise-hundreds-missiles-disputed-south-china-sea-islands-coming-months-us-officials-1598043: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/china-mobilise-hundreds-missiles-disputed-south-china-sea-islands-coming-months-us-officials-1598043


Processing URLs:  21%|██        | 206/1000 [07:59<23:17,  1.76s/it]

Error extracting text from http://www.spacex.com/news/2016/06/15/eutelsatabs-mission-photos: 404 Client Error: Not Found for url: https://www.spacex.com/news/2016/06/15/eutelsatabs-mission-photos


Processing URLs:  21%|██        | 209/1000 [08:03<19:58,  1.52s/it]

Error extracting text from http://www.iol.co.za/mercury/people-want-more-than-just-things-from-government-2055473: 403 Client Error: Forbidden for url: http://www.iol.co.za/mercury/people-want-more-than-just-things-from-government-2055473


Processing URLs:  21%|██        | 211/1000 [08:05<15:15,  1.16s/it]

Error extracting text from http://www.timesofisrael.com/second-round-of-iran-elections-set-for-april-29/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/second-round-of-iran-elections-set-for-april-29/


Processing URLs:  21%|██▏       | 213/1000 [09:06<3:12:18, 14.66s/it]

Error extracting text from http://aa.com.tr/en/world/iraq-expects-new-flow-of-refugees-ahead-of-mosul-assault/534204: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  22%|██▎       | 225/1000 [09:30<15:50,  1.23s/it]  

Error extracting text from https://www.reuters.com/article/us-intel-cyber-vulnerability/u-s-government-warns-businesses-about-cyber-bug-in-intel-chips-idUSKBN1DM01R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-intel-cyber-vulnerability/u-s-government-warns-businesses-about-cyber-bug-in-intel-chips-idUSKBN1DM01R


Processing URLs:  23%|██▎       | 226/1000 [09:31<14:34,  1.13s/it]

URL filtered: http://www.bloombergview.com/articles/2015-09-10/russia-s-syrian-air-base-has-u-s-scrambling-for-a-plan


Processing URLs:  23%|██▎       | 231/1000 [09:47<40:30,  3.16s/it]

Error extracting text from https://reut.rs/34PyEFf: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  23%|██▎       | 232/1000 [09:49<34:58,  2.73s/it]

Error extracting text from http://www.newsweek.com/2016/04/15/north-korea-nuclear-deal-kim-jong-un-barack-obama-south-korea-china-united-442503.html: 403 Client Error: Forbidden for url: https://www.newsweek.com/2016/04/15/north-korea-nuclear-deal-kim-jong-un-barack-obama-south-korea-china-united-442503.html


Processing URLs:  23%|██▎       | 234/1000 [10:51<4:04:57, 19.19s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/article160803619.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  24%|██▎       | 235/1000 [10:52<2:56:20, 13.83s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Obama-urged-Abe-to-refrain-from-visiting-Russia-in-May: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Obama-urged-Abe-to-refrain-from-visiting-Russia-in-May


Processing URLs:  24%|██▎       | 237/1000 [10:54<1:34:35,  7.44s/it]

Error extracting text from http://www.wsj.com/articles/iranians-mostly-support-nuclear-deal-with-west-poll-says-1441825240: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iranians-mostly-support-nuclear-deal-with-west-poll-says-1441825240


Processing URLs:  24%|██▍       | 242/1000 [11:03<32:18,  2.56s/it]  

Error extracting text from http://www.hybridcars.com/toyota-mirai-fcv-declared-2016-world-green-car/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/toyota-mirai-fcv-declared-2016-world-green-car/


Processing URLs:  25%|██▍       | 247/1000 [11:09<15:17,  1.22s/it]

Error extracting text from https://www.nytimes.com/2013/08/08/world/europe/obama-cancels-visit-to-putin-as-snowden-adds-to-tensions.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2013/08/08/world/europe/obama-cancels-visit-to-putin-as-snowden-adds-to-tensions.html


Processing URLs:  25%|██▍       | 248/1000 [11:10<16:33,  1.32s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-20/china-restarts-ipo-process-for-10-companies-as-stocks-stabilize


Processing URLs:  25%|██▌       | 252/1000 [11:14<14:37,  1.17s/it]

Error extracting text from http://www.ibtimes.com/japan-slips-recession-q3-gdp-contracts-08-percent-2185568: 403 Client Error: Forbidden for url: https://www.ibtimes.com/japan-slips-recession-q3-gdp-contracts-08-percent-2185568


Processing URLs:  26%|██▌       | 256/1000 [11:21<19:00,  1.53s/it]

Error extracting text from http://www.securitycouncilreport.org/monthly-forecast/2017-10/famine.php: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/monthly-forecast/2017-10/famine.php


Processing URLs:  26%|██▌       | 259/1000 [11:23<11:46,  1.05it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKBN12Z2KU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKBN12Z2KU


Processing URLs:  26%|██▌       | 260/1000 [11:24<10:17,  1.20it/s]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/12/11/94/0401000000AEN20151211003800315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  26%|██▌       | 261/1000 [11:25<10:38,  1.16it/s]

Error extracting text from https://www.zawya.com/story/US_forces_capture_IS_operative_in_Iraq-TR20160302nL8N16A1BLX2/: 404 Client Error: Not Found for url: https://www.zawya.com/story/US_forces_capture_IS_operative_in_Iraq-TR20160302nL8N16A1BLX2


Processing URLs:  26%|██▋       | 264/1000 [11:38<36:06,  2.94s/it]

Error extracting text from http://www.timesofisrael.com/trump-courts-jewish-republicans-with-offensive-stereotypes/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/trump-courts-jewish-republicans-with-offensive-stereotypes/


Processing URLs:  27%|██▋       | 268/1000 [11:40<16:03,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-assad-idUSKBN0U032R20151217: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-assad-idUSKBN0U032R20151217
Error extracting text from https://www.reuters.com/lifestyle/sports/french-president-macron-attend-tokyo-olympics-minister-2021-05-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/french-president-macron-attend-tokyo-olympics-minister-2021-05-21/


Processing URLs:  28%|██▊       | 275/1000 [11:48<14:23,  1.19s/it]

Error extracting text from https://www.irinnews.org/opinion/2016/11/25/genocidal-logic-south-sudan%E2%80%99s-%E2%80%9Cgun-class%E2%80%9D: 502 Server Error: Bad Gateway for url: https://www.irinnews.org/opinion/2016/11/25/genocidal-logic-south-sudan%E2%80%99s-%E2%80%9Cgun-class%E2%80%9D


Processing URLs:  28%|██▊       | 276/1000 [11:50<17:37,  1.46s/it]

Error extracting text from http://www.bankofengland.co.uk/publications/Documents/inflationreport/2016/conf120516.pdf: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/publications/Documents/inflationreport/2016/conf120516.pdf


Processing URLs:  28%|██▊       | 277/1000 [11:53<23:30,  1.95s/it]

Error extracting text from http://www.tribuneindia.com/news/jammu-kashmir/over-200-families-in-kishtwar-to-be-hit-by-kiru-kwar-hydel-projects/481311.html: 403 Client Error: Forbidden for url: http://www.tribuneindia.com/news/jammu-kashmir/over-200-families-in-kishtwar-to-be-hit-by-kiru-kwar-hydel-projects/481311.html


Processing URLs:  28%|██▊       | 281/1000 [12:09<28:54,  2.41s/it]

Error extracting text from http://www.oddschecker.com/politics/us-politics/us-democrat-primaries/new-hampshire-primary: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/us-politics/us-democrat-primaries/new-hampshire-primary


Processing URLs:  28%|██▊       | 283/1000 [12:10<19:05,  1.60s/it]

URL filtered: http://www.businessinsider.com/ex-facebook-president-sean-parker-social-network-human-vulnerability-2017-11


Processing URLs:  29%|██▊       | 286/1000 [12:15<20:22,  1.71s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-13/sturgeon-to-seek-right-to-hold-second-scottish-independence-vote


Processing URLs:  29%|██▉       | 288/1000 [12:15<13:18,  1.12s/it]

Error extracting text from https://nationalinterest.org/blog/the-buzz/american-mobile-nuclear-missile-launchers-really-bad-idea-22579: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/american-mobile-nuclear-missile-launchers-really-bad-idea-22579


Processing URLs:  29%|██▉       | 290/1000 [12:19<14:44,  1.25s/it]

Error extracting text from https://www.un.org/press/en/2016/sc12653.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2016/sc12653.doc.htm


Processing URLs:  29%|██▉       | 292/1000 [12:26<25:24,  2.15s/it]

Error extracting text from https://www.38north.org/2017/07/jschilling071017/).: 403 Client Error: Forbidden for url: https://www.38north.org/2017/07/jschilling071017/


Processing URLs:  30%|██▉       | 296/1000 [12:29<13:27,  1.15s/it]

Error extracting text from http://www.isc-connect.org/uav-forestry-challenge: 404 Client Error: Not Found for url: http://www.isc-connect.org/uav-forestry-challenge


Processing URLs:  30%|██▉       | 298/1000 [12:33<17:50,  1.53s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/09/21/0401000000AEN20150921005700315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  30%|███       | 303/1000 [12:45<29:27,  2.54s/it]

Error extracting text from https://www.reuters.com/world/europe/estonia-sends-javelin-anti-tank-weapons-ukraine-2022-02-18/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/estonia-sends-javelin-anti-tank-weapons-ukraine-2022-02-18/


Processing URLs:  31%|███       | 306/1000 [12:47<15:38,  1.35s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160513/1605191152.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160513/1605191152.html


Processing URLs:  31%|███       | 307/1000 [12:48<14:53,  1.29s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-12-08/u-k-and-eu-strike-brexit-deal-opening-path-to-trade-talks


Processing URLs:  32%|███▏      | 315/1000 [13:00<20:06,  1.76s/it]

Error extracting text from http://www.reuters.com/article/us-socialmedia-eu-consumersconsumers-idUSKBN16N2YI?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-socialmedia-eu-consumersconsumers-idUSKBN16N2YI?il=0
URL filtered: https://www.youtube.com/watch?v=HMQkV5cTuoY


Processing URLs:  32%|███▏      | 319/1000 [13:02<09:29,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-corruption-idUSKCN0YL1SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-idUSKCN0YL1SB
URL filtered: https://www.youtube.com/watch?v=IP7nW_hKB7I


Processing URLs:  33%|███▎      | 326/1000 [14:17<3:32:31, 18.92s/it]

Error extracting text from http://www.spa.gov.sa/viewstory.php?lang=en&amp;newsid=1704324: HTTPSConnectionPool(host='oportal.spa.gov.sa', port=443): Max retries exceeded with url: /?lang=en&amp;newsid=1704324 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ff2e54c0>, 'Connection to oportal.spa.gov.sa timed out. (connect timeout=60)'))
Error extracting text from http://dariobusinessanalytics.com/httpwww-thepworld-compeventsevent129global-hr-trends-summit-iran/: HTTPConnectionPool(host='dariobusinessanalytics.com', port=80): Max retries exceeded with url: /httpwww-thepworld-compeventsevent129global-hr-trends-summit-iran/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff2e5820>: Failed to resolve 'dariobusinessanalytics.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  33%|███▎      | 330/1000 [14:27<1:29:30,  8.02s/it]

Error extracting text from https://www.kkc-curacao.com/bb-pdvsa-offers-citgo-backing-to-sweeten-7-billion-bond-swap/: HTTPSConnectionPool(host='www.kkc-curacao.com', port=443): Max retries exceeded with url: /bb-pdvsa-offers-citgo-backing-to-sweeten-7-billion-bond-swap/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff2e6c90>: Failed to resolve 'www.kkc-curacao.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  33%|███▎      | 331/1000 [14:28<1:07:09,  6.02s/it]

Error extracting text from https://www.reuters.com/lifestyle/sports/fretting-about-covid-most-japan-firms-say-olympics-should-be-cancelled-or-2021-05-20/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/fretting-about-covid-most-japan-firms-say-olympics-should-be-cancelled-or-2021-05-20/


Processing URLs:  33%|███▎      | 334/1000 [14:30<32:43,  2.95s/it]  

Error extracting text from http://newsroom.aaa.com/tag/gas-prices/: 403 Client Error: Forbidden for url: http://newsroom.aaa.com/tag/gas-prices/


Processing URLs:  34%|███▎      | 337/1000 [14:42<32:16,  2.92s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKBN1601BD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKBN1601BD


Processing URLs:  34%|███▍      | 341/1000 [15:08<1:04:33,  5.88s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0ZY0FJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0ZY0FJ


Processing URLs:  34%|███▍      | 344/1000 [15:19<46:57,  4.29s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16N1IK?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16N1IK?il=0
URL filtered: https://www.youtube.com/watch?v=o-D4I7XdTEM


Processing URLs:  35%|███▍      | 347/1000 [15:20<20:59,  1.93s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missile-un-idUSKBN1A50AA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-un-idUSKBN1A50AA
Error extracting text from https://www.reuters.com/world/americas/canadas-trudeau-trailing-polls-goes-attack-two-weeks-before-vote-2021-09-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/canadas-trudeau-trailing-polls-goes-attack-two-weeks-before-vote-2021-09-06/


Processing URLs:  36%|███▌      | 357/1000 [15:32<12:23,  1.16s/it]

Error extracting text from http://theconversation.com/iraq-what-happened-to-the-oil-after-the-war-62188: 403 Client Error: Forbidden for url: http://theconversation.com/iraq-what-happened-to-the-oil-after-the-war-62188


Processing URLs:  36%|███▌      | 359/1000 [15:36<15:54,  1.49s/it]

URL filtered: https://www.linkedin.com/jobs/search?keywords=Nextev&amp;locationId=us:0&amp;f_C=10100864&amp;start=0&amp;count=25&amp;trk=jobs_jserp_pagination_1


Processing URLs:  36%|███▌      | 361/1000 [15:37<10:04,  1.06it/s]

Error extracting text from https://m.rusemb.org.uk/article/why-russia-was-forced-to-suspend-pmda-by-ambassador-yakovenko-for-rt: 500 Server Error: Internal Server Error for url: https://m.rusemb.org.uk/article/why-russia-was-forced-to-suspend-pmda-by-ambassador-yakovenko-for-rt


Processing URLs:  36%|███▌      | 362/1000 [15:38<12:26,  1.17s/it]

URL filtered: https://www.youtube.com/watch?v=RfuHtZZF2NY


Processing URLs:  37%|███▋      | 367/1000 [15:40<05:22,  1.96it/s]

Error extracting text from http://www.nytimes.com/2016/01/06/world/asia/north-korea-hydrogen-bomb-test.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/06/world/asia/north-korea-hydrogen-bomb-test.html?_r=0
Error extracting text from http://www.reuters.com/article/us-venezuela-economy-idUSKCN0UT2ER: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-idUSKCN0UT2ER
Error extracting text from http://www.reuters.com/article/2015/12/03/us-opec-meeting-idUSKBN0TL0LY20151203#z0M9WVwHvXBmquI6.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/03/us-opec-meeting-idUSKBN0TL0LY20151203#z0M9WVwHvXBmquI6.97


Processing URLs:  37%|███▋      | 369/1000 [15:40<03:37,  2.90it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/apr/15/iranian-quds-general-back-moscow-consultations/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/apr/15/iranian-quds-general-back-moscow-consultations/


Processing URLs:  37%|███▋      | 371/1000 [15:43<06:41,  1.57it/s]

Error extracting text from http://americablog.com/2015/12/new-des-moines-register-poll-shows-why-polls-primary-season-weak.html: 403 Client Error: Forbidden for url: https://americablog.com/2015/12/new-des-moines-register-poll-shows-why-polls-primary-season-weak.html


Processing URLs:  37%|███▋      | 374/1000 [15:49<17:08,  1.64s/it]

Error extracting text from http://toyotanews.pressroom.toyota.com/releases/may-2016-sales-chart.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/may-2016-sales-chart/


Processing URLs:  38%|███▊      | 378/1000 [15:53<09:16,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16T1IW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16T1IW
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN15P19B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN15P19B


Processing URLs:  38%|███▊      | 383/1000 [16:16<36:05,  3.51s/it]

Error extracting text from http://atimes.com/2016/04/south-korea-election-reverses-weaken-parks-reform-drive/: 404 Client Error: Not Found for url: https://atimes.com/2016/04/south-korea-election-reverses-weaken-parks-reform-drive/


Processing URLs:  38%|███▊      | 384/1000 [16:16<26:44,  2.60s/it]

Error extracting text from http://www.cdm.me/english: 403 Client Error: Forbidden for url: https://www.cdm.me/english


Processing URLs:  39%|███▊      | 386/1000 [16:18<18:13,  1.78s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/282805-clinton-absolutely-zero-chance-fbi-investigation-into: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/282805-clinton-absolutely-zero-chance-fbi-investigation-into/


Processing URLs:  39%|███▊      | 387/1000 [16:19<17:20,  1.70s/it]

Error extracting text from http://ncr-iran.org/en/news/human-rights/20191-prisoners-say-eu-officials-visits-to-iran-encourage-more-executions: 403 Client Error: Forbidden for url: https://ncr-iran.org/en/news/human-rights/20191-prisoners-say-eu-officials-visits-to-iran-encourage-more-executions
Error extracting text from http://jakartaglobe.beritasatu.com/international/iran-unveils-second-underground-missile-likely-irk-us/: HTTPConnectionPool(host='jakartaglobe.beritasatu.com', port=80): Max retries exceeded with url: /international/iran-unveils-second-underground-missile-likely-irk-us/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3042e78f0>: Failed to resolve 'jakartaglobe.beritasatu.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  39%|███▉      | 391/1000 [16:21<07:29,  1.35it/s]

Error extracting text from http://www.nytimes.com/2016/10/02/upshot/debate-night-message-the-markets-are-afraid-of-donald-trump.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/02/upshot/debate-night-message-the-markets-are-afraid-of-donald-trump.html?_r=0


Processing URLs:  39%|███▉      | 394/1000 [16:25<11:29,  1.14s/it]

Error extracting text from http://www.who.int/influenza/human_animal_interface/H5N1_cumulative_table_archives/en/: 404 Client Error: Not Found for url: https://www.who.int/influenza/human_animal_interface/H5N1_cumulative_table_archives/en/


Processing URLs:  40%|███▉      | 395/1000 [16:56<1:38:06,  9.73s/it]

Error extracting text from http://www.wantchinatimes.com/news-subclass-cnt.aspx?cid=1101&amp;MainCatID=11&amp;id=20150916000073: 522 Server Error:  for url: http://www.wantchinatimes.com/news-subclass-cnt.aspx?cid=1101&amp;MainCatID=11&amp;id=20150916000073


Processing URLs:  40%|███▉      | 397/1000 [16:57<51:59,  5.17s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2016-08-01/congo-opposition-raise-stakes-as-rally-urges-kabila-to-step-down-irbpa0sw


Processing URLs:  40%|████      | 400/1000 [16:59<25:26,  2.54s/it]

Error extracting text from http://www.mysanantonio.com/opinion/editorials/article/Texas-needs-to-be-prepared-for-more-election-hack-12372513.php: 403 Client Error: Forbidden for url: https://www.mysanantonio.com/opinion/editorials/article/Texas-needs-to-be-prepared-for-more-election-hack-12372513.php


Processing URLs:  40%|████      | 401/1000 [17:01<22:53,  2.29s/it]

Error extracting text from https://bit.ly/33yFo9c: 403 Client Error: Forbidden for url: https://capx.co/theres-one-simple-lesson-from-this-mixed-bag-of-election-results/


Processing URLs:  40%|████      | 402/1000 [18:01<2:58:48, 17.94s/it]

Error extracting text from http://www.cmegroup.com/trading/interest-rates/countdown-to-fomc.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  40%|████      | 405/1000 [18:05<1:13:00,  7.36s/it]

Error extracting text from http://journal.ijreview.com/2016/01/252498-said-undecided-iowan-received-controversial-mailer-ted-cruz/: HTTPConnectionPool(host='journal.ijreview.com', port=80): Max retries exceeded with url: /2016/01/252498-said-undecided-iowan-received-controversial-mailer-ted-cruz/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff7174a0>: Failed to resolve 'journal.ijreview.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 416/1000 [18:26<18:42,  1.92s/it]  

Error extracting text from https://marginalrevolution.com/marginalrevolution/2020/12/buy-capacity-not-doses.html: 403 Client Error: Forbidden for url: https://marginalrevolution.com/marginalrevolution/2020/12/buy-capacity-not-doses.html


Processing URLs:  42%|████▏     | 417/1000 [18:26<14:51,  1.53s/it]

Error extracting text from http://www.hybridcars.com/china-reports-500000th-plug-in-vehicle-sold/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/china-reports-500000th-plug-in-vehicle-sold/


Processing URLs:  42%|████▏     | 423/1000 [19:34<3:03:00, 19.03s/it]

Error extracting text from http://www.spaceflightinsider.com/organizations/space-exploration-technologies/spacex-still-eyeing-fall-launch-maiden-flight-falcon-heavy/: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /organizations/space-exploration-technologies/spacex-still-eyeing-fall-launch-maiden-flight-falcon-heavy/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe655070>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  42%|████▏     | 424/1000 [19:36<2:12:58, 13.85s/it]

Error extracting text from https://www.thebureauinvestigates.com/drone-war/data/somalia-reported-us-covert-actions-2017: 500 Server Error: Internal Server Error for url: https://www.thebureauinvestigates.com/drone-war/data/somalia-reported-us-covert-actions-2017


Processing URLs:  43%|████▎     | 426/1000 [19:40<1:17:46,  8.13s/it]

Error extracting text from http://www.theoilandgasyear.com/articles/local-manufacturing-in-iran/: 404 Client Error: Not Found for url: https://theenergyyear.com/articles/local-manufacturing-in-iran/
Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2015/10/01/brazil-absent-fiscal-adjustment-expect-junk-downgrades/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2015/10/01/brazil-absent-fiscal-adjustment-expect-junk-downgrades/


Processing URLs:  43%|████▎     | 432/1000 [19:47<20:43,  2.19s/it]  

Error extracting text from https://www.nytimes.com/2017/08/24/opinion/canada-legalize-marijuana.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/24/opinion/canada-legalize-marijuana.html


Processing URLs:  43%|████▎     | 434/1000 [19:49<13:38,  1.45s/it]

Error extracting text from http://www.reuters.com/article/uk-usa-trump-congress-idUSKBN15V2UT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-usa-trump-congress-idUSKBN15V2UT


Processing URLs:  44%|████▎     | 436/1000 [19:52<13:24,  1.43s/it]

Error extracting text from https://newsroom.uber.com/pittsburgh-self-driving-uber/: 403 Client Error: Forbidden for url: https://newsroom.uber.com/pittsburgh-self-driving-uber/


Processing URLs:  44%|████▎     | 437/1000 [19:52<10:21,  1.10s/it]

Error extracting text from https://www.reuters.com/article/us-usa-cyber-energy/u-s-warns-public-about-attacks-on-energy-industrial-firms-idUSKBN1CQ0IN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-energy/u-s-warns-public-about-attacks-on-energy-industrial-firms-idUSKBN1CQ0IN
Error extracting text from https://www.espn.co.uk/mlb/story/_/id/33167564/nearly-two-months-mlb-lockout-to-worry-spring-training-opening-day: 403 Client Error: Forbidden for url: https://www.espn.co.uk/mlb/story/_/id/33167564/nearly-two-months-mlb-lockout-to-worry-spring-training-opening-day


Processing URLs:  44%|████▍     | 441/1000 [19:54<06:17,  1.48it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-delay-idUSKBN1AC1KW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-delay-idUSKBN1AC1KW


Processing URLs:  45%|████▍     | 449/1000 [20:08<14:23,  1.57s/it]

Error extracting text from https://oec.world/en/profile/hs92/wheat: 404 Client Error: Not Found for url: https://oec.world/en/404


Processing URLs:  45%|████▌     | 454/1000 [20:14<12:59,  1.43s/it]

Error extracting text from http://sports.williamhill.com/bet/en-gb/betting/e/5849295/ICC+World+Twenty20+2016+%2d+Outright.html: HTTPConnectionPool(host='sports.williamhill.com', port=80): Max retries exceeded with url: /bet/en-gb/betting/e/5849295/ICC+World+Twenty20+2016+-+Outright.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3042e5640>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  46%|████▌     | 456/1000 [20:16<10:46,  1.19s/it]

Error extracting text from https://www.nytimes.com/2017/02/05/health/with-fda-vacancy-trump-sees-chance-to-speed-drugs-to-the-market.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/05/health/with-fda-vacancy-trump-sees-chance-to-speed-drugs-to-the-market.html?_r=0


Processing URLs:  46%|████▌     | 462/1000 [20:23<07:08,  1.25it/s]

Error extracting text from http://thehill.com/homenews/348531-kremlin-confirms-receiving-email-from-trump-lawyer-cohen-did-not-respond: 403 Client Error: Forbidden for url: https://thehill.com/homenews/348531-kremlin-confirms-receiving-email-from-trump-lawyer-cohen-did-not-respond/
Error extracting text from https://www.politico.com/states/new-york/city-hall/story/2017/10/19/mayor-will-fight-vigorously-against-cuomo-approved-autonomous-vehicle-tests-115155: 403 Client Error: Forbidden for url: https://www.politico.com/states/new-york/city-hall/story/2017/10/19/mayor-will-fight-vigorously-against-cuomo-approved-autonomous-vehicle-tests-115155


Processing URLs:  46%|████▋     | 463/1000 [20:26<13:44,  1.54s/it]

Error extracting text from http://www.ainonline.com/aviation-news/aviation-international-news/2014-06-05/ads-b-out-loa-mandatory-some-countries: 500 Server Error: Internal Server Error for url: https://www.ainonline.com/aviation-news/aviation-international-news/2014-06-05/ads-b-out-loa-mandatory-some-countries


Processing URLs:  47%|████▋     | 467/1000 [20:31<10:53,  1.23s/it]

Error extracting text from http://www.reuters.com/article/us-nato-summit-idUSKCN0ZN2NL?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-summit-idUSKCN0ZN2NL?il=0


Processing URLs:  47%|████▋     | 469/1000 [20:32<07:23,  1.20it/s]

Error extracting text from http://www.wsj.com/articles/spains-rajoy-braces-for-second-confidence-vote-in-parliament-1472816776: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/spains-rajoy-braces-for-second-confidence-vote-in-parliament-1472816776


Processing URLs:  47%|████▋     | 471/1000 [20:37<14:44,  1.67s/it]

Error extracting text from http://www.reuters.com/article/us-germany-attack-idUSKBN1640NW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-attack-idUSKBN1640NW


Processing URLs:  47%|████▋     | 473/1000 [20:38<10:35,  1.21s/it]

Error extracting text from https://the-world-is-watching.org/wp-content/uploads/2021/09/Myanmar-Legal-Opinion-Final-2.pdf: 406 Client Error: Not Acceptable for url: https://the-world-is-watching.org/wp-content/uploads/2021/09/Myanmar-Legal-Opinion-Final-2.pdf


Processing URLs:  48%|████▊     | 475/1000 [20:41<10:54,  1.25s/it]

Error extracting text from https://www.israelhayom.com/2021/07/05/gantz-war-with-hamas-can-resume-at-any-time/: 403 Client Error: Forbidden for url: https://www.israelhayom.com/2021/07/05/gantz-war-with-hamas-can-resume-at-any-time/


Processing URLs:  48%|████▊     | 480/1000 [21:01<18:35,  2.15s/it]

Error extracting text from http://www.wsj.com/articles/series-of-bombs-in-syria-hit-assad-strongholds-1463998056: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/series-of-bombs-in-syria-hit-assad-strongholds-1463998056
Error extracting text from http://news.usni.org/2015/11/13/essay-russias-military-role-in-syria: 403 Client Error: Forbidden for url: http://news.usni.org/2015/11/13/essay-russias-military-role-in-syria


Processing URLs:  48%|████▊     | 482/1000 [21:02<11:52,  1.38s/it]

Error extracting text from http://www.nbr.co.nz/article/clark-eighth-fourth-round-fight-be-next-un-secretary-general-—-vows-fight-ck-194151: 404 Client Error: Not Found for url: https://www.nbr.co.nz/article/clark-eighth-fourth-round-fight-be-next-un-secretary-general-%E2%80%94-vows-fight-ck-194151
Error extracting text from https://www.reuters.com/article/us-myanmar-h1n1/myanmar-tracks-spread-of-h1n1-as-outbreak-claims-sixth-victim-idUSKBN1AC1XZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-h1n1/myanmar-tracks-spread-of-h1n1-as-outbreak-claims-sixth-victim-idUSKBN1AC1XZ


Processing URLs:  48%|████▊     | 484/1000 [21:03<08:36,  1.00s/it]

Error extracting text from http://news.heart.org/world-heart-day-building-a-global-culture-of-health-to-reduce-death-from-cardiovascular-disease/: 403 Client Error: Forbidden for url: http://news.heart.org/world-heart-day-building-a-global-culture-of-health-to-reduce-death-from-cardiovascular-disease/


Processing URLs:  49%|████▊     | 486/1000 [21:08<13:09,  1.54s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/0001418091/000110465922045641/tm2212748d1_sc13da.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/0001418091/000110465922045641/tm2212748d1_sc13da.htm


Processing URLs:  49%|████▉     | 489/1000 [21:17<16:17,  1.91s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-poll-candidate-idUSKBN15W0JI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-candidate-idUSKBN15W0JI
Error extracting text from http://www.reuters.com/article/us-autos-electric-germany-idUSKCN10F276: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-autos-electric-germany-idUSKCN10F276


Processing URLs:  50%|████▉     | 496/1000 [21:27<11:08,  1.33s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-corruption-rousseff-idUSKCN0W35CZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-rousseff-idUSKCN0W35CZ


Processing URLs:  50%|████▉     | 499/1000 [22:31<2:13:57, 16.04s/it]

Error extracting text from http://origin.www.uscc.gov/sites/default/files/transcripts/3.18.08HearingTranscript.pdf: HTTPConnectionPool(host='origin.www.uscc.gov', port=80): Max retries exceeded with url: /sites/default/files/transcripts/3.18.08HearingTranscript.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30153aba0>, 'Connection to origin.www.uscc.gov timed out. (connect timeout=60)'))


Processing URLs:  50%|█████     | 501/1000 [22:34<1:16:34,  9.21s/it]

Error extracting text from https://www.wsj.com/articles/trudeau-deploys-vaccine-mandates-as-wedge-issue-in-canadas-election-11629889200: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trudeau-deploys-vaccine-mandates-as-wedge-issue-in-canadas-election-11629889200


Processing URLs:  50%|█████     | 503/1000 [22:36<43:52,  5.30s/it]  

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/09/15/0401000000AEN20150915000300315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  50%|█████     | 504/1000 [22:37<33:07,  4.01s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-09-05/china-tests-new-myanmar-trade-route-in-boost-to-ties-post


Processing URLs:  51%|█████     | 507/1000 [22:40<18:21,  2.23s/it]

URL filtered: https://twitter.com/siteintelgroup/status/791247092014387200


Processing URLs:  51%|█████     | 509/1000 [22:40<11:09,  1.36s/it]

Error extracting text from https://www.nytimes.com/2017/04/14/us/politics/russia-investigation-cyprus-mike-quigley.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/14/us/politics/russia-investigation-cyprus-mike-quigley.html?_r=1


Processing URLs:  51%|█████     | 511/1000 [22:42<08:45,  1.08s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/07/21/world/ap-un-united-nations-next-secretary-general.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/07/21/world/ap-un-united-nations-next-secretary-general.html


Processing URLs:  51%|█████     | 512/1000 [22:57<39:41,  4.88s/it]

Error extracting text from https://www.almasdarnews.com/article/iraqi-army-renews-push-mosul-city-reinforcements-arrive-crush-isis/: 522 Server Error:  for url: https://www.almasdarnews.com/article/iraqi-army-renews-push-mosul-city-reinforcements-arrive-crush-isis/
Error extracting text from http://www.zikavirusnet.com/history-of-zika.html: HTTPConnectionPool(host='www.zikavirusnet.com', port=80): Max retries exceeded with url: /history-of-zika.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302e08da0>: Failed to resolve 'www.zikavirusnet.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  52%|█████▏    | 518/1000 [23:11<21:17,  2.65s/it]

Error extracting text from http://www.presstv.com/Detail/2015/11/27/439344/France-Syria-antiDaesh-fight: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2015/11/27/439344/France-Syria-antiDaesh-fight


Processing URLs:  52%|█████▏    | 519/1000 [23:13<21:05,  2.63s/it]

Error extracting text from http://www.elespectador.com/noticias/paz/santos-dice-firma-de-paz-el-23-de-marzo-esta-casi-desca-articulo-621861: 404 Client Error: Not Found for url: https://www.elespectador.com/noticias/paz/santos-dice-firma-de-paz-el-23-de-marzo-esta-casi-desca-articulo-621861/


Processing URLs:  52%|█████▏    | 521/1000 [23:19<20:30,  2.57s/it]

Error extracting text from http://www.wsj.com/articles/u-s-punishes-russia-over-election-hacking-with-sanctions-1483039178: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-punishes-russia-over-election-hacking-with-sanctions-1483039178


Processing URLs:  52%|█████▏    | 522/1000 [23:20<15:28,  1.94s/it]

Error extracting text from http://montrealgazette.com/news/national/quebecers-break-ranks-with-canada-and-oppose-legal-weed-poll: 403 Client Error: Forbidden for url: https://montrealgazette.com:443/news/national/quebecers-break-ranks-with-canada-and-oppose-legal-weed-poll
URL filtered: http://www.bloomberg.com/news/articles/2016-01-29/yen-tumbles-more-than-2-as-boj-adopts-negative-interest-rates


Processing URLs:  52%|█████▎    | 525/1000 [23:22<09:47,  1.24s/it]

Error extracting text from http://evobsession.com/bmw-i3-sales-rose-germany-following-electric-vehicle-subsidy-implementation/: 403 Client Error: Forbidden for url: http://evobsession.com/bmw-i3-sales-rose-germany-following-electric-vehicle-subsidy-implementation/


Processing URLs:  53%|█████▎    | 526/1000 [23:24<10:13,  1.29s/it]



Processing URLs:  53%|█████▎    | 528/1000 [23:56<1:15:39,  9.62s/it]

Error extracting text from https://aboutcroatia.net/news/balkan/stoltenberg-says-montenegros-nato-accession-nothing-do-russia-26064: 522 Server Error:  for url: https://aboutcroatia.net/news/balkan/stoltenberg-says-montenegros-nato-accession-nothing-do-russia-26064


Processing URLs:  53%|█████▎    | 530/1000 [24:00<45:28,  5.80s/it]  

Error extracting text from http://blogs.spectator.co.uk/2016/05/brexit-stab-back-myth-coming/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2016/05/brexit-stab-back-myth-coming/


Processing URLs:  54%|█████▍    | 542/1000 [24:15<08:32,  1.12s/it]

Error extracting text from http://www.wsj.com/articles/tesla-motors-to-restart-sales-of-lower-range-model-s-sedan-1465476547: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tesla-motors-to-restart-sales-of-lower-range-model-s-sedan-1465476547


Processing URLs:  54%|█████▍    | 544/1000 [24:17<07:15,  1.05it/s]

Error extracting text from https://www.nytimes.com/2017/02/10/opinion/sunday/charles-schumer-judge-gorsuch-we-wont-be-fooled-again.html?ref=opinion: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/10/opinion/sunday/charles-schumer-judge-gorsuch-we-wont-be-fooled-again.html?ref=opinion
URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/have-iphone-sales-peaked-analysts-predict-slump-in-fiscal-2016?cmpid=wsdemand


Processing URLs:  55%|█████▍    | 548/1000 [24:20<05:18,  1.42it/s]

Error extracting text from https://www.scotsman.com/news/politics/scottish-election-2021-polling-expert-says-snp-majority-on-a-knife-edge-3215752: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-2021-polling-expert-says-snp-majority-on-a-knife-edge-3215752


Processing URLs:  56%|█████▌    | 555/1000 [24:27<05:21,  1.38it/s]

Error extracting text from http://greece.greekreporter.com/2016/11/09/what-could-the-donald-trump-victory-mean-for-greece/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/11/09/what-could-the-donald-trump-victory-mean-for-greece/
Error extracting text from https://www.reuters.com/article/us-israel-palestinians-settlements/lasers-and-flaming-torches-light-up-battle-over-new-israeli-settlement-idUSKCN2E01LP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-palestinians-settlements/lasers-and-flaming-torches-light-up-battle-over-new-israeli-settlement-idUSKCN2E01LP


Processing URLs:  56%|█████▌    | 556/1000 [24:28<03:59,  1.86it/s]

Error extracting text from http://www.nytimes.com/aponline/2015/11/10/world/middleeast/ap-un-united-nations-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/11/10/world/middleeast/ap-un-united-nations-syria.html
URL filtered: https://www.youtube.com/watch?v=k9dhcfIyOFc


Processing URLs:  56%|█████▌    | 559/1000 [24:29<03:16,  2.24it/s]

Error extracting text from https://census.stat.gov.mk/?fbclid=IwAR1ImtObHe6StuDqGREITj9rpQsOZrMGU184gTT7xZbSey-4K65DutYYBVU#: HTTPSConnectionPool(host='census.stat.gov.mk', port=443): Max retries exceeded with url: /?fbclid=IwAR1ImtObHe6StuDqGREITj9rpQsOZrMGU184gTT7xZbSey-4K65DutYYBVU (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303009640>: Failed to resolve 'census.stat.gov.mk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  56%|█████▌    | 560/1000 [24:29<03:08,  2.34it/s]

Error extracting text from http://www.benning.army.mil/INFANTRY/Magazine/issues/2017/JAN-MAr/pdf/7: HTTPConnectionPool(host='www.benning.army.mil', port=80): Max retries exceeded with url: /INFANTRY/Magazine/issues/2017/JAN-MAr/pdf/7 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303008650>: Failed to resolve 'www.benning.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/09/24/brazil-corruption-court-idUSL1N11U03520150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/brazil-corruption-court-idUSL1N11U03520150924


Processing URLs:  56%|█████▌    | 562/1000 [25:30<1:36:24, 13.21s/it]

Error extracting text from http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=20&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=%22Amazon+Technologies%22&amp;OS=%22Amazon+Technologies%22&amp;RS=%22Amazon+Technologies%22: HTTPConnectionPool(host='patft.uspto.gov', port=80): Max retries exceeded with url: /netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=20&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=%22Amazon+Technologies%22&amp;OS=%22Amazon+Technologies%22&amp;RS=%22Amazon+Technologies%22 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303008dd0>, 'Connection to patft.uspto.gov timed out. (connect timeout=60)'))


Processing URLs:  57%|█████▋    | 566/1000 [25:46<48:47,  6.75s/it]  

Error extracting text from http://thebulletin.org/north-koreas-nuclear-weapons-what-now/time-may-be-right-northeast-asia-nuclear-weapon-free-zone: 404 Client Error: Not Found for url: https://thebulletin.org/north-koreas-nuclear-weapons-what-now/time-may-be-right-northeast-asia-nuclear-weapon-free-zone/


Processing URLs:  57%|█████▋    | 567/1000 [25:47<35:43,  4.95s/it]

Error extracting text from http://venturebeat.com/2016/07/11/microsoft-ceo-chatbots-will-fundamentally-revolutionize-computing/: 403 Client Error: Forbidden for url: https://venturebeat.com/2016/07/11/microsoft-ceo-chatbots-will-fundamentally-revolutionize-computing/


Processing URLs:  57%|█████▋    | 569/1000 [25:48<20:52,  2.91s/it]

Error extracting text from http://www.reuters.com/article/us-usa-tax/senate-poised-for-crucial-vote-related-to-tax-reform-measure-idUSKBN1CO0ET: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax/senate-poised-for-crucial-vote-related-to-tax-reform-measure-idUSKBN1CO0ET
URL filtered: http://www.bloomberg.com/news/articles/2015-10-13/britain-s-inflation-rate-unexpectedly-drops-back-below-zero


Processing URLs:  57%|█████▊    | 575/1000 [26:04<18:27,  2.61s/it]

Error extracting text from http://www.amazon.com/Lights-Out-Cyberattack-Unprepared-Surviving/dp/055341996X: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Lights-Out-Cyberattack-Unprepared-Surviving/dp/055341996X


Processing URLs:  58%|█████▊    | 577/1000 [26:05<10:06,  1.43s/it]

Error extracting text from http://www.paulharrell.com/: 404 Client Error: Not Found for url: http://www.paulharrell.com/
Error extracting text from http://www.nytimes.com/2016/10/08/us/politics/us-formally-accuses-russia-of-stealing-dnc-emails.html?action=Click&amp;contentCollection=BreakingNews&amp;contentID=64407638&amp;pgtype=Homepage: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/08/us/politics/us-formally-accuses-russia-of-stealing-dnc-emails.html?action=Click&amp;contentCollection=BreakingNews&amp;contentID=64407638&amp;pgtype=Homepage


Processing URLs:  58%|█████▊    | 580/1000 [26:12<13:43,  1.96s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-conservatives-reject-united-states-of-europe-ahead-of-coalition-talks-idUSKBN1E30KH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-conservatives-reject-united-states-of-europe-ahead-of-coalition-talks-idUSKBN1E30KH?il=0


Processing URLs:  58%|█████▊    | 583/1000 [26:13<07:06,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/gop-plots-its-path-on-merrick-garland-supreme-court-nomination-1458169679: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gop-plots-its-path-on-merrick-garland-supreme-court-nomination-1458169679


Processing URLs:  58%|█████▊    | 584/1000 [26:14<07:20,  1.06s/it]

Error extracting text from http://www.ibtimes.co.uk/five-places-that-remain-forefront-war-against-islamic-state-iraq-syria-1596302: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/five-places-that-remain-forefront-war-against-islamic-state-iraq-syria-1596302


Processing URLs:  59%|█████▊    | 587/1000 [26:17<07:24,  1.08s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0WJ1WN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0WJ1WN


Processing URLs:  59%|█████▉    | 588/1000 [26:18<06:49,  1.01it/s]

Error extracting text from http://gawker.com/asma-al-assad-a-rose-in-the-desert-1265002284: 404 Client Error: Not Found for url: https://gawker.com/asma-al-assad-a-rose-in-the-desert-1265002284


Processing URLs:  59%|█████▉    | 591/1000 [26:24<09:48,  1.44s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/british-american-tobacco-pulls-out-army-ruled-myanmar-2021-10-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/british-american-tobacco-pulls-out-army-ruled-myanmar-2021-10-12/


Processing URLs:  60%|█████▉    | 595/1000 [26:45<23:11,  3.44s/it]

Error extracting text from http://www.reuters.com/article/us-volkswagen-usa-idUSKBN0UI1QP20160106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-volkswagen-usa-idUSKBN0UI1QP20160106


Processing URLs:  60%|█████▉    | 599/1000 [26:50<13:54,  2.08s/it]

Error extracting text from https://www.afghanistan-analysts.org/the-new-kabul-green-belt-security-plan-more-security-for-whom/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/the-new-kabul-green-belt-security-plan-more-security-for-whom/


Processing URLs:  60%|██████    | 600/1000 [26:51<11:18,  1.70s/it]

Error extracting text from http://nationalinterest.org/feature/iran-nuclear-deals-next-test-getting-through-the-iaea-14483: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/iran-nuclear-deals-next-test-getting-through-the-iaea-14483


Processing URLs:  60%|██████    | 605/1000 [27:45<1:40:44, 15.30s/it]

Error extracting text from http://www.welt.de/newsticker/dpa_nt/infoline_nt/wirtschaft_nt/article155427929/Neben-Kaufpraemie-kommt-auch-10-Jahres-Steuerbonus.html: 500 Server Error: Internal Server Error for url: https://www.welt.de/newsticker/dpa_nt/infoline_nt/wirtschaft_nt/article155427929/Neben-Kaufpraemie-kommt-auch-10-Jahres-Steuerbonus.html


Processing URLs:  61%|██████    | 607/1000 [27:47<53:39,  8.19s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-09-13/u-s-bombers-train-with-japan-south-korea-after-nuclear-test


Processing URLs:  61%|██████    | 609/1000 [27:48<29:43,  4.56s/it]

Error extracting text from https://www.un.org/press/en/2017/sc12791.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2017/sc12791.doc.htm


Processing URLs:  61%|██████▏   | 613/1000 [27:58<20:20,  3.15s/it]

Error extracting text from https://nyti.ms/3JRG5h1: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/02/24/world/russia-attacks-ukraine/zelensky-says-russian-saboteurs-are-in-kyiv-and-he-is-moscows-prime-target?smid=tw-nytimes&smtyp=cur


Processing URLs:  61%|██████▏   | 614/1000 [27:59<16:21,  2.54s/it]

Error extracting text from https://sledcom.ru/news/item/1526952/: HTTPSConnectionPool(host='sledcom.ru', port=443): Max retries exceeded with url: /news/item/1526952/ (Caused by SSLError(SSLError(1, '[SSL: WRONG_SIGNATURE_TYPE] wrong signature type (_ssl.c:1000)')))


Processing URLs:  62%|██████▏   | 616/1000 [28:00<09:36,  1.50s/it]

Error extracting text from http://english.aawsat.com/2016/09/article55357921/barzani-divide-mosul-post-isis-three-provinces: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/09/article55357921/barzani-divide-mosul-post-isis-three-provinces


Processing URLs:  62%|██████▏   | 621/1000 [28:04<04:51,  1.30it/s]

Error extracting text from http://www.nytimes.com/2015/12/31/upshot/donald-trumps-strongest-supporters-a-certain-kind-of-democrat.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/31/upshot/donald-trumps-strongest-supporters-a-certain-kind-of-democrat.html


Processing URLs:  62%|██████▏   | 624/1000 [28:07<05:53,  1.06it/s]

Error extracting text from http://www.crows.org/item/advanced-principles-of-electronic-warfare.html: 403 Client Error: Forbidden for url: http://www.crows.org/item/advanced-principles-of-electronic-warfare.html


Processing URLs:  63%|██████▎   | 628/1000 [28:11<05:45,  1.08it/s]

Error extracting text from https://mobile.nytimes.com/2017/10/12/world/middleeast/palestinians-fatah-hamas-gaza.html?referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/12/world/middleeast/palestinians-fatah-hamas-gaza.html?referer=https://www.google.com/


Processing URLs:  63%|██████▎   | 631/1000 [28:14<07:13,  1.17s/it]

Error extracting text from http://www.sis.gov.eg/Story/121824?lang=en-us: HTTPSConnectionPool(host='www.sis.gov.eg', port=443): Max retries exceeded with url: /Story/121824?lang=en-us (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  64%|██████▍   | 638/1000 [28:25<06:10,  1.02s/it]

Error extracting text from http://www.thanhniennews.com/world/syrian-government-ready-to-join-un-talks-to-end-conflict-assad-aide-57356.html: HTTPConnectionPool(host='www.thanhniennews.com', port=80): Max retries exceeded with url: /world/syrian-government-ready-to-join-un-talks-to-end-conflict-assad-aide-57356.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3022d5520>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.reuters.com/article/2015/09/15/usa-oilexports-senate-idUSL1N11L2AU20150915: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/15/usa-oilexports-senate-idUSL1N11L2AU20150915


Processing URLs:  64%|██████▍   | 639/1000 [28:26<05:55,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-peru-: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-


Processing URLs:  64%|██████▍   | 641/1000 [28:26<04:02,  1.48it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/02/16/venezuela-will-currency-reform-wednesday-stave-off-default/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/02/16/venezuela-will-currency-reform-wednesday-stave-off-default/


Processing URLs:  65%|██████▍   | 646/1000 [28:35<08:01,  1.36s/it]

Error extracting text from http://www.reuters.com/article/2015/10/09/us-china-usa-southchinasea-idUSKCN0S30ND20151009: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/09/us-china-usa-southchinasea-idUSKCN0S30ND20151009


Processing URLs:  65%|██████▍   | 649/1000 [28:37<04:53,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKCN10S0DQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKCN10S0DQ
Error extracting text from http://www.latimes.com/sports/nfl/la-sp-nfl-la-farmer-20150923-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/sports/nfl/la-sp-nfl-la-farmer-20150923-story.html


Processing URLs:  65%|██████▌   | 651/1000 [28:45<12:55,  2.22s/it]

Error extracting text from https://uk.reuters.com/article/uk-northkorea-missiles-idUKKCN1B2107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  65%|██████▌   | 654/1000 [28:46<05:45,  1.00it/s]

Error extracting text from http://www.reuters.com/article/us-israel-palestinians-hamas-idUSKBN1862PK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-palestinians-hamas-idUSKBN1862PK
Error extracting text from http://www.bosniatoday.ba/american-congressman-michael-turner-warns-bosnia-over-territorial-dispute-with-montenegro/: HTTPConnectionPool(host='www.bosniatoday.ba', port=80): Max retries exceeded with url: /american-congressman-michael-turner-warns-bosnia-over-territorial-dispute-with-montenegro/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fefc7e60>: Failed to resolve 'www.bosniatoday.ba' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  66%|██████▌   | 655/1000 [28:48<07:03,  1.23s/it]

Error extracting text from http://blogs.spectator.co.uk/2016/02/boris-johnson-eu-red-card-is-not-enough/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2016/02/boris-johnson-eu-red-card-is-not-enough/


Processing URLs:  66%|██████▌   | 659/1000 [29:01<17:05,  3.01s/it]

Error extracting text from https://www.medicalcountermeasures.gov/barda/fdaapprovals/: 403 Client Error: Forbidden for url: https://www.medicalcountermeasures.gov/barda/fdaapprovals/
Error extracting text from http://blogs.wsj.com/washwire/2015/10/19/power-struggle-in-iran-intensifies-ahead-of-key-elections/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/10/19/power-struggle-in-iran-intensifies-ahead-of-key-elections/


Processing URLs:  66%|██████▌   | 662/1000 [29:04<10:09,  1.80s/it]

Error extracting text from http://www.theblaze.com/stories/2016/10/17/russias-nuclear-brinksmanship-alarms-national-security-experts/: 404 Client Error: Not Found for url: https://www.theblaze.com/stories/2016/10/17/russias-nuclear-brinksmanship-alarms-national-security-experts/


Processing URLs:  66%|██████▋   | 664/1000 [29:06<07:50,  1.40s/it]

Error extracting text from http://www.law.uci.edu/lawreview/vol4/no1/Coutin.pdf: 404 Client Error: Not Found for url: https://www.law.uci.edu:443/lawreview/vol4/no1/Coutin.pdf
Error extracting text from http://www.nytimes.com/2015/10/27/us/politics/congress-and-white-house-near-deal-on-budget.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/27/us/politics/congress-and-white-house-near-deal-on-budget.html


Processing URLs:  67%|██████▋   | 666/1000 [29:07<05:22,  1.04it/s]

Error extracting text from https://nationalinterest.org/feature/why-biden-pushing-arms-control-russia-176908: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/why-biden-pushing-arms-control-russia-176908
Error extracting text from http://parstoday.com/en/news/iran-i43993-iran_starts_injecting_uf6_into_domestically_made_ir_8_centrifuges_aeoi: HTTPConnectionPool(host='parstoday.com', port=80): Max retries exceeded with url: /en/news/iran-i43993-iran_starts_injecting_uf6_into_domestically_made_ir_8_centrifuges_aeoi (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303ec87d0>: Failed to resolve 'parstoday.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  67%|██████▋   | 674/1000 [29:32<12:58,  2.39s/it]

Error extracting text from http://www.newsweek.com/brexit-polling-vote-eu-referendum-445101: 403 Client Error: Forbidden for url: https://www.newsweek.com/brexit-polling-vote-eu-referendum-445101


Processing URLs:  68%|██████▊   | 677/1000 [29:51<26:23,  4.90s/it]

Error extracting text from http://blog.dilbert.com/2017/07/05/solving-the-north-korea-situation/: HTTPConnectionPool(host='blog.dilbert.com', port=80): Max retries exceeded with url: /2017/07/05/solving-the-north-korea-situation/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3024e4e00>: Failed to resolve 'blog.dilbert.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 681/1000 [29:53<09:16,  1.74s/it]

Error extracting text from http://www.reuters.com/article/us-usa-oil-eia-idUSKBN16T1VR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-oil-eia-idUSKBN16T1VR
URL filtered: https://twitter.com/RALee85/status/1491521222437445633


Processing URLs:  68%|██████▊   | 683/1000 [30:53<1:12:59, 13.82s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-03-21/nixon-counsel-during-watergate-says-trump-administration-showing-how-damn-guilty-they-are: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  69%|██████▉   | 688/1000 [30:57<19:15,  3.70s/it]

Error extracting text from http://uk.reuters.com/article/uk-germany-election-poll-idUKKBN16M0KE?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  69%|██████▉   | 690/1000 [31:01<14:11,  2.75s/it]

Error extracting text from http://bigstory.ap.org/article/66f8c15a9741487f9a72a086b005b237/state-tv-saudi-arabia-has-executed-47-criminals: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/66f8c15a9741487f9a72a086b005b237/state-tv-saudi-arabia-has-executed-47-criminals (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3021d8920>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  69%|██████▉   | 693/1000 [31:02<07:14,  1.42s/it]

Error extracting text from http://www.businessinsider.com/r-north-korea-may-be-readying-long-range-missile-launch-soon-kyodo-2016-1: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-north-korea-may-be-readying-long-range-missile-launch-soon-kyodo-2016-1


Processing URLs:  70%|██████▉   | 698/1000 [32:09<1:32:52, 18.45s/it]

Error extracting text from http://www.usnews.com/opinion/articles/2016-08-12/how-the-us-can-fight-back-against-russias-cyberattacks: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  70%|███████   | 700/1000 [32:16<55:03, 11.01s/it]

URL filtered: https://www.youtube.com/watch?v=q_qgVn-Op7Q


Processing URLs:  70%|███████   | 703/1000 [32:17<23:11,  4.69s/it]

Error extracting text from http://www.nytimes.com/2016/08/09/world/asia/china-spratly-islands-south-china-sea.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/09/world/asia/china-spratly-islands-south-china-sea.html?_r=0


Processing URLs:  71%|███████   | 706/1000 [32:22<12:45,  2.60s/it]

Error extracting text from https://english.ahram.org.eg/NewsContent/50/1201/416279/AlAhram-Weekly/Egypt/GERD-Back-to-the-Security-Council.aspx: 403 Client Error: Forbidden for url: https://english.ahram.org.eg/NewsContent/50/1201/416279/AlAhram-Weekly/Egypt/GERD-Back-to-the-Security-Council.aspx


Processing URLs:  71%|███████   | 707/1000 [32:22<09:31,  1.95s/it]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2020/10/16/remarks-by-president-charles-michel-after-the-european-council-meeting-on-15-and-16-october-2020/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2020/10/16/remarks-by-president-charles-michel-after-the-european-council-meeting-on-15-and-16-october-2020/


Processing URLs:  71%|███████   | 710/1000 [32:26<07:09,  1.48s/it]

Error extracting text from http://www.nationmultimedia.com/breakingnews/319-garment-factories-to-shut-down-in-Bangladesh-30286006.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/breakingnews/319-garment-factories-to-shut-down-in-Bangladesh-30286006.html


Processing URLs:  71%|███████   | 712/1000 [32:30<07:42,  1.61s/it]

Error extracting text from https://www.france24.com/en/asia-pacific/20210729-kabul-faces-existential-crisis-in-face-of-taliban-surge-us-watchdog-says: 403 Client Error: Forbidden for url: https://www.france24.com/en/asia-pacific/20210729-kabul-faces-existential-crisis-in-face-of-taliban-surge-us-watchdog-says


Processing URLs:  72%|███████▏  | 717/1000 [32:37<08:44,  1.85s/it]

Error extracting text from http://www.ibtimes.com/brazil-corruption-scandal-update-former-president-lula-prepares-graft-probe-testimony-2306818: 403 Client Error: Forbidden for url: https://www.ibtimes.com/brazil-corruption-scandal-update-former-president-lula-prepares-graft-probe-testimony-2306818


Processing URLs:  72%|███████▏  | 719/1000 [32:39<06:17,  1.35s/it]

Error extracting text from http://www.ibtimes.com/iran-elections-2016-president-hassan-rouhanis-moderate-reformist-allies-win-all-2326441: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iran-elections-2016-president-hassan-rouhanis-moderate-reformist-allies-win-all-2326441
Error extracting text from http://www.reuters.com/article/us-iran-usa-arrest-idUSKBN16C0PI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-arrest-idUSKBN16C0PI


Processing URLs:  72%|███████▏  | 721/1000 [32:42<06:55,  1.49s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-russia-diplomacy-idUSKBN1481RE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-russia-diplomacy-idUSKBN1481RE


Processing URLs:  73%|███████▎  | 726/1000 [32:49<06:17,  1.38s/it]

Error extracting text from http://waronwant.org/media/ttip-talks-trouble-no-end-sight: 404 Client Error: Not Found for url: https://waronwant.org/media/ttip-talks-trouble-no-end-sight


Processing URLs:  73%|███████▎  | 729/1000 [32:53<05:40,  1.26s/it]

Error extracting text from https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/michael-saunder-some-monetary-policy-options-if-more-support-is-needed.pdf?la=en&amp;hash=CBA72BE6376441756ABA4ACBF3218A0555927E7B: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/michael-saunder-some-monetary-policy-options-if-more-support-is-needed.pdf?la=en&amp;hash=CBA72BE6376441756ABA4ACBF3218A0555927E7B


Processing URLs:  73%|███████▎  | 733/1000 [32:59<05:20,  1.20s/it]

Error extracting text from http://fusion.net/story/214234/bernie-sanders-wins-democratic-debate-focus-group/: HTTPConnectionPool(host='fusion.net', port=80): Max retries exceeded with url: /story/214234/bernie-sanders-wins-democratic-debate-focus-group/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3021d8260>: Failed to resolve 'fusion.net' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://news.usni.org/2017/02/10/manila-predicts-beijing-will-build-base-on-scarborough-shoal: 403 Client Error: Forbidden for url: https://news.usni.org/2017/02/10/manila-predicts-beijing-will-build-base-on-scarborough-shoal


Processing URLs:  74%|███████▍  | 738/1000 [34:07<1:19:43, 18.26s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/haiti/article70110512.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  74%|███████▍  | 739/1000 [34:08<58:05, 13.35s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NZ2JZ26VDKHS01-3QF4V5PU5U6J5IDDANGQ33NKC3


Processing URLs:  74%|███████▍  | 741/1000 [34:10<33:00,  7.65s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-14/putin-orders-main-part-of-russian-army-to-start-syria-pullout


Processing URLs:  74%|███████▍  | 743/1000 [34:11<20:29,  4.78s/it]

Error extracting text from http://editasmedicine.com/: 403 Client Error: Forbidden for url: http://editasmedicine.com/
URL filtered: https://www.spiegel.de/international/business/economist-nouriel-roubini-twitter-and-the-other-platforms-are-bad-facebook-is-worse-a-fc660029-71a4-4575-87ba-40d5bf2ab711


Processing URLs:  74%|███████▍  | 745/1000 [34:12<14:23,  3.39s/it]

Error extracting text from http://www.spacex.com/sites/spacex/files/hyperloop_alpha.pdf: 404 Client Error: The requested content does not exist. for url: https://www.spacex.com/sites/spacex/files/hyperloop_alpha.pdf


Processing URLs:  75%|███████▍  | 748/1000 [34:18<10:39,  2.54s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=54032#.V1DKnb7IZPY: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=54032#.V1DKnb7IZPY


Processing URLs:  75%|███████▌  | 754/1000 [34:27<07:31,  1.83s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/us-will-change-course-on-climate-policy-says-epas-myron-ebell/articleshow/56877460.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/us-will-change-course-on-climate-policy-says-epas-myron-ebell/articleshow/56877460.cms


Processing URLs:  76%|███████▌  | 759/1000 [34:38<09:34,  2.38s/it]

Error extracting text from http://news.thaivisa.com/thailand/gen-prayut-general-election-to-be-held-in-2017/155398/: 404 Client Error: Not Found for url: https://aseannow.com/thailand/gen-prayut-general-election-to-be-held-in-2017/155398/


Processing URLs:  76%|███████▌  | 762/1000 [34:43<07:21,  1.85s/it]

Error extracting text from https://rochan-consulting.com/imint-analysis-russian-forces-near-ukraine/: 500 Server Error: Internal Server Error for url: https://rochan-consulting.com/imint-analysis-russian-forces-near-ukraine/


Processing URLs:  76%|███████▋  | 763/1000 [34:45<07:53,  2.00s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-13/xi-calls-for-a-china-in-2016-where-nobody-dares-to-be-corrupt-
URL filtered: http://www.bloomberg.com/gadfly/articles/2016-04-24/iran-might-spoil-saudi-arabia-oil-plan


Processing URLs:  77%|███████▋  | 766/1000 [34:46<03:57,  1.02s/it]

Error extracting text from http://jurnalmaritim.com/2016/02/hari-ini-kasal-buka-persiapan-akhir-komodo-exercise-2016/: 404 Client Error: Not Found for url: https://jurnalmaritim.com/2016/02/hari-ini-kasal-buka-persiapan-akhir-komodo-exercise-2016/


Processing URLs:  77%|███████▋  | 767/1000 [34:48<05:07,  1.32s/it]

Error extracting text from http://www.nasdaq.com/earnings/report/tsla: 403 Client Error: Forbidden for url: http://www.nasdaq.com/earnings/report/tsla


Processing URLs:  77%|███████▋  | 770/1000 [34:50<03:33,  1.08it/s]

Error extracting text from http://www.el-nacional.com/noticias/economia/morgan-anuncio-maduro-sobre-deuda-ambiguo_210441: 403 Client Error: Forbidden for url: https://www.elnacional.com/noticias/economia/morgan-anuncio-maduro-sobre-deuda-ambiguo_210441


Processing URLs:  77%|███████▋  | 772/1000 [34:53<04:01,  1.06s/it]

Error extracting text from http://www.khaama.com/afghanistan-gets-final-vote-results-of-presidential-elections-after-a-year-and-half-0159: 403 Client Error: Forbidden for url: http://www.khaama.com/afghanistan-gets-final-vote-results-of-presidential-elections-after-a-year-and-half-0159


Processing URLs:  78%|███████▊  | 777/1000 [34:59<03:45,  1.01s/it]

Error extracting text from http://www.wsj.com/articles/irans-moderates-seek-to-capitalize-on-nuclear-deal-for-election-gains-1456350947: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/irans-moderates-seek-to-capitalize-on-nuclear-deal-for-election-gains-1456350947


Processing URLs:  78%|███████▊  | 778/1000 [34:59<02:53,  1.28it/s]

Error extracting text from https://www.nytimes.com/2017/04/13/world/canada/trudeau-marijuana.html?emc=edit_mbe_20170414&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/13/world/canada/trudeau-marijuana.html?emc=edit_mbe_20170414&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1&amp;_r=0


Processing URLs:  78%|███████▊  | 780/1000 [35:09<11:07,  3.03s/it]

Error extracting text from http://www.strongerin.co.uk/#QBqTe6zczyKQEf4J.97: 521 Server Error:  for url: http://www.strongerin.co.uk/#QBqTe6zczyKQEf4J.97


Processing URLs:  78%|███████▊  | 783/1000 [35:13<07:03,  1.95s/it]

Error extracting text from http://www.ibtimes.com/after-iran-nuclear-sanctions-lifted-us-issues-fresh-embargo-over-ballistic-missile-2268656: 403 Client Error: Forbidden for url: https://www.ibtimes.com/after-iran-nuclear-sanctions-lifted-us-issues-fresh-embargo-over-ballistic-missile-2268656


Processing URLs:  78%|███████▊  | 784/1000 [35:14<05:49,  1.62s/it]

URL filtered: https://www.youtube.com/watch?v=30jtbkUTJEE&amp;list=PL_BRXVrL7271MO6FxsEd_PeXLczu_6PsE


Processing URLs:  79%|███████▉  | 788/1000 [35:17<03:54,  1.11s/it]

Error extracting text from http://www.wsj.com/articles/wsj-survey-economists-are-convinced-fed-will-raise-rates-in-december-1449759601: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/wsj-survey-economists-are-convinced-fed-will-raise-rates-in-december-1449759601


Processing URLs:  79%|███████▉  | 792/1000 [35:28<08:47,  2.53s/it]

Error extracting text from http://theiranproject.com/blog/2015/09/29/deputy-fm-no-permission-issued-for-burying-iranian-pilgrims-in-mecca/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=deputy-fm-no-permission-issued-for-burying-iranian-pilgrims-in-mecca


Processing URLs:  79%|███████▉  | 793/1000 [35:29<06:56,  2.01s/it]

URL filtered: https://www.youtube.com/watch?v=rZ-__5uj76E


Processing URLs:  80%|███████▉  | 795/1000 [35:30<04:25,  1.29s/it]

Error extracting text from https://wikileaks.org/podesta-emails/emailid/24258: 403 Client Error: Forbidden for url: https://wikileaks.org/podesta-emails/emailid/24258


Processing URLs:  80%|███████▉  | 796/1000 [35:30<03:35,  1.06s/it]

Error extracting text from https://www.scotsman.com/news/politics/worst-polling-for-yes-since-2019-as-snp-support-continues-to-drop-poll-shows-3218102: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/worst-polling-for-yes-since-2019-as-snp-support-continues-to-drop-poll-shows-3218102


Processing URLs:  80%|███████▉  | 797/1000 [35:30<03:08,  1.08it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/overnights/359003-overnight-cybersecurity: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/overnights/359003-overnight-cybersecurity/


Processing URLs:  80%|███████▉  | 799/1000 [35:33<03:35,  1.07s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-national-democratic-primary: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-national-democratic-primary
URL filtered: https://www.youtube.com/watch?v=W7Rq-PEW5qM


Processing URLs:  80%|████████  | 804/1000 [35:39<03:52,  1.19s/it]

Error extracting text from https://www.nasdaq.com/articles/brazil-2020-2021-total-corn-crop-forecast-cut-by-almost-8-safras-mercado-2021-04-30: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/brazil-2020-2021-total-corn-crop-forecast-cut-by-almost-8-safras-mercado-2021-04-30


Processing URLs:  81%|████████  | 808/1000 [35:40<01:55,  1.66it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1A70PW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1A70PW


Processing URLs:  82%|████████▏ | 818/1000 [35:55<02:17,  1.33it/s]

Error extracting text from http://www.reuters.com/article/us-myanmar-evictions-idUSKCN0VK2LX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-evictions-idUSKCN0VK2LX
Error extracting text from http://www.nytimes.com/2015/11/01/us/politics/bernie-sanders-doesnt-kiss-babies-that-a-problem.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/01/us/politics/bernie-sanders-doesnt-kiss-babies-that-a-problem.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news
URL filtered: https://www.youtube.com/watch?v=ZfR0plT3iUY
URL filtered: http://www.bloomberg.com/news/articles/2016-02-25/sifting-through-swiss-gold-vaults-for-clues-of-venezuela-default


Processing URLs:  82%|████████▏ | 824/1000 [35:58<01:42,  1.71it/s]

URL filtered: https://twitter.com/DDurwent/status/1475478267134099463
Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=5377: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=5377


Processing URLs:  83%|████████▎ | 827/1000 [36:13<09:58,  3.46s/it]

URL filtered: https://www.wired.co.uk/article/margrethe-vestager-apple-facebook-google-antitrust-case


Processing URLs:  83%|████████▎ | 829/1000 [36:13<06:18,  2.21s/it]

URL filtered: https://twitter.com/KevinCChang/status/1437721732069085185


Processing URLs:  83%|████████▎ | 831/1000 [36:14<04:11,  1.49s/it]

Error extracting text from http://thehill.com/latino/315861-mexican-official-we-could-leave-nafta-if-there-are-no-clear-benefits: 403 Client Error: Forbidden for url: https://thehill.com/latino/315861-mexican-official-we-could-leave-nafta-if-there-are-no-clear-benefits/


Processing URLs:  83%|████████▎ | 834/1000 [36:20<04:51,  1.75s/it]

URL filtered: https://www.theguardian.com/technology/2022/mar/18/accc-takes-meta-to-court-over-facebook-scam-ads-depicting-australian-identities
URL filtered: https://twitter.com/bscholl/status/1462206001386377224


Processing URLs:  84%|████████▎ | 837/1000 [36:22<03:04,  1.13s/it]

Error extracting text from http://human-brain.org/arguments.html: 406 Client Error: Not Acceptable for url: http://human-brain.org/arguments.html


Processing URLs:  84%|████████▍ | 838/1000 [36:24<03:43,  1.38s/it]

Error extracting text from http://www.thefiscaltimes.com/2015/10/01/Can-GOP-Bypass-Its-Right-Wing-Revive-Ex-Im-Bank&gt: 404 Client Error: Not Found for url: https://www.thefiscaltimes.com:443/2015/10/01/Can-GOP-Bypass-Its-Right-Wing-Revive-Ex-Im-Bank&gt


Processing URLs:  84%|████████▍ | 844/1000 [36:54<07:40,  2.95s/it]

Error extracting text from https://www.nytimes.com/2017/08/09/opinion/the-smart-way-to-deal-with-putins-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/09/opinion/the-smart-way-to-deal-with-putins-russia.html


Processing URLs:  85%|████████▍ | 847/1000 [37:05<07:43,  3.03s/it]

Error extracting text from https://www.eurogroupforanimals.org/news/why-eu-mercosur-agreement-bad-news-european-animals: 404 Client Error: Not Found for url: https://www.eurogroupforanimals.org/news/why-eu-mercosur-agreement-bad-news-european-animals


Processing URLs:  85%|████████▌ | 850/1000 [37:09<05:06,  2.04s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0SZ1Z720151110: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0SZ1Z720151110
URL filtered: https://www.youtube.com/watch?v=7Pq-S557XQU


Processing URLs:  85%|████████▌ | 852/1000 [37:11<03:54,  1.58s/it]

URL filtered: https://twitter.com/kylieatwood/status/1426853588782010370


Processing URLs:  85%|████████▌ | 854/1000 [37:22<07:33,  3.10s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-security-carrier-iran-idUSKBN16W0TA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-security-carrier-iran-idUSKBN16W0TA


Processing URLs:  86%|████████▌ | 856/1000 [37:23<05:19,  2.22s/it]

URL filtered: http://www.bloomberg.com/graphics/2016-brexit-watch/


Processing URLs:  86%|████████▌ | 858/1000 [37:24<03:33,  1.50s/it]

Error extracting text from https://www.yahoo.com/news/syria-opposition-rejects-un-proposal-assad-stay-source-003249933.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/syria-opposition-rejects-un-proposal-assad-stay-source-003249933.html


Processing URLs:  86%|████████▌ | 860/1000 [37:24<02:33,  1.10s/it]

Error extracting text from http://news.yahoo.com/venezuelan-opposition-skeptical-vote-observers-232011825.html: 404 Client Error: Not Found for url: http://news.yahoo.com/venezuelan-opposition-skeptical-vote-observers-232011825.html


Processing URLs:  87%|████████▋ | 866/1000 [37:33<02:12,  1.01it/s]

Error extracting text from http://www.straitstimes.com/asia/east-asia/north-korea-seeks-peace-treaty-with-us-south-korea-and-china-source: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  87%|████████▋ | 868/1000 [37:34<01:56,  1.13it/s]

Error extracting text from https://www.predictit.org/Market/1460/Who-will-win-the-2016-New-Hampshire-Democratic-presidential-primary: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1460/Who-will-win-the-2016-New-Hampshire-Democratic-presidential-primary


Processing URLs:  87%|████████▋ | 871/1000 [37:39<02:38,  1.23s/it]

URL filtered: http://www.bloomberg.com/features/2016-how-crispr-will-change-the-world/


Processing URLs:  88%|████████▊ | 876/1000 [37:43<02:32,  1.23s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-16/may-to-pledge-ever-closer-u-k-union-in-rebuke-to-sturgeon-s-snp


Processing URLs:  88%|████████▊ | 878/1000 [37:46<02:32,  1.25s/it]

Error extracting text from https://www.rusemb.org.uk/fnapr/6076: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  88%|████████▊ | 879/1000 [37:49<03:24,  1.69s/it]

Error extracting text from https://www.similarweb.com/blog/us-media-publishers-june-2016: 403 Client Error: Forbidden for url: https://www.similarweb.com/blog/us-media-publishers-june-2016


Processing URLs:  88%|████████▊ | 881/1000 [37:51<02:57,  1.49s/it]

Error extracting text from http://www.sciencemag.org/news/2017/07/sci-hub-s-cache-pirated-papers-so-big-subscription-journals-are-doomed-data-analyst: 403 Client Error: Forbidden for url: https://www.science.org/news/2017/07/sci-hub-s-cache-pirated-papers-so-big-subscription-journals-are-doomed-data-analyst


Processing URLs:  88%|████████▊ | 884/1000 [37:57<03:00,  1.56s/it]

Error extracting text from http://www.wsj.com/articles/gates-foundation-sees-possible-end-to-polio-soon-1453487699: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gates-foundation-sees-possible-end-to-polio-soon-1453487699


Processing URLs:  88%|████████▊ | 885/1000 [37:58<02:28,  1.29s/it]

Error extracting text from https://www.realcleardefense.com/articles/2021/02/20/russian_modernization_of_its_nuclear_and_military_forces_in_2021_661111.html#:~:text=The%20increase%20in%20strategic%20nuclear,the%20standards%20of%20recent%20years: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2021/02/20/russian_modernization_of_its_nuclear_and_military_forces_in_2021_661111.html#:~:text=The%20increase%20in%20strategic%20nuclear,the%20standards%20of%20recent%20years


Processing URLs:  89%|████████▊ | 887/1000 [38:17<08:43,  4.64s/it]

Error extracting text from http://abcnews.go.com/Technology/wireStory/us-clears-living-drug-tough-childhood-leukemia-49513403: 404 Client Error: Not Found for url: https://abcnews.go.com/Technology/wireStory/us-clears-living-drug-tough-childhood-leukemia-49513403


Processing URLs:  89%|████████▉ | 888/1000 [38:18<06:37,  3.55s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-07-12/wall-street-s-biggest-venezuela-bond-contrarian-feels-vindicated


Processing URLs:  89%|████████▉ | 893/1000 [38:25<04:00,  2.24s/it]

Error extracting text from http://nation.com.pk/national/27-Mar-2017/pakistan-should-pay-785-mn-for-south-asian-university-saarc-members: 503 Server Error: Backend fetch failed for url: https://www.nation.com.pk/national/27-Mar-2017/pakistan-should-pay-785-mn-for-south-asian-university-saarc-members
URL filtered: https://www.youtube.com/watch?v=ihD3CaLpcRE


Processing URLs:  90%|████████▉ | 895/1000 [38:26<02:24,  1.37s/it]

Error extracting text from https://larswericson.wordpress.com/2015/10/09/my-voting-record-on-will-the-book-superforecasting-2015-be-on-the-new-york-times-bestsellers-list-by-the-end-of-october/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/10/09/my-voting-record-on-will-the-book-superforecasting-2015-be-on-the-new-york-times-bestsellers-list-by-the-end-of-october/


Processing URLs:  90%|████████▉ | 896/1000 [38:27<02:20,  1.35s/it]

URL filtered: http://www.foxnews.com/politics/2016/10/11/wikileaks-clinton-campaign-in-twitter-war-over-latest-leaks.html


Processing URLs:  90%|█████████ | 903/1000 [38:34<01:21,  1.18it/s]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/1982505979x0x889927/27EE2FDA-9C77-4D6A-8CEE-E8DFE45227BA/Q1_2016_Tesla_Shareholder_Letter.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/1982505979x0x889927/27EE2FDA-9C77-4D6A-8CEE-E8DFE45227BA/Q1_2016_Tesla_Shareholder_Letter.pdf
Error extracting text from http://www.wsj.com/articles/iraqi-city-of-fallujah-fully-liberated-from-islamic-state-iraqicommander-says-1466934423: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-city-of-fallujah-fully-liberated-from-islamic-state-iraqicommander-says-1466934423


Processing URLs:  91%|█████████ | 909/1000 [38:41<01:46,  1.17s/it]

Error extracting text from http://www.nytimes.com/2016/04/29/world/middleeast/with-iraq-mired-in-turmoil-some-call-for-partitioning-the-country.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/29/world/middleeast/with-iraq-mired-in-turmoil-some-call-for-partitioning-the-country.html


Processing URLs:  91%|█████████▏| 913/1000 [38:47<01:37,  1.12s/it]

Error extracting text from https://www.thestreet.com/story/14062147/1/trump-antitrust-pick-approved-of-at-amp-t-time-warner-merger-in-the-past.html: 403 Client Error: Forbidden for url: https://www.thestreet.com/story/14062147/1/trump-antitrust-pick-approved-of-at-amp-t-time-warner-merger-in-the-past.html


Processing URLs:  92%|█████████▏| 916/1000 [38:50<01:38,  1.17s/it]

Error extracting text from https://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/85775b52-ec99-4ad3-bbee-14826bdf86e5.pdf: 404 Client Error: Not Found for url: https://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/85775b52-ec99-4ad3-bbee-14826bdf86e5.pdf


Processing URLs:  92%|█████████▏| 917/1000 [38:50<01:16,  1.08it/s]

Error extracting text from http://www.cdm.me/english/snp-submitted-a-resolution-on-the-referendum-on-nato: 403 Client Error: Forbidden for url: https://www.cdm.me/english/snp-submitted-a-resolution-on-the-referendum-on-nato


Processing URLs:  92%|█████████▏| 923/1000 [38:58<01:25,  1.12s/it]

Error extracting text from http://www.wsj.com/articles/germanys-angela-merkel-becomes-unexpected-greek-ally-in-migrant-crisis-1456773578Print: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germanys-angela-merkel-becomes-unexpected-greek-ally-in-migrant-crisis-1456773578Print


Processing URLs:  92%|█████████▏| 924/1000 [39:01<01:56,  1.54s/it]

Error extracting text from http://en.trend.az/business/economy/2506204.html: 404 Client Error: Not Found for url: https://www.trend.az/business/economy/2506204.html
URL filtered: http://www.bloomberg.com/news/articles/2015-10-26/how-a-fed-rate-hike-could-actually-stimulate-the-u-s-economy
Error extracting text from http://www.nasdaq.com/markets/crude-oil.aspx?timeframe=18m: 403 Client Error: Forbidden for url: http://www.nasdaq.com/markets/crude-oil.aspx?timeframe=18m
URL filtered: http://www.bloomberg.com/news/articles/2015-09-08/these-three-charts-illustrate-the-fed-s-labor-market-dilemma


Processing URLs:  93%|█████████▎| 928/1000 [39:01<00:44,  1.61it/s]

Error extracting text from https://www.nytimes.com/2020/12/08/briefing/vaccine-don-gable-your-tuesday-briefing.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/08/briefing/vaccine-don-gable-your-tuesday-briefing.html


Processing URLs:  93%|█████████▎| 929/1000 [39:02<00:45,  1.56it/s]

Error extracting text from http://www.dailywire.com/news/11410/complete-list-radical-islamic-terror-attacks-us-james-barrett+: 404 Client Error: Not Found for url: https://www.dailywire.com/news/11410/complete-list-radical-islamic-terror-attacks-us-james-barrett+


Processing URLs:  93%|█████████▎| 931/1000 [39:10<02:32,  2.22s/it]

Error extracting text from http://rbth.com/news/2016/12/20/russia-iran-turkey-pledge-to-fight-against-islamic-state-nusra-together_664058: 404 Client Error: Not Found for url: https://www.rbth.com/news/2016/12/20/russia-iran-turkey-pledge-to-fight-against-islamic-state-nusra-together_664058


Processing URLs:  93%|█████████▎| 934/1000 [39:12<01:17,  1.17s/it]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#U85BV1okfsSbAGFM.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#U85BV1okfsSbAGFM.97
Error extracting text from http://www.nytimes.com/2017/12/10/world/asia/north-korea-submarine-missile.html?ribbon-ad-idx=7&amp;rref=world/asia&amp;module=Ribbon&amp;version=context&amp;region=Header&amp;action=click&amp;contentCollection=Asia%20Pacific&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/12/10/world/asia/north-korea-submarine-missile.html?ribbon-ad-idx=7&amp;rref=world/asia&amp;module=Ribbon&amp;version=context&amp;region=Header&amp;action=click&amp;contentCollection=Asia%20Pacific&amp;pgtype=article


Processing URLs:  94%|█████████▎| 937/1000 [39:13<00:34,  1.81it/s]

Error extracting text from https://www.nytimes.com/2017/06/21/world/asia/china-vietnam-south-china-sea.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/21/world/asia/china-vietnam-south-china-sea.html
Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BU0M9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BU0M9


Processing URLs:  94%|█████████▍| 938/1000 [40:13<18:37, 18.02s/it]

Error extracting text from http://www.ledger-enquirer.com/news/business/article131254869.html#storylink=cpy: HTTPConnectionPool(host='www.ledger-enquirer.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  94%|█████████▍| 942/1000 [40:22<06:04,  6.28s/it]

URL filtered: https://www.youtube.com/watch?v=zHGZtRKTUnU


Processing URLs:  94%|█████████▍| 944/1000 [40:23<03:23,  3.64s/it]

Error extracting text from http://www.reuters.com/article/us-usa-afghanistan-talks-idUSKCN0YN5T1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-afghanistan-talks-idUSKCN0YN5T1


Processing URLs:  95%|█████████▍| 947/1000 [40:25<01:43,  1.96s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-12-22/biden-camp-adds-pressure-on-eu-to-hit-the-brakes-on-china-deal


Processing URLs:  95%|█████████▌| 950/1000 [40:28<01:19,  1.58s/it]

URL filtered: https://www.linkedin.com/pulse/fake-news-reporting-feature-now-available-new-zealand-vaughn-davis


Processing URLs:  95%|█████████▌| 953/1000 [40:29<00:42,  1.11it/s]

Error extracting text from https://www.transparency.org/country/#SAU: 404 Client Error: Not Found for url: https://www.transparency.org/en/country/#SAU
Error extracting text from http://www.balkaninsight.com/en/article/orthodox-montenegrins-celebrate-christmas-divided-01-07-2016-1#sthash.JHtTwthf.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/orthodox-montenegrins-celebrate-christmas-divided-01-07-2016-1#sthash.JHtTwthf.dpuf


Processing URLs:  95%|█████████▌| 954/1000 [40:29<00:32,  1.40it/s]

Error extracting text from http://www.cbi.org.uk/news/stay-in-single-market-and-a-customs-union-until-final-deal-in-force/: 403 Client Error: Forbidden for url: http://www.cbi.org.uk/news/stay-in-single-market-and-a-customs-union-until-final-deal-in-force/


Processing URLs:  96%|█████████▌| 956/1000 [40:31<00:40,  1.09it/s]

Error extracting text from http://www.newsweek.com/diabetes-drug-could-be-anti-aging-miracle-404370: 403 Client Error: Forbidden for url: https://www.newsweek.com/diabetes-drug-could-be-anti-aging-miracle-404370


Processing URLs:  96%|█████████▌| 957/1000 [40:33<00:47,  1.11s/it]

Error extracting text from https://www.stopkillerrobots.org/2017/05/diplomatsfalter/: 404 Client Error: Not Found for url: https://www.stopkillerrobots.org/2017/05/diplomatsfalter/


Processing URLs:  96%|█████████▌| 961/1000 [40:36<00:28,  1.37it/s]

Error extracting text from http://www.intelligenceonline.com/government-intelligence/grey-areas/2016/03/02/is-puts-larfarge-in-cement-boots,108132550-ART: 403 Client Error: Forbidden for url: https://www.intelligenceonline.com/government-intelligence/grey-areas/2016/03/02/is-puts-larfarge-in-cement-boots,108132550-ART


Processing URLs:  97%|█████████▋| 966/1000 [40:43<00:48,  1.42s/it]

Error extracting text from http://www.businessinsider.com/r-syria-ceasefire-approaches-with-assad-emboldened-opposition-wary-2016-9: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-syria-ceasefire-approaches-with-assad-emboldened-opposition-wary-2016-9


Processing URLs:  97%|█████████▋| 968/1000 [40:45<00:42,  1.33s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-01/24/c_135040802.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-01/24/c_135040802.htm


Processing URLs:  97%|█████████▋| 970/1000 [40:48<00:38,  1.28s/it]

Error extracting text from http://www.brasil247.com/pt/247/poder/252566/54-senadores-j%C3%A1-se-declaram-a-favor-do-impeachment.htm: 404 Client Error: Not Found for url: https://www.brasil247.com:443/redirecting/extra/number/54


Processing URLs:  97%|█████████▋| 974/1000 [40:55<00:37,  1.43s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-un-idUSKCN0W34QN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-un-idUSKCN0W34QN


Processing URLs:  98%|█████████▊| 976/1000 [40:56<00:23,  1.01it/s]

Error extracting text from https://www.parliament.scot/ResearchBriefingsAndFactsheets/S5/SB_16-34_Election_2016.pdf: 403 Client Error: Forbidden for url: https://www.parliament.scot/ResearchBriefingsAndFactsheets/S5/SB_16-34_Election_2016.pdf
Error extracting text from http://www.france24.com/en/20160510-interview-adel-al-jubeir-saudi-fm-arabia-syria-assad-yemen-iran-israel: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160510-interview-adel-al-jubeir-saudi-fm-arabia-syria-assad-yemen-iran-israel


Processing URLs:  98%|█████████▊| 978/1000 [40:58<00:19,  1.16it/s]

Error extracting text from https://www.reuters.com/business/energy/oil-resumes-climb-large-us-oil-stocks-drawdown-2021-06-23/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-resumes-climb-large-us-oil-stocks-drawdown-2021-06-23/


Processing URLs:  98%|█████████▊| 979/1000 [40:58<00:14,  1.46it/s]

Error extracting text from https://www.yahoo.com/news/un-report-iran-deal-compliance-likely-saturday-sources-200039241.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/un-report-iran-deal-compliance-likely-saturday-sources-200039241.html


Processing URLs:  99%|█████████▊| 986/1000 [41:07<00:11,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-germany-election-idUSKCN0VX1NC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-idUSKCN0VX1NC


Processing URLs:  99%|█████████▊| 987/1000 [41:08<00:11,  1.18it/s]

Error extracting text from http://www.caam.org.cn/zhengceyanjiu/20151216/1605181551.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/zhengceyanjiu/20151216/1605181551.html


Processing URLs:  99%|█████████▉| 989/1000 [41:10<00:10,  1.03it/s]

Error extracting text from http://www.rp-online.de/politik/deutschland/spd-mitgliedervotum-zur-groko-verfassungsgericht-weist-eilantraege-ab-aid-1.7372442: 410 Client Error: Gone for url: http://www.rp-online.de/politik/deutschland/spd-mitgliedervotum-zur-groko-verfassungsgericht-weist-eilantraege-ab-aid-1.7372442
URL filtered: https://www.youtube.com/watch?v=Z82oZIoGNTE


Processing URLs:  99%|█████████▉| 991/1000 [41:11<00:05,  1.59it/s]

Error extracting text from http://www.newzimbabwe.com/news-28301-Burundi+on+brink+of+massive+violence+UN/news.aspx: 403 Client Error: Forbidden for url: http://www.newzimbabwe.com/news-28301-Burundi+on+brink+of+massive+violence+UN/news.aspx


Processing URLs:  99%|█████████▉| 992/1000 [41:12<00:06,  1.32it/s]

Error extracting text from http://www.nytimes.com/2015/12/31/world/europe/russia-putin-turkey-sanctions.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/31/world/europe/russia-putin-turkey-sanctions.html?_r=0


Processing URLs:  99%|█████████▉| 994/1000 [42:16<01:46, 17.74s/it]

Error extracting text from http://www.hortibiz.com/item/news/rus-turkish-tomato-ban-to-be-partially-lifted/: 404 Client Error: Not Found for url: https://www.hortibiz.com/item/news/rus-turkish-tomato-ban-to-be-partially-lifted/


Processing URLs: 100%|█████████▉| 997/1000 [42:17<00:20,  6.76s/it]

Error extracting text from http://www.ndb.int/Brics-bank-carves-100-green-niche-with-first-loan-rollout.php: 403 Client Error: Forbidden for url: https://www.ndb.int/Brics-bank-carves-100-green-niche-with-first-loan-rollout.php
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=fa&amp;u=http://www.tabnak.ir/fa/news/570076/: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=fa&amp;u=http://www.tabnak.ir/fa/news/570076/


Processing URLs: 100%|█████████▉| 999/1000 [42:20<00:03,  3.99s/it]

Error extracting text from https://www.nytimes.com/2017/07/22/world/asia/china-xi-jinping-sun-zhengcai-chongqing-.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/22/world/asia/china-xi-jinping-sun-zhengcai-chongqing-.html?_r=0


Processing URLs: 100%|██████████| 1000/1000 [42:22<00:00,  2.54s/it]


Error extracting text from http://www.militarytimes.com/story/military/2016/07/31/retaking-iraqs-isis-held-mosul-likely-prove-tricky-costly/87883640/: 404 Client Error: Not Found for url: https://www.militarytimes.com/story/military/2016/07/31/retaking-iraqs-isis-held-mosul-likely-prove-tricky-costly/87883640/


Processing URLs:   0%|          | 1/1000 [00:00<05:48,  2.87it/s]

Error extracting text from http://thehill.com/: 403 Client Error: Forbidden for url: https://thehill.com/


Processing URLs:   1%|          | 8/1000 [00:20<27:42,  1.68s/it]  

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-haley-idUSKBN1712QL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-haley-idUSKBN1712QL


Processing URLs:   1%|          | 11/1000 [00:34<55:13,  3.35s/it]  

Error extracting text from https://fcw.com/articles/2017/11/06/estonia-cyber-johnson.aspx: 404 Client Error: NOT FOUND for url: https://www.nextgov.com/articles/2017/11/06/estonia-cyber-johnson.aspx/


Processing URLs:   1%|          | 12/1000 [00:34<41:28,  2.52s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-cyber-russia/senator-says-russian-internet-trolls-stoked-nfl-debate-idUSKCN1C237J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-cyber-russia/senator-says-russian-internet-trolls-stoked-nfl-debate-idUSKCN1C237J


Processing URLs:   2%|▏         | 15/1000 [00:37<26:48,  1.63s/it]

Error extracting text from https://blog.boomsupersonic.com/an-insiders-look-at-flight-test-q-a-with-boom-s-chief-flight-test-engineer-c60e5d00b876: 403 Client Error: Forbidden for url: https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Fblog.boomsupersonic.com%2Fan-insiders-look-at-flight-test-q-a-with-boom-s-chief-flight-test-engineer-c60e5d00b876


Processing URLs:   2%|▏         | 16/1000 [00:53<1:27:28,  5.33s/it]

Error extracting text from https://www.almasdarnews.com/article/three-settlements-syria-join-ceasefire/: 522 Server Error:  for url: https://www.almasdarnews.com/article/three-settlements-syria-join-ceasefire/


Processing URLs:   2%|▏         | 24/1000 [01:12<38:08,  2.34s/it]  

Error extracting text from http://thehill.com/blogs/pundits-blog/foreign-policy/319864-president-trump-makes-nuclear-mistake-on-arms-control: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/foreign-policy/319864-president-trump-makes-nuclear-mistake-on-arms-control/


Processing URLs:   3%|▎         | 28/1000 [01:35<1:51:24,  6.88s/it]

Error extracting text from http://www.recode.net/2016/12/14/13955818/amazon-drone-delivery-uk-us-faa-testing: Exceeded 30 redirects.
URL filtered: https://twitter.com/hashtag/caucusforcruz


Processing URLs:   3%|▎         | 30/1000 [01:36<1:02:00,  3.84s/it]

Error extracting text from http://www.rand.org/blog/2016/10/can-the-islamic-state-lose-mosul-and-still-win.html?adbsc=social_20161106_1091671&amp;adbid=795079509196144640&amp;adbpl=tw&amp;adbpr=22545453: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2016/10/can-the-islamic-state-lose-mosul-and-still-win.html?adbsc=social_20161106_1091671&amp;adbid=795079509196144640&amp;adbpl=tw&amp;adbpr=22545453


Processing URLs:   4%|▎         | 35/1000 [01:46<46:09,  2.87s/it]  

Error extracting text from http://www.theaustralian.com.au/news/latest-news/iran-may-freeze-oil-production-opec/news-story/e0aa600c4a836d6a810918a8c4702b29: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/404.php


Processing URLs:   4%|▎         | 36/1000 [01:49<43:24,  2.70s/it]

URL filtered: https://www.youtube.com/watch?v=0rF5XftjRGM


Processing URLs:   4%|▍         | 42/1000 [01:55<21:43,  1.36s/it]

Error extracting text from http://cci.mit.edu/mciresearchpage.html: 404 Client Error: Not Found for url: https://cci.mit.edu/mciresearchpage.html
Error extracting text from http://www.nato.int/cps/en/natohq/topics_110496.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/topics_110496.htm


Processing URLs:   4%|▍         | 43/1000 [01:56<21:31,  1.35s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/china-expresses-concern-over-reported-duterte-south-china-sea-comments: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:   5%|▍         | 46/1000 [01:59<13:56,  1.14it/s]

Error extracting text from http://gr.ferhatbingol.com/2010/07/28/turks-with-green-passport-can-travel-to-greece-without-visa/: HTTPConnectionPool(host='gr.ferhatbingol.com', port=80): Max retries exceeded with url: /2010/07/28/turks-with-green-passport-can-travel-to-greece-without-visa/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304921df0>: Failed to resolve 'gr.ferhatbingol.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▍         | 47/1000 [02:00<16:26,  1.03s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN11A08M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN11A08M


Processing URLs:   5%|▌         | 50/1000 [02:02<12:54,  1.23it/s]

Error extracting text from https://preview.redd.it/anis8xmso4w81.jpg?width=768&amp;auto=webp&amp;s=08d813a52720bb6ebac3fb46979803d3b3a70e4e: 403 Client Error: Forbidden for url: https://preview.redd.it/anis8xmso4w81.jpg?width=768&amp;auto=webp&amp;s=08d813a52720bb6ebac3fb46979803d3b3a70e4e


Processing URLs:   5%|▌         | 52/1000 [02:04<12:54,  1.22it/s]

Error extracting text from http://www.defense.gov/News-Article-View/Article/778192/oir-commander-isil-campaign-shifts-toward-taking-mosul-and-raqqa: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/778192/oir-commander-isil-campaign-shifts-toward-taking-mosul-and-raqqa


Processing URLs:   5%|▌         | 54/1000 [02:07<15:06,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-usa-congress-perry-idUSKBN1531OZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-perry-idUSKBN1531OZ?il=0


Processing URLs:   6%|▌         | 56/1000 [02:10<22:23,  1.42s/it]

Error extracting text from http://blogs.wsj.com/economics/2015/09/17/parsing-the-fed-how-the-september-statement-changed-from-july/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/economics/2015/09/17/parsing-the-fed-how-the-september-statement-changed-from-july/
URL filtered: https://www.facebook.com/KeikoSofiaFujimoriHiguchi/photos/a.465409801595.253995.291182786595/10153484133926596/?type=3&amp;theater


Processing URLs:   6%|▌         | 60/1000 [02:12<12:19,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-china-parliament-idUSKBN16C007: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-parliament-idUSKBN16C007


Processing URLs:   6%|▌         | 61/1000 [02:13<13:18,  1.18it/s]

Error extracting text from https://www.ipos.me/en/polls/2016/02/24/elections/: 403 Client Error: Forbidden for url: https://www.ipos.me/en/polls/2016/02/24/elections/
URL filtered: https://twitter.com/BCAppelbaum/status/667055736556535808


Processing URLs:   6%|▋         | 63/1000 [02:14<10:57,  1.43it/s]

Error extracting text from http://finance.yahoo.com/news/two-tesla-production-chiefs-leave-163510735.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/two-tesla-production-chiefs-leave-163510735.html


Processing URLs:   7%|▋         | 70/1000 [02:32<52:55,  3.41s/it]

Error extracting text from http://www.scmagazine.com/us-air-force-cyberspace-weapon-first-to-reach-full-operational-status/article/466931/: 404 Client Error: Not Found for url: https://www.scmagazine.com/news/us-air-force-cyberspace-weapon-first-to-reach-full-operational-status


Processing URLs:   7%|▋         | 71/1000 [02:33<39:47,  2.57s/it]

Error extracting text from http://thehill.com/business-a-lobbying/347878-ex-cia-head-complained-lawmakers-didnt-get-gravity-of-russian: 403 Client Error: Forbidden for url: https://thehill.com/business-a-lobbying/347878-ex-cia-head-complained-lawmakers-didnt-get-gravity-of-russian/


Processing URLs:   7%|▋         | 74/1000 [02:34<18:24,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKCN18B02Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKCN18B02Y


Processing URLs:   8%|▊         | 75/1000 [02:35<14:44,  1.05it/s]

Error extracting text from http://www.manta.com/c/mm7skmv/orion-strategies: 403 Client Error: Forbidden for url: https://www.manta.com/c/mm7skmv/orion-strategies


Processing URLs:   8%|▊         | 77/1000 [02:36<10:40,  1.44it/s]

Error extracting text from http://www.nytimes.com/2007/11/08/arts/08soloway.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2007/11/08/arts/08soloway.html


Processing URLs:   8%|▊         | 80/1000 [02:40<16:28,  1.07s/it]

Error extracting text from http://www.iflscience.com/first-outbreak-polio-virus-africa-year: 404 Client Error: Not Found for url: https://www.iflscience.com/first-outbreak-polio-virus-africa-year
Error extracting text from https://www.reuters.com/article/idUSKBN2B71V7?il=0&amp;utm_source=reddit.com: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN2B71V7?il=0&amp;utm_source=reddit.com


Processing URLs:   9%|▊         | 86/1000 [02:46<15:37,  1.03s/it]

URL filtered: https://twitter.com/bscholl/status/1387604718264872961


Processing URLs:   9%|▉         | 89/1000 [02:47<08:56,  1.70it/s]

Error extracting text from http://www.nytimes.com/2016/01/29/opinion/moderates-under-pressure-in-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/29/opinion/moderates-under-pressure-in-iran.html


Processing URLs:   9%|▉         | 90/1000 [02:48<09:35,  1.58it/s]

Error extracting text from http://bigstory.ap.org/article/5dbd4d46948042468fa065c576c5e805/nato-chief-russia-interference-boosts-montenegro-chances: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/5dbd4d46948042468fa065c576c5e805/nato-chief-russia-interference-boosts-montenegro-chances (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303203800>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 94/1000 [02:50<08:24,  1.79it/s]

Error extracting text from http://www.nytimes.com/2015/09/19/world/europe/us-to-begin-military-talks-with-russia-on-syria.html?smid=fb-nytimes&amp;smtyp=cur&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/19/world/europe/us-to-begin-military-talks-with-russia-on-syria.html?smid=fb-nytimes&amp;smtyp=cur&amp;_r=0
Error extracting text from http://www.reuters.com/article/2015/10/08/brazil-court-cunha-idUSE6N10I06C20151008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/08/brazil-court-cunha-idUSE6N10I06C20151008
URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/brazil-s-lower-house-head-cunha-accepts-impeachment-request


Processing URLs:  10%|▉         | 96/1000 [02:51<05:16,  2.85it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0WQ0IR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0WQ0IR


Processing URLs:  10%|▉         | 97/1000 [02:52<07:51,  1.92it/s]

Error extracting text from https://2sjjwunnql41ia7ki31qqub1-wpengine.netdna-ssl.com/wp-content/uploads/2021/04/Final_38028217-Scotland-Poll-Scotsman-20210428_Private__.pdf: HTTPSConnectionPool(host='2sjjwunnql41ia7ki31qqub1-wpengine.netdna-ssl.com', port=443): Max retries exceeded with url: /wp-content/uploads/2021/04/Final_38028217-Scotland-Poll-Scotsman-20210428_Private__.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3032020f0>: Failed to resolve '2sjjwunnql41ia7ki31qqub1-wpengine.netdna-ssl.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  10%|█         | 102/1000 [03:00<18:41,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-nuclearpower-cyber-germany-idUSKCN0XN2OS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nuclearpower-cyber-germany-idUSKCN0XN2OS


Processing URLs:  10%|█         | 105/1000 [03:04<18:25,  1.24s/it]

Error extracting text from https://www.kansascityfed.org/~/media/files/publicat/econrev/econrevarchive/2015/3q15davigetal.pdf: 403 Client Error: Forbidden for url: https://www.kansascityfed.org/~/media/files/publicat/econrev/econrevarchive/2015/3q15davigetal.pdf


Processing URLs:  11%|█         | 110/1000 [03:10<15:40,  1.06s/it]

Error extracting text from http://www.nytimes.com/2016/01/23/world/europe/russians-anxiety-swells-as-oil-prices-collapse.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/23/world/europe/russians-anxiety-swells-as-oil-prices-collapse.html


Processing URLs:  11%|█         | 111/1000 [03:10<13:25,  1.10it/s]

Error extracting text from http://thehill.com/homenews/senate/261246-obstacles-imperil-year-end-budget-deal: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/261246-obstacles-imperil-year-end-budget-deal/


Processing URLs:  11%|█         | 112/1000 [03:11<11:32,  1.28it/s]

Error extracting text from http://africanarguments.org/2016/10/07/ethiopia-how-popular-uprising-became-the-only-option/: 403 Client Error: Forbidden for url: http://africanarguments.org/2016/10/07/ethiopia-how-popular-uprising-became-the-only-option/


Processing URLs:  11%|█▏        | 114/1000 [03:12<10:44,  1.38it/s]

Error extracting text from https://www.newsweek.com/us-base-attack-forces-joe-biden-juggle-retaliation-iran-nuclear-deal-hopes-1569887: 403 Client Error: Forbidden for url: https://www.newsweek.com/us-base-attack-forces-joe-biden-juggle-retaliation-iran-nuclear-deal-hopes-1569887


Processing URLs:  13%|█▎        | 126/1000 [03:39<22:32,  1.55s/it]

Error extracting text from https://efficientgov.com/blog/2017/11/02/phoenix-self-driving-cars-waymo/: 404 Client Error: Not Found for url: https://www.gov1.com/blog/2017/11/02/phoenix-self-driving-cars-waymo/
Error extracting text from http://www.cdm.me/english/white-house-has-started-a-process-of-ratifying-montenegros-protocol-for-nato: 403 Client Error: Forbidden for url: https://www.cdm.me/english/white-house-has-started-a-process-of-ratifying-montenegros-protocol-for-nato


Processing URLs:  13%|█▎        | 133/1000 [03:51<21:30,  1.49s/it]

Error extracting text from http://www.themoscowtimes.com/article/549908.html: 404 Client Error: Not Found for url: https://www.themoscowtimes.com/article/549908.html


Processing URLs:  14%|█▎        | 135/1000 [03:58<32:32,  2.26s/it]

Error extracting text from https://www.timesofisrael.com/liveblog-may-20-2021/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/liveblog-may-20-2021/


Processing URLs:  14%|█▍        | 138/1000 [04:00<17:13,  1.20s/it]

Error extracting text from http://www.latimes.com/world/la-fg-korean-war-20170925-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-korean-war-20170925-story.html


Processing URLs:  14%|█▍        | 140/1000 [04:03<20:19,  1.42s/it]

Error extracting text from http://www.peruthisweek.com/news-ppk-keiko-fujimori-tie-109434: HTTPConnectionPool(host='www.peruthisweek.com', port=80): Max retries exceeded with url: /news-ppk-keiko-fujimori-tie-109434 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5cb30>: Failed to resolve 'www.peruthisweek.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  14%|█▍        | 142/1000 [04:05<14:20,  1.00s/it]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN14W0MC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN14W0MC


Processing URLs:  14%|█▍        | 145/1000 [04:10<19:09,  1.34s/it]

URL filtered: http://www.bloombergview.com/articles/2015-10-23/ted-cruz-has-a-ben-carson-problem


Processing URLs:  15%|█▌        | 152/1000 [04:35<49:05,  3.47s/it]  

Error extracting text from http://www.thezensite.com/ZenTeachings/Dogen_Teachings/Uji_Welch.htm: 412 Client Error: Precondition Failed for url: http://www.thezensite.com/ZenTeachings/Dogen_Teachings/Uji_Welch.htm


Processing URLs:  16%|█▌        | 158/1000 [04:43<18:49,  1.34s/it]

URL filtered: https://twitter.com/ianbremmer/status/1426871735673098244


Processing URLs:  16%|█▌        | 161/1000 [04:44<09:13,  1.51it/s]

Error extracting text from http://thehill.com/homenews/senate/361633-trump-senate-gop-at-odds-over-roy-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/361633-trump-senate-gop-at-odds-over-roy-moore/
Error extracting text from http://www.reuters.com/article/2015/12/02/usa-fed-lockhart-idUSL1N13R11U20151202#qXcXWAZtLrGw3Uj4.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/02/usa-fed-lockhart-idUSL1N13R11U20151202#qXcXWAZtLrGw3Uj4.99


Processing URLs:  16%|█▌        | 162/1000 [04:44<08:39,  1.61it/s]

Error extracting text from http://thehill.com/homenews/campaign/364383-ala-drama-nears-an-explosive-end: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/364383-ala-drama-nears-an-explosive-end/
URL filtered: https://twitter.com/ZekeJMiller?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  16%|█▋        | 164/1000 [04:45<07:11,  1.94it/s]

Error extracting text from http://www.pravdareport.com/world/americas/12-01-2017/136613-venezuela-0/#sthash.0guBKUfS.dpuf: 404 Client Error: Not Found for url: https://www.pravda.ru/world/americas/12-01-2017/136613-venezuela-0/#sthash.0guBKUfS.dpuf


Processing URLs:  17%|█▋        | 170/1000 [04:53<13:41,  1.01it/s]

Error extracting text from https://www.yahoo.com/news/judgment-vote-looms-brazils-rousseff-135018622.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/judgment-vote-looms-brazils-rousseff-135018622.html
Error extracting text from http://www.reuters.com/article/us-iran-missiles-idUSKCN0WA0UY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-idUSKCN0WA0UY


Processing URLs:  17%|█▋        | 172/1000 [04:54<11:18,  1.22it/s]

Error extracting text from http://uk.reuters.com/article/2015/10/29/uk-russia-economy-idUKKCN0SN1OA20151029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  17%|█▋        | 174/1000 [04:55<10:43,  1.28it/s]

Error extracting text from http://www.caam.org.cn/hangye/20160913/1105198655.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/hangye/20160913/1105198655.html


Processing URLs:  18%|█▊        | 175/1000 [04:55<08:39,  1.59it/s]

Error extracting text from http://www.wsj.com/articles/eu-slaps-new-tariffs-on-china-taiwan-steel-imports-1485518002: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-slaps-new-tariffs-on-china-taiwan-steel-imports-1485518002


Processing URLs:  18%|█▊        | 181/1000 [05:07<23:06,  1.69s/it]

Error extracting text from https://80000hours.org/2017/11/prof-tetlock-predicting-the-future/: 403 Client Error: Forbidden for url: https://80000hours.org/2017/11/prof-tetlock-predicting-the-future/


Processing URLs:  18%|█▊        | 182/1000 [05:10<27:14,  2.00s/it]

Error extracting text from https://priceonomics.com/the-trade-of-the-century-when-george-soros-broke/: 403 Client Error: Forbidden for url: https://priceonomics.com/the-trade-of-the-century-when-george-soros-broke/


Processing URLs:  18%|█▊        | 183/1000 [05:11<23:22,  1.72s/it]

Error extracting text from http://time.com/4165875/iran-saudi-rivalry/: 404 Client Error: Not Found for url: https://time.com/4165875/iran-saudi-rivalry/


Processing URLs:  18%|█▊        | 185/1000 [05:15<25:28,  1.88s/it]

Error extracting text from https://www.reuters.com/markets/commodities/oil-rises-1-ahead-opec-meeting-under-omicron-cloud-2021-12-01/).: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/markets/commodities/oil-rises-1-ahead-opec-meeting-under-omicron-cloud-2021-12-01/).


Processing URLs:  19%|█▉        | 188/1000 [05:18<17:47,  1.31s/it]

Error extracting text from http://in.reuters.com/article/usa-fed-bangladesh-governor-idINKCN0WH0GW?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2AAfPak%20Daily%20Brief: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
URL filtered: http://www.bloomberg.com/news/articles/2016-03-13/fujimori-widens-lead-in-peru-president-race-after-rivals-barred
Error extracting text from https://www.reuters.com/article/us-germany-politics/german-parties-agree-on-cleaner-car-engines-as-coalition-talks-progress-idUSKBN1EZ10V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-parties-agree-on-cleaner-car-engines-as-coalition-talks-progress-idUSKBN1EZ10V


Processing URLs:  19%|█▉        | 194/1000 [05:25<19:53,  1.48s/it]

Error extracting text from https://english.ahram.org.eg/NewsContent/2/8/418064/World/Region/Israel-blames-Iran-over-lethal-attack-on-oil-tanke.aspx: 403 Client Error: Forbidden for url: https://english.ahram.org.eg/NewsContent/2/8/418064/World/Region/Israel-blames-Iran-over-lethal-attack-on-oil-tanke.aspx


Processing URLs:  20%|█▉        | 196/1000 [05:26<12:07,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/05/28/world/middleeast/iran-nuclear-deal-hassan-rouhani-donald-trump.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/28/world/middleeast/iran-nuclear-deal-hassan-rouhani-donald-trump.html?_r=0
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0WG2IQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0WG2IQ


Processing URLs:  20%|█▉        | 198/1000 [05:28<14:02,  1.05s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950423001016: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950423001016 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304922840>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  20%|██        | 203/1000 [05:35<18:42,  1.41s/it]

URL filtered: https://www.youtube.com/watch?v=bqHPhIdoAK4


Processing URLs:  21%|██        | 206/1000 [05:37<14:13,  1.07s/it]

Error extracting text from http://english.aawsat.com/2016/05/article55351154/iranian-generals-prepare-military-escalation-near-damascus-aleppo: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/05/article55351154/iranian-generals-prepare-military-escalation-near-damascus-aleppo


Processing URLs:  21%|██        | 207/1000 [05:39<14:46,  1.12s/it]

URL filtered: https://www.youtube.com/watch?v=dKrVegVI0Us


Processing URLs:  21%|██▏       | 213/1000 [05:42<07:24,  1.77it/s]

Error extracting text from https://www.middleeastmonitor.com/20210531-israel-minister-leads-settler-raid-of-al-aqsa-mosque/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20210531-israel-minister-leads-settler-raid-of-al-aqsa-mosque/
Error extracting text from https://www.fbi.gov/history/famous-cases/oklahoma-city-bombing: 403 Client Error: Forbidden for url: https://www.fbi.gov/history/famous-cases/oklahoma-city-bombing


Processing URLs:  22%|██▏       | 216/1000 [05:47<13:33,  1.04s/it]

Error extracting text from https://www.reuters.com/business/energy/us-waive-sanctions-firm-ceo-behind-russias-nord-stream-2-pipeline-source-2021-05-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/us-waive-sanctions-firm-ceo-behind-russias-nord-stream-2-pipeline-source-2021-05-19/


Processing URLs:  22%|██▏       | 219/1000 [05:49<08:22,  1.55it/s]

Error extracting text from http://www.reuters.com/article/2015/11/12/us-usa-fed-dudley-idUSKCN0T129520151112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/12/us-usa-fed-dudley-idUSKCN0T129520151112


Processing URLs:  22%|██▏       | 221/1000 [05:50<07:10,  1.81it/s]

Error extracting text from http://seekingalpha.com/article/3736116-opec-agrees-to-disagree-crude-oil-holds-on-to-its-40-price-support-oil-and-gas-mlps-at-all-time-low?ifp=0: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3736116-opec-agrees-to-disagree-crude-oil-holds-on-to-its-40-price-support-oil-and-gas-mlps-at-all-time-low?ifp=0
Error extracting text from http://www.nytimes.com/2015/10/01/world/europe/russia-airstrikes-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/01/world/europe/russia-airstrikes-syria.html


Processing URLs:  22%|██▏       | 222/1000 [05:53<16:52,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds-payment/venezuela-pdvsa-bondholders-to-receive-late-payment-by-thursday-sources-idUSKBN1D02U1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds-payment/venezuela-pdvsa-bondholders-to-receive-late-payment-by-thursday-sources-idUSKBN1D02U1
URL filtered: https://www.fool.com/investing/2021/05/21/what-silvergates-partnership-with-facebook-backed/


Processing URLs:  23%|██▎       | 226/1000 [05:54<09:07,  1.41it/s]

Error extracting text from https://www.wsj.com/articles/oil-prices-fall-as-investors-wait-for-opec-cuts-to-take-effect-1496141129: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-prices-fall-as-investors-wait-for-opec-cuts-to-take-effect-1496141129


Processing URLs:  23%|██▎       | 231/1000 [05:56<05:40,  2.26it/s]

URL filtered: https://www.youtube.com/watch?v=Lfvm09_Dtyo


Processing URLs:  23%|██▎       | 234/1000 [05:58<05:40,  2.25it/s]

Error extracting text from http://www.reuters.com/article/us-usa-russia-treaty-idUSKBN17220Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-treaty-idUSKBN17220Q


Processing URLs:  24%|██▎       | 235/1000 [05:59<06:14,  2.04it/s]



Processing URLs:  24%|██▎       | 237/1000 [06:59<2:44:51, 12.96s/it]

Error extracting text from https://archive.is/4NlCX#selection-599.221-599.328: HTTPSConnectionPool(host='archive.is', port=443): Read timed out. (read timeout=60)


Processing URLs:  24%|██▍       | 238/1000 [07:59<5:04:17, 23.96s/it]

Error extracting text from http://kremlin.ru/events/president/news/65942: HTTPConnectionPool(host='kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  24%|██▍       | 243/1000 [08:05<1:14:18,  5.89s/it]

Error extracting text from https://www.nytimes.com/2017/04/26/us/politics/nafta-executive-order-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/26/us/politics/nafta-executive-order-trump.html


Processing URLs:  25%|██▍       | 249/1000 [08:13<20:00,  1.60s/it]  

Error extracting text from https://www.npd.com/wps/portal/npd/us/news/press-releases/2021/after-a-slow-start--u-s--print-book-sales-rose-8-2-percent-in-2020--the-npd-group-says/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /wps/portal/npd/us/news/press-releases/2021/after-a-slow-start--u-s--print-book-sales-rose-8-2-percent-in-2020--the-npd-group-says/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))


Processing URLs:  25%|██▌       | 251/1000 [08:16<16:10,  1.30s/it]

Error extracting text from http://aranews.net/2017/02/isis-says-turkeys-safe-zone-in-syria-aimed-at-preventing-kurdish-state-pressuring-assad/: 404 Client Error: Not Found for url: http://aranews.net/2017/02/isis-says-turkeys-safe-zone-in-syria-aimed-at-preventing-kurdish-state-pressuring-assad/


Processing URLs:  25%|██▌       | 254/1000 [08:22<18:56,  1.52s/it]

Error extracting text from http://aranews.net/2016/04/isis-jihadi-slaughtered-mosul-sexually-harassing-woman/: 404 Client Error: Not Found for url: http://aranews.net/2016/04/isis-jihadi-slaughtered-mosul-sexually-harassing-woman/


Processing URLs:  26%|██▌       | 255/1000 [08:23<17:46,  1.43s/it]

URL filtered: https://www.youtube.com/watch?v=9waGXIEg4cg&amp;t=1220s


Processing URLs:  26%|██▌       | 260/1000 [08:26<09:20,  1.32it/s]

Error extracting text from http://evobsession.com/china-electric-car-sales-august-byd-takes-1-2-4/: 403 Client Error: Forbidden for url: http://evobsession.com/china-electric-car-sales-august-byd-takes-1-2-4/
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-china-idUSKBN1AJ0JZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-china-idUSKBN1AJ0JZ


Processing URLs:  26%|██▌       | 262/1000 [08:27<06:22,  1.93it/s]

Error extracting text from http://news.yahoo.com/japan-says-armed-chinese-vessel-spotted-off-disputed-085229513.html: 404 Client Error: Not Found for url: http://news.yahoo.com/japan-says-armed-chinese-vessel-spotted-off-disputed-085229513.html
URL filtered: https://www.bloomberg.com/news/articles/2017-03-07/in-german-campaigning-an-establishment-firebrand-targets-merkel


Processing URLs:  27%|██▋       | 267/1000 [08:35<15:55,  1.30s/it]

Error extracting text from http://www.ibtimes.com/us-military-readiness-question-amid-calls-syria-invasion-against-isis-2221023: 403 Client Error: Forbidden for url: https://www.ibtimes.com/us-military-readiness-question-amid-calls-syria-invasion-against-isis-2221023


Processing URLs:  27%|██▋       | 270/1000 [08:38<14:01,  1.15s/it]

Error extracting text from https://bit.ly/3mD2CD4: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20201204-maduro-allies-set-for-poll-victory-with-guaido-on-the-ropes


Processing URLs:  27%|██▋       | 271/1000 [08:39<10:40,  1.14it/s]

Error extracting text from https://www.nytimes.com/2021/06/16/us/politics/stephen-breyer-supreme-court-retirement.html?searchResultPosition=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/16/us/politics/stephen-breyer-supreme-court-retirement.html?searchResultPosition=1


Processing URLs:  28%|██▊       | 275/1000 [08:40<05:13,  2.31it/s]

Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/year-after-covid-vaccine-waiver-proposal-wto-talks-are-deadlocked-2021-10-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/year-after-covid-vaccine-waiver-proposal-wto-talks-are-deadlocked-2021-10-04/
Error extracting text from http://www.reuters.com/article/us-un-nuclear-usa-idUSKBN168468?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-nuclear-usa-idUSKBN168468?il=0
Error extracting text from http://www.nytimes.com/reuters/2016/01/21/world/21reuters-france-hollande.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/01/21/world/21reuters-france-hollande.html


Processing URLs:  28%|██▊       | 276/1000 [08:56<50:41,  4.20s/it]

Error extracting text from https://www.almasdarnews.com/article/video-assad-appears-alive-well-false-rumours-psychological-stress-stroke/: 522 Server Error:  for url: https://www.almasdarnews.com/article/video-assad-appears-alive-well-false-rumours-psychological-stress-stroke/


Processing URLs:  28%|██▊       | 278/1000 [08:58<33:32,  2.79s/it]

Error extracting text from https://www.theregister.co.uk/2017/02/22/spacex_capsule_iss_error/: 403 Client Error: Forbidden for url: https://www.theregister.com/2017/02/22/spacex_capsule_iss_error/


Processing URLs:  28%|██▊       | 283/1000 [09:08<24:01,  2.01s/it]

URL filtered: https://twitter.com/thomas_simy/status/932709414431322113


Processing URLs:  28%|██▊       | 285/1000 [09:08<14:14,  1.19s/it]

Error extracting text from https://nationalinterest.org/blog/buzz/putin-just-made-big-promise-russian-military-even-better-weapons-188888: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/buzz/putin-just-made-big-promise-russian-military-even-better-weapons-188888


Processing URLs:  29%|██▊       | 286/1000 [09:10<15:25,  1.30s/it]

Error extracting text from https://www.who.int/medicines/publications/druginformation/innlists/PL124-COVID.pdf: 404 Client Error: Not Found for url: https://www.who.int/medicines/publications/druginformation/innlists/PL124-COVID.pdf


Processing URLs:  30%|██▉       | 298/1000 [09:23<12:40,  1.08s/it]

Error extracting text from http://www.fxstreet.com/news/forex-news/article.aspx?storyid=f6ffe977-2743-4708-a282-084c9460e0d0: 410 Client Error: Gone for url: http://www.fxstreet.com/news/f6ffe977-2743-4708-a282-084c9460e0d0
URL filtered: http://www.politico.eu/article/peter-altmaier-chancellery-no-reason-for-plan-b-on-turkey-migration-deal-angela-merkel-adviser-immigration-refugees/?utm_content=bufferfae80&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer
URL filtered: http://english.yonhapnews.co.kr/news/2015/09/16/0200000000AEN20150916005100315.html?input=www.twitter.com


Processing URLs:  30%|███       | 302/1000 [09:29<15:26,  1.33s/it]

Error extracting text from http://www.eleccionesenperu.com/encuestas-presidenciales-peru.php: 436 Client Error:  for url: http://www.eleccionesenperu.com/encuestas-presidenciales-peru.php


Processing URLs:  30%|███       | 305/1000 [10:08<1:27:44,  7.57s/it]

Error extracting text from http://www.nytimes.com/2016/01/26/world/middleeast/un-envoy-for-syria-says-peace-talks-will-begin-friday.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/26/world/middleeast/un-envoy-for-syria-says-peace-talks-will-begin-friday.html


Processing URLs:  31%|███       | 308/1000 [10:09<36:06,  3.13s/it]  

Error extracting text from http://www.reuters.com/article/us-russia-putin-turkey-idUSKCN0XB0ZG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-putin-turkey-idUSKCN0XB0ZG


Processing URLs:  31%|███       | 309/1000 [10:10<28:30,  2.48s/it]

Error extracting text from https://balkaneu.com/north-macedonia-population-census-to-take-place-from-1-to-21-april-for-first-time-in-19-years/: 404 Client Error: Not Found for url: https://balkaneu.com/north-macedonia-population-census-to-take-place-from-1-to-21-april-for-first-time-in-19-years/


Processing URLs:  31%|███       | 312/1000 [10:12<15:05,  1.32s/it]

Error extracting text from http://www.latimes.com/opinion/op-ed/la-oe-shifrinson-russia-us-nato-deal--20160530-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/op-ed/la-oe-shifrinson-russia-us-nato-deal--20160530-snap-story.html


Processing URLs:  31%|███▏      | 314/1000 [10:16<19:09,  1.68s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2016/Oct-17/376809-syria-security-chief-in-first-public-foreign-visit-to-egypt.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Oct-17/376809-syria-security-chief-in-first-public-foreign-visit-to-egypt.ashx


Processing URLs:  32%|███▏      | 315/1000 [10:17<17:22,  1.52s/it]

Error extracting text from https://usukraine.org/news/articles/ukraine-uses-turkish-drone-for-the-first-time-warns-of-renewed-buildup-of-russian-troops-on-border/NjA4MjE=): 404 Client Error: Not Found for url: https://usukraine.org/news/articles/ukraine-uses-turkish-drone-for-the-first-time-warns-of-renewed-buildup-of-russian-troops-on-border/NjA4MjE=)


Processing URLs:  32%|███▏      | 317/1000 [10:22<21:23,  1.88s/it]

Error extracting text from https://www.lesswrong.com/users/lsusr: 403 Client Error: Forbidden for url: https://www.lesswrong.com/users/lsusr


Processing URLs:  32%|███▏      | 319/1000 [10:39<1:05:48,  5.80s/it]

Error extracting text from http://www.nanosresearch.com/tickers/PDF/POLNAT-S15-T674.pdf: 522 Server Error:  for url: http://www.nanosresearch.com/tickers/PDF/POLNAT-S15-T674.pdf


Processing URLs:  32%|███▏      | 323/1000 [10:42<22:59,  2.04s/it]  

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/vaccines/recommendations-process.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/vaccines/recommendations-process.html


Processing URLs:  33%|███▎      | 330/1000 [11:03<21:23,  1.92s/it]

Error extracting text from http://www.nytimes.com/2017/01/02/science/spacex-launch-rockets-explosion.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/01/02/science/spacex-launch-rockets-explosion.html?_r=0


Processing URLs:  33%|███▎      | 332/1000 [11:28<1:19:52,  7.18s/it]

Error extracting text from http://ir.teslamotors.com/releasedetail.cfm?ReleaseID=963460: HTTPConnectionPool(host='ir.teslamotors.com', port=80): Max retries exceeded with url: /releasedetail.cfm?ReleaseID=963460 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30153b500>: Failed to resolve 'ir.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  34%|███▎      | 335/1000 [11:29<34:42,  3.13s/it]  

Error extracting text from http://www.reuters.com/article/us-eu-google-antitrust-idUSKCN0XH0VX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-google-antitrust-idUSKCN0XH0VX
Error extracting text from https://www.reuters.com/article/us-taiwan-china-security-idUSKBN29S0BK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-taiwan-china-security-idUSKBN29S0BK


Processing URLs:  34%|███▍      | 340/1000 [11:33<15:24,  1.40s/it]

Error extracting text from https://t.co/6DhDNqSoaS&quot;&gt;https://t.co/6DhDNqSoaS&lt;/a&gt: 404 Client Error: Not Found for url: https://t.co/6DhDNqSoaS&quot;&gt;https://t.co/6DhDNqSoaS&lt;/a&gt


Processing URLs:  34%|███▍      | 342/1000 [11:38<20:13,  1.84s/it]

Error extracting text from http://marketrealist.com/2015/12/valuable-atmel-uks-dialog-semiconductor/: 404 Client Error: Not Found for url: https://marketrealist.com:443/2015/12/valuable-atmel-uks-dialog-semiconductor/


Processing URLs:  34%|███▍      | 344/1000 [11:43<21:46,  1.99s/it]

Error extracting text from https://www.nytimes.com/2017/07/22/us/politics/donald-trump-jeff-sessions.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/22/us/politics/donald-trump-jeff-sessions.html


Processing URLs:  35%|███▍      | 347/1000 [11:46<13:11,  1.21s/it]

Error extracting text from http://www.nytimes.com/2016/05/11/world/asia/bangladesh-executed-motiur-rahman-nizami.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/11/world/asia/bangladesh-executed-motiur-rahman-nizami.html?_r=0


Processing URLs:  35%|███▍      | 349/1000 [11:46<08:31,  1.27it/s]

Error extracting text from https://www.wsj.com/articles/house-gop-releases-plan-to-repeal-replace-health-law-1488842133: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-gop-releases-plan-to-repeal-replace-health-law-1488842133


Processing URLs:  35%|███▌      | 354/1000 [11:52<10:58,  1.02s/it]

Error extracting text from http://mainichi.jp/english/english/newsselect/news/20151112p2g00m0dm067000c.html: 404 Client Error: Not Found for url: http://mainichi.jp/english/english/newsselect/news/20151112p2g00m0dm067000c.html


Processing URLs:  36%|███▌      | 356/1000 [11:53<06:56,  1.54it/s]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/301155: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/301155


Processing URLs:  36%|███▌      | 359/1000 [11:57<11:42,  1.10s/it]

Error extracting text from https://www.flightglobal.com/business-aviation/supersonic-business-jet-developer-aerion-folds/143867.article: 403 Client Error: Forbidden for url: https://www.flightglobal.com/business-aviation/supersonic-business-jet-developer-aerion-folds/143867.article


Processing URLs:  36%|███▌      | 361/1000 [12:00<12:35,  1.18s/it]

Error extracting text from http://www.comres.co.uk/polls/daily-mail-itv-news-eu-referendum-poll/: 403 Client Error: Forbidden for url: http://comresglobal.com/polls/daily-mail-itv-news-eu-referendum-poll/


Processing URLs:  37%|███▋      | 368/1000 [12:12<18:57,  1.80s/it]

URL filtered: https://twitter.com/jensstoltenberg


Processing URLs:  37%|███▋      | 372/1000 [12:19<19:27,  1.86s/it]

Error extracting text from http://en.musicplayon.com/play?v=274541: HTTPConnectionPool(host='en.musicplayon.com', port=80): Max retries exceeded with url: /play?v=274541 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301539a30>: Failed to resolve 'en.musicplayon.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 376/1000 [12:22<10:25,  1.00s/it]

Error extracting text from http://www.washingtontimes.com/news/2016/feb/26/john-boehner-says-iran-elections-are-phony/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/feb/26/john-boehner-says-iran-elections-are-phony/


Processing URLs:  38%|███▊      | 377/1000 [12:25<17:04,  1.64s/it]

Error extracting text from http://thebulletin.org/north-korea’s-“not-quite”-icbm-can’t-hit-lower-48-states11012: 404 Client Error: Not Found for url: https://thebulletin.org/north-korea%E2%80%99s-%E2%80%9Cnot-quite%E2%80%9D-icbm-can%E2%80%99t-hit-lower-48-states11012/


Processing URLs:  38%|███▊      | 380/1000 [12:41<38:36,  3.74s/it]

Error extracting text from https://www.nytimes.com/2021/03/26/us/far-right-extremism-anti-vaccine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/26/us/far-right-extremism-anti-vaccine.html


Processing URLs:  38%|███▊      | 382/1000 [12:46<29:19,  2.85s/it]

Error extracting text from http://nautilus.org/publications/books/dprkbb/russia/dprk-briefing-book-russian-policy-on-the-north-korean-nuclear-crisis/: 403 Client Error: Forbidden for url: http://nautilus.org/publications/books/dprkbb/russia/dprk-briefing-book-russian-policy-on-the-north-korean-nuclear-crisis/
URL filtered: https://www.youtube.com/watch?v=bDJb8WOJYdA


Processing URLs:  38%|███▊      | 385/1000 [12:50<19:03,  1.86s/it]

Error extracting text from http://www.nytimes.com/2015/12/29/world/middleeast/iran-hands-over-stockpile-of-enriched-uranium-to-russia.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/29/world/middleeast/iran-hands-over-stockpile-of-enriched-uranium-to-russia.html?_r=0


Processing URLs:  39%|███▉      | 388/1000 [12:55<19:35,  1.92s/it]

Error extracting text from http://www.turkishweekly.net/2015/03/09/news/montenegro-shrugs-off-french-veto-on-nato-enlargement/: 404 Client Error: Not Found for url: https://turkishweekly.net/2015/03/09/news/montenegro-shrugs-off-french-veto-on-nato-enlargement/


Processing URLs:  39%|███▉      | 390/1000 [13:01<24:03,  2.37s/it]

Error extracting text from https://www.niaid.nih.gov/news-events/biden-administration-invest-3-billion-american-rescue-plan-part-covid-19-antiviral: 403 Client Error: Forbidden for url: https://www.niaid.nih.gov/news-events/biden-administration-invest-3-billion-american-rescue-plan-part-covid-19-antiviral


Processing URLs:  39%|███▉      | 391/1000 [13:02<21:15,  2.09s/it]

Error extracting text from https://politicalwire.com/2021/06/23/where-each-senate-democrat-stands-on-the-filibuster/: 403 Client Error: Forbidden for url: https://politicalwire.com/2021/06/23/where-each-senate-democrat-stands-on-the-filibuster/


Processing URLs:  39%|███▉      | 392/1000 [13:03<17:06,  1.69s/it]

Error extracting text from https://www.forbes.com/sites/henrymiller/2016/12/21/desperately-seeking-an-fda-commissioner-a-critical-influencer-of-the-nations-health-and-economy/: 410 Client Error: Gone for url: https://www.forbes.com/sites/henrymiller/2016/12/21/desperately-seeking-an-fda-commissioner-a-critical-influencer-of-the-nations-health-and-economy/


Processing URLs:  39%|███▉      | 394/1000 [13:06<16:47,  1.66s/it]

Error extracting text from http://www.huffingtonpost.in/dr-mehrdad-khonsari-/the-war-against-isis-need_b_9297552.html?utm_hp_ref=india: 404 Client Error: Not Found for url: https://www.huffpost.com/news/topic/india/dr-mehrdad-khonsari-/the-war-against-isis-need_b_9297552.html?utm_hp_ref=india


Processing URLs:  40%|███▉      | 395/1000 [13:07<14:51,  1.47s/it]

Error extracting text from https://phys.org/news/2013-12-creation-entanglement-simultaneously-wormhole.html: 400 Client Error: Bad request for url: https://phys.org/news/2013-12-creation-entanglement-simultaneously-wormhole.html


Processing URLs:  40%|███▉      | 397/1000 [13:09<11:15,  1.12s/it]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/09/14/venezuela-averts-debt-default-again/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/09/14/venezuela-averts-debt-default-again/


Processing URLs:  40%|████      | 402/1000 [13:22<16:41,  1.67s/it]

URL filtered: https://www.youtube.com/watch?v=UWHrqXw7Rus
Error extracting text from http://www.nytimes.com/2016/01/14/us/politics/ted-cruz-wall-street-loan-senate-bid-2012.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/us/politics/ted-cruz-wall-street-loan-senate-bid-2012.html


Processing URLs:  41%|████      | 406/1000 [13:26<10:39,  1.08s/it]

Error extracting text from http://www.reuters.com/article/us-japan-china-idUSKCN10J08A?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-japan-china-idUSKCN10J08A?il=0


Processing URLs:  41%|████      | 409/1000 [13:32<17:15,  1.75s/it]

Error extracting text from http://www.znbc.co.zm/?p=30928: 404 Client Error: Not Found for url: https://www.znbc.co.zm/?p=30928


Processing URLs:  41%|████      | 412/1000 [13:38<18:37,  1.90s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-ireland-idUKKBN17U134: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  41%|████▏     | 414/1000 [13:40<13:33,  1.39s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/10/03/0200000000AEN20151003002000315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  42%|████▏     | 416/1000 [13:50<28:17,  2.91s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2685.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2685.htm


Processing URLs:  42%|████▏     | 419/1000 [13:54<15:31,  1.60s/it]

Error extracting text from http://www.wsj.com/articles/potential-saudi-aramco-ipo-wont-include-reserves-1453627558: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/potential-saudi-aramco-ipo-wont-include-reserves-1453627558


Processing URLs:  42%|████▎     | 425/1000 [14:04<14:09,  1.48s/it]

Error extracting text from https://cleantechnica.com/2017/11/22/electric-versions-opel-corsa-peugeot-208-coming-share-platform/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2017/11/22/electric-versions-opel-corsa-peugeot-208-coming-share-platform/


Processing URLs:  43%|████▎     | 429/1000 [14:06<07:12,  1.32it/s]

Error extracting text from http://www.nytimes.com/2015/10/21/world/europe/ira-lives-on-but-poses-no-serious-threat-british-review-says.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/21/world/europe/ira-lives-on-but-poses-no-serious-threat-british-review-says.html


Processing URLs:  43%|████▎     | 433/1000 [14:09<05:35,  1.69it/s]

Error extracting text from https://www.reuters.com/world/americas/brazils-bolsonaro-disapproval-rating-rises-all-time-high-poll-2021-07-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazils-bolsonaro-disapproval-rating-rises-all-time-high-poll-2021-07-08/


Processing URLs:  44%|████▎     | 435/1000 [14:09<03:31,  2.67it/s]

Error extracting text from https://www.wsj.com/articles/errant-russian-strike-kills-turkish-soldiers-in-syria-1486666027: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/errant-russian-strike-kills-turkish-soldiers-in-syria-1486666027


Processing URLs:  44%|████▍     | 438/1000 [14:13<07:49,  1.20it/s]

Error extracting text from https://www.nytimes.com/2021/12/17/world/europe/russia-nato-security-deal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/12/17/world/europe/russia-nato-security-deal.html


Processing URLs:  44%|████▍     | 440/1000 [14:14<06:04,  1.54it/s]

Error extracting text from https://www.datacenterdynamics.com/en/news/ibms-managed-infrastructure-spin-off-to-be-called-kyndryl/: 403 Client Error: Forbidden for url: https://www.datacenterdynamics.com/en/news/ibms-managed-infrastructure-spin-off-to-be-called-kyndryl/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.gazetadopovo.com.br/vida-publica/lava-jato-e-pressao-por-impeachment-deixam-sustentacao-politica-de-dilma-por-um-triz-1b9fdrh42x3tk7f659uloylud&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.gazetadopovo.com.br/vida-publica/lava-jato-e-pressao-por-impeachment-deixam-sustentacao-politica-de-dilma-por-um-triz-1b9fdrh42x3tk7f659uloylud&amp;prev=search


Processing URLs:  44%|████▍     | 443/1000 [14:20<12:53,  1.39s/it]

Error extracting text from https://go.allout.org/en/a/venice-mayor/: 404 Client Error: Not Found for url: https://campaigns.allout.org/venice-mayor/
URL filtered: https://twitter.com/dandrezner/status/695441370018766853


Processing URLs:  45%|████▍     | 448/1000 [14:25<11:01,  1.20s/it]

Error extracting text from http://www.elombah.com/index.php/gallery/6296-biafra-evidence-of-genocide-by-buhari-and-cohorts-photo: 403 Client Error: Forbidden for url: http://www.elombah.com/index.php/gallery/6296-biafra-evidence-of-genocide-by-buhari-and-cohorts-photo
Error extracting text from http://www.reuters.com/article/us-usa-saudi-yemen-idUSKBN17M2ZK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-saudi-yemen-idUSKBN17M2ZK


Processing URLs:  45%|████▌     | 450/1000 [14:27<09:59,  1.09s/it]

URL filtered: https://twitter.com/HLForum/status/1042670700652318720


Processing URLs:  45%|████▌     | 453/1000 [14:30<11:05,  1.22s/it]

Error extracting text from https://www.state.gov/documents/organization/270603.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/


Processing URLs:  46%|████▌     | 458/1000 [14:39<12:46,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-nato-idUSKBN0TQ0HU20151207: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-nato-idUSKBN0TQ0HU20151207


Processing URLs:  47%|████▋     | 467/1000 [15:04<15:07,  1.70s/it]

Error extracting text from http://www.talkingnewmedia.com/2016/07/19/two-digital-media-companies-report-earnings-but-media-could-care-less-about-bottom-line/: 406 Client Error: Not Acceptable for url: http://www.talkingnewmedia.com/2016/07/19/two-digital-media-companies-report-earnings-but-media-could-care-less-about-bottom-line/


Processing URLs:  47%|████▋     | 470/1000 [15:05<07:22,  1.20it/s]

Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-police-explainer/netanyahu-what-happens-next-idUSKCN1FX2WZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-police-explainer/netanyahu-what-happens-next-idUSKCN1FX2WZ
Error extracting text from http://bigstory.ap.org/article/e19abf78b6fe43e7b7719f059901630d/apnewsbreak-govt-finds-top-secret-info-clinton-emails: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/e19abf78b6fe43e7b7719f059901630d/apnewsbreak-govt-finds-top-secret-info-clinton-emails (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30398b980>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  47%|████▋     | 472/1000 [15:07<09:27,  1.08s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-21/wall-street-exhales-as-venezuela-delivers-bond-payment-once-more


Processing URLs:  48%|████▊     | 475/1000 [15:10<08:10,  1.07it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/burundi-people-killed-gunfire-linked-rebels-38331995: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/burundi-people-killed-gunfire-linked-rebels-38331995


Processing URLs:  48%|████▊     | 476/1000 [15:14<15:22,  1.76s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-politics-idUSKCN2AV2VI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKCN2AV2VI


Processing URLs:  48%|████▊     | 481/1000 [15:17<09:17,  1.07s/it]

Error extracting text from http://ec.europa.eu/trade/policy/in-focus/ttip/about-ttip/questions-and-answers/#expandable-benefit-people: 404 Client Error: (Not Found) for url: https://ec.europa.eu/policy/in-focus/ttip/about-ttip/questions-and-answers/#expandable-benefit-people


Processing URLs:  48%|████▊     | 483/1000 [15:18<07:31,  1.15it/s]

Error extracting text from https://m.geo.tv/#category%7Clatest-news%7Cp106813-Indian-Prime-Minister-Modi-wishes-Nawaz-Sharif-speedy-recovery: HTTPSConnectionPool(host='m.geo.tv', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3029d5610>: Failed to resolve 'm.geo.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  49%|████▉     | 488/1000 [15:24<08:55,  1.05s/it]

Error extracting text from http://www.rand.org/pubs/research_reports/RR1478.html: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/research_reports/RR1478.html


Processing URLs:  49%|████▉     | 489/1000 [15:25<07:41,  1.11it/s]

Error extracting text from http://thehill.com/blogs/congress-blog/foreign-policy/296130-eastern-europe-risks-losing-un-top-job-race: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/foreign-policy/296130-eastern-europe-risks-losing-un-top-job-race/


Processing URLs:  49%|████▉     | 492/1000 [16:30<2:38:24, 18.71s/it]

Error extracting text from http://www.dailytech.com/USB+Stick+Led+to+Worst+Cyber+Attack+on+US+Military+Russia+Suspected/article19458.htm: HTTPConnectionPool(host='www.dailytech.com', port=80): Max retries exceeded with url: /USB+Stick+Led+to+Worst+Cyber+Attack+on+US+Military+Russia+Suspected/article19458.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3029d7fe0>, 'Connection to www.dailytech.com timed out. (connect timeout=60)'))
URL filtered: https://www.bloomberg.com/news/articles/2021-08-20/brazil-justice-takes-aim-at-bolsonaro-allies-for-attacking-court


Processing URLs:  50%|████▉     | 497/1000 [16:36<40:41,  4.85s/it]  

Error extracting text from https://trends.google.com/trends/explore?date=all&q=russia: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?date=all&q=russia


Processing URLs:  50%|█████     | 502/1000 [16:38<10:20,  1.25s/it]

Error extracting text from http://www.nasdaq.com/article/german-prosecutors-california-regulator-open-fresh-vw-probes-20151125-00643#ixzz3sYbQ3DX8: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/german-prosecutors-california-regulator-open-fresh-vw-probes-20151125-00643#ixzz3sYbQ3DX8


Processing URLs:  51%|█████     | 506/1000 [16:44<10:54,  1.32s/it]

Error extracting text from https://www.sciencedaily.com/releases/2021/08/210831095614.htm: 403 Client Error: Forbidden for url: https://www.sciencedaily.com/releases/2021/08/210831095614.htm


Processing URLs:  51%|█████     | 511/1000 [17:24<1:26:06, 10.56s/it]

Error extracting text from http://www.todayszaman.com/diplomacy_perils-ahead-for-turkeys-possible-anti-isil-ground-offensive-in-syria_404122.html: 522 Server Error:  for url: http://www.todayszaman.com/diplomacy_perils-ahead-for-turkeys-possible-anti-isil-ground-offensive-in-syria_404122.html


Processing URLs:  52%|█████▏    | 516/1000 [17:42<34:50,  4.32s/it]  

Error extracting text from https://cryptobriefing.com/are-cdps-the-new-cdos/: 403 Client Error: Forbidden for url: https://cryptobriefing.com/are-cdps-the-new-cdos/


Processing URLs:  52%|█████▏    | 520/1000 [17:46<12:07,  1.52s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-ruling-china-insight-idUSKCN10B10G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-ruling-china-insight-idUSKCN10B10G


Processing URLs:  52%|█████▏    | 521/1000 [17:46<09:02,  1.13s/it]

Error extracting text from https://www.nytimes.com/2018/05/20/arts/television/meghan-markle-royal-wedding-blackness.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/05/20/arts/television/meghan-markle-royal-wedding-blackness.html


Processing URLs:  52%|█████▎    | 525/1000 [17:51<06:47,  1.17it/s]

Error extracting text from http://www.reuters.com/article/apple-iphone-idUSKBN0UJ1WC20160106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/apple-iphone-idUSKBN0UJ1WC20160106


Processing URLs:  53%|█████▎    | 527/1000 [17:54<08:23,  1.06s/it]

Error extracting text from https://www.c-span.org/video/?420932-101/attorney-general-nominee-jeff-sessions-testifies-confirmation-hearing&amp;start=17545: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?420932-101/attorney-general-nominee-jeff-sessions-testifies-confirmation-hearing&amp;start=17545


Processing URLs:  53%|█████▎    | 530/1000 [17:57<07:13,  1.08it/s]

Error extracting text from https://en-maktoob.news.yahoo.com/analysis-jailing-reporter-reflects-iran-power-struggle-074924196.html: 404 Client Error: Not Found for url: https://uk.news.yahoo.com/analysis-jailing-reporter-reflects-iran-power-struggle-074924196.html
URL filtered: http://www.digitaltrends.com/social-media/facebook-fake-news-germany/


Processing URLs:  54%|█████▎    | 535/1000 [18:01<07:06,  1.09it/s]

Error extracting text from http://www.cnas.org/sites/default/files/publications-pdf/CNAS%20Maritime%206_Parameswaran_Final.pdf: 404 Client Error: Not Found for url: https://www.cnas.org:443/sites/default/files/publications-pdf/CNAS%20Maritime%206_Parameswaran_Final.pdf
Error extracting text from https://www.yahoo.com/news/obama-oks-sanctions-against-nkorea-195321314.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/obama-oks-sanctions-against-nkorea-195321314.html


Processing URLs:  54%|█████▍    | 539/1000 [18:05<06:02,  1.27it/s]

Error extracting text from https://www.reuters.com/business/aerospace-defense/delta-sees-return-profit-consumer-travel-demand-hits-historic-levels-2022-04-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/aerospace-defense/delta-sees-return-profit-consumer-travel-demand-hits-historic-levels-2022-04-13/
Error extracting text from http://www.etftrends.com/2015/09/its-takeoff-not-liftoff/: 403 Client Error: Forbidden for url: https://www.etftrends.com/2015/09/its-takeoff-not-liftoff/
URL filtered: https://www.youtube.com/watch?v=GgIsyoxZ7Uw


Processing URLs:  54%|█████▍    | 541/1000 [18:05<04:00,  1.91it/s]

Error extracting text from https://www.yahoo.com/news/russia-blocks-un-statement-calling-n-korea-sanctions-145320489.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/russia-blocks-un-statement-calling-n-korea-sanctions-145320489.html


Processing URLs:  55%|█████▍    | 545/1000 [18:20<13:43,  1.81s/it]

Error extracting text from http://finance.yahoo.com/news/turkey-poised-battle-kurds-syrian-193645973.html;_ylt=A0LEVxqUV8JW2K8AkjVXNyoA;_ylu=X3oDMTEyNW84ODh0BGNvbG8DYmYxBHBvcwMxBHZ0aWQDQjE1NjNfMQRzZWMDc2M-: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/turkey-poised-battle-kurds-syrian-193645973.html


Processing URLs:  55%|█████▍    | 546/1000 [19:20<2:15:04, 17.85s/it]

Error extracting text from https://www.miamiherald.com/news/politics-government/state-politics/article248952689.html: HTTPSConnectionPool(host='www.miamiherald.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  55%|█████▌    | 551/1000 [19:30<35:51,  4.79s/it]  

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-iowa-presidential-republican-primary: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-iowa-presidential-republican-primary


Processing URLs:  55%|█████▌    | 553/1000 [19:31<19:03,  2.56s/it]

Error extracting text from http://www.reuters.com/article/mideast-crisis-russia-syria-idUSR4N15V00V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-crisis-russia-syria-idUSR4N15V00V


Processing URLs:  55%|█████▌    | 554/1000 [19:32<15:25,  2.08s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/nkorea-could-return-icbm-nuclear-tests-this-year-us-intelligence-report-2022-03-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/nkorea-could-return-icbm-nuclear-tests-this-year-us-intelligence-report-2022-03-08/


Processing URLs:  56%|█████▌    | 558/1000 [19:37<11:44,  1.59s/it]

Error extracting text from https://apple.news/A_aGDCh4BRK6W7hJXa1JNIg: 404 Client Error: Not Found for url: https://apple.news/A_aGDCh4BRK6W7hJXa1JNIg


Processing URLs:  56%|█████▌    | 562/1000 [19:43<10:38,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-burundi-security-idUSKCN0WS0GB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-security-idUSKCN0WS0GB


Processing URLs:  56%|█████▋    | 564/1000 [19:47<10:45,  1.48s/it]

Error extracting text from https://www.reuters.com/business/energy/germany-has-four-months-certify-nord-stream-2-pipeline-2021-09-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/germany-has-four-months-certify-nord-stream-2-pipeline-2021-09-13/


Processing URLs:  57%|█████▋    | 567/1000 [19:48<06:52,  1.05it/s]

Error extracting text from http://www.bbc.com/weather/2328926: 404 Client Error: Not Found for url: https://www.bbc.com/weather/2328926


Processing URLs:  57%|█████▋    | 568/1000 [19:50<08:11,  1.14s/it]

Error extracting text from http://tass.ru/en/politics/843832: 404 Client Error: Not Found for url: https://tass.ru/en/politics/843832


Processing URLs:  57%|█████▋    | 571/1000 [20:58<2:15:31, 18.96s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-10-01/afghan-security-forces-killed-in-friendly-fire-incident: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  57%|█████▋    | 573/1000 [21:00<1:11:21, 10.03s/it]

Error extracting text from http://www.bq-magazine.com/economy/socioeconomics/2015/04/uae-population-by-nationality: 403 Client Error: Forbidden for url: http://www.bq-magazine.com/economy/socioeconomics/2015/04/uae-population-by-nationality


Processing URLs:  58%|█████▊    | 578/1000 [21:07<19:43,  2.80s/it]  

Error extracting text from http://www.reuters.com/article/us-china-southchinasea-flights-idUSKCN0WD17D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-southchinasea-flights-idUSKCN0WD17D


Processing URLs:  58%|█████▊    | 585/1000 [22:17<2:14:19, 19.42s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-11-27/following-missile-deal-nato-forced-to-shrug-off-turkeys-closer-ties-with-russia: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  59%|█████▊    | 587/1000 [22:20<1:10:46, 10.28s/it]

Error extracting text from https://in.reuters.com/article/us-usa-venezuela-sanctions-idINKCN1B32PD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  59%|█████▉    | 591/1000 [22:27<25:28,  3.74s/it]  

Error extracting text from http://news.yahoo.com/myanmar-presidential-nominees-named-march-17-093558048.html: 404 Client Error: Not Found for url: http://news.yahoo.com/myanmar-presidential-nominees-named-march-17-093558048.html
Error extracting text from http://bigstory.ap.org/article/78c98f06474d4481bcb9c50f43d0cb0a/pentagon-2-us-navy-boats-held-iran-will-be-returned: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/78c98f06474d4481bcb9c50f43d0cb0a/pentagon-2-us-navy-boats-held-iran-will-be-returned (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303322360>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  60%|█████▉    | 595/1000 [22:32<13:46,  2.04s/it]

URL filtered: https://www.youtube.com/watch?v=2b3ttqYDwF0


Processing URLs:  60%|█████▉    | 598/1000 [22:34<09:10,  1.37s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-26/iran-dismisses-u-s-cyber-attack-charges-for-lack-of-evidence
Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-7-august-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-7-august-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303008d70>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  60%|██████    | 603/1000 [22:46<15:09,  2.29s/it]

Error extracting text from http://thehill.com/opinion/international/355742-russias-has-weaponized-the-energy-sector-in-war-against-the-west: 403 Client Error: Forbidden for url: https://thehill.com/opinion/international/355742-russias-has-weaponized-the-energy-sector-in-war-against-the-west/
URL filtered: https://www.cnet.com/news/white-nationalist-jared-taylor-american-renaissance-sues-twitter-for-account-suspension/


Processing URLs:  61%|██████    | 607/1000 [22:49<08:49,  1.35s/it]

URL filtered: http://www.reuters.com/article/us-usa-databreaches/former-yahoo-ceo-apologizes-for-data-breach-blames-russians-idUSKBN1D825V?utm_source=twitter&amp;utm_medium=Social


Processing URLs:  61%|██████    | 612/1000 [22:54<06:41,  1.03s/it]

Error extracting text from http://www.federalreserve.gov/faqs/money_12848.htm: 404 Client Error: Not Found for url: https://www.federalreserve.gov/faqs/money_12848.htm


Processing URLs:  62%|██████▏   | 618/1000 [23:03<07:50,  1.23s/it]

Error extracting text from http://www.wsj.com/articles/north-korean-leader-threatens-sacred-war-pledges-economic-growth-1451640181: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korean-leader-threatens-sacred-war-pledges-economic-growth-1451640181


Processing URLs:  62%|██████▏   | 620/1000 [23:07<10:25,  1.65s/it]

URL filtered: http://uk.reuters.com/article/uk-turkey-eu-idUKKCN0ZG0RC?feedType=RSS&amp;feedName=worldNews&amp;utm_source=twitterfeed&amp;utm_medium=twitter&amp;utm_campaign=Feed%3A+Reuters%2FUKWorldNews+%28News+%2F+UK+%2F+World+News%29


Processing URLs:  62%|██████▏   | 623/1000 [23:08<05:32,  1.13it/s]

Error extracting text from http://www.reuters.com/article/2015/09/16/us-mideast-crisis-syria-pentagon-idUSKCN0RG22K20150916: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/16/us-mideast-crisis-syria-pentagon-idUSKCN0RG22K20150916


Processing URLs:  63%|██████▎   | 626/1000 [23:12<05:57,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-usa-election-japan-idUSKBN1360IU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-japan-idUSKBN1360IU


Processing URLs:  63%|██████▎   | 629/1000 [23:13<03:18,  1.87it/s]

Error extracting text from http://www.wsj.com/articles/china-deploys-missiles-on-disputed-island-in-south-china-sea-1455684150: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-deploys-missiles-on-disputed-island-in-south-china-sea-1455684150
Error extracting text from https://www.yahoo.com/news/model-3-delivery-promises-look-160029228.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/model-3-delivery-promises-look-160029228.html


Processing URLs:  63%|██████▎   | 631/1000 [23:15<04:46,  1.29it/s]

Error extracting text from http://www.yenisafak.com/en/news/turkish-forces-kill-152-pkk-terrorists-in-october-2797301: 422 Client Error:  for url: http://www.yenisafak.com/en/news/turkish-forces-kill-152-pkk-terrorists-in-october-2797301


Processing URLs:  64%|██████▍   | 638/1000 [23:32<16:48,  2.78s/it]

Error extracting text from http://www.pherson.org/our-staff/: 404 Client Error: Not Found for url: https://www.pherson.org/our-staff/


Processing URLs:  64%|██████▍   | 641/1000 [23:33<07:24,  1.24s/it]

URL filtered: http://www.balkaninsight.com/en/article/montenegrin-opposition-seeks-help-from-trump-s-strategist-03-02-2017#.WLqyFpBMDuM.twitter
Error extracting text from http://www.wsj.com/articles/china-throws-out-south-china-sea-rule-book-1482226667: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-throws-out-south-china-sea-rule-book-1482226667


Processing URLs:  64%|██████▍   | 642/1000 [23:33<05:42,  1.05it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-snp-idUSKCN12C2QA?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-snp-idUSKCN12C2QA?mod=related&amp;channelName=worldNews


Processing URLs:  64%|██████▍   | 644/1000 [23:35<05:37,  1.06it/s]

Error extracting text from http://nyti.ms/2gsbwUS: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/01/us/politics/james-mattis-secrtary-of-defense-trump.html


Processing URLs:  65%|██████▍   | 647/1000 [23:39<06:43,  1.14s/it]

Error extracting text from http://www.cnbc.com/2016/01/17/financial-times-japanas-abe-calls-for-putin-to-be-brought-in-from-the-cold.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/01/17/financial-times-japanas-abe-calls-for-putin-to-be-brought-in-from-the-cold.html


Processing URLs:  65%|██████▌   | 651/1000 [23:49<10:44,  1.85s/it]

Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-police/israeli-police-recommend-bribery-charges-against-netanyahu-idUSKCN1FX2JD?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-police/israeli-police-recommend-bribery-charges-against-netanyahu-idUSKCN1FX2JD?il=0


Processing URLs:  66%|██████▌   | 656/1000 [23:57<11:09,  1.95s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-03/jpmorgan-to-move-hundreds-of-staff-to-three-eu-offices-on-brexit


Processing URLs:  66%|██████▌   | 658/1000 [23:59<07:54,  1.39s/it]

Error extracting text from https://lonang.com/library/reference/vattel-law-of-nations/vatt-213/: 403 Client Error: Forbidden for url: https://lonang.com/library/reference/vattel-law-of-nations/vatt-213/


Processing URLs:  66%|██████▌   | 661/1000 [24:03<08:34,  1.52s/it]

URL filtered: http://www.tolonews.com/en/afghanistan/26737-17-villages-under-taliban-control-in-bahark-badakhshan?utm_term=Ultrascan+Humint&amp;utm_source=TOLO+News&amp;utm_medium=twitter&amp;utm_campaign=Ultrascan+AGI+Terrorism+AfPak


Processing URLs:  67%|██████▋   | 666/1000 [24:13<09:43,  1.75s/it]

Error extracting text from http://m.apa.az/en/news/235976: HTTPConnectionPool(host='m.apa.az', port=80): Max retries exceeded with url: /en/news/235976 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f0fb0>: Failed to resolve 'm.apa.az' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  67%|██████▋   | 668/1000 [24:14<06:14,  1.13s/it]

Error extracting text from http://thehill.com/business-a-lobbying/322833-hospitals-come-out-against-gop-healthcare-bill: 403 Client Error: Forbidden for url: https://thehill.com/business-a-lobbying/322833-hospitals-come-out-against-gop-healthcare-bill/


Processing URLs:  67%|██████▋   | 669/1000 [24:51<53:39,  9.73s/it]

Error extracting text from http://www.washingtonpost.com/sports/olympics/2021/08/09/boycott-beijing-olympics/&amp;ved=2ahUKEwiW7azC16byAhX66eAKHd8ADAEQxfQBMAR6BAgLEAM&amp;usg=AOvVaw3D2CsakTjojIKSnhwd29se: 404 Client Error: Not Found for url: https://www.washingtonpost.com/sports/olympics/2021/08/09/boycott-beijing-olympics/&amp/


Processing URLs:  67%|██████▋   | 670/1000 [24:53<42:15,  7.68s/it]

Error extracting text from http://en.trend.az/world/turkey/2670587.html: 404 Client Error: Not Found for url: https://www.trend.az/world/turkey/2670587.html


Processing URLs:  67%|██████▋   | 671/1000 [24:53<31:20,  5.72s/it]

Error extracting text from http://www.wsj.com/articles/markets-ignore-the-brexit-worst-case-scenario-a-sterling-crisis-1466444823: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/markets-ignore-the-brexit-worst-case-scenario-a-sterling-crisis-1466444823
URL filtered: https://www.bloomberg.com/news/articles/2016-12-25/expat-fee-subsidy-cuts-what-s-in-saudi-fiscal-balance-document


Processing URLs:  67%|██████▋   | 673/1000 [24:57<22:33,  4.14s/it]

Error extracting text from http://www.ipsos.pe/sites/default/files/opinion_data/Opinion%20Data%2025%2004%2016.pdf: 404 Client Error: Not Found for url: https://www.ipsos.com/es-pe/sites/default/files/opinion_data/Opinion%20Data%2025%2004%2016.pdf


Processing URLs:  68%|██████▊   | 676/1000 [25:20<38:11,  7.07s/it]

Error extracting text from http://www.investopedia.com/articles/insights/052016/did-goldman-sachs-break-ethics-rules-tesla-gs-tsla.asp?partner=YahooSA: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/insights/052016/did-goldman-sachs-break-ethics-rules-tesla-gs-tsla.asp?partner=YahooSA


Processing URLs:  68%|██████▊   | 677/1000 [25:20<28:05,  5.22s/it]

Error extracting text from https://www.opensecrets.org/politicians/contrib.php?cid=N00000019&amp;cycle=Career: 403 Client Error: Forbidden for url: https://www.opensecrets.org/politicians/contrib.php?cid=N00000019&amp;cycle=Career


Processing URLs:  68%|██████▊   | 681/1000 [25:23<09:40,  1.82s/it]

Error extracting text from http://www.reuters.com/article/2015/10/31/iran-oil-opec-idUSL8N12V07R20151031: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/31/iran-oil-opec-idUSL8N12V07R20151031


Processing URLs:  68%|██████▊   | 684/1000 [25:30<12:28,  2.37s/it]

Error extracting text from http://www.una.org.uk/news/16/07/results-emerge-security-council-straw-poll-next-secretary-general: 404 Client Error: Not Found for url: https://una.org.uk/news/16/07/results-emerge-security-council-straw-poll-next-secretary-general


Processing URLs:  69%|██████▉   | 689/1000 [25:43<15:59,  3.09s/it]

Error extracting text from http://www.timesofisrael.com/iran-says-it-gave-iaea-parchin-samples-it-drew-itself/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/iran-says-it-gave-iaea-parchin-samples-it-drew-itself/


Processing URLs:  69%|██████▉   | 691/1000 [25:44<09:53,  1.92s/it]

Error extracting text from http://www.cnbc.com/2016/02/10/reuters-america-with-tpp-advancing-india-pins-hopes-on-china-backed-trade-bloc.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/02/10/reuters-america-with-tpp-advancing-india-pins-hopes-on-china-backed-trade-bloc.html


Processing URLs:  69%|██████▉   | 693/1000 [25:45<06:39,  1.30s/it]

Error extracting text from http://www.barrons.com/articles/BL-231B-11951: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/BL-231B-11951


Processing URLs:  70%|██████▉   | 695/1000 [25:48<06:45,  1.33s/it]

Error extracting text from https://www.un.org/press/en/2017/sc12946.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2017/sc12946.doc.htm


Processing URLs:  70%|██████▉   | 698/1000 [25:57<15:22,  3.05s/it]

URL filtered: https://www.youtube.com/watch?v=vkkCZwiENh8


Processing URLs:  71%|███████   | 706/1000 [26:04<04:33,  1.07it/s]

URL filtered: http://www.reuters.com/article/us-pomeranz-putin-commentary-idUSKBN13G07L?utm_campaign=trueAnthem:+Trending+Content&amp;utm_content=5833618004d30169e229f6e1&amp;utm_medium=trueAnthem&amp;utm_source=twitter
Error extracting text from https://www.reuters.com/article/us-usa-biden-state-iran-idUSKBN29O2HD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-biden-state-iran-idUSKBN29O2HD
URL filtered: https://backchannel.com/facebook-is-outsourcing-its-fake-news-problem-9999f01bdfd6#.zedvok8us


Processing URLs:  71%|███████   | 710/1000 [26:08<05:10,  1.07s/it]

Error extracting text from http://live.reuters.com/Event/Election_2016: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com


Processing URLs:  71%|███████▏  | 713/1000 [26:13<06:40,  1.39s/it]

Error extracting text from https://www.reuters.com/article/us-usa-trump-lawyers-exclusive/trump-using-campaign-rnc-funds-to-pay-legal-bills-from-russia-probe-sources-idUSKCN1BU2OS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-lawyers-exclusive/trump-using-campaign-rnc-funds-to-pay-legal-bills-from-russia-probe-sources-idUSKCN1BU2OS


Processing URLs:  72%|███████▏  | 718/1000 [26:18<04:06,  1.14it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.sonoticias.com.br/noticia/politica/leitao-diz-que-impeachment-de-dilma-volta-a-ser-debatido-no-congresso&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.sonoticias.com.br/noticia/politica/leitao-diz-que-impeachment-de-dilma-volta-a-ser-debatido-no-congresso&amp;prev=search


Processing URLs:  72%|███████▏  | 720/1000 [26:21<05:27,  1.17s/it]

URL filtered: https://www.youtube.com/watch?v=26R1elngkfg


Processing URLs:  72%|███████▏  | 722/1000 [27:22<1:06:18, 14.31s/it]

Error extracting text from http://deeplearning.net/: HTTPConnectionPool(host='deeplearning.net', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x301538830>, 'Connection to deeplearning.net timed out. (connect timeout=60)'))


Processing URLs:  72%|███████▏  | 724/1000 [27:24<39:26,  8.57s/it]  

Error extracting text from http://www.ibtimes.co.uk/isis-digs-moat-around-mosul-battle-key-iraqi-city-looms-1582418: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/isis-digs-moat-around-mosul-battle-key-iraqi-city-looms-1582418


Processing URLs:  73%|███████▎  | 727/1000 [27:28<18:11,  4.00s/it]

Error extracting text from https://www.psychologytoday.com/blog/the-winner-effect/201403/the-danger-lurks-inside-vladimir-putins-brain: 403 Client Error: Forbidden for url: https://www.psychologytoday.com/blog/the-winner-effect/201403/the-danger-lurks-inside-vladimir-putins-brain
URL filtered: http://www.bloomberg.com/news/articles/2016-05-31/pound-halts-rally-amid-signs-that-brexit-camp-is-gaining-ground?cmpid=wsdemand


Processing URLs:  73%|███████▎  | 730/1000 [27:29<08:18,  1.85s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-coalition-negotiators-agree-to-scrap-2020-climate-target-sources-idUSKBN1EX0OU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-coalition-negotiators-agree-to-scrap-2020-climate-target-sources-idUSKBN1EX0OU


Processing URLs:  73%|███████▎  | 733/1000 [27:34<06:19,  1.42s/it]

Error extracting text from http://www.nytimes.com/2017/09/29/opinion/gerrymandering-supreme-court.html?rref=collection%2Fsectioncollection%2Fopinion&amp;action=click&amp;contentCollection=opinion&amp;region=rank&amp;module=package&amp;version=highlights&amp;contentPlacement=1&amp;pgtype=sectionfront: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/09/29/opinion/gerrymandering-supreme-court.html?rref=collection%2Fsectioncollection%2Fopinion&amp;action=click&amp;contentCollection=opinion&amp;region=rank&amp;module=package&amp;version=highlights&amp;contentPlacement=1&amp;pgtype=sectionfront
Error extracting text from http://www.reuters.com/article/us-usa-trump-immigration-exclusive-idUSKBN1582XQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-immigration-exclusive-idUSKBN1582XQ


Processing URLs:  74%|███████▎  | 735/1000 [27:34<03:48,  1.16it/s]

Error extracting text from https://www.nytimes.com/live/2022/03/24/world/north-korea-icbm-launch/north-korea-icbm-launch: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/03/24/world/north-korea-icbm-launch/north-korea-icbm-launch


Processing URLs:  74%|███████▎  | 737/1000 [27:35<02:55,  1.50it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN1691N1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN1691N1


Processing URLs:  74%|███████▍  | 744/1000 [28:04<22:48,  5.34s/it]

Error extracting text from https://www.yardeni.com/pub/sp500corrbear.pdf: 403 Client Error: Forbidden for url: https://yardeni.com/our-charts/


Processing URLs:  75%|███████▍  | 746/1000 [28:06<13:01,  3.08s/it]

Error extracting text from https://fred.stlouisfed.org/series/DGS10): 404 Client Error: Not Found for url: https://fred.stlouisfed.org/series/DGS10)


Processing URLs:  75%|███████▌  | 750/1000 [28:19<10:26,  2.51s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-mosul-idUSKBN13U0HX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-mosul-idUSKBN13U0HX


Processing URLs:  75%|███████▌  | 753/1000 [28:24<06:53,  1.68s/it]

Error extracting text from http://www.wsj.com/articles/time-inc-appoints-new-digital-chief-1450195949: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-appoints-new-digital-chief-1450195949


Processing URLs:  76%|███████▌  | 759/1000 [28:36<05:33,  1.39s/it]

Error extracting text from https://www.afghanistan-analysts.org/resettling-nearly-half-a-million-afghans-in-nangrahar-the-consequences-of-the-mass-return-of-refugees/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/resettling-nearly-half-a-million-afghans-in-nangrahar-the-consequences-of-the-mass-return-of-refugees/


Processing URLs:  76%|███████▌  | 761/1000 [28:43<09:22,  2.35s/it]

Error extracting text from https://primary.guide/: 404 Client Error: Not Found for url: https://www.elidourado.com/election


Processing URLs:  76%|███████▋  | 764/1000 [28:51<10:19,  2.62s/it]

Error extracting text from http://news.trust.org/item/20160331104346-iewwq: 404 Client Error:  for url: https://news.trust.org:443/item/20160331104346-iewwq


Processing URLs:  76%|███████▋  | 765/1000 [28:52<07:59,  2.04s/it]

URL filtered: https://twitter.com/search?q=%23KitaSemuaPenghasut


Processing URLs:  77%|███████▋  | 767/1000 [29:52<58:04, 14.95s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/article47556560.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  77%|███████▋  | 771/1000 [29:57<20:18,  5.32s/it]

URL filtered: http://www.bloomberg.com/news/features/2016-08-22/china-s-best-bank-called-mirage-of-shadow-lending


Processing URLs:  78%|███████▊  | 776/1000 [30:00<06:27,  1.73s/it]

Error extracting text from http://www.wsj.com/articles/feds-lockhart-case-for-december-rate-rise-is-compelling-1449061943: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-lockhart-case-for-december-rate-rise-is-compelling-1449061943


Processing URLs:  78%|███████▊  | 778/1000 [30:04<06:08,  1.66s/it]

Error extracting text from http://www.businessinsider.com/r-brazil-bribery-scheme-may-have-extended-to-pension-funds-paper-2016-1: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-brazil-bribery-scheme-may-have-extended-to-pension-funds-paper-2016-1


Processing URLs:  78%|███████▊  | 781/1000 [30:07<04:11,  1.15s/it]

Error extracting text from http://cns.miis.edu/trillion_dollar_nuclear_triad/: 403 Client Error: Forbidden for url: https://www.nonproliferation.org/us-trillion-dollar-nuclear-triad/
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-idUSKBN16T0TZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-idUSKBN16T0TZ


Processing URLs:  78%|███████▊  | 782/1000 [31:07<1:07:28, 18.57s/it]

Error extracting text from http://www.kansascity.com/news/nation-world/world/article129453019.html: HTTPConnectionPool(host='www.kansascity.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  78%|███████▊  | 783/1000 [31:07<47:31, 13.14s/it]  

Error extracting text from https://www.nytimes.com/2022/03/20/world/americas/ukraine-war-global-food-crisis.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/03/20/world/americas/ukraine-war-global-food-crisis.html


Processing URLs:  79%|███████▊  | 787/1000 [31:20<19:20,  5.45s/it]

Error extracting text from http://www.phnompenhpost.com/national/nec-begin-registering-voters-it-missed-2017: 403 Client Error: Forbidden for url: http://www.phnompenhpost.com/national/nec-begin-registering-voters-it-missed-2017
Error extracting text from https://www.unicef.org/media/media_96560.html: 403 Client Error: Forbidden for url: https://www.unicef.org/media/media_96560.html


Processing URLs:  79%|███████▉  | 794/1000 [31:25<03:40,  1.07s/it]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-makes-nato-operation-mandatory-for-its-troops-02-02-2016#sthash.GJxsHrnv.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-makes-nato-operation-mandatory-for-its-troops-02-02-2016#sthash.GJxsHrnv.dpuf
URL filtered: https://www.bloomberg.com/news/articles/2017-07-26/what-s-next-for-crypto-coins-as-sec-tames-wild-west-of-finance
Error extracting text from http://www.reuters.com/article/us-turkey-economy-idUSBREA0S17W20140130: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-economy-idUSBREA0S17W20140130
Error extracting text from https://www.reuters.com/video/watch/startup-aims-to-revive-supersonic-flight-idOVEQOBROF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/watch/startup-aims-to-revive-supersonic-flight-idOVEQOBROF


Processing URLs:  80%|███████▉  | 796/1000 [31:26<03:13,  1.05it/s]

Error extracting text from http://www.newsweek.com/jacob-zuma-south-africa-president-survive-anc-voters-522723?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/jacob-zuma-south-africa-president-survive-anc-voters-522723?rx=us
Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN18T02K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN18T02K


Processing URLs:  80%|████████  | 803/1000 [31:31<02:00,  1.63it/s]

Error extracting text from http://www.the-japan-news.com/news/article/0002854628: HTTPConnectionPool(host='www.the-japan-news.com', port=80): Max retries exceeded with url: /news/article/0002854628 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff931190>: Failed to resolve 'www.the-japan-news.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  80%|████████  | 804/1000 [31:33<03:05,  1.05it/s]

Error extracting text from http://www.ibtimes.com/iran-nuclear-deal-implementation-day-could-be-pushed-back-some-companies-already-2203305: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iran-nuclear-deal-implementation-day-could-be-pushed-back-some-companies-already-2203305


Processing URLs:  81%|████████  | 806/1000 [31:35<03:32,  1.09s/it]

Error extracting text from http://asia.nikkei.com/print/article/229561: 404 Client Error: Not Found for url: https://asia.nikkei.com/print/article/229561


Processing URLs:  81%|████████  | 807/1000 [31:36<03:08,  1.02it/s]

Error extracting text from http://www.news1130.com/2017/01/06/unclear-canadians-can-buy-pot-legalized/: 403 Client Error: Forbidden for url: https://vancouver.citynews.ca/2017/01/06/unclear-canadians-can-buy-pot-legalized/
URL filtered: https://mobile.twitter.com/TOLOnews/status/1416965746299658241


Processing URLs:  81%|████████▏ | 813/1000 [31:40<01:59,  1.57it/s]

Error extracting text from http://pakobserver.net/indian-hegemony-over-saarc/: 403 Client Error: Forbidden for url: http://pakobserver.net/indian-hegemony-over-saarc/


Processing URLs:  82%|████████▏ | 815/1000 [31:42<02:22,  1.30it/s]

Error extracting text from http://seekingalpha.com/article/3976571-teslas-model-3-will-dominate-market: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3976571-teslas-model-3-will-dominate-market


Processing URLs:  82%|████████▏ | 817/1000 [31:45<03:12,  1.05s/it]

Error extracting text from https://www.transtats.bts.gov/osea/seasonaladjustment/?PageVar=TRUCK: HTTPSConnectionPool(host='www.transtats.bts.gov', port=443): Max retries exceeded with url: /osea/seasonaladjustment/?PageVar=TRUCK (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  82%|████████▏ | 824/1000 [32:00<05:54,  2.01s/it]

Error extracting text from http://www.nationalreview.com/article/439838/foreign-policy-obama-lame-duck-russia-china-iran-threats: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/439838/foreign-policy-obama-lame-duck-russia-china-iran-threats/


Processing URLs:  83%|████████▎ | 829/1000 [32:07<04:08,  1.45s/it]

Error extracting text from http://zik.ua/en/news/2016/07/08/no_summer_recess_rada_to_work_through_summer_714578: HTTPConnectionPool(host='zik.ua', port=80): Max retries exceeded with url: /en/news/2016/07/08/no_summer_recess_rada_to_work_through_summer_714578 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30398bd10>: Failed to resolve 'zik.ua' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  83%|████████▎ | 832/1000 [32:10<03:04,  1.10s/it]

Error extracting text from http://english.yonhapnews.co.kr/business/2017/02/14/0501000000AEN20170214004251320.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  84%|████████▎ | 837/1000 [32:18<03:22,  1.24s/it]

Error extracting text from https://www.google.com/amp/s/www.nytimes.com/2021/06/16/world/middleeast/israel-hamas-gaza-cease-fire.amp.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/16/world/middleeast/israel-hamas-gaza-cease-fire.html


Processing URLs:  84%|████████▍ | 838/1000 [32:19<03:18,  1.22s/it]

Error extracting text from https://www.theamericanconservative.com/articles/jordan-peterson-claims-hes-no-conservative/: 403 Client Error: Forbidden for url: https://www.theamericanconservative.com/articles/jordan-peterson-claims-hes-no-conservative/


Processing URLs:  84%|████████▍ | 839/1000 [32:20<03:33,  1.33s/it]

Error extracting text from http://blogs.wsj.com/washwire/2015/10/26/reasonable-doubts-about-the-inflation-outlook: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/10/26/reasonable-doubts-about-the-inflation-outlook


Processing URLs:  84%|████████▍ | 841/1000 [32:21<02:23,  1.10it/s]

Error extracting text from http://uk.businessinsider.com/r-french-eu-exit-would-be-tricky-for-a-le-pen-presidency-2016-12: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-french-eu-exit-would-be-tricky-for-a-le-pen-presidency-2016-12


Processing URLs:  84%|████████▍ | 844/1000 [32:24<02:24,  1.08it/s]

Error extracting text from https://panampost.com/elisa-vasquez/2014/07/17/nicaragua-rolls-out-red-carpet-for-russian-chinese-continental-hub/: 403 Client Error: Forbidden for url: https://panampost.com/elisa-vasquez/2014/07/17/nicaragua-rolls-out-red-carpet-for-russian-chinese-continental-hub/


Processing URLs:  84%|████████▍ | 845/1000 [32:30<05:48,  2.25s/it]

URL filtered: https://twitter.com/realDonaldTrump/status/1267129644228247552


Processing URLs:  85%|████████▌ | 852/1000 [32:39<03:01,  1.23s/it]

Error extracting text from https://amti.csis.org: 403 Client Error: Forbidden for url: https://amti.csis.org/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-amnesty-idUSKBN15M00F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-amnesty-idUSKBN15M00F


Processing URLs:  86%|████████▌ | 855/1000 [32:44<03:33,  1.47s/it]

URL filtered: https://www.news.com.au/world/cnn-bloomberg-bbc-reacts-to-australias-vaccine-rollout/news-story/2b1473613de8bb5aeac8299bd453ae93


Processing URLs:  86%|████████▌ | 858/1000 [32:46<02:09,  1.09it/s]

Error extracting text from http://www.wsj.com/articles/burundi-lawmakers-vote-to-withdraw-from-international-criminal-court-1476278413: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/burundi-lawmakers-vote-to-withdraw-from-international-criminal-court-1476278413


Processing URLs:  86%|████████▌ | 860/1000 [32:49<02:45,  1.18s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VV0RD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VV0RD


Processing URLs:  86%|████████▌ | 861/1000 [32:50<02:42,  1.17s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_52044.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_52044.htm


Processing URLs:  86%|████████▋ | 864/1000 [32:51<01:50,  1.24it/s]

Error extracting text from https://www.wsj.com/articles/the-science-suggests-a-wuhan-lab-leak-11622995184: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-science-suggests-a-wuhan-lab-leak-11622995184


Processing URLs:  87%|████████▋ | 866/1000 [33:52<26:13, 11.74s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2022-03-02/supreme-court-hearings-will-start-march-21-democrats-aim-for-jackson-confirmation-before-easter: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
Error extracting text from https://www.wsj.com/articles/no-blanket-protection-for-internet-platforms-11609779436?mod=searchresults_pos2&amp;page=1: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/no-blanket-protection-for-internet-platforms-11609779436?mod=searchresults_pos2&amp;page=1


Processing URLs:  87%|████████▋ | 867/1000 [33:54<20:14,  9.13s/it]

Error extracting text from http://www.wboc.com/story/31288416/the-latest-belgium-imposes-controls-on-french-border: 404 Client Error: Not Found for url: https://www.wboc.com/story/31288416/the-latest-belgium-imposes-controls-on-french-border/


Processing URLs:  87%|████████▋ | 869/1000 [33:56<11:13,  5.14s/it]

Error extracting text from http://thehill.com/policy/finance/263122-11t-spending-bill-could-be-unveiled-monday-night: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/263122-11t-spending-bill-could-be-unveiled-monday-night/


Processing URLs:  87%|████████▋ | 872/1000 [33:57<04:17,  2.01s/it]

Error extracting text from http://www.nytimes.com/2009/07/23/world/africa/23sudan.html?_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2009/07/23/world/africa/23sudan.html?_r=1
Error extracting text from http://www.cdm.me/english/grigic-nato-ratification-process-will-last-one-year: 403 Client Error: Forbidden for url: https://www.cdm.me/english/grigic-nato-ratification-process-will-last-one-year


Processing URLs:  87%|████████▋ | 873/1000 [33:58<03:08,  1.48s/it]

Error extracting text from http://www.vanguardngr.com/2016/03/agatu-genocide-benue-lawmakers-slam-buhari/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/03/agatu-genocide-benue-lawmakers-slam-buhari/


Processing URLs:  88%|████████▊ | 877/1000 [34:01<02:03,  1.00s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/duke-energy-ceo-cyber-threats-grow-epa-lawsuit-37315680: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/duke-energy-ceo-cyber-threats-grow-epa-lawsuit-37315680


Processing URLs:  88%|████████▊ | 882/1000 [34:10<02:17,  1.16s/it]

Error extracting text from https://publishingperspectives.com/2021/01/npd-2020-was-the-us-markets-bestselling-year-for-print-in-a-decade-covid19/: 403 Client Error: Forbidden for url: https://publishingperspectives.com/2021/01/npd-2020-was-the-us-markets-bestselling-year-for-print-in-a-decade-covid19/
Error extracting text from https://www.neweurope.eu/article/divided-nato-membership-montenegrins-go-polls-october-16/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/divided-nato-membership-montenegrins-go-polls-october-16/


Processing URLs:  88%|████████▊ | 883/1000 [34:10<01:41,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-idUSKBN0TR2TS20151208?#mRvHH01q6Gbx2WTR.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-idUSKBN0TR2TS20151208#mRvHH01q6Gbx2WTR.97


Processing URLs:  88%|████████▊ | 884/1000 [34:10<01:34,  1.23it/s]

Error extracting text from http://www.reuters.com/article/2015/10/18/us-montenegro-protests-idUSKCN0SC0SR20151018: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/18/us-montenegro-protests-idUSKCN0SC0SR20151018


Processing URLs:  89%|████████▉ | 888/1000 [34:13<01:15,  1.48it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-satellite-idUSKCN0VB1NY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-satellite-idUSKCN0VB1NY


Processing URLs:  89%|████████▉ | 892/1000 [34:19<02:02,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN17U33V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN17U33V


Processing URLs:  90%|████████▉ | 897/1000 [34:28<02:55,  1.70s/it]

Error extracting text from http://ewn.co.za/2016/09/17/Pityana-vows-to-mobilise-public-if-Zuma-continues-to-ignore-calls-to-resign: 404 Client Error: Not Found for url: https://www.ewn.co.za/2016/09/17/Pityana-vows-to-mobilise-public-if-Zuma-continues-to-ignore-calls-to-resign
Error extracting text from http://www.reuters.com/article/us-tesla-results-idUSKBN1612QO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-results-idUSKBN1612QO


Processing URLs:  90%|█████████ | 901/1000 [34:30<01:07,  1.47it/s]

Error extracting text from https://ics-cert.us-cert.gov/alerts/IR-ALERT-H-16-056-01: 403 Client Error: Forbidden for url: https://ics-cert.us-cert.gov/alerts/IR-ALERT-H-16-056-01
Error extracting text from http://www.reuters.com/article/us-iran-nuclear-russia/russia-to-the-united-states-stay-in-iran-nuclear-deal-idUSKCN1BQ2UK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-russia/russia-to-the-united-states-stay-in-iran-nuclear-deal-idUSKCN1BQ2UK


Processing URLs:  90%|█████████ | 903/1000 [34:33<01:37,  1.01s/it]

Error extracting text from http://en.trend.az/iran/nuclearp/2434828.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/nuclearp/2434828.html


Processing URLs:  90%|█████████ | 904/1000 [34:34<01:47,  1.12s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/exclusive-japan-s-far/2358158.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/exclusive-japan-s-far/2358158.html


Processing URLs:  91%|█████████ | 906/1000 [34:34<00:59,  1.58it/s]

Error extracting text from https://www.yahoo.com/news/colombia-government-eln-rebels-launch-peace-talks-october-003708953.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/colombia-government-eln-rebels-launch-peace-talks-october-003708953.html
Error extracting text from http://www.nytimes.com/2015/11/05/world/middleeast/iran-president-pushes-back-over-anti-us-crackdown.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/05/world/middleeast/iran-president-pushes-back-over-anti-us-crackdown.html


Processing URLs:  91%|█████████ | 907/1000 [34:35<00:46,  1.98it/s]

Error extracting text from https://blogs.wsj.com/moneybeat/2017/09/01/washington-ready-to-block-russian-takeover-of-u-s-oil-firm-energy-journal/: 403 Client Error: Forbidden for url: https://blogs.wsj.com/moneybeat/2017/09/01/washington-ready-to-block-russian-takeover-of-u-s-oil-firm-energy-journal/


Processing URLs:  91%|█████████ | 909/1000 [34:36<00:54,  1.68it/s]

Error extracting text from https://www.nytimes.com/2017/07/10/world/americas/kremlin-adoptions-sanctions-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/10/world/americas/kremlin-adoptions-sanctions-russia.html


Processing URLs:  92%|█████████▏| 915/1000 [34:49<02:59,  2.12s/it]

Error extracting text from http://www.tabletmag.com/jewish-news-and-politics/200941/assad-the-teflon-don: 403 Client Error: Forbidden for url: http://www.tabletmag.com/jewish-news-and-politics/200941/assad-the-teflon-don


Processing URLs:  92%|█████████▏| 916/1000 [34:50<02:43,  1.95s/it]

Error extracting text from https://tas-nextev.taleo.net/careersection/nextev_careers/jobsearch.ftl?lang=en#: HTTPSConnectionPool(host='tas-nextev.taleo.net', port=443): Max retries exceeded with url: /careersection/nextev_careers/jobsearch.ftl?lang=en (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff0bae40>: Failed to resolve 'tas-nextev.taleo.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  92%|█████████▏| 918/1000 [34:55<02:50,  2.08s/it]

Error extracting text from http://thebulletin.org/north-koreas-nuclear-weapons-what-now: 404 Client Error: Not Found for url: https://thebulletin.org/2016/08/north-koreas-nuclear-weapons-what-now/


Processing URLs:  92%|█████████▏| 921/1000 [35:06<03:13,  2.45s/it]

Error extracting text from http://www.wsj.com/articles/in-afghanistan-a-secret-plan-pays-off-the-taliban-1463964545: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-afghanistan-a-secret-plan-pays-off-the-taliban-1463964545


Processing URLs:  92%|█████████▏| 923/1000 [35:07<01:49,  1.43s/it]

Error extracting text from http://www.irrawaddy.com/burma/htin-kyaw-tipped-as-nld-presidential-nominee.html: 403 Client Error: Forbidden for url: http://www.irrawaddy.com/burma/htin-kyaw-tipped-as-nld-presidential-nominee.html


Processing URLs:  93%|█████████▎| 926/1000 [35:13<01:51,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-russia-turkey-jet-idUSKCN0ZD1PR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-turkey-jet-idUSKCN0ZD1PR
Error extracting text from http://www.nytimes.com/2016/01/21/world/asia/north-korea-nuclear-china.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/21/world/asia/north-korea-nuclear-china.html?_r=0


Processing URLs:  93%|█████████▎| 927/1000 [35:15<02:00,  1.65s/it]

Error extracting text from http://in.reuters.com/article/india-economy-inflation-idINKCN10N1C9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  93%|█████████▎| 929/1000 [35:16<01:26,  1.22s/it]

Error extracting text from https://www.reuters.com/world/middle-east/us-envoy-back-gulf-push-yemen-truce-battles-spread-2021-07-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/us-envoy-back-gulf-push-yemen-truce-battles-spread-2021-07-28/


Processing URLs:  93%|█████████▎| 932/1000 [35:20<01:31,  1.34s/it]

Error extracting text from http://ewn.co.za/2016/02/10/Burundi-is-on-the-brink-a-crisis-explained: 404 Client Error: Not Found for url: https://www.ewn.co.za/2016/02/10/Burundi-is-on-the-brink-a-crisis-explained


Processing URLs:  93%|█████████▎| 933/1000 [35:21<01:21,  1.22s/it]

Error extracting text from https://global.handelsblatt.com/opinion/no-such-thing-as-the-global-village-681703: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/opinion/no-such-thing-as-the-global-village-681703


Processing URLs:  94%|█████████▎| 937/1000 [35:38<03:41,  3.52s/it]

Error extracting text from http://en.mercopress.com/2016/05/31/keiko-fujimori-seems-set-to-become-peru-s-next-president-on-june-5-: 404 Client Error: Not Found for url: https://en.mercopress.com/2016/05/31/keiko-fujimori-seems-set-to-become-peru-s-next-president-on-june-5-


Processing URLs:  94%|█████████▍| 945/1000 [35:45<00:54,  1.01it/s]

Error extracting text from http://www.reuters.com/video/2016/11/15/fight-to-retake-mosul-enters-fifth-week?videoId=370466443: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2016/11/15/fight-to-retake-mosul-enters-fifth-week?videoId=370466443


Processing URLs:  95%|█████████▍| 946/1000 [35:47<01:16,  1.41s/it]

Error extracting text from http://www.newsweek.com/rick-dearborn-trump-russia-email-654587: 403 Client Error: Forbidden for url: https://www.newsweek.com/rick-dearborn-trump-russia-email-654587
URL filtered: https://www.youtube.com/watch?v=UkH34XtDFxg


Processing URLs:  95%|█████████▍| 948/1000 [35:48<00:45,  1.14it/s]

Error extracting text from http://thehill.com/policy/national-security/325346-fbi-has-info-suggesting-coordination-between-trump-aides-russia: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/325346-fbi-has-info-suggesting-coordination-between-trump-aides-russia/


Processing URLs:  95%|█████████▌| 951/1000 [35:54<01:13,  1.50s/it]

Error extracting text from http://www.ibtimes.com/venezuela-debt-default-amid-oil-slump-fears-rise-again-after-china-market-meltdown-2072982: 403 Client Error: Forbidden for url: https://www.ibtimes.com/venezuela-debt-default-amid-oil-slump-fears-rise-again-after-china-market-meltdown-2072982


Processing URLs:  95%|█████████▌| 952/1000 [35:56<01:22,  1.73s/it]

Error extracting text from http://www.lowyinterpreter.org/post/2016/03/23/Philippines-vs-China-in-South-China-Sea-Tough-talking-by-observers-could-box-China-in.aspx: 404 Client Error: Not Found for url: https://www.lowyinstitute.org/the-interpreter/post/2016/03/23/Philippines-vs-China-in-South-China-Sea-Tough-talking-by-observers-could-box-China-in.aspx


Processing URLs:  96%|█████████▌| 957/1000 [36:09<01:58,  2.75s/it]

Error extracting text from http://www.newsweek.com/chinas-xi-jinping-eyeing-return-supreme-power-dictator-491894: 403 Client Error: Forbidden for url: https://www.newsweek.com/chinas-xi-jinping-eyeing-return-supreme-power-dictator-491894


Processing URLs:  96%|█████████▋| 963/1000 [36:16<00:53,  1.46s/it]

Error extracting text from https://www.nytimes.com/2017/04/19/us/politics/carter-page-russia-trump.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/19/us/politics/carter-page-russia-trump.html?_r=1


Processing URLs:  97%|█████████▋| 967/1000 [36:19<00:29,  1.13it/s]

Error extracting text from http://translate.google.com/translate?&amp;ie=UTF-8&amp;sl=&amp;tl=en&amp;u=http://www.pcnen.com/portal/2016/01/27/dukanovic-pozvao-opoziciju-u-vladu/: 400 Client Error: Bad Request for url: https://translate.google.com/translate?&amp;ie=UTF-8&amp;sl&amp;tl=en&amp;u=http://www.pcnen.com/portal/2016/01/27/dukanovic-pozvao-opoziciju-u-vladu/


Processing URLs:  97%|█████████▋| 971/1000 [36:27<00:40,  1.41s/it]

Error extracting text from http://www.tbo.com/news/business/congress-failure-to-reauthorize-export-import-bank-could-hinder-local-exporters-20151024/: 404 Client Error: Not Found for url: https://www.tbo.com:443/news/business/congress-failure-to-reauthorize-export-import-bank-could-hinder-local-exporters-20151024/
Error extracting text from http://www.reuters.com/article/us-iran-nuclear-politics-insight-idUSKCN0UV0JC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-politics-insight-idUSKCN0UV0JC


Processing URLs:  98%|█████████▊| 978/1000 [36:35<00:21,  1.04it/s]

Error extracting text from http://www.advisorperspectives.com/dshort/updates/2016/12/01/the-q-ratio-and-market-valuation-november-update: 403 Client Error: Forbidden for url: https://www.advisorperspectives.com/dshort/updates/2016/12/01/the-q-ratio-and-market-valuation-november-update


Processing URLs:  98%|█████████▊| 979/1000 [36:36<00:18,  1.16it/s]

Error extracting text from https://www.nytimes.com/2021/01/30/world/europe/covid-vaccines-eu.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/30/world/europe/covid-vaccines-eu.html


Processing URLs:  98%|█████████▊| 981/1000 [36:38<00:15,  1.24it/s]

Error extracting text from https://www.wsj.com/articles/u-s-says-it-is-preparing-sanctions-against-more-venezuelan-officials-1503520203: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-says-it-is-preparing-sanctions-against-more-venezuelan-officials-1503520203
URL filtered: https://www.youtube.com/watch?v=4qHYOToZItY


Processing URLs:  98%|█████████▊| 983/1000 [36:41<00:22,  1.30s/it]

Error extracting text from http://frontex.europa.eu/trends-and-routes/migratory-routes-map: 404 Client Error: NOT FOUND for url: https://www.frontex.europa.eu//trends-and-routes/migratory-routes-map


Processing URLs:  98%|█████████▊| 985/1000 [36:45<00:24,  1.62s/it]

Error extracting text from http://en.trend.az/iran/nuclearp/2456707.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/nuclearp/2456707.html
URL filtered: https://www.bloomberg.com/news/articles/2017-04-19/tesla-said-to-prepay-solar-bonds-but-not-to-musk-or-rives


Processing URLs:  99%|█████████▉| 988/1000 [37:48<03:02, 15.18s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-04-28/anti-isis-coalition-faces-challenges-in-mosul: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  99%|█████████▉| 991/1000 [37:53<01:04,  7.16s/it]

Error extracting text from http://tass.ru/en/economy/849000: 404 Client Error: Not Found for url: https://tass.ru/en/economy/849000


Processing URLs:  99%|█████████▉| 992/1000 [37:55<00:45,  5.75s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=weekly&amp;id=amazingspiderman2.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=weekly&amp;id=amazingspiderman2.htm


Processing URLs:  99%|█████████▉| 994/1000 [37:58<00:22,  3.71s/it]

Error extracting text from http://thehill.com/homenews/senate/354102-mccain-senate-armed-services-panel-to-investigate-russias-disinformation: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/354102-mccain-senate-armed-services-panel-to-investigate-russias-disinformation/


Processing URLs: 100%|█████████▉| 996/1000 [38:01<00:09,  2.44s/it]

Error extracting text from https://www.wsj.com/articles/senate-expected-to-confirm-neil-gorsuch-as-supreme-court-justice-1491557404: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/senate-expected-to-confirm-neil-gorsuch-as-supreme-court-justice-1491557404
Error extracting text from http://patriotupdate.com/slovenia-offers-host-trump-putin-meeting/: HTTPConnectionPool(host='patriotupdate.com', port=80): Max retries exceeded with url: /slovenia-offers-host-trump-putin-meeting/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fcff20>: Failed to resolve 'patriotupdate.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/Conflicts/status/1425819929995989003


Processing URLs: 100%|█████████▉| 999/1000 [39:02<00:12, 12.47s/it]

Error extracting text from http://aa.com.tr/en/politics/colombia-govt-confident-of-impending-peace-deal-with-farc/533242: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs: 100%|██████████| 1000/1000 [39:04<00:00,  2.34s/it]
Processing URLs:   0%|          | 5/1000 [00:07<17:02,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#6iTx3ZvSUUIMm0uj.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#6iTx3ZvSUUIMm0uj.97


Processing URLs:   1%|          | 8/1000 [00:11<23:48,  1.44s/it]

Error extracting text from http://www.state.gov/documents/organization/236395.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/


Processing URLs:   1%|          | 11/1000 [05:21<26:04:33, 94.92s/it]

URL filtered: https://www.youtube.com/watch?v=EVFdTjvgfZo&amp;feature=youtu.be


Processing URLs:   1%|▏         | 13/1000 [06:21<17:43:28, 64.65s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-11-21/denmark-to-ramp-up-cyber-security-efforts-defense-minister: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   2%|▏         | 18/1000 [06:27<3:52:09, 14.18s/it] 

Error extracting text from https://ycharts.com/indicators/world_crude_oil_production: 403 Client Error: Forbidden for url: https://ycharts.com/indicators/world_crude_oil_production


Processing URLs:   2%|▏         | 20/1000 [06:31<2:10:46,  8.01s/it]

Error extracting text from https://medium.com/@karpathy/icml-accepted-papers-institution-stats-bad8d2943f5d: 403 Client Error: Forbidden for url: https://medium.com/@karpathy/icml-accepted-papers-institution-stats-bad8d2943f5d


Processing URLs:   2%|▏         | 22/1000 [06:33<1:13:40,  4.52s/it]

Error extracting text from https://www.americaspace.com/2021/03/05/starliners-oft-2-launch-date-under-review-ahead-of-busy-april-at-space-station/: 403 Client Error: Forbidden for url: https://www.americaspace.com/2021/03/05/starliners-oft-2-launch-date-under-review-ahead-of-busy-april-at-space-station/
URL filtered: https://www.bloomberg.com/news/articles/2021-02-18/u-k-airlines-urge-johnson-to-lay-out-path-for-travel-reopening


Processing URLs:   3%|▎         | 26/1000 [06:35<30:18,  1.87s/it]  

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://jovempan.uol.com.br/noticias/brasil/politica/para-juristas-delacao-de-delcidio-reforca-chance-de-impeachment-de-dilma.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://jovempan.uol.com.br/noticias/brasil/politica/para-juristas-delacao-de-delcidio-reforca-chance-de-impeachment-de-dilma.html&amp;prev=search


Processing URLs:   3%|▎         | 28/1000 [06:40<33:31,  2.07s/it]

Error extracting text from http://www.amazon.com/Superforecasting-The-Art-Science-Prediction/dp/0804136696: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Superforecasting-The-Art-Science-Prediction/dp/0804136696


Processing URLs:   4%|▎         | 35/1000 [06:56<27:28,  1.71s/it]

Error extracting text from http://www.france24.com/en/20160229-iran-vote-nuclear-deal-gives-reformists-momentum: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160229-iran-vote-nuclear-deal-gives-reformists-momentum


Processing URLs:   4%|▍         | 39/1000 [06:58<11:09,  1.44it/s]

Error extracting text from http://www.reuters.com/article/us-nuclear-cyber-idUSKCN12A1OC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nuclear-cyber-idUSKCN12A1OC
Error extracting text from http://www.wsj.com/articles/tesla-reports-lower-than-expected-first-quarter-sales-1459802960: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tesla-reports-lower-than-expected-first-quarter-sales-1459802960


Processing URLs:   4%|▍         | 40/1000 [06:58<09:53,  1.62it/s]

Error extracting text from https://thehill.com/homenews/house/535443-ethics-complaint-filed-against-biggs-gosar-and-cawthorn-over-capitol-riot: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/535443-ethics-complaint-filed-against-biggs-gosar-and-cawthorn-over-capitol-riot/


Processing URLs:   4%|▍         | 42/1000 [07:03<25:05,  1.57s/it]

Error extracting text from https://www.dailystar.com.lb/News/Middle-East/2016/May-05/350624-aleppo-truce-revived-after-days-of-bloodshed.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/May-05/350624-aleppo-truce-revived-after-days-of-bloodshed.ashx


Processing URLs:   4%|▍         | 44/1000 [07:04<15:38,  1.02it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/wi/wisconsin_senate_johnson_vs_feingold-3740.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/wi/wisconsin_senate_johnson_vs_feingold-3740.html#polls


Processing URLs:   5%|▍         | 47/1000 [07:05<09:42,  1.64it/s]

Error extracting text from https://bit.ly/3lVunrA: 403 Client Error: Forbidden for url: https://www.bls.gov/news.release/laus.nr0.htm
Error extracting text from http://www.nytimes.com/2016/08/07/world/middleeast/military-syria-putin-us-proxy-war.html?smid=nytcore-ipad-share&amp;smprod=nytcore-ipad&amp;_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/07/world/middleeast/military-syria-putin-us-proxy-war.html?smid=nytcore-ipad-share&amp;smprod=nytcore-ipad&amp;_r=1


Processing URLs:   5%|▍         | 49/1000 [07:07<11:31,  1.38it/s]

Error extracting text from https://www.nytimes.com/2017/03/24/us/politics/health-care-affordable-care-act.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/24/us/politics/health-care-affordable-care-act.html


Processing URLs:   5%|▌         | 50/1000 [07:08<10:22,  1.53it/s]

Error extracting text from https://www.bbc.com/news/uk-politics-5870068: 404 Client Error: Not Found for url: https://www.bbc.com/news/uk-politics-5870068


Processing URLs:   5%|▌         | 52/1000 [07:13<23:08,  1.47s/it]

Error extracting text from https://t.co/ifL8VPqo1f: 400 Client Error: Bad Request for url: https://twitter.com/estNATO/status/667738573450248192/photo/1
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NYDKPH6JIJV201-32K0TQA2UDDI2RN1GMD0JEECUN


Processing URLs:   6%|▌         | 57/1000 [07:19<16:56,  1.08s/it]

URL filtered: https://www.youtube.com/watch?v=D_M-SskpGi4
Error extracting text from http://www.nytimes.com/2016/08/31/world/europe/spain-government-mariano-rajoy.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/31/world/europe/spain-government-mariano-rajoy.html?_r=0


Processing URLs:   6%|▌         | 62/1000 [07:24<11:15,  1.39it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://rota2014.blogspot.com/2016/03/governo-corrupto-da-dupla-lula-dilma-ja.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://rota2014.blogspot.com/2016/03/governo-corrupto-da-dupla-lula-dilma-ja.html&amp;prev=search
Error extracting text from http://www.nytimes.com/2015/11/17/world/middleeast/syria-iran-reaffirms-a-role-for-assad.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/17/world/middleeast/syria-iran-reaffirms-a-role-for-assad.html


Processing URLs:   6%|▋         | 63/1000 [07:24<08:42,  1.79it/s]

Error extracting text from https://www.piie.com/blogs/trade-investment-policy-watch/how-long-does-it-take-conclude-trade-agreement-us: 403 Client Error: Forbidden for url: https://www.piie.com/blogs/trade-investment-policy-watch/how-long-does-it-take-conclude-trade-agreement-us


Processing URLs:   7%|▋         | 66/1000 [07:26<09:29,  1.64it/s]

URL filtered: https://twitter.com/ctbto_alerts
Error extracting text from http://www.timesofisrael.com/for-saudi-arabia-depite-ahmadinejads-visit-iran-remains-the-snake/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/for-saudi-arabia-depite-ahmadinejads-visit-iran-remains-the-snake/


Processing URLs:   7%|▋         | 74/1000 [07:36<14:59,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-battle-idUSKBN1350GB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-battle-idUSKBN1350GB


Processing URLs:   8%|▊         | 77/1000 [07:40<20:13,  1.31s/it]

URL filtered: https://twitter.com/rcepnews


Processing URLs:   8%|▊         | 81/1000 [07:44<20:03,  1.31s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_49736.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_49736.htm


Processing URLs:   9%|▊         | 86/1000 [07:53<27:41,  1.82s/it]

URL filtered: https://www.youtube.com/watch?v=9bKwRW0l-Qk
Error extracting text from https://www.france24.com/en/20180527-italys-pm-designate-gives-efforts-form-government: 403 Client Error: Forbidden for url: https://www.france24.com/en/20180527-italys-pm-designate-gives-efforts-form-government
URL filtered: https://www.bloomberg.com/news/articles/2021-06-27/india-shifts-50-000-troops-to-china-border-in-historic-defense-shift


Processing URLs:   9%|▉         | 91/1000 [07:54<11:46,  1.29it/s]

Error extracting text from https://www.reuters.tv/v/2eQ/2016/10/13/erdogan-s-popularity-grows-despite-controversy: HTTPSConnectionPool(host='www.reuters.tv', port=443): Max retries exceeded with url: /v/2eQ/2016/10/13/erdogan-s-popularity-grows-despite-controversy (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe184320>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  10%|▉         | 95/1000 [08:00<13:47,  1.09it/s]

Error extracting text from https://support.ancestry.com/s/question/0D515000027ncX3CAI/uploading-raw-data-from-23andme-to-ancestrycom: 403 Client Error: Forbidden for url: https://support.ancestry.com/s/question/0D515000027ncX3CAI/uploading-raw-data-from-23andme-to-ancestrycom


Processing URLs:  10%|▉         | 96/1000 [08:02<18:08,  1.20s/it]

Error extracting text from http://www.state.gov/secretary/remarks/2015/12/250876.htmhttp://www.state.gov/secretary/remarks/2015/12/250876.htm: 404 Client Error: Not Found for url: https://www.state.gov/remarks-secretary-pompeo/


Processing URLs:  10%|█         | 100/1000 [08:09<17:44,  1.18s/it]

Error extracting text from https://www.google.com/amp/s/www.nytimes.com/2021/08/03/nyregion/nyc-vaccine-mandate.amp.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/03/nyregion/nyc-vaccine-mandate.html
Error extracting text from https://ahvalnews.com/turkish-lira/turkish-lira-falls-after-erdogan-tells-biden-position-s-400s-unchanged: 403 Client Error: Forbidden for url: https://ahvalnews.com/turkish-lira/turkish-lira-falls-after-erdogan-tells-biden-position-s-400s-unchanged


Processing URLs:  10%|█         | 101/1000 [08:09<13:40,  1.10it/s]

Error extracting text from http://www.cdm.me/english/sutlic-peric-montenegro-in-nato-no-later-than-spring-2017: 403 Client Error: Forbidden for url: https://www.cdm.me/english/sutlic-peric-montenegro-in-nato-no-later-than-spring-2017
Error extracting text from http://www.balkaninsight.com/en/article/governments-anti-eu-rhetoric-affects-serbian-citizens-02-07-2017: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/governments-anti-eu-rhetoric-affects-serbian-citizens-02-07-2017


Processing URLs:  11%|█         | 106/1000 [09:17<4:25:30, 17.82s/it]

Error extracting text from http://www.u.tv/News/2015/10/07/Onslaught-of-Tory-policies-threatens-Stormont-46457: HTTPSConnectionPool(host='www.itv.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  11%|█         | 108/1000 [09:18<2:20:29,  9.45s/it]

Error extracting text from http://www.reuters.com/article/us-china-southkorea-security-idUSKCN0ZF12J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-southkorea-security-idUSKCN0ZF12J
URL filtered: http://www.bloomberg.com/news/articles/2016-05-12/iran-oil-output-rose-to-pre-sanctions-levels-in-april-iea-says-io408r1c


Processing URLs:  11%|█▏        | 114/1000 [09:27<36:59,  2.50s/it]  

Error extracting text from https://www.nytimes.com/2018/01/04/opinion/gerrymandering-supreme-court.html?action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=opinion-c-col-right-region&amp;region=opinion-c-col-right-region&amp;WT.nav=opinion-c-col-right-region&amp;_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/04/opinion/gerrymandering-supreme-court.html?action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=opinion-c-col-right-region&amp;region=opinion-c-col-right-region&amp;WT.nav=opinion-c-col-right-region&amp;_r=1


Processing URLs:  12%|█▏        | 115/1000 [09:28<29:19,  1.99s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/17/gitrep-16mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/17/gitrep-16mar16pm/
URL filtered: https://www.bloomberg.com/news/articles/2017-04-10/fed-rate-hikes-raise-risks-for-asian-nations-swimming-in-debt


Processing URLs:  12%|█▏        | 117/1000 [09:30<23:52,  1.62s/it]

Error extracting text from http://www.foxnews.com/politics/2015/10/13/fox-news-poll-biden-more-electable-than-clinton/?intcmp=hpbt1: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/10/13/fox-news-poll-biden-more-electable-than-clinton/?intcmp=hpbt1
Error extracting text from http://www.iol.co.za/news/africa/chikane-joins-peace-mission-to-burundi-1993887: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/africa/chikane-joins-peace-mission-to-burundi-1993887


Processing URLs:  12%|█▏        | 119/1000 [09:30<15:13,  1.04s/it]

Error extracting text from http://news.yahoo.com/iran-fires-2-missiles-marked-israel-must-wiped-071612751.html?soc_src=mediacontentstory&amp;soc_trk=ma: 404 Client Error: Not Found for url: http://news.yahoo.com/iran-fires-2-missiles-marked-israel-must-wiped-071612751.html?soc_src=mediacontentstory&amp;soc_trk=ma


Processing URLs:  12%|█▏        | 121/1000 [09:32<14:43,  1.01s/it]

Error extracting text from https://www.wsj.com/articles/oil-buoyed-by-geopolitical-concerns-1510312343: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-buoyed-by-geopolitical-concerns-1510312343


Processing URLs:  12%|█▏        | 122/1000 [09:34<15:13,  1.04s/it]

Error extracting text from http://www.reuters.com/video/2016/07/01/chinas-xi-warns-on-war-against-corruptio?videoId=369133783&amp;videoChannel=118169: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2016/07/01/chinas-xi-warns-on-war-against-corruptio?videoId=369133783&amp;videoChannel=118169


Processing URLs:  12%|█▏        | 123/1000 [10:34<4:04:36, 16.73s/it]

Error extracting text from http://www.seattletimes.com/business/supporters-of-embattled-ex-im-bank-move-to-bypass-gop-foes/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  12%|█▏        | 124/1000 [10:34<3:00:00, 12.33s/it]

Error extracting text from http://www.arirang.co.kr/News/News_View.asp?nseq=188383: 404 Client Error:  for url: http://www.arirang.co.kr/News/News_View.asp?nseq=188383


Processing URLs:  13%|█▎        | 131/1000 [10:43<33:01,  2.28s/it]  

Error extracting text from https://finance.yahoo.com/news/germany-four-months-certify-nord-114609484.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/germany-four-months-certify-nord-114609484.html


Processing URLs:  14%|█▍        | 145/1000 [11:07<20:18,  1.42s/it]

Error extracting text from http://philosophyfaculty.ucsd.edu/faculty/rarneson/Courses/thomsonTROLLEY.pdf: 404 Client Error: Not Found for url: http://philosophyfaculty.ucsd.edu/faculty/rarneson/Courses/thomsonTROLLEY.pdf


Processing URLs:  15%|█▍        | 146/1000 [11:08<17:38,  1.24s/it]

Error extracting text from http://allafrica.com/stories/201610090013.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201610090013.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffd5e240>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  15%|█▍        | 147/1000 [11:09<18:30,  1.30s/it]

Error extracting text from http://www.balkaninsight.com/en/article/support-for-eu-in-serbia-dropping-russia-seen-as-important-power-survey-03-08-2017: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/support-for-eu-in-serbia-dropping-russia-seen-as-important-power-survey-03-08-2017


Processing URLs:  15%|█▍        | 149/1000 [12:09<3:26:55, 14.59s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2018-02-18/second-afghan-governor-defies-president-ghani: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  15%|█▌        | 151/1000 [13:10<5:39:07, 23.97s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/article41972739.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)
URL filtered: https://www.bloomberg.com/news/articles/2016-12-23/record-saudi-bond-sale-was-just-the-start-here-is-what-s-next
URL filtered: https://www.youtube.com/watch?v=2OOb8ngwnH0


Processing URLs:  16%|█▌        | 160/1000 [13:27<1:00:06,  4.29s/it]

Error extracting text from http://fuelfix.com/blog/2017/02/06/opec-meets-91-percent-of-promised-output-cuts-in-january: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2017/02/06/opec-meets-91-percent-of-promised-output-cuts-in-january


Processing URLs:  16%|█▌        | 162/1000 [13:32<46:11,  3.31s/it]  

URL filtered: https://www.bloomberg.com/view/articles/2016-08-07/why-china-can-t-solve-its-debt-problem


Processing URLs:  16%|█▋        | 164/1000 [13:33<28:51,  2.07s/it]

Error extracting text from https://warontherocks.com/2021/05/a-five-ring-circus-in-china-the-proposed-boycott-of-the-2022-winter-olympics/: 403 Client Error: Forbidden for url: https://warontherocks.com/2021/05/a-five-ring-circus-in-china-the-proposed-boycott-of-the-2022-winter-olympics/


Processing URLs:  17%|█▋        | 167/1000 [14:35<4:01:11, 17.37s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/national-security/article91949562.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  17%|█▋        | 168/1000 [14:37<2:59:46, 12.97s/it]

Error extracting text from https://bit.ly/3dBU4Kt: 403 Client Error: Forbidden for url: https://capx.co/after-14-years-scotland-is-suffering-from-snp-stockholm-syndrome/?omhide=true&utm_source=newsletter&utm_medium=email&utm_campaign=21/04/2021&cmid=b7c80f8e-4a52-4f35-a6e6-0f216c7a45d4


Processing URLs:  17%|█▋        | 171/1000 [14:41<1:18:42,  5.70s/it]

Error extracting text from http://www.ibtimes.com/north-koreas-mobile-missile-launcher-seen-moving-japans-nhk-reports-2293167: 403 Client Error: Forbidden for url: https://www.ibtimes.com/north-koreas-mobile-missile-launcher-seen-moving-japans-nhk-reports-2293167


Processing URLs:  17%|█▋        | 173/1000 [14:44<48:43,  3.53s/it]  

Error extracting text from http://www.aina.org/news/20160625002613.htm: 404 Client Error:  for url: http://www.aina.org/news/20160625002613.htm


Processing URLs:  17%|█▋        | 174/1000 [15:01<1:41:15,  7.35s/it]

Error extracting text from https://www.investopedia.com/how-amazon-makes-money-4587523#:~:text=Amazon%20makes%20money%20through%20its,profits%20and%20is%20growing%20fast: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/how-amazon-makes-money-4587523#:~:text=Amazon%20makes%20money%20through%20its,profits%20and%20is%20growing%20fast


Processing URLs:  18%|█▊        | 176/1000 [16:03<5:04:03, 22.14s/it]

Error extracting text from https://www.seattletimes.com/nation-world/as-haiti-reels-from-earthquake-and-deals-with-discord-leader-expects-election-delay/: HTTPSConnectionPool(host='www.seattletimes.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  18%|█▊        | 178/1000 [17:06<6:42:46, 29.40s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-06-20/venezuelan-crisis-takes-center-stage-at-oas-meeting-in-mexico: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  18%|█▊        | 180/1000 [17:07<3:22:41, 14.83s/it]

Error extracting text from https://www.un.org/press/en/2021/sc14462.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2021/sc14462.doc.htm


Processing URLs:  18%|█▊        | 184/1000 [17:10<53:04,  3.90s/it]  

Error extracting text from http://www.barrons.com/quote/stock/HYEM: 403 Client Error: Forbidden for url: http://www.barrons.com/quote/stock/HYEM


Processing URLs:  19%|█▊        | 187/1000 [17:13<26:56,  1.99s/it]

Error extracting text from https://www.apks.com/en/perspectives/publications/2017/07/fda-releases-work-plan-implementing: HTTPSConnectionPool(host='www.apks.com', port=443): Max retries exceeded with url: /en/perspectives/publications/2017/07/fda-releases-work-plan-implementing (Caused by SSLError(SSLError(1, '[SSL: TLSV1_UNRECOGNIZED_NAME] tlsv1 unrecognized name (_ssl.c:1000)')))


Processing URLs:  19%|█▉        | 189/1000 [17:15<18:07,  1.34s/it]

Error extracting text from http://www.latimes.com/politics/la-na-democrats-clinton-sanders-new-hampshire-20151224-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-democrats-clinton-sanders-new-hampshire-20151224-story.html


Processing URLs:  19%|█▉        | 190/1000 [17:16<19:42,  1.46s/it]

Error extracting text from http://trade.ec.europa.eu/doclib/docs/2016/april/tradoc_154480.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2016/april/tradoc_154480.pdf


Processing URLs:  20%|█▉        | 195/1000 [17:29<31:37,  2.36s/it]

Error extracting text from http://www.scout.com/military/warrior/story/1644822-u-s-may-put-weapons-in-south-china-sea: 403 Client Error: Forbidden for url: https://247sports.com/


Processing URLs:  20%|█▉        | 197/1000 [17:30<19:28,  1.45s/it]

Error extracting text from http://www.nytimes.com/2016/06/05/world/middleeast/iraqi-army-seen-as-ill-equipped-to-retake-mosul-from-isis-despite-us-aid.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/05/world/middleeast/iraqi-army-seen-as-ill-equipped-to-retake-mosul-from-isis-despite-us-aid.html


Processing URLs:  20%|██        | 200/1000 [17:54<1:29:26,  6.71s/it]

Error extracting text from https://www.almasdarnews.com/article/battle-aleppo-reaches-uneasy-stalemate-map-update/: 522 Server Error:  for url: https://www.almasdarnews.com/article/battle-aleppo-reaches-uneasy-stalemate-map-update/


Processing URLs:  20%|██        | 202/1000 [17:58<57:28,  4.32s/it]  

Error extracting text from http://www.balkaninsight.com/en/article/nato-rejects-montenegro-membership-in-2014: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/nato-rejects-montenegro-membership-in-2014


Processing URLs:  20%|██        | 204/1000 [17:58<31:51,  2.40s/it]

Error extracting text from http://fr.africatime.com/burundi/articles/au-burundi-les-fonctionnaires-sont-mecontents: 404 Client Error: Not Found for url: http://fr.africatime.com/burundi/articles/au-burundi-les-fonctionnaires-sont-mecontents


Processing URLs:  20%|██        | 205/1000 [18:01<31:54,  2.41s/it]

Error extracting text from https://www.stripes.com/news/middle-east/taliban-capture-two-more-districts-as-summertime-fighting-continues-1.479462#.WXYhADOUX-Y: 404 Client Error: Not Found for url: https://www.stripes.com/theaters/middle_east/taliban-capture-two-more-districts-as-summertime-fighting-continues-1.479462#.WXYhADOUX-Y
URL filtered: https://www.youtube.com/watch?v=xGbI87tyr_4


Processing URLs:  21%|██        | 209/1000 [18:04<17:29,  1.33s/it]

Error extracting text from http://www.comres.co.uk/polls/itv-news-eu-referendum-poll/: 403 Client Error: Forbidden for url: http://comresglobal.com/polls/itv-news-eu-referendum-poll/


Processing URLs:  21%|██        | 212/1000 [18:08<14:45,  1.12s/it]

Error extracting text from http://www.latimes.com/business/la-fi-hy-musk-subsidies-20150531-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-hy-musk-subsidies-20150531-story.html


Processing URLs:  21%|██▏       | 213/1000 [18:09<16:19,  1.24s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X00ST: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X00ST


Processing URLs:  22%|██▏       | 215/1000 [18:11<13:35,  1.04s/it]

Error extracting text from http://maritime-executive.com/article/israeli-navy-backs-netanyahus-submarine-deal: 404 Client Error: Not Found for url: https://maritime-executive.com/403.shtml


Processing URLs:  22%|██▏       | 220/1000 [18:24<27:15,  2.10s/it]

Error extracting text from http://www.techinsider.io/north-korea-worlds-best-hackers-2016-5-: 404 Client Error: Not Found for url: https://www.businessinsider.com/north-korea-worlds-best-hackers-2016-5-


Processing URLs:  22%|██▏       | 222/1000 [18:26<20:44,  1.60s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-06-20/joe-manchin-and-the-fight-over-voting-rights-in-american-history


Processing URLs:  23%|██▎       | 228/1000 [18:31<13:11,  1.03s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/279743-republican-bill-would-force-obama-to-sanction-iranian-hackers: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/279743-republican-bill-would-force-obama-to-sanction-iranian-hackers/


Processing URLs:  23%|██▎       | 229/1000 [19:32<3:50:18, 17.92s/it]

Error extracting text from http://www.marketminder.com/a/fisher-investments-book-review-is-forecasting-a-general-skillset/053f643e-86b2-4936-b253-cda14fe868a6.aspx: HTTPConnectionPool(host='www.marketminder.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  23%|██▎       | 234/1000 [19:36<50:56,  3.99s/it]  

Error extracting text from https://coronavirus.data.gov.uk/details/whats-new/record/863ee4e6-faae-4c30-a935-431bc02640d9: 404 Client Error: The requested content does not exist. for url: https://coronavirus.data.gov.uk/details/whats-new/record/863ee4e6-faae-4c30-a935-431bc02640d9


Processing URLs:  24%|██▍       | 241/1000 [19:52<29:25,  2.33s/it]

Error extracting text from http://uk.reuters.com/article/uk-poland-constitution-eu-schinas-idUKKCN0YF17Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  24%|██▍       | 242/1000 [20:52<4:08:11, 19.65s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-11-27/trump-will-not-campaign-for-alabama-republican-senate-candidate-moore: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  24%|██▍       | 243/1000 [20:52<2:54:25, 13.83s/it]

Error extracting text from https://www.nytimes.com/2021/07/03/world/asia/japan-myanmar-soccer-asylum.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/03/world/asia/japan-myanmar-soccer-asylum.html


Processing URLs:  25%|██▍       | 248/1000 [20:59<41:45,  3.33s/it]  

Error extracting text from http://www.snp.org/statement_on_euref_result_and_it_s_implications_for_scotland: 403 Client Error: Forbidden for url: https://www.snp.org/statement_on_euref_result_and_it_s_implications_for_scotland


Processing URLs:  25%|██▌       | 251/1000 [21:05<30:21,  2.43s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-opposition-idUSKCN0V40MJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-opposition-idUSKCN0V40MJ


Processing URLs:  25%|██▌       | 253/1000 [21:05<18:04,  1.45s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/338748-lewandowski-shut-down-the-special-counsel-investigation: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/338748-lewandowski-shut-down-the-special-counsel-investigation/


Processing URLs:  26%|██▌       | 255/1000 [22:07<3:28:38, 16.80s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/venezuela/article170795077.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  26%|██▌       | 260/1000 [22:16<1:03:48,  5.17s/it]

Error extracting text from http://rbth.com/news/2017/04/24/erdogan-hopes-to-discuss-contract-on-s-400-missile-systems-with-putin_749133: 404 Client Error: Not Found for url: https://www.rbth.com/news/2017/04/24/erdogan-hopes-to-discuss-contract-on-s-400-missile-systems-with-putin_749133


Processing URLs:  26%|██▌       | 261/1000 [22:17<50:58,  4.14s/it]  

Error extracting text from http://finance.yahoo.com/news/senator-kentucky-now-working-putin-204934020.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/senator-kentucky-now-working-putin-204934020.html


Processing URLs:  26%|██▋       | 263/1000 [22:19<31:07,  2.53s/it]

Error extracting text from http://thehill.com/homenews/campaign/361080-top-alabama-newspaper-voters-must-reject-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361080-top-alabama-newspaper-voters-must-reject-moore/
URL filtered: https://www.youtube.com/watch?v=eziAZivQ3J8&amp;t=467s


Processing URLs:  27%|██▋       | 267/1000 [22:21<15:35,  1.28s/it]

Error extracting text from http://www.wsj.com/articles/oil-export-ban-drags-down-spending-negotiations-1450198702: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-export-ban-drags-down-spending-negotiations-1450198702


Processing URLs:  27%|██▋       | 272/1000 [22:27<11:14,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-britain-politics-election-idUSKBN16R0UY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-politics-election-idUSKBN16R0UY?il=0


Processing URLs:  28%|██▊       | 280/1000 [22:39<18:16,  1.52s/it]

Error extracting text from http://tass.ru/en/world/838638: 404 Client Error: Not Found for url: https://tass.ru/en/world/838638


Processing URLs:  28%|██▊       | 281/1000 [22:47<40:32,  3.38s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/08/10/771144/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/08/10/771144/story.html


Processing URLs:  28%|██▊       | 282/1000 [22:47<30:32,  2.55s/it]

Error extracting text from http://evobsession.com/wp-content/uploads/2016/01/CEMAC-Automotve-Lithium-ion-Battery-LIB-Supply-Chain-and-U.S.-Competitiveness-Considerations-June-2015.pdf: 403 Client Error: Forbidden for url: http://evobsession.com/wp-content/uploads/2016/01/CEMAC-Automotve-Lithium-ion-Battery-LIB-Supply-Chain-and-U.S.-Competitiveness-Considerations-June-2015.pdf


Processing URLs:  28%|██▊       | 285/1000 [22:51<19:33,  1.64s/it]

Error extracting text from https://www.reuters.com/article/us-usa-trump-congress-factbox/factbox-now-what-happens-to-the-892-billion-covid-aid-bill-here-are-three-options-idUSKBN28Y22S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-congress-factbox/factbox-now-what-happens-to-the-892-billion-covid-aid-bill-here-are-three-options-idUSKBN28Y22S


Processing URLs:  29%|██▉       | 288/1000 [22:53<12:12,  1.03s/it]

Error extracting text from http://www.rigzone.com/news/oil_gas/a/151783/Oil_Cargoes_Shift_Away_Venezuela_As_Hurricanes_Extend_Disruptions?all=HG2: 403 Client Error: Forbidden for url: http://www.rigzone.com/news/oil_gas/a/151783/Oil_Cargoes_Shift_Away_Venezuela_As_Hurricanes_Extend_Disruptions?all=HG2


Processing URLs:  29%|██▉       | 290/1000 [22:57<17:11,  1.45s/it]

Error extracting text from http://www.tradingeconomics.com/south-africa/unemployment-rate/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/south-africa/unemployment-rate/forecast
Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-putin-idUSKBN18Y0KC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-putin-idUSKBN18Y0KC


Processing URLs:  29%|██▉       | 292/1000 [22:59<14:18,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-india-kashmir-idUSKBN17J0HZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-kashmir-idUSKBN17J0HZ


Processing URLs:  29%|██▉       | 294/1000 [22:59<09:32,  1.23it/s]

Error extracting text from https://www.nytimes.com/2017/08/12/us/politics/mueller-trump-russia-priebus.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/12/us/politics/mueller-trump-russia-priebus.html


Processing URLs:  30%|██▉       | 295/1000 [23:01<13:06,  1.12s/it]

URL filtered: https://twitter.com/elonmusk/status/1076608579652616192


Processing URLs:  30%|███       | 300/1000 [23:10<13:30,  1.16s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-germany-idUSKBN15H0MZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-germany-idUSKBN15H0MZ
URL filtered: https://twitter.com/CJBrownLaw/status/748612948747128832


Processing URLs:  30%|███       | 305/1000 [23:11<05:28,  2.11it/s]

Error extracting text from http://www.nytimes.com/2016/01/14/world/asia/south-korea-china-north-nuclear.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/world/asia/south-korea-china-north-nuclear.html
Error extracting text from https://ahvalnews.com/turkey-banking/turkish-banks-teeter-brink: 403 Client Error: Forbidden for url: https://ahvalnews.com/turkey-banking/turkish-banks-teeter-brink


Processing URLs:  31%|███       | 306/1000 [23:13<10:31,  1.10it/s]

Error extracting text from http://www.oxitec.com/press-release-recent-outbreak-of-zika-virus-in-brazil-creates-pressing-need-for-effective-vector-control-solutions/: 404 Client Error: Not Found for url: https://www.oxitec.com/press-release-recent-outbreak-of-zika-virus-in-brazil-creates-pressing-need-for-effective-vector-control-solutions/


Processing URLs:  31%|███       | 307/1000 [23:14<09:33,  1.21it/s]

Error extracting text from https://www.reuters.com/world/europe/nord-stream-2-gas-pipeline-start-operating-days-russias-lavrov-2021-09-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/nord-stream-2-gas-pipeline-start-operating-days-russias-lavrov-2021-09-06/


Processing URLs:  31%|███       | 309/1000 [23:14<06:05,  1.89it/s]

Error extracting text from https://www.nytimes.com/live/2021/09/20/world/canada-election-2021: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/09/20/world/canada-election-2021


Processing URLs:  31%|███       | 310/1000 [23:14<05:44,  2.00it/s]

Error extracting text from http://www.amazon.com/best-sellers-books-Amazon/zgbs/books/ref=zg_bs_unv_b_1_2689_3: 503 Server Error: Service Unavailable for url: https://www.amazon.com/best-sellers-books-Amazon/zgbs/books/ref=zg_bs_unv_b_1_2689_3


Processing URLs:  31%|███       | 311/1000 [23:15<04:56,  2.32it/s]

Error extracting text from https://www.nytimes.com/2018/01/24/us/politics/trump-mueller.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/24/us/politics/trump-mueller.html


Processing URLs:  31%|███▏      | 313/1000 [23:16<04:59,  2.30it/s]

Error extracting text from http://www.nytimes.com/aponline/2016/07/21/world/asia/ap-as-china-south-china-sea.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/07/21/world/asia/ap-as-china-south-china-sea.html


Processing URLs:  31%|███▏      | 314/1000 [23:17<07:06,  1.61it/s]

Error extracting text from http://www.nato.int/cps/en/natohq/news_131132.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_131132.htm?selectedLocale=en


Processing URLs:  32%|███▏      | 318/1000 [23:22<11:02,  1.03it/s]

Error extracting text from http://www.balkaninsight.com/en/article/corruption-justice-top-agenda-in-commission-report-11-10-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/corruption-justice-top-agenda-in-commission-report-11-10-2015


Processing URLs:  32%|███▏      | 321/1000 [23:23<07:41,  1.47it/s]

Error extracting text from https://theconversation.com/four-in-five-new-zealanders-plan-to-get-vaccinated-but-many-people-want-more-information-about-vaccine-safety-164322: 403 Client Error: Forbidden for url: https://theconversation.com/four-in-five-new-zealanders-plan-to-get-vaccinated-but-many-people-want-more-information-about-vaccine-safety-164322
URL filtered: https://twitter.com/katyafimava/status/1434874845712556033


Processing URLs:  32%|███▏      | 324/1000 [23:27<12:29,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-idUSKBN16F0IO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-idUSKBN16F0IO?il=0
URL filtered: https://www.linkedin.com/in/srdjan-darmanovic-214715a4
URL filtered: http://www.bloomberg.com/news/articles/2016-10-20/where-the-next-crisis-will-come-from


Processing URLs:  33%|███▎      | 329/1000 [23:31<10:33,  1.06it/s]

Error extracting text from http://www.iranpolitik.com/2015/12/22/analysis/larijani-refuses-join-conservative-electoral-alliance/: 404 Client Error: Not Found for url: http://www.iranpolitik.com/2015/12/22/analysis/larijani-refuses-join-conservative-electoral-alliance/


Processing URLs:  33%|███▎      | 330/1000 [23:32<10:27,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN1AF02K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN1AF02K


Processing URLs:  34%|███▎      | 335/1000 [23:37<10:21,  1.07it/s]

Error extracting text from http://www.nytimes.com/2016/04/22/world/middleeast/europe-says-us-regulations-keeping-it-from-trade-with-iran.html?emc=eta1&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/22/world/middleeast/europe-says-us-regulations-keeping-it-from-trade-with-iran.html?emc=eta1&amp;_r=0


Processing URLs:  34%|███▎      | 337/1000 [23:40<13:14,  1.20s/it]

Error extracting text from http://www.theglobeandmail.com/report-on-business/industry-news/energy-and-resources/opec-set-for-policy-rollover-no-sign-of-saudi-cut-plan-sources/article27596242/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/report-on-business/industry-news/energy-and-resources/opec-set-for-policy-rollover-no-sign-of-saudi-cut-plan-sources/article27596242/


Processing URLs:  34%|███▍      | 339/1000 [23:51<32:26,  2.95s/it]

Error extracting text from https://www.swift.com/about_swift/shownews?param_dcr=news.data/en/swift_com/2015/Iran_sanctions_agreement_update.xml: 404 Client Error: Not Found for url: https://www.swift.com/about_swift/shownews?param_dcr=news.data/en/swift_com/2015/Iran_sanctions_agreement_update.xml


Processing URLs:  34%|███▍      | 341/1000 [24:00<40:20,  3.67s/it]

Error extracting text from http://www.au.int/en/about/constitutive_act: 404 Client Error: Not Found for url: https://au.int/en/about/constitutive_act


Processing URLs:  34%|███▍      | 344/1000 [24:03<21:38,  1.98s/it]

Error extracting text from http://strategicstudiesinstitute.army.mil/pubs/display.cfm?pubID=1325: HTTPConnectionPool(host='strategicstudiesinstitute.army.mil', port=80): Max retries exceeded with url: /pubs/display.cfm?pubID=1325 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb4cb0>: Failed to resolve 'strategicstudiesinstitute.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  35%|███▌      | 350/1000 [24:10<16:30,  1.52s/it]

URL filtered: http://www.rand.org/blog/2016/02/chinas-naval-modernization-where-is-it-headed.html?utm_source=linkedin.com&amp;utm_medium=rand_social


Processing URLs:  35%|███▌      | 352/1000 [24:15<21:57,  2.03s/it]

Error extracting text from http://www.defense.gov/Portals/1/Documents/pubs/1225_Report_Dec_2015_-_Final_20151210.pdf: 404 Client Error: Not Found for url: https://www.defense.gov/Portals/1/Documents/pubs/1225_Report_Dec_2015_-_Final_20151210.pdf


Processing URLs:  35%|███▌      | 354/1000 [24:18<18:21,  1.71s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/255585-report-biden-to-skip-first-dem-debate: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/255585-report-biden-to-skip-first-dem-debate/


Processing URLs:  36%|███▌      | 358/1000 [24:23<13:58,  1.31s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/11/28/world/middleeast/ap-ml-iraq-the-mosul-battle-.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/11/28/world/middleeast/ap-ml-iraq-the-mosul-battle-.html?_r=0


Processing URLs:  36%|███▌      | 359/1000 [24:26<17:13,  1.61s/it]

Error extracting text from http://www.motoring.com.au/nextev-high-voltage-hypercar-104619/: 404 Client Error: Not Found for url: https://www.carsales.com.au:443/404/


Processing URLs:  36%|███▌      | 362/1000 [24:30<14:57,  1.41s/it]

Error extracting text from http://gcaptain.com/uncertainty-surrounds-panama-canal-expansion-delivery-timeline/#.VmMRrYFOKrU: 403 Client Error: Forbidden for url: http://gcaptain.com/uncertainty-surrounds-panama-canal-expansion-delivery-timeline/#.VmMRrYFOKrU


Processing URLs:  36%|███▋      | 363/1000 [24:30<12:09,  1.14s/it]

URL filtered: https://www.youtube.com/watch?v=iek-rFWVmIc&amp;t=3m53s


Processing URLs:  37%|███▋      | 366/1000 [24:32<08:21,  1.26it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/09/19/3-reasons-venezuela-7b-debt-swap-smells-bad/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/09/19/3-reasons-venezuela-7b-debt-swap-smells-bad/


Processing URLs:  37%|███▋      | 370/1000 [24:38<09:15,  1.13it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16Q0HI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16Q0HI
Error extracting text from https://www.reuters.com/article/us-asia-iran-oil/iran-pushes-to-retain-asia-oil-buyers-as-possible-u-s-sanctions-loom-idUSKBN1DR0JS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asia-iran-oil/iran-pushes-to-retain-asia-oil-buyers-as-possible-u-s-sanctions-loom-idUSKBN1DR0JS


Processing URLs:  37%|███▋      | 374/1000 [24:40<06:59,  1.49it/s]

Error extracting text from http://www.thenationalherald.com/135843/: 403 Client Error: Forbidden for url: https://www.thenationalherald.com/135843/
Error extracting text from https://www.reuters.com/world/africa/rwanda-says-will-start-deploying-troops-mozambique-2021-07-09/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/rwanda-says-will-start-deploying-troops-mozambique-2021-07-09/


Processing URLs:  38%|███▊      | 376/1000 [24:42<07:07,  1.46it/s]

Error extracting text from http://www.channelnewsasia.com/news/world/turkish-pm-threatens-kurdish-militia-in-north-syria-warns-of-ir/3178914.html?cx_tag=similarcna&amp;cid=tg:recos:similarcna:standard#cxrecs_s: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/turkish-pm-threatens-kurdish-militia-in-north-syria-warns-of-ir/3178914.html?cx_tag=similarcna&amp;cid=tg:recos:similarcna:standard#cxrecs_s
URL filtered: https://www.bloomberg.com/news/articles/2017-11-03/venezuelan-economist-who-urged-default-cynical-on-restructuring


Processing URLs:  38%|███▊      | 380/1000 [24:47<13:15,  1.28s/it]

Error extracting text from http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv: 404 Client Error: Not Found for url: https://www.cboe.com/404/


Processing URLs:  38%|███▊      | 384/1000 [24:51<09:47,  1.05it/s]

Error extracting text from https://www.pewresearch.org/science/2020/12/03/intent-to-get-a-covid-19-vaccine-rises-to-60-as-confidence-in-research-and-development-process-increases/&gt: 404 Client Error: Not Found for url: https://www.pewresearch.org/science/2020/12/03/intent-to-get-a-covid-19-vaccine-rises-to-60-as-confidence-in-research-and-development-process-increases/&gt


Processing URLs:  39%|███▉      | 390/1000 [25:14<34:16,  3.37s/it]

Error extracting text from https://www.nytimes.com/2016/09/24/upshot/it-lives-birtherism-is-diminished-but-far-from-dead.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/09/24/upshot/it-lives-birtherism-is-diminished-but-far-from-dead.html


Processing URLs:  40%|████      | 400/1000 [25:37<23:08,  2.31s/it]

Error extracting text from https://ctc.usma.edu/app/uploads/2018/02/Targeted-Terror-3.pdf: HTTPSConnectionPool(host='ctc.usma.edu', port=443): Max retries exceeded with url: /app/uploads/2018/02/Targeted-Terror-3.pdf (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'ctc.usma.edu'. (_ssl.c:1000)")))
URL filtered: https://www.pinterest.com/pin/331014641335227457/


Processing URLs:  40%|████      | 403/1000 [25:39<12:05,  1.22s/it]

Error extracting text from http://www.nytimes.com/2016/03/25/world/americas/dilma-rousseff-president-of-brazil-resists-calls-for-her-resignation.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/25/world/americas/dilma-rousseff-president-of-brazil-resists-calls-for-her-resignation.html?_r=0


Processing URLs:  41%|████      | 406/1000 [25:42<09:16,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-insight-idUSKCN11E2DE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-insight-idUSKCN11E2DE
Error extracting text from http://cs.cv/: HTTPConnectionPool(host='cs.cv', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff1ceb40>: Failed to resolve 'cs.cv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  41%|████      | 409/1000 [25:44<08:15,  1.19it/s]

Error extracting text from http://www.rrstar.com/news/20170918/rauner-to-illinois-lawmakers-help-me-make-more-cuts: 404 Client Error: OK for url: https://www.rrstar.com/news/20170918/rauner-to-illinois-lawmakers-help-me-make-more-cuts


Processing URLs:  41%|████▏     | 414/1000 [25:54<17:36,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16D2RC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16D2RC


Processing URLs:  42%|████▏     | 419/1000 [26:02<15:17,  1.58s/it]

Error extracting text from https://www.ropesgray.com/-/media/Files/articles/2014/July/Summer14-PopofskyC.pdf?la=en&amp;hash=3B6CDB12F8F459A9EFCEEB08BC76A4E4C79E5008: 403 Client Error: Forbidden for url: https://www.ropesgray.com/-/media/Files/articles/2014/July/Summer14-PopofskyC.pdf?la=en&amp;hash=3B6CDB12F8F459A9EFCEEB08BC76A4E4C79E5008


Processing URLs:  42%|████▏     | 421/1000 [26:06<16:16,  1.69s/it]

Error extracting text from http://aranews.net/2016/10/removing-isis-mosul-will-take-time-coalition-official/: 404 Client Error: Not Found for url: http://aranews.net/2016/10/removing-isis-mosul-will-take-time-coalition-official/


Processing URLs:  42%|████▎     | 425/1000 [26:15<19:36,  2.05s/it]

Error extracting text from http://ec.europa.eu/dgs/home-affairs/what-we-do/policies/european-agenda-migration/background-information/docs/20160504/turkey_progress_visa_liberalisation_roadmap_en.pdf: 404 Client Error: Not Found for url: https://home-affairs.ec.europa.eu/sites/default/files/what-we-do/policies/european-agenda-migration/background-information/docs/20160504/turkey_progress_visa_liberalisation_roadmap_en.pdf
Error extracting text from http://www.nytimes.com/2015/09/20/world/warily-eyeing-china-philippines-may-invite-us-back-to-subic-bay.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/20/world/warily-eyeing-china-philippines-may-invite-us-back-to-subic-bay.html


Processing URLs:  43%|████▎     | 426/1000 [26:16<14:54,  1.56s/it]

Error extracting text from https://thehill.com/policy/international/middle-east-north-africa/554969-iran-ends-inspectors-access-to-nuclear-site: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/middle-east-north-africa/554969-iran-ends-inspectors-access-to-nuclear-site/


Processing URLs:  43%|████▎     | 434/1000 [26:23<05:44,  1.64it/s]

Error extracting text from https://www.reuters.com/article/venezuela-bonds/venezuelas-pdvsa-makes-most-of-2017n-bond-payment-sources-say-idUSL1N1NE0UZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-bonds/venezuelas-pdvsa-makes-most-of-2017n-bond-payment-sources-say-idUSL1N1NE0UZ
Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-fourth-term-in-doubt-as-german-coalition-talks-fail-idUSKBN1DJ0I3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-fourth-term-in-doubt-as-german-coalition-talks-fail-idUSKBN1DJ0I3


Processing URLs:  44%|████▎     | 436/1000 [26:23<03:32,  2.65it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/feb/9/obama-proposes-ramp-spending-against-cyberattacks/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/feb/9/obama-proposes-ramp-spending-against-cyberattacks/


Processing URLs:  44%|████▍     | 441/1000 [26:31<07:25,  1.26it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://painel.blogfolha.uol.com.br/2016/02/25/diante-de-prisao-de-marqueteiro-ala-oposicionista-do-pmdb-estuda-apoio-a-novo-pedido-de-impeachment/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://painel.blogfolha.uol.com.br/2016/02/25/diante-de-prisao-de-marqueteiro-ala-oposicionista-do-pmdb-estuda-apoio-a-novo-pedido-de-impeachment/&amp;prev=search
Error extracting text from http://www.gettyimages.co.uk/pictures/posters-for-canadian-liberal-party-leader-justin-trudeau-news-photo-493142812: 403 Client Error: Forbidden for url: https://www.gettyimages.co.uk/pictures/posters-for-canadian-liberal-party-leader-justin-trudeau-news-photo-493142812


Processing URLs:  44%|████▍     | 445/1000 [26:34<07:08,  1.30it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0VE0ZA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0VE0ZA
URL filtered: http://www.bloomberg.com/professional/blog/saudi-aramco-boss-says-drilling-ipo-unaffected-oil-price/


Processing URLs:  45%|████▍     | 448/1000 [26:35<05:09,  1.78it/s]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/myanmar-s-transition/2346136.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/myanmar-s-transition/2346136.html


Processing URLs:  45%|████▍     | 449/1000 [26:36<05:41,  1.61it/s]

Error extracting text from https://www.newsweek.com/exclusive-how-amateur-sleuths-broke-wuhan-lab-story-embarrassed-media-1596958: 403 Client Error: Forbidden for url: https://www.newsweek.com/exclusive-how-amateur-sleuths-broke-wuhan-lab-story-embarrassed-media-1596958


Processing URLs:  45%|████▌     | 450/1000 [26:37<06:18,  1.45it/s]

Error extracting text from https://www.newsweek.com/trump-encourages-wild-protests-dc-date-electoral-college-vote-count-1556153: 403 Client Error: Forbidden for url: https://www.newsweek.com/trump-encourages-wild-protests-dc-date-electoral-college-vote-count-1556153


Processing URLs:  45%|████▌     | 452/1000 [26:39<05:55,  1.54it/s]

Error extracting text from http://www.wsj.com/articles/losing-count-u-s-terror-rules-drive-money-underground-1459349211t: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/losing-count-u-s-terror-rules-drive-money-underground-1459349211t


Processing URLs:  45%|████▌     | 453/1000 [26:40<07:06,  1.28it/s]

Error extracting text from https://larswericson.wordpress.com/2016/05/01/gitrep-1may16am/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/01/gitrep-1may16am/


Processing URLs:  46%|████▌     | 455/1000 [26:43<10:30,  1.16s/it]

Error extracting text from https://www.yahoo.com/news/indonesia-vows-defend-every-inch-territory-072725896.html?nhp=1: 404 Client Error: Not Found for url: https://www.yahoo.com/news/indonesia-vows-defend-every-inch-territory-072725896.html?nhp=1


Processing URLs:  46%|████▌     | 456/1000 [26:44<08:56,  1.01it/s]

Error extracting text from https://www.japantimes.co.jp/news/2021/03/13/national/science-health/merck-astrazeneca-clinical-trials/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2021/03/13/national/science-health/merck-astrazeneca-clinical-trials/


Processing URLs:  46%|████▌     | 459/1000 [26:47<10:28,  1.16s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-05-23/north-korea-scored-successful-missile-launch-pentagon-spies-say


Processing URLs:  46%|████▌     | 462/1000 [26:51<09:45,  1.09s/it]

Error extracting text from http://www.wsj.com/articles/trumps-immigration-revamp-to-include-plans-for-safe-zones-inside-syria-1485374123: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trumps-immigration-revamp-to-include-plans-for-safe-zones-inside-syria-1485374123


Processing URLs:  46%|████▋     | 465/1000 [27:05<27:14,  3.06s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-japan-idUSKBN17P01Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-japan-idUSKBN17P01Y


Processing URLs:  47%|████▋     | 470/1000 [27:10<12:17,  1.39s/it]

Error extracting text from https://www.reuters.com/article/us-lebanon-politics-aoun/lebanese-president-presses-saudi-to-say-why-hariri-has-not-returned-idUSKBN1DB0ET: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-lebanon-politics-aoun/lebanese-president-presses-saudi-to-say-why-hariri-has-not-returned-idUSKBN1DB0ET


Processing URLs:  48%|████▊     | 479/1000 [27:26<20:49,  2.40s/it]

Error extracting text from http://38north.org/2015/10/jbermudez101415/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  48%|████▊     | 480/1000 [27:28<17:54,  2.07s/it]

Error extracting text from https://www.lesswrong.com/posts/XfHXQPPKNY8BXkn72/honoring-petrov-day-on-lesswrong-in-2020: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/XfHXQPPKNY8BXkn72/honoring-petrov-day-on-lesswrong-in-2020


Processing URLs:  48%|████▊     | 483/1000 [27:32<12:40,  1.47s/it]

Error extracting text from http://www.pravdareport.com/history/05-12-2016/136341-soviet_union-0/#sthash.HGb9OjH6.dpuf: 404 Client Error: Not Found for url: https://www.pravda.ru/history/05-12-2016/136341-soviet_union-0/#sthash.HGb9OjH6.dpuf


Processing URLs:  48%|████▊     | 484/1000 [27:33<11:19,  1.32s/it]

URL filtered: https://twitter.com/BTabrum/status/730147093210370048


Processing URLs:  50%|████▉     | 495/1000 [27:57<24:04,  2.86s/it]

Error extracting text from http://apanews.net/index.php/fr/news/manifestation-des-etudiants-a-luniversite-de-cocody-pour-la-rehabilitation-des-cites-universitaires: 403 Client Error: Forbidden for url: https://apanews.net/index.php/fr/news/manifestation-des-etudiants-a-luniversite-de-cocody-pour-la-rehabilitation-des-cites-universitaires


Processing URLs:  50%|████▉     | 497/1000 [27:59<15:28,  1.85s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-07-28/republicans-threaten-to-oppose-treasury-picks-over-nord-stream-2
URL filtered: https://www.youtube.com/watch?v=9mTlnrXFAXE


Processing URLs:  50%|█████     | 500/1000 [28:59<1:40:42, 12.08s/it]

Error extracting text from https://archive.is/bERJ6: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  50%|█████     | 501/1000 [28:59<1:19:43,  9.59s/it]

Error extracting text from https://news.yahoo.com/fears-iraqi-government-army-over-060458082.html: 404 Client Error: Not Found for url: https://news.yahoo.com/fears-iraqi-government-army-over-060458082.html


Processing URLs:  51%|█████     | 509/1000 [29:13<14:44,  1.80s/it]  

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/nh/new_hampshire_senate_ayotte_vs_hassan-3862.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/nh/new_hampshire_senate_ayotte_vs_hassan-3862.html
Error extracting text from http://www.reuters.com/investigates/special-report/usa-trump-property/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/investigates/special-report/usa-trump-property/


Processing URLs:  51%|█████▏    | 513/1000 [29:16<06:46,  1.20it/s]

Error extracting text from https://www.congress.gov/bill/114th-congress/senate-bill/2144: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/senate-bill/2144
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://novojornal.jor.br/politica/dilma-pede-ao-stf-nulidade-de-ato-de-cunha-que-abriu-impeachment&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://novojornal.jor.br/politica/dilma-pede-ao-stf-nulidade-de-ato-de-cunha-que-abriu-impeachment&amp;prev=search


Processing URLs:  51%|█████▏    | 514/1000 [29:17<06:33,  1.23it/s]

URL filtered: https://www.bloomberg.com/politics/articles/2017-02-19/even-some-erdogan-diehards-flinch-at-giving-him-more-powers


Processing URLs:  52%|█████▏    | 516/1000 [29:18<05:35,  1.44it/s]

Error extracting text from http://www.iraqinews.com/iraq-war/liberation-mosul-will-within-months-says-john-allen/&gt: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/liberation-mosul-will-within-months-says-john-allen/&gt


Processing URLs:  52%|█████▏    | 523/1000 [29:29<09:06,  1.15s/it]

URL filtered: https://twitter.com/Krinklefish828/status/1348336579228569600


Processing URLs:  53%|█████▎    | 526/1000 [29:30<05:28,  1.44it/s]

Error extracting text from https://documents-dds-ny.un.org/doc/UNDOC/GEN/N07/325/21/PDF/N0732521.pdf?OpenElement: HTTPSConnectionPool(host='documents-dds-ny.un.org', port=443): Max retries exceeded with url: /doc/UNDOC/GEN/N07/325/21/PDF/N0732521.pdf?OpenElement (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0VH0DX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0VH0DX


Processing URLs:  53%|█████▎    | 528/1000 [29:32<06:20,  1.24it/s]

Error extracting text from https://www.nytimes.com/2021/06/04/us/democrats-filibuster-senate.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/04/us/democrats-filibuster-senate.html


Processing URLs:  54%|█████▍    | 541/1000 [30:53<2:28:20, 19.39s/it]

Error extracting text from http://kremlin.ru/events/president/transcripts/56355: HTTPConnectionPool(host='kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  54%|█████▍    | 544/1000 [30:59<1:07:41,  8.91s/it]

Error extracting text from http://www.reuters.com/article/us-finland-politics-trump-putin-idUSKBN1721BZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-finland-politics-trump-putin-idUSKBN1721BZ?il=0


Processing URLs:  55%|█████▍    | 545/1000 [31:00<51:44,  6.82s/it]  

Error extracting text from https://www.devex.com/news/drone-meet-the-humanitarian-cluster-approach-91436: 403 Client Error: Forbidden for url: https://www.devex.com/news/drone-meet-the-humanitarian-cluster-approach-91436


Processing URLs:  56%|█████▌    | 556/1000 [31:12<07:17,  1.02it/s]

Error extracting text from http://www.nytimes.com/2016/10/01/business/dealbook/deutsche-bank-stock-bailout.html?rref=collection%2Ftimestopic%2FDeutsche%20Bank%20AG&action=click&contentCollection=business&region=stream&module=stream_unit&version=latest&contentPlacement=4&pgtype=collection: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/01/business/dealbook/deutsche-bank-stock-bailout.html?rref=collection%2Ftimestopic%2FDeutsche%20Bank%20AG&action=click&contentCollection=business&region=stream&module=stream_unit&version=latest&contentPlacement=4&pgtype=collection


Processing URLs:  56%|█████▌    | 557/1000 [31:13<07:08,  1.03it/s]

Error extracting text from http://www.un.org: 403 Client Error: Forbidden for url: https://www.un.org/


Processing URLs:  56%|█████▌    | 562/1000 [31:17<05:57,  1.22it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-01/temer-s-blunders-are-rousseff-s-hope-back-to-brazil-presidency


Processing URLs:  56%|█████▋    | 564/1000 [31:18<04:47,  1.52it/s]

Error extracting text from https://donate.doctorswithoutborders.org/innovations: 404 Client Error: Not Found for url: https://donate.doctorswithoutborders.org/innovations


Processing URLs:  57%|█████▋    | 567/1000 [31:26<10:06,  1.40s/it]

Error extracting text from http://www.si.com/nfl/2016/07/28/deflategate-roger-goodell-new-england-patriots-tom-brady: 403 Client Error: Forbidden for url: http://www.si.com/nfl/2016/07/28/deflategate-roger-goodell-new-england-patriots-tom-brady
Error extracting text from https://www.donaldjtrump.com/: 403 Client Error: Forbidden for url: https://www.donaldjtrump.com/
URL filtered: https://www.youtube.com/watch?v=P7L4pXi2_5I


Processing URLs:  57%|█████▋    | 574/1000 [32:35<2:09:24, 18.23s/it]

Error extracting text from https://sports.ladbrokes.com/en-gb/betting/politics/british/eu-referendum/uk-european-referendum/220800266/: HTTPSConnectionPool(host='sports.ladbrokes.com', port=443): Max retries exceeded with url: /en-gb/betting/politics/british/eu-referendum/uk-european-referendum/220800266/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3037763c0>, 'Connection to sports.ladbrokes.com timed out. (connect timeout=60)'))


Processing URLs:  58%|█████▊    | 577/1000 [32:36<47:05,  6.68s/it]  

Error extracting text from http://www.reuters.com/article/us-afghanistan-election-idUSKBN0UC14O20151229: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-election-idUSKBN0UC14O20151229
Error extracting text from https://www.the-newshub.com/international/american-soldier-fighting-in-iraq-challenges-islamic-state-to-pokemon-battle: 403 Client Error: Forbidden for url: https://www.the-newshub.com/international/american-soldier-fighting-in-iraq-challenges-islamic-state-to-pokemon-battle


Processing URLs:  58%|█████▊    | 579/1000 [32:43<32:39,  4.65s/it]

Error extracting text from https://www.nytimes.com/2021/09/12/world/europe/iran-iaea-nuclear-deal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/09/12/world/europe/iran-iaea-nuclear-deal.html


Processing URLs:  58%|█████▊    | 583/1000 [32:56<21:12,  3.05s/it]

Error extracting text from https://www.wsj.com/articles/trump-administration-gives-final-approval-for-dakota-access-pipeline-1486500445: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-administration-gives-final-approval-for-dakota-access-pipeline-1486500445
Error extracting text from https://www.congress.gov/bill/117th-congress/house-bill/1/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/117th-congress/house-bill/1/text


Processing URLs:  59%|█████▊    | 586/1000 [32:58<11:05,  1.61s/it]

Error extracting text from http://www.aina.org/news/20160501130122.htm: 404 Client Error:  for url: http://www.aina.org/news/20160501130122.htm


Processing URLs:  59%|█████▉    | 589/1000 [32:59<04:48,  1.43it/s]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN185033: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN185033


Processing URLs:  59%|█████▉    | 592/1000 [33:04<08:17,  1.22s/it]

Error extracting text from http://www.bankofengland.co.uk/publications/Documents/inflationreport/2016/aug.pdf: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/publications/Documents/inflationreport/2016/aug.pdf


Processing URLs:  59%|█████▉    | 594/1000 [33:05<05:17,  1.28it/s]

Error extracting text from http://www.reuters.com/article/us-china-japan-islands-idUSKBN0U606H20151223: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-japan-islands-idUSKBN0U606H20151223


Processing URLs:  60%|█████▉    | 597/1000 [33:14<16:36,  2.47s/it]

Error extracting text from https://www.morehumanintelligence.com.au/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Error extracting text from http://files.shareholder.com/downloads/NFLX/1469568125x0x821407/db785b50-90fe-44da-9f5b-37dbf0dcd0e1/Q1_15_Earnings_Letter_final_tables.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/NFLX/1469568125x0x821407/db785b50-90fe-44da-9f5b-37dbf0dcd0e1/Q1_15_Earnings_Letter_final_tables.pdf


Processing URLs:  60%|█████▉    | 599/1000 [33:14<09:16,  1.39s/it]

Error extracting text from https://www.nato.int/: 403 Client Error: Forbidden for url: https://www.nato.int/


Processing URLs:  60%|██████    | 600/1000 [33:16<10:47,  1.62s/it]

Error extracting text from http://www.dailyrepublic.com/opinion/state-national-columnists/how-trump-is-enabling-famine/: 404 Client Error: Not Found for url: https://www.dailyrepublic.com/opinion/state-national-columnists/how-trump-is-enabling-famine/


Processing URLs:  60%|██████    | 601/1000 [33:18<10:17,  1.55s/it]

Error extracting text from https://planetski.eu/2021/02/05/tirol-may-go-into-quarantine-as-variant-virus-cases-rise/: 403 Client Error: Forbidden for url: https://planetski.eu/2021/02/05/tirol-may-go-into-quarantine-as-variant-virus-cases-rise/


Processing URLs:  61%|██████    | 608/1000 [33:27<06:24,  1.02it/s]

Error extracting text from https://www.reuters.com/article/us-usa-nuclear-russia/u-s-presses-russia-to-comply-with-nuclear-missile-treaty-idUSKBN1E224A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nuclear-russia/u-s-presses-russia-to-comply-with-nuclear-missile-treaty-idUSKBN1E224A
Error extracting text from http://www.wsj.com/articles/betting-against-a-fed-rate-rise-1445790240: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/betting-against-a-fed-rate-rise-1445790240


Processing URLs:  61%|██████    | 611/1000 [33:33<08:37,  1.33s/it]

Error extracting text from https://www.yahoo.com/news/status-main-battle-fronts-iraq-syria-173802899.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/status-main-battle-fronts-iraq-syria-173802899.html
Error extracting text from http://www.nytimes.com/2015/06/20/world/asia/afghan-parliaments-term-is-extended-after-squabbles-delay-elections.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/06/20/world/asia/afghan-parliaments-term-is-extended-after-squabbles-delay-elections.html
Error extracting text from https://www.congress.gov/bill/116th-congress/senate-bill/5020/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/116th-congress/senate-bill/5020/text


Processing URLs:  61%|██████▏   | 614/1000 [33:35<05:43,  1.13it/s]

Error extracting text from http://www.theepochtimes.com/n3/2191503-chinas-xi-jinping-paves-political-transition-with-anti-corruption-drive-2/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2191503-chinas-xi-jinping-paves-political-transition-with-anti-corruption-drive-2/
Error extracting text from http://www.wsj.com/articles/philippines-duterte-wants-ex-president-ramos-to-meet-with-china-on-maritime-dispute-1468520651: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/philippines-duterte-wants-ex-president-ramos-to-meet-with-china-on-maritime-dispute-1468520651


Processing URLs:  62%|██████▏   | 617/1000 [33:40<09:02,  1.42s/it]

Error extracting text from http://www.deutschlandfunk.de/bericht-nato-wird-montenegro-als-29-mitgliedsland-aufnehmen.447.de.html?drn:news_id=549331: 404 Client Error: Not Found for url: https://www.deutschlandfunk.de/bericht-nato-wird-montenegro-als-29-mitgliedsland-aufnehmen.447.de.html?drn:news_id=549331


Processing URLs:  62%|██████▏   | 618/1000 [33:41<06:59,  1.10s/it]

Error extracting text from http://www.paddypower.com/bet/politics/other-politics/us-politics?ev_oc_grp_ids=2043930: HTTPConnectionPool(host='www.paddypower.com', port=80): Max retries exceeded with url: /bet/politics/other-politics/us-politics?ev_oc_grp_ids=2043930 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30398b3b0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.timesofisrael.com/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/


Processing URLs:  62%|██████▏   | 623/1000 [33:44<05:49,  1.08it/s]

Error extracting text from https://www.teslamotors.com/blog/reserving-model-3: 403 Client Error: Forbidden for url: https://www.teslamotors.com/blog/reserving-model-3


Processing URLs:  63%|██████▎   | 626/1000 [33:47<05:30,  1.13it/s]

Error extracting text from http://www.latimes.com/business/technology/la-fi-tn-apple-iphone-supply-20160106-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/technology/la-fi-tn-apple-iphone-supply-20160106-story.html


Processing URLs:  63%|██████▎   | 628/1000 [33:51<08:20,  1.35s/it]

URL filtered: https://www.youtube.com/watch?v=hT_nvWreIhg


Processing URLs:  63%|██████▎   | 632/1000 [33:55<05:56,  1.03it/s]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/geos/pe.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook/geos/pe.html


Processing URLs:  64%|██████▍   | 638/1000 [34:05<07:53,  1.31s/it]

Error extracting text from https://www.atptour.com/en/rankings/singles: 403 Client Error: Forbidden for url: https://www.atptour.com/en/rankings/singles


Processing URLs:  64%|██████▍   | 641/1000 [34:08<06:24,  1.07s/it]

Error extracting text from https://cnnphilippines.com/news/2017/03/11/Chinese-ships-Benham-Rise.html: 503 Server Error: Unavailable for url: https://cnnphilippines.com/news/2017/03/11/Chinese-ships-Benham-Rise.html


Processing URLs:  65%|██████▍   | 646/1000 [34:29<11:26,  1.94s/it]

Error extracting text from http://allafrica.com/stories/201605201207.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201605201207.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffb58c20>: Failed to establish a new connection: [Errno 61] Connection refused'))
URL filtered: https://m.youtube.com/watch?v=WoNhRBk_n30
Error extracting text from http://www.reuters.com/article/us-usa-iran-military-idUSKCN10Z2OP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-military-idUSKCN10Z2OP


Processing URLs:  65%|██████▍   | 647/1000 [34:30<10:59,  1.87s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-bonds-reaction/unversed-in-debt-details-venezuelans-desperate-for-any-relief-idUSKBN1D40GV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds-reaction/unversed-in-debt-details-venezuelans-desperate-for-any-relief-idUSKBN1D40GV


Processing URLs:  65%|██████▍   | 649/1000 [34:32<07:53,  1.35s/it]

Error extracting text from https://tradingeconomics.com/commodity/coal: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/coal


Processing URLs:  65%|██████▌   | 652/1000 [35:35<1:36:50, 16.70s/it]

Error extracting text from https://www.themoscowtimes.com/2021/02/09/navalny-allies-announce-new-courtyard-protests-a72873: HTTPSConnectionPool(host='www.themoscowtimes.com', port=443): Max retries exceeded with url: /2021/02/09/navalny-allies-announce-new-courtyard-protests-a72873 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fece1520>, 'Connection to www.themoscowtimes.com timed out. (connect timeout=60)'))


Processing URLs:  65%|██████▌   | 654/1000 [35:36<51:35,  8.95s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-gorsuch-idUSKBN1740NH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-gorsuch-idUSKBN1740NH


Processing URLs:  66%|██████▌   | 659/1000 [35:45<19:11,  3.38s/it]

Error extracting text from http://www.nkeconwatch.com/: 403 Client Error: Forbidden for url: http://www.nkeconwatch.com/
URL filtered: https://www.youtube.com/watch?v=GAfXRKzi60Q


Processing URLs:  67%|██████▋   | 670/1000 [36:18<27:32,  5.01s/it]

Error extracting text from https://www.nytimes.com/2021/03/09/technology/section-230-congress.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/09/technology/section-230-congress.html


Processing URLs:  67%|██████▋   | 673/1000 [36:21<12:26,  2.28s/it]

Error extracting text from https://www.jnj.com/johnson-johnson-announces-submission-of-application-to-the-u-s-fda-for-emergency-use-authorization-of-its-investigational-single-shot-janssen-covid-19-vaccine-candidate: 403 Client Error: Forbidden for url: https://www.jnj.com/johnson-johnson-announces-submission-of-application-to-the-u-s-fda-for-emergency-use-authorization-of-its-investigational-single-shot-janssen-covid-19-vaccine-candidate


Processing URLs:  67%|██████▋   | 674/1000 [36:23<11:59,  2.21s/it]

Error extracting text from http://intpolicydigest.org/2016/10/14/gradual-decay-post-cold-war-arms-control/: 403 Client Error: Forbidden for url: https://intpolicydigest.org/gradual-decay-post-cold-war-arms-control


Processing URLs:  68%|██████▊   | 681/1000 [36:39<10:25,  1.96s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-syria-islamic-sta-idUSKCN0XF0D5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-syria-islamic-sta-idUSKCN0XF0D5


Processing URLs:  68%|██████▊   | 685/1000 [36:48<12:12,  2.33s/it]

URL filtered: https://www.bnnbloomberg.ca/china-rejects-its-exclusion-from-wto-vaccine-waiver-proposal-1.1762407


Processing URLs:  70%|██████▉   | 695/1000 [37:10<05:49,  1.15s/it]

Error extracting text from http://www.nytimes.com/2016/03/31/world/asia/south-china-sea-us-navy.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/31/world/asia/south-china-sea-us-navy.html
Error extracting text from http://www.nytimes.com/2016/01/11/world/middleeast/neglect-may-do-what-isis-didnt-breach-iraqi-dam.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/11/world/middleeast/neglect-may-do-what-isis-didnt-breach-iraqi-dam.html


Processing URLs:  70%|██████▉   | 697/1000 [37:10<03:29,  1.44it/s]

Error extracting text from https://www.nytimes.com/2017/08/21/us/navy-collisions-history-mccain-fitzgerald.html?_r=0[9/1/2017: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/21/us/navy-collisions-history-mccain-fitzgerald.html?_r=0%5B9/1/2017


Processing URLs:  70%|██████▉   | 699/1000 [37:12<03:51,  1.30it/s]

Error extracting text from http://www.nytimes.com/2015/10/12/world/middleeast/iran-tests-long-range-missile-possibly-violating-nuclear-accord.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/12/world/middleeast/iran-tests-long-range-missile-possibly-violating-nuclear-accord.html?_r=0


Processing URLs:  70%|███████   | 700/1000 [37:14<05:34,  1.11s/it]

Error extracting text from http://www.ibtimes.com/south-china-sea-war-china-will-conduct-drills-disputed-region-despite-meddling-2479372: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-war-china-will-conduct-drills-disputed-region-despite-meddling-2479372


Processing URLs:  70%|███████   | 703/1000 [37:23<12:02,  2.43s/it]

Error extracting text from http://wbtw.com/2016/02/16/horry-county-schools-approve-paying-computer-virus-ransom-making-payment-problematic/: 404 Client Error: Not Found for url: https://www.wbtw.com/2016/02/16/horry-county-schools-approve-paying-computer-virus-ransom-making-payment-problematic/


Processing URLs:  70%|███████   | 704/1000 [37:24<10:13,  2.07s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/trump-ban-bump-stocks-atf-53276472: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/trump-ban-bump-stocks-atf-53276472


Processing URLs:  71%|███████   | 706/1000 [37:30<13:40,  2.79s/it]

Error extracting text from https://www.reuters.com/article/us-usa-bonds-debtceiling-idUSKBN1A52A5?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-bonds-debtceiling-idUSKBN1A52A5?il=0


Processing URLs:  71%|███████   | 708/1000 [37:35<12:19,  2.53s/it]

Error extracting text from https://concentricteam.com/about-us/our-thinking/: 404 Client Error: Not Found for url: https://concentric.vc/about-us/our-thinking/


Processing URLs:  71%|███████   | 712/1000 [37:43<12:11,  2.54s/it]

Error extracting text from http://nysepost.com/us-proposes-sharp-ramping-up-of-north-korea-sanctions-at-un-137429: 404 Client Error: Not Found for url: https://nysepost.com/us-proposes-sharp-ramping-up-of-north-korea-sanctions-at-un-137429


Processing URLs:  71%|███████▏  | 713/1000 [37:47<13:46,  2.88s/it]

Error extracting text from http://smallwarsjournal.com/jrnl/art/confusing-a-“revolution”-with-“terrorism: 404 Client Error: Not Found for url: https://smallwarsjournal.com/jrnl/art/confusing-a-%e2%80%9crevolution%e2%80%9d-with-%e2%80%9cterrorism


Processing URLs:  72%|███████▏  | 718/1000 [37:51<04:49,  1.02s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-parliament-vote-idUSKCN18E32I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-parliament-vote-idUSKCN18E32I
Error extracting text from http://www.nytimes.com/2015/12/05/business/economy/jobs-report-hiring-unemployment-november.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/05/business/economy/jobs-report-hiring-unemployment-november.html?_r=0


Processing URLs:  72%|███████▏  | 719/1000 [37:52<03:30,  1.33it/s]

Error extracting text from http://www.reuters.com/article/venezuela-cenbank-idUSL1N0XL0TY20150424: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-cenbank-idUSL1N0XL0TY20150424


Processing URLs:  73%|███████▎  | 726/1000 [38:04<08:49,  1.93s/it]

URL filtered: https://www.facebook.com/zuck/posts/10103253901916271
Error extracting text from http://www.nytimes.com/2015/09/13/business/economy/the-feds-policy-mechanics-retool-for-a-rise-in-interest-rates.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/13/business/economy/the-feds-policy-mechanics-retool-for-a-rise-in-interest-rates.html


Processing URLs:  73%|███████▎  | 730/1000 [38:05<03:45,  1.20it/s]

Error extracting text from https://abcnews.go.com/International/wireStory/nicaraguan-government-sets-date-presidential-election-71785222: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/nicaraguan-government-sets-date-presidential-election-71785222
URL filtered: https://www.bloomberg.com/news/articles/2021-10-08/wall-street-could-get-four-bitcoin-futures-etfs-by-end-of-month


Processing URLs:  73%|███████▎  | 733/1000 [38:06<02:42,  1.65it/s]

Error extracting text from https://news.sky.com/story/covid-19-what-are-the-options-for-further-coronavirus-restrictions-boris-johnson-is-said-to-have-been-presented-with-12500950): 404 Client Error: Not Found for url: https://news.sky.com/story/covid-19-what-are-the-options-for-further-coronavirus-restrictions-boris-johnson-is-said-to-have-been-presented-with-12500950)
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/identifikatsiya-sistem-i-zadachi-upravleniya-na-puti-k-sovremennym-sistemnym-metodologiyam&amp;usg=ALkJrhjJyKXTWt9vAMrLKNxnX5EMLXD6FQ: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/identifikatsiya-sistem-i-zadachi-upravleniya-na-puti-k-sovremennym-sistemnym-metodologiyam&amp;usg

Processing URLs:  73%|███████▎  | 734/1000 [38:07<03:03,  1.45it/s]

Error extracting text from http://news.xinhuanet.com/english/2016-02/17/c_135104167.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-02/17/c_135104167.htm


Processing URLs:  74%|███████▎  | 736/1000 [38:10<04:20,  1.01it/s]

URL filtered: https://www.bloomberg.com/news/articles/2018-01-08/merkel-hemmed-in-by-hard-liners-challenging-her-policy-positions


Processing URLs:  74%|███████▍  | 739/1000 [38:12<03:43,  1.17it/s]

Error extracting text from http://www.cnbc.com/2016/04/02/financial-times-imf-weighing-exit-from-greek-bailout.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/04/02/financial-times-imf-weighing-exit-from-greek-bailout.html


Processing URLs:  74%|███████▍  | 740/1000 [38:12<03:02,  1.42it/s]

Error extracting text from https://www.yahoo.com/news/n-korea-prepares-ballistic-missile-launch-report-062245039.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/n-korea-prepares-ballistic-missile-launch-report-062245039.html


Processing URLs:  74%|███████▍  | 742/1000 [38:15<04:18,  1.00s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-indiana-presidential-republican-primary: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-indiana-presidential-republican-primary


Processing URLs:  75%|███████▍  | 746/1000 [39:20<1:18:11, 18.47s/it]

Error extracting text from http://www.miamiherald.com/opinion/op-ed/article71435012.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  75%|███████▌  | 752/1000 [39:33<16:15,  3.93s/it]

Error extracting text from http://news.yahoo.com/frances-hollande-says-iraq-syria-air-strikes-accelerated-104334594.html: 404 Client Error: Not Found for url: http://news.yahoo.com/frances-hollande-says-iraq-syria-air-strikes-accelerated-104334594.html


Processing URLs:  75%|███████▌  | 754/1000 [39:36<10:28,  2.56s/it]

Error extracting text from https://www.aninews.in/news/world/others/ethiopia-rejects-us-allegations-of-ethnic-cleansing-in-tigray20210314083541/: 403 Client Error: Forbidden for url: https://www.aninews.in/news/world/others/ethiopia-rejects-us-allegations-of-ethnic-cleansing-in-tigray20210314083541/


Processing URLs:  76%|███████▌  | 758/1000 [39:40<05:29,  1.36s/it]

Error extracting text from http://iportal.rada.gov.ua/en/news/News/133471.html: 404 Client Error: File Not Found for url: https://iportal.rada.gov.ua/en/news/News/133471.html
Error extracting text from http://thenationonlineng.net/tension-suspected-fulani-herdsmen-kill-farmer-kogi/: 403 Client Error: Forbidden for url: https://thenationonlineng.net/tension-suspected-fulani-herdsmen-kill-farmer-kogi/


Processing URLs:  76%|███████▌  | 759/1000 [39:40<04:03,  1.01s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/south-koreas-park-geun-hye-to-accept-impeachment-vote-to-resign-in-april-party: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  76%|███████▌  | 761/1000 [39:40<02:20,  1.70it/s]

Error extracting text from https://www.nytimes.com/2017/01/21/upshot/what-does-the-order-against-the-health-law-actually-do.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/21/upshot/what-does-the-order-against-the-health-law-actually-do.html?_r=0
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKBN15A0ON: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKBN15A0ON


Processing URLs:  76%|███████▌  | 762/1000 [39:42<03:18,  1.20it/s]

URL filtered: https://www.youtube.com/watch?v=8Xjr2hnOHiM


Processing URLs:  76%|███████▋  | 764/1000 [39:43<02:21,  1.66it/s]

URL filtered: http://www.bloomberg.com/quote/CL1:COM


Processing URLs:  77%|███████▋  | 768/1000 [39:48<04:39,  1.21s/it]

Error extracting text from https://www.navytimes.com/flashpoints/2017/07/31/trump-considers-withdrawal-from-afghanistan/: 404 Client Error: Not Found for url: https://www.navytimes.com/flashpoints/2017/07/31/trump-considers-withdrawal-from-afghanistan/


Processing URLs:  77%|███████▋  | 769/1000 [39:49<04:09,  1.08s/it]

Error extracting text from http://whitegenocideproject.com/un-demands-open-borders-for-europe-this-is-their-future/: 403 Client Error: Forbidden for url: http://whitegenocideproject.com/un-demands-open-borders-for-europe-this-is-their-future/


Processing URLs:  77%|███████▋  | 774/1000 [40:00<06:25,  1.71s/it]

Error extracting text from https://www.nytimes.com/2017/07/26/business/energy-environment/uk-diesel-petrol-emissions.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/26/business/energy-environment/uk-diesel-petrol-emissions.html


Processing URLs:  79%|███████▉  | 788/1000 [40:24<05:50,  1.65s/it]

URL filtered: https://www.youtube.com/watch?v=NHRHUHW6HQE


Processing URLs:  79%|███████▉  | 791/1000 [40:27<04:22,  1.25s/it]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/03/24/2-venezuela-experts-on-gold-sales-bond-risk-whats-next/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/03/24/2-venezuela-experts-on-gold-sales-bond-risk-whats-next/


Processing URLs:  80%|███████▉  | 796/1000 [40:42<07:09,  2.10s/it]

Error extracting text from https://www.nytimes.com/2017/10/25/technology/fcc-media-ownership-rules.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/25/technology/fcc-media-ownership-rules.html
URL filtered: https://twitter.com/jaredlholt/status/1348145028397465601


Processing URLs:  80%|███████▉  | 799/1000 [40:45<05:20,  1.59s/it]

Error extracting text from http://tass.ru/en/world/770418: 404 Client Error: Not Found for url: https://tass.ru/en/world/770418


Processing URLs:  80%|████████  | 803/1000 [40:55<06:41,  2.04s/it]

Error extracting text from https://news.google.com/articles/CBMiPWh0dHBzOi8vd3d3LnJldXRlcnMuY29tL2FydGljbGUvdXMtaXJhbi1udWNsZWFyLWlkVVNLQ04yQVgyM0jSATRodHRwczovL21vYmlsZS5yZXV0ZXJzLmNvbS9hcnRpY2xlL2FtcC9pZFVTS0NOMkFYMjNI?hl=en-US&amp;gl=US&amp;ceid=US%3Aen: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-idUSKCN2AX23H


Processing URLs:  80%|████████  | 804/1000 [40:56<05:46,  1.77s/it]

Error extracting text from http://www.dailysabah.com/d/business/2016/12/07/ankara-moscow-seek-to-compensate-for-time-lost: 404 Client Error: Not Found for url: https://www.dailysabah.com/d/business/2016/12/07/ankara-moscow-seek-to-compensate-for-time-lost


Processing URLs:  81%|████████  | 808/1000 [41:01<04:09,  1.30s/it]

Error extracting text from http://www.reuters.com/article/northdakota-pipeline-idUSW1N1FL00H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/northdakota-pipeline-idUSW1N1FL00H


Processing URLs:  81%|████████  | 810/1000 [41:03<03:36,  1.14s/it]

Error extracting text from http://www.al-monitor.com/pulse/politics/2016/04/syria-alawites-document-dissociation-assad-regime.html#: 404 Client Error: Not Found for url: https://www.al-monitor.com/politics/2016/04/syria-alawites-document-dissociation-assad-regime.html


Processing URLs:  81%|████████  | 811/1000 [41:04<04:11,  1.33s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-21/iran-sees-opec-keeping-oil-output-cap-unchanged-at-next-meeting


Processing URLs:  82%|████████▏ | 815/1000 [42:07<50:48, 16.48s/it]

Error extracting text from http://www.usnews.com/opinion/blogs/world-report/2012/11/26/sudans-bashir-government-faces-more-problems-after-failed-coup: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  82%|████████▏ | 817/1000 [42:09<27:47,  9.11s/it]

Error extracting text from https://www.nytimes.com/2018/05/03/us/politics/green-berets-saudi-yemen-border-houthi.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/05/03/us/politics/green-berets-saudi-yemen-border-houthi.html


Processing URLs:  82%|████████▏ | 818/1000 [42:10<21:04,  6.95s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/american-indians-protest-trump-pipeline-washington-45955375: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/american-indians-protest-trump-pipeline-washington-45955375


Processing URLs:  83%|████████▎ | 826/1000 [42:26<05:45,  1.98s/it]

Error extracting text from http://www.nationalreview.com/article/424209/joe-biden-josh-alcorn: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/424209/joe-biden-josh-alcorn/
URL filtered: https://www.bloomberg.com/news/articles/2017-01-20/opec-russia-meet-in-vienna-for-first-check-on-oil-cuts-progress


Processing URLs:  83%|████████▎ | 828/1000 [42:26<03:34,  1.25s/it]

Error extracting text from https://www.nytimes.com/2017/01/27/us/politics/refugee-muslim-executive-order-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/27/us/politics/refugee-muslim-executive-order-trump.html


Processing URLs:  83%|████████▎ | 830/1000 [42:30<03:41,  1.30s/it]

Error extracting text from http://www.reuters.com/article/2015/09/24/brazil-crisis-idUSL1N11U1PY20150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/brazil-crisis-idUSL1N11U1PY20150924


Processing URLs:  83%|████████▎ | 831/1000 [42:31<03:32,  1.26s/it]

Error extracting text from http://capone.mtsu.edu/studskl/hd/Left_Right_Brain.html?Right_Brain=9&amp;Left_Brain=10&amp;amp: 404 Client Error: Not Found for url: http://capone.mtsu.edu/studskl/hd/Left_Right_Brain.html?Right_Brain=9&amp;Left_Brain=10&amp;amp


Processing URLs:  83%|████████▎ | 833/1000 [42:36<05:23,  1.94s/it]

Error extracting text from http://www.lloydslist.com/ll/sector/containers/article471625.ece: 404 Client Error: Page not found for url: https://www.lloydslist.com:443/ll/sector/containers/article471625.ece


Processing URLs:  83%|████████▎ | 834/1000 [42:36<04:08,  1.49s/it]

Error extracting text from https://developers.diem.com/main/changelog: 404 Client Error: Not Found for url: https://developers.diem.com/main/changelog


Processing URLs:  84%|████████▎ | 836/1000 [42:38<03:03,  1.12s/it]

Error extracting text from http://ukpollingreport.co.uk/blog/archives/9589: 404 Client Error: Not Found for url: http://ukpollingreport.co.uk/blog/archives/9589


Processing URLs:  84%|████████▍ | 838/1000 [42:42<03:53,  1.44s/it]

Error extracting text from http://www.wsj.com/articles/pressure-builds-on-developing-nations-1441926717: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pressure-builds-on-developing-nations-1441926717


Processing URLs:  84%|████████▍ | 840/1000 [42:44<03:02,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-australia-iran-trade-idUSKCN0WH066: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-australia-iran-trade-idUSKCN0WH066


Processing URLs:  84%|████████▍ | 842/1000 [42:45<02:16,  1.16it/s]

Error extracting text from https://boingboing.net/2016/02/21/plummeting-oil-prices-and-13-y.html: 403 Client Error: Forbidden for url: https://boingboing.net/2016/02/21/plummeting-oil-prices-and-13-y.html


Processing URLs:  84%|████████▍ | 845/1000 [42:48<02:03,  1.25it/s]

Error extracting text from http://www.thegatewaypundit.com/2016/04/rnc-rules-committee-member-even-trump-gets-1237-delegates-doesnt-mean-hell-nominee-video/: 403 Client Error: Forbidden for url: https://www.thegatewaypundit.com/2016/04/rnc-rules-committee-member-even-trump-gets-1237-delegates-doesnt-mean-hell-nominee-video/
URL filtered: https://www.youtube.com/watch?v=uim5l46GuiU
URL filtered: https://www.cnbc.com/2017/09/25/russian-facebook-ads-targeted-black-lives-matter-muslims-election.html


Processing URLs:  85%|████████▌ | 850/1000 [42:51<01:56,  1.29it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-09/china-to-ban-sale-of-fossil-fuel-cars-in-electric-vehicle-push


Processing URLs:  85%|████████▌ | 852/1000 [42:53<02:16,  1.08it/s]

Error extracting text from http://www.hawaiinewsnow.com/story/31250645/campaigning-begins-for-irans-parliamentary-elections: 404 Client Error: Not Found for url: https://www.hawaiinewsnow.com/story/31250645/campaigning-begins-for-irans-parliamentary-elections/


Processing URLs:  85%|████████▌ | 854/1000 [42:57<03:05,  1.27s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/49917447.cms?utm_source=contentofinterest&amp;utm_medium=text&amp;utm_campaign=cppst: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/49917447.cms?utm_source=contentofinterest&amp;utm_medium=text&amp;utm_campaign=cppst


Processing URLs:  86%|████████▌ | 859/1000 [43:08<05:39,  2.41s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-article-idUSKBN15L19T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-article-idUSKBN15L19T


Processing URLs:  86%|████████▌ | 862/1000 [43:11<03:46,  1.64s/it]

Error extracting text from http://nationalinterest.org/feature/how-iran-dominates-the-middle-east-13136: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/how-iran-dominates-the-middle-east-13136


Processing URLs:  86%|████████▋ | 863/1000 [43:14<04:26,  1.94s/it]

Error extracting text from https://www.fi.edu/blog/How-Does-Weather-Affect-a-Rocket-Launch%3F#:~:text=For%20example%2C%20a%20light%20wind,push%20the%20rocket%20off%2Dcourse: 404 Client Error: Not Found for url: https://fi.edu/blog/How-Does-Weather-Affect-a-Rocket-Launch#:~:text=For%20example%2C%20a%20light%20wind,push%20the%20rocket%20off-course


Processing URLs:  86%|████████▋ | 864/1000 [43:15<03:48,  1.68s/it]

Error extracting text from http://ekurd.net/iraq-deploy-troops-retake-mosul-2016-02-08: 403 Client Error: Forbidden for url: https://ekurd.net/iraq-deploy-troops-retake-mosul-2016-02-08


Processing URLs:  87%|████████▋ | 869/1000 [43:25<03:21,  1.54s/it]

Error extracting text from http://www.nytimes.com/2015/08/31/world/middleeast/iran-sentences-two-charged-with-spying-to-10-years-in-prison.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/31/world/middleeast/iran-sentences-two-charged-with-spying-to-10-years-in-prison.html


Processing URLs:  87%|████████▋ | 870/1000 [43:29<05:08,  2.38s/it]

Error extracting text from https://radiotamazuj.org/en/article/south-sudan-central-bank-denies-it-will-create-new-currency-denominations: 404 Client Error: Not Found for url: https://radiotamazuj.org/en/article/south-sudan-central-bank-denies-it-will-create-new-currency-denominations


Processing URLs:  87%|████████▋ | 871/1000 [43:30<03:59,  1.86s/it]

Error extracting text from https://origin-nyi.thehill.com/policy/energy-environment/335898-dakota-access-pipeline-now-in-service: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/335898-dakota-access-pipeline-now-in-service/


Processing URLs:  87%|████████▋ | 872/1000 [43:31<03:52,  1.82s/it]

Error extracting text from http://news.sky.com/story/1698637/fighter-jets-buzz-us-plane-in-south-china-sea: 404 Client Error: Not Found for url: https://news.sky.com/story/1698637/fighter-jets-buzz-us-plane-in-south-china-sea


Processing URLs:  87%|████████▋ | 874/1000 [43:35<03:42,  1.76s/it]

Error extracting text from https://www.reuters.com/article/ukraine-crisis-food-mideast-idAFL8N2V31BU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/ukraine-crisis-food-mideast-idAFL8N2V31BU


Processing URLs:  88%|████████▊ | 877/1000 [43:37<02:13,  1.09s/it]

Error extracting text from https://ca.travelpulse.com/news/destinations/new-york-city-reawakens-new-hotels-attractions-coming-during-tourism-rebound.html: 405 Client Error: Not Allowed for url: https://www.travelpulse.ca/news/destinations/new-york-city-reawakens-new-hotels-attractions-coming-during-tourism-rebound.html
Error extracting text from https://www.reuters.com/world/europe/euro-zone-firms-see-wages-rising-by-3-or-more-ecb-says-2022-02-04/).: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/euro-zone-firms-see-wages-rising-by-3-or-more-ecb-says-2022-02-04/).


Processing URLs:  88%|████████▊ | 878/1000 [43:37<01:45,  1.15it/s]

Error extracting text from https://www.nytimes.com/2017/09/22/world/europe/florence-theresa-may-speech-brexit.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/22/world/europe/florence-theresa-may-speech-brexit.html


Processing URLs:  88%|████████▊ | 880/1000 [43:40<02:07,  1.06s/it]

Error extracting text from http://www.timesofisrael.com/one-year-on-us-obstacles-blunt-hopes-from-iran-deal/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/one-year-on-us-obstacles-blunt-hopes-from-iran-deal/


Processing URLs:  88%|████████▊ | 883/1000 [43:45<02:39,  1.36s/it]

Error extracting text from http://www.mb.com.ph/analyst-talks-effective-approach-vs-adiz/: 403 Client Error: Forbidden for url: https://mb.com.ph/analyst-talks-effective-approach-vs-adiz/


Processing URLs:  89%|████████▉ | 888/1000 [43:53<03:30,  1.88s/it]

Error extracting text from http://peacekeeper.ru/en/?module=news&amp;action=view&amp;id=29012: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  89%|████████▉ | 892/1000 [43:58<02:22,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-syria-envoy-idUSKCN0VR240: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-syria-envoy-idUSKCN0VR240


Processing URLs:  89%|████████▉ | 893/1000 [43:59<02:03,  1.15s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&amp;s=W_EPC0_SAX_YCUOK_MBBL&amp;f=W: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  90%|████████▉ | 898/1000 [44:06<02:00,  1.18s/it]

Error extracting text from http://economictimes.indiatimes.com/news/politics-and-nation/india-attempting-to-block-new-saarc-chief-pakistan-daily/articleshow/56918968.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/politics-and-nation/india-attempting-to-block-new-saarc-chief-pakistan-daily/articleshow/56918968.cms
Error extracting text from https://medium.com/backchannel/has-deepmind-really-passed-go-adc85e256bec#.hlyihagpo: 403 Client Error: Forbidden for url: https://medium.com/backchannel/has-deepmind-really-passed-go-adc85e256bec#.hlyihagpo


Processing URLs:  90%|█████████ | 900/1000 [44:07<01:23,  1.20it/s]

Error extracting text from http://www.reuters.com/article/2015/11/02/us-opec-report-idUSKCN0SR22320151102#eSTd8THcpWr7pbBi.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/02/us-opec-report-idUSKCN0SR22320151102#eSTd8THcpWr7pbBi.97


Processing URLs:  90%|█████████ | 902/1000 [44:09<01:26,  1.14it/s]

Error extracting text from http://thehill.com/homenews/administration/275641-obama-says-he-wont-drop-garland-for-clinton: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/275641-obama-says-he-wont-drop-garland-for-clinton/


Processing URLs:  90%|█████████ | 905/1000 [44:12<01:15,  1.26it/s]

Error extracting text from http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016#sthash.XlGrznjE.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016#sthash.XlGrznjE.dpuf


Processing URLs:  91%|█████████ | 910/1000 [44:20<01:50,  1.23s/it]

Error extracting text from https://www.middleeastmonitor.com/news/middle-east/24818-rouhani-cancels-austria-visit-after-opposition-protest: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/news/middle-east/24818-rouhani-cancels-austria-visit-after-opposition-protest


Processing URLs:  91%|█████████ | 912/1000 [44:23<01:52,  1.28s/it]

Error extracting text from https://www.predictit.org/markets/detail/7392/Who-will-be-elected-president-of-the-Philippines-in-2022: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/7392/Who-will-be-elected-president-of-the-Philippines-in-2022


Processing URLs:  92%|█████████▏| 917/1000 [44:35<03:07,  2.26s/it]

Error extracting text from http://www.defensenews.com/story/defense-news/2015/10/08/chinese-newspaper-spy-satellites-target-us-carriers/73568146/: 404 Client Error: Not Found for url: https://www.defensenews.com/story/defense-news/2015/10/08/chinese-newspaper-spy-satellites-target-us-carriers/73568146/


Processing URLs:  92%|█████████▏| 918/1000 [44:36<02:26,  1.79s/it]

Error extracting text from https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-report/2021/february/monetary-policy-report-february-2021.pdf?la=en&amp;hash=3638A7091B34164428A54277B55BD6901709AA44: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-report/2021/february/monetary-policy-report-february-2021.pdf?la=en&amp;hash=3638A7091B34164428A54277B55BD6901709AA44


Processing URLs:  92%|█████████▏| 921/1000 [44:40<02:00,  1.52s/it]

Error extracting text from http://ukpollingreport.co.uk/historical-polls/voting-intention-1987-1992: 404 Client Error: Not Found for url: http://ukpollingreport.co.uk/historical-polls/voting-intention-1987-1992
URL filtered: https://www.youtube.com/watch?v=IST6qRfVqwY


Processing URLs:  93%|█████████▎| 926/1000 [44:46<01:28,  1.20s/it]

Error extracting text from http://www.wsj.com/articles/china-installs-weapons-in-south-china-sea-satellites-show-1481771066?utm_source=huffingtonpost.com&amp;utm_medium=referral&amp;utm_campaign=pubexchange: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-installs-weapons-in-south-china-sea-satellites-show-1481771066?utm_source=huffingtonpost.com&amp;utm_medium=referral&amp;utm_campaign=pubexchange


Processing URLs:  93%|█████████▎| 933/1000 [44:52<00:48,  1.37it/s]

Error extracting text from http://www.latimes.com/world/europe/la-fg-europe-trump-20170203-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/europe/la-fg-europe-trump-20170203-story.html


Processing URLs:  93%|█████████▎| 934/1000 [44:54<01:00,  1.09it/s]

Error extracting text from http://www.econlib.org/library/YPDBooks/Keynes/kynsCP2.html: 403 Client Error: Forbidden for url: http://www.econlib.org/library/YPDBooks/Keynes/kynsCP2.html


Processing URLs:  94%|█████████▎| 936/1000 [45:54<19:47, 18.56s/it]

Error extracting text from http://www.usnews.com/opinion/blogs/peter-roff/2013/05/28/study-finds-fact-checkers-biased-against-republicans: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  94%|█████████▍| 941/1000 [46:04<04:51,  4.94s/it]

Error extracting text from http://vm.ee/et/ajalugu-eesti-liitumine-natoga: 404 Client Error: Not Found for url: https://vm.ee/et/ajalugu-eesti-liitumine-natoga


error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserErro

Error extracting text from http://www.foliomag.com/2016/time-inc-s-new-ad-product-aims-to-harness-social-engagement/: Document is empty


Processing URLs:  94%|█████████▍| 943/1000 [46:10<03:47,  3.99s/it]

Error extracting text from https://www.predictit.org/markets/detail/7128/How-many-seats-will-the-SNP-win-in-Scotland&#39;s-next-election: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/7128/How-many-seats-will-the-SNP-win-in-Scotland&#39;s-next-election


Processing URLs:  95%|█████████▍| 946/1000 [46:33<06:29,  7.22s/it]

Error extracting text from http://www.washingtonpost.com/world/africa/zimbabwe-multitudes-march-for-mugabe-life-rule/2016/05/25/969d7184-22a8-11e6-b944-52f7b1793dae_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/africa/zimbabwe-multitudes-march-for-mugabe-life-rule/2016/05/25/969d7184-22a8-11e6-b944-52f7b1793dae_story.html


Processing URLs:  95%|█████████▌| 951/1000 [46:43<02:06,  2.58s/it]

Error extracting text from https://www.latimes.com/world/la-fg-taiwan-nicaragua-20190223-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-taiwan-nicaragua-20190223-story.html


Processing URLs:  95%|█████████▌| 954/1000 [46:45<01:01,  1.33s/it]

Error extracting text from http://intelligencebriefs.com/sna-capture-more-areas-from-al-shabaab-militants-in-bay-region/: 406 Client Error: Not Acceptable for url: http://intelligencebriefs.com/sna-capture-more-areas-from-al-shabaab-militants-in-bay-region/


Processing URLs:  96%|█████████▌| 961/1000 [46:52<00:39,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-security-idUSKCN10D1NN?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-idUSKCN10D1NN?il=0


Processing URLs:  97%|█████████▋| 966/1000 [47:05<01:14,  2.18s/it]

URL filtered: https://www.youtube.com/watch?v=QmxxpJbXX8Q


Processing URLs:  97%|█████████▋| 970/1000 [47:10<00:44,  1.47s/it]

Error extracting text from http://www.nytimes.com/2016/06/10/business/tesla-model-s-nhtsa-suspension-failure.html?emc=edit_th_20160610&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/10/business/tesla-model-s-nhtsa-suspension-failure.html?emc=edit_th_20160610&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  97%|█████████▋| 972/1000 [47:14<00:50,  1.81s/it]

Error extracting text from http://www.postregister.com/articles/nation-world/2016/11/09/turkey-syrian-kurds-disagree-over-raqqa-offensive: 404 Client Error: Not Found for url: https://www.postregister.com/articles/nation-world/2016/11/09/turkey-syrian-kurds-disagree-over-raqqa-offensive/


Processing URLs:  98%|█████████▊| 978/1000 [47:20<00:21,  1.04it/s]

Error extracting text from http://www.advisorperspectives.com/dshort/updates/Inflation-Since-1872: 403 Client Error: Forbidden for url: https://www.advisorperspectives.com/dshort/updates/Inflation-Since-1872


Processing URLs:  98%|█████████▊| 979/1000 [47:21<00:20,  1.05it/s]

Error extracting text from https://www.thenation.com/article/who-is-felix-sater-and-why-is-donald-trump-so-afraid-of-him/: 404 Client Error: Not Found for url: https://www.thenation.com/article/who-is-felix-sater-and-why-is-donald-trump-so-afraid-of-him/
URL filtered: https://www.nytimes.com/2016/12/15/technology/facebook-fake-news.html


Processing URLs:  98%|█████████▊| 982/1000 [47:25<00:18,  1.04s/it]

Error extracting text from http://www.defensenews.com/articles/us-lawmakers-urge-obama-to-punish-russia-missile-treaty-breach: 404 Client Error: Not Found for url: https://www.defensenews.com/articles/us-lawmakers-urge-obama-to-punish-russia-over-missile-treaty-breach/
Error extracting text from http://www.reuters.com/article/iran-oil-exports-idUSL3N1HT1QS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/iran-oil-exports-idUSL3N1HT1QS


Processing URLs:  99%|█████████▊| 987/1000 [47:40<00:31,  2.45s/it]

Error extracting text from http://www.memrise.com/course/97259/persian-farsi-basic-course/garden/learn/: 404 Client Error: Not Found for url: https://app.memrise.com/course/97259/persian-farsi-basic-course/garden/learn/


Processing URLs:  99%|█████████▉| 992/1000 [47:44<00:08,  1.07s/it]

Error extracting text from http://uk.reuters.com/article/2015/12/03/uk-britain-eu-idUKKBN0TM1YP20151203: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.infobae.com/new-resizer/N9eyilufQt0KYlopxkbRiq90ffE=/600x0/s3.amazonaws.com/arc-wordpress-client-uploads/infobae-wp/wp-content/uploads/2017/07/24131824/Caricatura-Trump-Putin.jpg?token=bar: 403 Client Error: Forbidden for url: http://www.infobae.com/new-resizer/N9eyilufQt0KYlopxkbRiq90ffE=/600x0/s3.amazonaws.com/arc-wordpress-client-uploads/infobae-wp/wp-content/uploads/2017/07/24131824/Caricatura-Trump-Putin.jpg?token=bar


Processing URLs: 100%|█████████▉| 995/1000 [47:51<00:08,  1.60s/it]

Error extracting text from https://cvppindia.com/projects.html: 404 Client Error: Not Found for url: https://www.cvppindia.com/projects.html


Processing URLs: 100%|█████████▉| 996/1000 [47:52<00:05,  1.39s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53527#.VvnCcj-u_wc: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53527#.VvnCcj-u_wc


Processing URLs: 100%|█████████▉| 997/1000 [47:52<00:03,  1.04s/it]

Error extracting text from http://www.parliament.uk/mps-lords-and-offices/mps/current-state-of-the-parties/: 403 Client Error: Forbidden for url: http://www.parliament.uk/mps-lords-and-offices/mps/current-state-of-the-parties/


Processing URLs: 100%|██████████| 1000/1000 [47:56<00:00,  2.88s/it]


URL filtered: https://twitter.com/kvogt/status/920312043776778240


Processing URLs:   0%|          | 4/1000 [00:12<35:41,  2.15s/it]  

Error extracting text from https://news.usni.org/2016/07/12/senator-wants-navy-freedom-navigation-operation-past-mischief-reef-soon: 403 Client Error: Forbidden for url: https://news.usni.org/2016/07/12/senator-wants-navy-freedom-navigation-operation-past-mischief-reef-soon


Processing URLs:   0%|          | 5/1000 [00:13<26:26,  1.59s/it]

Error extracting text from http://cleantechnica.com/2015/07/07/renewables-17-of-us-electricity-production-in-april-exclusive/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2015/07/07/renewables-17-of-us-electricity-production-in-april-exclusive/


Processing URLs:   1%|          | 9/1000 [00:19<28:33,  1.73s/it]

Error extracting text from https://macropolo.org/xi-may-ccp-core-will-govern-enduring-core-ideas/: 403 Client Error: Forbidden for url: https://macropolo.org/xi-may-ccp-core-will-govern-enduring-core-ideas/


Processing URLs:   1%|          | 12/1000 [00:24<21:23,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-iran-usa-sanctions-idUSKBN16U2VI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-sanctions-idUSKBN16U2VI


Processing URLs:   2%|▏         | 20/1000 [00:37<28:16,  1.73s/it]

Error extracting text from https://ballotpedia.org/SCOTUS_case_reversal_rates_: 404 Client Error: Not Found for url: https://ballotpedia.org/SCOTUS_case_reversal_rates


Processing URLs:   2%|▏         | 23/1000 [00:41<23:21,  1.43s/it]

Error extracting text from https://tradingeconomics.com/united-states/unemployment-rate: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/united-states/unemployment-rate
Error extracting text from https://www.reuters.com/article/us-saudi-aramco-ipo-crownprince-exclusiv/exclusive-saudi-aramco-ipo-on-track-for-2018-saudi-crown-prince-idUSKBN1CV0YW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-aramco-ipo-crownprince-exclusiv/exclusive-saudi-aramco-ipo-on-track-for-2018-saudi-crown-prince-idUSKBN1CV0YW


Processing URLs:   3%|▎         | 26/1000 [00:43<14:33,  1.12it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/258547-dems-blocked-on-ex-im-reauthorization: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/258547-dems-blocked-on-ex-im-reauthorization/


Processing URLs:   3%|▎         | 28/1000 [00:46<20:46,  1.28s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/ipsos-peru-ppk-tiene-equipo-mas-honesto-keiko-fujimori-hace-mejor-campana-noticia-1903864: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/ipsos-peru-ppk-tiene-equipo-mas-honesto-keiko-fujimori-hace-mejor-campana-noticia-1903864/


Processing URLs:   3%|▎         | 29/1000 [00:46<17:19,  1.07s/it]

Error extracting text from http://thehill.com/homenews/senate/347522-co-founder-of-firm-tied-to-trump-dossier-interviews-with-senate-panel: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/347522-co-founder-of-firm-tied-to-trump-dossier-interviews-with-senate-panel/


Processing URLs:   3%|▎         | 32/1000 [00:52<26:51,  1.67s/it]

Error extracting text from http://thebulletinpanama.com/2015/12/case-postponed-in-gupc-canal-expansion-claim/: 404 Client Error: Not Found for url: https://thebulletinpanama.com/2015/12/case-postponed-in-gupc-canal-expansion-claim/


Processing URLs:   4%|▍         | 38/1000 [01:18<36:39,  2.29s/it]  

Error extracting text from https://news.usni.org/2016/05/11/beijing-vows-to-increase-south-china-sea-defenses-calls-u-s-greatest-threat-in-region: 403 Client Error: Forbidden for url: https://news.usni.org/2016/05/11/beijing-vows-to-increase-south-china-sea-defenses-calls-u-s-greatest-threat-in-region


Processing URLs:   4%|▍         | 42/1000 [01:23<22:52,  1.43s/it]

Error extracting text from https://www.arabnews.com/node/1248336/world: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1248336/world


Processing URLs:   4%|▍         | 43/1000 [01:24<19:14,  1.21s/it]

Error extracting text from https://www.fda.gov/media/143892/download: 404 Client Error: Not Found for url: https://www.fda.gov/media/143892/download


Processing URLs:   5%|▍         | 47/1000 [01:28<16:13,  1.02s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/04/bangladesh-authorities-fail-to-curb-brutal-killing-spree-as-lgbti-editor-hacked-to-death/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/04/bangladesh-authorities-fail-to-curb-brutal-killing-spree-as-lgbti-editor-hacked-to-death/


Processing URLs:   5%|▍         | 48/1000 [01:30<17:54,  1.13s/it]

Error extracting text from https://www.reuters.com/business/environment/scientists-warn-bad-year-fires-brazils-amazon-wetlands-2021-05-27/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/environment/scientists-warn-bad-year-fires-brazils-amazon-wetlands-2021-05-27/


Processing URLs:   5%|▌         | 54/1000 [01:33<09:20,  1.69it/s]

Error extracting text from https://www.france24.com/en/live-news/20210710-who-will-lead-haiti-after-president-s-killing: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210710-who-will-lead-haiti-after-president-s-killing
URL filtered: https://www.youtube.com/watch?v=Y-VLwBjJtHI
Error extracting text from http://www.pcmag.com/article2/0,2817,2497982,00.asp: 403 Client Error: Forbidden for url: http://www.pcmag.com/article2/0,2817,2497982,00.asp


Processing URLs:   6%|▌         | 55/1000 [01:35<14:12,  1.11it/s]

URL filtered: http://www.cnn.com/2017/10/30/opinions/facebook-congress-russia-threat-chen-opinion/index.html
URL filtered: https://www.bloomberg.com/news/articles/2017-11-28/u-k-and-eu-agree-on-brexit-bill-in-breakthrough-telegraph-says


Processing URLs:   6%|▌         | 60/1000 [01:37<08:30,  1.84it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN1691SJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN1691SJ
Error extracting text from http://www.nytimes.com/2016/04/03/us/politics/2-republican-senators-revoke-support-for-garland-hearings.html?emc=edit_th_20160403&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/03/us/politics/2-republican-senators-revoke-support-for-garland-hearings.html?emc=edit_th_20160403&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from http://www.reuters.com/article/us-oil-meeting-draft-idUSKCN0XE02Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-meeting-draft-idUSKCN0XE02Y


Processing URLs:   6%|▋         | 64/1000 [01:41<13:50,  1.13it/s]

Error extracting text from http://www.opinion.co.uk/article.php?s=daily-telegraph-poll-march-2016: HTTPConnectionPool(host='www.opinion.co.uk', port=80): Max retries exceeded with url: /article.php?s=daily-telegraph-poll-march-2016 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feeb41d0>: Failed to resolve 'www.opinion.co.uk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   7%|▋         | 66/1000 [01:53<44:30,  2.86s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-14/obamacare-noses-into-the-tax-debate-bringing-along-its-baggage


Processing URLs:   7%|▋         | 69/1000 [01:55<28:01,  1.81s/it]

Error extracting text from https://www.wsj.com/articles/hunter-bidens-family-name-aided-deals-with-foreign-tycoons-11608682462: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/hunter-bidens-family-name-aided-deals-with-foreign-tycoons-11608682462
Error extracting text from http://keplerscience.arc.nasa.gov/index.html: HTTPConnectionPool(host='keplerscience.arc.nasa.gov', port=80): Max retries exceeded with url: /index.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feeb6a20>: Failed to resolve 'keplerscience.arc.nasa.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   7%|▋         | 72/1000 [01:57<17:57,  1.16s/it]

Error extracting text from http://aranews.net/2016/04/syrian-rebels-massacre-kurdish-civilians-aleppo/: 404 Client Error: Not Found for url: http://aranews.net/2016/04/syrian-rebels-massacre-kurdish-civilians-aleppo/


Processing URLs:   7%|▋         | 74/1000 [01:58<14:35,  1.06it/s]

Error extracting text from http://www.nytimes.com/2015/09/16/business/economy/export-import-bank-general-electric-boeing.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/16/business/economy/export-import-bank-general-electric-boeing.html


Processing URLs:   8%|▊         | 78/1000 [02:11<31:05,  2.02s/it]

Error extracting text from http://www.morningnewsusa.com/cyberattack-iran-us-contingency-plan-escalation-nuke-conflict-2359050.html: HTTPConnectionPool(host='www.morningnewsusa.com', port=80): Max retries exceeded with url: /cyberattack-iran-us-contingency-plan-escalation-nuke-conflict-2359050.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5cc80>: Failed to resolve 'www.morningnewsusa.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/BBCNews/status/930181672259088384
Error extracting text from http://www.reuters.com/article/us-results-oilfield-preview-idUSKBN1A50BC?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-results-oilfield-preview-idUSKBN1A50BC?il=0


Processing URLs:   8%|▊         | 80/1000 [02:13<24:22,  1.59s/it]

Error extracting text from https://www.predictit.org/Contract/11890/Will-a-%40realDonaldTrump-tweet-mention-QAnon-by-Sept-30#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/11890/Will-a-%40realDonaldTrump-tweet-mention-QAnon-by-Sept-30#data


Processing URLs:   8%|▊         | 84/1000 [02:22<32:55,  2.16s/it]

Error extracting text from http://www.stripes.com/news/us-sends-more-troops-to-iraq-to-prepare-for-mosul-battle-1.427978: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/us-sends-more-troops-to-iraq-to-prepare-for-mosul-battle-1.427978


Processing URLs:   8%|▊         | 85/1000 [02:22<27:21,  1.79s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-15/brazilian-court-said-to-lean-toward-rejecting-rousseff-accounts


Processing URLs:   9%|▊         | 87/1000 [02:23<18:09,  1.19s/it]

Error extracting text from http://www.mo4ch.com/hundreds-of-royal-marines-to-join-the-sas-to-train-new-syrian-army-and-fight-isis/: 404 Client Error: Not Found for url: https://www.mo4ch.com/hundreds-of-royal-marines-to-join-the-sas-to-train-new-syrian-army-and-fight-isis/


Processing URLs:   9%|▉         | 88/1000 [02:24<16:01,  1.05s/it]

Error extracting text from http://blogs.reuters.com/macroscope/2015/10/30/bank-of-japan-reruns-inflation-downgrade-script/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /macroscope/2015/10/30/bank-of-japan-reruns-inflation-downgrade-script/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5f770>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/edsaperia


Processing URLs:   9%|▉         | 92/1000 [02:26<09:32,  1.58it/s]

Error extracting text from https://www.wsj.com/articles/turkeys-erdogan-approves-referendum-on-presidential-power-1486742933: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkeys-erdogan-approves-referendum-on-presidential-power-1486742933
Error extracting text from http://www.wsj.com/articles/global-economy-week-ahead-u-s-china-meetings-europe-pmi-fed-minutes-1479672001: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/global-economy-week-ahead-u-s-china-meetings-europe-pmi-fed-minutes-1479672001
URL filtered: https://www.youtube.com/watch?v=3e6UKCJt-g8


Processing URLs:   9%|▉         | 94/1000 [02:26<06:22,  2.37it/s]

Error extracting text from https://www.nytimes.com/2017/04/29/world/asia/marines-return-to-helmand-province-for-a-job-they-thought-was-done.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/29/world/asia/marines-return-to-helmand-province-for-a-job-they-thought-was-done.html


Processing URLs:  10%|▉         | 96/1000 [02:27<07:29,  2.01it/s]

Error extracting text from https://www.treatearly.org/promising-drugs: 404 Client Error: Not Found for url: https://www.treatearly.org/promising-drugs
Error extracting text from http://www.reuters.com/article/us-usa-trump-epa-idUSKBN15E1MM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-epa-idUSKBN15E1MM


Processing URLs:  10%|█         | 102/1000 [02:55<33:50,  2.26s/it] 

Error extracting text from http://thehill.com/homenews/administration/325096-warren-delay-gorsuch-vote-because-of-russia-investigation: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/325096-warren-delay-gorsuch-vote-because-of-russia-investigation/


Processing URLs:  10%|█         | 105/1000 [03:02<27:09,  1.82s/it]

Error extracting text from http://www.reuters.com/article/2015/07/19/us-iran-nuclear-germany-idUSKCN0PT0FO20150719: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/07/19/us-iran-nuclear-germany-idUSKCN0PT0FO20150719


Processing URLs:  11%|█         | 107/1000 [03:03<17:40,  1.19s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/311.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/311.htm


Processing URLs:  11%|█         | 109/1000 [03:05<19:14,  1.30s/it]

Error extracting text from https://www.ipsos-mori.com/researchpublications/researcharchive/3736/Economist-Ipsos-MORI-May-2016-Issues-Index.aspx: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-uk/researchpublications/researcharchive/3736/Economist-Ipsos-MORI-May-2016-Issues-Index.aspx


Processing URLs:  12%|█▏        | 124/1000 [04:04<1:40:27,  6.88s/it]

Error extracting text from http://tehrantimes.com/index_View.asp?code=249580: 504 Server Error: Gateway Time-out for url: https://tehrantimes.com/index_View.asp?code=249580


Processing URLs:  13%|█▎        | 126/1000 [04:07<57:32,  3.95s/it]  

Error extracting text from http://www.scotsman.com/news/uk/princess-diana-and-dodi-fayed-murdered-by-sas-1-3051747: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/uk/princess-diana-and-dodi-fayed-murdered-by-sas-1-3051747


Processing URLs:  13%|█▎        | 130/1000 [04:11<25:47,  1.78s/it]

Error extracting text from https://www.ccjdigital.com/economic-trends/article/15066615/june-truck-tonnage-index-slides-for-second-consecutive-month: 403 Client Error: Forbidden for url: https://www.ccjdigital.com/economic-trends/article/15066615/june-truck-tonnage-index-slides-for-second-consecutive-month


Processing URLs:  13%|█▎        | 134/1000 [04:16<16:41,  1.16s/it]

Error extracting text from http://www.wsj.com/articles/germanys-schauble-sees-no-need-for-immediate-decision-on-greece-payments-1457438844: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germanys-schauble-sees-no-need-for-immediate-decision-on-greece-payments-1457438844


Processing URLs:  14%|█▎        | 136/1000 [05:17<4:30:30, 18.79s/it]

Error extracting text from https://sports.ladbrokes.com/sports-central/uk-eu-referendum/: HTTPSConnectionPool(host='sports.ladbrokes.com', port=443): Max retries exceeded with url: /sports-central/uk-eu-referendum/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x300a5d520>, 'Connection to sports.ladbrokes.com timed out. (connect timeout=60)'))


Processing URLs:  14%|█▍        | 139/1000 [05:20<1:41:22,  7.06s/it]

Error extracting text from https://allthingsnuclear.org/dwright/new-north-korean-icbm/): 403 Client Error: Forbidden for url: https://blog.ucsusa.org/dwright/new-north-korean-icbm/)
Error extracting text from http://www.reuters.com/article/us-russia-turkey-business-idUSKCN0WW0EK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-turkey-business-idUSKCN0WW0EK
Error extracting text from https://www.reuters.com/finance/stocks/overview/AAPL.OQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/finance/stocks/overview/AAPL.OQ


Processing URLs:  14%|█▍        | 143/1000 [05:24<40:07,  2.81s/it]  

Error extracting text from http://app.debka.com/p/article/25818/Mosul-offensive-folds-waiting-now-for-Trump: 404 Client Error: Not Found for url: http://app.debka.com/p/article/25818/Mosul-offensive-folds-waiting-now-for-Trump


Processing URLs:  14%|█▍        | 144/1000 [05:24<30:02,  2.11s/it]

Error extracting text from https://www.nytimes.com/2017/08/06/world/europe/russia-america-military-exercise-trump-putin.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/06/world/europe/russia-america-military-exercise-trump-putin.html


Processing URLs:  15%|█▍        | 146/1000 [05:29<31:40,  2.23s/it]

Error extracting text from http://www.reuters.com/article/us-china-scs-training-idUSKCN0XE0GZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-scs-training-idUSKCN0XE0GZ


Processing URLs:  15%|█▍        | 148/1000 [05:32<27:05,  1.91s/it]

Error extracting text from http://measure.in/: 406 Client Error: Not Acceptable for url: https://www.hitmedia.in/


Processing URLs:  15%|█▍        | 149/1000 [05:33<25:06,  1.77s/it]

URL filtered: https://www.youtube.com/watch?v=IFlI3_cqVSk


Processing URLs:  16%|█▌        | 157/1000 [06:40<4:10:31, 17.83s/it]

Error extracting text from http://www.edmunds.com/fuel-economy/will-californias-zero-emissions-mandate-alter-the-car-landscape.html: HTTPConnectionPool(host='www.edmunds.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  16%|█▌        | 159/1000 [06:41<2:08:38,  9.18s/it]

Error extracting text from http://www.nytimes.com/2009/03/31/world/africa/31arab.html?fta=y: 403 Client Error: Forbidden for url: http://www.nytimes.com/2009/03/31/world/africa/31arab.html?fta=y


Processing URLs:  16%|█▌        | 160/1000 [06:43<1:36:28,  6.89s/it]

Error extracting text from http://www.cnbc.com/2016/02/26/new-irs-cyberattack-total-is-more-than-twice-previously-disclosed-dj-citing-irs.html: 503 Server Error: Service Unavailable for url: https://www.cnbc.com/2016/02/26/new-irs-cyberattack-total-is-more-than-twice-previously-disclosed-dj-citing-irs.html


Processing URLs:  16%|█▋        | 163/1000 [06:46<45:59,  3.30s/it]  

URL filtered: https://www.catholicnews.com/hearing-mulls-2022-winter-olympic-boycott-over-china-human-rights-record/#.YKZFiQo4onM.twitter


Processing URLs:  17%|█▋        | 166/1000 [06:49<25:07,  1.81s/it]

Error extracting text from http://www.wsj.com/articles/pound-higher-as-brexit-fears-abate-1466172801: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pound-higher-as-brexit-fears-abate-1466172801


Processing URLs:  17%|█▋        | 174/1000 [07:32<2:23:35, 10.43s/it]

Error extracting text from https://thestack.com/security/2016/07/08/6000-strong-north-korean-hacker-army-collects-866-million-per-year/: 522 Server Error:  for url: https://thestack.com/security/2016/07/08/6000-strong-north-korean-hacker-army-collects-866-million-per-year/


Processing URLs:  18%|█▊        | 176/1000 [07:37<1:27:23,  6.36s/it]

Error extracting text from http://peru21.pe/politica/keiko-fujimori-le-saco-casi-seis-puntos-ventaja-ppk-segun-pulso-peru-2247686: 404 Client Error: Not Found for url: https://peru21.pe/politica/keiko-fujimori-le-saco-casi-seis-puntos-ventaja-ppk-segun-pulso-peru-2247686/


Processing URLs:  18%|█▊        | 178/1000 [07:39<50:03,  3.65s/it]  

Error extracting text from http://www.reuters.com/article/us-volkswagen-usa-idUSKBN0UI1QP20160104: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-volkswagen-usa-idUSKBN0UI1QP20160104


Processing URLs:  18%|█▊        | 181/1000 [07:43<28:30,  2.09s/it]

Error extracting text from http://www.iranhumanrights.org/2015/10/jason-rezaian-confessions/: 403 Client Error: Forbidden for url: http://www.iranhumanrights.org/2015/10/jason-rezaian-confessions/


Processing URLs:  18%|█▊        | 182/1000 [07:46<31:53,  2.34s/it]

Error extracting text from http://thebulletin.org/end-moscow-ankara-nuclear-cooperation9059: 404 Client Error: Not Found for url: https://thebulletin.org/end-moscow-ankara-nuclear-cooperation9059/
URL filtered: http://www.independent.co.uk/news/world/europe/isis-video-russia-vladimir-putin-youtube-jihad-threat-a7165916.html


Processing URLs:  20%|█▉        | 199/1000 [08:05<09:42,  1.38it/s]

Error extracting text from http://researchbriefings.parliament.uk/ResearchBriefing/Summary/SN05871: 403 Client Error: Forbidden for url: http://researchbriefings.parliament.uk/ResearchBriefing/Summary/SN05871
Error extracting text from http://economistsview.typepad.com/timduy/2015/10/brainard-drops-a-policy-bomb.html: 403 Client Error: Forbidden for url: https://economistsview.typepad.com/timduy/2015/10/brainard-drops-a-policy-bomb.html


Processing URLs:  20%|██        | 200/1000 [09:05<4:06:41, 18.50s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/article171248977.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  20%|██        | 204/1000 [09:09<1:09:11,  5.22s/it]

Error extracting text from https://www.ipobgovernment.org/ipob1/biafra-history/: 429 Client Error: Too Many Requests for url: https://www.ipobgovernment.org/ipob1/biafra-history/


Processing URLs:  21%|██        | 207/1000 [09:13<31:56,  2.42s/it]  

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-usa-miliary-idUSKBN19P0BM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-usa-miliary-idUSKBN19P0BM
URL filtered: https://www.bloombergquint.com/onweb/diplomatic-pressure-mounts-on-ethiopia-over-filling-of-giant-dam


Processing URLs:  21%|██        | 209/1000 [09:14<21:15,  1.61s/it]

Error extracting text from https://www.cbc.ca/news/business/wheat-prices-ukraine-1.6377239;: 404 Client Error: Not Found for url: https://www.cbc.ca/news/business/wheat-prices-ukraine-1.6377239;


Processing URLs:  21%|██        | 210/1000 [09:16<22:59,  1.75s/it]

Error extracting text from http://www.thefiscaltimes.com/2015/10/01/McCarthy-Isn-t-Speaker-Yet-Some-Conservatives-Want-Fire-Him-Already&gt: 404 Client Error: Not Found for url: https://www.thefiscaltimes.com:443/2015/10/01/McCarthy-Isn-t-Speaker-Yet-Some-Conservatives-Want-Fire-Him-Already&gt


Processing URLs:  21%|██▏       | 213/1000 [09:21<21:51,  1.67s/it]

Error extracting text from https://www.iata.org/en/pressroom/2021-releases/2021-08-31-01/: 404 Client Error: Not Found for url: https://www.iata.org/en/pressroom/2021-releases/2021-08-31-01/


Processing URLs:  21%|██▏       | 214/1000 [09:22<17:17,  1.32s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-02-01/republicans-rebrand-obamacare-strategy-from-repeal-to-repair


Processing URLs:  22%|██▏       | 216/1000 [09:22<10:08,  1.29it/s]

Error extracting text from http://www.wsj.com/articles/isis-herds-civilians-to-mosul-as-human-shields-1477675640: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/isis-herds-civilians-to-mosul-as-human-shields-1477675640


Processing URLs:  22%|██▏       | 218/1000 [09:24<13:08,  1.01s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-02/venezuela-will-seek-to-restructure-debt-as-sanctions-take-hold
URL filtered: https://www.bloomberg.com/view/articles/2016-08-11/venezuela-has-good-reasons-to-avoid-default
URL filtered: https://www.youtube.com/watch?v=aip0BAWrdLw
URL filtered: https://www.bloomberg.com/news/articles/2017-11-10/venezuelan-oil-output-heads-to-29-year-low-as-cash-crunch-grows


Processing URLs:  23%|██▎       | 226/1000 [09:32<14:14,  1.10s/it]

Error extracting text from https://chinapower.csis.org/military-spending/#:~:text=In%20March%202021%2C%20China%20announced,183.5%20billion: 403 Client Error: Forbidden for url: https://chinapower.csis.org/military-spending/#:~:text=In%20March%202021%2C%20China%20announced,183.5%20billion


Processing URLs:  23%|██▎       | 228/1000 [09:33<11:54,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13U1U2?feedType=RSS&amp;feedName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13U1U2?feedType=RSS&amp;feedName=worldNews


Processing URLs:  23%|██▎       | 229/1000 [09:36<17:03,  1.33s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-07/waymo-driverless-cars-are-now-driverless-in-ground-breaking-test


Processing URLs:  23%|██▎       | 233/1000 [09:42<23:28,  1.84s/it]

Error extracting text from https://nearshoreamericas.com/nicaragua-ortega-gains-sweeping-powers-to-declare-political-rivals-terrorists/: 403 Client Error: Forbidden for url: https://nearshoreamericas.com/nicaragua-ortega-gains-sweeping-powers-to-declare-political-rivals-terrorists/


Processing URLs:  23%|██▎       | 234/1000 [09:45<26:13,  2.05s/it]

URL filtered: http://www.bbc.com/news/world-asia-35476099?ns_mchannel=social&amp;ns_campaign=bbc_breaking&amp;ns_source=twitter&amp;ns_linkname=news_central


Processing URLs:  24%|██▎       | 236/1000 [09:46<17:41,  1.39s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/hold-informal-poll-12-candidates-chief-40759267: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/hold-informal-poll-12-candidates-chief-40759267


Processing URLs:  24%|██▍       | 239/1000 [09:50<15:21,  1.21s/it]

URL filtered: https://www.youtube.com/watch?v=yixhyPN0r3g


Processing URLs:  24%|██▍       | 244/1000 [10:15<49:33,  3.93s/it]  

Error extracting text from https://legiscan.com/AR/rollcall/SB120/id/573473: 403 Client Error: Forbidden for url: https://legiscan.com/AR/rollcall/SB120/id/573473


Processing URLs:  24%|██▍       | 245/1000 [10:18<46:12,  3.67s/it]

Error extracting text from http://www.gov.me/en/News/162286/Prime-Minister-dukanovic-hosts-farewell-visit-by-German-Ambassador.html: 404 Client Error: not found for url: https://www.gov.me/en/News/162286/Prime-Minister-dukanovic-hosts-farewell-visit-by-German-Ambassador.html


Processing URLs:  25%|██▍       | 246/1000 [10:18<35:23,  2.82s/it]

Error extracting text from http://www.wsj.com/articles/u-s-pursued-secret-contacts-with-assad-regime-for-years-1450917657: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-pursued-secret-contacts-with-assad-regime-for-years-1450917657


Processing URLs:  25%|██▍       | 247/1000 [10:19<26:07,  2.08s/it]

Error extracting text from http://www.reuters.com/article/us-burundi-politics-army-idUSKCN0XG2EZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-politics-army-idUSKCN0XG2EZ


Processing URLs:  25%|██▌       | 250/1000 [10:23<17:55,  1.43s/it]

Error extracting text from http://www.reuters.com/article/2015/10/28/us-usa-election-clinton-idUSKCN0SM2A520151028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/us-usa-election-clinton-idUSKCN0SM2A520151028


Processing URLs:  25%|██▌       | 253/1000 [10:26<13:20,  1.07s/it]

Error extracting text from http://www.latimes.com/la-ol-opinion-newsletter-trump-russia-20170715-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/la-ol-opinion-newsletter-trump-russia-20170715-htmlstory.html


Processing URLs:  26%|██▌       | 257/1000 [10:28<08:22,  1.48it/s]

Error extracting text from http://independentnig.com/2016/04/global-oil-market-iran-says-wont-freeze-oil-production/: 406 Client Error: Not Acceptable for url: http://independentnig.com/2016/04/global-oil-market-iran-says-wont-freeze-oil-production/
Error extracting text from http://www.reuters.com/article/us-britain-eu-negotiation-juncker-idUSKBN17V0JP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-negotiation-juncker-idUSKBN17V0JP


Processing URLs:  26%|██▌       | 259/1000 [10:30<09:59,  1.24it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/01/04/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/#.VouTofl97cs: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/01/04/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/#.VouTofl97cs


Processing URLs:  26%|██▌       | 260/1000 [10:31<10:13,  1.21it/s]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-venezuela-comment-c048dc4a-91e8-11e5-befa-99ceebcbb272-20151123-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-venezuela-comment-c048dc4a-91e8-11e5-befa-99ceebcbb272-20151123-story.html


Processing URLs:  26%|██▌       | 261/1000 [10:36<24:05,  1.96s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-10-21/brexit-talks-loom-large-as-eu-fumbles-trade-deal-with-canada


Processing URLs:  27%|██▋       | 272/1000 [10:57<28:00,  2.31s/it]

Error extracting text from http://38north.org/2014/05/sohae052014/#_ftn1: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml#_ftn1


Processing URLs:  27%|██▋       | 273/1000 [11:57<3:56:06, 19.49s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2016-05-19/us-could-lift-arms-embargo-on-vietnam-amid-china-tensions: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 274/1000 [12:00<2:55:57, 14.54s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-05-17/trump-has-to-decide-50-000-troops-to-afghanistan


Processing URLs:  28%|██▊       | 282/1000 [12:06<21:02,  1.76s/it]  

Error extracting text from http://www.wsj.com/articles/panama-bernie-1459984539?tesla=y: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/panama-bernie-1459984539?tesla=y


Processing URLs:  28%|██▊       | 283/1000 [12:07<15:34,  1.30s/it]

Error extracting text from https://www.jstor.org/stable/resrep11289?seq=70#metadata_info_tab_contents: 420 Client Error: Enhance Your Calm for url: https://www.jstor.org/stable/resrep11289?seq=70#metadata_info_tab_contents


Processing URLs:  29%|██▊       | 286/1000 [12:11<17:38,  1.48s/it]

Error extracting text from http://www.bea.gov/newsreleases/national/gdp/2016/tech3q16_2nd.htm: 404 Client Error: Not Found for url: https://www.bea.gov/newsreleases/national/gdp/2016/tech3q16_2nd.htm


Processing URLs:  29%|██▉       | 289/1000 [12:17<17:55,  1.51s/it]

Error extracting text from https://news.riskadvisory.net/2015/16/iran-summary-of-nuclear-deal-timeline-and-sanctions-relief/: HTTPSConnectionPool(host='news.riskadvisory.net', port=443): Max retries exceeded with url: /2015/16/iran-summary-of-nuclear-deal-timeline-and-sanctions-relief/ (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)')))


Processing URLs:  29%|██▉       | 291/1000 [12:20<16:38,  1.41s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/05/15/US-says-bid-to-retake-Iraq-s-Mosul-from-is-making-progress-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/05/15/US-says-bid-to-retake-Iraq-s-Mosul-from-is-making-progress-.html


Processing URLs:  29%|██▉       | 293/1000 [12:23<19:34,  1.66s/it]

Error extracting text from https://www.mda.mil/about/mission.html: HTTPSConnectionPool(host='www.mda.mil', port=443): Max retries exceeded with url: /about/mission.html (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  30%|██▉       | 295/1000 [12:27<20:17,  1.73s/it]

Error extracting text from http://www2.unwto.org/content/why-tourism: 404 Client Error: Not Found for url: https://www.unwto.org/content/why-tourism


Processing URLs:  30%|██▉       | 296/1000 [12:27<15:02,  1.28s/it]

Error extracting text from http://www.wsj.com/articles/tesla-weighs-new-challenge-to-state-direct-sales-bans-1459189069?mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tesla-weighs-new-challenge-to-state-direct-sales-bans-1459189069?mg=id-wsj


Processing URLs:  30%|██▉       | 297/1000 [12:29<16:15,  1.39s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-06/investors-start-doubting-oil-rally-after-failure-to-top-55


Processing URLs:  30%|██▉       | 299/1000 [12:30<11:50,  1.01s/it]

Error extracting text from http://asiapacificreport.nz/2016/06/16/beijings-invisible-hand-felt-as-hong-kong-press-freedom-declines/: 403 Client Error: Forbidden for url: https://asiapacificreport.nz/2016/06/16/beijings-invisible-hand-felt-as-hong-kong-press-freedom-declines/


Processing URLs:  30%|███       | 302/1000 [12:36<15:01,  1.29s/it]

Error extracting text from http://gulfstateanalytics.com/archives/work/italian-analyst-italy-eyes-growing-economic-and-security-cooperation-with-post-sanctions-iran: 403 Client Error: Forbidden for url: http://gulfstateanalytics.com/archives/work/italian-analyst-italy-eyes-growing-economic-and-security-cooperation-with-post-sanctions-iran
Error extracting text from https://www.breakingtravelnews.com/news/article/expo-2020-dubai-on-track-for-nine-million-visitors/: 403 Client Error: Forbidden for url: https://www.breakingtravelnews.com/news/article/expo-2020-dubai-on-track-for-nine-million-visitors/


Processing URLs:  31%|███       | 306/1000 [12:50<22:33,  1.95s/it]

Error extracting text from http://english.aawsat.com/2016/06/article55351942/saudi-ambassador-iraq-iranians-heave-sectarian-strife-fallujah: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/06/article55351942/saudi-ambassador-iraq-iranians-heave-sectarian-strife-fallujah


Processing URLs:  31%|███       | 308/1000 [12:53<20:37,  1.79s/it]

Error extracting text from https://www.naij.com/863597-shocker-buhari-nigerias-problem-not-solution-us-intelligence-chief.html: 410 Client Error: Gone for url: https://www.legit.ng/863597-shocker-buhari-nigerias-problem-not-solution-us-intelligence-chief.html


Processing URLs:  31%|███       | 310/1000 [12:55<13:54,  1.21s/it]

Error extracting text from http://www.arabianbusiness.com/saudi-finance-minister-very-optimistic-on-2016-budget-gap-652471.html: 403 Client Error: HTTP Forbidden for url: https://www.arabianbusiness.com/saudi-finance-minister-very-optimistic-on-2016-budget-gap-652471.html


Processing URLs:  31%|███       | 312/1000 [12:57<14:00,  1.22s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-22/opec-still-waiting-for-evidence-oil-cuts-are-doing-their-job


Processing URLs:  32%|███▏      | 322/1000 [13:12<14:37,  1.29s/it]

Error extracting text from https://www.reuters.com/article/us-afghanistan-usa-airstrikes-idUSKCN1B518I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-usa-airstrikes-idUSKCN1B518I


Processing URLs:  32%|███▏      | 324/1000 [13:14<12:25,  1.10s/it]

Error extracting text from https://www.nytimes.com/2017/08/25/opinion/the-slaughter-of-children-in-yemen.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/25/opinion/the-slaughter-of-children-in-yemen.html


Processing URLs:  33%|███▎      | 327/1000 [13:27<34:56,  3.12s/it]

Error extracting text from http://www.un.org/News/dh/infocus/Syria/FinalCommuniqueActionGroupforSyria.pdf: 403 Client Error: Forbidden for url: https://www.un.org/News/dh/infocus/Syria/FinalCommuniqueActionGroupforSyria.pdf
URL filtered: http://www.bloomberg.com/news/articles/2015-12-17/majority-on-top-brazil-court-indicates-impeachment-can-proceed


Processing URLs:  33%|███▎      | 330/1000 [13:30<21:25,  1.92s/it]

Error extracting text from https://www.donaldjtrump.com/press-releases/donald-j.-trump-delivers-groundbreaking-contract-for-the-american-vote1: 403 Client Error: Forbidden for url: https://www.donaldjtrump.com/press-releases/donald-j.-trump-delivers-groundbreaking-contract-for-the-american-vote1


Processing URLs:  33%|███▎      | 333/1000 [13:35<16:49,  1.51s/it]

Error extracting text from https://www.fox29.com/news/7-shot-outside-golf-social-in-fishtown-police-say: 403 Client Error: Forbidden for url: https://www.fox29.com/news/7-shot-outside-golf-social-in-fishtown-police-say


Processing URLs:  34%|███▎      | 336/1000 [13:38<11:23,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-opec-iran-idUSKCN0YD0UJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-iran-idUSKCN0YD0UJ


Processing URLs:  34%|███▎      | 337/1000 [13:39<12:29,  1.13s/it]

Error extracting text from https://www.nh.gov/oep/energy/programs/documents/sb191pc-2014-7-8-alliance-automobile-manufacturers.pdf: 404 Client Error: Not Found for url: https://www.nh.gov/oep/energy/programs/documents/sb191pc-2014-7-8-alliance-automobile-manufacturers.pdf


Processing URLs:  34%|███▍      | 339/1000 [13:41<11:09,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN18X2JP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN18X2JP


Processing URLs:  34%|███▍      | 342/1000 [13:44<10:10,  1.08it/s]

Error extracting text from http://www.tradingeconomics.com/dominican-republic/gdp-growth-annual: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/dominican-republic/gdp-growth-annual


Processing URLs:  34%|███▍      | 343/1000 [13:44<08:41,  1.26it/s]

Error extracting text from http://goodjudgment.com/wp-content/uploads/2022/03/1570-Post-Mortem.pdf: 403 Client Error: Forbidden for url: http://goodjudgment.com/wp-content/uploads/2022/03/1570-Post-Mortem.pdf


Processing URLs:  35%|███▍      | 346/1000 [13:45<05:15,  2.07it/s]

Error extracting text from http://www.wsj.com/articles/the-european-union-shows-poland-why-we-have-brexit-1467747768: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-european-union-shows-poland-why-we-have-brexit-1467747768
Error extracting text from http://www.nytimes.com/2015/09/26/us/boehner-will-resign-from-congress.html?smid=tw-share&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/26/us/boehner-will-resign-from-congress.html?smid=tw-share&amp;_r=0


Processing URLs:  35%|███▍      | 348/1000 [13:49<10:03,  1.08it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-24/u-s-rating-under-mnuchin-debt-prioritization-is-topic-of-debate


Processing URLs:  35%|███▌      | 352/1000 [13:52<09:19,  1.16it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/apr/27/donald-trump-carly-fiorinas-not-going-do-trick/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/apr/27/donald-trump-carly-fiorinas-not-going-do-trick/


Processing URLs:  36%|███▌      | 355/1000 [13:59<20:20,  1.89s/it]

Error extracting text from https://www.cambodiadaily.com/news/japanese-chinese-navies-to-visit-cambodia-days-apart-108270/: 404 Client Error: Not Found for url: https://www.cambodiadaily.com/news/japanese-chinese-navies-to-visit-cambodia-days-apart-108270/


Processing URLs:  36%|███▌      | 357/1000 [14:03<19:55,  1.86s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-02/un-climate-chief-s-request-to-meet-tillerson-goes-unanswered


Processing URLs:  36%|███▌      | 360/1000 [14:05<11:05,  1.04s/it]

Error extracting text from https://www.wsj.com/livecoverage/trump-impeachment-house-biden: 403 Client Error: Forbidden for url: https://www.wsj.com/livecoverage/trump-impeachment-house-biden


Processing URLs:  36%|███▋      | 364/1000 [14:09<09:19,  1.14it/s]

URL filtered: https://twitter.com/katyafimava/status/1455933648226639883
URL filtered: https://www.bloomberg.com/news/articles/2017-11-06/apple-said-to-sell-debt-to-help-fund-300-billion-capital-return


Processing URLs:  36%|███▋      | 365/1000 [14:09<08:17,  1.28it/s]

Error extracting text from http://greece.greekreporter.com/2015/11/04/szhulz-recognizes-greek-pms-request-for-milder-bailout-demands-amid-refugee-crisis/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2015/11/04/szhulz-recognizes-greek-pms-request-for-milder-bailout-demands-amid-refugee-crisis/


Processing URLs:  37%|███▋      | 367/1000 [14:13<13:40,  1.30s/it]

Error extracting text from https://www.cnbc.com/2017/09/22/senator-john-mccain-says-he-cannot-support-graham-cassidy-obamacare-repeal-bill.html: 503 Server Error: Service Unavailable for url: https://www.cnbc.com/2017/09/22/senator-john-mccain-says-he-cannot-support-graham-cassidy-obamacare-repeal-bill.html
URL filtered: https://www.youtube.com/watch?v=jshk8ZVIgdI


Processing URLs:  37%|███▋      | 370/1000 [14:14<08:19,  1.26it/s]

Error extracting text from https://www.iranhumanrights.org/2016/03/minoo-khaleghi-disqualified/: 403 Client Error: Forbidden for url: https://www.iranhumanrights.org/2016/03/minoo-khaleghi-disqualified/
Error extracting text from http://www.latimes.com/world/middleeast/la-fg-cia-pentagon-isis-20160327-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-cia-pentagon-isis-20160327-story.html


Processing URLs:  37%|███▋      | 372/1000 [14:16<08:50,  1.18it/s]

Error extracting text from https://www.reuters.com/article/us-china-singapore-trade-idUSKBN1930RR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-singapore-trade-idUSKBN1930RR


Processing URLs:  38%|███▊      | 376/1000 [14:25<17:27,  1.68s/it]

Error extracting text from https://www.nytimes.com/2021/05/19/business/economy/fed-april-2021-meeting-minutes.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/05/19/business/economy/fed-april-2021-meeting-minutes.html


Processing URLs:  38%|███▊      | 377/1000 [14:25<13:01,  1.25s/it]

Error extracting text from https://www.wsj.com/articles/senate-waits-for-minimum-wage-ruling-from-parliamentarian-11614266273: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/senate-waits-for-minimum-wage-ruling-from-parliamentarian-11614266273


Processing URLs:  38%|███▊      | 382/1000 [14:30<09:37,  1.07it/s]

Error extracting text from http://www.nytimes.com/2016/08/06/world/africa/south-africa-election-anc.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/06/world/africa/south-africa-election-anc.html?_r=0


Processing URLs:  38%|███▊      | 384/1000 [14:32<07:42,  1.33it/s]

Error extracting text from https://www.nytimes.com/2021/07/16/upshot/student-loans-biden.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/16/upshot/student-loans-biden.html


Processing URLs:  39%|███▉      | 393/1000 [14:53<23:13,  2.30s/it]

Error extracting text from https://syriancivilwarmap.com/: 404 Client Error: Not Found for url: https://syriancivilwarmap.com/
Error extracting text from http://www.straitstimes.com/business/economy/draghi-a-rabbit-the-euro-and-the-magic-trick-traders-are-hoping-for: 403 Client Error: Forbidden for url: https://www.straitstimes.com/business/economy/draghi-a-rabbit-the-euro-and-the-magic-trick-traders-are-hoping-for


Processing URLs:  40%|███▉      | 396/1000 [15:09<35:55,  3.57s/it]

Error extracting text from https://www.nytimes.com/2021/07/09/world/asia/taliban-kandahar-afghanistan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/09/world/asia/taliban-kandahar-afghanistan.html


Processing URLs:  40%|███▉      | 397/1000 [16:10<3:26:43, 20.57s/it]

Error extracting text from https://archive.is/bvcJy: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  40%|███▉      | 399/1000 [16:13<1:48:30, 10.83s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/314378-trump-team-trump-is-not-meeting-with-putin-in-first-foreign: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/314378-trump-team-trump-is-not-meeting-with-putin-in-first-foreign/


Processing URLs:  40%|████      | 401/1000 [16:17<1:03:42,  6.38s/it]

Error extracting text from http://www.cedem.me/me/?jezik=eng: 404 Client Error: Not Found for url: http://www.cedem.me/me/?jezik=eng


Processing URLs:  41%|████      | 406/1000 [16:25<18:41,  1.89s/it]  

Error extracting text from https://www.nytimes.com/2017/08/21/upshot/the-showdown-over-how-we-define-fringe-views-in-america.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/21/upshot/the-showdown-over-how-we-define-fringe-views-in-america.html
Error extracting text from http://www.wsj.com/articles/eu-gives-poland-three-months-to-resolve-court-crisis-1469630743: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-gives-poland-three-months-to-resolve-court-crisis-1469630743


Processing URLs:  41%|████      | 411/1000 [16:33<17:09,  1.75s/it]

Error extracting text from http://www.businessinsider.com/r-myanmar-army-chief-endorses-election-of-suu-kyis-president-2016-3: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-myanmar-army-chief-endorses-election-of-suu-kyis-president-2016-3


Processing URLs:  42%|████▏     | 419/1000 [16:47<13:33,  1.40s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/report-china-could-make-big-move-the-south-china-starting-17350: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/report-china-could-make-big-move-the-south-china-starting-17350


Processing URLs:  42%|████▎     | 425/1000 [17:01<10:46,  1.12s/it]

Error extracting text from http://www.arabianbusiness.com/lebanon-s-hezbollah-takes-aim-at-saudi-arabia-on-ashura-609913.html#.ViyTYyhOrvU: 403 Client Error: HTTP Forbidden for url: https://www.arabianbusiness.com/lebanon-s-hezbollah-takes-aim-at-saudi-arabia-on-ashura-609913.html#.ViyTYyhOrvU
Error extracting text from http://www.reuters.com/article/us-northkorea-missile-analysis-idUSKCN0UQ0CC20160112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-analysis-idUSKCN0UQ0CC20160112


Processing URLs:  43%|████▎     | 427/1000 [17:12<28:17,  2.96s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-pentagon-idUSKCN115254: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-pentagon-idUSKCN115254


Processing URLs:  43%|████▎     | 434/1000 [17:29<19:16,  2.04s/it]

Error extracting text from https://theconversation.com/inside-the-tory-rebellion-against-foreign-aid-cuts-162373: 403 Client Error: Forbidden for url: https://theconversation.com/inside-the-tory-rebellion-against-foreign-aid-cuts-162373


Processing URLs:  44%|████▎     | 435/1000 [17:30<15:28,  1.64s/it]

Error extracting text from http://thehill.com/blogs/congress-blog/energy-environment/257510-momentum-for-us-oil-exports-builds-in-colorado: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/energy-environment/257510-momentum-for-us-oil-exports-builds-in-colorado/


Processing URLs:  44%|████▎     | 436/1000 [17:33<19:09,  2.04s/it]

Processing URLs:  44%|████▍     | 439/1000 [17:38<16:40,  1.78s/it]

Error extracting text from https://af.reuters.com/article/worldNews/idAFKCN2263E0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  44%|████▍     | 440/1000 [17:40<18:31,  1.99s/it]

URL filtered: https://www.bloomberg.com/news/articles/2015-12-11/exxon-names-refining-head-woods-heir-apparent-to-rex-tillerson


Processing URLs:  44%|████▍     | 442/1000 [17:42<14:06,  1.52s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720801/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720801/


Processing URLs:  45%|████▍     | 449/1000 [17:54<16:16,  1.77s/it]

Error extracting text from http://www.militarytimes.com/articles/the-isis-war-has-a-new-commander-and-isis-may-be-the-least-of-his-worries: 404 Client Error: Not Found for url: https://www.militarytimes.com/isiswar/


Processing URLs:  45%|████▌     | 452/1000 [17:58<13:06,  1.44s/it]

Error extracting text from https://www.humboldtforum.org/de/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/de/
URL filtered: https://twitter.com/thebudgetguy


Processing URLs:  46%|████▌     | 456/1000 [18:02<09:49,  1.08s/it]

Error extracting text from http://globalnation.inquirer.net/130215/south-china-sea-arbitration-philippines-china-spratly-islands-west-philippine-sea: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/130215/south-china-sea-arbitration-philippines-china-spratly-islands-west-philippine-sea


Processing URLs:  46%|████▌     | 457/1000 [18:04<10:26,  1.15s/it]

Error extracting text from https://goo.gl/M0J2a4: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/abbas-confidant-we-will-take-hundreds-of-idf-soldiers-to-icc-this-year/
Error extracting text from https://www.predictit.org/markets/detail/4365/Which-party-will-control-the-House-after-2020-election: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/4365/Which-party-will-control-the-House-after-2020-election


Processing URLs:  46%|████▌     | 460/1000 [18:04<05:38,  1.59it/s]

Error extracting text from http://allafrica.com/stories/201607200805.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607200805.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3023deab0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.basnews.com/index.php/en/news/kurdistan/268832: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/kurdistan/268832


Processing URLs:  46%|████▋     | 463/1000 [18:07<06:39,  1.34it/s]

Error extracting text from http://economicpolicy.oxfordjournals.org/content/economicpolicy/28/75/513.full.pdf: 403 Client Error: Forbidden for url: http://economicpolicy.oxfordjournals.org/content/economicpolicy/28/75/513.full.pdf


Processing URLs:  46%|████▋     | 464/1000 [18:08<08:41,  1.03it/s]

Error extracting text from https://theicct.org/sites/default/files/publications/China-NEV-mandate_ICCT-policy-update_20032018_vF-updated.pdf: 403 Client Error: Forbidden for url: https://theicct.org/sites/default/files/publications/China-NEV-mandate_ICCT-policy-update_20032018_vF-updated.pdf


Processing URLs:  47%|████▋     | 466/1000 [18:09<05:58,  1.49it/s]

Error extracting text from http://www.reuters.com/article/2015/12/03/opec-meeting-idUSL1N13S3AK20151203#Wuz8HIiul6R2gQkj.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/03/opec-meeting-idUSL1N13S3AK20151203#Wuz8HIiul6R2gQkj.97


Processing URLs:  48%|████▊     | 478/1000 [18:43<14:40,  1.69s/it]

URL filtered: https://www.bloomberg.com/news/articles/2022-03-09/germany-is-stalling-eu-efforts-to-broaden-russia-s-swift-ban


Processing URLs:  48%|████▊     | 480/1000 [18:46<13:23,  1.55s/it]

Error extracting text from https://www.elisascience.org/whitepaper/: 404 Client Error: Not Found for url: https://www.elisascience.org/whitepaper/


Processing URLs:  48%|████▊     | 484/1000 [19:00<17:05,  1.99s/it]

Error extracting text from http://www.nytimes.com/2016/11/02/world/middleeast/iraq-mosul-isis.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/02/world/middleeast/iraq-mosul-isis.html?_r=0


Processing URLs:  50%|████▉     | 495/1000 [19:21<09:45,  1.16s/it]

URL filtered: https://www.youtube.com/watch?v=OY0COX0gcyw
Error extracting text from https://www.nasdaq.com/articles/uber-ceo-says-company-to-consider-crypto-for-rides-not-its-balance-sheet-2021-02-11: 403 Client Error: Forbidden for url: https://www.nasdaq.com/articles/uber-ceo-says-company-to-consider-crypto-for-rides-not-its-balance-sheet-2021-02-11


Processing URLs:  50%|████▉     | 496/1000 [19:22<09:17,  1.11s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=54261#.V25-wzXQJPY: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=54261#.V25-wzXQJPY


Processing URLs:  50%|████▉     | 497/1000 [19:22<07:22,  1.14it/s]

Error extracting text from http://www.reuters.com/article/2015/11/28/us-northkorea-missile-idUSKBN0TH09M20151128: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/28/us-northkorea-missile-idUSKBN0TH09M20151128


Processing URLs:  50%|████▉     | 499/1000 [19:25<09:47,  1.17s/it]

URL filtered: https://twitter.com/ECMOKaragianni1/status/1379754846702743553


Processing URLs:  51%|█████     | 507/1000 [19:46<16:25,  2.00s/it]

Error extracting text from http://www.boston.com/news/politics/2016/01/04/dialed-down-bill-clinton-returns-new-hampshire-the-proud-husband/m92pF9XAxZTrO2bPSel3tJ/story.html: 404 Client Error: Not Found for url: https://www.boston.com/news/politics/2016/01/04/dialed-down-bill-clinton-returns-new-hampshire-the-proud-husband/m92pF9XAxZTrO2bPSel3tJ/story.html
URL filtered: http://www.bloomberg.com/news/articles/2016-09-12/mayor-s-investigation-shakes-up-china-s-political-chessboard


Processing URLs:  51%|█████     | 510/1000 [19:49<11:27,  1.40s/it]

Error extracting text from http://nationalinterest.org/feature/the-right-way-sanction-cyber-threats-13975: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/the-right-way-sanction-cyber-threats-13975
URL filtered: https://www.bloomberg.com/news/articles/2017-12-11/may-s-fragile-truce-tested-as-brexit-pledges-start-to-unravel


Processing URLs:  51%|█████     | 512/1000 [19:51<10:04,  1.24s/it]

Error extracting text from https://reut.rs/3nC4rSc: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/morgan-stanley-sees-15-chance-scottish-independence-uk-2021-04-28/


Processing URLs:  52%|█████▏    | 515/1000 [19:55<10:29,  1.30s/it]

URL filtered: https://twitter.com/IpsosMORIScot/status/


Processing URLs:  52%|█████▏    | 517/1000 [19:55<06:27,  1.25it/s]

Error extracting text from https://www.timesofisrael.com/idf-strikes-gaza-terror-targets-after-arson-attacks-in-1st-raids-since-ceasefire/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/idf-strikes-gaza-terror-targets-after-arson-attacks-in-1st-raids-since-ceasefire/


Processing URLs:  52%|█████▏    | 518/1000 [19:56<05:30,  1.46it/s]

Error extracting text from http://www.realcleardefense.com/articles/2016/10/17/the_battle_for_mosul_begins_110216.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/10/17/the_battle_for_mosul_begins_110216.html


Processing URLs:  52%|█████▏    | 519/1000 [19:59<11:36,  1.45s/it]

Error extracting text from http://nation.com.pk/international/30-Mar-2016/italy-rescues-over-1-500-migrants-in-strait-of-sicily: 503 Server Error: Backend fetch failed for url: https://www.nation.com.pk/international/30-Mar-2016/italy-rescues-over-1-500-migrants-in-strait-of-sicily


Processing URLs:  52%|█████▎    | 525/1000 [20:10<10:22,  1.31s/it]

Error extracting text from http://cbbss.org/?p=7074: 403 Client Error: Forbidden for url: http://cbbss.org/?p=7074
Error extracting text from https://balkaninsight.com/2021/03/01/north-macedonia-launches-diaspora-headcount-as-boycott-calls-grow/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/03/01/north-macedonia-launches-diaspora-headcount-as-boycott-calls-grow/


Processing URLs:  53%|█████▎    | 530/1000 [20:15<06:53,  1.14it/s]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.valor.com.br/politica/4465150/oposicionistas-dizem-que-abreviar-mandato-de-dilma-e-inevitavel&amp;usg=ALkJrhisE6yRofi50XlmwC6wPc1hf8Jlww: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.valor.com.br/politica/4465150/oposicionistas-dizem-que-abreviar-mandato-de-dilma-e-inevitavel&amp;usg=ALkJrhisE6yRofi50XlmwC6wPc1hf8Jlww


Processing URLs:  53%|█████▎    | 532/1000 [20:19<10:00,  1.28s/it]

Error extracting text from https://www.nytimes.com/2017/09/06/world/asia/north-korea-putin-oil-embargo.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/06/world/asia/north-korea-putin-oil-embargo.html


Processing URLs:  54%|█████▎    | 536/1000 [20:24<08:13,  1.06s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2021/02/aleksei-navalny-prisoner-of-conscience/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2021/02/aleksei-navalny-prisoner-of-conscience/


Processing URLs:  54%|█████▍    | 538/1000 [20:26<06:37,  1.16it/s]

Error extracting text from https://www.reuters.com/article/us-russia-politics-navalny/putins-approval-rating-holds-steady-despite-navalny-crackdown-poll-idUSKBN2A429G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-politics-navalny/putins-approval-rating-holds-steady-despite-navalny-crackdown-poll-idUSKBN2A429G


Processing URLs:  54%|█████▍    | 541/1000 [20:28<05:20,  1.43it/s]

URL filtered: https://twitter.com/ianbremmer/status/1427708407272857603
Error extracting text from https://news.yahoo.com/us-general-air-force-keep-flying-over-south-090103766.html: 404 Client Error: Not Found for url: https://news.yahoo.com/us-general-air-force-keep-flying-over-south-090103766.html


Processing URLs:  54%|█████▍    | 543/1000 [20:31<08:14,  1.08s/it]

Error extracting text from http://af.reuters.com/article/guineaBissauNews/idAFL1N1A00T9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  54%|█████▍    | 544/1000 [20:32<07:20,  1.04it/s]

Error extracting text from https://www.americanbar.org/content/dam/aba/migrated/intelprop/magazine/LandslideJan2010_Hofer.authcheckdam.pdf: 403 Client Error: Forbidden for url: https://www.americanbar.org/content/dam/aba/migrated/intelprop/magazine/LandslideJan2010_Hofer.authcheckdam.pdf


Processing URLs:  55%|█████▍    | 546/1000 [20:37<12:17,  1.62s/it]

URL filtered: https://www.youtube.com/watch?v=jv9sDn_2XkI


Processing URLs:  55%|█████▌    | 553/1000 [20:50<18:12,  2.44s/it]

Error extracting text from http://morungexpress.com/myanmar-army-chief-to-get-five-year-extension-as-talks-with-suu-kyi-continue-media/: 404 Client Error: Not Found for url: https://morungexpress.com/myanmar-army-chief-to-get-five-year-extension-as-talks-with-suu-kyi-continue-media


Processing URLs:  55%|█████▌    | 554/1000 [20:50<13:28,  1.81s/it]

Error extracting text from http://www.nytimes.com/2015/10/17/opinion/what-iran-fears-from-reporters-like-jason-rezaian-and-me.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/17/opinion/what-iran-fears-from-reporters-like-jason-rezaian-and-me.html


Processing URLs:  56%|█████▌    | 556/1000 [21:00<22:10,  3.00s/it]

Error extracting text from http://www.wsj.com/articles/house-votes-to-reauthorize-u-s-export-import-bank-1445986019: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-votes-to-reauthorize-u-s-export-import-bank-1445986019


Processing URLs:  56%|█████▌    | 557/1000 [21:02<20:20,  2.75s/it]

URL filtered: https://www.youtube.com/watch?v=gK2z5-cceP4


Processing URLs:  56%|█████▌    | 559/1000 [21:03<11:28,  1.56s/it]

Error extracting text from https://www.reuters.com/article/us-amazon-com-labor/amazon-seeks-to-halt-union-election-at-alabama-warehouse-idUSKBN29R23P?utm_source=morning_brew: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-amazon-com-labor/amazon-seeks-to-halt-union-election-at-alabama-warehouse-idUSKBN29R23P?utm_source=morning_brew


Processing URLs:  57%|█████▋    | 568/1000 [21:12<05:56,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-iran-missiles-khamenei-idUSKCN0WW0PT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-khamenei-idUSKCN0WW0PT


Processing URLs:  57%|█████▋    | 570/1000 [22:15<2:16:32, 19.05s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-08-12/china-prepared-to-recognize-taliban-if-kabul-falls-sources-say-undermining-us-threats: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  57%|█████▋    | 573/1000 [22:22<56:02,  7.88s/it]  

Error extracting text from http://www.channelnewsasia.com/news/business/panama-to-open-enlarged/2631954.html?cx_tag=undefined&amp;cid=tg:recos:undefined:standard#cxrecs_s: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/business/panama-to-open-enlarged/2631954.html?cx_tag=undefined&amp;cid=tg:recos:undefined:standard#cxrecs_s


Processing URLs:  58%|█████▊    | 577/1000 [22:43<43:36,  6.19s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/chinese-navy-holds-live-fire-drills-in-south-china-sea/2016/07/09/ebed14dc-4634-11e6-a76d-3550dba926ac_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/chinese-navy-holds-live-fire-drills-in-south-china-sea/2016/07/09/ebed14dc-4634-11e6-a76d-3550dba926ac_story.html


Processing URLs:  58%|█████▊    | 578/1000 [22:44<32:30,  4.62s/it]

Error extracting text from http://time.com/4871110/paul-ryan-robert-mueller-investigation-trump-russia/: 404 Client Error: Not Found for url: https://time.com/4871110/paul-ryan-robert-mueller-investigation-trump-russia/


Processing URLs:  58%|█████▊    | 579/1000 [22:45<25:41,  3.66s/it]

Error extracting text from https://www.stripes.com/news/marines-help-afghan-forces-clear-insurgents-from-helmand-district-1.485143#.WahfhDMfkdU: 404 Client Error: Not Found for url: https://www.stripes.com/news/marines-help-afghan-forces-clear-insurgents-from-helmand-district-1.485143#.WahfhDMfkdU


Processing URLs:  58%|█████▊    | 581/1000 [22:48<17:09,  2.46s/it]

Error extracting text from http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.30.1.117: 403 Client Error: Forbidden for url: https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.30.1.117


Processing URLs:  58%|█████▊    | 583/1000 [22:50<11:09,  1.60s/it]

Error extracting text from http://www.emergingmarkets.org/Article/3544434/Financial-Markets/China-faces-Catch-22-dilemma-over-Venezuela-debt-pile-as-default-looms.html: 404 Client Error: Not Found for url: http://www.emergingmarkets.org/Article/3544434/Financial-Markets/China-faces-Catch-22-dilemma-over-Venezuela-debt-pile-as-default-looms.html
Error extracting text from http://www.foreign.senate.gov/press/chair/release/corker-iran-testing-will-of-us-and-international-community-after-second-straight-day-of-ballistic-missile-launches: 403 Client Error: Forbidden for url: http://www.foreign.senate.gov/press/chair/release/corker-iran-testing-will-of-us-and-international-community-after-second-straight-day-of-ballistic-missile-launches
URL filtered: https://www.bloomberg.com/news/articles/2017-06-19/qatar-reminds-gulf-critics-of-9-11-as-crisis-enters-third-week


Processing URLs:  59%|█████▊    | 587/1000 [22:58<11:55,  1.73s/it]

Error extracting text from https://goodjudgment.io/economist/: 404 Client Error: Not Found for url: https://goodjudgment.io/economist/


Processing URLs:  59%|█████▉    | 588/1000 [22:58<09:04,  1.32s/it]

Error extracting text from https://olympics.com/tokyo-2020/en/: 403 Client Error: Forbidden for url: https://olympics.com/tokyo-2020/en/


Processing URLs:  60%|█████▉    | 595/1000 [23:19<13:06,  1.94s/it]

Error extracting text from http://thehill.com/business-a-lobbying/business-a-lobbying/308764-transition-official-trump-will-not-rip-up-nafta: 403 Client Error: Forbidden for url: https://thehill.com/business-a-lobbying/business-a-lobbying/308764-transition-official-trump-will-not-rip-up-nafta/


Processing URLs:  60%|██████    | 602/1000 [23:48<19:13,  2.90s/it]

Error extracting text from http://www.balkaninsight.com/en/article/podgorica-protesters-give-pm-six-days-to-resign-10-18-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/podgorica-protesters-give-pm-six-days-to-resign-10-18-2015
Error extracting text from http://english.aawsat.com/2016/08/article55355740/ninevehs-nujaifi-iran-seeks-provoke-arab-kurdish-strife: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/08/article55355740/ninevehs-nujaifi-iran-seeks-provoke-arab-kurdish-strife


Processing URLs:  60%|██████    | 603/1000 [23:48<14:25,  2.18s/it]

Error extracting text from http://business.financialpost.com/news/economy/outraged-mexicans-to-donald-trump-go-ahead-and-tear-up-nafta-were-sick-of-it-too: 403 Client Error: Forbidden for url: https://financialpost.com/news/economy/outraged-mexicans-to-donald-trump-go-ahead-and-tear-up-nafta-were-sick-of-it-too


Processing URLs:  60%|██████    | 605/1000 [23:52<14:09,  2.15s/it]

Error extracting text from http://bangladeshchronicle.net/2016/08/bangladesh-ranked-6th-riskiest-country-to-do-business/: 404 Client Error: Not Found for url: https://bangladeshchronicle.net/2016/08/bangladesh-ranked-6th-riskiest-country-to-do-business/


Processing URLs:  61%|██████    | 612/1000 [25:01<2:04:12, 19.21s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/national-politics/article186577863.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  61%|██████▏   | 614/1000 [25:03<1:03:32,  9.88s/it]

Error extracting text from http://www.careerbuilder.com/jobseeker/jobs/jobdetails.aspx?showNewJDP=yes&amp;job_did=J8Q84C68C7G4SW3J9MX: 403 Client Error: Forbidden for url: http://www.careerbuilder.com/regional_sites


Processing URLs:  62%|██████▏   | 615/1000 [25:03<44:46,  6.98s/it]  

Error extracting text from https://www.jstor.org/stable/723341?seq=1#page_scan_tab_contents: 420 Client Error: Enhance Your Calm for url: https://www.jstor.org/stable/723341?seq=1#page_scan_tab_contents


Processing URLs:  62%|██████▏   | 617/1000 [25:08<28:50,  4.52s/it]

Error extracting text from https://www.armed-services.senate.gov/imo/media/doc/Mattis_06-13-17.pdf: 403 Client Error: Forbidden for url: https://www.armed-services.senate.gov/imo/media/doc/Mattis_06-13-17.pdf


Processing URLs:  62%|██████▏   | 620/1000 [25:14<19:04,  3.01s/it]

Error extracting text from https://www.dea.gov/divisions/hq/2016/hq062716_attach.pdf: 404 Client Error: Not Found for url: https://www.dea.gov/divisions/hq/2016/hq062716_attach.pdf


Processing URLs:  62%|██████▏   | 621/1000 [25:14<14:10,  2.24s/it]

Error extracting text from http://www.business-anti-corruption.com/media/4000093/EU_montenegro_2013.pdf: 404 Client Error: Not Found for url: https://www.ganintegrity.com/media/4000093/EU_montenegro_2013.pdf


Processing URLs:  62%|██████▏   | 622/1000 [25:17<15:17,  2.43s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-economy/venezuela-central-bank-has-2-billion-cash-to-pay-2017-debt-report-idUSKCN1AX2EH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy/venezuela-central-bank-has-2-billion-cash-to-pay-2017-debt-report-idUSKCN1AX2EH


Processing URLs:  63%|██████▎   | 630/1000 [25:27<09:09,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-tillerson-idUSKBN1572UA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-tillerson-idUSKBN1572UA


Processing URLs:  64%|██████▎   | 636/1000 [25:35<06:48,  1.12s/it]

Error extracting text from http://www.nytimes.com/2016/06/26/magazine/will-trump-swallow-the-gop-whole.html?action=click&amp;contentCollection=Politics&amp;module=Trending&amp;version=Full&amp;region=Marginalia&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/26/magazine/will-trump-swallow-the-gop-whole.html?action=click&amp;contentCollection=Politics&amp;module=Trending&amp;version=Full&amp;region=Marginalia&amp;pgtype=article


Processing URLs:  64%|██████▍   | 640/1000 [25:44<11:23,  1.90s/it]

Error extracting text from http://www.asahi.com/ajw/articles/AJ201605190028.html: 404 Client Error: Not Found for url: https://www.asahi.com/ajw/articles/AJ201605190028.html


Processing URLs:  64%|██████▍   | 644/1000 [25:54<12:35,  2.12s/it]

Error extracting text from https://aibirds.org/man-vs-machine-challenge/results.html: HTTPSConnectionPool(host='aibirds.org', port=443): Max retries exceeded with url: /man-vs-machine-challenge/results.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))
URL filtered: http://www.bloombergview.com/articles/2015-10-14/joe-biden-wins-race-to-be-clinton-s-understudy-after-debate


Processing URLs:  65%|██████▍   | 646/1000 [25:56<09:28,  1.61s/it]

Error extracting text from http://cherna.gora.me/news/garcevic-hungary-will-ratify-the-protocol-on-13-june/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/garcevic-hungary-will-ratify-the-protocol-on-13-june/


Processing URLs:  65%|██████▍   | 648/1000 [26:00<11:00,  1.88s/it]

Error extracting text from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.4797&amp;rep=rep1&amp;type=pdf: 401 Client Error: Unauthorized for url: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.4797&amp;rep=rep1&amp;type=pdf


Processing URLs:  65%|██████▌   | 650/1000 [26:12<19:50,  3.40s/it]



Processing URLs:  66%|██████▌   | 657/1000 [26:19<07:15,  1.27s/it]

Error extracting text from https://www.wsj.com/articles/europes-periphery-debt-market-welcomes-new-member-france-1487604548?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/europes-periphery-debt-market-welcomes-new-member-france-1487604548?mod=e2fb


Processing URLs:  66%|██████▋   | 664/1000 [26:35<08:27,  1.51s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/01/04/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/#.VpFqc_krLIV: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/01/04/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/#.VpFqc_krLIV


Processing URLs:  66%|██████▋   | 665/1000 [26:37<08:26,  1.51s/it]

URL filtered: https://www.washingtonpost.com/news/the-switch/wp/2017/11/01/how-russian-trolls-got-into-your-facebook-feed/?utm_term=.97ee6e849bf5


Processing URLs:  67%|██████▋   | 669/1000 [26:42<08:14,  1.49s/it]

URL filtered: https://www.cnbc.com/2017/09/29/russian-facebook-ads-how-many-people-could-you-reach-with-100000.html


Processing URLs:  68%|██████▊   | 675/1000 [26:54<09:38,  1.78s/it]

Error extracting text from http://news.sky.com/story/1604344/hopes-of-syria-solution-after-saudi-talks: 404 Client Error: Not Found for url: https://news.sky.com/story/1604344/hopes-of-syria-solution-after-saudi-talks


Processing URLs:  68%|██████▊   | 677/1000 [26:55<07:13,  1.34s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/52931872.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/52931872.cms
URL filtered: https://www.bloomberg.com/gadfly/articles/2017-11-05/opec-is-already-thinking-about-70-oil


Processing URLs:  68%|██████▊   | 681/1000 [26:58<04:39,  1.14it/s]

Error extracting text from https://www.reuters.com/article/us-usa-tax/trump-signs-tax-government-spending-bills-into-law-idUSKBN1EG1T1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax/trump-signs-tax-government-spending-bills-into-law-idUSKBN1EG1T1


Processing URLs:  69%|██████▊   | 687/1000 [27:06<04:46,  1.09it/s]

Error extracting text from http://europe.newsweek.com/cambodia-blocks-statement-south-china-sea-ruling-483492: 403 Client Error: Forbidden for url: https://www.newsweek.com/cambodia-blocks-statement-south-china-sea-ruling-483492
Error extracting text from https://stopagitprop.com/2016/07/23/kgb-active-measures-and-russian-hybrid-warfare-a-brief-comparison/: HTTPSConnectionPool(host='stopagitprop.com', port=443): Max retries exceeded with url: /2016/07/23/kgb-active-measures-and-russian-hybrid-warfare-a-brief-comparison/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2feaf2060>: Failed to resolve 'stopagitprop.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2016/06/04/world/asia/us-sanctions-expected-to-hit-small-banks-business-with-north-korea.html?ref=asia: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/04/world/asia/us-sanctions-expected-to-hit-small-banks-business

Processing URLs:  69%|██████▉   | 689/1000 [27:09<05:42,  1.10s/it]

Error extracting text from http://nationalinterest.org/feature/washington-must-remedy-colombias-flawed-farc-deal-15108: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/washington-must-remedy-colombias-flawed-farc-deal-15108


Processing URLs:  69%|██████▉   | 694/1000 [27:18<07:40,  1.51s/it]

Error extracting text from https://www.bishopranch.com/media-coverage/driverless-shuttles-coming-east-bay-tested/: 403 Client Error: Forbidden for url: https://www.bishopranch.com/media-coverage/driverless-shuttles-coming-east-bay-tested/


Processing URLs:  70%|██████▉   | 695/1000 [27:19<07:26,  1.46s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/donald-trump-barrels-ahead-with-plan-to-gut-obamacare/articleshow/57610820.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/donald-trump-barrels-ahead-with-plan-to-gut-obamacare/articleshow/57610820.cms


Processing URLs:  70%|██████▉   | 699/1000 [27:26<07:03,  1.41s/it]

Error extracting text from http://nationalinterest.org/feature/the-long-road-fallujah-mosul-15018: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/the-long-road-fallujah-mosul-15018


Processing URLs:  71%|███████   | 706/1000 [27:33<04:14,  1.15it/s]

Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-law/israel-pushes-on-with-law-seen-protecting-pm-under-criminal-probe-idUSKBN1DR2JD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-law/israel-pushes-on-with-law-seen-protecting-pm-under-criminal-probe-idUSKBN1DR2JD


Processing URLs:  71%|███████   | 708/1000 [27:36<06:20,  1.30s/it]

Error extracting text from https://www.reuters.com/business/energy/oil-prices-slip-drop-chinese-crude-imports-rings-alarm-bells-demand-2021-07-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-prices-slip-drop-chinese-crude-imports-rings-alarm-bells-demand-2021-07-14/


Processing URLs:  71%|███████   | 710/1000 [27:37<04:19,  1.12it/s]

Error extracting text from http://autoweek.com/article/vw-diesel-scandal/report-vw-settlement-us-regulators-unlikely-end-march: 403 Client Error: Forbidden for url: http://autoweek.com/article/vw-diesel-scandal/report-vw-settlement-us-regulators-unlikely-end-march


Processing URLs:  71%|███████   | 711/1000 [27:40<06:18,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-testsite-idUSKCN0XX2CV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-testsite-idUSKCN0XX2CV


Processing URLs:  71%|███████▏  | 714/1000 [27:49<12:39,  2.66s/it]

Error extracting text from http://www.buenosairesherald.com/article/200112/rousseff-reshuffles-shrinks-cabinet-: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/200112/rousseff-reshuffles-shrinks-cabinet-


Processing URLs:  72%|███████▏  | 715/1000 [29:26<2:02:52, 25.87s/it]

Error extracting text from http://en.mehrnews.com/news/122100/Iran-Greece-stress-political-will-for-expanding-ties: HTTPSConnectionPool(host='en.mehrnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  72%|███████▏  | 717/1000 [29:27<1:06:54, 14.19s/it]

Error extracting text from http://www.tradingeconomics.com/china/car-registrations/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/china/car-registrations/forecast


Processing URLs:  72%|███████▏  | 718/1000 [29:28<49:11, 10.47s/it]  

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.aperoladomamore.net/lava-jato-e-tse-pode-tirar-dilma-do-poder-antes-do-impeachment-entenda/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.aperoladomamore.net/lava-jato-e-tse-pode-tirar-dilma-do-poder-antes-do-impeachment-entenda/&amp;prev=search


Processing URLs:  72%|███████▏  | 720/1000 [29:30<29:43,  6.37s/it]

Error extracting text from https://www.dfa.ie/news-and-media/press-releases/press-release-archive/2017/november/statement-by-minister-coveney-on-northern-ireland/: 405 Client Error: Not Allowed for url: https://www.gov.ie/en/publications/?q=&sort_by=published_date&type=press_releases&type=general_publications&type=speeches&organisation=department-of-foreign-affairs
URL filtered: http://www.bloomberg.com/news/articles/2016-07-19/turkish-credit-risk-climbs-third-day-as-moody-s-reviews-rating


Processing URLs:  72%|███████▏  | 724/1000 [29:41<18:21,  3.99s/it]

Error extracting text from https://www.freightwaves.com/news/freightwaves-classics-port-of-los-angeles-is-the-nations-busiest: 403 Client Error: Forbidden for url: https://www.freightwaves.com/news/freightwaves-classics-port-of-los-angeles-is-the-nations-busiest


Processing URLs:  72%|███████▎  | 725/1000 [29:42<14:49,  3.23s/it]

Error extracting text from http://seekingalpha.com/article/3535796-tetlock-and-gardner-superforecasting: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3535796-tetlock-and-gardner-superforecasting


Processing URLs:  73%|███████▎  | 729/1000 [29:52<12:27,  2.76s/it]

Error extracting text from http://www.insidesources.com/vietnam-wants-to-be-americas-bridge-to-north-korea/: 403 Client Error: Forbidden for url: https://insidesources.com/vietnam-wants-to-be-americas-bridge-to-north-korea/


Processing URLs:  73%|███████▎  | 730/1000 [29:53<10:45,  2.39s/it]



Processing URLs:  73%|███████▎  | 733/1000 [29:57<06:39,  1.50s/it]

Error extracting text from https://www.google.com/amp/s/finance.yahoo.com/amphtml/news/gold-steadies-near-three-month-014845844.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/amphtml/news/gold-steadies-near-three-month-014845844.html


Processing URLs:  74%|███████▎  | 735/1000 [29:59<05:41,  1.29s/it]

Error extracting text from http://aranews.net/2016/01/17757/: 404 Client Error: Not Found for url: http://aranews.net/2016/01/17757/
URL filtered: http://www.rand.org/blog/2016/04/the-effect-on-south-koreas-neighbors.html?utm_source=linkedin.com&amp;utm_medium=rand_social


Processing URLs:  74%|███████▍  | 740/1000 [30:02<02:38,  1.64it/s]

Error extracting text from http://www.nytimes.com/2015/04/15/world/americas/obama-cuba-remove-from-state-terror-list.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/04/15/world/americas/obama-cuba-remove-from-state-terror-list.html
Error extracting text from http://www.nytimes.com/2016/08/03/opinion/the-case-for-finally-bombing-assad.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/03/opinion/the-case-for-finally-bombing-assad.html


Processing URLs:  74%|███████▍  | 743/1000 [30:05<03:39,  1.17it/s]

Error extracting text from http://www.wsj.com/articles/yemen-rebels-saudi-arabia-begin-peace-talks-to-end-nearly-year-long-war-1457520050: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/yemen-rebels-saudi-arabia-begin-peace-talks-to-end-nearly-year-long-war-1457520050


Processing URLs:  74%|███████▍  | 744/1000 [30:06<03:34,  1.19it/s]

Error extracting text from https://www.psychologytoday.com/blog/the-time-cure/201709/the-dangerous-case-donald-trump: 403 Client Error: Forbidden for url: https://www.psychologytoday.com/blog/the-time-cure/201709/the-dangerous-case-donald-trump


Processing URLs:  75%|███████▍  | 746/1000 [30:07<02:25,  1.74it/s]

Error extracting text from http://www.nytimes.com/aponline/2015/12/15/world/europe/ap-eu-russia-space-station.htm: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/12/15/world/europe/ap-eu-russia-space-station.htm


Processing URLs:  75%|███████▌  | 752/1000 [30:22<08:52,  2.15s/it]

Error extracting text from http://thehill.com/homenews/house/257167-gop-disarray-has-some-republicans-talking-about-dealing-with-dems: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/257167-gop-disarray-has-some-republicans-talking-about-dealing-with-dems/


Processing URLs:  75%|███████▌  | 753/1000 [30:22<06:57,  1.69s/it]

URL filtered: http://www.pressgazette.co.uk/channel-4-news-editor-condemns-tiny-revenue-from-videos-shared-2bn-times-on-facebook/


Processing URLs:  76%|███████▌  | 762/1000 [30:34<04:09,  1.05s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13V1G1?feedType=RSS&amp;feedName=topNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13V1G1?feedType=RSS&amp;feedName=topNews


Processing URLs:  76%|███████▋  | 764/1000 [30:39<06:45,  1.72s/it]

Error extracting text from https://www.globalpolicy.org/component/content/article/169/36383.html: 404 Client Error: Not Found for url: https://archive.globalpolicy.org/component/content/article/169/36383.html


Processing URLs:  76%|███████▋  | 765/1000 [30:56<24:34,  6.28s/it]

Error extracting text from http://www.treehugger.com/corporate-responsibility/volkswagen-ceo-resigns-over-emission-rigging-scandal.html: 406 Client Error: Not Acceptable for url: https://www.treehugger.com/corporate-responsibility/volkswagen-ceo-resigns-over-emission-rigging-scandal.html


Processing URLs:  77%|███████▋  | 766/1000 [30:56<17:48,  4.56s/it]

Error extracting text from http://thehill.com/homenews/sunday-talk-shows/336271-putin-i-didnt-even-really-talk-to-flynn-at-2015-dinner: 403 Client Error: Forbidden for url: https://thehill.com/homenews/sunday-talk-shows/336271-putin-i-didnt-even-really-talk-to-flynn-at-2015-dinner/


Processing URLs:  77%|███████▋  | 769/1000 [30:59<07:40,  1.99s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/us-britain-raising-tension-over-south-china-sea-chinese-envoy: 403 Client Error: Forbidden for url: https://www.straitstimes.com/asia/east-asia/us-britain-raising-tension-over-south-china-sea-chinese-envoy
URL filtered: https://twitter.com/mkmissioneu?lang=en


Processing URLs:  77%|███████▋  | 773/1000 [31:02<04:24,  1.16s/it]

Error extracting text from http://tass.ru/en/politics/891681: 404 Client Error: Not Found for url: https://tass.ru/en/politics/891681
Error extracting text from http://www.nytimes.com/2016/09/28/world/asia/afghanistan-corruption-financial-disclosure.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/28/world/asia/afghanistan-corruption-financial-disclosure.html?_r=0


Processing URLs:  78%|███████▊  | 779/1000 [31:19<08:02,  2.18s/it]

Error extracting text from https://www.state.gov/secretary/remarks/2017/12/276770.htm: 404 Client Error: Not Found for url: https://www.state.gov/remarks-secretary-pompeo/


Processing URLs:  78%|███████▊  | 781/1000 [31:23<07:17,  2.00s/it]

Error extracting text from http://www.spectator.co.uk/2016/01/the-truth-about-islamic-state-its-in-crisis/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2016/01/the-truth-about-islamic-state-its-in-crisis/


Processing URLs:  78%|███████▊  | 782/1000 [31:24<06:22,  1.75s/it]

Error extracting text from http://www.nytimes.com/2010/02/04/world/africa/04bashir.html?scp=8&amp;sq=&amp;st=nyt: 403 Client Error: Forbidden for url: http://www.nytimes.com/2010/02/04/world/africa/04bashir.html?scp=8&amp;sq=&amp;st=nyt


Processing URLs:  79%|███████▊  | 786/1000 [31:29<04:15,  1.19s/it]

Error extracting text from http://www.wsj.com/articles/obama-says-u-s-russia-could-agree-on-syria-cease-fire-deal-1472965367: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/obama-says-u-s-russia-could-agree-on-syria-cease-fire-deal-1472965367


Processing URLs:  79%|███████▉  | 790/1000 [31:34<04:12,  1.20s/it]

Error extracting text from http://www.nytimes.com/2016/06/21/upshot/telling-sign-many-supporters-of-brexit-expect-defeat.htm: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/21/upshot/telling-sign-many-supporters-of-brexit-expect-defeat.htm
Error extracting text from http://news.yahoo.com/eus-juncker-rules-brexit-says-no-plan-b-094925761.html: 404 Client Error: Not Found for url: http://news.yahoo.com/eus-juncker-rules-brexit-says-no-plan-b-094925761.html


Processing URLs:  79%|███████▉  | 793/1000 [31:41<08:02,  2.33s/it]

Error extracting text from http://scitechnation.com/what-china-will-do-if-it-loses-the-south-china-sea-arbitration-ruling/: 436 Client Error:  for url: http://ww16.scitechnation.com/what-china-will-do-if-it-loses-the-south-china-sea-arbitration-ruling/?sub1=20240202-0946-299b-9923-30438908b558


Processing URLs:  80%|███████▉  | 796/1000 [31:44<04:27,  1.31s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/360233-russia-type-information-campaigns-meddled-with-18-nations-elections-in: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/360233-russia-type-information-campaigns-meddled-with-18-nations-elections-in/


Processing URLs:  80%|███████▉  | 797/1000 [32:46<1:06:16, 19.59s/it]

Error extracting text from http://newswire.net/newsroom/news/00090548-cia-russia-is-ready-to-attack-targets-in-syria.html: HTTPSConnectionPool(host='newswire.net', port=443): Read timed out. (read timeout=60)


Processing URLs:  80%|███████▉  | 799/1000 [32:48<33:40, 10.05s/it]

Error extracting text from https://www.reuters.com/article/us-usa-trump-swalwell-idUSKBN2AX1JP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-swalwell-idUSKBN2AX1JP


Processing URLs:  80%|████████  | 800/1000 [32:49<24:00,  7.20s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/editorials/ct-iran-backlash-nuclear-rouhani-edit-1114-20151112-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/editorials/ct-iran-backlash-nuclear-rouhani-edit-1114-20151112-story.html
Error extracting text from https://www.reuters.com/article/uk-britain-boe-bailey-negative/boes-bailey-says-there-are-a-lot-of-issues-with-negative-rates-idUSKBN29H12Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-boe-bailey-negative/boes-bailey-says-there-are-a-lot-of-issues-with-negative-rates-idUSKBN29H12Z


Processing URLs:  80%|████████  | 802/1000 [32:49<13:10,  3.99s/it]

Error extracting text from http://ndb.int/news.php: HTTPConnectionPool(host='ndb.int', port=80): Max retries exceeded with url: /news.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3050d6780>: Failed to resolve 'ndb.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  81%|████████  | 808/1000 [32:55<03:54,  1.22s/it]

Error extracting text from https://www.edinburghnews.scotsman.com/news/politics/anas-sarwar-msp-who-favourite-replace-richard-leonards-leader-scottish-labour-party-3101925: 403 Client Error: Forbidden for url: https://www.edinburghnews.scotsman.com/news/politics/anas-sarwar-msp-who-favourite-replace-richard-leonards-leader-scottish-labour-party-3101925


Processing URLs:  81%|████████  | 810/1000 [33:57<59:00, 18.63s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/article169072272.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  81%|████████  | 812/1000 [33:58<30:16,  9.66s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-15/fed-faces-this-checklist-of-hurdles-for-a-december-rate-hike


Processing URLs:  82%|████████▏ | 817/1000 [34:11<10:39,  3.50s/it]

Error extracting text from https://www.confidencial.com.ni/politica/kitty-monterrey-no-hablemos-de-dos-bloques-sino-de-una-opcion-y-esperamos-ser-nosotros/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/politica/kitty-monterrey-no-hablemos-de-dos-bloques-sino-de-una-opcion-y-esperamos-ser-nosotros/
Error extracting text from http://bigstory.ap.org/article/d847c596d6854e50869c987d1ab526ab/fujimori-hit-money-laundering-probe-peru-election: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/d847c596d6854e50869c987d1ab526ab/fujimori-hit-money-laundering-probe-peru-election (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3050d53d0>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  82%|████████▏ | 819/1000 [34:14<08:04,  2.68s/it]

Error extracting text from http://en.trend.az/iran/politics/2459660.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2459660.html


Processing URLs:  82%|████████▏ | 820/1000 [35:14<57:50, 19.28s/it]

Error extracting text from http://192.155.192.104/adobe_flashplayer_7.exe: HTTPConnectionPool(host='192.155.192.104', port=80): Max retries exceeded with url: /adobe_flashplayer_7.exe (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3050d7f20>, 'Connection to 192.155.192.104 timed out. (connect timeout=60)'))


Processing URLs:  82%|████████▏ | 824/1000 [35:18<15:55,  5.43s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-ukraine-idUSKBN15T2IY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-ukraine-idUSKBN15T2IY


Processing URLs:  82%|████████▎ | 825/1000 [35:20<12:30,  4.29s/it]

Error extracting text from http://www.newsweek.com/syria-assad-peace-talks-saudi-arabia-isis-nusra-islamic-state-403988: 403 Client Error: Forbidden for url: https://www.newsweek.com/syria-assad-peace-talks-saudi-arabia-isis-nusra-islamic-state-403988


Processing URLs:  83%|████████▎ | 830/1000 [35:38<11:23,  4.02s/it]

Error extracting text from https://www.iiss.org/en/events/shangri%20la%20dialogue/archive/shangri-la-dialogue-2016-4a4b/special-sessions-ff25/session-5-af76: 404 Client Error: Not Found for url: https://www.iiss.org/en/events/shangri%20la%20dialogue/archive/shangri-la-dialogue-2016-4a4b/special-sessions-ff25/session-5-af76


Processing URLs:  83%|████████▎ | 832/1000 [35:42<07:44,  2.77s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NZNOUN6KLVR401-1GO6M9KLFAM4J7I07CG9FU3C39


Processing URLs:  84%|████████▎ | 835/1000 [35:44<04:21,  1.58s/it]

Error extracting text from https://www.taoiseach.gov.ie/eng/Historical_Information/The_Constitution/: 405 Client Error: Not Allowed for url: https://www.taoiseach.gov.ie/eng/Historical_Information/The_Constitution/


Processing URLs:  84%|████████▍ | 838/1000 [36:10<16:01,  5.93s/it]

Error extracting text from http://www.focus-fen.net/news/2016/04/06/402834/russian-fm-to-visit-japan-on-april-15.html: HTTPConnectionPool(host='www.focus-fen.net', port=80): Max retries exceeded with url: /news/2016/04/06/402834/russian-fm-to-visit-japan-on-april-15.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff2e7230>: Failed to resolve 'www.focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.oddschecker.com/politics/us-politics/us-republican-primaries/iowa-caucus: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/us-politics/us-republican-primaries/iowa-caucus


Processing URLs:  84%|████████▍ | 842/1000 [36:12<06:08,  2.33s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&amp;s=wcrfpus2&amp;f=4: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  84%|████████▍ | 844/1000 [36:13<03:41,  1.42s/it]

Error extracting text from http://www.latimes.com/world/la-fg-iraq-mosul-20160921-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-iraq-mosul-20160921-snap-story.html


Processing URLs:  85%|████████▍ | 848/1000 [36:27<07:47,  3.07s/it]

Error extracting text from http://icasualties.org/Iraq/Index.aspx: 404 Client Error: Not Found for url: http://icasualties.org/Iraq/Index.aspx


Processing URLs:  85%|████████▌ | 854/1000 [36:37<03:33,  1.47s/it]

Error extracting text from http://www.reuters.com/article/us-usa-economy-atlantafed-idUSKCN0ZH5CA?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-economy-atlantafed-idUSKCN0ZH5CA?il=0


Processing URLs:  86%|████████▌ | 856/1000 [36:40<03:42,  1.55s/it]

Error extracting text from https://www.carbonbrief.org/in-depth-q-and-a-how-article-6-carbon-markets-could-make-or-break-the-paris-agreement: 403 Client Error: Forbidden for url: https://www.carbonbrief.org/in-depth-q-and-a-how-article-6-carbon-markets-could-make-or-break-the-paris-agreement
Error extracting text from http://www.reuters.com/article/us-southchinasea-ruling-idUSKCN0ZY0FJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-ruling-idUSKCN0ZY0FJ


Processing URLs:  86%|████████▌ | 858/1000 [36:44<04:07,  1.75s/it]

Error extracting text from http://press.ihs.com/press-release/aerospace-defense-security/islamic-state-monthly-revenue-totals-80-million-ihs-says: 403 Client Error: Forbidden for url: https://investor.spglobal.com/news-releases/default.aspx


Processing URLs:  86%|████████▌ | 860/1000 [36:48<04:11,  1.79s/it]

Error extracting text from http://www.middle-east-online.com/english/?id=74445: 404 Client Error: Not Found for url: https://www.middle-east-online.com/english/?id=74445


Processing URLs:  86%|████████▌ | 861/1000 [36:48<03:11,  1.38s/it]

Error extracting text from http://www.wsj.com/articles/mendoza-concedes-in-peru-as-fujimori-and-kuczynski-head-to-runoff-for-presidency-1460499409: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/mendoza-concedes-in-peru-as-fujimori-and-kuczynski-head-to-runoff-for-presidency-1460499409


Processing URLs:  86%|████████▋ | 863/1000 [36:49<02:03,  1.11it/s]

Error extracting text from http://www.nytimes.com/2016/11/28/opinion/why-corruption-matters.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/28/opinion/why-corruption-matters.html


Processing URLs:  86%|████████▋ | 865/1000 [36:53<02:50,  1.26s/it]

Error extracting text from http://blogs.reuters.com/great-debate/2016/02/04/a-whiff-of-panic-in-the-kremlin-as-economy-sinks-further/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2016/02/04/a-whiff-of-panic-in-the-kremlin-as-economy-sinks-further/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe44af60>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  87%|████████▋ | 866/1000 [36:53<02:27,  1.10s/it]

Error extracting text from http://www.thanhniennews.com/world/us-hopes-for-talks-with-china-about-possible-thaad-move-to-skorea-60487.html: HTTPConnectionPool(host='www.thanhniennews.com', port=80): Max retries exceeded with url: /world/us-hopes-for-talks-with-china-about-possible-thaad-move-to-skorea-60487.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3023de540>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  88%|████████▊ | 879/1000 [37:22<03:49,  1.90s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1326439/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1326439/


Processing URLs:  88%|████████▊ | 880/1000 [37:22<02:47,  1.40s/it]

Error extracting text from https://www.nytimes.com/2017/04/18/us/politics/trump-advisers-paris-climate-accord.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/18/us/politics/trump-advisers-paris-climate-accord.html?_r=0


Processing URLs:  88%|████████▊ | 882/1000 [37:24<02:24,  1.22s/it]

Error extracting text from https://blog.boomsupersonic.com/tomorrows-air-travel-is-supersonic-and-sustainable-12e39244dbce: 403 Client Error: Forbidden for url: https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Fblog.boomsupersonic.com%2Ftomorrows-air-travel-is-supersonic-and-sustainable-12e39244dbce


Processing URLs:  89%|████████▊ | 886/1000 [37:29<02:01,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/08/09/world/europe/vladimir-putin-russia-siberia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/09/world/europe/vladimir-putin-russia-siberia.html


Processing URLs:  89%|████████▊ | 887/1000 [37:29<01:34,  1.19it/s]

Error extracting text from https://www.nysun.com/foreign/europe-is-making-gingerly-an-opening-to-the-free/91738/: 404 Client Error: Not Found for url: https://www.nysun.com/foreign/europe-is-making-gingerly-an-opening-to-the-free/91738


Processing URLs:  89%|████████▉ | 889/1000 [37:30<01:15,  1.48it/s]

Error extracting text from http://men.c4defence.com/AFP/NATO-Montenegro-politics-membership/8453/3: HTTPConnectionPool(host='men.c4defence.com', port=80): Max retries exceeded with url: /AFP/NATO-Montenegro-politics-membership/8453/3 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304722ab0>: Failed to resolve 'men.c4defence.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  89%|████████▉ | 894/1000 [37:38<02:18,  1.30s/it]

Error extracting text from http://uk.reuters.com/article/2015/12/04/us-china-ipo-idUKKBN0TN02O20151204: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  90%|█████████ | 900/1000 [38:43<30:58, 18.58s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-07-19/the-us-cannot-re-enter-afghanistan-once-it-leaves: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  91%|█████████ | 906/1000 [38:51<05:07,  3.28s/it]

Error extracting text from https://www.nytimes.com/2021/03/01/us/extremism-capitol-riot.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/01/us/extremism-capitol-riot.html


Processing URLs:  91%|█████████ | 910/1000 [38:54<01:55,  1.28s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-security-aselsan-idUSKBN16G0UQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-aselsan-idUSKBN16G0UQ


Processing URLs:  91%|█████████▏| 913/1000 [39:00<02:39,  1.84s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-02-17/astrazeneca-s-covid-19-vaccines-are-going-unused-in-germany


Processing URLs:  92%|█████████▏| 919/1000 [39:09<02:24,  1.79s/it]

Error extracting text from https://www.reuters.com/world/us/filibuster-imperils-pelosis-abortion-bill-us-senate-klobuchar-2021-09-05/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/filibuster-imperils-pelosis-abortion-bill-us-senate-klobuchar-2021-09-05/


Processing URLs:  92%|█████████▏| 921/1000 [39:10<01:25,  1.08s/it]

URL filtered: http://www.businessinsider.com/russian-facebook-ads-2016-election-trump-clinton-bernie-2017-11


Processing URLs:  92%|█████████▏| 923/1000 [39:10<00:55,  1.39it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/03/29/asia-pacific/myanmar-lifts-four-year-curfew-in-state-after-communal-violence/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/03/29/asia-pacific/myanmar-lifts-four-year-curfew-in-state-after-communal-violence/


Processing URLs:  93%|█████████▎| 928/1000 [39:13<00:45,  1.57it/s]

Error extracting text from https://abetterway.speaker.gov/_assets/pdf/ABetterWay-Tax-PolicyPaper.pdf: HTTPSConnectionPool(host='abetterway.speaker.gov', port=443): Max retries exceeded with url: /_assets/pdf/ABetterWay-Tax-PolicyPaper.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe449940>: Failed to resolve 'abetterway.speaker.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  93%|█████████▎| 930/1000 [39:29<04:05,  3.51s/it]

Error extracting text from http://newsworldindia.in/world/north-korean-earthquake-near-nuclear-site-sends-panic-across-east-asia-over-possible-atomic-test/167298/: 408 Client Error: Request Time-out for url: http://newsworldindia.in/world/north-korean-earthquake-near-nuclear-site-sends-panic-across-east-asia-over-possible-atomic-test/167298/
Error extracting text from http://www.nytimes.com/2016/11/05/science/elon-musk-spacex-rocket-launches.html?ref=technology: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/05/science/elon-musk-spacex-rocket-launches.html?ref=technology


Processing URLs:  93%|█████████▎| 933/1000 [39:41<04:04,  3.65s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-wheat-russia-idUSKBN16T2QT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-wheat-russia-idUSKBN16T2QT


Processing URLs:  94%|█████████▎| 937/1000 [39:45<01:33,  1.49s/it]

Error extracting text from http://www.who.int/emergencies/zika-virus/response/en/: 404 Client Error: Not Found for url: https://www.who.int/emergencies/zika-virus/response/en/
Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-idUSKCN0VV0N3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-idUSKCN0VV0N3
Error extracting text from https://www.reuters.com/business/energy/record-gas-prices-could-hasten-nord-stream-2-launch-analysts-say-2021-09-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/record-gas-prices-could-hasten-nord-stream-2-launch-analysts-say-2021-09-14/


Processing URLs:  94%|█████████▍| 940/1000 [39:50<01:33,  1.56s/it]

Error extracting text from https://www.caracaschronicles.com/2021/02/02/biden-and-venezuela-sanctions-what-toexpect/: 403 Client Error: Forbidden for url: https://www.caracaschronicles.com/2021/02/02/biden-and-venezuela-sanctions-what-to%02expect/


Processing URLs:  94%|█████████▍| 942/1000 [39:52<01:06,  1.14s/it]

Error extracting text from https://www.science.org/doi/10.1126/science.abm4454.: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.abm4454.


Processing URLs:  94%|█████████▍| 943/1000 [39:54<01:20,  1.42s/it]

Error extracting text from http://phys.org/news/2016-08-iran-rounds-social-network-users.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-08-iran-rounds-social-network-users.html


Processing URLs:  94%|█████████▍| 945/1000 [39:55<00:52,  1.04it/s]

Error extracting text from https://www.nytimes.com/2020/03/13/nyregion/coronavirus-panic-buying.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/03/13/nyregion/coronavirus-panic-buying.html


Processing URLs:  95%|█████████▍| 946/1000 [39:56<00:58,  1.09s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O47HVD6KLVS501-2IQRHVAAQIPGJFGHMT1UDL777S


Processing URLs:  95%|█████████▌| 951/1000 [40:04<00:54,  1.11s/it]

URL filtered: https://www.bloomberglaw.com/product/blaw/exp_blp/ewogICAgImN0eHQiOiAiRE9DIiwKICAgICJpZCI6ICJPWVAwN1k2VkRLSFo/cmVzb3VyY2VfaWQ9NzA2YWMx
Error extracting text from http://www.reuters.com/article/us-usa-trade-nafta-idUSKBN17S2DG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trade-nafta-idUSKBN17S2DG


Processing URLs:  95%|█████████▌| 952/1000 [40:04<00:46,  1.04it/s]

Error extracting text from http://nyti.ms/2bYF5cX: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/29/opinion/spain-a-country-with-no-government.html?smid=pl-share


Processing URLs:  96%|█████████▌| 956/1000 [40:18<02:00,  2.73s/it]

Error extracting text from http://www.scientificamerican.com/article/go-players-react-to-computer-defeat/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/go-players-react-to-computer-defeat/
URL filtered: https://blendle.com/i/the-wall-street-journal/the-empathy-trap/bnl-wallstreetjournal840-20161203-23_1/r/sh-tw?medium=twitter&amp;campaign=social-share&amp;source=blendle


Processing URLs:  96%|█████████▌| 958/1000 [40:20<01:22,  1.95s/it]

Error extracting text from https://www.nasa.gov/content/j2m-getting-to-mars-sls-and-orion: 404 Client Error: Not Found for url: https://www.nasa.gov/content/j2m-getting-to-mars-sls-and-orion
URL filtered: https://www.bloomberg.com/news/articles/2017-04-21/oil-heads-for-weekly-loss-as-u-s-production-offsets-opec-cuts


Processing URLs:  96%|█████████▌| 960/1000 [40:22<01:02,  1.55s/it]

Error extracting text from https://www.lesswrong.com/posts/vvzfFcbmKgEsDBRHh/honoring-petrov-day-on-lesswrong-in-2019: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/vvzfFcbmKgEsDBRHh/honoring-petrov-day-on-lesswrong-in-2019


Processing URLs:  96%|█████████▌| 962/1000 [40:26<01:04,  1.70s/it]

Error extracting text from http://www.newsweek.com/farc-rebel-rehab-hopes-create-lasting-peace-colombia-451620: 403 Client Error: Forbidden for url: https://www.newsweek.com/farc-rebel-rehab-hopes-create-lasting-peace-colombia-451620


Processing URLs:  97%|█████████▋| 968/1000 [40:34<00:44,  1.39s/it]

Error extracting text from http://business.financialpost.com/news/agriculture/pot-czar-casts-doubt-on-legalization-time-frame-as-marijuana-companies-shoulder-extreme-valuations: 403 Client Error: Forbidden for url: https://financialpost.com/news/agriculture/pot-czar-casts-doubt-on-legalization-time-frame-as-marijuana-companies-shoulder-extreme-valuations
Error extracting text from https://www.reuters.com/article/us-iran-satellite/u-s-says-iran-rocket-test-breaches-u-n-resolution-idUSKBN1AC1YY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-satellite/u-s-says-iran-rocket-test-breaches-u-n-resolution-idUSKBN1AC1YY


Processing URLs:  97%|█████████▋| 971/1000 [40:34<00:21,  1.33it/s]

Error extracting text from https://www.un.org/press/en/content/meetings-coverage: 403 Client Error: Forbidden for url: https://www.un.org/press/en/content/meetings-coverage


Processing URLs:  98%|█████████▊| 976/1000 [40:41<00:27,  1.16s/it]

Error extracting text from http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/deepgo.pdf: 404 Client Error: Not found - file doesn't exist or is read protected [even tried multi] for url: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Publications_files/deepgo.pdf


Processing URLs:  98%|█████████▊| 981/1000 [40:55<00:35,  1.88s/it]

URL filtered: https://twitter.com/AHoweBlogger/status/938085600946741248


Processing URLs:  98%|█████████▊| 985/1000 [40:59<00:20,  1.39s/it]

Error extracting text from http://iranfrontpage.com/headlines/id/5027/: 404 Client Error: Not Found for url: https://iranfrontpage.com/headlines/id/5027/


Processing URLs:  99%|█████████▊| 987/1000 [41:00<00:10,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0WN2BU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0WN2BU


Processing URLs:  99%|█████████▉| 988/1000 [41:02<00:14,  1.17s/it]

Error extracting text from https://www.reuters.com/business/environment/sudan-asks-un-security-council-meet-over-ethiopias-blue-nile-dam-2021-06-22/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/environment/sudan-asks-un-security-council-meet-over-ethiopias-blue-nile-dam-2021-06-22/


Processing URLs: 100%|█████████▉| 997/1000 [41:16<00:04,  1.42s/it]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2016/05/17-wide-cybersecurity-rule-adopted/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/05/17-wide-cybersecurity-rule-adopted/


Processing URLs: 100%|█████████▉| 999/1000 [41:34<00:05,  5.89s/it]

Error extracting text from https://www.investopedia.com/ask/answers/042115/what-are-some-examples-pork-barrel-politics-united-states.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/ask/answers/042115/what-are-some-examples-pork-barrel-politics-united-states.asp


Processing URLs: 100%|██████████| 1000/1000 [41:34<00:00,  2.49s/it]
Processing URLs:   0%|          | 0/1000 [00:00<?, ?it/s]

Error extracting text from http://www.financialexpress.com/economy/rcep-meet-india-says-no-early-harvest-differences-widen-split/331097/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/economy/rcep-meet-india-says-no-early-harvest-differences-widen-split/331097/


Processing URLs:   0%|          | 2/1000 [00:00<04:03,  4.10it/s]

Error extracting text from https://www.wsj.com/articles/venezuelas-shortages-spur-perilous-sea-journeys-1498172121: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelas-shortages-spur-perilous-sea-journeys-1498172121


Processing URLs:   0%|          | 3/1000 [00:00<05:01,  3.30it/s]

Error extracting text from http://www.realclearpolitics.com/video/2016/09/20/chuck_todd_obamas_failure_to_pass_tpp_illustrates_governments_basic_inability_to_function.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/video/2016/09/20/chuck_todd_obamas_failure_to_pass_tpp_illustrates_governments_basic_inability_to_function.html


Processing URLs:   0%|          | 4/1000 [00:31<3:14:37, 11.72s/it]

Error extracting text from http://www.mmtimes.com/index.php/business/property-news/19191-parliament-urges-enquiry-into-last-minute-rush-for-lucrative-deals.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/business/property-news/19191-parliament-urges-enquiry-into-last-minute-rush-for-lucrative-deals.html


Processing URLs:   1%|          | 8/1000 [00:38<1:00:45,  3.67s/it]

URL filtered: https://twitter.com/jaredlholt
URL filtered: https://twitter.com/elonmusk/status/1076613555091234816


Processing URLs:   1%|          | 12/1000 [00:45<39:04,  2.37s/it]

Error extracting text from http://www.therussophile.org/foreign-ministry-spokesperson-maria-zakharovas-reply-to-a-media-question-on-russian-assessments-of-the-upcoming-fourth-nuclear-security-summit-in-washington-march-31-april-1-2016.html/: 404 Client Error:  for url: https://therussophile.org/foreign-ministry-spokesperson-maria-zakharovas-reply-to-a-media-question-on-russian-assessments-of-the-upcoming-fourth-nuclear-security-summit-in-washington-march-31-april-1-2016.html/
Error extracting text from http://www.reuters.com/article/us-brazil-politics-impeachment-idUSKCN0YT2HD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-impeachment-idUSKCN0YT2HD


Processing URLs:   1%|▏         | 14/1000 [00:45<24:31,  1.49s/it]

Error extracting text from https://www.bencarson.com/news/news-updates/club-for-growth-super-pac-poll-shows-trump-drop-in-iowa: HTTPSConnectionPool(host='www.bencarson.com', port=443): Max retries exceeded with url: /news/news-updates/club-for-growth-super-pac-poll-shows-trump-drop-in-iowa (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3042e7e60>: Failed to resolve 'www.bencarson.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   2%|▏         | 16/1000 [00:47<19:43,  1.20s/it]

Error extracting text from https://www.nytimes.com/2017/03/06/us/politics/affordable-care-act-obamacare-health.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/06/us/politics/affordable-care-act-obamacare-health.html?_r=0


Processing URLs:   2%|▏         | 17/1000 [00:47<16:10,  1.01it/s]

Error extracting text from https://financialpost.com/pmn/business-pmn/chinas-baidu-to-launch-paid-driverless-ride-hailing-services-in-beijing-2: 403 Client Error: Forbidden for url: https://financialpost.com/pmn/business-pmn/chinas-baidu-to-launch-paid-driverless-ride-hailing-services-in-beijing-2


Processing URLs:   2%|▏         | 19/1000 [00:54<34:04,  2.08s/it]

Error extracting text from http://www.ibtimes.com/japans-shinzo-abe-calls-summit-russias-vladimir-putin-over-northern-territories-2247350: 403 Client Error: Forbidden for url: https://www.ibtimes.com/japans-shinzo-abe-calls-summit-russias-vladimir-putin-over-northern-territories-2247350


Processing URLs:   2%|▏         | 24/1000 [01:02<26:01,  1.60s/it]

Error extracting text from https://www.state.gov/secretary/remarks/2017/04/270315.htm: 404 Client Error: Not Found for url: https://www.state.gov/remarks-secretary-pompeo/


Processing URLs:   3%|▎         | 28/1000 [02:08<5:16:31, 19.54s/it]

Error extracting text from http://aa.com.tr/en/middle-east/daesh-shells-iraqi-troops-with-chlorine-gas-near-mosul/633479: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:   3%|▎         | 32/1000 [02:13<1:30:22,  5.60s/it]

Error extracting text from http://www.straitstimes.com/asia/askst-will-the-regional-comprehensive-economic-partnership-conclude-this-year: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:   3%|▎         | 33/1000 [02:13<1:04:51,  4.02s/it]

Error extracting text from http://www.rand.org/pubs/monograph_reports/MR1033.html: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/monograph_reports/MR1033.html


Processing URLs:   4%|▎         | 35/1000 [02:18<46:30,  2.89s/it]

Error extracting text from https://www.teslamotors.com/: 403 Client Error: Forbidden for url: https://www.teslamotors.com/


Processing URLs:   4%|▍         | 40/1000 [02:24<24:58,  1.56s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/governor/nc/north_carolina_governor_mccrory_vs_cooper-4096.html#: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/governor/nc/north_carolina_governor_mccrory_vs_cooper-4096.html


Processing URLs:   4%|▍         | 43/1000 [02:27<18:27,  1.16s/it]

Error extracting text from http://www.un.org/press/en/2016/sc12315.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2016/sc12315.doc.htm


Processing URLs:   5%|▍         | 49/1000 [03:38<5:09:27, 19.52s/it]

Error extracting text from https://www.mcclatchydc.com/news/politics-government/white-house/article253553829.html: HTTPSConnectionPool(host='www.mcclatchydc.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   5%|▌         | 50/1000 [03:40<3:48:41, 14.44s/it]

Error extracting text from http://www.icla.up.ac.za/images/un/hrc/UNIB-_Oral_update_HRC_31__22_March_2016_-_Eng.pdf: 404 Client Error: Not Found for url: http://www.icla.up.ac.za/images/un/hrc/UNIB-_Oral_update_HRC_31__22_March_2016_-_Eng.pdf


Processing URLs:   6%|▌         | 57/1000 [03:53<49:32,  3.15s/it]

Error extracting text from http://www.newsweek.com/donald-trump-approval-rating-plunged-republicans-flee-president-amid-russia-612979: 403 Client Error: Forbidden for url: https://www.newsweek.com/donald-trump-approval-rating-plunged-republicans-flee-president-amid-russia-612979


Processing URLs:   6%|▋         | 63/1000 [03:57<15:36,  1.00it/s]

Error extracting text from http://www.reuters.com/article/global-oil-idusl3n13w06m20151207: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/global-oil-idusl3n13w06m20151207


Processing URLs:   7%|▋         | 66/1000 [04:06<37:15,  2.39s/it]

Error extracting text from https://www.reuters.com/technology/stablecoins-face-same-safeguards-traditional-payments-2021-10-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/technology/stablecoins-face-same-safeguards-traditional-payments-2021-10-06/


Processing URLs:   7%|▋         | 69/1000 [04:10<29:06,  1.88s/it]

Error extracting text from http://bigstory.ap.org/article/0844d0a242f248b69b8f65ee588f13dc/iraqi-commander-about-2500-militants-killed-fallujah: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/0844d0a242f248b69b8f65ee588f13dc/iraqi-commander-about-2500-militants-killed-fallujah (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301fc5250>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   7%|▋         | 72/1000 [04:12<17:59,  1.16s/it]

Error extracting text from https://uk.news.yahoo.com/wikileaks-says-imf-believes-greek-default-could-coincide-232116028.html: 404 Client Error: Not Found for url: https://uk.news.yahoo.com/wikileaks-says-imf-believes-greek-default-could-coincide-232116028.html


Processing URLs:   8%|▊         | 75/1000 [04:15<13:38,  1.13it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/some-bondholders-have-received-late-pdvsa-bond-payment-sources-idUSKBN1D15Y1?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/some-bondholders-have-received-late-pdvsa-bond-payment-sources-idUSKBN1D15Y1?il=0
Error extracting text from http://globalnation.inquirer.net/153721/ph-closely-monitoring-chinese-activities-scarborough-shoal: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/153721/ph-closely-monitoring-chinese-activities-scarborough-shoal


Processing URLs:   8%|▊         | 76/1000 [04:16<13:28,  1.14it/s]

Error extracting text from https://coronavirus.upenn.edu/announcement/message-penn-community-0: 404 Client Error: Not Found for url: https://wellness.upenn.edu/announcement/message-penn-community-0


Processing URLs:   8%|▊         | 79/1000 [04:23<26:03,  1.70s/it]

Error extracting text from http://www.samaa.tv/pakistan/2016/05/heroic-welcome-as-ali-haider-gilani-returns-to-hometown/: 403 Client Error: Forbidden for url: https://www.samaa.tv/pakistan/2016/05/heroic-welcome-as-ali-haider-gilani-returns-to-hometown/


Processing URLs:   8%|▊         | 82/1000 [04:29<23:44,  1.55s/it]

Error extracting text from http://news.yahoo.com/japan-sees-drop-core-inflation-boj-mulls-more-003432561.html: 404 Client Error: Not Found for url: http://news.yahoo.com/japan-sees-drop-core-inflation-boj-mulls-more-003432561.html


Processing URLs:   8%|▊         | 85/1000 [04:37<36:58,  2.42s/it]

Error extracting text from http://www.theweek.co.uk/scottish-independence/55716/scotlands-budget-deficit-is-highest-in-eu: 404 Client Error: Not Found for url: https://theweek.com/scottish-independence/55716/scotlands-budget-deficit-is-highest-in-eu


Processing URLs:   9%|▊         | 86/1000 [04:39<36:12,  2.38s/it]

URL filtered: http://mobile.reuters.com/article/idUSKBN1560SX?utm_campaign=trueAnthem:+Trending+Content&amp;utm_content=5884f45204d3015016468abf&amp;utm_medium=trueAnthem&amp;utm_source=twitter
Error extracting text from http://wj.parliament.af/: HTTPConnectionPool(host='wj.parliament.af', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feeb7e90>: Failed to resolve 'wj.parliament.af' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 89/1000 [05:10<1:43:40,  6.83s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/19076-nld-blasts-u-ye-htut-s-comments-on-presidency-bid.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/19076-nld-blasts-u-ye-htut-s-comments-on-presidency-bid.html
URL filtered: https://www.youtube.com/watch?v=0NAvTEQ9eXQ


Processing URLs:   9%|▉         | 92/1000 [05:14<1:04:05,  4.24s/it]

Error extracting text from https://goo.gl/KIQD1x: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /the-links-between-food-crises-and-violence-in-east-south-and-west-africa-an-acled-briefing-note/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffb5a210>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 93/1000 [05:45<2:34:30, 10.22s/it]

Error extracting text from http://news.statetimes.in/div-com-reviews-progress-of-hydroelectric-power-projects/: 522 Server Error:  for url: http://news.statetimes.in/div-com-reviews-progress-of-hydroelectric-power-projects/


Processing URLs:   9%|▉         | 94/1000 [05:48<2:06:18,  8.37s/it]

Error extracting text from http://blogs.marketwatch.com/thetell/2016/08/03/tesla-earnings-arrive-amid-cash-burn-solarcity-concerns-live-blog/: 404 Client Error: Not Found for url: https://www.marketwatch.com/story/tesla-earnings-arrive-amid-cash-burn-solarcity-concerns-live-blog-2016-08-03


Processing URLs:  10%|▉         | 97/1000 [05:54<1:04:59,  4.32s/it]

Error extracting text from https://www.nytimes.com/2020/10/05/health/trump-covid-public-health.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/10/05/health/trump-covid-public-health.html


Processing URLs:  10%|█         | 100/1000 [06:00<40:14,  2.68s/it]

Error extracting text from http://www.ibtimes.com/ban-ki-moon-president-south-korean-lawmakers-want-rope-un-chief-new-party-2465758: 403 Client Error: Forbidden for url: https://www.ibtimes.com/ban-ki-moon-president-south-korean-lawmakers-want-rope-un-chief-new-party-2465758


Processing URLs:  10%|█         | 105/1000 [06:04<15:33,  1.04s/it]

Error extracting text from http://thenationonlineng.net/fulani-herdsmen-kill-15-tiv-farmers/: 403 Client Error: Forbidden for url: https://thenationonlineng.net/fulani-herdsmen-kill-15-tiv-farmers/


Processing URLs:  11%|█         | 107/1000 [06:06<13:45,  1.08it/s]

Error extracting text from https://static.googleusercontent.com/media/www.google.com/en//selfdrivingcar/files/reports/report-0816.pdf: 404 Client Error: Not Found for url: https://static.googleusercontent.com/media/www.google.com/en//selfdrivingcar/files/reports/report-0816.pdf


Processing URLs:  11%|█         | 108/1000 [06:07<12:09,  1.22it/s]

Error extracting text from http://thehill.com/policy/energy-environment/329522-exxon-seeks-waiver-from-russia-sanctions: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/329522-exxon-seeks-waiver-from-russia-sanctions/


Processing URLs:  11%|█         | 110/1000 [06:12<27:38,  1.86s/it]

Error extracting text from http://www.projectblue.org/organizations/: 404 Client Error: Not Found for url: https://www.boldlygo.org/project-blue


Processing URLs:  11%|█         | 112/1000 [06:14<19:42,  1.33s/it]

Error extracting text from https://johnib.wordpress.com/tag/hongxiang-industrial/: 410 Client Error: Gone for url: https://johnib.wordpress.com/tag/hongxiang-industrial/


Processing URLs:  11%|█▏        | 114/1000 [06:15<15:19,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0YM19S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0YM19S


Processing URLs:  12%|█▏        | 121/1000 [06:28<19:24,  1.32s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/321930-sessions-spoke-with-russian-ambassador-during-trumps-campaign: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/321930-sessions-spoke-with-russian-ambassador-during-trumps-campaign/
URL filtered: https://www.youtube.com/watch?v=TNiWnSOsAnE


Processing URLs:  12%|█▏        | 123/1000 [06:28<12:11,  1.20it/s]

Error extracting text from http://factor-tech.com/feature/jimmy-wales-all-major-internet-traffic-is-going-to-be-encrypted-very-very-soon: 404 Client Error: Site not found for url: http://factor-tech.com/feature/jimmy-wales-all-major-internet-traffic-is-going-to-be-encrypted-very-very-soon


Processing URLs:  12%|█▎        | 125/1000 [06:31<14:13,  1.02it/s]

Error extracting text from http://time.com/4714686/dakota-access-pipeline-oil-missouri-river/: 404 Client Error: Not Found for url: https://time.com/4714686/dakota-access-pipeline-oil-missouri-river/
Error extracting text from http://www.reuters.com/article/us-brazil-corruption-idUSKCN0W52E2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-idUSKCN0W52E2


Processing URLs:  13%|█▎        | 128/1000 [06:33<11:27,  1.27it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/governor/nc/north_carolina_governor_mccrory_vs_cooper-4096.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/governor/nc/north_carolina_governor_mccrory_vs_cooper-4096.html


Processing URLs:  13%|█▎        | 130/1000 [06:40<29:21,  2.02s/it]

Error extracting text from http://www.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=&amp;reference=2015/0270(COD): 404 Client Error: Not Found for url: https://oeil.secure.europarl.europa.eu/oeil/popups/ficheprocedure.do?lang=&amp;reference=2015/0270(COD)


Processing URLs:  14%|█▍        | 139/1000 [07:02<23:04,  1.61s/it]

Error extracting text from http://www.wsj.com/articles/u-s-warns-five-economic-powers-over-policies-1461960876: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-warns-five-economic-powers-over-policies-1461960876
Error extracting text from http://www.reuters.com/article/us-britain-eu-juncker-idUSKBN16P0PF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-juncker-idUSKBN16P0PF?il=0


Processing URLs:  14%|█▍        | 140/1000 [07:02<17:11,  1.20s/it]

Error extracting text from http://canadafreepress.com/article/israels-engagement-in-syria-causes-and-significance: 403 Client Error: Forbidden for url: https://canadafreepress.com/article/israels-engagement-in-syria-causes-and-significance


Processing URLs:  14%|█▍        | 144/1000 [07:07<17:13,  1.21s/it]

Error extracting text from https://electionlawblog.org/?p=92675: 403 Client Error: Forbidden for url: https://electionlawblog.org/?p=92675


Processing URLs:  15%|█▍        | 147/1000 [07:10<16:35,  1.17s/it]

Error extracting text from http://www.cfr.org/asia-and-pacific/asean-association-southeast-asian-nations/p18616: 404 Client Error: Not Found for url: https://www.cfr.org/asia-and-pacific/asean-association-southeast-asian-nations/p18616


Processing URLs:  15%|█▍        | 148/1000 [07:12<20:01,  1.41s/it]

URL filtered: https://www.youtube.com/watch?v=CUhflgWvvoo


Processing URLs:  15%|█▌        | 151/1000 [07:16<18:01,  1.27s/it]

Error extracting text from https://www.avalara.com/blog/2016/12/01/2017-sales-tax-changes/: 404 Client Error: Not Found for url: https://www.avalara.com/blog/2016/12/01/2017-sales-tax-changes/


Processing URLs:  15%|█▌        | 152/1000 [07:17<19:19,  1.37s/it]

Error extracting text from http://www.gallup.com/opinion/polling-matters/191264/cruz-image-plummets-trump-improves-among-republicans.aspx?g_source=Opinion&amp;g_medium=lead&amp;g_campaign=tiles: 404 Client Error: Not Found for url: https://www.gallup.com/opinion/polling-matters/191264/cruz-image-plummets-trump-improves-among-republicans.aspx?g_source=Opinion&amp;g_medium=lead&amp;g_campaign=tiles


Processing URLs:  16%|█▌        | 159/1000 [07:29<16:47,  1.20s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-russia-oil-specialreport-idUSKBN1AR14U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-russia-oil-specialreport-idUSKBN1AR14U


Processing URLs:  16%|█▌        | 161/1000 [07:31<15:39,  1.12s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-foreign-journalists-helped-cover-mass-graves-government-official-claims-1547283: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-foreign-journalists-helped-cover-mass-graves-government-official-claims-1547283


Processing URLs:  16%|█▋        | 164/1000 [07:33<10:34,  1.32it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-economy-exclusive-idUSKCN0VE1AT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-exclusive-idUSKCN0VE1AT


Processing URLs:  17%|█▋        | 172/1000 [07:43<14:02,  1.02s/it]

Error extracting text from https://www.npd.com/wps/portal/npd/us/industry-expertise/books/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /wps/portal/npd/us/industry-expertise/books/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))


Processing URLs:  18%|█▊        | 175/1000 [07:50<24:16,  1.77s/it]

Error extracting text from http://www.kyivpost.com/article/opinion/op-ed/timothy-ash-very-disappointing-if-imf-delays-loan-to-ukraine-419218.html: 403 Client Error: Forbidden for url: https://www.kyivpost.com/article/opinion/op-ed/timothy-ash-very-disappointing-if-imf-delays-loan-to-ukraine-419218.html


Processing URLs:  18%|█▊        | 178/1000 [07:53<14:25,  1.05s/it]

Error extracting text from https://trends.google.com/trends/explore?geo=US&amp;q=amazon%20union: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?geo=US&amp;q=amazon%20union
Error extracting text from https://www.wsj.com/articles/saudi-arabias-oil-supremacy-falters-1490088604: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-arabias-oil-supremacy-falters-1490088604


Processing URLs:  18%|█▊        | 180/1000 [07:55<13:49,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN15N04N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN15N04N


Processing URLs:  18%|█▊        | 182/1000 [07:57<12:16,  1.11it/s]

Error extracting text from http://thenationonlineng.net/fulani-herdsmen-kill-five-tiv-farmers/: 403 Client Error: Forbidden for url: https://thenationonlineng.net/fulani-herdsmen-kill-five-tiv-farmers/


Processing URLs:  19%|█▊        | 186/1000 [08:05<22:28,  1.66s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-OHKSRG6S972I01-0RIFIFIDDI0010SR7KO2NUUDD8


Processing URLs:  20%|█▉        | 196/1000 [08:38<56:26,  4.21s/it]

Error extracting text from http://www.fxempire.com/news/commodities-news/saudis-unlikely-to-budge-at-next-weeks-opec-meeting-308649: 403 Client Error: Forbidden for url: https://www.fxempire.com/news/commodities-news/saudis-unlikely-to-budge-at-next-weeks-opec-meeting-308649


Processing URLs:  20%|█▉        | 197/1000 [09:39<4:39:47, 20.91s/it]

Error extracting text from http://mockingbird.creighton.edu/english/fajardo/teaching/srp435/junkbond.htm: HTTPConnectionPool(host='mockingbird.creighton.edu', port=80): Max retries exceeded with url: /english/fajardo/teaching/srp435/junkbond.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2feaf2e40>, 'Connection to mockingbird.creighton.edu timed out. (connect timeout=60)'))


Processing URLs:  20%|█▉        | 198/1000 [09:47<3:51:09, 17.29s/it]

Error extracting text from https://www.washingtonpost.com/politics/what-the-worlds-central-banks-are-doing-at-a-glance/2015/12/16/09d43ca2-a430-11e5-8318-bd8caed8c588_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/what-the-worlds-central-banks-are-doing-at-a-glance/2015/12/16/09d43ca2-a430-11e5-8318-bd8caed8c588_story.html


Processing URLs:  20%|██        | 202/1000 [09:57<1:24:11,  6.33s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-16/famine-victims-in-northeast-nigeria-forecast-to-double-in-2017


Processing URLs:  20%|██        | 205/1000 [09:59<38:32,  2.91s/it]

Error extracting text from http://www.nytimes.com/2015/10/29/us/politics/house-vote-moves-budget-deal-closer-to-approval.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/29/us/politics/house-vote-moves-budget-deal-closer-to-approval.html


Processing URLs:  21%|██        | 208/1000 [10:05<31:27,  2.38s/it]

Error extracting text from https://www.unicef.org/media/91626/file/Yemen%20Humanitarian%20Situation%20Report%201%20-%2030%20Nov%202020.pdf: 403 Client Error: Forbidden for url: https://www.unicef.org/media/91626/file/Yemen%20Humanitarian%20Situation%20Report%201%20-%2030%20Nov%202020.pdf
URL filtered: https://www.bloomberg.com/news/articles/2017-03-30/venezuela-s-supreme-court-takes-over-national-assembly-duties


Processing URLs:  22%|██▏       | 215/1000 [10:10<11:14,  1.16it/s]

Error extracting text from http://www.sanders.senate.gov/newsroom/press-releases/sanders-calls-for-common-sense-gun-safety-legislation: 403 Client Error: Forbidden for url: http://www.sanders.senate.gov/newsroom/press-releases/sanders-calls-for-common-sense-gun-safety-legislation


Processing URLs:  22%|██▏       | 217/1000 [10:11<09:57,  1.31it/s]

Error extracting text from https://www.flightglobal.com/strategy/paris-boom-xb-1-schedule-slips-while-jal-eyes-overture/133222.article: 403 Client Error: Forbidden for url: https://www.flightglobal.com/strategy/paris-boom-xb-1-schedule-slips-while-jal-eyes-overture/133222.article


Processing URLs:  22%|██▏       | 218/1000 [10:12<11:24,  1.14it/s]

Error extracting text from http://www.reuters.com/article/2015/10/13/us-mideast-crisis-syria-aleppo-exclusive-idUSKCN0S72U020151013: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/13/us-mideast-crisis-syria-aleppo-exclusive-idUSKCN0S72U020151013


Processing URLs:  22%|██▏       | 222/1000 [10:16<11:13,  1.15it/s]

Error extracting text from https://www.fox5ny.com/news/scientists-seek-covid-19-origin-nearly-2-years-into-pandemic: 403 Client Error: Forbidden for url: https://www.fox5ny.com/news/scientists-seek-covid-19-origin-nearly-2-years-into-pandemic


Processing URLs:  22%|██▎       | 225/1000 [10:21<18:28,  1.43s/it]

URL filtered: https://www.bloombergquint.com/politics/trump-trespasses-on-fed-independence-blasting-powell-rate-hikes#gs.ahRRz9g


Processing URLs:  23%|██▎       | 229/1000 [10:24<12:13,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/pentagon-cia-chiefs-dont-think-russia-will-abide-by-syria-ceasefire-officials-say-1456235932: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pentagon-cia-chiefs-dont-think-russia-will-abide-by-syria-ceasefire-officials-say-1456235932


Processing URLs:  23%|██▎       | 230/1000 [10:25<10:29,  1.22it/s]

Error extracting text from http://www.wsj.com/articles/fed-shouldnt-be-locked-into-gradual-approach-to-rate-hikes-say-bullard-and-lacker-1447355585: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-shouldnt-be-locked-into-gradual-approach-to-rate-hikes-say-bullard-and-lacker-1447355585


Processing URLs:  23%|██▎       | 231/1000 [10:27<17:42,  1.38s/it]

Error extracting text from http://elcooperante.com/jorge-rodriguez-la-cev-se-comporta-como-partido-politico-y-han-boicoteado-el-dialogo/: 403 Client Error: Forbidden for url: http://elcooperante.com/jorge-rodriguez-la-cev-se-comporta-como-partido-politico-y-han-boicoteado-el-dialogo/
URL filtered: https://twitter.com/borisjohnson/status/1473001328695783428?s=21


Processing URLs:  24%|██▍       | 239/1000 [10:35<12:28,  1.02it/s]

Error extracting text from http://www.koogle.tv/media/news/north-korea-fires-missile-from-submarine/: 404 Client Error: Not Found for url: http://www.koogle.tv/media/news/north-korea-fires-missile-from-submarine/


Processing URLs:  24%|██▍       | 240/1000 [10:37<13:43,  1.08s/it]

URL filtered: https://www.bloomberg.com/graphics/2020-united-states-coronavirus-outbreak/


Processing URLs:  25%|██▍       | 247/1000 [10:46<18:30,  1.48s/it]

Error extracting text from http://www.wsj.com/articles/feds-yellen-december-is-live-possibility-for-first-rate-increase-1446654282?mod=pls_whats_news_us_business_f&amp;alg=y&amp;mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-yellen-december-is-live-possibility-for-first-rate-increase-1446654282?mod=pls_whats_news_us_business_f&amp;alg=y&amp;mg=id-wsj


Processing URLs:  25%|██▌       | 254/1000 [11:37<55:37,  4.47s/it]

Error extracting text from http://www.worldoil.com/news/2016/12/7/shell-returns-to-iran-with-deal-to-assess-oil-and-gas-fields: 500 Server Error: Internal Server Error for url: https://worldoil.com/news/2016/12/7/shell-returns-to-iran-with-deal-to-assess-oil-and-gas-fields


Processing URLs:  26%|██▌       | 256/1000 [11:38<31:38,  2.55s/it]

Error extracting text from http://seekingalpha.com/news/2858056-one-step-closer-to-ex-im-bank-revival: 403 Client Error: Forbidden for url: https://seekingalpha.com/news/2858056-one-step-closer-to-ex-im-bank-revival


Processing URLs:  26%|██▌       | 258/1000 [11:39<19:00,  1.54s/it]

Error extracting text from https://au.news.yahoo.com/world/a/31642317/eu-raises-pressure-on-poland-in-rule-of-law-row/: 404 Client Error: Not Found for url: https://au.news.yahoo.com/eu-raises-pressure-on-poland-in-rule-of-law-row-31642317.html


Processing URLs:  26%|██▌       | 261/1000 [11:42<13:28,  1.09s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/north-koreas-ballistic-missile-submarine-major-threat-or-23308: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/north-koreas-ballistic-missile-submarine-major-threat-or-23308


Processing URLs:  26%|██▋       | 263/1000 [11:43<07:55,  1.55it/s]

Error extracting text from http://www.nationmultimedia.com/business/UK-provides-$300m-loan-to-boost-exports-to-Myanmar-30280009.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/business/UK-provides-$300m-loan-to-boost-exports-to-Myanmar-30280009.html


Processing URLs:  26%|██▋       | 265/1000 [11:45<09:44,  1.26it/s]

Error extracting text from http://www.arirang.co.kr/News/News_View.asp?nseq=183912: 404 Client Error:  for url: http://www.arirang.co.kr/News/News_View.asp?nseq=183912


Processing URLs:  27%|██▋       | 266/1000 [11:46<11:08,  1.10it/s]

Error extracting text from http://www.who.int/csr/disease/swineflu/phase/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/disease/swineflu/phase/en/


Processing URLs:  27%|██▋       | 269/1000 [11:52<17:47,  1.46s/it]

Error extracting text from https://news.google.com/newspapers?nid=888&amp;dat=19770110&amp;id=2bhaAAAAIBAJ&amp;sjid=kF0DAAAAIBAJ&amp;pg=6972,1653635: 404 Client Error: Not Found for url: https://news.google.com/newspapers?nid=888&amp;dat=19770110&amp;id=2bhaAAAAIBAJ&amp;sjid=kF0DAAAAIBAJ&amp;pg=6972,1653635


Processing URLs:  28%|██▊       | 277/1000 [13:13<4:10:27, 20.79s/it]

Error extracting text from http://www.williamhillplc.com/media/newsroom/media-releases/2016/kiwi-clark-clear-favourite-for-un-secretary-general/: HTTPSConnectionPool(host='corporate.888.com', port=443): Max retries exceeded with url: /media/newsroom/media-releases/2016/kiwi-clark-clear-favourite-for-un-secretary-general/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x300db2240>, 'Connection to corporate.888.com timed out. (connect timeout=60)'))


Processing URLs:  28%|██▊       | 278/1000 [13:13<2:56:06, 14.64s/it]

Error extracting text from https://news.artnet.com/art-world/restitution-benin-bronzes-humboldt-forum-1953883: 403 Client Error: Forbidden for url: https://news.artnet.com/art-world/restitution-benin-bronzes-humboldt-forum-1953883


Processing URLs:  29%|██▊       | 286/1000 [13:25<22:20,  1.88s/it]

Error extracting text from https://www.reuters.com/article/uk-britain-eu-idUSKCN1SY0HJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-eu-idUSKCN1SY0HJ


Processing URLs:  29%|██▊       | 287/1000 [13:27<22:54,  1.93s/it]

Error extracting text from http://www.hydroworld.com/articles/2017/03/indus-water-commission-plans-to-meet-in-pakistan-as-india-resumes-work-on-shuttered-hydropower-project.html: 403 Client Error: Forbidden for url: https://www.hydroreview.com/


Processing URLs:  29%|██▉       | 294/1000 [13:40<20:03,  1.70s/it]

Error extracting text from https://amti.csis.org/the-challenges-facing-philippines-china-joint-development-in-the-south-china-sea/: 403 Client Error: Forbidden for url: https://amti.csis.org/the-challenges-facing-philippines-china-joint-development-in-the-south-china-sea/


Processing URLs:  30%|██▉       | 295/1000 [13:41<18:54,  1.61s/it]

Error extracting text from http://www.dailyreckoning.com.au/the-crash-you-can-see-coming/2016/01/13/: 403 Client Error: Forbidden for url: https://daily.fattail.com.au/


Processing URLs:  30%|██▉       | 296/1000 [13:42<16:55,  1.44s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/suu-kyi-loyalist-confirmed-myanmar-presidential-race-37568844: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/suu-kyi-loyalist-confirmed-myanmar-presidential-race-37568844


Processing URLs:  30%|███       | 300/1000 [13:51<21:49,  1.87s/it]

URL filtered: https://www.youtube.com/watch?v=XKDsZqgpgcw


Processing URLs:  30%|███       | 303/1000 [13:55<19:26,  1.67s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-27/rift-at-venezuela-broker-dealer-torino-leads-to-founder-s-exodus


Processing URLs:  30%|███       | 305/1000 [13:57<16:43,  1.44s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-sanctions-idUKKBN1AH5FS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  31%|███       | 307/1000 [14:02<20:24,  1.77s/it]

Error extracting text from http://election.princeton.edu/2016/05/05/looking-back-on-the-primaries-did-data-journalism-really-lose/: HTTPSConnectionPool(host='election.princeton.edu2016', port=443): Max retries exceeded with url: /05/05/looking-back-on-the-primaries-did-data-journalism-really-lose/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2feaf0650>: Failed to resolve 'election.princeton.edu2016' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/news/articles/2017-03-27/canada-pot-stocks-surge-after-report-of-legalization-date


Processing URLs:  31%|███       | 310/1000 [14:03<12:11,  1.06s/it]

Error extracting text from http://www.reuters.com/article/britain-eu-sterling-idUSL8N14Z1V0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-eu-sterling-idUSL8N14Z1V0


Processing URLs:  31%|███       | 311/1000 [14:13<37:14,  3.24s/it]

URL filtered: https://www.youtube.com/watch?v=l9JM_uSG5Aw


Processing URLs:  31%|███▏      | 313/1000 [14:24<46:26,  4.06s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/keiko-fujimori-looks-likely-to-1st-round-of-peru-election/2016/04/10/42c69ff4-ff85-11e5-8bb1-f124a43f84dc_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/keiko-fujimori-looks-likely-to-1st-round-of-peru-election/2016/04/10/42c69ff4-ff85-11e5-8bb1-f124a43f84dc_story.html


Processing URLs:  31%|███▏      | 314/1000 [14:24<37:38,  3.29s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/08/09/world/crucial-battle-aleppo-poised-commence/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/08/09/world/crucial-battle-aleppo-poised-commence/


Processing URLs:  32%|███▏      | 319/1000 [15:30<3:32:14, 18.70s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-08-11/nigerian-military-makes-unauthorized-search-of-un-base: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  32%|███▏      | 321/1000 [15:33<1:54:58, 10.16s/it]

Error extracting text from https://www.justsecurity.org/76641/beyond-the-coup-can-the-united-nations-escape-its-history-in-myanmar/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/76641/beyond-the-coup-can-the-united-nations-escape-its-history-in-myanmar/


Processing URLs:  33%|███▎      | 327/1000 [15:40<21:54,  1.95s/it]  

Error extracting text from http://fox6now.com/2016/03/06/two-syrians-sentenced-in-death-of-syrian-boy-alan-kurdi/: 403 Client Error: Forbidden for url: http://fox6now.com/2016/03/06/two-syrians-sentenced-in-death-of-syrian-boy-alan-kurdi/


Processing URLs:  33%|███▎      | 328/1000 [15:40<16:11,  1.45s/it]

Error extracting text from http://www.balkans.com/open-news.php?uniquenumber=208088: 404 Client Error: Not Found for url: http://www.balkans.com/open-news.php?uniquenumber=208088


Processing URLs:  33%|███▎      | 329/1000 [15:45<27:57,  2.50s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-30/u-k-eu-struggle-for-irish-compromise-four-days-from-deadline


Processing URLs:  34%|███▎      | 335/1000 [15:54<16:58,  1.53s/it]

Error extracting text from http://aranews.net/2015/11/isis-executes-73-of-its-own-militants-for-evacuating-headquarters-in-iraqs-yezidi-region/: 404 Client Error: Not Found for url: http://aranews.net/2015/11/isis-executes-73-of-its-own-militants-for-evacuating-headquarters-in-iraqs-yezidi-region/


Processing URLs:  34%|███▎      | 337/1000 [15:56<14:30,  1.31s/it]



Processing URLs:  34%|███▍      | 343/1000 [16:04<10:33,  1.04it/s]

Error extracting text from http://gcaptain.com/panama-canal-crack-fix-wont-delay-expansion-opening-contractor-says/: 403 Client Error: Forbidden for url: http://gcaptain.com/panama-canal-crack-fix-wont-delay-expansion-opening-contractor-says/
Error extracting text from http://agencias.abc.es/agencias/noticia.asp?noticia=2068172: 403 Client Error: Forbidden for url: http://agencias.abc.es/agencias/noticia.asp?noticia=2068172


Processing URLs:  34%|███▍      | 344/1000 [16:04<07:53,  1.39it/s]

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0WJ34B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0WJ34B


Processing URLs:  34%|███▍      | 345/1000 [16:07<13:29,  1.24s/it]

Error extracting text from http://www.newsweek.com/kremlin-says-putin-and-trump-will-meet-july-anyway-556052?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/kremlin-says-putin-and-trump-will-meet-july-anyway-556052?rx=us


Processing URLs:  35%|███▍      | 346/1000 [16:07<10:15,  1.06it/s]

Error extracting text from https://www.nytimes.com/2021/02/13/world/americas/venezuela-juan-guaido.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/13/world/americas/venezuela-juan-guaido.html


Processing URLs:  35%|███▍      | 348/1000 [16:13<22:58,  2.11s/it]

Error extracting text from http://www.ethnologue.com/statistics/status: 404 Client Error: Not Found for url: https://www.ethnologue.com/statistics/status


Processing URLs:  36%|███▌      | 355/1000 [16:26<17:56,  1.67s/it]

Error extracting text from https://www.wsj.com/articles/south-korean-president-park-geun-hyes-ouster-to-trigger-shift-on-u-s-policy-1489167265: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korean-president-park-geun-hyes-ouster-to-trigger-shift-on-u-s-policy-1489167265


Processing URLs:  36%|███▌      | 358/1000 [16:30<16:22,  1.53s/it]

Error extracting text from https://www.faa.gov/news/media/attachments/SFA_Supersonic_Final_Rule.pdf: 404 Client Error: Not Found for url: https://www.faa.gov/news/media/attachments/SFA_Supersonic_Final_Rule.pdf


Processing URLs:  36%|███▌      | 361/1000 [16:35<15:58,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-eu-amazon-com-antitrust-idUSKBN158145: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-amazon-com-antitrust-idUSKBN158145
Error extracting text from https://www.reuters.com/article/us-venezuela-economy/venezuela-allows-1-7-billion-gold-swap-with-deutsche-to-lapse-legislator-idUSKBN1CS2CD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy/venezuela-allows-1-7-billion-gold-swap-with-deutsche-to-lapse-legislator-idUSKBN1CS2CD


Processing URLs:  36%|███▋      | 365/1000 [16:38<09:18,  1.14it/s]

URL filtered: https://www.bloomberg.com/politics/articles/2017-05-13/merkel-party-seeks-pre-election-boost-in-challenger-s-homeland


Processing URLs:  37%|███▋      | 367/1000 [16:39<06:43,  1.57it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-mosul-exclusive-idUSKCN12314Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-mosul-exclusive-idUSKCN12314Z


Processing URLs:  37%|███▋      | 371/1000 [16:43<10:16,  1.02it/s]

URL filtered: http://gizmodo.com/facebook-finally-rolls-out-disputed-news-tag-everyone-w-1792959827


Processing URLs:  37%|███▋      | 374/1000 [19:46<5:44:05, 32.98s/it]

Error extracting text from http://www.sacbee.com/news/politics-government/article184786603.html: HTTPConnectionPool(host='www.sacbee.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  38%|███▊      | 377/1000 [19:50<2:30:01, 14.45s/it]

Error extracting text from http://www.state.gov/secretary/remarks/2016/02/253164.htm: 404 Client Error: Not Found for url: https://www.state.gov/remarks-secretary-pompeo/


Processing URLs:  38%|███▊      | 378/1000 [19:51<1:49:51, 10.60s/it]

Error extracting text from http://thehill.com/opinion/white-house/351869-have-no-doubt-president-trump-will-wind-up-firing-robert-mueller: 403 Client Error: Forbidden for url: https://thehill.com/opinion/white-house/351869-have-no-doubt-president-trump-will-wind-up-firing-robert-mueller/


Processing URLs:  38%|███▊      | 380/1000 [19:56<1:08:14,  6.60s/it]

Error extracting text from http://russia-insider.com/en/politics/busted-us-sent-isis-15-billion-weapons-europe-ukraine/ri18298: 503 Server Error: Service Unavailable for url: https://russia-insider.com/en/politics/busted-us-sent-isis-15-billion-weapons-europe-ukraine/ri18298


Processing URLs:  38%|███▊      | 381/1000 [19:57<52:38,  5.10s/it]  

URL filtered: https://www.weforum.org/agenda/2016/03/what-does-success-at-work-really-mean?utm_content=bufferc1e0b&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  39%|███▉      | 392/1000 [20:26<26:09,  2.58s/it]

URL filtered: http://www.politico.eu/article/un-report-assad-again-used-chemical-weapons-defying-obama/?utm_content=bufferde55f&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer
Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-china-idUSKBN1450VE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-china-idUSKBN1450VE


Processing URLs:  40%|███▉      | 396/1000 [20:33<19:21,  1.92s/it]

Error extracting text from http://www.presstv.com/Detail/2016/02/01/448226/Iran-frozen-assets-sanctions-Nobakht/%5D: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2016/02/01/448226/Iran-frozen-assets-sanctions-Nobakht/%5D


Processing URLs:  40%|███▉      | 399/1000 [20:36<13:20,  1.33s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-23/un-climate-chief-seeks-out-tillerson-to-keep-paris-deal-alive
URL filtered: http://www.bloomberg.com/news/articles/2015-10-30/venezuela-s-president-maduro-says-he-won-t-hand-over-revolution


Processing URLs:  41%|████      | 411/1000 [22:04<3:18:04, 20.18s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2021-03-01/china-said-to-speed-up-move-to-more-survivable-nuclear-force: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  42%|████▏     | 417/1000 [22:25<47:33,  4.90s/it]  

URL filtered: https://www.facebook.com/SenatorMarcoRubio/posts/1542347642457261


Processing URLs:  42%|████▏     | 421/1000 [22:26<16:37,  1.72s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/commentary/ct-perspec-moore-roy-alabama-senate-1207-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/commentary/ct-perspec-moore-roy-alabama-senate-1207-story.html
URL filtered: https://www.bloomberg.com/gadfly/articles/2017-04-23/u-s-shale-s-the-wild-horse-that-opec-just-can-t-tame
Error extracting text from http://www.reuters.com/article/us-russia-budget-deficit-idUSKCN0X90TN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-budget-deficit-idUSKCN0X90TN


Processing URLs:  42%|████▏     | 422/1000 [22:27<16:02,  1.67s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-civil-society-rejects-government-claims-violence-not-ethnically-motivated-1569055: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-civil-society-rejects-government-claims-violence-not-ethnically-motivated-1569055


Processing URLs:  42%|████▏     | 424/1000 [22:28<10:03,  1.05s/it]

Error extracting text from http://www.reuters.com/article/us-usa-missile-defense-hawaii-idUSKCN0V0008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-missile-defense-hawaii-idUSKCN0V0008
Error extracting text from https://www.reuters.com/article/us-iran-missiles-factory-idUSKBN18L0Z9?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-factory-idUSKBN18L0Z9?il=0


Processing URLs:  43%|████▎     | 426/1000 [22:28<07:00,  1.37it/s]

Error extracting text from http://www.chicagotribune.com/news/opinion/editorials/ct-democratic-debate-clinton-sanders-edit-1014-20151013-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/editorials/ct-democratic-debate-clinton-sanders-edit-1014-20151013-story.html


Processing URLs:  43%|████▎     | 430/1000 [23:39<2:07:17, 13.40s/it]



Processing URLs:  43%|████▎     | 433/1000 [23:44<1:03:24,  6.71s/it]

Error extracting text from http://www.wsj.com/articles/eu-faces-difficult-choice-as-poland-fires-latest-salvo-in-rule-of-law-dispute-1477665662: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-faces-difficult-choice-as-poland-fires-latest-salvo-in-rule-of-law-dispute-1477665662


Processing URLs:  44%|████▎     | 436/1000 [23:47<31:11,  3.32s/it]  

Error extracting text from https://www.reuters.com/world/middle-east/oil-rises-further-hopes-tighter-supply-opec-talks-abandoned-2021-07-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/oil-rises-further-hopes-tighter-supply-opec-talks-abandoned-2021-07-06/


Processing URLs:  44%|████▍     | 438/1000 [23:48<19:42,  2.10s/it]

Error extracting text from http://governors.rutgers.edu/on-governors/us-governors/when-governors-seek-re-election/: 404 Client Error: Not Found for url: https://governors.rutgers.edu/on-governors/us-governors/when-governors-seek-re-election/


Processing URLs:  44%|████▍     | 440/1000 [23:53<19:20,  2.07s/it]

Error extracting text from https://www.reuters.com/world/myanmar-will-not-address-world-leaders-un-afghanistan-will-2021-09-24/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/myanmar-will-not-address-world-leaders-un-afghanistan-will-2021-09-24/


Processing URLs:  44%|████▍     | 442/1000 [23:53<12:11,  1.31s/it]

Error extracting text from http://focustaiwan.tw/news/acs/201508310021.aspx: 403 Client Error: Forbidden for url: https://focustaiwan.tw:443/news/acs/201508310021.aspx


Processing URLs:  44%|████▍     | 445/1000 [23:55<08:05,  1.14it/s]

Error extracting text from http://www.nytimes.com/2016/08/02/world/africa/nigeria-army-shiites-zaria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/02/world/africa/nigeria-army-shiites-zaria.html


Processing URLs:  45%|████▍     | 446/1000 [23:56<07:07,  1.30it/s]

Error extracting text from http://www.statesman.com/news/sign-probes-into-russia-trump-campaign-will-die-down/bzix6oZhal03mR8xXdCvHK/: 404 Client Error: OK for url: https://www.statesman.com/news/sign-probes-into-russia-trump-campaign-will-die-down/bzix6oZhal03mR8xXdCvHK/


Processing URLs:  45%|████▍     | 447/1000 [23:59<14:40,  1.59s/it]

Error extracting text from http://www.iihl.org/wp-content/uploads/2015/12/ROE-HANDBOOK-ENGLISH.pdf: 404 Client Error: Not Found for url: https://iihl.org/wp-content/uploads/2015/12/ROE-HANDBOOK-ENGLISH.pdf


Processing URLs:  45%|████▌     | 452/1000 [24:04<09:04,  1.01it/s]

Error extracting text from http://www.peacefare.net/2015/12/28/what-difference-does-ramadi-make/: 406 Client Error: Not Acceptable for url: http://www.peacefare.net/2015/12/28/what-difference-does-ramadi-make/


Processing URLs:  45%|████▌     | 454/1000 [24:05<06:25,  1.42it/s]

Error extracting text from http://www.nytimes.com/2015/10/24/world/middleeast/us-and-russia-find-common-goals-on-syria-if-not-on-assad.html?emc=edit_th_20151024&amp;nl=todaysheadlines&amp;nlid=45205797: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/24/world/middleeast/us-and-russia-find-common-goals-on-syria-if-not-on-assad.html?emc=edit_th_20151024&amp;nl=todaysheadlines&amp;nlid=45205797


Processing URLs:  46%|████▌     | 456/1000 [24:07<08:44,  1.04it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XP05H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XP05H


Processing URLs:  46%|████▋     | 464/1000 [24:32<32:45,  3.67s/it]

Error extracting text from https://decisiondeskhq.com/vote-tracker/estimating-cloture-count-for-supreme-court-nominee-neil-gorsuch/: 403 Client Error: Forbidden for url: https://decisiondeskhq.com/vote-tracker/estimating-cloture-count-for-supreme-court-nominee-neil-gorsuch/


Processing URLs:  47%|████▋     | 467/1000 [24:34<15:21,  1.73s/it]

Error extracting text from https://www.reuters.com/article/us-russia-usa-reaction/putin-offers-biden-public-talks-after-u-s-president-says-he-thinks-he-is-a-killer-idUSKBN2BA0S1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-reaction/putin-offers-biden-public-talks-after-u-s-president-says-he-thinks-he-is-a-killer-idUSKBN2BA0S1


Processing URLs:  47%|████▋     | 469/1000 [24:36<11:05,  1.25s/it]

URL filtered: https://www.youtube.com/watch?v=Xo5SBhuPO8I


Processing URLs:  47%|████▋     | 472/1000 [24:37<07:01,  1.25it/s]

Error extracting text from http://news.sky.com/story/1567364/stormont-crisis-ministers-resign-22-times: 404 Client Error: Not Found for url: https://news.sky.com/story/1567364/stormont-crisis-ministers-resign-22-times


Processing URLs:  47%|████▋     | 473/1000 [24:39<09:55,  1.13s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NYS8236S973701-4LROPPCHCDEV18RI2NO03MU670


Processing URLs:  48%|████▊     | 475/1000 [24:41<08:48,  1.01s/it]

Error extracting text from http://www.ibtimes.com/syria-war-news-un-talks-assad-victory-first-time-army-rebels-fight-isis-2493842: 403 Client Error: Forbidden for url: https://www.ibtimes.com/syria-war-news-un-talks-assad-victory-first-time-army-rebels-fight-isis-2493842


Processing URLs:  48%|████▊     | 477/1000 [24:43<09:49,  1.13s/it]

Error extracting text from https://www.wsj.com/articles/new-u-s-china-rivalry-risks-lethal-confrontation-1484644399: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/new-u-s-china-rivalry-risks-lethal-confrontation-1484644399


Processing URLs:  48%|████▊     | 479/1000 [24:44<06:16,  1.38it/s]

Error extracting text from http://www.tandfonline.com/doi/pdf/10.1080/1350485021012458: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/pdf/10.1080/1350485021012458
Error extracting text from http://www.nytimes.com/2016/08/07/world/middleeast/military-syria-putin-us-proxy-war.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/07/world/middleeast/military-syria-putin-us-proxy-war.html?_r=0


Processing URLs:  48%|████▊     | 481/1000 [24:45<06:06,  1.42it/s]

Error extracting text from http://www.theamericanmirror.com/exclusive-records-show-ben-carson-became-republican-less-than-one-year-ago/: 403 Client Error: Forbidden for url: http://www.theamericanmirror.com/exclusive-records-show-ben-carson-became-republican-less-than-one-year-ago/


Processing URLs:  48%|████▊     | 483/1000 [24:47<07:14,  1.19it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/361448-gop-rep-we-need-a-counter-to-russian-disinformation: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/361448-gop-rep-we-need-a-counter-to-russian-disinformation/
URL filtered: https://twitter.com/GJ_Open


Processing URLs:  49%|████▉     | 488/1000 [24:57<14:07,  1.66s/it]

Error extracting text from http://www.illinoishomepage.net/news/capitol-news/budget-bill-on-hold-for-new-general-assembly/637767411: 500 Server Error: Domain Not Found for url: http://www.illinoishomepage.net/news/capitol-news/budget-bill-on-hold-for-new-general-assembly/637767411


Processing URLs:  49%|████▉     | 489/1000 [24:57<11:47,  1.38s/it]

Error extracting text from http://mobile.reuters.com/article/worldNews/idUSKCN0ZM1XL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/worldNews/idUSKCN0ZM1XL


Processing URLs:  49%|████▉     | 494/1000 [25:03<09:29,  1.13s/it]

Error extracting text from https://phys.org/news/2017-09-nasa-typhoon-doksuri-south-china.html: 400 Client Error: Bad request for url: https://phys.org/news/2017-09-nasa-typhoon-doksuri-south-china.html


Processing URLs:  50%|████▉     | 495/1000 [25:05<11:24,  1.35s/it]

Error extracting text from https://www.barrons.com/articles/repealing-section-230-portion-of-internet-law-would-be-terrible-for-airbnb-heres-why-51610409540: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/repealing-section-230-portion-of-internet-law-would-be-terrible-for-airbnb-heres-why-51610409540


Processing URLs:  50%|████▉     | 496/1000 [25:05<08:41,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/u-s-moves-to-cut-off-north-korea-from-banking-system-1464797927: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-moves-to-cut-off-north-korea-from-banking-system-1464797927
URL filtered: https://www.youtube.com/watch?v=uMlRpN8ANrU


Processing URLs:  50%|████▉     | 498/1000 [25:07<08:47,  1.05s/it]

Error extracting text from http://www.vlada.si/en/media_room/government_press_releases/press_release/article/act_ratifying_the_protocol_to_the_north_atlantic_treaty_on_the_accession_of_montenegro_58182/: 410 Client Error: Gone for url: https://www.gov.si/gone?src=http://www.vlada.si&url=http://vlada.arhiv-spletisc.gov.si/en/media_room/government_press_releases/press_release/article/act_ratifying_the_protocol_to_the_north_atlantic_treaty_on_the_accession_of_montenegro_58182/


Processing URLs:  50%|████▉     | 499/1000 [25:10<12:50,  1.54s/it]

Error extracting text from https://www.faa.gov/uas/programs_partnerships/focus_area_pathfinder/: 404 Client Error: Not Found for url: https://www.faa.gov/uas/programs_partnerships/focus_area_pathfinder/
Error extracting text from https://www.reuters.com/article/us-illinois-budget-idUSKBN19Q2P1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-illinois-budget-idUSKBN19Q2P1


Processing URLs:  50%|█████     | 502/1000 [25:12<08:16,  1.00it/s]

Error extracting text from http://europe.newsweek.com/robert-mugabe-zimbabwe-428497: 403 Client Error: Forbidden for url: https://www.newsweek.com/robert-mugabe-zimbabwe-428497


Processing URLs:  50%|█████     | 503/1000 [25:13<09:33,  1.15s/it]

Error extracting text from http://www.ibtimes.com/double-charm-cerns-lhcb-experiment-discovers-new-type-baryon-2562486: 403 Client Error: Forbidden for url: https://www.ibtimes.com/double-charm-cerns-lhcb-experiment-discovers-new-type-baryon-2562486


Processing URLs:  50%|█████     | 505/1000 [25:17<10:28,  1.27s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds-payment/holders-of-pdvsa-2020-bond-to-receive-late-payment-on-thursday-sources-idUSKBN1D02U1?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds-payment/holders-of-pdvsa-2020-bond-to-receive-late-payment-on-thursday-sources-idUSKBN1D02U1?il=0


Processing URLs:  51%|█████     | 509/1000 [25:21<08:15,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKCN12Q0B6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKCN12Q0B6
URL filtered: https://twitter.com/justinwolfers


Processing URLs:  52%|█████▏    | 516/1000 [25:35<18:46,  2.33s/it]

URL filtered: https://twitter.com/wfrhatch/status/742325765543464960


Processing URLs:  52%|█████▏    | 520/1000 [25:41<14:22,  1.80s/it]

Error extracting text from https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html


Processing URLs:  52%|█████▏    | 521/1000 [25:42<14:06,  1.77s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-24/boston-fed-joins-eight-other-banks-in-seeking-discount-rate-rise


Processing URLs:  52%|█████▏    | 524/1000 [25:44<08:55,  1.13s/it]

Error extracting text from http://www.wsj.com/articles/north-korea-plans-satellite-launch-1454425967: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-plans-satellite-launch-1454425967


Processing URLs:  53%|█████▎    | 526/1000 [25:48<09:40,  1.23s/it]

Error extracting text from http://www.politico.com/story/2017/03/neil-gorsuch-senate-confirmation-angus-king-: 404 Client Error: Not Found for url: https://www.politico.com/story/2017/03/neil-gorsuch-senate-confirmation-angus-king-


Processing URLs:  53%|█████▎    | 528/1000 [25:49<06:49,  1.15it/s]

Error extracting text from https://news.usni.org/2017/11/15/former-secnav-lehman-russian-cyber-forces-stealing-u-s-technological-edge: 403 Client Error: Forbidden for url: https://news.usni.org/2017/11/15/former-secnav-lehman-russian-cyber-forces-stealing-u-s-technological-edge


Processing URLs:  53%|█████▎    | 531/1000 [25:54<12:22,  1.58s/it]

Error extracting text from http://www.abc10.com/story/news/local/california/2016/01/28/drought-water-restrictions-to-continue/79491236/: 503 Server Error: Service Unavailable for url: https://www.abc10.com/story/news/local/california/2016/01/28/drought-water-restrictions-to-continue/79491236/


Processing URLs:  53%|█████▎    | 534/1000 [26:01<13:30,  1.74s/it]

URL filtered: https://www.youtube.com/watch?v=VsWELlReArQ


Processing URLs:  54%|█████▍    | 539/1000 [26:09<11:01,  1.43s/it]

Error extracting text from http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347869A1.pdf: 403 Client Error: Forbidden for url: http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347869A1.pdf


Processing URLs:  54%|█████▍    | 540/1000 [26:09<08:42,  1.14s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/are-russia-america-headed-missile-showdown-23152: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/are-russia-america-headed-missile-showdown-23152


Processing URLs:  54%|█████▍    | 541/1000 [26:10<07:57,  1.04s/it]

Error extracting text from http://www.nasdaq.com/markets/crude-oil.aspx?timeframe=6m: 403 Client Error: Forbidden for url: http://www.nasdaq.com/markets/crude-oil.aspx?timeframe=6m
URL filtered: https://www.youtube.com/watch?v=wILQc1d5fIE


Processing URLs:  55%|█████▍    | 547/1000 [26:14<05:44,  1.31it/s]

Error extracting text from https://www.nytimes.com/2022/05/12/sports/tennis/nadal-italian-open-shapovalov.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/05/12/sports/tennis/nadal-italian-open-shapovalov.html


Processing URLs:  55%|█████▌    | 552/1000 [26:31<23:08,  3.10s/it]

Error extracting text from https://www.nytimes.com/2017/05/17/us/politics/tax-code-republicans-ryan-mcconnell.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/17/us/politics/tax-code-republicans-ryan-mcconnell.html?_r=0
URL filtered: https://www.bloomberg.com/politics/articles/2017-03-15/schumer-likely-to-oppose-gorsuch-confirmation-as-hearing-nears


Processing URLs:  55%|█████▌    | 554/1000 [26:35<18:48,  2.53s/it]

Error extracting text from http://archive.boston.com/bostonglobe/ideas/articles/2011/09/11: 403 Client Error: Forbidden for url: http://archive.boston.com/bostonglobe/ideas/articles/2011/09/11/


Processing URLs:  56%|█████▌    | 556/1000 [26:35<11:11,  1.51s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-islamic-state-revenue-idUSKCN0Y22CW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-islamic-state-revenue-idUSKCN0Y22CW


Processing URLs:  56%|█████▌    | 557/1000 [26:35<08:38,  1.17s/it]

Error extracting text from https://www.yahoo.com/news/burundi-warns-against-execution-investigations-icc-105538473.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/burundi-warns-against-execution-investigations-icc-105538473.html


Processing URLs:  56%|█████▌    | 558/1000 [26:36<07:04,  1.04it/s]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S1286457916000083: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S1286457916000083


Processing URLs:  56%|█████▌    | 560/1000 [26:38<08:23,  1.14s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O3GR1A6K50XV01-7TRP22JOE7SKUMJDUP9A3DOVU5


Processing URLs:  56%|█████▋    | 563/1000 [26:39<05:04,  1.44it/s]

Error extracting text from http://fakty.ictv.ua/ru/ukraine/20170310-turechchyna-skasuye-vizy-dlya-ukrayintsiv-istorychna-ugoda/: 403 Client Error: Forbidden for url: https://fakty.com.ua/ru/ukraine/20170310-turechchyna-skasuye-vizy-dlya-ukrayintsiv-istorychna-ugoda/


Processing URLs:  56%|█████▋    | 565/1000 [26:40<03:53,  1.87it/s]

Error extracting text from http://www.chicagotribune.com/news/local/politics/ct-moodys-illinois-credit-rating-20170720-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/local/politics/ct-moodys-illinois-credit-rating-20170720-story.html
Error extracting text from https://balkaninsight.com/author/sinisa-jakov-marusic/: 403 Client Error: Forbidden for url: https://balkaninsight.com/author/sinisa-jakov-marusic/


Processing URLs:  57%|█████▋    | 566/1000 [26:42<06:57,  1.04it/s]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/trump-lets-iran-deal-live-but-signals-he-may-not-for-long/articleshow/59657814.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/trump-lets-iran-deal-live-but-signals-he-may-not-for-long/articleshow/59657814.cms


Processing URLs:  57%|█████▋    | 567/1000 [26:43<06:08,  1.17it/s]

Error extracting text from http://on.wsj.com/2wgazYa: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-threatened-to-kill-the-at-t-time-warner-deal-but-its-very-much-alive-1502998077


Processing URLs:  57%|█████▋    | 569/1000 [26:50<13:41,  1.91s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/262306-poll-cruz-surges-ahead-of-trump-carson-in-iowa: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/262306-poll-cruz-surges-ahead-of-trump-carson-in-iowa/


Processing URLs:  57%|█████▋    | 571/1000 [26:55<15:50,  2.22s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-12-19/key-figures-in-saudi-arabia-s-2018-budget-2017-fiscal-data


Processing URLs:  57%|█████▋    | 574/1000 [26:58<11:23,  1.60s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/head-venezuelas-super-assembly-vows-target-opponents-49048928: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/head-venezuelas-super-assembly-vows-target-opponents-49048928


Processing URLs:  58%|█████▊    | 577/1000 [27:00<06:22,  1.11it/s]

Error extracting text from https://www.neweurope.eu/article/state-of-play-un-secretary-general-election/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/state-of-play-un-secretary-general-election/
Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0VF05N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0VF05N


Processing URLs:  58%|█████▊    | 579/1000 [27:00<03:45,  1.86it/s]

Error extracting text from https://www.nytimes.com/2021/11/03/world/weapons-ukraine-russia.html).: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/11/03/world/weapons-ukraine-russia.html).
Error extracting text from http://www.reuters.com/article/us-india-afghanistan-idUSKBN13T0CM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-afghanistan-idUSKBN13T0CM


Processing URLs:  58%|█████▊    | 580/1000 [27:02<07:12,  1.03s/it]

Error extracting text from http://original.antiwar.com/updates/2016/09/20/isis-fleeing-shirqat-65-killed-iraq/: 403 Client Error: Forbidden for url: https://original.antiwar.com/updates/2016/09/20/isis-fleeing-shirqat-65-killed-iraq/


Processing URLs:  58%|█████▊    | 583/1000 [27:09<11:49,  1.70s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/politica/2016/02/impeachment-e-cassacao-ganham-muita-forca-com-o-envolvimento-do-marqueteiro-de-dilma-00802495.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/politica/2016/02/impeachment-e-cassacao-ganham-muita-forca-com-o-envolvimento-do-marqueteiro-de-dilma-00802495.html&amp;prev=search
Error extracting text from http://www.financialexpress.com/article/fe-columnist/overplaying-the-china-card/279576/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/fe-columnist/overplaying-the-china-card/279576/


Processing URLs:  59%|█████▊    | 587/1000 [27:14<09:10,  1.33s/it]

Error extracting text from https://www.nytimes.com/2018/02/09/world/asia/kim-yo-jong-history-facts.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/09/world/asia/kim-yo-jong-history-facts.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  59%|█████▉    | 588/1000 [27:17<12:10,  1.77s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/721122/missile-defense-agency-budget-addresses-escalating-north-korea-iran-threats: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/721122/missile-defense-agency-budget-addresses-escalating-north-korea-iran-threats


Processing URLs:  59%|█████▉    | 593/1000 [27:23<09:48,  1.44s/it]

Error extracting text from http://tass.ru/en/world/849787: 404 Client Error: Not Found for url: https://tass.ru/en/world/849787


Processing URLs:  60%|█████▉    | 595/1000 [27:26<07:58,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-philippines-china-idUSKBN16K0UF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-china-idUSKBN16K0UF


Processing URLs:  60%|█████▉    | 596/1000 [27:29<13:11,  1.96s/it]

Error extracting text from http://morungexpress.com/long-live-the-wto-doha-development-agenda-is-dead/: 404 Client Error: Not Found for url: https://morungexpress.com/long-live-the-wto-doha-development-agenda-is-dead


Processing URLs:  60%|█████▉    | 598/1000 [27:30<07:32,  1.12s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-timeline-idUSKBN19A2FO?mod=related&amp;channelName=ousivMolt: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-timeline-idUSKBN19A2FO?mod=related&amp;channelName=ousivMolt


Processing URLs:  60%|█████▉    | 599/1000 [27:30<05:54,  1.13it/s]

Error extracting text from http://www.wsj.com/articles/sweden-launches-investigation-into-alleged-fraud-at-volkswagen-1452708889: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sweden-launches-investigation-into-alleged-fraud-at-volkswagen-1452708889


Processing URLs:  61%|██████    | 607/1000 [27:49<17:36,  2.69s/it]

Error extracting text from http://www.reuters.com/article/us-opec-oil-libya-idUSKCN0WO1JJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-oil-libya-idUSKCN0WO1JJ
URL filtered: https://www.bloomberg.com/politics/articles/2017-06-03/democrats-weigh-using-debt-ceiling-debate-to-thwart-gop-tax-cuts


Processing URLs:  61%|██████    | 609/1000 [27:50<10:21,  1.59s/it]

Error extracting text from https://www.amazon.com/Cases-Intelligence-Analysis-Structured-Techniques/dp/1608716813: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Cases-Intelligence-Analysis-Structured-Techniques/dp/1608716813


Processing URLs:  61%|██████    | 611/1000 [27:53<11:44,  1.81s/it]

Error extracting text from http://www.ibtimes.com/volkswagen-diesel-scandal-will-us-justice-department-file-criminal-charges-against-2110996: 403 Client Error: Forbidden for url: https://www.ibtimes.com/volkswagen-diesel-scandal-will-us-justice-department-file-criminal-charges-against-2110996


Processing URLs:  61%|██████▏   | 614/1000 [27:56<06:44,  1.05s/it]

Error extracting text from http://www.reuters.com/article/2015/10/11/opec-oil-indonesia-idUSL5N1221K820151011: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/11/opec-oil-indonesia-idUSL5N1221K820151011
URL filtered: https://www.wired.com/story/the-curious-case-of-a-revolutionary-but-imaginary-superconductor/?__twitter_impression=true&mbid=social_twitter


Processing URLs:  62%|██████▏   | 616/1000 [27:56<04:02,  1.59it/s]

Error extracting text from http://www.wsj.com/articles/sen-delcidio-do-amarals-jailing-leaves-brazils-politics-paralyzed-1449012255: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sen-delcidio-do-amarals-jailing-leaves-brazils-politics-paralyzed-1449012255


Processing URLs:  62%|██████▏   | 618/1000 [27:58<05:15,  1.21it/s]

Error extracting text from https://www.yahoo.com/finance/news/venezuelan-presidents-opponents-lay-siege-air-213206741.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/venezuelan-presidents-opponents-lay-siege-air-213206741.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-exclusive-idUSKBN0UC0JP20151229&gt: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-exclusive-idUSKBN0UC0JP20151229&gt


Processing URLs:  62%|██████▏   | 620/1000 [28:10<18:32,  2.93s/it]

Error extracting text from http://www.cbs.com/shows/the-late-show-with-stephen-colbert/news/1004703/watch-colbert-s-heartfelt-interview-with-vice-president-joe-biden/: Exceeded 30 redirects.


Processing URLs:  62%|██████▏   | 622/1000 [28:14<16:17,  2.59s/it]

Error extracting text from http://en.abna24.com/service/iran/archive/2016/08/21/773478/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/iran/archive/2016/08/21/773478/story.html


Processing URLs:  63%|██████▎   | 631/1000 [28:27<10:13,  1.66s/it]

Error extracting text from http://georgiatoday.ge/news/2344/US-President-Invites-Georgian-PM-to-Nuclear-Security-Summit: 404 Client Error: Not Found for url: http://georgiatoday.ge/news/2344/US-President-Invites-Georgian-PM-to-Nuclear-Security-Summit


Processing URLs:  63%|██████▎   | 633/1000 [28:29<07:35,  1.24s/it]

Error extracting text from https://www.reuters.com/article/uk-britain-politics-scotland-idUSKBN2AB142: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-politics-scotland-idUSKBN2AB142


Processing URLs:  64%|██████▍   | 638/1000 [28:38<09:33,  1.58s/it]

Error extracting text from https://bit.ly/3tCOe0y: 403 Client Error: Forbidden for url: https://www.scotsman.com/health/new-edinburgh-sick-kids-hospital-building-delayed-again-3102837


Processing URLs:  64%|██████▍   | 645/1000 [28:57<11:02,  1.87s/it]

Error extracting text from http://english.aawsat.com/2015/12/article55346087/bahraini-parliament-seeks-recognition-of-ahwazs-occupation: 403 Client Error: Forbidden for url: http://english.aawsat.com/2015/12/article55346087/bahraini-parliament-seeks-recognition-of-ahwazs-occupation


Processing URLs:  65%|██████▍   | 646/1000 [29:00<11:36,  1.97s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Data&amp;Monitoring/WPV_2011-2016_23FEB.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Data&amp;Monitoring/WPV_2011-2016_23FEB.pdf


Processing URLs:  65%|██████▌   | 650/1000 [30:01<1:08:23, 11.72s/it]

Error extracting text from http://www.usnews.com/news/top-news/articles/2017-02-22/showdown-looms-for-protesters-near-site-of-dakota-access-pipeline: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.washingtontimes.com/news/2016/jan/5/steve-deace-ted-cruz-ethanols-best-friend/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/5/steve-deace-ted-cruz-ethanols-best-friend/


Processing URLs:  65%|██████▌   | 651/1000 [30:02<52:22,  9.00s/it]

Error extracting text from http://www.newsweek.com/burundi-rebels-claim-rwanda-military-training-report-422930: 403 Client Error: Forbidden for url: https://www.newsweek.com/burundi-rebels-claim-rwanda-military-training-report-422930


Processing URLs:  65%|██████▌   | 654/1000 [30:06<22:25,  3.89s/it]

Error extracting text from http://thehill.com/policy/finance/258912-house-democrats-urge-opposition-of-export-import-bank-amendments-to-highway: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/258912-house-democrats-urge-opposition-of-export-import-bank-amendments-to-highway/
URL filtered: https://www.bloomberg.com/news/articles/2017-10-11/criminal-probe-of-uber-may-freeze-waymo-s-trade-secrets-trial


Processing URLs:  66%|██████▌   | 658/1000 [30:09<09:41,  1.70s/it]

Error extracting text from http://emarketalerts.forecast1.com/mic/eabstract.cfm?recno=237650: HTTPConnectionPool(host='emarketalerts.forecast1.com', port=80): Max retries exceeded with url: /mic/eabstract.cfm?recno=237650 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fece2c60>: Failed to resolve 'emarketalerts.forecast1.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  66%|██████▌   | 661/1000 [30:12<06:09,  1.09s/it]

URL filtered: https://www.youtube.com/watch?v=Zo1naJEacE8
Error extracting text from http://www.nytimes.com/2015/09/11/world/middleeast/whats-next-for-the-iran-nuclear-deal.html?_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/11/world/middleeast/whats-next-for-the-iran-nuclear-deal.html?_r=1


Processing URLs:  66%|██████▋   | 663/1000 [30:13<05:35,  1.00it/s]

Error extracting text from http://akeza.net/le-chanteur-furious-big-annonce-son-retour-au-burundi/: 412 Client Error: Precondition Failed for url: http://akeza.net/le-chanteur-furious-big-annonce-son-retour-au-burundi/


Processing URLs:  67%|██████▋   | 668/1000 [30:21<07:33,  1.36s/it]

Error extracting text from http://allafrica.com/stories/201801210060.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201801210060.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x302e09a90>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  67%|██████▋   | 670/1000 [30:22<05:53,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/u-s-justice-department-conducts-criminal-probe-of-volkswagen-sources-say-1442869059: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-justice-department-conducts-criminal-probe-of-volkswagen-sources-say-1442869059


Processing URLs:  68%|██████▊   | 675/1000 [30:28<06:27,  1.19s/it]

Error extracting text from http://www.business-standard.com/article/opinion/shankar-acharya-myanmar-s-historic-election-115120901348_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/opinion/shankar-acharya-myanmar-s-historic-election-115120901348_1.html


Processing URLs:  68%|██████▊   | 676/1000 [31:28<1:41:55, 18.88s/it]

Error extracting text from http://blogs.rollcall.com/218/hoyer-hints-boehner-wants-vote-reauthorize-ex-im/: HTTPConnectionPool(host='blogs.rollcall.com', port=80): Max retries exceeded with url: /218/hoyer-hints-boehner-wants-vote-reauthorize-ex-im/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3027e93a0>, 'Connection to blogs.rollcall.com timed out. (connect timeout=60)'))


Processing URLs:  68%|██████▊   | 679/1000 [31:32<38:16,  7.15s/it]

Error extracting text from https://www.maritime-executive.com/article/two-chinese-fishing-vessels-detained-after-body-found-on-board#:~:text=Indonesian%20authorities%20have%20detained%20two,Batam%20Island%20for%20a%20search: 403 Client Error: Forbidden for url: https://www.maritime-executive.com/article/two-chinese-fishing-vessels-detained-after-body-found-on-board#:~:text=Indonesian%20authorities%20have%20detained%20two,Batam%20Island%20for%20a%20search


Processing URLs:  68%|██████▊   | 684/1000 [31:36<09:10,  1.74s/it]

URL filtered: https://mobile.twitter.com/burgessev/status/1343360876162805766


Processing URLs:  69%|██████▊   | 687/1000 [31:41<08:07,  1.56s/it]

Error extracting text from https://www.nytimes.com/2017/02/14/opinion/what-trump-is-doing-is-not-ok.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/14/opinion/what-trump-is-doing-is-not-ok.html


Processing URLs:  69%|██████▉   | 690/1000 [31:44<06:07,  1.18s/it]

Error extracting text from https://www.wsj.com/articles/trump-administration-set-to-impose-new-sanctions-on-iran-entities-as-soon-as-friday-1486071696: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-administration-set-to-impose-new-sanctions-on-iran-entities-as-soon-as-friday-1486071696


Processing URLs:  69%|██████▉   | 692/1000 [31:54<12:52,  2.51s/it]

Error extracting text from http://arynews.tv/en/preparations-underway-for-pm-nawaz-return-to-pakistan/: 403 Client Error: Forbidden for url: http://arynews.tv/en/preparations-underway-for-pm-nawaz-return-to-pakistan/


Processing URLs:  70%|██████▉   | 695/1000 [31:59<10:41,  2.10s/it]

Error extracting text from https://www.reuters.com/article/us-autos-selfdriving-waymo/alphabet-looks-to-snowy-michigan-to-test-self-driving-cars-idUSKBN1CV1SN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-autos-selfdriving-waymo/alphabet-looks-to-snowy-michigan-to-test-self-driving-cars-idUSKBN1CV1SN


Processing URLs:  70%|███████   | 700/1000 [32:04<06:08,  1.23s/it]

Error extracting text from http://www.business-standard.com/article/pti-stories/may-form-adiz-in-scs-if-maritime-security-threatened-china-115053100585_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/pti-stories/may-form-adiz-in-scs-if-maritime-security-threatened-china-115053100585_1.html
URL filtered: https://www.bloomberg.com/news/articles/2017-05-06/iran-will-go-along-with-what-opec-decides-on-extending-oil-cuts


Processing URLs:  71%|███████   | 709/1000 [32:18<06:45,  1.39s/it]

Error extracting text from http://pressroom.toyota.com/releases/2015+nada+jd+power+autoconference+fay.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/2015+nada+jd+power+autoconference+fay/


Processing URLs:  71%|███████   | 710/1000 [32:19<05:42,  1.18s/it]

Error extracting text from https://asia.nikkei.com/Business/Agriculture/China-warns-of-worst-in-history-winter-wheat-crop).: 404 Client Error: Not Found for url: https://asia.nikkei.com/Business/Agriculture/China-warns-of-worst-in-history-winter-wheat-crop).


Processing URLs:  71%|███████   | 711/1000 [32:19<04:22,  1.10it/s]

Error extracting text from http://www.khaama.com/ghani-assigns-delegation-to-investigate-alleged-support-by-govt-to-isis-1743: 403 Client Error: Forbidden for url: http://www.khaama.com/ghani-assigns-delegation-to-investigate-alleged-support-by-govt-to-isis-1743


Processing URLs:  71%|███████   | 712/1000 [32:20<04:22,  1.10it/s]

Error extracting text from http://www.un.org/press/en/2015/sc12171.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2015/sc12171.doc.htm


Processing URLs:  72%|███████▏  | 715/1000 [32:25<06:05,  1.28s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-15/saudi-prince-reiterates-oil-freeze-depends-on-others-joining-in2c81r3


Processing URLs:  72%|███████▏  | 718/1000 [32:27<04:03,  1.16it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/356914-state-officials-press-congress-for-more-election-cyber-resources: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/356914-state-officials-press-congress-for-more-election-cyber-resources/


Processing URLs:  72%|███████▏  | 719/1000 [32:29<05:11,  1.11s/it]

Error extracting text from http://www.parl.ca/LegisInfo/BillDetails.aspx?Bill=C45&amp;Language=E&amp;Mode=1&amp;Parl=42&amp;Ses=1: 404 Client Error: Not Found for url: https://www.parl.ca/ErrorPage/Default.aspx?Url=https%3a%2f%2fwww.parl.ca%2fLegisInfo%2fBillDetails.aspx%3fBill%3dC45%26amp%3bLanguage%3dE%26amp%3bMode%3d1%26amp%3bParl%3d42%26amp%3bSes%3d1&StatusCode=404


Processing URLs:  72%|███████▏  | 721/1000 [32:33<06:43,  1.45s/it]

Error extracting text from http://electionlawblog.org/?p=80332: 403 Client Error: Forbidden for url: http://electionlawblog.org/?p=80332


Processing URLs:  72%|███████▏  | 724/1000 [32:40<10:13,  2.22s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O41E2T6KLVRG01-0EJ1IFCB0VKS4MI06D3RRHJE5V


Processing URLs:  73%|███████▎  | 732/1000 [32:49<05:14,  1.17s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NY0WKT6K50XS01-6VBBTD36J3039NASCVARLST9EI


Processing URLs:  73%|███████▎  | 734/1000 [32:52<05:21,  1.21s/it]

Error extracting text from http://www.praguemonitor.com/2016/05/10/intmin-border-checks-odds-eu-legislation: 500 Server Error: Internal Server Error for url: https://praguemonitor.com/2016/05/10/intmin-border-checks-odds-eu-legislation


Processing URLs:  74%|███████▎  | 736/1000 [32:55<05:15,  1.20s/it]

Error extracting text from http://www.nytimes.com/2016/01/14/us/politics/ted-cruz-wall-street-loan-senate-bid-2012.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/us/politics/ted-cruz-wall-street-loan-senate-bid-2012.html?_r=0


Processing URLs:  74%|███████▎  | 737/1000 [32:55<04:38,  1.06s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/08/gitrep-8may16am/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/08/gitrep-8may16am/


Processing URLs:  74%|███████▍  | 743/1000 [33:14<09:14,  2.16s/it]

Error extracting text from http://www.un.org/depts/los/convention_agreements/convention_declarations.htm#China%20Upon%20ratification: 403 Client Error: Forbidden for url: https://www.un.org/depts/los/convention_agreements/convention_declarations.htm#China%20Upon%20ratification


Processing URLs:  74%|███████▍  | 744/1000 [33:16<09:37,  2.25s/it]

Error extracting text from http://inserbia.info/today/2016/03/djukanovic-montenegro-may-become-nato-member-by-mid-2017/: 404 Client Error: Not Found for url: https://inserbia.info/today/2016/03/djukanovic-montenegro-may-become-nato-member-by-mid-2017/


Processing URLs:  75%|███████▍  | 747/1000 [33:20<07:20,  1.74s/it]

Error extracting text from http://agilisanalysis.com/home/?p=321: HTTPConnectionPool(host='agilisanalysis.com', port=80): Max retries exceeded with url: /home/?p=321 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303774200>: Failed to resolve 'agilisanalysis.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  75%|███████▌  | 750/1000 [33:22<04:52,  1.17s/it]

Error extracting text from http://blogs.barrons.com/asiastocks/2017/01/17/trump-dollar-is-too-strong-and-its-killing-us/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/asiastocks/2017/01/17/trump-dollar-is-too-strong-and-its-killing-us/


Processing URLs:  75%|███████▌  | 751/1000 [33:23<04:35,  1.11s/it]

Error extracting text from http://www.brookings.edu/blogs/order-from-chaos/posts/2015/09/11-russia-america-same-mistakes-syria-baev-shapiro?utm_campaign=Brookings+Brief&amp;utm_source=hs_email&amp;utm_medium=email&amp;utm_content=22021030&amp;_hsenc=p2ANqtz-9B7ty93z4qAwB1arfW_SqE6XXIiNucI6Q8PTahFOY5ergAYLM44XLxczz6hf5RctYtM_zIfOn3EGfVGX1lfjSaFSMOsyH3h0C0XrEak06E5bPbUuY&amp;_hsmi=22021030: 404 Client Error: Not Found for url: https://www.brookings.edu/blogs/order-from-chaos/posts/2015/09/11-russia-america-same-mistakes-syria-baev-shapiro?utm_campaign=Brookings+Brief&amp;utm_source=hs_email&amp;utm_medium=email&amp;utm_content=22021030&amp;_hsenc=p2ANqtz-9B7ty93z4qAwB1arfW_SqE6XXIiNucI6Q8PTahFOY5ergAYLM44XLxczz6hf5RctYtM_zIfOn3EGfVGX1lfjSaFSMOsyH3h0C0XrEak06E5bPbUuY&amp;_hsmi=22021030


Processing URLs:  75%|███████▌  | 754/1000 [33:26<03:28,  1.18it/s]

Error extracting text from http://www.nasdaq.com/article/bayer-makes-62-billion-bid-for-monsanto--4th-update-20160523-00659: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/bayer-makes-62-billion-bid-for-monsanto--4th-update-20160523-00659


Processing URLs:  76%|███████▌  | 756/1000 [33:26<02:12,  1.84it/s]

Error extracting text from http://nriworld.net/2016/04/mosul-will-eventually-be-retaken-obama/#sthash.f2xmtIti.dpuf: 404 Client Error: Not Found for url: http://nriworld.net/2016/04/mosul-will-eventually-be-retaken-obama/#sthash.f2xmtIti.dpuf


Processing URLs:  76%|███████▌  | 757/1000 [33:26<02:04,  1.95it/s]

Error extracting text from http://www.weforum.org/agenda/2016/02/can-south-africa-avoid-another-credit-rating-downgrade: 403 Client Error: Forbidden for url: http://www.weforum.org/agenda/2016/02/can-south-africa-avoid-another-credit-rating-downgrade


Processing URLs:  76%|███████▌  | 758/1000 [33:28<02:51,  1.42it/s]

Error extracting text from https://reut.rs/3hYbT8v: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/usa-fed/fed-signals-bond-buying-taper-coming-soon-rate-hike-in-2022-idUSKBN2GI0BQ


Processing URLs:  76%|███████▌  | 760/1000 [33:32<05:31,  1.38s/it]

Error extracting text from http://bigelowaerospace.com/about/strategic-relationships/spacex/: 404 Client Error: Not Found for url: https://bigelowaerospace.com/about/strategic-relationships/spacex/


Processing URLs:  77%|███████▋  | 768/1000 [33:56<14:19,  3.71s/it]

URL filtered: https://www.youtube.com/watch?v=8LlVnNNrw_M


Processing URLs:  77%|███████▋  | 770/1000 [33:58<08:33,  2.23s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-05/24/c_135384578.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-05/24/c_135384578.htm


Processing URLs:  77%|███████▋  | 773/1000 [34:02<07:01,  1.86s/it]

URL filtered: http://www.Bloomberg.com/tech
Error extracting text from http://www.reuters.com/article/2015/09/17/us-brazil-rousseff-idUSKCN0RH2NL20150917: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/17/us-brazil-rousseff-idUSKCN0RH2NL20150917


Processing URLs:  78%|███████▊  | 780/1000 [34:18<09:11,  2.51s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-OE5ZNL6TTDSC01-1V1LFAVJJ8J40LPFS007M218SM


Processing URLs:  78%|███████▊  | 784/1000 [34:27<09:45,  2.71s/it]

Error extracting text from http://www.bna.com/us-eu-look-n57982065951/: 403 Client Error: Forbidden for url: https://www.bloombergindustry.com/
URL filtered: https://www.bloomberg.com/gadfly/articles/2017-10-13/saudi-aramco-ipo-delay-private-offering-not-much-better


Processing URLs:  79%|███████▊  | 787/1000 [34:30<05:48,  1.64s/it]

Error extracting text from http://www.arabnews.com/node/996671/business-economy: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/996671/business-economy


Processing URLs:  79%|███████▉  | 792/1000 [34:33<02:48,  1.24it/s]

Error extracting text from http://www.nytimes.com/2016/06/24/us/supreme-court-immigration-obama-dapa.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/24/us/supreme-court-immigration-obama-dapa.html?_r=0


Processing URLs:  79%|███████▉  | 793/1000 [34:34<02:26,  1.41it/s]

URL filtered: https://www.youtube.com/watch?v=NHRHUHW6HQE#t=1h20m00s


Processing URLs:  80%|███████▉  | 796/1000 [34:35<01:26,  2.36it/s]

Error extracting text from http://www.politico.eu/article/how-to-fix-europe/&gt: 404 Client Error: Not Found for url: https://www.politico.eu/article/how-to-fix-europe/&gt
Error extracting text from http://www.reuters.com/article/us-safrica-election-anc-idUSKCN10P0KW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-election-anc-idUSKCN10P0KW?il=0


Processing URLs:  80%|███████▉  | 798/1000 [34:37<02:13,  1.51it/s]

Error extracting text from https://www.senate.gov/history/partydiv.htm: 403 Client Error: Forbidden for url: https://www.senate.gov/history/partydiv.htm


Processing URLs:  80%|███████▉  | 799/1000 [34:37<02:12,  1.52it/s]

URL filtered: https://www.youtube.com/watch?v=WhF6Yzws5PU


Processing URLs:  80%|████████  | 803/1000 [34:39<01:50,  1.78it/s]

Error extracting text from http://www.reuters.com/article/us-safrica-zuma-idUSKCN0ZA136: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-zuma-idUSKCN0ZA136
URL filtered: https://twitter.com/michaelh992/status/1495742444167643144?s=21


Processing URLs:  81%|████████  | 808/1000 [34:43<02:37,  1.22it/s]

URL filtered: https://www.bloomberg.com/news/articles/2021-09-21/asia-stocks-set-for-muted-open-with-focus-on-china-markets-wrap?cmpid=BBD092221_BIZ&amp;utm_medium=email&amp;utm_source=newsletter&amp;utm_term=210922&amp;utm_campaign=bloombergdaily


Processing URLs:  81%|████████  | 811/1000 [35:03<13:32,  4.30s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/russian-real-estate-deals-never-materialized-for-trump/2017/03/04/07897134-00ea-11e7-9b78-824ccab94435_story.html?utm_term=.269dcae807f9: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/russian-real-estate-deals-never-materialized-for-trump/2017/03/04/07897134-00ea-11e7-9b78-824ccab94435_story.html?utm_term=.269dcae807f9


Processing URLs:  82%|████████▏ | 816/1000 [35:11<06:46,  2.21s/it]

Error extracting text from http://www.maritime-executive.com/article/iran-slowly-sells-its-floating-storage-crude-stocks: 404 Client Error: Not Found for url: https://www.maritime-executive.com/403.shtml


Processing URLs:  82%|████████▏ | 820/1000 [35:16<04:42,  1.57s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/imf-to-start-to-mull-new-greek-loan-program-as-soon-as-january


Processing URLs:  82%|████████▏ | 824/1000 [35:20<03:56,  1.34s/it]

Error extracting text from http://atimes.com/2016/06/chinese-espionage-and-intelligence-activities-at-all-time-high-experts-say/: 404 Client Error: Not Found for url: https://atimes.com/2016/06/chinese-espionage-and-intelligence-activities-at-all-time-high-experts-say/


Processing URLs:  82%|████████▎ | 825/1000 [35:21<03:16,  1.12s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/314374-trumps-first-foreign-trip-as-president-will-be-to-meet-putin: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/314374-trumps-first-foreign-trip-as-president-will-be-to-meet-putin/


Processing URLs:  83%|████████▎ | 830/1000 [35:37<06:57,  2.45s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/nh/new_hampshire_republican_presidential_primary-3350.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/nh/new_hampshire_republican_presidential_primary-3350.html


Processing URLs:  83%|████████▎ | 832/1000 [35:40<05:22,  1.92s/it]

Error extracting text from http://www.reuters.tv/v/FRC/2017/03/15/china-begins-new-work-on-disputed-islands: HTTPConnectionPool(host='www.reuters.tv', port=80): Max retries exceeded with url: /v/FRC/2017/03/15/china-begins-new-work-on-disputed-islands (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303a77ad0>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  84%|████████▍ | 838/1000 [35:58<09:09,  3.39s/it]

Error extracting text from http://thehill.com/policy/energy-environment/314940-judge-rules-dakota-access-study-can-move-forward: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/314940-judge-rules-dakota-access-study-can-move-forward/


Processing URLs:  84%|████████▍ | 841/1000 [36:04<06:01,  2.27s/it]

Error extracting text from http://vestnikkavkaza.net/articles/Akkuyu-has-prospects.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/articles/Akkuyu-has-prospects.html
Error extracting text from http://www.reuters.com/article/us-burundi-violence-idUSKCN0XE0OT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-violence-idUSKCN0XE0OT


Processing URLs:  84%|████████▍ | 843/1000 [36:06<04:28,  1.71s/it]

Error extracting text from http://www.timesca.com/news/16502-stratfor-s-global-intelligence-week-of-april-4-2016: 403 Client Error: Forbidden for url: http://www.timesca.com/news/16502-stratfor-s-global-intelligence-week-of-april-4-2016


Processing URLs:  84%|████████▍ | 844/1000 [36:07<03:37,  1.39s/it]

URL filtered: https://www.youtube.com/watch?v=WRoG0kXnBSM&feature=youtu.be


Processing URLs:  85%|████████▍ | 847/1000 [36:08<01:55,  1.33it/s]

Error extracting text from https://news.usni.org/tag/u-s-pacific-command: 403 Client Error: Forbidden for url: https://news.usni.org/tag/u-s-pacific-command


Processing URLs:  85%|████████▍ | 849/1000 [36:10<02:27,  1.02it/s]

Error extracting text from http://blogs.wsj.com/brussels/2016/09/28/bulgaria-names-new-candidate-for-u-n-secretary-general/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/brussels/2016/09/28/bulgaria-names-new-candidate-for-u-n-secretary-general/


Processing URLs:  85%|████████▌ | 854/1000 [36:17<02:56,  1.21s/it]

Error extracting text from https://www.scotsman.com/news/uk-news/scottish-independence-boris-johnson-to-assert-indyref2-will-not-be-granted-even-if-snp-win-election-in-may-3163419: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/uk-news/scottish-independence-boris-johnson-to-assert-indyref2-will-not-be-granted-even-if-snp-win-election-in-may-3163419


Processing URLs:  86%|████████▌ | 855/1000 [36:18<02:41,  1.11s/it]

Error extracting text from http://politicscounter.com/?p=77: 403 Client Error: Forbidden for url: https://politicscounter.com/?p=77


Processing URLs:  86%|████████▌ | 857/1000 [36:30<07:53,  3.31s/it]

Error extracting text from http://www.szse.cn/szseWeb/FrontController.szse?ACTIONID=7&amp;AJAX=AJAX-TRUE&amp;CATALOGID=1845&amp;TABKEY=tab2: 404 Client Error: Not Found for url: http://www.szse.cn/szseWeb/FrontController.szse?ACTIONID=7&amp;AJAX=AJAX-TRUE&amp;CATALOGID=1845&amp;TABKEY=tab2


Processing URLs:  86%|████████▌ | 858/1000 [36:31<06:01,  2.54s/it]

Error extracting text from https://news.google.com/articles/CBMiY2h0dHBzOi8vdGhlaGlsbC5jb20vcG9saWN5L2ludGVybmF0aW9uYWwvNTUyMDIxLXVzLWlyYW4tc2lnbmFsLXBvc3NpYmxlLWJyZWFrdGhyb3VnaHMtaW4tbnVrZS10YWxrc9IBZ2h0dHBzOi8vdGhlaGlsbC5jb20vcG9saWN5L2ludGVybmF0aW9uYWwvNTUyMDIxLXVzLWlyYW4tc2lnbmFsLXBvc3NpYmxlLWJyZWFrdGhyb3VnaHMtaW4tbnVrZS10YWxrcz9hbXA?hl=en-US&amp;gl=US&amp;ceid=US%3Aen: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/552021-us-iran-signal-possible-breakthroughs-in-nuke-talks/


Processing URLs:  86%|████████▋ | 863/1000 [36:46<07:35,  3.33s/it]

Error extracting text from https://minnlawyer.com/2020/06/24/when-supreme-court-justices-defy-expectations/: 403 Client Error: Forbidden for url: https://minnlawyer.com/2020/06/24/when-supreme-court-justices-defy-expectations/


Processing URLs:  86%|████████▋ | 865/1000 [37:49<45:18, 20.14s/it]

Error extracting text from http://www.globalcube.net/clients/eacb/content/medias/publications/position_papers/Banking_Supervision/EACB_Letter_on_EDIS_Mrs_de_Lange.pdf: HTTPConnectionPool(host='www.globalcube.net', port=80): Max retries exceeded with url: /clients/eacb/content/medias/publications/position_papers/Banking_Supervision/EACB_Letter_on_EDIS_Mrs_de_Lange.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303988050>, 'Connection to www.globalcube.net timed out. (connect timeout=60)'))


Processing URLs:  87%|████████▋ | 868/1000 [37:55<17:59,  8.18s/it]

Error extracting text from https://www.cdc.gov/flu/pandemic-resources/monitoring/viruses-concern.html: 404 Client Error: Not Found for url: https://www.cdc.gov/flu/pandemic-resources/monitoring/viruses-concern.html


Processing URLs:  87%|████████▋ | 869/1000 [37:57<13:46,  6.31s/it]

Error extracting text from http://news.markets/bonds/emerging-market-investors-should-keep-a-close-eye-on-turkey-3335/: HTTPConnectionPool(host='news.markets', port=80): Max retries exceeded with url: /bonds/emerging-market-investors-should-keep-a-close-eye-on-turkey-3335/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe44a4e0>: Failed to resolve 'news.markets' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  87%|████████▋ | 871/1000 [38:00<08:38,  4.02s/it]

URL filtered: http://www.salon.com/2016/04/01/cracks_in_the_gop_wall_the_republicans_hardline_supreme_court_obstruction_is_crumbling/?utm_source=twitter&amp;utm_medium=socialflow


Processing URLs:  87%|████████▋ | 874/1000 [38:01<04:30,  2.15s/it]

Error extracting text from https://www.predictit.org/Contract/1792/Will-a-federal-criminal-charge-be-filed-against-Hillary-Clinton-in-2016#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/1792/Will-a-federal-criminal-charge-be-filed-against-Hillary-Clinton-in-2016#data


Processing URLs:  88%|████████▊ | 879/1000 [38:08<02:41,  1.34s/it]

Error extracting text from https://mgmt.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=9317: 404 Client Error: Not Found for url: https://mgmt.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=9317


Processing URLs:  88%|████████▊ | 880/1000 [38:09<02:32,  1.27s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/african-union-names-panel/2492638.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/african-union-names-panel/2492638.html


Processing URLs:  88%|████████▊ | 882/1000 [38:11<02:02,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13V1G1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13V1G1


Processing URLs:  89%|████████▉ | 891/1000 [38:29<02:47,  1.53s/it]

Error extracting text from http://blogs.wsj.com/washwire/2015/11/12/postal-workers-union-endorses-bernie-sanders-in-boost-to-underdog/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/11/12/postal-workers-union-endorses-bernie-sanders-in-boost-to-underdog/


Processing URLs:  90%|████████▉ | 897/1000 [38:37<02:22,  1.39s/it]

URL filtered: https://www.todayonline.com/world/no-olympics-if-no-athletes-come-japan-says-tokyo-2020-president-0?utm_source=dlvr.it&amp;utm_medium=twitter


Processing URLs:  90%|████████▉ | 899/1000 [38:37<01:24,  1.20it/s]

Error extracting text from https://www.nytimes.com/2021/07/19/world/americas/claude-joseph-haiti-stepping-down.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/19/world/americas/claude-joseph-haiti-stepping-down.html


Processing URLs:  91%|█████████ | 906/1000 [38:50<01:58,  1.26s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-oil-insight/venezuelas-deteriorating-oil-quality-riles-major-refiners-idUSKBN1CN2EO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-insight/venezuelas-deteriorating-oil-quality-riles-major-refiners-idUSKBN1CN2EO


Processing URLs:  91%|█████████ | 907/1000 [38:52<02:31,  1.63s/it]

Error extracting text from https://www.stripes.com/news/marines-ready-to-deploy-in-wake-of-trump-s-jerusalem-announcement-1.501409: 404 Client Error: Not Found for url: https://www.stripes.com/news/marines-ready-to-deploy-in-wake-of-trump-s-jerusalem-announcement-1.501409


Processing URLs:  92%|█████████▏| 915/1000 [39:02<01:20,  1.06it/s]

Error extracting text from http://www.nytimes.com/2016/01/19/us/politics/evangelicals-see-donald-trump-as-man-of-conviction-if-not-faith.html?emc=edit_th_20160119&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/19/us/politics/evangelicals-see-donald-trump-as-man-of-conviction-if-not-faith.html?emc=edit_th_20160119&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  92%|█████████▏| 919/1000 [39:07<01:39,  1.23s/it]

Error extracting text from http://www.nytimes.com/2015/10/20/world/middleeast/iranian-lawmaker-accuses-washington-post-reporter-jason-rezaian-of-sedition-plot.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/20/world/middleeast/iranian-lawmaker-accuses-washington-post-reporter-jason-rezaian-of-sedition-plot.html
Error extracting text from http://m.arabianbusiness.com/saudi-arabia-details-forthcoming-new-rules-on-ipo-share-pricing-643417.html: HTTPConnectionPool(host='m.arabianbusiness.com', port=80): Max retries exceeded with url: /saudi-arabia-details-forthcoming-new-rules-on-ipo-share-pricing-643417.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb7ec0>: Failed to resolve 'm.arabianbusiness.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  92%|█████████▏| 921/1000 [39:08<01:07,  1.17it/s]

Error extracting text from http://energyfuse.org/opec-leaves-market-forces-to-rebalance-the-oil-market/: 403 Client Error: Forbidden for url: http://energyfuse.org/opec-leaves-market-forces-to-rebalance-the-oil-market/


Processing URLs:  92%|█████████▏| 923/1000 [39:09<00:49,  1.55it/s]

Error extracting text from http://www.wsj.com/articles/sterling-soars-after-brexit-poll-shows-preference-to-stay-in-eu-1466407254: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sterling-soars-after-brexit-poll-shows-preference-to-stay-in-eu-1466407254


Processing URLs:  92%|█████████▏| 924/1000 [39:10<00:48,  1.57it/s]

Error extracting text from http://warontherocks.com/2015/11/lessons-from-the-liberation-of-sinjar/: 403 Client Error: Forbidden for url: http://warontherocks.com/2015/11/lessons-from-the-liberation-of-sinjar/
URL filtered: https://www.bloomberg.com/news/articles/2016-11-02/venezuelan-credit-dashboard-default-risk-rises-even-with-swap


Processing URLs:  93%|█████████▎| 926/1000 [39:11<00:49,  1.49it/s]

Error extracting text from https://reut.rs/3rM1iAm: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-boe-rates/what-the-bank-of-england-policymakers-have-said-about-the-recovery-outlook-idUSKBN2B722W?il=0


Processing URLs:  93%|█████████▎| 927/1000 [39:12<00:56,  1.29it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/china-stock-index-futures-climb-after-biggest-gain-in-a-month


Processing URLs:  93%|█████████▎| 932/1000 [39:18<00:59,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-usa-iran-cyber-idUSKCN0WC2NH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-cyber-idUSKCN0WC2NH


Processing URLs:  94%|█████████▎| 936/1000 [39:23<01:13,  1.15s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/myanmar-parliament-brings/2559650.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/myanmar-parliament-brings/2559650.html


Processing URLs:  94%|█████████▍| 938/1000 [39:23<00:42,  1.47it/s]

Error extracting text from https://www.nytimes.com/2017/04/25/world/europe/turkey-referendum-judges.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/25/world/europe/turkey-referendum-judges.html


Processing URLs:  94%|█████████▍| 941/1000 [39:27<00:50,  1.16it/s]

URL filtered: http://www.bloombergview.com/articles/2016-01-25/opposition-says-kerry-threatens-aid-in-syrian-peace-effort
Error extracting text from https://www.rottentomatoes.com/m/get_out: 403 Client Error: Forbidden for url: https://www.rottentomatoes.com/m/get_out


Processing URLs:  94%|█████████▍| 943/1000 [39:29<00:51,  1.12it/s]

Error extracting text from http://www.wsj.com/articles/rate-rise-bets-heat-up-after-fed-minutes-1447890850: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/rate-rise-bets-heat-up-after-fed-minutes-1447890850


Processing URLs:  94%|█████████▍| 945/1000 [39:31<00:48,  1.15it/s]

Error extracting text from https://balkaninsight.com/2021/09/24/north-macedonia-census-takers-rush-to-complete-head-count/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/09/24/north-macedonia-census-takers-rush-to-complete-head-count/


Processing URLs:  95%|█████████▍| 948/1000 [39:36<00:53,  1.04s/it]

Error extracting text from http://www.nytimes.com/2015/09/05/world/middleeast/russian-moves-in-syria-pose-concerns-for-us.html?emc=edit_th_20150905&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/05/world/middleeast/russian-moves-in-syria-pose-concerns-for-us.html?emc=edit_th_20150905&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  95%|█████████▍| 949/1000 [39:37<00:56,  1.10s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-30/traders-scramble-as-clock-keeps-ticking-venezuela-default-watch


Processing URLs:  95%|█████████▌| 954/1000 [39:40<00:32,  1.43it/s]

Error extracting text from http://www.nytimes.com/2015/10/23/us/politics/hillary-clinton-joe-biden-presidential-election.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/23/us/politics/hillary-clinton-joe-biden-presidential-election.html


Processing URLs:  96%|█████████▌| 957/1000 [39:44<00:35,  1.20it/s]

Error extracting text from https://www.nytimes.com/2017/02/23/world/asia/kashmir-terror-attack-dead.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/23/world/asia/kashmir-terror-attack-dead.html?_r=0


Processing URLs:  96%|█████████▌| 958/1000 [39:56<02:52,  4.10s/it]

Error extracting text from http://www.washingtontimes.com/news/2017/feb/9/john-nicholson-russian-involvement-afghanistan-bec/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/feb/9/john-nicholson-russian-involvement-afghanistan-bec/


Processing URLs:  96%|█████████▌| 961/1000 [40:00<01:36,  2.47s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0X81ZU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0X81ZU


Processing URLs:  96%|█████████▋| 963/1000 [40:08<01:55,  3.12s/it]

Error extracting text from http://www.parl.gc.ca/LegisInfo/Home.aspx?Language=E&amp;Mode=1&amp;ParliamentSession=42-1: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  96%|█████████▋| 965/1000 [40:11<01:31,  2.62s/it]

Error extracting text from http://www.hilltimes.com/news/2016/02/19/uss-positive-tweets-on-canadas-new-mission-against-isil-show-feds-increasing-combat-role/45339: 404 Client Error: Not Found for url: https://www.hilltimes.com/news/2016/02/19/uss-positive-tweets-on-canadas-new-mission-against-isil-show-feds-increasing-combat-role/45339


Processing URLs:  97%|█████████▋| 973/1000 [40:36<01:09,  2.57s/it]

Error extracting text from http://www.ibtimes.co.uk/having-climate-change-deniers-power-doesnt-imply-us-renewable-energy-industry-dead-1605249: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/having-climate-change-deniers-power-doesnt-imply-us-renewable-energy-industry-dead-1605249


Processing URLs:  98%|█████████▊| 975/1000 [41:40<08:21, 20.07s/it]

Error extracting text from http://www.mcclatchydc.com/latest-news/article127632934.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  98%|█████████▊| 976/1000 [41:44<06:03, 15.14s/it]

Error extracting text from http://english.aawsat.com/2016/08/article55357432/u-s-general-confirms-iraq-schedule-mosuls-retake: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/08/article55357432/u-s-general-confirms-iraq-schedule-mosuls-retake


Processing URLs:  98%|█████████▊| 979/1000 [41:46<02:17,  6.56s/it]

Error extracting text from http://www.kyivpost.com/article/content/ukraine-politics/vacher-too-soon-to-say-whether-imf-will-restart-lending-program-to-ukraine-415400.html: 403 Client Error: Forbidden for url: https://www.kyivpost.com/article/content/ukraine-politics/vacher-too-soon-to-say-whether-imf-will-restart-lending-program-to-ukraine-415400.html


Processing URLs:  98%|█████████▊| 984/1000 [41:52<00:40,  2.54s/it]

Error extracting text from http://www.news-journal.com/news/2015/aug/14/us-commerce-dept-eases-crude-oil-export-ban/: 404 Client Error: Not Found for url: https://www.news-journal.com/news/2015/aug/14/us-commerce-dept-eases-crude-oil-export-ban/


Processing URLs:  99%|█████████▉| 988/1000 [41:57<00:17,  1.47s/it]

Error extracting text from http://www.pravdareport.com/russia/politics/17-02-2015/129844-russia_returning_latin_america-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/russia/politics/17-02-2015/129844-russia_returning_latin_america-0/
URL filtered: https://twitter.com/boomaero


Processing URLs:  99%|█████████▉| 992/1000 [42:03<00:10,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-idUSKCN0WE0ZQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-idUSKCN0WE0ZQ


Processing URLs: 100%|█████████▉| 995/1000 [42:06<00:06,  1.39s/it]

Error extracting text from http://www.parliament.scot/parliamentarybusiness/Bills/576.aspx: 403 Client Error: Forbidden for url: https://www.parliament.scot/parliamentarybusiness/Bills/576.aspx


Processing URLs: 100%|██████████| 1000/1000 [42:15<00:00,  2.54s/it]


Error extracting text from http://pzfeed.com/breaking-news-roger-goodell-may-resign-soon/: 406 Client Error: Not Acceptable for url: http://pzfeed.com/breaking-news-roger-goodell-may-resign-soon/
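
Some of the failing URLs in the output above still carry HTML-escaped ampersands (`&amp;`) or trailing punctuation such as `).`, which suggests they were copied from HTML or running text without cleanup. The sketch below shows one way to normalise such links before they are requested. `clean_url` is a hypothetical helper, not a function from this notebook, and cleaning a URL does not guarantee the request will succeed: many of the failures logged here are 401/403/404 responses that have nothing to do with URL formatting.

```python
def clean_url(raw: str) -> str:
    """Normalise a scraped link (hypothetical helper, not part of the notebook).

    - replaces the literal HTML entity "&amp;" with "&"
    - strips trailing sentence punctuation picked up from prose
    - drops a trailing ")" only when it has no matching "("
    """
    # Replace only "&amp;" explicitly: a general HTML unescape could also
    # rewrite legitimate query parameters that happen to look like entities.
    url = raw.strip().replace("&amp;", "&")

    # Trailing ".", "," or ";" is almost never part of the intended URL.
    url = url.rstrip(".,;")

    # Remove unbalanced closing parentheses, e.g. "...1.84640033)."
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1].rstrip(".,;")

    return url


if __name__ == "__main__":
    examples = [
        "https://example.com/a?x=1&amp;y=2",
        "https://example.com/page-1.5).",
        "https://en.wikipedia.org/wiki/Foo_(bar)",
    ]
    for u in examples:
        print(clean_url(u))
```

Balanced parentheses are left alone so that Wikipedia-style links such as `.../Foo_(bar)` are not damaged.
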


Processing URLs:   0%|          | 3/1000 [00:06<25:29,  1.53s/it]  

Error extracting text from http://www.straitstimes.com/asia/se-asia/thailand-to-hold-elections-in-2017-junta-chief-confirms: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:   0%|          | 4/1000 [00:07<23:43,  1.43s/it]

Error extracting text from http://www.pewglobal.org/files/pdf/265.pdf: 404 Client Error: Not Found for url: https://www.pewresearch.org/global/files/pdf/265.pdf


Processing URLs:   1%|          | 6/1000 [00:09<18:41,  1.13s/it]

Error extracting text from https://www.nytimes.com/2017/06/16/business/eu-google-antitrust-fine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/16/business/eu-google-antitrust-fine.html


Processing URLs:   1%|          | 7/1000 [00:10<16:06,  1.03it/s]

Error extracting text from http://election.princeton.edu/2016/02/11/the-anti-trump-path-gets-very-narrow/: HTTPSConnectionPool(host='election.princeton.edu2016', port=443): Max retries exceeded with url: /02/11/the-anti-trump-path-gets-very-narrow/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe55f980>: Failed to resolve 'election.princeton.edu2016' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   1%|          | 9/1000 [00:15<26:58,  1.63s/it]

Error extracting text from http://www.reuters.com/article/2015/10/11/opec-oil-indonesia-idUSL5N1221K820151011#Zrc0ICIDitdUOtzk.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/11/opec-oil-indonesia-idUSL5N1221K820151011#Zrc0ICIDitdUOtzk.97


Processing URLs:   1%|          | 10/1000 [00:16<22:47,  1.38s/it]

Error extracting text from http://finance.yahoo.com/q/fc?s=CLG16.NYM+Futures+Chain: 404 Client Error: Not Found for url: https://finance.yahoo.com/q/fc?s=CLG16.NYM+Futures+Chain


Processing URLs:   1%|          | 11/1000 [00:17<19:59,  1.21s/it]

Error extracting text from https://www.lawgazette.co.uk/news/work-from-home-guidance-to-remain-until-june-at-least/5107534.article: 403 Client Error: Forbidden for url: https://www.lawgazette.co.uk/news/work-from-home-guidance-to-remain-until-june-at-least/5107534.article


Processing URLs:   1%|          | 12/1000 [00:17<15:03,  1.09it/s]

Error extracting text from http://www.sandiegouniontribune.com/news/science/sd-me-cyber-attack-20161021-story.html: 403 Client Error: Forbidden for url: https://www.sandiegouniontribune.com/news/science/sd-me-cyber-attack-20161021-story.html


Processing URLs:   2%|▏         | 17/1000 [00:23<21:27,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-climatechange-idUSKBN17J1DN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-climatechange-idUSKBN17J1DN


Processing URLs:   2%|▎         | 25/1000 [00:35<30:35,  1.88s/it]

Error extracting text from https://www.thelifeyoucansave.org/Where-to-Donate: 403 Client Error: Forbidden for url: https://www.thelifeyoucansave.org/Where-to-Donate


Processing URLs:   3%|▎         | 26/1000 [00:39<36:52,  2.27s/it]

Error extracting text from http://38north.org/2016/04/rcarlin040416/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:   3%|▎         | 27/1000 [00:40<31:56,  1.97s/it]

Error extracting text from http://www.japanbullet.com/auto-moto/toyota-already-has-1-900-pre-orders-for-mirai-fuel-cell-car: 404 Client Error: Not Found for url: http://www.japanbullet.com/auto-moto/toyota-already-has-1-900-pre-orders-for-mirai-fuel-cell-car


Processing URLs:   4%|▎         | 35/1000 [00:53<19:32,  1.21s/it]

Error extracting text from http://i2.kym-cdn.com/entries/icons/original/000/007/423/untitle.JPG: 403 Client Error: Forbidden for url: http://i2.kym-cdn.com/entries/icons/original/000/007/423/untitle.JPG
Error extracting text from http://www.reuters.com/article/2015/11/16/us-g20-turkey-russia-japan-idUSKCN0T518F20151116: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/16/us-g20-turkey-russia-japan-idUSKCN0T518F20151116


Processing URLs:   4%|▎         | 37/1000 [00:55<17:41,  1.10s/it]

Error extracting text from https://gulfnews.com/amp/opinion/editorials/with-strict-safety-measures-expo-enjoys-unprecedented-numbers-1.84640033).: 404 Client Error: Not Found for url: https://gulfnews.com/amp/opinion/editorials/with-strict-safety-measures-expo-enjoys-unprecedented-numbers-1.84640033).


Processing URLs:   4%|▍         | 39/1000 [00:59<26:09,  1.63s/it]

Error extracting text from http://www.promedmail.org/: 403 Client Error: Forbidden for url: http://promedmail.org/


Processing URLs:   4%|▍         | 43/1000 [01:06<25:36,  1.61s/it]

Error extracting text from http://www.newsweek.com/brexit-eu-sovereignty-argument-myth-457816: 403 Client Error: Forbidden for url: https://www.newsweek.com/brexit-eu-sovereignty-argument-myth-457816


Processing URLs:   4%|▍         | 44/1000 [01:07<18:56,  1.19s/it]

Error extracting text from https://www.nytimes.com/2019/01/06/us/politics/joe-biden-2020-president.html?smid=tw-nytimes&smtyp=cur: 403 Client Error: Forbidden for url: https://www.nytimes.com/2019/01/06/us/politics/joe-biden-2020-president.html?smid=tw-nytimes&smtyp=cur


Processing URLs:   5%|▍         | 48/1000 [01:10<13:17,  1.19it/s]

Error extracting text from https://www.nytimes.com/2018/01/24/world/asia/pakistan-us-drone-haqqani-network.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/24/world/asia/pakistan-us-drone-haqqani-network.html


Processing URLs:   5%|▌         | 51/1000 [01:14<16:39,  1.05s/it]

Error extracting text from http://www.tradingeconomics.com/france/core-inflation-rate/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/france/core-inflation-rate/forecast


Processing URLs:   5%|▌         | 53/1000 [01:16<17:20,  1.10s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/india-to-boycott-china-summit-amid-kashmir-concerns-8844892: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/india-to-boycott-china-summit-amid-kashmir-concerns-8844892


Processing URLs:   6%|▌         | 57/1000 [01:21<16:32,  1.05s/it]

Error extracting text from http://thehill.com/homenews/administration/347931-cia-wary-of-pompeos-interest-in-agency-tied-to-russia-investigation: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/347931-cia-wary-of-pompeos-interest-in-agency-tied-to-russia-investigation/
URL filtered: https://www.facebook.com/help/572838089565953?helpref=faq_content


Processing URLs:   6%|▌         | 61/1000 [01:31<27:16,  1.74s/it]

Error extracting text from http://news.yahoo.com/census-foreigners-burundi-sparks-fears-192504218.html: 404 Client Error: Not Found for url: http://news.yahoo.com/census-foreigners-burundi-sparks-fears-192504218.html


Processing URLs:   7%|▋         | 67/1000 [01:41<21:31,  1.38s/it]

Error extracting text from http://www.japantimes.co.jp/news/2015/11/13/world/venezuela-power-grows-cilia-flores-evita-like-first-lady/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/11/13/world/venezuela-power-grows-cilia-flores-evita-like-first-lady/


Processing URLs:   7%|▋         | 69/1000 [01:42<13:58,  1.11it/s]

Error extracting text from http://apps.azsos.gov/election/2012/General/Canvass2012GE.pdf: 403 Client Error: Forbidden for url: https://apps.azsos.gov/election/2012/General/Canvass2012GE.pdf


Processing URLs:   7%|▋         | 72/1000 [01:43<07:33,  2.04it/s]

Error extracting text from http://www.reuters.com/article/us-britain-scotland-idUSKBN15N01J?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-scotland-idUSKBN15N01J?il=0


Processing URLs:   8%|▊         | 75/1000 [01:46<14:22,  1.07it/s]

Error extracting text from https://www.nytimes.com/2021/06/06/us/politics/joe-manchin-op-ed.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/06/us/politics/joe-manchin-op-ed.html


Processing URLs:   8%|▊         | 83/1000 [01:58<16:13,  1.06s/it]

Error extracting text from http://nyti.ms/2gr0gYW: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/01/us/politics/iran-nuclear-sanctions-senate.html


Processing URLs:   8%|▊         | 84/1000 [02:10<1:09:23,  4.55s/it]

Error extracting text from http://focus-fen.net/news/2016/05/04/405495/bulgaria-provides-financing-for-extra-measures-for-schengen-accession.html: HTTPConnectionPool(host='focus-fen.net', port=80): Max retries exceeded with url: /news/2016/05/04/405495/bulgaria-provides-financing-for-extra-measures-for-schengen-accession.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3021da510>: Failed to resolve 'focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/10/11/us-eurozone-greece-corner-analysis-idUSKCN0S507X20151011: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/11/us-eurozone-greece-corner-analysis-idUSKCN0S507X20151011


Processing URLs:   9%|▊         | 86/1000 [02:11<38:13,  2.51s/it]  

Error extracting text from http://www.etf.com/sections/etf-industry-perspective/vanguard-bitcoin-presents-quandary: 403 Client Error: Forbidden for url: https://www.etf.com/sections/etf-industry-perspective/vanguard-bitcoin-presents-quandary


Processing URLs:   9%|▉         | 88/1000 [02:11<22:33,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-usa-stocks-idUSKBN18W1HJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks-idUSKBN18W1HJ


Processing URLs:   9%|▉         | 89/1000 [02:13<24:05,  1.59s/it]

Error extracting text from http://www.ibtimes.com/suu-kyi-may-settle-foreign-minister-post-break-impasse-over-myanmar-presidency-2327744: 403 Client Error: Forbidden for url: https://www.ibtimes.com/suu-kyi-may-settle-foreign-minister-post-break-impasse-over-myanmar-presidency-2327744


Processing URLs:   9%|▉         | 90/1000 [02:14<20:17,  1.34s/it]

Error extracting text from https://www.cdc.gov/about/history/sars/timeline.htm: 404 Client Error: Not Found for url: https://www.cdc.gov/about/history/sars/timeline.htm


Processing URLs:   9%|▉         | 93/1000 [02:15<10:28,  1.44it/s]

Error extracting text from https://postimg.org/image/70la95de7/: HTTPSConnectionPool(host='postimg.org', port=443): Max retries exceeded with url: /image/70la95de7/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3028e1310>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  10%|▉         | 95/1000 [02:17<13:32,  1.11it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0UQ06F20160112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0UQ06F20160112


Processing URLs:  10%|▉         | 98/1000 [02:23<20:54,  1.39s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-24/opec-saudi-oil-production-upsets-venezuela-but-cartel-endures


Processing URLs:  10%|█         | 101/1000 [02:23<11:45,  1.27it/s]

URL filtered: http://www.reuters.com/article/us-facebook-journalismproject-idUSKBN14V1WE?il=0


Processing URLs:  10%|█         | 103/1000 [02:25<11:24,  1.31it/s]

Error extracting text from https://blog.boomsupersonic.com/how-technology-is-solving-one-of-the-biggest-supersonic-design-challenges-visibility-e269789f411a: 403 Client Error: Forbidden for url: https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Fblog.boomsupersonic.com%2Fhow-technology-is-solving-one-of-the-biggest-supersonic-design-challenges-visibility-e269789f411a


Processing URLs:  11%|█         | 109/1000 [02:46<39:55,  2.69s/it]

Error extracting text from https://fuelcellsworks.com/news/first-hydrogen-refueling-station-in-saratoga-is-now-open: 404 Client Error: Not Found for url: https://fuelcellsworks.com/news/first-hydrogen-refueling-station-in-saratoga-is-now-open


Processing URLs:  11%|█▏        | 113/1000 [02:55<30:37,  2.07s/it]

Error extracting text from http://eng.mod.gov.cn/news/2017-10/23/content_4795419.htm: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/news/2017-10/23/content_4795419.htm


Processing URLs:  11%|█▏        | 114/1000 [02:58<33:51,  2.29s/it]

Error extracting text from http://www.isn.ethz.ch/Digital-Library/Articles/Detail/?lng=en&amp;id=194072: 404 Client Error: Not found UA for url: https://css.ethz.ch/en/services.html


Processing URLs:  12%|█▏        | 116/1000 [02:59<19:54,  1.35s/it]

Error extracting text from http://www.wsj.com/articles/china-says-it-warned-u-s-warship-in-south-china-sea-1445928223: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-says-it-warned-u-s-warship-in-south-china-sea-1445928223


Processing URLs:  12%|█▏        | 119/1000 [03:05<22:25,  1.53s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-idUSKCN0ZH51I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-idUSKCN0ZH51I


Processing URLs:  12%|█▏        | 122/1000 [03:07<13:13,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/07/18/world/middleeast/saudi-arabia-mohammed-bin-nayef-mohammed-bin-salman.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/18/world/middleeast/saudi-arabia-mohammed-bin-nayef-mohammed-bin-salman.html


Processing URLs:  13%|█▎        | 127/1000 [03:15<20:17,  1.39s/it]

Error extracting text from http://business.financialpost.com/investing/marijuana-stocks-are-smokin-today-on-report-canadas-legalization-bill-is-in-the-works: 403 Client Error: Forbidden for url: https://financialpost.com/investing/marijuana-stocks-are-smokin-today-on-report-canadas-legalization-bill-is-in-the-works


Processing URLs:  13%|█▎        | 129/1000 [03:17<13:36,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-usa-court-immigration-idUSKCN0Z91P4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-immigration-idUSKCN0Z91P4


Processing URLs:  13%|█▎        | 130/1000 [03:18<17:12,  1.19s/it]

Error extracting text from https://in.reuters.com/article/us-britain-politics-poll/uk-pm-johnson-could-lose-his-seat-and-majority-at-next-election-poll-idUSKBN2970MB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
URL filtered: https://twitter.com/eAsiaMediaHub/status/1424338796095557639
URL filtered: https://twitter.com/jaredlholt/status/1348274040222445570


Processing URLs:  14%|█▍        | 138/1000 [03:28<23:45,  1.65s/it]

Error extracting text from http://www.ibtimes.co.uk/could-donald-trump-really-pull-out-paris-agreement-1591812: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/could-donald-trump-really-pull-out-paris-agreement-1591812


Processing URLs:  14%|█▍        | 141/1000 [03:32<16:30,  1.15s/it]

Error extracting text from https://www.timesofisrael.com/hamas-threatens-to-renew-fighting-if-qatari-funds-dont-enter-gaza-next-week/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/hamas-threatens-to-renew-fighting-if-qatari-funds-dont-enter-gaza-next-week/
Error extracting text from http://www.nytimes.com/2016/07/14/business/media/time-inc-names-alan-murray-as-chief-content-officer.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/14/business/media/time-inc-names-alan-murray-as-chief-content-officer.html


Processing URLs:  14%|█▍        | 142/1000 [03:33<15:09,  1.06s/it]

Error extracting text from http://ca.reuters.com/article/topNews/idCAKCN0WV2O4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  14%|█▍        | 145/1000 [03:37<20:06,  1.41s/it]

Error extracting text from http://en.trend.az/iran/politics/2492073.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2492073.html


Processing URLs:  15%|█▍        | 149/1000 [03:58<1:25:30,  6.03s/it]

URL filtered: https://www.youtube.com/watch?v=oX2Gepjju0Y#t=11m24s


Processing URLs:  15%|█▌        | 151/1000 [04:58<4:01:46, 17.09s/it]

Error extracting text from https://www.betfair.com/exchange/politics/event?id=27542456: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/politics/event?id=27542456 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3042e5010>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))
URL filtered: https://www.wired.com/story/what-we-know-and-dont-know-about-facebook-trump-and-russia/


Processing URLs:  15%|█▌        | 153/1000 [04:59<2:26:36, 10.39s/it]

Error extracting text from http://www.fin4dev.org/2015/09/02/buying-down-development-the-case-of-polio/: HTTPConnectionPool(host='www.fin4dev.org', port=80): Max retries exceeded with url: /2015/09/02/buying-down-development-the-case-of-polio/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff823500>: Failed to resolve 'www.fin4dev.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 154/1000 [04:59<1:54:52,  8.15s/it]

Error extracting text from https://www.nytimes.com/2017/08/24/opinion/canada-legalize-marijuana.html?mcubz=3: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/24/opinion/canada-legalize-marijuana.html?mcubz=3


Processing URLs:  16%|█▌        | 157/1000 [05:04<59:55,  4.27s/it]  

URL filtered: http://www.bloombergview.com/articles/2016-01-06/the-5-stages-of-reacting-to-a-north-korea-nuke-test


Processing URLs:  16%|█▌        | 160/1000 [05:06<32:23,  2.31s/it]

URL filtered: http://www.bloomberg.com/quote/USDCNY:CUR


Processing URLs:  16%|█▋        | 165/1000 [05:08<10:59,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-korea-north-usa-idUSKBN12F0S2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-korea-north-usa-idUSKBN12F0S2
Error extracting text from http://www.nytimes.com/2016/04/26/sports/tom-brady-deflategate-new-england-patriots-suspension-reinstated.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/26/sports/tom-brady-deflategate-new-england-patriots-suspension-reinstated.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  17%|█▋        | 166/1000 [05:09<11:47,  1.18it/s]

Error extracting text from http://awdnews.com/political/official-disclosure-of-isis-turkish-relationship: HTTPConnectionPool(host='awdnews.com', port=80): Max retries exceeded with url: /political/official-disclosure-of-isis-turkish-relationship (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ff8214f0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  17%|█▋        | 174/1000 [05:22<15:11,  1.10s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/12/gitrep-11may16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/12/gitrep-11may16pm/


Processing URLs:  18%|█▊        | 175/1000 [05:23<13:41,  1.00it/s]

Error extracting text from https://www.advancedligo.mit.edu/summary.html: HTTPSConnectionPool(host='www.advancedligo.mit.edu', port=443): Max retries exceeded with url: /summary.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  18%|█▊        | 182/1000 [05:35<17:36,  1.29s/it]

Error extracting text from http://www.oilandgas360.com/sub-50-oil-is-giving-opec-countries-a-sharp-pain-who-will-be-first-to-cave/: 403 Client Error: Forbidden for url: http://www.oilandgas360.com/sub-50-oil-is-giving-opec-countries-a-sharp-pain-who-will-be-first-to-cave/


Processing URLs:  18%|█▊        | 185/1000 [05:41<21:22,  1.57s/it]

Error extracting text from https://finance.yahoo.com/video/why-boeing-starliner-test-launch-083000874.html: 400 Client Error: Invalid HTTP Request for url: https://finance.yahoo.com/video/why-boeing-starliner-test-launch-083000874.html


Processing URLs:  19%|█▊        | 186/1000 [05:43<22:43,  1.68s/it]

Error extracting text from http://www.ibtimes.com/did-russia-kill-ukraines-electricity-cyberattack-linked-power-outage-has-global-2249900: 403 Client Error: Forbidden for url: https://www.ibtimes.com/did-russia-kill-ukraines-electricity-cyberattack-linked-power-outage-has-global-2249900


Processing URLs:  19%|█▊        | 187/1000 [05:44<18:22,  1.36s/it]

Error extracting text from https://www.c-span.org/video/?421782-1/energy-secretary-nominee-rick-perry-testifies-confirmation-hearing: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?421782-1/energy-secretary-nominee-rick-perry-testifies-confirmation-hearing


Processing URLs:  19%|█▉        | 192/1000 [05:49<16:43,  1.24s/it]

URL filtered: https://www.youtube.com/watch?v=9l5TrAXScbE


Processing URLs:  20%|█▉        | 197/1000 [06:00<24:55,  1.86s/it]

Error extracting text from https://reut.rs/3aPQye1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/pounds-post-brexit-calm-may-face-scottish-independence-test-2021-04-29/


Processing URLs:  20%|██        | 202/1000 [07:07<4:11:14, 18.89s/it]

Error extracting text from http://www.ebay.com/sch/i.html?_from=R40&amp;_trksid=m570.l1313&amp;_nkw=boat&amp;_sacat=0: HTTPSConnectionPool(host='www.ebay.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://www.youtube.com/watch?v=5znh58WITU8&amp;list=RDGMEMJQXQAmqrnmK1SEjY_rKBGA&amp;index=3


Processing URLs:  20%|██        | 205/1000 [07:08<1:44:06,  7.86s/it]

Error extracting text from http://thehill.com/policy/finance/258091-white-house-gop-near-two-year-budget-deal: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/258091-white-house-gop-near-two-year-budget-deal/
Error extracting text from http://www.nytimes.com/2016/03/09/world/middleeast/irans-revolutionary-guards-test-nationwide-ballistic-missiles.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/09/world/middleeast/irans-revolutionary-guards-test-nationwide-ballistic-missiles.html


Processing URLs:  21%|██        | 206/1000 [07:10<1:24:35,  6.39s/it]

Error extracting text from https://www.bostonglobe.com/news/nation/2017/04/06/where-are-trump-russia-investigations/Q0qmYnJ31mLphz8VoA6ZUI/story.html: 404 Client Error: Not Found for url: https://www.bostonglobe.com/news/nation/2017/04/06/where-are-trump-russia-investigations/Q0qmYnJ31mLphz8VoA6ZUI/story.html


Processing URLs:  21%|██        | 207/1000 [07:11<1:03:51,  4.83s/it]

Error extracting text from http://www.insightonconflict.org/2016/06/burundi-on-the-brink-crisis-in-central-africa/: HTTPConnectionPool(host='www.insightonconflict.org', port=80): Max retries exceeded with url: /2016/06/burundi-on-the-brink-crisis-in-central-africa/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303887620>: Failed to resolve 'www.insightonconflict.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  21%|██▏       | 214/1000 [07:17<14:33,  1.11s/it]

Error extracting text from https://www.france24.com/en/live-news/20210411-myanmar-s-post-coup-civilian-death-toll-climbs-past-700: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210411-myanmar-s-post-coup-civilian-death-toll-climbs-past-700
Error extracting text from https://www.reuters.com/world/asia-pacific/poll-shows-60-japanese-want-games-cancelled-2021-05-10/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/poll-shows-60-japanese-want-games-cancelled-2021-05-10/


Processing URLs:  22%|██▏       | 216/1000 [07:19<11:09,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN19502V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN19502V


Processing URLs:  22%|██▎       | 225/1000 [07:31<17:18,  1.34s/it]

URL filtered: https://www.youtube.com/watch?v=6M8szlSa-8o


Processing URLs:  23%|██▎       | 233/1000 [07:52<31:08,  2.44s/it]

Error extracting text from http://www.wsj.com/articles/sea-dispute-heats-up-before-asia-summit-1473073718: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sea-dispute-heats-up-before-asia-summit-1473073718


Processing URLs:  24%|██▎       | 236/1000 [07:55<17:18,  1.36s/it]

Error extracting text from http://www.autonews.com/article/20160501/GLOBAL03/305029967/skepticism-surrounds-china-ev-boom: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20160501/GLOBAL03/305029967/skepticism-surrounds-china-ev-boom


Processing URLs:  24%|██▍       | 240/1000 [08:03<22:33,  1.78s/it]



Processing URLs:  25%|██▍       | 247/1000 [08:13<16:46,  1.34s/it]

Error extracting text from http://m.theage.com.au/business/world-business/october-payrolls-surge-jobless-rate-falls-to-sevenyear-low-20151106-gkt5ci.html: 404 Client Error: Not Found for url: https://www.theage.com.au/business/world-business/october-payrolls-surge-jobless-rate-falls-to-sevenyear-low-20151106-gkt5ci.html


Processing URLs:  25%|██▌       | 251/1000 [08:21<19:19,  1.55s/it]

Error extracting text from https://jewishinsider.com/2021/03/michael-waltz-anthony-brown-bipartisan-iran-letter/: 403 Client Error: Forbidden for url: https://jewishinsider.com/2021/03/michael-waltz-anthony-brown-bipartisan-iran-letter/


Processing URLs:  26%|██▌       | 255/1000 [08:27<20:44,  1.67s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/us-senators-eye-sanctions-against-iran-for-missile-development/articleshow/57235005.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/us-senators-eye-sanctions-against-iran-for-missile-development/articleshow/57235005.cms
URL filtered: http://www.theverge.com/2017/2/6/14520172/facebook-fake-news-filter-france-election


Processing URLs:  26%|██▌       | 257/1000 [08:28<11:54,  1.04it/s]

Error extracting text from https://www.nytimes.com/2017/02/22/world/europe/russia-fake-news-media-foreign-ministry-.html?emc=edit_ee_20170223&amp;nl=todaysheadlines-europe&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/22/world/europe/russia-fake-news-media-foreign-ministry-.html?emc=edit_ee_20170223&amp;nl=todaysheadlines-europe&amp;nlid=77825025


Processing URLs:  26%|██▌       | 260/1000 [08:35<21:59,  1.78s/it]

Error extracting text from http://www.greeknewsonline.com/greece-and-lenders-reach-agreement-in-principal/: 404 Client Error: Not Found for url: http://www.greeknewsonline.com/greece-and-lenders-reach-agreement-in-principal/


Processing URLs:  26%|██▌       | 262/1000 [08:36<13:34,  1.10s/it]

Error extracting text from https://www.wsj.com/articles/opec-reaches-compromise-with-u-a-e-over-oil-production-standoff-11626264218: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-reaches-compromise-with-u-a-e-over-oil-production-standoff-11626264218


Processing URLs:  26%|██▋       | 264/1000 [08:45<32:56,  2.69s/it]

URL filtered: https://twitter.com/ahmetsyayla/status/757385299181928448/photo/1
URL filtered: https://www.youtube.com/watch?v=vEZ_Ernu3xs&amp;t=58s


Processing URLs:  27%|██▋       | 268/1000 [08:47<16:37,  1.36s/it]

Error extracting text from http://www.elevenmyanmar.com/politics/nld-impose-shan-and-rakhine-chief-ministers: 404 Client Error: Not Found for url: https://www.elevenmyanmar.com/politics/nld-impose-shan-and-rakhine-chief-ministers


Processing URLs:  27%|██▋       | 270/1000 [08:48<11:20,  1.07it/s]

Error extracting text from http://www.nytimes.com/2015/11/13/business/international/greece-general-strike.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/13/business/international/greece-general-strike.html?_r=0


Processing URLs:  27%|██▋       | 274/1000 [08:54<13:48,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-usa-court-gorsuch-idUSKBN16T12G?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-gorsuch-idUSKBN16T12G?il=0


Processing URLs:  28%|██▊       | 277/1000 [09:00<24:41,  2.05s/it]

Error extracting text from https://www.cnn.com/2021/06/07/politics/covid-lab-leak-theory-classified-report/index.html0: 503 Server Error: Max restarts limit reached for url: https://edition.cnn.com/2021/06/07/politics/covid-lab-leak-theory-classified-report/index.html0


Processing URLs:  28%|██▊       | 281/1000 [09:15<32:34,  2.72s/it]

Error extracting text from http://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y2WB20151209#Ocr9TC2SaJXjcPdv.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y2WB20151209#Ocr9TC2SaJXjcPdv.97


Processing URLs:  28%|██▊       | 283/1000 [09:16<21:14,  1.78s/it]

Error extracting text from https://www.wsj.com/articles/for-oil-investors-early-faith-in-a-rally-begins-to-wane-1497048280: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-oil-investors-early-faith-in-a-rally-begins-to-wane-1497048280


Processing URLs:  28%|██▊       | 285/1000 [09:19<18:25,  1.55s/it]

Error extracting text from http://Www.wsj.com: 403 Client Error: Forbidden for url: https://www.wsj.com/


Processing URLs:  29%|██▉       | 288/1000 [09:22<13:18,  1.12s/it]

Error extracting text from http://en.trend.az/world/turkey/2444856.html: 404 Client Error: Not Found for url: https://www.trend.az/world/turkey/2444856.html
Error extracting text from http://www.arabnews.com/news/891361: 403 Client Error: Forbidden for url: https://www.arabnews.com/news/891361


Processing URLs:  29%|██▉       | 291/1000 [09:24<10:04,  1.17it/s]

Error extracting text from http://www.nytimes.com/2016/01/13/world/asia/north-korea-faked-test-video-group-says.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/13/world/asia/north-korea-faked-test-video-group-says.html?_r=0


Processing URLs:  29%|██▉       | 294/1000 [09:36<26:28,  2.25s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-03-06/calling-a-top-in-stocks-has-become-a-cottage-industry


Processing URLs:  30%|███       | 300/1000 [09:42<13:45,  1.18s/it]

Error extracting text from https://www.nytimes.com/2017/12/16/us/politics/pentagon-program-ufo-harry-reid.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/16/us/politics/pentagon-program-ufo-harry-reid.html


Processing URLs:  30%|███       | 301/1000 [09:43<11:54,  1.02s/it]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2405215&amp;CategoryId=12393: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2405215&amp;CategoryId=12393


Processing URLs:  31%|███       | 309/1000 [10:58<52:04,  4.52s/it]

Error extracting text from https://www.state.gov/documents/organization/258249.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/


Processing URLs:  31%|███       | 311/1000 [10:59<29:56,  2.61s/it]

Error extracting text from http://www.latimes.com/local/lanow/la-me-bakersfield-baker-20180207-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/local/lanow/la-me-bakersfield-baker-20180207-story.html


Processing URLs:  32%|███▏      | 315/1000 [11:04<17:06,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0VQ160: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0VQ160


Processing URLs:  32%|███▏      | 316/1000 [11:05<14:24,  1.26s/it]

URL filtered: https://twitter.com/jilldlawrence/status/748942351838638080


Processing URLs:  32%|███▏      | 322/1000 [11:15<19:43,  1.75s/it]

Error extracting text from https://www.spglobal.com/marketintelligence/en/campaigns/leveraged-loan: 403 Client Error: Forbidden for url: https://pitchbook.com/leveraged-commentary-data/leveraged-loan


Processing URLs:  32%|███▎      | 325/1000 [11:17<11:27,  1.02s/it]

Error extracting text from http://www.nytimes.com/2016/01/05/world/middleeast/bahrain-sudan-united-arab-emirates-join-diplomatic-feud-against-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/05/world/middleeast/bahrain-sudan-united-arab-emirates-join-diplomatic-feud-against-iran.html
URL filtered: https://www.bloomberg.com/news/articles/2017-01-15/trump-wants-to-hold-summit-with-putin-in-iceland-sunday-times


Processing URLs:  33%|███▎      | 330/1000 [11:21<07:44,  1.44it/s]

Error extracting text from http://www.latimes.com/nation/politics/trailguide/la-na-trailguide-updates-we-will-build-the-wall-trump-to-1485314982-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/politics/trailguide/la-na-trailguide-updates-we-will-build-the-wall-trump-to-1485314982-htmlstory.html


Processing URLs:  33%|███▎      | 334/1000 [11:36<21:33,  1.94s/it]

Error extracting text from https://www.wsj.com/articles/italian-prime-minister-conte-resigns-triggering-search-for-new-government-to-tackle-covid-19-recession-11611661954: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/italian-prime-minister-conte-resigns-triggering-search-for-new-government-to-tackle-covid-19-recession-11611661954


Processing URLs:  34%|███▎      | 335/1000 [11:37<17:56,  1.62s/it]

Error extracting text from http://www.hybridcars.com/toyota-prius-sets-1-million-sales-green-car-benchmark-29731/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/toyota-prius-sets-1-million-sales-green-car-benchmark-29731/


Processing URLs:  34%|███▎      | 337/1000 [11:40<16:54,  1.53s/it]

Error extracting text from https://apnews.com/article/africa-kenya-ethiopia-abiy-ahmed-jeffrey-feltman-16075161e2badda09eeb50ca07a43840: 404 Client Error: Not Found for url: https://apnews.com/article/africa-kenya-ethiopia-abiy-ahmed-jeffrey-feltman-16075161e2badda09eeb50ca07a43840


Processing URLs:  34%|███▍      | 338/1000 [11:41<14:28,  1.31s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/world-on-edge-as-us-federal-reserve-weighs-a-rate-hike/articleshow/48941332.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/world-on-edge-as-us-federal-reserve-weighs-a-rate-hike/articleshow/48941332.cms


Processing URLs:  34%|███▍      | 340/1000 [11:41<08:44,  1.26it/s]

Error extracting text from https://www.timesofisrael.com/netanyahu-defends-excellent-coalition-whip-amid-corruption-investigation/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/netanyahu-defends-excellent-coalition-whip-amid-corruption-investigation/


Processing URLs:  34%|███▍      | 344/1000 [12:05<1:04:36,  5.91s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/french-troops-to-increase-for-the-1st-time-in-10-years/2016/01/14/288d463c-baf3-11e5-85cd-5ad59bc19432_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/french-troops-to-increase-for-the-1st-time-in-10-years/2016/01/14/288d463c-baf3-11e5-85cd-5ad59bc19432_story.html


Processing URLs:  35%|███▍      | 348/1000 [12:25<41:43,  3.84s/it]

Error extracting text from https://thehill.com/homenews/senate/556483-parliamentarian-democrats-only-get-one-more-chance-to-sidestep-gop-this-year: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/556483-parliamentarian-democrats-only-get-one-more-chance-to-sidestep-gop-this-year/


Processing URLs:  35%|███▌      | 350/1000 [12:31<34:42,  3.20s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/03/11/world/middleeast/ap-un-united-nations-us-iran.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/03/11/world/middleeast/ap-un-united-nations-us-iran.html?_r=0


Processing URLs:  35%|███▌      | 352/1000 [12:36<33:01,  3.06s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-15/bonds-back-baring-asset-in-upending-fed-inflation-forecasts


Processing URLs:  36%|███▌      | 357/1000 [12:41<15:14,  1.42s/it]

Error extracting text from http://seekingalpha.com/article/3916616-volkswagen-scandal-harbinger-corporate-cover-revelations-investors-beware: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3916616-volkswagen-scandal-harbinger-corporate-cover-revelations-investors-beware


Processing URLs:  36%|███▋      | 363/1000 [12:52<17:12,  1.62s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-un-idUSKBN16S287: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-un-idUSKBN16S287


Processing URLs:  37%|███▋      | 372/1000 [13:20<34:55,  3.34s/it]

Error extracting text from http://en.trend.az/iran/nuclearp/2451494.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/nuclearp/2451494.html


Processing URLs:  38%|███▊      | 377/1000 [13:32<28:01,  2.70s/it]

Error extracting text from http://www.ibtimes.co.uk/brexit-hsbc-forecasts-stagflation-uk-slower-growth-higher-inflation-1567377: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/brexit-hsbc-forecasts-stagflation-uk-slower-growth-higher-inflation-1567377


Processing URLs:  38%|███▊      | 379/1000 [13:36<23:07,  2.23s/it]

Error extracting text from https://www.wsj.com/articles/white-house-looks-at-scaling-back-u-s-military-presence-in-afghanistan-1501426803: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/white-house-looks-at-scaling-back-u-s-military-presence-in-afghanistan-1501426803


Processing URLs:  38%|███▊      | 384/1000 [13:47<21:04,  2.05s/it]

URL filtered: https://www.youtube.com/watch?v=pl3vxEudif8
Error extracting text from https://www.washingtontimes.com/news/2017/nov/21/vladimir-putin-tells-donald-trump-bashar-assad-fav/: 403 Client Error: Forbidden for url: https://www.washingtontimes.com/news/2017/nov/21/vladimir-putin-tells-donald-trump-bashar-assad-fav/


Processing URLs:  39%|███▉      | 390/1000 [13:56<19:09,  1.88s/it]

Error extracting text from http://www.elmundo.com.ve/noticias/economia/politicas-publicas/venezuela-mantiene-record-positivo-en-sus-pagos-de.aspx#ixzz45nPuzsvT: HTTPConnectionPool(host='www.elmundo.com.ve', port=80): Max retries exceeded with url: /noticias/economia/politicas-publicas/venezuela-mantiene-record-positivo-en-sus-pagos-de.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303321400>: Failed to resolve 'www.elmundo.com.ve' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  39%|███▉      | 391/1000 [13:56<15:08,  1.49s/it]

Error extracting text from https://www.nilc.org/2016/08/25/ny-dreamer-challenges-injunction/: 403 Client Error: Forbidden for url: https://www.nilc.org/2016/08/25/ny-dreamer-challenges-injunction/


Processing URLs:  40%|███▉      | 399/1000 [14:08<12:17,  1.23s/it]

Error extracting text from https://www.reuters.com/article/venezuela-bonds-isda/isda-to-reconvene-to-discuss-venezuelas-pdvsa-bond-on-tuesday-idUSL1N1NJ18W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-bonds-isda/isda-to-reconvene-to-discuss-venezuelas-pdvsa-bond-on-tuesday-idUSL1N1NJ18W


Processing URLs:  40%|████      | 401/1000 [14:14<17:03,  1.71s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://zh.clicrbs.com.br/rs/noticias/noticia/2016/03/quero-que-o-pt-saia-da-inhaca-em-que-se-meteu-diz-olivio-dutra-4991137.html&amp;usg=ALkJrhjXAqSdxtViJ_aW9c_Q5X-QAvJ6mg: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://zh.clicrbs.com.br/rs/noticias/noticia/2016/03/quero-que-o-pt-saia-da-inhaca-em-que-se-meteu-diz-olivio-dutra-4991137.html&amp;usg=ALkJrhjXAqSdxtViJ_aW9c_Q5X-QAvJ6mg


Processing URLs:  40%|████      | 403/1000 [14:17<15:00,  1.51s/it]

Error extracting text from http://www.reuters.com/article/us-philippines-usa-exercises-idUSKCN1240NZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-usa-exercises-idUSKCN1240NZ


Processing URLs:  40%|████      | 405/1000 [14:18<10:55,  1.10s/it]

Error extracting text from http://www.nytimes.com/2016/09/23/world/middleeast/with-boeing-deal-americans-are-coming-to-iran.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/23/world/middleeast/with-boeing-deal-americans-are-coming-to-iran.html?_r=0


Processing URLs:  41%|████      | 408/1000 [14:21<07:37,  1.29it/s]

Error extracting text from http://www.latimes.com/politics/la-na-pol-roy-moore-democrats-20171122-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-roy-moore-democrats-20171122-story.html


Processing URLs:  41%|████▏     | 413/1000 [14:31<14:36,  1.49s/it]

Error extracting text from https://www.nytimes.com/2017/05/08/us/politics/donald-trump-afghanistan-troops-taliban-stalemate.html?mwrsm=Email: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/08/us/politics/donald-trump-afghanistan-troops-taliban-stalemate.html?mwrsm=Email


Processing URLs:  42%|████▏     | 416/1000 [14:35<14:46,  1.52s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/08/27/774809/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/08/27/774809/story.html


Processing URLs:  42%|████▏     | 418/1000 [14:42<24:22,  2.51s/it]

Error extracting text from http://www.ipsos.pe/sites/default/files/opinion_data/OpinionData030416.pdf: 404 Client Error: Not Found for url: https://www.ipsos.com/es-pe/sites/default/files/opinion_data/OpinionData030416.pdf


Processing URLs:  42%|████▏     | 422/1000 [14:44<10:17,  1.07s/it]

URL filtered: https://www.instagram.com/p/CPSkeFZNTag/
Error extracting text from http://www.straitstimes.com/asia/east-asia/overfishing-and-political-disputes-send-some-south-china-sea-fish-close-to-extinction: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  42%|████▏     | 424/1000 [14:57<28:35,  2.98s/it]

Error extracting text from http://www.newsweek.com/un-investigators-urge-iran-free-washington-posts-jason-rezaian-and-other-393118: 403 Client Error: Forbidden for url: https://www.newsweek.com/un-investigators-urge-iran-free-washington-posts-jason-rezaian-and-other-393118


Processing URLs:  43%|████▎     | 427/1000 [14:58<13:11,  1.38s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-minister-idUSKBN16200X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-minister-idUSKBN16200X
Error extracting text from http://www.reuters.com/article/us-northkorea-cyberattack-sanctions-idUSKBN0KB16U20150104: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-cyberattack-sanctions-idUSKBN0KB16U20150104


Processing URLs:  43%|████▎     | 430/1000 [15:02<11:44,  1.24s/it]

Error extracting text from https://www.arabnews.com/node/1791031/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1791031/middle-east


Processing URLs:  43%|████▎     | 432/1000 [15:03<08:27,  1.12it/s]

Error extracting text from http://english.cntv.cn/2016/01/07/VIDEAQ02RGCRG5BmHjJ8T8zT160107.shtml: HTTPConnectionPool(host='english.cntv.cn', port=80): Max retries exceeded with url: /2016/01/07/VIDEAQ02RGCRG5BmHjJ8T8zT160107.shtml (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301993380>: Failed to resolve 'english.cntv.cn' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▎     | 435/1000 [16:05<2:11:14, 13.94s/it]

Error extracting text from http://syrianperspective.com/2016/03/aleppo-more-sauditurk-blunders-at-tal-al-ays-syrian-army-kills-64-rodents-from-jund-al-aqsaa.html: HTTPConnectionPool(host='syrianperspective.com', port=80): Max retries exceeded with url: /2016/03/aleppo-more-sauditurk-blunders-at-tal-al-ays-syrian-army-kills-64-rodents-from-jund-al-aqsaa.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3019900b0>, 'Connection to syrianperspective.com timed out. (connect timeout=60)'))


Processing URLs:  44%|████▎     | 436/1000 [16:05<1:41:15, 10.77s/it]

Error extracting text from http://www.tradingeconomics.com/egypt/gdp-per-capita: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/egypt/gdp-per-capita


Processing URLs:  44%|████▎     | 437/1000 [16:06<1:15:56,  8.09s/it]

Error extracting text from https://www.nytimes.com/2018/01/21/world/europe/germany-coalition-talks-spd.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/21/world/europe/germany-coalition-talks-spd.html


Processing URLs:  44%|████▍     | 445/1000 [16:16<16:53,  1.83s/it]

Error extracting text from https://www.google.ca/amp/www.iraqinews.com/iraq-war/isis-declares-emergency-mosul-waves-assassinations/amp/?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/isis-declares-emergency-mosul-waves-assassinations/amp/


Processing URLs:  45%|████▍     | 447/1000 [16:18<12:16,  1.33s/it]

Error extracting text from http://www.timesofisrael.com/uk-agrees-to-see-assad-in-charge-for-syria-transition/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/uk-agrees-to-see-assad-in-charge-for-syria-transition/


Processing URLs:  45%|████▍     | 448/1000 [16:20<12:45,  1.39s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.folhamax.com.br/opiniao/nao-e-so-pela-corrupcao/79635&amp;usg=ALkJrhi-bRhgJU8sBlzELpppGvrdFyMQ8g: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.folhamax.com.br/opiniao/nao-e-so-pela-corrupcao/79635&amp;usg=ALkJrhi-bRhgJU8sBlzELpppGvrdFyMQ8g


Processing URLs:  45%|████▌     | 450/1000 [16:20<07:28,  1.23it/s]

Error extracting text from http://www.donaldjtrump.com/media/our-country-tv-spot: 403 Client Error: Forbidden for url: https://www.donaldjtrump.com/media/our-country-tv-spot


Processing URLs:  45%|████▌     | 454/1000 [16:27<11:20,  1.25s/it]

Error extracting text from https://www.yahoo.com/news/merkel-warns-turkey-visa-free-eu-travel-july-142129586.html?soc_src=social-sh&amp;soc_trk=tw: 404 Client Error: Not Found for url: https://www.yahoo.com/news/merkel-warns-turkey-visa-free-eu-travel-july-142129586.html?soc_src=social-sh&amp;soc_trk=tw


Processing URLs:  46%|████▌     | 456/1000 [16:28<07:41,  1.18it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.valor.com.br/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.valor.com.br/&amp;prev=search


Processing URLs:  46%|████▌     | 460/1000 [16:32<08:15,  1.09it/s]

Error extracting text from http://mizzima.com/news-opinion/usdp-government-presidential-spokesperson%E2%80%99s-silly-and-shocking-statement-section-59f: 403 Client Error: Forbidden for url: http://mizzima.com/news-opinion/usdp-government-presidential-spokesperson%E2%80%99s-silly-and-shocking-statement-section-59f


Processing URLs:  46%|████▌     | 462/1000 [16:33<06:39,  1.35it/s]

Error extracting text from http://www.reuters.com/article/us-byd-battery-idUSKBN0M92MZ20150313: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-byd-battery-idUSKBN0M92MZ20150313


Processing URLs:  46%|████▋     | 463/1000 [16:34<07:38,  1.17it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-18/aiming-at-iran-saudi-arabia-mixes-oil-policy-with-politics


Processing URLs:  47%|████▋     | 468/1000 [16:40<08:54,  1.00s/it]

Error extracting text from https://www.nytimes.com/2017/10/30/us/politics/paul-manafort-indicted.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/30/us/politics/paul-manafort-indicted.html?_r=1


Processing URLs:  47%|████▋     | 473/1000 [16:46<09:25,  1.07s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-12-30/iowa-ethanol-group-rolls-out-the-un-welcome-mat-for-ted-cruz


Processing URLs:  48%|████▊     | 475/1000 [16:47<07:51,  1.11it/s]

Error extracting text from http://allpriorart.com/about/: 406 Client Error: Not Acceptable for url: http://allpriorart.com/about/


Processing URLs:  48%|████▊     | 479/1000 [16:56<13:30,  1.56s/it]

Error extracting text from https://www.amazon.com/Manufactured-Crisis-Untold-Story-Nuclear/dp/1935982338: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Manufactured-Crisis-Untold-Story-Nuclear/dp/1935982338
URL filtered: https://www.youtube.com/watch?v=eNCz7i4J70U


Processing URLs:  48%|████▊     | 482/1000 [16:59<10:09,  1.18s/it]

Error extracting text from https://www.reuters.com/article/us-usa-election-alabama/voters-head-to-polls-in-alabama-race-with-high-stakes-for-trump-idUSKBN1E616L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-alabama/voters-head-to-polls-in-alabama-race-with-high-stakes-for-trump-idUSKBN1E616L


Processing URLs:  48%|████▊     | 484/1000 [16:59<06:48,  1.26it/s]

Error extracting text from http://www.cdm.me/english/ratification-of-the-protocol-in-romania-italy-and-greece-by-the-end-of-july: 403 Client Error: Forbidden for url: https://www.cdm.me/english/ratification-of-the-protocol-in-romania-italy-and-greece-by-the-end-of-july


Processing URLs:  48%|████▊     | 485/1000 [16:59<05:41,  1.51it/s]

Error extracting text from http://www.wsj.com/articles/donald-trump-and-ben-carson-gain-strength-in-poll-of-republicans-1445288400: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/donald-trump-and-ben-carson-gain-strength-in-poll-of-republicans-1445288400


Processing URLs:  49%|████▊     | 487/1000 [17:04<09:53,  1.16s/it]

URL filtered: https://mobile.twitter.com/JZarif?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  49%|████▉     | 491/1000 [17:06<06:40,  1.27it/s]

Error extracting text from http://www.nytimes.com/2016/04/13/us/politics/donald-trump-losing-ground-tries-to-blame-the-system.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/13/us/politics/donald-trump-losing-ground-tries-to-blame-the-system.html
Error extracting text from http://blogs.wsj.com/economics/2016/05/05/qa-why-a-cautious-economic-forecaster-thinks-theres-a-60-chance-of-recession-in-the-next-year/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/economics/2016/05/05/qa-why-a-cautious-economic-forecaster-thinks-theres-a-60-chance-of-recession-in-the-next-year/


Processing URLs:  50%|████▉     | 495/1000 [17:09<05:19,  1.58it/s]

Error extracting text from http://www.reuters.tv/U6G/2016/01/29/fears-rise-around-n-korea-s-missile-program: HTTPConnectionPool(host='www.reuters.tv', port=80): Max retries exceeded with url: /U6G/2016/01/29/fears-rise-around-n-korea-s-missile-program (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe55d2e0>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|████▉     | 497/1000 [17:13<10:01,  1.20s/it]

Error extracting text from http://blogs.reuters.com/great-debate/2015/12/15/why-russias-payback-to-turkey-could-be-lethal/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2015/12/15/why-russias-payback-to-turkey-could-be-lethal/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe55f980>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|████▉     | 498/1000 [17:14<09:35,  1.15s/it]

Error extracting text from https://www.nasa.gov/feature/goddard/2018/hubble-in-safe-mode-as-gyro-issues-are-diagnosed: 404 Client Error: Not Found for url: https://www.nasa.gov/feature/goddard/2018/hubble-in-safe-mode-as-gyro-issues-are-diagnosed


Processing URLs:  50%|█████     | 501/1000 [17:19<10:04,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-germany-idUSKBN0UI10M20160104: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-germany-idUSKBN0UI10M20160104


Processing URLs:  50%|█████     | 505/1000 [17:22<06:21,  1.30it/s]

Error extracting text from https://www.nytimes.com/2017/06/18/us/politics/russia-trump-trademarks.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/18/us/politics/russia-trump-trademarks.html?_r=0


Processing URLs:  51%|█████     | 506/1000 [18:22<2:32:08, 18.48s/it]

Error extracting text from http://www.usnews.com/news/business/articles/2015/09/25/tsipras-pledges-to-lead-greece-out-of-crisis-by-2019: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  51%|█████     | 508/1000 [18:27<1:22:48, 10.10s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-imf-debt-idUSKBN17Y27F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-imf-debt-idUSKBN17Y27F


Processing URLs:  51%|█████▏    | 514/1000 [18:34<19:02,  2.35s/it]

URL filtered: https://twitter.com/ct_bergstrom/status/1427767363080843265?lang=en


Processing URLs:  52%|█████▏    | 517/1000 [18:36<10:04,  1.25s/it]

Error extracting text from http://s4.reutersmedia.net/resources/r/?m=02&amp;d=20161201&amp;t=2&amp;i=1163789801&amp;w=780&amp;fh=&amp;fw=&amp;ll=&amp;pl=&amp;sq=&amp;r=LYNXMPECB02HU: 500 Server Error: Internal Server Error for url: https://s4.reutersmedia.net/resources/r/?m=02&amp;d=20161201&amp;t=2&amp;i=1163789801&amp;w=780&amp;fh=&amp;fw=&amp;ll=&amp;pl=&amp;sq=&amp;r=LYNXMPECB02HU


Processing URLs:  52%|█████▏    | 519/1000 [18:39<10:49,  1.35s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-24/opec-seen-holding-the-line-as-40-oil-looms-over-vienna-meeting


Processing URLs:  53%|█████▎    | 526/1000 [18:46<08:24,  1.06s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/28.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/28.htm


Processing URLs:  53%|█████▎    | 530/1000 [18:51<07:41,  1.02it/s]

Error extracting text from http://www.nytimes.com/2015/11/16/business/international/japan-economy-contracts-0-8-returning-to-recession.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/16/business/international/japan-economy-contracts-0-8-returning-to-recession.html


Processing URLs:  53%|█████▎    | 532/1000 [18:53<07:40,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-offensive-idUSKCN0X31US: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-offensive-idUSKCN0X31US


Processing URLs:  54%|█████▍    | 539/1000 [19:04<10:39,  1.39s/it]

Error extracting text from http://finance.yahoo.com/news/iraqs-military-still-struggling-despite-us-training-190315708.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/iraqs-military-still-struggling-despite-us-training-190315708.html


Processing URLs:  54%|█████▍    | 543/1000 [19:12<12:14,  1.61s/it]

Error extracting text from http://www.iran-daily.com/News/136434.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  55%|█████▍    | 546/1000 [19:15<09:13,  1.22s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Russia-open-to-Japanese-ownership-of-Siberian-energy-ventures: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Russia-open-to-Japanese-ownership-of-Siberian-energy-ventures


Processing URLs:  55%|█████▍    | 547/1000 [19:17<09:39,  1.28s/it]

Error extracting text from http://in.reuters.com/article/us-brazil-politics-duck-idINKCN0WM0F1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  55%|█████▌    | 550/1000 [19:23<14:31,  1.94s/it]

Error extracting text from https://www.aa.com.tr/en/asia-pacific/with-fall-of-logar-taliban-close-in-on-afghanistans-capital/2333360: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  56%|█████▌    | 556/1000 [19:32<11:00,  1.49s/it]

URL filtered: https://www.youtube.com/watch?v=HYbKa494c8A


Processing URLs:  56%|█████▌    | 559/1000 [19:34<08:10,  1.11s/it]

Error extracting text from http://en.trend.az/iran/politics/2541350.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2541350.html


Processing URLs:  56%|█████▋    | 563/1000 [19:44<15:34,  2.14s/it]

Error extracting text from https://www.reuters.com/world/europe/russian-prosecutor-submits-more-material-navalny-extremism-case-lawyers-2021-05-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/russian-prosecutor-submits-more-material-navalny-extremism-case-lawyers-2021-05-17/


Processing URLs:  57%|█████▋    | 568/1000 [19:51<10:26,  1.45s/it]

Error extracting text from http://www.straitstimes.com/asia/se-asia/12th-times-the-charm-thai-army-chief-vows-no-more-coups: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  57%|█████▋    | 571/1000 [19:53<06:04,  1.18it/s]

Error extracting text from http://www.debka.com/article/25658/Russia-Dissolve-US-Arab-Israeli-Syria-war-room: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25658/Russia-Dissolve-US-Arab-Israeli-Syria-war-room (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: https://twitter.com/realDonaldTrump/status/293817480131002368


Processing URLs:  58%|█████▊    | 577/1000 [20:05<10:50,  1.54s/it]

Error extracting text from http://www.reuters.com/article/2014/08/01/us-germany-ecclestone-trial-idUSKBN0G13YZ20140801: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2014/08/01/us-germany-ecclestone-trial-idUSKBN0G13YZ20140801
Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13941209000822: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13941209000822 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3020e3f80>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 579/1000 [20:06<08:20,  1.19s/it]

Error extracting text from http://www.payvand.com/news/16/apr/1100.html: 404 Client Error: Not Found for url: http://www.payvand.com/news/16/apr/1100.html


Processing URLs:  59%|█████▊    | 586/1000 [20:12<05:09,  1.34it/s]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-usa/trump-expected-to-decertify-iran-nuclear-deal-official-says-idUSKBN1CA2ID: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa/trump-expected-to-decertify-iran-nuclear-deal-official-says-idUSKBN1CA2ID
Error extracting text from http://www.reuters.com/article/us-nigeria-security-oil-idUSKCN0YU0VZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-security-oil-idUSKCN0YU0VZ?il=0


Processing URLs:  59%|█████▊    | 587/1000 [20:13<05:25,  1.27it/s]

Error extracting text from http://www.swissinfo.ch/eng/protesters-clash-ahead-of-swearing-in-of-lula-in-brazil/42028734: 404 Client Error: Not Found for url: https://www.swissinfo.ch/eng/protesters-clash-ahead-of-swearing-in-of-lula-in-brazil/42028734


Processing URLs:  59%|█████▉    | 589/1000 [20:14<04:24,  1.56it/s]

Error extracting text from https://www.gjopen.com/faq/questions: 404 Client Error: Not Found for url: https://www.gjopen.com/faq/questions
Error extracting text from http://www.nytimes.com/2016/02/28/magazine/the-robots-are-coming-for-wall-street.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/28/magazine/the-robots-are-coming-for-wall-street.html


Processing URLs:  59%|█████▉    | 590/1000 [20:14<03:56,  1.73it/s]

Error extracting text from https://en.yna.co.kr/view/AEN20200421003652325: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: /view/AEN20200421003652325 (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  59%|█████▉    | 592/1000 [20:18<07:56,  1.17s/it]

Error extracting text from http://www.prnewswire.com/news-releases/sportsbookcom-odds-makers-choose-roger-goodell-to-be-next-nfl-commissioner-55851672.html: 404 Client Error: Not Found for url: https://www.prnewswire.com/news-releases/sportsbookcom-odds-makers-choose-roger-goodell-to-be-next-nfl-commissioner-55851672.html
URL filtered: http://www.bloomberg.com/news/articles/2016-04-28/hunt-for-brexit-beating-trade-spotlights-inflation-linked-debt


Processing URLs:  59%|█████▉    | 594/1000 [20:19<06:02,  1.12it/s]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-idUKKBN17U1JK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://www.bloomberg.com/news/articles/2017-10-10/what-one-man-s-fate-says-about-xi-s-plans-to-keep-power-in-china


Processing URLs:  60%|█████▉    | 596/1000 [20:19<04:03,  1.66it/s]

Error extracting text from https://apps.fcc.gov/edocs_public/attachmatch/DOC-347927A1.pdf: 403 Client Error: Forbidden for url: https://apps.fcc.gov/edocs_public/attachmatch/DOC-347927A1.pdf


Processing URLs:  60%|█████▉    | 598/1000 [20:23<07:53,  1.18s/it]

Error extracting text from http://timesofindia.indiatimes.com/world/china/US-launches-quiet-diplomacy-to-ease-South-China-Sea-tensions/articleshow/53205807.cms: 410 Client Error: Gone for url: https://timesofindia.indiatimes.com/world/china/US-launches-quiet-diplomacy-to-ease-South-China-Sea-tensions/articleshow/53205807.cms
URL filtered: https://twitter.com/Rover829/status/1503031849697828869


Processing URLs:  60%|██████    | 601/1000 [20:25<06:34,  1.01it/s]

Error extracting text from http://www.caracaschronicles.com/mapa/: 403 Client Error: Forbidden for url: http://www.caracaschronicles.com/mapa/


Processing URLs:  60%|██████    | 602/1000 [20:27<07:14,  1.09s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NYMW7I6KLVRK01-77QVV1955EDOQEDADC8DNB5D5S


Processing URLs:  60%|██████    | 604/1000 [20:28<05:21,  1.23it/s]

Error extracting text from http://uk.reuters.com/article/uk-northkorea-nuclear-park-idUKKCN0V008Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://www.bloomberg.com/news/articles/2018-01-10/canada-officials-said-to-see-odds-rising-of-trump-leaving-nafta-jc9gua48


Processing URLs:  61%|██████    | 606/1000 [20:28<03:38,  1.80it/s]

Error extracting text from https://translate.google.com/translate?sl=mk&amp;tl=en&amp;u=https://popis2021.stat.gov.mk/: 400 Client Error: Bad Request for url: https://translate.google.com/translate?sl=mk&amp;tl=en&amp;u=https://popis2021.stat.gov.mk/


Processing URLs:  61%|██████    | 608/1000 [20:29<03:46,  1.73it/s]

Error extracting text from http://www.gov.me/en/News/157682/Two-day-accession-talks-between-Montenegro-and-NATO-successfully-completed-in-Brussels.html: 404 Client Error: not found for url: https://www.gov.me/en/News/157682/Two-day-accession-talks-between-Montenegro-and-NATO-successfully-completed-in-Brussels.html
Error extracting text from http://www.straitstimes.com/asia/sino-philippine-tensions-easing-further-over-scarborough: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  61%|██████▏   | 614/1000 [20:37<07:07,  1.11s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57877#.WeEPS0wfl-U: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57877#.WeEPS0wfl-U
URL filtered: https://twitter.com/kyawtun62907405/status/1441199471992336385


Processing URLs:  62%|██████▏   | 619/1000 [20:39<03:23,  1.87it/s]

Error extracting text from http://syriadirect.org/news/sectarianism-hides-truth-that-%E2%80%98people-have-no-role-in-their-destinies%E2%80%99-says-alawite-dissident/: 404 Client Error: Not Found for url: http://syriadirect.org/news/sectarianism-hides-truth-that-%E2%80%98people-have-no-role-in-their-destinies%E2%80%99-says-alawite-dissident/
Error extracting text from http://jakartaglobe.beritasatu.com/world/el-pais-spains-best-selling-newspaper-hints-may-end-print-edition/: HTTPConnectionPool(host='jakartaglobe.beritasatu.com', port=80): Max retries exceeded with url: /world/el-pais-spains-best-selling-newspaper-hints-may-end-print-edition/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30364fd70>: Failed to resolve 'jakartaglobe.beritasatu.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-islamicstate-idUSKBN15426K?il=0: 401 Client Error: HTTP Forbidden for 

Processing URLs:  62%|██████▏   | 621/1000 [20:43<07:05,  1.12s/it]

Error extracting text from http://worldwidelogisticsltd.com/panama-canal-vessel-backlog-eases/: 404 Client Error: Not Found for url: https://worldwidelogisticsltd.com/panama-canal-vessel-backlog-eases/


Processing URLs:  62%|██████▎   | 625/1000 [20:48<06:45,  1.08s/it]

Error extracting text from http://www7.irna.ir/en/News/82081833/: HTTPConnectionPool(host='www7.irna.ir', port=80): Max retries exceeded with url: /en/News/82081833/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feeb7fb0>: Failed to resolve 'www7.irna.ir' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  63%|██████▎   | 626/1000 [20:50<07:46,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-bailout-idUSKBN17R146: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-bailout-idUSKBN17R146


Processing URLs:  64%|██████▎   | 635/1000 [21:07<12:28,  2.05s/it]

Error extracting text from https://bit.ly/3am1jDG: 403 Client Error: Forbidden for url: https://globalriskinsights.com/2021/01/a-new-italian-hard-right-coalition/


Processing URLs:  64%|██████▎   | 637/1000 [21:09<07:53,  1.30s/it]

Error extracting text from http://thehill.com/policy/finance/346080-gop-senators-ask-trump-not-to-target-venezuelan-oil-with-sanctions: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/346080-gop-senators-ask-trump-not-to-target-venezuelan-oil-with-sanctions/


Processing URLs:  64%|██████▍   | 640/1000 [21:12<06:16,  1.05s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-usa-idUSKCN0ZR0LT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-usa-idUSKCN0ZR0LT


Processing URLs:  64%|██████▍   | 643/1000 [21:16<07:08,  1.20s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-02/17/c_135106756.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-02/17/c_135106756.htm


Processing URLs:  64%|██████▍   | 644/1000 [21:16<05:38,  1.05it/s]

Error extracting text from https://seekingalpha.com/article/4060927-irans-oil-production-fallacy-fallowed: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4060927-irans-oil-production-fallacy-fallowed


Processing URLs:  64%|██████▍   | 645/1000 [21:18<07:09,  1.21s/it]

Error extracting text from http://uk.businessinsider.com/privatizing-air-traffic-control-for-drone-delivery-2017-6?r=US&amp;IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/(null)/privatizing-air-traffic-control-for-drone-delivery-2017-6?amp;IR=T&IR=T


Processing URLs:  65%|██████▍   | 646/1000 [21:19<06:05,  1.03s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/10/a-super-long-picky-post/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/10/a-super-long-picky-post/


Processing URLs:  65%|██████▍   | 647/1000 [21:19<05:04,  1.16it/s]

Error extracting text from https://larswericson.wordpress.com/2016/04/02/gitrep-01apr16/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/02/gitrep-01apr16/


Processing URLs:  65%|██████▌   | 650/1000 [21:22<05:59,  1.03s/it]

Error extracting text from https://www.reuters.com/article/us-iran-military-russia-china/russia-china-iran-start-joint-naval-drills-in-indian-ocean-idUSKBN1YV0IB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-military-russia-china/russia-china-iran-start-joint-naval-drills-in-indian-ocean-idUSKBN1YV0IB


Processing URLs:  66%|██████▌   | 655/1000 [21:33<10:53,  1.89s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-03/who-is-alexander-dugin-the-man-linking-putin-erdogan-and-trump


Processing URLs:  66%|██████▌   | 658/1000 [21:37<10:15,  1.80s/it]

URL filtered: http://www.reuters.com/article/us-facebook-ceo-trump/trump-dismisses-facebook-ads-controversy-as-part-of-russia-hoax-idUSKCN1BX1CK?il=0


Processing URLs:  66%|██████▌   | 660/1000 [21:38<07:48,  1.38s/it]

Error extracting text from http://en.trend.az/iran/politics/2437937.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2437937.html


Processing URLs:  66%|██████▌   | 661/1000 [21:39<07:17,  1.29s/it]

Error extracting text from http://www.newyorker.com/magazine/2015/11/16/the-gene-hackers%22: 404 Client Error: Not Found for url: https://www.newyorker.com/magazine/2015/11/16/the-gene-hackers%22


Processing URLs:  67%|██████▋   | 671/1000 [22:19<15:55,  2.91s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-idUSKBN16B0PU?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-idUSKBN16B0PU?il=0


Processing URLs:  67%|██████▋   | 673/1000 [22:20<09:25,  1.73s/it]

Error extracting text from http://turcopolier.typepad.com/sic_semper_tyrannis/2016/04/httpswwwalmasdarnewscomarticlebattle-aleppo-city-begins-syrian-army-advances-north-map-update.html: 403 Client Error: Forbidden for url: https://turcopolier.typepad.com/sic_semper_tyrannis/2016/04/httpswwwalmasdarnewscomarticlebattle-aleppo-city-begins-syrian-army-advances-north-map-update.html


Processing URLs:  68%|██████▊   | 676/1000 [22:21<04:57,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-saudi-iran-yemen-idUSKBN0UL15F20160107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-iran-yemen-idUSKBN0UL15F20160107
Error extracting text from https://www.cihan.com.tr/en/iraq-isil-mosul-2023691.htm: HTTPSConnectionPool(host='www.cihan.com.tr', port=443): Max retries exceeded with url: /en/iraq-isil-mosul-2023691.htm (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301992210>: Failed to resolve 'www.cihan.com.tr' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 677/1000 [22:23<05:11,  1.04it/s]

Error extracting text from https://www.thecipherbrief.com/article/exclusive/north-america/fisa-can-fixed-without-risking-american-lives: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/exclusive/north-america/fisa-can-fixed-without-risking-american-lives


Processing URLs:  68%|██████▊   | 681/1000 [22:28<07:05,  1.33s/it]

Error extracting text from http://www.commdiginews.com/world-news/middle-east/history-lesson-if-isis-is-not-islam-then-what-is-it-69141/: 403 Client Error: Forbidden for url: https://www.americanwirenews.com


Processing URLs:  68%|██████▊   | 685/1000 [22:42<16:54,  3.22s/it]

Error extracting text from http://www.cfr.org/global/global-conflict-tracker/p32137#!/: 404 Client Error: Not Found for url: https://www.cfr.org/global/global-conflict-tracker/p32137#!/


Processing URLs:  70%|██████▉   | 695/1000 [23:09<14:27,  2.84s/it]

Error extracting text from https://ycharts.com/indicators/brent_crude_oil_spot_price: 403 Client Error: Forbidden for url: https://ycharts.com/indicators/brent_crude_oil_spot_price


Processing URLs:  70%|██████▉   | 698/1000 [23:12<07:40,  1.52s/it]

Error extracting text from https://www.fcc.gov/about-fcc/rulemaking-process: 403 Client Error: Forbidden for url: https://www.fcc.gov/about-fcc/rulemaking-process
Error extracting text from https://balkaninsight.com/2021/09/20/north-macedonias-sensitive-census-on-track-for-success/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/09/20/north-macedonias-sensitive-census-on-track-for-success/


Processing URLs:  70%|██████▉   | 699/1000 [23:24<24:03,  4.80s/it]

Error extracting text from http://focus-fen.net/news/2016/03/16/400693/montenegro-may-become-nato-member-by-mid-2017-pm.html: HTTPConnectionPool(host='focus-fen.net', port=80): Max retries exceeded with url: /news/2016/03/16/400693/montenegro-may-become-nato-member-by-mid-2017-pm.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303320e00>: Failed to resolve 'focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  70%|███████   | 700/1000 [23:25<18:06,  3.62s/it]

Error extracting text from http://blogs.wsj.com/chinarealtime/2016/04/08/vietnam-tells-china-to-remove-oil-rig-from-disputed-waters/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/chinarealtime/2016/04/08/vietnam-tells-china-to-remove-oil-rig-from-disputed-waters/


Processing URLs:  70%|███████   | 703/1000 [23:30<12:12,  2.47s/it]

URL filtered: https://twitter.com/rorihuela/status/730486903007490049


Processing URLs:  70%|███████   | 705/1000 [24:30<1:08:20, 13.90s/it]

Error extracting text from https://www.cmegroup.com/trading/interest-rates/stir/mpc-sonia-futures_quotes_globex.html: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  71%|███████   | 708/1000 [25:35<1:48:26, 22.28s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-01-26/kremlin-says-vigorous-efforts-needed-to-extend-russia-us-new-start-arms-treaty: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  72%|███████▏  | 717/1000 [25:51<11:08,  2.36s/it]

Error extracting text from https://socialistworker.co.uk/art/42527/US+sends+B-52+bombers+to+attack+Iraq: 403 Client Error: Forbidden for url: https://socialistworker.co.uk/art/42527/US+sends+B-52+bombers+to+attack+Iraq


Processing URLs:  72%|███████▏  | 719/1000 [26:13<35:21,  7.55s/it]

Error extracting text from http://blogs.wsj.com/law/2016/05/19/furious-federal-judge-orders-justice-department-lawyers-to-undergo-ethics-training/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/law/2016/05/19/furious-federal-judge-orders-justice-department-lawyers-to-undergo-ethics-training/


Processing URLs:  72%|███████▏  | 724/1000 [26:17<10:04,  2.19s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Japan-S-Korea-US-agree-to-cooperate-over-N-Korea-issues: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Japan-S-Korea-US-agree-to-cooperate-over-N-Korea-issues
Error extracting text from http://www.nytimes.com/2016/11/22/world/middleeast/iraq-civilians-flee-mosul.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/22/world/middleeast/iraq-civilians-flee-mosul.html?_r=0


Processing URLs:  73%|███████▎  | 726/1000 [26:19<07:36,  1.67s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-28/oil-majors-queue-up-in-iran-with-30-billion-of-projects-in-play


Processing URLs:  73%|███████▎  | 730/1000 [26:29<07:46,  1.73s/it]

Error extracting text from http://m.washingtontimes.com/news/2016/may/12/pentagon-mosul-battle-plan-hands-iraq-not-us/: 403 Client Error: Forbidden for url: http://m.washingtontimes.com/news/2016/may/12/pentagon-mosul-battle-plan-hands-iraq-not-us/
Error extracting text from https://www.thelancet.com/journals/lanres/article/PIIS2213-2600(21)00075-8/fulltext: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lanres/article/PIIS2213-2600(21)00075-8/fulltext


Processing URLs:  73%|███████▎  | 731/1000 [26:29<05:50,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-idUSKBN0TQ1II20151207: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-idUSKBN0TQ1II20151207


Processing URLs:  73%|███████▎  | 734/1000 [26:44<15:36,  3.52s/it]

Error extracting text from https://trends.google.com/trends/explore?date=today%203-m&geo=HK&q=Hong%20Kong%20protest: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?date=today%203-m&geo=HK&q=Hong%20Kong%20protest


Processing URLs:  74%|███████▍  | 739/1000 [26:51<07:31,  1.73s/it]

Error extracting text from https://www.ijidonline.com/article/S1201-9712(20)30011-4/pdf: 403 Client Error: Forbidden for url: https://www.ijidonline.com/article/S1201-9712(20)30011-4/pdf


Processing URLs:  75%|███████▍  | 748/1000 [27:11<09:19,  2.22s/it]

Error extracting text from http://www.reuters.com/article/us-oil-opec-kuwait-idUSKBN16W0LC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-opec-kuwait-idUSKBN16W0LC


Processing URLs:  75%|███████▌  | 750/1000 [27:13<06:46,  1.63s/it]

URL filtered: https://www.youtube.com/watch?v=y-0ge_HqTf8


Processing URLs:  75%|███████▌  | 753/1000 [27:15<04:30,  1.10s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-03-03/what-to-make-of-these-twice-in-history-s-p-500-valuations


Processing URLs:  76%|███████▌  | 757/1000 [27:20<04:57,  1.22s/it]

Error extracting text from http://www.dtic.mil/dtic/tr/fulltext/u2/a228098.pdf: HTTPSConnectionPool(host='www.dtic.mil', port=443): Max retries exceeded with url: /dtic/tr/fulltext/u2/a228098.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  76%|███████▌  | 758/1000 [27:22<05:20,  1.32s/it]

Error extracting text from http://www.newsweek.com/sheikh-zakzaky-zaria-nigeria-islamic-movement-nigeria-439947: 403 Client Error: Forbidden for url: https://www.newsweek.com/sheikh-zakzaky-zaria-nigeria-islamic-movement-nigeria-439947


Processing URLs:  76%|███████▌  | 762/1000 [27:29<06:27,  1.63s/it]

Error extracting text from http://ragingbullshit.com/2016/01/30/even-after-years-of-ttip-talks-new-study-still-unable-to-point-to-any-major-benefits/: 404 Client Error: Not Found for url: https://ragingbullshit.com/2016/01/30/even-after-years-of-ttip-talks-new-study-still-unable-to-point-to-any-major-benefits/


Processing URLs:  76%|███████▋  | 763/1000 [27:29<05:02,  1.28s/it]

Error extracting text from https://bit.ly/2LceOgs: 403 Client Error: Forbidden for url: https://news.usni.org/2020/12/29/chinese-navy-expanding-bases-near-south-china-sea


Processing URLs:  77%|███████▋  | 768/1000 [27:43<08:05,  2.09s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/venezuela-congress-election-what-to-watch-as-results-come-in


Processing URLs:  77%|███████▋  | 773/1000 [27:50<05:49,  1.54s/it]

Error extracting text from http://www.reuters.com/article/us-usa-oil-storage-analysis-idUSKBN1630I2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-oil-storage-analysis-idUSKBN1630I2


Processing URLs:  78%|███████▊  | 776/1000 [27:55<05:24,  1.45s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=de&amp;u=http://www.drb.de/cms/index.php%3Fid%3D952&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=de&amp;u=http://www.drb.de/cms/index.php%3Fid%3D952&amp;prev=search


Processing URLs:  78%|███████▊  | 778/1000 [27:57<04:40,  1.26s/it]

URL filtered: https://www.youtube.com/watch?v=9qn8Bt7YqGc


Processing URLs:  78%|███████▊  | 781/1000 [27:59<02:37,  1.39it/s]

Error extracting text from https://www.nytimes.com/2021/03/31/world/americas/brazil-coronavirus-bolsonaro.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/31/world/americas/brazil-coronavirus-bolsonaro.html


Processing URLs:  78%|███████▊  | 782/1000 [27:59<02:20,  1.55it/s]

Error extracting text from https://www.nytimes.com/2020/11/30/us/politics/joe-manchin-interview.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/11/30/us/politics/joe-manchin-interview.html


Processing URLs:  78%|███████▊  | 783/1000 [27:59<01:54,  1.90it/s]

Error extracting text from https://www.nytimes.com/2020/12/20/health/coronavirus-britain-variant.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/20/health/coronavirus-britain-variant.html


Processing URLs:  78%|███████▊  | 785/1000 [28:00<01:47,  2.00it/s]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0278434317300250: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0278434317300250


Processing URLs:  79%|███████▊  | 787/1000 [28:02<02:26,  1.46it/s]

Error extracting text from https://www.zimlive.com/2021/07/15/after-hesitancy-mozambique-signs-agreement-for-sadc-troop-deployment/: 406 Client Error: Not Acceptable for url: https://www.zimlive.com/2021/07/15/after-hesitancy-mozambique-signs-agreement-for-sadc-troop-deployment/


Processing URLs:  79%|███████▉  | 791/1000 [28:11<05:56,  1.70s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=releases&amp;id=titanic.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=releases&amp;id=titanic.htm


Processing URLs:  79%|███████▉  | 794/1000 [28:17<06:07,  1.78s/it]

URL filtered: https://twitter.com/FT/status/1440439285761462281


Processing URLs:  80%|████████  | 800/1000 [28:25<05:09,  1.55s/it]

Error extracting text from http://www.polioeradication.org/Dataandmonitoring/: 404 Client Error: Not Found for url: https://polioeradication.org/Dataandmonitoring/


Processing URLs:  80%|████████  | 803/1000 [28:27<03:27,  1.06s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN18S5O4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN18S5O4
Error extracting text from http://www.khaama.com/intelligence-chief-defense-minister-nominees-introduced-to-parliament-for-voting-01200: 403 Client Error: Forbidden for url: http://www.khaama.com/intelligence-chief-defense-minister-nominees-introduced-to-parliament-for-voting-01200


Processing URLs:  80%|████████  | 805/1000 [28:30<03:51,  1.18s/it]

Error extracting text from https://www.carper.senate.gov/public/index.cfm/how-a-bill-becomes-a-law: 403 Client Error: Forbidden for url: https://www.carper.senate.gov/public/index.cfm/how-a-bill-becomes-a-law


Processing URLs:  81%|████████  | 806/1000 [28:31<03:15,  1.01s/it]

Error extracting text from http://www.amazon.com/gp/new-releases/books/2689/ref=zg_b_hnr_2689_1: 503 Server Error: Service Unavailable for url: https://www.amazon.com/gp/new-releases/books/2689/ref=zg_b_hnr_2689_1


Processing URLs:  81%|████████▏ | 813/1000 [28:48<05:04,  1.63s/it]

Error extracting text from https://www.faa.gov/uas/programs_partnerships/uas_integration_pilot_program/: 404 Client Error: Not Found for url: https://www.faa.gov/uas/programs_partnerships/uas_integration_pilot_program/
URL filtered: https://twitter.com/hashtag/caucusforhillary
URL filtered: https://www.youtube.com/watch?v=QpbQ4I3Eidg


Processing URLs:  82%|████████▏ | 816/1000 [28:51<03:56,  1.29s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-moore-sessions/senate-leaders-look-to-work-with-white-house-to-block-moore-idUSKBN1DE2N1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-moore-sessions/senate-leaders-look-to-work-with-white-house-to-block-moore-idUSKBN1DE2N1


Processing URLs:  82%|████████▏ | 818/1000 [28:52<03:10,  1.05s/it]

URL filtered: https://www.youtube.com/watch?v=rPILhiTJv7E


Processing URLs:  82%|████████▏ | 822/1000 [28:55<02:36,  1.14it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=54441: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=54441


Processing URLs:  83%|████████▎ | 826/1000 [29:13<07:53,  2.72s/it]

URL filtered: https://www.youtube.com/watch?v=OU_fzCpwrNc


Processing URLs:  83%|████████▎ | 830/1000 [29:24<06:48,  2.40s/it]

Error extracting text from http://carnegieeurope.eu/strategiceurope/?fa=63370: 403 Client Error: Forbidden for url: http://carnegieeurope.eu/strategiceurope/?fa=63370


Processing URLs:  83%|████████▎ | 831/1000 [29:27<06:47,  2.41s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2015/Oct-13/318710-iranian-parliament-passes-bill-approving-nuclear-deal-irna.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2015/Oct-13/318710-iranian-parliament-passes-bill-approving-nuclear-deal-irna.ashx


Processing URLs:  83%|████████▎ | 833/1000 [29:27<03:50,  1.38s/it]

Error extracting text from http://www.wsj.com/graphics/border-tax/: 403 Client Error: Forbidden for url: https://www.wsj.com/graphics/border-tax/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.bbc.com/portuguese/noticias/2016/03/160305_semana_negra_impeachment_ms_ab&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.bbc.com/portuguese/noticias/2016/03/160305_semana_negra_impeachment_ms_ab&amp;prev=search


Processing URLs:  83%|████████▎ | 834/1000 [29:29<03:47,  1.37s/it]

Error extracting text from http://www.kentlive.news/10-000-revellers-descend-on-southbeats-festival-at-quex-park/story-29752082-detail/story.html: 410 Client Error: Gone for url: https://www.kentlive.news/10-000-revellers-descend-on-southbeats-festival-at-quex-park/story-29752082-detail/story.html


Processing URLs:  84%|████████▎ | 835/1000 [29:30<03:59,  1.45s/it]

Error extracting text from https://uk.reuters.com/article/uk-ireland-politics/irish-government-on-verge-of-collapse-ahead-of-eu-brexit-summit-idUKKBN1DN25S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from https://www.reuters.com/article/us-usa-trump-sessions-idUSKBN1AA21V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-sessions-idUSKBN1AA21V


Processing URLs:  84%|████████▍ | 838/1000 [30:33<42:24, 15.71s/it]

Error extracting text from https://www.justice.gov/opa/speech/assistant-attorney-general-john-p-carlin-delivers-remarks-press-conference-announcing: HTTPSConnectionPool(host='www.justice.gov', port=443): Read timed out. (read timeout=60)


Processing URLs:  84%|████████▍ | 840/1000 [30:35<23:51,  8.94s/it]

Error extracting text from https://www.goodjudgmentproject.com/questions/15-will-bashar-al-assad-cease-to-be-president-of-syria-before-1-march-2017: 404 Client Error: Not Found for url: https://www.goodjudgmentproject.com/questions/15-will-bashar-al-assad-cease-to-be-president-of-syria-before-1-march-2017


Processing URLs:  84%|████████▍ | 842/1000 [30:39<14:32,  5.52s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/nato-chief-russia-interference-boosts-montenegro-chances-35321767: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/nato-chief-russia-interference-boosts-montenegro-chances-35321767


Processing URLs:  84%|████████▍ | 845/1000 [30:41<06:02,  2.34s/it]

Error extracting text from http://www.cdc.gov/about/history/sars/feature.htm: 404 Client Error: Not Found for url: https://www.cdc.gov/about/history/sars/feature.htm
Error extracting text from http://www.nytimes.com/2016/04/25/world/europe/wary-of-big-business-germans-protest-trade-deal-as-obama-visits: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/25/world/europe/wary-of-big-business-germans-protest-trade-deal-as-obama-visits


Processing URLs:  85%|████████▍ | 847/1000 [30:45<06:12,  2.43s/it]

Error extracting text from http://uk.reuters.com/article/uk-iran-lukoil-idUKKBN15D0Z5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  85%|████████▍ | 849/1000 [30:52<07:02,  2.80s/it]

Error extracting text from https://www.ipsos-mori.com/researchpublications/researcharchive/3781/Brexit-does-not-trigger-significant-increase-in-support-for-independence.aspx: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-uk/researchpublications/researcharchive/3781/Brexit-does-not-trigger-significant-increase-in-support-for-independence.aspx


Processing URLs:  85%|████████▌ | 853/1000 [30:58<03:55,  1.60s/it]

Error extracting text from http://nationalinterest.org/feature/the-growing-danger-military-conflict-russia-18013: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/the-growing-danger-military-conflict-russia-18013


Processing URLs:  86%|████████▌ | 856/1000 [31:07<05:20,  2.23s/it]

Error extracting text from http://www.nasdaq.com/markets/soybean.aspx: 403 Client Error: Forbidden for url: http://www.nasdaq.com/markets/soybean.aspx


Processing URLs:  86%|████████▋ | 863/1000 [31:17<04:16,  1.87s/it]

URL filtered: https://www.youtube.com/watch?v=yykSHYjXI0s


Processing URLs:  87%|████████▋ | 868/1000 [31:21<02:15,  1.03s/it]

Error extracting text from http://www.businessinsider.com.au/iphone-6s-sales-disappointing-pacific-crest-2015-10: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/iphone-6s-sales-disappointing-pacific-crest-2015-10


Processing URLs:  87%|████████▋ | 870/1000 [31:24<02:22,  1.10s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-11/22/c_134842294.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-11/22/c_134842294.htm


Processing URLs:  88%|████████▊ | 875/1000 [31:38<04:38,  2.23s/it]

Error extracting text from http://nyti.ms/1VGfOr4: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/09/world/europe/russia-syria-intervention-nato.html?smid=pl-share


Processing URLs:  88%|████████▊ | 877/1000 [31:49<07:00,  3.42s/it]

Error extracting text from http://www.globaltimes.cn/content/1005283.shtml: 404 Client Error: Not Found for url: https://www.globaltimes.cn/content/1005283.shtml
Error extracting text from http://www.washingtontimes.com/news/2016/mar/24/iran-indictments-sanctions-send-signal-to-obama-nu/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/mar/24/iran-indictments-sanctions-send-signal-to-obama-nu/


Processing URLs:  88%|████████▊ | 879/1000 [31:49<03:50,  1.90s/it]

Error extracting text from https://www.wsj.com/articles/u-s-europe-in-agreement-on-russia-sanctions-eu-official-says-1486755982: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-europe-in-agreement-on-russia-sanctions-eu-official-says-1486755982


Processing URLs:  88%|████████▊ | 881/1000 [31:52<03:33,  1.80s/it]

URL filtered: https://www.youtube.com/watch?v=uA2RApcfE_0


Processing URLs:  88%|████████▊ | 885/1000 [31:56<02:16,  1.19s/it]

Error extracting text from https://www.nytimes.com/2016/12/12/world/africa/niger-nigeria-boko-haram-food-crisis.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/12/12/world/africa/niger-nigeria-boko-haram-food-crisis.html?_r=0


Processing URLs:  89%|████████▉ | 888/1000 [32:58<32:57, 17.65s/it]

Error extracting text from https://www.betfair.com/exchange/plus/#/politics/market/1.120736245: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/plus/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fece2ae0>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))


Processing URLs:  89%|████████▉ | 891/1000 [33:01<12:31,  6.89s/it]

Error extracting text from https://www.wsj.com/articles/how-cash-strapped-chicago-snagged-a-triple-a-rating-for-its-new-bonds-1512320401: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/how-cash-strapped-chicago-snagged-a-triple-a-rating-for-its-new-bonds-1512320401


Processing URLs:  89%|████████▉ | 894/1000 [33:08<06:36,  3.74s/it]

Error extracting text from http://kepler.nasa.gov/: HTTPConnectionPool(host='kepler.nasa.gov', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fece0b90>: Failed to resolve 'kepler.nasa.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|████████▉ | 895/1000 [33:39<20:39, 11.80s/it]

Error extracting text from http://www.todayszaman.com/diplomacy_eus-tusk-turkey-must-decide-how-to-cut-migrant-flow_413947.html: 522 Server Error:  for url: http://www.todayszaman.com/diplomacy_eus-tusk-turkey-must-decide-how-to-cut-migrant-flow_413947.html


Processing URLs:  90%|████████▉ | 898/1000 [33:42<07:59,  4.70s/it]

Error extracting text from https://larswericson.wordpress.com/2016/04/07/gitrep-06apr16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/07/gitrep-06apr16pm/


Processing URLs:  90%|█████████ | 901/1000 [33:47<04:02,  2.45s/it]

Error extracting text from https://www.amazon.com/Anon-Patriotic-Flag-Where-Qanon/dp/B07DX9CD52/ref=pd_lpo_vtph_193_bs_img_1?_encoding=UTF8&psc=1&refRID=75RJHVVY9CA2KE3JDVRF&dpID=41dV8q-f-8L&preST=_SX342_QL70_&dpSrc=detail: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Anon-Patriotic-Flag-Where-Qanon/dp/B07DX9CD52/ref=pd_lpo_vtph_193_bs_img_1?_encoding=UTF8&psc=1&refRID=75RJHVVY9CA2KE3JDVRF&dpID=41dV8q-f-8L&preST=_SX342_QL70_&dpSrc=detail


Processing URLs:  90%|█████████ | 903/1000 [33:49<02:47,  1.72s/it]

Error extracting text from https://www.consilium.europa.eu/en/meetings/european-council/2020/10/15-16/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/meetings/european-council/2020/10/15-16/


Processing URLs:  91%|█████████ | 909/1000 [33:56<01:52,  1.23s/it]

URL filtered: http://www.blick.ch/news/ausland/fluechtlingsboot-gekentert-bis-zu-700-menschen-an-bord-id5108544.html?utm_source=twitter&amp;utm_medium=social_page&amp;utm_campaign=bli


Processing URLs:  91%|█████████ | 911/1000 [33:56<01:04,  1.39it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-kerry-idUSKCN0Z11ET: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-kerry-idUSKCN0Z11ET


Processing URLs:  91%|█████████ | 912/1000 [33:58<01:30,  1.02s/it]

URL filtered: https://www.theguardian.com/world/2016/jul/12/philippines-wins-south-china-sea-case-against-china?CMP=Share_iOSApp_Other&amp;utm_content=bufferd82b0&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  92%|█████████▏| 919/1000 [34:07<02:15,  1.68s/it]

Error extracting text from http://www.electriccarpledge.com/electric-cars/electric-cars-for-sale-in-germany/: 403 Client Error: Forbidden for url: http://www.electriccarpledge.com/electric-cars/electric-cars-for-sale-in-germany/


Processing URLs:  92%|█████████▏| 921/1000 [34:09<01:35,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-clapper-idUSKCN12P2L7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-clapper-idUSKCN12P2L7


Processing URLs:  92%|█████████▏| 924/1000 [34:13<01:18,  1.03s/it]

Error extracting text from http://asiafoundation.org/2016/04/06/afghanistans-electoral-reform-distant-reality/: 403 Client Error: Forbidden for url: http://asiafoundation.org/2016/04/06/afghanistans-electoral-reform-distant-reality/


Processing URLs:  92%|█████████▎| 925/1000 [34:13<01:01,  1.23it/s]

Error extracting text from http://www.saigon-gpdaily.com.vn/International/2017/1/122884/: 403 Client Error: Forbidden for url: http://xoilac365.tv/International/2017/1/122884/


Processing URLs:  93%|█████████▎| 926/1000 [34:13<00:51,  1.43it/s]

Error extracting text from https://thehill.com/homenews/sunday-talk-shows/476819-pompeo-says-administrations-maximum-pressure-strategy-on-iran-is: 403 Client Error: Forbidden for url: https://thehill.com/homenews/sunday-talk-shows/476819-pompeo-says-administrations-maximum-pressure-strategy-on-iran-is/


Processing URLs:  93%|█████████▎| 928/1000 [34:16<01:06,  1.09it/s]

URL filtered: https://www.nytimes.com/2018/03/19/technology/facebook-alex-stamos.html


Processing URLs:  93%|█████████▎| 933/1000 [34:21<01:09,  1.04s/it]

Error extracting text from http://www.bostonherald.com/sites/default/files/media/2016/01/25/FPU-BH-Jan20-24-Dem.pdf: 404 Client Error: Not Found for url: https://www.bostonherald.com/sites/default/files/media/2016/01/25/FPU-BH-Jan20-24-Dem.pdf


Processing URLs:  94%|█████████▎| 935/1000 [34:22<00:54,  1.19it/s]

Error extracting text from http://www.plasticsnews.com/article/20180112/NEWS/180119953/gm-seeking-ok-for-autonomous-cruise-in-2019: 403 Client Error: Forbidden for url: https://www.plasticsnews.com/article/20180112/NEWS/180119953/gm-seeking-ok-for-autonomous-cruise-in-2019


Processing URLs:  94%|█████████▎| 936/1000 [34:22<00:45,  1.40it/s]

Error extracting text from http://www.bath.ac.uk/ipr/events/ipr-online-lectures/: 404 Client Error: Not Found for url: http://www.bath.ac.uk/ipr/events/ipr-online-lectures/


Processing URLs:  94%|█████████▎| 937/1000 [34:24<01:02,  1.01it/s]



Processing URLs:  94%|█████████▍| 944/1000 [34:34<01:12,  1.29s/it]

Error extracting text from http://www.nytimes.com/2015/11/06/opinion/in-iran-a-deal-and-then-a-crackdown.html?emc=edit_th_20151106&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/06/opinion/in-iran-a-deal-and-then-a-crackdown.html?emc=edit_th_20151106&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  94%|█████████▍| 945/1000 [34:35<00:55,  1.01s/it]

Error extracting text from https://www.wsj.com/articles/oil-prices-rise-on-bullish-sentiment-over-production-cuts-1487674430: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-prices-rise-on-bullish-sentiment-over-production-cuts-1487674430


Processing URLs:  95%|█████████▍| 946/1000 [34:39<01:40,  1.85s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-11-09/kerry-sees-a-deal-on-global-carbon-trading-rules-at-cop26-summit


Processing URLs:  95%|█████████▌| 951/1000 [34:42<00:48,  1.01it/s]

Error extracting text from https://www.france24.com/en/americas/20210812-haiti-postpones-election-date-to-replace-assassinated-president: 403 Client Error: Forbidden for url: https://www.france24.com/en/americas/20210812-haiti-postpones-election-date-to-replace-assassinated-president


Processing URLs:  96%|█████████▌| 955/1000 [35:45<05:12,  6.95s/it]

Error extracting text from http://thehill.com/homenews/administration/337796-tillerson-to-congress-dont-pass-sanctions-that-prevent-dialogue-with: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/337796-tillerson-to-congress-dont-pass-sanctions-that-prevent-dialogue-with/


Processing URLs:  96%|█████████▌| 958/1000 [35:51<02:38,  3.78s/it]

Error extracting text from http://www.suffolk.edu/documents/SUPRC/7_21_2016_marginals.pdf: 404 Client Error: Not Found for url: https://www.suffolk.edu/documents/SUPRC/7_21_2016_marginals.pdf


Processing URLs:  96%|█████████▌| 959/1000 [35:53<02:10,  3.18s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/variant-info.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/variant-info.html


Processing URLs:  96%|█████████▋| 963/1000 [35:58<00:55,  1.49s/it]

Error extracting text from https://www.reuters.com/article/us-daimler-strategy-investors/mercedes-benz-to-offer-electric-option-for-every-car-by-2022-idUSKCN1BM0TL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-daimler-strategy-investors/mercedes-benz-to-offer-electric-option-for-every-car-by-2022-idUSKCN1BM0TL


Processing URLs:  96%|█████████▋| 965/1000 [35:59<00:43,  1.24s/it]

Error extracting text from https://www.reuters.com/business/environment/fires-brazilian-amazon-retreat-september-2021-09-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/environment/fires-brazilian-amazon-retreat-september-2021-09-30/


Processing URLs:  97%|█████████▋| 973/1000 [36:08<00:34,  1.27s/it]

Error extracting text from https://reut.rs/3k2pAn1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/hopes-quake-survivors-dwindle-storm-lashes-haiti-2021-08-17/


Processing URLs:  98%|█████████▊| 975/1000 [36:11<00:30,  1.22s/it]

Error extracting text from http://nyti.ms/1Llm3q0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/15/world/middleeast/russian-military-uses-syria-as-proving-ground-and-west-takes-notice.html?smid=pl-share


Processing URLs:  98%|█████████▊| 976/1000 [36:11<00:22,  1.07it/s]

Error extracting text from https://www.nytimes.com/2018/01/19/us/politics/military-china-russia-terrorism-focus.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/19/us/politics/military-china-russia-terrorism-focus.html


Processing URLs:  98%|█████████▊| 980/1000 [36:18<00:28,  1.44s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-11/rand-meltdown-brings-south-african-recession-junk-rating-closer


Processing URLs:  99%|█████████▉| 988/1000 [36:27<00:13,  1.11s/it]

Error extracting text from https://www.senate.gov/artandhistory/history/common/briefing/Majority_Minority_Leaders.htm: 403 Client Error: Forbidden for url: https://www.senate.gov/artandhistory/history/common/briefing/Majority_Minority_Leaders.htm


Processing URLs:  99%|█████████▉| 991/1000 [36:30<00:08,  1.05it/s]

Error extracting text from http://www.who.int/emergencies/zika-virus/situation-report/5-february-2016/en/: 404 Client Error: Not Found for url: https://www.who.int/emergencies/zika-virus/situation-report/5-february-2016/en/


Processing URLs: 100%|█████████▉| 996/1000 [36:43<00:08,  2.14s/it]

Error extracting text from http://www.businessinsider.com/r-we-are-95-percent-of-the-way-to-a-coalition-deal-german-spd-says-2018-2?r=UK&amp;IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-we-are-95-percent-of-the-way-to-a-coalition-deal-german-spd-says-2018-2?r=UK&amp;IR=T


Processing URLs: 100%|█████████▉| 997/1000 [36:45<00:05,  1.96s/it]

Error extracting text from http://statisticstimes.com/economy/gdp-growth-of-india.php: 404 Client Error: Not Found for url: https://www.statisticstimes.com/economy/gdp-growth-of-india.php


Processing URLs: 100%|█████████▉| 998/1000 [36:45<00:02,  1.43s/it]

Error extracting text from http://www.cdm.me/english/czech-republic-to-ratify-montenegros-accession-in-the-fall: 403 Client Error: Forbidden for url: https://www.cdm.me/english/czech-republic-to-ratify-montenegros-accession-in-the-fall


Processing URLs: 100%|██████████| 1000/1000 [36:46<00:00,  2.21s/it]


Error extracting text from http://edmaps.com/html/syrian_civil_war_in_maps.html: 500 Server Error: Internal Server Error for url: http://edmaps.com/html/syrian_civil_war_in_maps.html
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-factbox-idUSKBN16K1ZK?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-factbox-idUSKBN16K1ZK?mod=related&amp;channelName=worldNews


Processing URLs:   0%|          | 1/1000 [00:01<26:19,  1.58s/it]

Error extracting text from http://www.newsweek.com/donald-trump-impeachments-odds-white-house-removal-staff-leaving-aides-611471: 403 Client Error: Forbidden for url: https://www.newsweek.com/donald-trump-impeachments-odds-white-house-removal-staff-leaving-aides-611471


Processing URLs:   0%|          | 3/1000 [00:06<34:17,  2.06s/it]

Error extracting text from https://esa.un.org/unpd/wpp/Graphs/Probabilistic/POP/TOT/: HTTPSConnectionPool(host='esa.un.org', port=443): Max retries exceeded with url: /unpd/wpp/Graphs/Probabilistic/POP/TOT/ (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:   0%|          | 5/1000 [00:07<16:15,  1.02it/s]

Error extracting text from http://www.nytimes.com/2016/04/18/world/middleeast/civilian-casualties-in-afghan-war-are-unabated-in-2016.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/18/world/middleeast/civilian-casualties-in-afghan-war-are-unabated-in-2016.html


Processing URLs:   1%|          | 9/1000 [00:12<16:19,  1.01it/s]

Error extracting text from http://thehill.com/policy/defense/359709-defense-bill-would-not-limit-extension-of-arms-treaty-with-russia: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/359709-defense-bill-would-not-limit-extension-of-arms-treaty-with-russia/


Processing URLs:   2%|▏         | 16/1000 [00:27<33:34,  2.05s/it]  

Error extracting text from http://www.france24.com/en/20160116-venezuela-economic-emergency-currency-woes-opposition-oil: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160116-venezuela-economic-emergency-currency-woes-opposition-oil


Processing URLs:   2%|▏         | 17/1000 [00:28<30:56,  1.89s/it]

Error extracting text from http://uk.reuters.com/investigates/special-report/yemen-saudi-blockade: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   2%|▏         | 19/1000 [00:29<18:38,  1.14s/it]

Error extracting text from http://www.internationaltradecomplianceupdate.com/blog.aspx?topic=158: 403 Client Error: Forbidden for url: http://www.internationaltradecomplianceupdate.com/blog.aspx?topic=158


Processing URLs:   2%|▏         | 24/1000 [00:42<34:40,  2.13s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html


Processing URLs:   3%|▎         | 26/1000 [00:45<27:34,  1.70s/it]

Error extracting text from http://www.ibtimes.com/opec-says-growing-oil-demand-slower-us-production-2016-will-balance-global-oil-market-2005654: 403 Client Error: Forbidden for url: https://www.ibtimes.com/opec-says-growing-oil-demand-slower-us-production-2016-will-balance-global-oil-market-2005654


Processing URLs:   3%|▎         | 29/1000 [00:49<22:21,  1.38s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/big-protests-brazil-put-pressure-president-37624207: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/big-protests-brazil-put-pressure-president-37624207


Processing URLs:   3%|▎         | 32/1000 [00:52<18:28,  1.14s/it]

Error extracting text from https://voyage.auto/: HTTPSConnectionPool(host='voyage.auto', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:   3%|▎         | 33/1000 [00:54<20:36,  1.28s/it]

Error extracting text from https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/pfizer-biontech-covid-19-vaccine: 404 Client Error: Not Found for url: https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/pfizer-biontech-covid-19-vaccine


Processing URLs:   3%|▎         | 34/1000 [01:54<5:04:24, 18.91s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/world/article45269595.html?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2ASituation%20Report: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   4%|▍         | 38/1000 [01:58<1:24:59,  5.30s/it]

Error extracting text from http://english.ahram.org.eg/NewsContent/1/64/285438/Egypt/Politics-/Egypt,-Russia-sign-protocol-to-resume-MoscowCairo-.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/1/64/285438/Egypt/Politics-/Egypt,-Russia-sign-protocol-to-resume-MoscowCairo-.aspx


Processing URLs:   4%|▍         | 40/1000 [02:00<47:39,  2.98s/it]  

Error extracting text from http://www.unhcr.org/569ca19c6.html: 403 Client Error: Forbidden for url: http://www.unhcr.org/569ca19c6.html


Processing URLs:   4%|▍         | 42/1000 [02:03<33:15,  2.08s/it]

Error extracting text from http://ajw.asahi.com/article/behind_news/politics/AJ201601130032: HTTPConnectionPool(host='ajw.asahi.com', port=80): Max retries exceeded with url: /article/behind_news/politics/AJ201601130032 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301ab56d0>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▍         | 48/1000 [02:12<28:32,  1.80s/it]

Error extracting text from http://allafrica.com/stories/201707110217.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201707110217.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303ecad50>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:   6%|▌         | 56/1000 [02:29<30:22,  1.93s/it]  

Error extracting text from http://www.focus-fen.net/news/2016/03/16/400693/montenegro-may-become-nato-member-by-mid-2017-pm.html: HTTPConnectionPool(host='www.focus-fen.net', port=80): Max retries exceeded with url: /news/2016/03/16/400693/montenegro-may-become-nato-member-by-mid-2017-pm.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303ecb6b0>: Failed to resolve 'www.focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/AP/status/718577336803803136
Error extracting text from https://www.reuters.com/article/us-germany-politics-poll/two-thirds-of-spd-supporters-back-german-grand-coalition-poll-idUSKCN1G0007: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-poll/two-thirds-of-spd-supporters-back-german-grand-coalition-poll-idUSKCN1G0007
URL filtered: http://www.bloomberg.com/news/articles/2015-10-29/watch-for-these-six-things-ahead-of-the-fed-s-december

Processing URLs:   6%|▌         | 59/1000 [02:35<28:17,  1.80s/it]

Error extracting text from http://www.nasdaq.com/article/oil-prices-move-lower-opec-production-quota-eyed-20151126-00224#ixzz3slroNsiS: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/oil-prices-move-lower-opec-production-quota-eyed-20151126-00224#ixzz3slroNsiS


Processing URLs:   6%|▌         | 62/1000 [02:39<20:48,  1.33s/it]

Error extracting text from http://www.enca.com/africa/au-announces-high-level-delegation-burundi: 404 Client Error: Not Found for url: https://www.enca.com/africa/au-announces-high-level-delegation-burundi
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=es&amp;tl=en&amp;u=http%3A%2F%2Fwww.ntn24.com%2Fnoticia%2Fdiputado-opositor-jose-guerra-afirma-que-un-bcv-autonomo-es-garantia-de-baja-inflacion-89003&amp;sandbox=1: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=es&amp;tl=en&amp;u=http%3A%2F%2Fwww.ntn24.com%2Fnoticia%2Fdiputado-opositor-jose-guerra-afirma-que-un-bcv-autonomo-es-garantia-de-baja-inflacion-89003&amp;sandbox=1
URL filtered: http://www.bloomberg.com/news/articles/2015-10-10/rousseff-s-most-powerful-foe-suffers-blow-in-kickback-scandal


Processing URLs:   6%|▋         | 65/1000 [02:41<15:49,  1.02s/it]

Error extracting text from http://www.imdb.com/video/screenplay/vi1365771545: 403 Client Error: Forbidden for url: https://www.imdb.com/videoplayer/vi1365771545/
Error extracting text from http://stats.ml/: HTTPConnectionPool(host='stats.ml', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303529c70>: Failed to resolve 'stats.ml' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   7%|▋         | 67/1000 [02:42<11:40,  1.33it/s]

Error extracting text from https://larswericson.wordpress.com/2016/03/08/gitrep-7mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/08/gitrep-7mar16pm/


Processing URLs:   7%|▋         | 68/1000 [02:43<13:22,  1.16it/s]

URL filtered: https://www.rand.org/blog/2016/01/rand-kicks-off-2016-presidential-election-panel-survey.html?utm_source=linkedin.com&amp;utm_medium=rand_social


Processing URLs:   8%|▊         | 75/1000 [02:57<22:17,  1.45s/it]

Error extracting text from https://admm.asean.org/index.php/2012-12-05-19-08-20/admm-plus-maritime-security-community-information-sharing-portal/2-uncategorised.html: 404 Client Error: Component not found for url: https://admm.asean.org/index.php/2012-12-05-19-08-20/admm-plus-maritime-security-community-information-sharing-portal/2-uncategorised.html


Processing URLs:   8%|▊         | 83/1000 [03:10<18:03,  1.18s/it]

Error extracting text from http://www.crisisgroup.org/en/publication-type/crisiswatch/crisiswatch-database.aspx?CountryIDs={5875B08E-4CDC-4879-9179-17AAE2724AEB: 404 Client Error: Not Found for url: https://www.crisisgroup.org/en/publication-type/crisiswatch/crisiswatch-database.aspx?CountryIDs=%7B5875B08E-4CDC-4879-9179-17AAE2724AEB
URL filtered: https://www.bloomberg.com/news/articles/2016-12-22/what-we-learned-from-interviews-with-four-top-saudi-officials
Error extracting text from http://www.oddschecker.com/politics/british-politics/scottish-politics/scotland-to-vote-for-independence-by-end-of-2024: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/british-politics/scottish-politics/scotland-to-vote-for-independence-by-end-of-2024


Processing URLs:   8%|▊         | 84/1000 [03:11<14:27,  1.06it/s]

Error extracting text from http://www.nasdaq.com/symbol/wfm/earnings-growth: 403 Client Error: Forbidden for url: http://www.nasdaq.com/symbol/wfm/earnings-growth


Processing URLs:   9%|▊         | 86/1000 [03:12<13:24,  1.14it/s]

Error extracting text from http://www.reuters.com/article/2015/10/21/us-brazil-rousseff-idUSKCN0SF1NU20151021: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/21/us-brazil-rousseff-idUSKCN0SF1NU20151021


Processing URLs:   9%|▉         | 89/1000 [03:19<21:22,  1.41s/it]

Error extracting text from http://mobile.nytimes.com/2016/04/23/world/middleeast/russian-military-buildup-near-aleppo-threatens-truce-kerry-warns.html?_r=0&amp;referer=https://www.google.ch/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/04/23/world/middleeast/russian-military-buildup-near-aleppo-threatens-truce-kerry-warns.html?_r=0&amp;referer=https://www.google.ch/


Processing URLs:   9%|▉         | 90/1000 [03:20<21:16,  1.40s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-06-21/trump-russia-and-those-shadowy-sater-deals-at-bayrock


Processing URLs:   9%|▉         | 92/1000 [03:21<16:51,  1.11s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/myanmar-s-army-picks-its/2437890.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/myanmar-s-army-picks-its/2437890.html


Processing URLs:   9%|▉         | 94/1000 [04:25<3:57:25, 15.72s/it]

Error extracting text from http://ensemble.va.com.au/tableau/suzy/TT_ResearchProjects/Hexen2039/PsyO/mkultra.html: HTTPConnectionPool(host='ensemble.va.com.au', port=80): Max retries exceeded with url: /tableau/suzy/TT_ResearchProjects/Hexen2039/PsyO/mkultra.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3051d8350>, 'Connection to ensemble.va.com.au timed out. (connect timeout=60)'))


Processing URLs:  10%|▉         | 97/1000 [05:33<6:07:32, 24.42s/it]

Error extracting text from https://www.leftseat.com/amegindex.htm: HTTPSConnectionPool(host='www.leftseat.com', port=443): Max retries exceeded with url: /amegindex.htm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3051da720>, 'Connection to www.leftseat.com timed out. (connect timeout=60)'))


Processing URLs:  10%|█         | 100/1000 [05:36<2:20:00,  9.33s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-republicans-idUSKBN15A2C4?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-republicans-idUSKBN15A2C4?il=0


Processing URLs:  11%|█         | 107/1000 [05:54<1:08:08,  4.58s/it]

URL filtered: https://www.youtube.com/watch?v=BUdw3Nryyu0


Processing URLs:  11%|█         | 109/1000 [05:54<37:34,  2.53s/it]

Error extracting text from http://www.wsj.com/articles/a-pacific-admiral-takes-chinas-measure-1470436129: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-pacific-admiral-takes-chinas-measure-1470436129


Processing URLs:  11%|█         | 110/1000 [05:56<34:13,  2.31s/it]

Error extracting text from http://www.ibtimes.com/russia-calls-natos-europe-military-buildup-absolutely-unjustified-ahead-first-meeting-2354657: 403 Client Error: Forbidden for url: https://www.ibtimes.com/russia-calls-natos-europe-military-buildup-absolutely-unjustified-ahead-first-meeting-2354657


Processing URLs:  11%|█         | 112/1000 [05:58<26:41,  1.80s/it]

Error extracting text from http://predictwise.com/politics/2016-president-republican-vice-president-nomination: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016-president-republican-vice-president-nomination


Processing URLs:  11%|█▏        | 113/1000 [05:58<20:41,  1.40s/it]

Error extracting text from http://fpa.org/: 403 Client Error: Forbidden for url: https://fpa.org/


Processing URLs:  12%|█▏        | 117/1000 [06:02<13:56,  1.06it/s]

Error extracting text from http://thehill.com/policy/technology/358117-tech-companies-grilled-over-russian-election-interference: 403 Client Error: Forbidden for url: https://thehill.com/policy/technology/358117-tech-companies-grilled-over-russian-election-interference/


Processing URLs:  12%|█▏        | 121/1000 [06:09<22:59,  1.57s/it]

Error extracting text from http://www.just-auto.com/news/china-auto-sales-up-261-in-september_id172708.aspx: 404 Client Error: Not Found for url: https://www.just-auto.com/news/china-auto-sales-up-261-in-september_id172708.aspx


Processing URLs:  12%|█▏        | 122/1000 [06:09<17:27,  1.19s/it]

Error extracting text from https://www.chathamhouse.org/expert/comment/why-brexit-looms-large#sthash.cwQKnHTn.dpuf: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/expert/comment/why-brexit-looms-large#sthash.cwQKnHTn.dpuf


Processing URLs:  12%|█▎        | 125/1000 [06:12<12:40,  1.15it/s]

URL filtered: https://www.youtube.com/watch?v=-Pa5nqYXEnY
Error extracting text from http://www.nytimes.com/2016/01/11/world/middleeast/neglect-may-do-what-isis-didnt-breach-iraqi-dam.html?emc=edit_th_20160111&amp;nl=todaysheadlines&amp;nlid=45205797: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/11/world/middleeast/neglect-may-do-what-isis-didnt-breach-iraqi-dam.html?emc=edit_th_20160111&amp;nl=todaysheadlines&amp;nlid=45205797


Processing URLs:  13%|█▎        | 127/1000 [06:17<23:45,  1.63s/it]

Error extracting text from http://www-pub.iaea.org/books/IAEABooks/Publications_on_Accident_Response: 500 Server Error: Internal Server Error for url: https://www-pub.iaea.org/books/ErrorPage.aspx?aspxerrorpath=/books/IAEABooks/Publications_on_Accident_Response


Processing URLs:  13%|█▎        | 129/1000 [06:22<29:27,  2.03s/it]

Error extracting text from https://www.deutschlandfunkkultur.de/2-5-milliarden-fuer-corona-kulturfonds.265.de.html?drn:news_id=1244395: 404 Client Error: Not Found for url: https://www.deutschlandfunkkultur.de/2-5-milliarden-fuer-corona-kulturfonds.265.de.html?drn:news_id=1244395


Processing URLs:  13%|█▎        | 132/1000 [06:26<21:30,  1.49s/it]

Error extracting text from http://www.nytimes.com/2015/11/08/world/asia/xi-jinping-china-south-china-sea-singapore.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/08/world/asia/xi-jinping-china-south-china-sea-singapore.html


Processing URLs:  14%|█▎        | 137/1000 [06:39<46:30,  3.23s/it]

Error extracting text from http://www.dod.mil/dodgc/defense_ethics/resource_library/summary_emoluments_clause_restrictions.pdf: 404 Client Error: Not Found for url: https://www.dod.mil/dodgc/defense_ethics/resource_library/summary_emoluments_clause_restrictions.pdf


Processing URLs:  14%|█▍        | 138/1000 [06:40<35:30,  2.47s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/north-korea-using-chinas-satellites-guide-its-missiles-20810: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/north-korea-using-chinas-satellites-guide-its-missiles-20810
URL filtered: https://twitter.com/SenatePPG/status/1468767413428699136


Processing URLs:  14%|█▍        | 142/1000 [06:50<30:53,  2.16s/it]

Error extracting text from https://seekingalpha.com/news/3699280-us-moves-to-double-tariffs-on-canadian-softwood-lumber: 403 Client Error: Forbidden for url: https://seekingalpha.com/news/3699280-us-moves-to-double-tariffs-on-canadian-softwood-lumber


Processing URLs:  14%|█▍        | 143/1000 [06:51<26:11,  1.83s/it]

Error extracting text from http://99iowapastors.com: HTTPConnectionPool(host='99iowapastors.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301cd0530>: Failed to resolve '99iowapastors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 151/1000 [07:01<16:35,  1.17s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/355575-russian-internet-trolls-were-required-to-watch-house-of-cards: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/355575-russian-internet-trolls-were-required-to-watch-house-of-cards/


Processing URLs:  15%|█▌        | 153/1000 [07:12<41:01,  2.91s/it]

Error extracting text from https://au.news.yahoo.com/world/a/31762734/favorite-fujimori-loses-ground-before-peru-election-polls/: 404 Client Error: Not Found for url: https://au.news.yahoo.com/favorite-fujimori-loses-ground-before-peru-election-polls-31762734.html


Processing URLs:  16%|█▌        | 156/1000 [07:17<28:59,  2.06s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/282503-nfl-account-tweets-erroneous-news-after-apparent-breach: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/282503-nfl-account-tweets-erroneous-news-after-apparent-breach/


Processing URLs:  16%|█▌        | 157/1000 [07:17<21:24,  1.52s/it]

Error extracting text from https://www.nytimes.com/2017/01/27/world/europe/russia-hacking-us-election.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/27/world/europe/russia-hacking-us-election.html?_r=0


Processing URLs:  16%|█▌        | 158/1000 [07:19<23:59,  1.71s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-04-05/greece-nears-compromise-accord-on-pensions-taxes-with-creditors


Processing URLs:  16%|█▌        | 160/1000 [07:20<14:44,  1.05s/it]

Error extracting text from http://finmin.nic.in/reports/govt_debt_status_paper_2016.pdf: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
URL filtered: https://www.bloomberg.com/news/articles/2021-02-04/gamestop-mania-stirs-new-ire-for-social-media-shield-trump-hates


Processing URLs:  16%|█▋        | 164/1000 [07:26<22:02,  1.58s/it]

URL filtered: https://www.forbes.com/sites/thomasbrewster/2016/10/23/massive-ddos-iot-botnet-for-hire-twitter-dyn-amazon/
URL filtered: https://www.facebook.com/Big-Fizzo-Fariouz-1586977778223069/
Error extracting text from https://www.reuters.com/article/us-afghanistan-governor/pence-urges-peaceful-end-to-standoff-over-afghan-governor-idUSKBN1F61AE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-governor/pence-urges-peaceful-end-to-standoff-over-afghan-governor-idUSKBN1F61AE


Processing URLs:  17%|█▋        | 169/1000 [07:28<12:24,  1.12it/s]

Error extracting text from http://existentialcomics.com/comic/: 404 Client Error: NOT FOUND for url: https://existentialcomics.com/comic/


Processing URLs:  17%|█▋        | 173/1000 [07:34<17:02,  1.24s/it]

Error extracting text from http://www.reuters.com/article/2015/10/25/us-mideast-crisis-syria-election-idUSKCN0SJ05R20151025: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/25/us-mideast-crisis-syria-election-idUSKCN0SJ05R20151025


Processing URLs:  18%|█▊        | 175/1000 [07:35<14:00,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/paris-tragedy-and-turkish-crisis-cast-fresh-light-on-nato-role-1448463452: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/paris-tragedy-and-turkish-crisis-cast-fresh-light-on-nato-role-1448463452


Processing URLs:  18%|█▊        | 179/1000 [07:49<40:15,  2.94s/it]

Error extracting text from http://tass.ru/en/world/844436: 404 Client Error: Not Found for url: https://tass.ru/en/world/844436


Processing URLs:  18%|█▊        | 183/1000 [08:03<36:27,  2.68s/it]

Error extracting text from https://www.confidencial.com.ni/politica/reunion-entre-cxl-y-prd-continua-en-el-limbo/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/politica/reunion-entre-cxl-y-prd-continua-en-el-limbo/


Processing URLs:  19%|█▉        | 188/1000 [08:11<22:17,  1.65s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-01-27/share-of-americans-who-want-covid-vaccine-growing-poll-finds?sref=x7nYEkiY


Processing URLs:  19%|█▉        | 191/1000 [08:12<10:45,  1.25it/s]

Error extracting text from http://www.reuters.com/article/2015/11/02/us-usa-economy-atlantafed-idUSKCN0SR25A20151102: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/02/us-usa-economy-atlantafed-idUSKCN0SR25A20151102
URL filtered: http://www.dailymail.co.uk/sciencetech/article-3758850/CIA-reveals-Spacenet-AI-sky-constantly-monitor-activity-Earth-high-resolution-satellites.html?utm_content=buffer3c242&amp;utm_medium=social&amp;utm_source=linkedin.com&amp;utm_campaign=buffer


Processing URLs:  19%|█▉        | 193/1000 [08:13<07:44,  1.74it/s]

Error extracting text from http://thehill.com/policy/energy-environment/253665-dem-senator-renewables-could-be-key-to-lifting-oil-export-ban: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/253665-dem-senator-renewables-could-be-key-to-lifting-oil-export-ban/


Processing URLs:  20%|█▉        | 195/1000 [08:15<10:07,  1.32it/s]

URL filtered: https://www.youtube.com/watch?v=NZC9Ud9WtY4


Processing URLs:  20%|█▉        | 199/1000 [09:17<3:26:03, 15.43s/it]

Error extracting text from http://www.usnews.com/news/articles/2014/11/17/russia-turkey-inch-toward-improved-relations: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
URL filtered: https://twitter.com/UK_FoRBEnvoy/status/1418133926665674753


Processing URLs:  20%|██        | 201/1000 [09:18<2:02:41,  9.21s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics-spd/german-spd-leader-dampens-hopes-for-quick-coalition-deal-source-idUSKBN1FI1OQ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-spd/german-spd-leader-dampens-hopes-for-quick-coalition-deal-source-idUSKBN1FI1OQ?il=0


Processing URLs:  21%|██        | 208/1000 [09:35<47:16,  3.58s/it]

Error extracting text from http://www.mariettatimes.com/page/content.detail/id/1086313/The-Latest--Hungary-endorses-referendum-on-EU-migrant-quotas.html?isap=1&amp;nav=5021: 404 Client Error: Not Found for url: https://www.mariettatimes.com/page/content.detail/id/1086313/The-Latest--Hungary-endorses-referendum-on-EU-migrant-quotas.html/?isap=1&amp;nav=5021


Processing URLs:  21%|██        | 211/1000 [09:41<29:03,  2.21s/it]

Error extracting text from https://ahvalnews.com/turkish-lira/turkish-lira-weakens-ahead-central-bank-meeting-interest-rates: 403 Client Error: Forbidden for url: https://ahvalnews.com/turkish-lira/turkish-lira-weakens-ahead-central-bank-meeting-interest-rates


Processing URLs:  21%|██▏       | 213/1000 [09:41<16:13,  1.24s/it]

Error extracting text from http://www.nytimes.com/2016/04/07/us/politics/homeland-security-dept-struggles-to-hire-staff-to-combat-cyberattacks.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/07/us/politics/homeland-security-dept-struggles-to-hire-staff-to-combat-cyberattacks.html?_r=0


Processing URLs:  22%|██▏       | 218/1000 [10:00<45:11,  3.47s/it]

Error extracting text from http://www.paddypower.com/bet/politics/other-politics/us-politics?ev_oc_grp_ids=2222350: HTTPConnectionPool(host='www.paddypower.com', port=80): Max retries exceeded with url: /bet/politics/other-politics/us-politics?ev_oc_grp_ids=2222350 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fe654980>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  22%|██▏       | 220/1000 [10:01<27:37,  2.13s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-13/south-africa-s-zuma-appoints-pravin-gordhan-as-finance-minister


Processing URLs:  22%|██▏       | 224/1000 [10:05<17:05,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-britain-election-talks-idUSKBN18P22H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-talks-idUSKBN18P22H


Processing URLs:  23%|██▎       | 233/1000 [12:24<3:17:03, 15.42s/it]

Error extracting text from https://reportuk.org/2016/07/28/sturgeon-scotland-poll-blow-90-per-cent-say-independence-referendum-not-priority/: 404 Client Error: Not Found for url: https://www.reportuk.org/2016/07/28/sturgeon-scotland-poll-blow-90-per-cent-say-independence-referendum-not-priority/


Processing URLs:  23%|██▎       | 234/1000 [12:25<2:30:27, 11.78s/it]

Error extracting text from http://www.chicagotribune.com/news/nationworld/ct-syria-militias-us-cia-islamic-state-20160326-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/ct-syria-militias-us-cia-islamic-state-20160326-story.html


Processing URLs:  24%|██▎       | 235/1000 [12:26<1:56:59,  9.18s/it]

Error extracting text from https://forums.teslamotors.com/forum/forums/how-many-cars-sold-2014-and-2015: HTTPSConnectionPool(host='forums.teslamotors.com', port=443): Max retries exceeded with url: /forum/forums/how-many-cars-sold-2014-and-2015 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ffd5dac0>: Failed to resolve 'forums.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  24%|██▍       | 238/1000 [12:28<54:57,  4.33s/it]

Error extracting text from http://www.nytimes.com/2016/08/27/world/middleeast/syria-civil-war-why-get-worse.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/27/world/middleeast/syria-civil-war-why-get-worse.html?_r=0


Processing URLs:  24%|██▍       | 241/1000 [12:31<29:50,  2.36s/it]

Error extracting text from http://www.business-standard.com/article/international/nawaz-sharif-fails-to-reach-islamabad-due-to-health-issues-116071800723_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/international/nawaz-sharif-fails-to-reach-islamabad-due-to-health-issues-116071800723_1.html
URL filtered: https://www.bloomberg.com/news/articles/2017-02-16/how-ubs-wealth-sees-aramco-ipo-changing-middle-east-markets-q-a


Processing URLs:  24%|██▍       | 245/1000 [12:33<14:31,  1.15s/it]

Error extracting text from https://www.afghanistan-analysts.org/hekmatyars-return-to-kabul-background-reading-by-aan/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/hekmatyars-return-to-kabul-background-reading-by-aan/


Processing URLs:  25%|██▍       | 247/1000 [12:34<11:26,  1.10it/s]

Error extracting text from http://warontherocks.com/2016/02/asias-mediterranean-strategy-geopolitics-and-risk-in-the-seas-of-the-indo-pacific/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/02/asias-mediterranean-strategy-geopolitics-and-risk-in-the-seas-of-the-indo-pacific/


Processing URLs:  25%|██▍       | 249/1000 [12:36<09:39,  1.30it/s]

Error extracting text from https://www.investors.com/news/economy/5-reasons-why-trump-is-very-unlikely-to-pull-out-of-nafta/: 403 Client Error: Forbidden for url: https://www.investors.com/news/economy/5-reasons-why-trump-is-very-unlikely-to-pull-out-of-nafta/


Processing URLs:  25%|██▌       | 251/1000 [12:38<10:46,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-safrica-politics-anc-factbox/factbox-how-south-africas-anc-will-pick-zumas-successor-idUSKBN1D14M7?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics-anc-factbox/factbox-how-south-africas-anc-will-pick-zumas-successor-idUSKBN1D14M7?il=0


Processing URLs:  25%|██▌       | 252/1000 [12:38<09:28,  1.31it/s]

Error extracting text from http://warontherocks.com/2016/01/how-chinas-new-russian-air-defense-system-could-change-asia/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/01/how-chinas-new-russian-air-defense-system-could-change-asia/


Processing URLs:  25%|██▌       | 253/1000 [12:39<09:57,  1.25it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-23/china-s-stock-index-futures-fall-as-csrc-restarts-ipo-process


Processing URLs:  26%|██▌       | 255/1000 [13:39<2:55:03, 14.10s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/article63112917.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  26%|██▌       | 257/1000 [13:53<2:12:14, 10.68s/it]

URL filtered: https://www.nytimes.com/2017/10/09/technology/russia-election-facebook-ads-rage.html


Processing URLs:  26%|██▌       | 260/1000 [13:54<1:00:53,  4.94s/it]

Error extracting text from http://evobsession.com/german-ev-sales-split-february-2016/: 403 Client Error: Forbidden for url: http://evobsession.com/german-ev-sales-split-february-2016/


Processing URLs:  26%|██▋       | 263/1000 [14:12<1:15:46,  6.17s/it]

Error extracting text from https://www.reuters.com/article/us-iran-military-usa/iran-says-warplanes-warned-off-two-western-vessels-during-drill-idUSKBN1FB0VM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-military-usa/iran-says-warplanes-warned-off-two-western-vessels-during-drill-idUSKBN1FB0VM
URL filtered: https://inews.co.uk/news/health/prosecute-ministers-acted-gross-negligence-pandemic-peoples-covid-inquiry-1091046?ito=twitter_share_article-top


Processing URLs:  27%|██▋       | 267/1000 [14:15<34:33,  2.83s/it]

URL filtered: https://www.youtube.com/watch?v=rge17TciHfU


Processing URLs:  27%|██▋       | 271/1000 [14:23<27:11,  2.24s/it]

Error extracting text from https://www.techdirt.com/articles/20160204/09411333520/top-german-judges-tear-to-shreds-eus-proposed-tafta-ttip-investment-court-system.shtml: 403 Client Error: Forbidden for url: https://www.techdirt.com/articles/20160204/09411333520/top-german-judges-tear-to-shreds-eus-proposed-tafta-ttip-investment-court-system.shtml


Processing URLs:  27%|██▋       | 274/1000 [14:24<12:57,  1.07s/it]

Error extracting text from http://www.reuters.com/article/venezuela-economy-idUSL2N15V0LR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-economy-idUSL2N15V0LR
Error extracting text from http://www.wsj.com/articles/judge-denies-developers-request-to-force-approval-of-dakota-access-pipelines-final-stage-1481319184: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/judge-denies-developers-request-to-force-approval-of-dakota-access-pipelines-final-stage-1481319184
Error extracting text from https://www.reuters.com/world/us/us-senators-propose-adding-boycott-chinas-winter-olympics-defense-bill-2021-10-28/,: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/us-senators-propose-adding-boycott-chinas-winter-olympics-defense-bill-2021-10-28/,


Processing URLs:  28%|██▊       | 278/1000 [14:27<09:38,  1.25it/s]

Error extracting text from https://www.japantimes.co.jp/news/2021/05/17/national/tokyo-olympics-cancel-survey/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2021/05/17/national/tokyo-olympics-cancel-survey/
Error extracting text from http://www.whorunsgov.com/u-s-seeks-pope-francis-help/01273: 403 Client Error: Forbidden for url: http://www.whorunsgov.com/u-s-seeks-pope-francis-help/01273


Processing URLs:  28%|██▊       | 279/1000 [14:28<10:59,  1.09it/s]

Error extracting text from http://www.ibtimes.co.uk/mysterious-hackers-are-trying-bring-down-entire-internet-by-ddos-ing-critical-servers-1532762: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/mysterious-hackers-are-trying-bring-down-entire-internet-by-ddos-ing-critical-servers-1532762


Processing URLs:  28%|██▊       | 281/1000 [14:30<09:49,  1.22it/s]

Error extracting text from https://thehill.com/policy/healthcare/public-global-health/560477-last-foreign-scientist-to-work-at-wuhan-lab-what: 403 Client Error: Forbidden for url: https://thehill.com/policy/healthcare/public-global-health/560477-last-foreign-scientist-to-work-at-wuhan-lab-what/


Processing URLs:  29%|██▊       | 286/1000 [14:41<19:58,  1.68s/it]

Error extracting text from http://tass.ru/en/world/847336: 404 Client Error: Not Found for url: https://tass.ru/en/world/847336


Processing URLs:  29%|██▊       | 287/1000 [14:42<16:11,  1.36s/it]

Error extracting text from http://globalriskinsights.com/2016/10/battle-for-mosul-what-lies-ahead/***: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2016/10/battle-for-mosul-what-lies-ahead/***


Processing URLs:  29%|██▉       | 289/1000 [14:43<10:51,  1.09it/s]

Error extracting text from https://www.axios.com/biden-putin-call-navalny-0ac7c7e7-1f75-42be-8e8c-1d9386156efb.html: 403 Client Error: Forbidden for url: https://www.axios.com/biden-putin-call-navalny-0ac7c7e7-1f75-42be-8e8c-1d9386156efb.html


Processing URLs:  29%|██▉       | 292/1000 [14:46<12:14,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/trump-to-take-executive-actions-on-immigration-1485302909?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-to-take-executive-actions-on-immigration-1485302909?mod=e2fb


Processing URLs:  30%|██▉       | 296/1000 [14:55<24:29,  2.09s/it]

Error extracting text from http://www.ansamed.info/ansamed/en/news/sections/politics/2015/10/29/israel-cautions-italy-about-rouhani-visit-to-rome_2a9a16e7-bef6-4b54-b6de-b226a66f48b6.html: 404 Client Error: Not Found for url: https://www.ansa.it/ansamed/en/news/sections/politics/2015/10/29/israel-cautions-italy-about-rouhani-visit-to-rome_2a9a16e7-bef6-4b54-b6de-b226a66f48b6.html


Processing URLs:  30%|██▉       | 297/1000 [14:55<17:58,  1.53s/it]

Error extracting text from https://gma.yahoo.com/punishment-alleged-russian-hacking-expected-announced-today-153305821--abc-news-topstories.html: 404 Client Error: Not Found for url: https://www.yahoo.com/gma/punishment-alleged-russian-hacking-expected-announced-today-153305821--abc-news-topstories.html


Processing URLs:  30%|███       | 300/1000 [14:56<09:10,  1.27it/s]

Error extracting text from https://www.nytimes.com/2017/04/13/climate/el-nino-climate-change.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/13/climate/el-nino-climate-change.html


Processing URLs:  30%|███       | 303/1000 [15:00<11:55,  1.03s/it]

Error extracting text from http://www.ibtimes.co.uk/g7-summit-brexit-risk-global-growth-1562281?awt_l=NJT0v&amp;awt_m=irMwNOXK7WbOvKU&amp;utm_source=email&amp;utm_medium=email&amp;utm_campaign=rss: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/g7-summit-brexit-risk-global-growth-1562281?awt_l=NJT0v&amp;awt_m=irMwNOXK7WbOvKU&amp;utm_source=email&amp;utm_medium=email&amp;utm_campaign=rss
URL filtered: https://www.youtube.com/watch?v=CDJQjnbU-i8&amp;list=PLyxXLecVwrA3YRvhgntRKHwt0NNRVxWQY&amp;index=32


Processing URLs:  31%|███       | 308/1000 [15:07<17:40,  1.53s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-30/philippines-to-join-asian-infrastructure-investment-bank-iis90uzq


Processing URLs:  31%|███       | 310/1000 [15:07<11:29,  1.00it/s]

Error extracting text from https://www.amazon.com/Classical-Electrodynamics-Third-David-Jackson/dp/047130932X: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Classical-Electrodynamics-Third-David-Jackson/dp/047130932X


Processing URLs:  31%|███▏      | 313/1000 [15:12<13:43,  1.20s/it]

Error extracting text from http://www.londonstockexchange.com/home/guide-to-listing.pdf: 404 Client Error: Not Found for url: https://www.londonstockexchange.com/home/guide-to-listing.pdf


Processing URLs:  32%|███▏      | 318/1000 [15:17<13:41,  1.20s/it]

URL filtered: https://www.youtube.com/watch?v=aARaYjgm_rA


Processing URLs:  32%|███▏      | 321/1000 [15:28<26:47,  2.37s/it]

Error extracting text from http://www.gov.me/en/News/164774/Turkey-ratifies-Protocol-on-Montenegro-s-Accession-to-NATO.html: 404 Client Error: not found for url: https://www.gov.me/en/News/164774/Turkey-ratifies-Protocol-on-Montenegro-s-Accession-to-NATO.html


Processing URLs:  32%|███▏      | 323/1000 [15:29<16:21,  1.45s/it]

Error extracting text from http://www.reuters.com/article/us-trump-usa-tax-trade-idUSKBN15G5EH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trump-usa-tax-trade-idUSKBN15G5EH


Processing URLs:  32%|███▎      | 325/1000 [15:31<12:41,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0X30GY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0X30GY


Processing URLs:  33%|███▎      | 327/1000 [15:31<08:40,  1.29it/s]

Error extracting text from https://www.wsj.com/articles/irs-bank-reporting-democrats-11634658560: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/irs-bank-reporting-democrats-11634658560


Processing URLs:  33%|███▎      | 329/1000 [16:33<2:25:18, 12.99s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-07-18/venezuela-rejects-trump-sanctions-threat-reviews-relations: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
Error extracting text from https://www.bls.gov/news.release/empsit.nr0.htm: 403 Client Error: Forbidden for url: https://www.bls.gov/news.release/empsit.nr0.htm


Processing URLs:  33%|███▎      | 331/1000 [16:35<1:16:47,  6.89s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-wisconsin-senate-johnson-vs-feingold: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-wisconsin-senate-johnson-vs-feingold


Processing URLs:  33%|███▎      | 333/1000 [16:37<46:37,  4.19s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-29/markets-vs-economists-who-s-right-on-fed-interest-rate-timing-


Processing URLs:  34%|███▍      | 341/1000 [16:47<17:41,  1.61s/it]

Error extracting text from https://warisboring.com/the-u-s-intelligence-community-is-dangerously-unimaginative-5869cf509c0a: 403 Client Error: Forbidden for url: https://warisboring.com/the-u-s-intelligence-community-is-dangerously-unimaginative-5869cf509c0a


Processing URLs:  34%|███▍      | 344/1000 [16:50<12:03,  1.10s/it]

Error extracting text from http://www.nytimes.com/2015/09/19/world/europe/us-to-begin-military-talks-with-russia-on-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/19/world/europe/us-to-begin-military-talks-with-russia-on-syria.html


Processing URLs:  35%|███▌      | 350/1000 [17:29<33:34,  3.10s/it]

Error extracting text from https://prod-static-ngop-pbl.s3.amazonaws.com/media/documents/DRAFT_12_FINAL[1]-ben_1468872234.pdf: 403 Client Error: Forbidden for url: https://prod-static-ngop-pbl.s3.amazonaws.com/media/documents/DRAFT_12_FINAL%5B1%5D-ben_1468872234.pdf
Error extracting text from https://www.barrons.com/articles/construction-on-the-nord-stream-2-pipeline-is-finished-the-conflicts-around-it-are-not-51631636197?siteid=yhoof2: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/construction-on-the-nord-stream-2-pipeline-is-finished-the-conflicts-around-it-are-not-51631636197?siteid=yhoof2


Processing URLs:  35%|███▌      | 354/1000 [17:37<24:52,  2.31s/it]

Error extracting text from http://www.oddschecker.com/politics/european-politics/german-politics/next-chancellor: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/european-politics/german-politics/next-chancellor


Processing URLs:  36%|███▌      | 356/1000 [17:41<21:19,  1.99s/it]

Error extracting text from https://simpleflying.com/tsa-saw-3rd-busiest-day-since-covid-began/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  36%|███▌      | 359/1000 [17:44<13:06,  1.23s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-turkey/erdogan-says-turkey-will-crush-kurdish-militia-in-afrin-idUSKBN1F20G4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey/erdogan-says-turkey-will-crush-kurdish-militia-in-afrin-idUSKBN1F20G4


Processing URLs:  36%|███▌      | 361/1000 [17:47<12:58,  1.22s/it]

URL filtered: https://www.metaculus.com/questions/2796/what-will-be-the-daily-volume-of-facebooks-libra-by-october-1st-2020/#comment-15220


Processing URLs:  37%|███▋      | 372/1000 [17:58<08:46,  1.19it/s]

Error extracting text from http://thehill.com/homenews/administration/346315-special-counsel-mueller-in-talks-to-interview-administration: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/346315-special-counsel-mueller-in-talks-to-interview-administration/


Processing URLs:  38%|███▊      | 376/1000 [18:06<15:49,  1.52s/it]

Error extracting text from http://amti.csis.org/philippines-lopsided-south-china-sea-policy/: 403 Client Error: Forbidden for url: http://amti.csis.org/philippines-lopsided-south-china-sea-policy/


Processing URLs:  38%|███▊      | 378/1000 [18:10<17:03,  1.65s/it]

Error extracting text from https://www.tvnz.co.nz/one-news/new-zealand/new-zealand-slumps-120th-in-world-covid-19-vaccination-rates: 404 Client Error: Not Found for url: https://www.1news.co.nz/one-news/new-zealand/new-zealand-slumps-120th-in-world-covid-19-vaccination-rates/
Error extracting text from http://www.latimes.com/world/la-fg-russia-alexei-navalny-putin-election-20171226-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-russia-alexei-navalny-putin-election-20171226-story.html


Processing URLs:  38%|███▊      | 379/1000 [18:21<45:31,  4.40s/it]

Error extracting text from http://icasualties.org/Iraq/Nationality.aspx: 404 Client Error: Not Found for url: http://icasualties.org/Iraq/Nationality.aspx


Processing URLs:  38%|███▊      | 384/1000 [18:32<26:12,  2.55s/it]

Error extracting text from http://pollytix.eu/pollytix-german-election-trend/: 403 Client Error: Forbidden for url: https://pollytix.eu/pollytix-german-election-trend/


Processing URLs:  39%|███▊      | 386/1000 [18:36<21:59,  2.15s/it]

Error extracting text from https://www.thesun.co.uk/news/13271266/brexit-news-latest-deal-talks-deadline-uk-eu-boris-barnier-live/: 401 Client Error: Unauthorized for url: https://www.thesun.co.uk/news/13370720/brexit-news-latest-deal-barnier-uk-eu-boris-live
URL filtered: http://www.bloombergview.com/articles/2015-12-14/market-instability-won-t-deter-a-fed-rate-hike


Processing URLs:  39%|███▉      | 389/1000 [18:36<09:51,  1.03it/s]

Error extracting text from https://boingboing.net/2016/02/21/the-latest-dns-bug-is-terrifyi.html: 403 Client Error: Forbidden for url: https://boingboing.net/2016/02/21/the-latest-dns-bug-is-terrifyi.html


Processing URLs:  39%|███▉      | 391/1000 [18:39<10:51,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/russia-moves-artillery-to-northern-syria-u-s-officials-say-1461153190: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-moves-artillery-to-northern-syria-u-s-officials-say-1461153190


Processing URLs:  39%|███▉      | 393/1000 [18:51<38:46,  3.83s/it]

Error extracting text from http://blogs.wsj.com/digits/2012/12/14/google-hires-famed-futurist-ray-kurzweil: 403 Client Error: Forbidden for url: http://blogs.wsj.com/digits/2012/12/14/google-hires-famed-futurist-ray-kurzweil


Processing URLs:  40%|███▉      | 398/1000 [19:05<34:07,  3.40s/it]

Error extracting text from http://caracaschronicles.com/2015/11/30/where-6d-will-be-won/: 403 Client Error: Forbidden for url: http://caracaschronicles.com/2015/11/30/where-6d-will-be-won/


Processing URLs:  40%|████      | 400/1000 [19:06<20:39,  2.07s/it]

Error extracting text from http://www.reuters.com/article/latam-emergingmarkets-idUSL2N16B11R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/latam-emergingmarkets-idUSL2N16B11R


Processing URLs:  40%|████      | 402/1000 [19:10<20:59,  2.11s/it]

Error extracting text from http://theweatherspace.com/2015/10/19/161358-aides-say-biden-is-close-to-making-a-decision-after-months/: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  40%|████      | 404/1000 [19:13<16:09,  1.63s/it]

URL filtered: https://www.youtube.com/watch?v=-_kjR487TwI
Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BR1HE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BR1HE


Processing URLs:  41%|████      | 408/1000 [19:14<07:12,  1.37it/s]

Error extracting text from http://www.reuters.com/article/us-saudi-oil-asia-idUSKBN16A0U8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-asia-idUSKBN16A0U8


Processing URLs:  41%|████      | 409/1000 [19:17<13:10,  1.34s/it]

Error extracting text from http://www.un.org/undpa/Speeches-statements/14112015/syria: 403 Client Error: Forbidden for url: https://www.un.org/undpa/Speeches-statements/14112015/syria


Processing URLs:  41%|████      | 410/1000 [19:17<10:29,  1.07s/it]

Error extracting text from https://www.nytimes.com/2019/11/01/technology/tiktok-national-security-review.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2019/11/01/technology/tiktok-national-security-review.html


Processing URLs:  41%|████▏     | 413/1000 [19:21<11:04,  1.13s/it]

Error extracting text from http://www.un.org/pga/70/sg/: 403 Client Error: Forbidden for url: https://www.un.org/pga/70/sg/


Processing URLs:  42%|████▏     | 416/1000 [19:23<08:26,  1.15it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics-nahles/germanys-spd-bets-on-first-female-chair-in-154-years-to-revive-fortunes-idUSKBN1FS2HV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-nahles/germanys-spd-bets-on-first-female-chair-in-154-years-to-revive-fortunes-idUSKBN1FS2HV


Processing URLs:  42%|████▏     | 417/1000 [19:24<08:31,  1.14it/s]

URL filtered: https://twitter.com/DefenceHQ


Processing URLs:  42%|████▏     | 420/1000 [19:27<08:40,  1.11it/s]

Error extracting text from http://www.iran-bn.com/2016/10/19/iran-producing-nearly-4-5m-bpd-of-oil/: HTTPConnectionPool(host='www.iran-bn.com', port=80): Max retries exceeded with url: /2016/10/19/iran-producing-nearly-4-5m-bpd-of-oil/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3049205c0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  42%|████▏     | 421/1000 [19:29<10:24,  1.08s/it]

URL filtered: https://twitter.com/SuspendThePres


Processing URLs:  42%|████▏     | 424/1000 [19:46<44:02,  4.59s/it]

Error extracting text from http://www.ew.com/article/2015/07/01/terminator-genisys-building-young-arnold: 406 Client Error: Not Acceptable for url: https://www.ew.com/article/2015/07/01/terminator-genisys-building-young-arnold


Processing URLs:  43%|████▎     | 434/1000 [20:06<20:17,  2.15s/it]

URL filtered: https://twitter.com/britainelects/status/722101052007452673


Processing URLs:  44%|████▎     | 436/1000 [20:07<13:50,  1.47s/it]

URL filtered: https://www.washingtonpost.com/technology/2021/01/16/how-twitter-banned-trump/


Processing URLs:  44%|████▍     | 439/1000 [20:14<17:33,  1.88s/it]

Error extracting text from http://uk.reuters.com/article/us-europe-usa-trade-idUKKCN0XI0AT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  44%|████▍     | 443/1000 [20:22<17:53,  1.93s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-25/if-china-killed-commodities-super-cycle-fed-is-about-to-bury-it


Processing URLs:  45%|████▍     | 446/1000 [20:25<13:13,  1.43s/it]

Error extracting text from http://data.unhcr.org/syrianrefugees/regional.php: 404 Client Error: Not Found for url: https://data.unhcr.org:443/syrianrefugees/regional.php


Processing URLs:  45%|████▍     | 448/1000 [20:30<15:36,  1.70s/it]

Error extracting text from https://www.reuters.com/article/us-china-health-northkorea/who-says-no-indication-of-coronavirus-cases-in-north-korea-idUSKBN20D04S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-health-northkorea/who-says-no-indication-of-coronavirus-cases-in-north-korea-idUSKBN20D04S


Processing URLs:  45%|████▌     | 450/1000 [20:33<14:11,  1.55s/it]

Error extracting text from http://www.levantinegroup.com/: 404 Client Error: Not Found for url: http://www.levantinegroup.com/


Processing URLs:  45%|████▌     | 454/1000 [20:37<08:34,  1.06it/s]

Error extracting text from http://www.reuters.com/article/us-saudi-aramco-gas-eclusive-idUSKBN17D0JY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-aramco-gas-eclusive-idUSKBN17D0JY


Processing URLs:  46%|████▌     | 455/1000 [20:37<06:45,  1.35it/s]

Error extracting text from https://www.nytimes.com/2017/09/29/opinion/gerrymandering-supreme-court.html?rref=collection%2Fsectioncollection%2Fopinion&amp;action=click&amp;contentCollection=opinion&amp;region=ra: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/29/opinion/gerrymandering-supreme-court.html?rref=collection%2Fsectioncollection%2Fopinion&amp;action=click&amp;contentCollection=opinion&amp;region=ra
URL filtered: https://twitter.com/katyafimava/status/1450146536848072707


Processing URLs:  46%|████▌     | 457/1000 [20:38<06:19,  1.43it/s]

URL filtered: https://twitter.com/BillKristol/status/1346924713940049935


Processing URLs:  46%|████▌     | 460/1000 [20:41<07:57,  1.13it/s]

Error extracting text from http://www.wsj.com/articles/feds-lockhart-says-interest-rate-rise-in-2015-still-very-much-on-the-table-1442854789: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-lockhart-says-interest-rate-rise-in-2015-still-very-much-on-the-table-1442854789


Processing URLs:  46%|████▋     | 465/1000 [21:47<2:38:50, 17.81s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-04-30/irans-leader-dismisses-rouhanis-detente-policy-ahead-of-vote: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
Error extracting text from https://www.resultados.eleccionesgenerales2021.pe/EG2021/EleccionesPresidenciales/RePres/T: HTTPSConnectionPool(host='www.resultados.eleccionesgenerales2021.pe', port=443): Max retries exceeded with url: /EG2021/EleccionesPresidenciales/RePres/T (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x304920680>: Failed to resolve 'www.resultados.eleccionesgenerales2021.pe' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  47%|████▋     | 469/1000 [21:50<53:25,  6.04s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2015-11-19/boj-keeps-policy-unchanged-even-after-recession-weak-inflation


Processing URLs:  47%|████▋     | 472/1000 [21:52<25:47,  2.93s/it]

Error extracting text from http://www.newsweek.com/hillary-clinton-leads-bernie-sanders-new-hampshire-385602: 403 Client Error: Forbidden for url: https://www.newsweek.com/hillary-clinton-leads-bernie-sanders-new-hampshire-385602
Error extracting text from http://www.reuters.com/article/us-usa-fiscal-cybersecurity-idUSBRE93913S20130411: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fiscal-cybersecurity-idUSBRE93913S20130411


Processing URLs:  48%|████▊     | 479/1000 [22:01<10:53,  1.25s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2017/06/10/30/0401000000AEN20170610001051315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: http://www.bloomberg.com/news/articles/2015-11-17/iran-won-t-seek-opec-s-permission-before-boosting-oil-exports-ih3ejzrp


Processing URLs:  48%|████▊     | 481/1000 [22:02<07:00,  1.23it/s]

Error extracting text from https://www.dvidshub.net/feature/Balikatan: 404 Client Error: Not Found for url: https://www.dvidshub.net/feature/Balikatan


Processing URLs:  48%|████▊     | 485/1000 [22:10<15:29,  1.81s/it]



Processing URLs:  49%|████▊     | 486/1000 [23:11<2:39:01, 18.56s/it]

Error extracting text from http://aa.com.tr/en/middle-east/airstrike-kills-20-injures-scores-in-iraq-s-mosul/713870: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  49%|████▉     | 493/1000 [23:20<22:31,  2.67s/it]  

Error extracting text from https://www.nytimes.com/2017/07/30/world/americas/venezuela-constituent-assembly-election.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/30/world/americas/venezuela-constituent-assembly-election.html


Processing URLs:  50%|████▉     | 497/1000 [23:42<24:41,  2.95s/it]  

Error extracting text from https://www.nytimes.com/2021/05/09/world/middleeast/israeli-court-palestinian-families-east-jerusalem.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/05/09/world/middleeast/israeli-court-palestinian-families-east-jerusalem.html


Processing URLs:  51%|█████     | 507/1000 [24:04<12:32,  1.53s/it]

Error extracting text from http://www.nknews.org/2015/10/analysis-redesigned-kn-08-missile-unveiled-in-military-parade/: 404 Client Error: Not Found for url: https://www.nknews.org/2015/10/analysis-redesigned-kn-08-missile-unveiled-in-military-parade/
Error extracting text from http://www.nytimes.com/2016/07/22/automobiles/water-out-the-tailpipe-a-new-class-of-electric-car-gains-traction.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/22/automobiles/water-out-the-tailpipe-a-new-class-of-electric-car-gains-traction.html


Processing URLs:  51%|█████     | 509/1000 [24:06<09:23,  1.15s/it]

Error extracting text from http://gawker.com/joe-biden-is-100-running-for-president-says-advisor-1731501196: 404 Client Error: Not Found for url: https://gawker.com/joe-biden-is-100-running-for-president-says-advisor-1731501196


Processing URLs:  51%|█████     | 510/1000 [24:07<10:29,  1.28s/it]

Error extracting text from https://www.aa.com.tr/en/africa/sudan-looks-for-alternatives-as-nile-dam-talks-stall/2114336: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  51%|█████     | 511/1000 [24:24<47:11,  5.79s/it]

Error extracting text from https://research-doc.credit-suisse.com/docView?language=ENG&amp;source=emfromsendlink&amp;format=PDF&amp;document_id=999089271&amp;extdocid=999089271_1_eng_pdf&amp;serialid=pvH393UArco6JvZIguX4cJ5jXWIkrqD%2Bb1l3MzX4YTI%3D#:~:text=Do%20Spin%2DOffs%20Create%20or%20Destroy%20Value%3F&amp;text=We%20find%20that%20spin%2Doffs,volatility%20of%20returns%20is%20significant: HTTPSConnectionPool(host='plus2.credit-suisse.com', port=443): Max retries exceeded with url: /docView?language=ENG&amp;source=emfromsendlink&amp;format=PDF&amp;document_id=999089271&amp;extdocid=999089271_1_eng_pdf&amp;serialid=pvH393UArco6JvZIguX4cJ5jXWIkrqD%2Bb1l3MzX4YTI%3D (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30341bd40>: Failed to resolve 'plus2.credit-suisse.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  51%|█████▏    | 513/1000 [24:26<26:57,  3.32s/it]

Error extracting text from https://apps.fcc.gov/edocs_public/attachmatch/DOC-344590A1.pdf: 403 Client Error: Forbidden for url: https://apps.fcc.gov/edocs_public/attachmatch/DOC-344590A1.pdf


Processing URLs:  52%|█████▏    | 517/1000 [24:33<18:14,  2.27s/it]

Error extracting text from http://www.ibtimes.com/iphone-q1-2016-sales-estimates-apple-incs-smartphone-sales-may-decline-first-time-2152449: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iphone-q1-2016-sales-estimates-apple-incs-smartphone-sales-may-decline-first-time-2152449


Processing URLs:  52%|█████▏    | 520/1000 [24:36<10:25,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-burundi-politics-idUSKCN0YY0OH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-politics-idUSKCN0YY0OH


Processing URLs:  53%|█████▎    | 528/1000 [24:50<05:43,  1.37it/s]

Error extracting text from http://thehill.com/homenews/senate/315524-schumer-ready-to-leave-ninth-supreme-court-seat-open: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/315524-schumer-ready-to-leave-ninth-supreme-court-seat-open/
URL filtered: https://www.youtube.com/watch?v=Ozj0qwnMGZ0
Error extracting text from http://www.reuters.com/article/2015/11/05/us-safrica-iran-idUSKCN0SU2GL20151105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/05/us-safrica-iran-idUSKCN0SU2GL20151105


Processing URLs:  53%|█████▎    | 531/1000 [24:59<17:43,  2.27s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-21/china-s-electric-car-subsidy-fraud-casts-doubt-on-surging-demand


Processing URLs:  54%|█████▎    | 535/1000 [25:00<07:20,  1.06it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://painel.blogfolha.uol.com.br/2016/02/03/impacto-do-desgaste-de-lula-sobre-o-impeachment-faz-dilma-ordenar-defesa-do-antecessor/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://painel.blogfolha.uol.com.br/2016/02/03/impacto-do-desgaste-de-lula-sobre-o-impeachment-faz-dilma-ordenar-defesa-do-antecessor/&amp;prev=search
Error extracting text from http://www.nytimes.com/2016/07/07/us/politics/hillary-clinton-loretta-lynch.htm: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/07/us/politics/hillary-clinton-loretta-lynch.htm


Processing URLs:  54%|█████▍    | 538/1000 [25:03<07:37,  1.01it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/feb/24/inside-the-ring-us-mulls-pledge-on-disputed-philip/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/feb/24/inside-the-ring-us-mulls-pledge-on-disputed-philip/


Processing URLs:  55%|█████▌    | 551/1000 [25:20<06:14,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-eu-usa-trade-idUSKCN0ZF2HY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-trade-idUSKCN0ZF2HY


Processing URLs:  55%|█████▌    | 554/1000 [25:23<06:42,  1.11it/s]

Error extracting text from http://ottawacitizen.com/news/politics/kady-what-will-happen-when-independent-senators-rule-the-red-chamber: 403 Client Error: Forbidden for url: https://ottawacitizen.com:443/news/politics/kady-what-will-happen-when-independent-senators-rule-the-red-chamber


Processing URLs:  56%|█████▌    | 557/1000 [25:28<08:28,  1.15s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-falluja-idUSKCN0Z81NV?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-falluja-idUSKCN0Z81NV?il=0


Processing URLs:  56%|█████▋    | 564/1000 [25:36<10:55,  1.50s/it]

Error extracting text from https://reut.rs/2KLxenS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://youtube.com/watch?v=w5fBUYjux40


Processing URLs:  57%|█████▊    | 575/1000 [25:48<06:10,  1.15it/s]

Error extracting text from http://www.nytimes.com/2015/12/03/world/americas/brazil-president-faces-prospect-of-impeachment.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/world/americas/brazil-president-faces-prospect-of-impeachment.html?_r=0
Error extracting text from https://www.reuters.com/world/middle-east/brent-stays-above-75bbl-amid-impasse-opec-talks-2021-07-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/brent-stays-above-75bbl-amid-impasse-opec-talks-2021-07-12/


Processing URLs:  58%|█████▊    | 577/1000 [25:49<05:03,  1.40it/s]

Error extracting text from http://www.opec.org/opec_web/static_files_project/media/downloads/publications/MOMR%20August%202017.pdf: 404 Client Error: Not Found for url: https://www.opec.org/opec_web/static_files_project/media/downloads/publications/MOMR%20August%202017.pdf
Error extracting text from https://www.eureporter.co/frontpage/2017/01/11/bih-is-eu-membership-for-montenegro-bosnia-and-herzegovina-really-worth-it/: 403 Client Error: Forbidden for url: https://www.eureporter.co/frontpage/2017/01/11/bih-is-eu-membership-for-montenegro-bosnia-and-herzegovina-really-worth-it/


Processing URLs:  58%|█████▊    | 582/1000 [26:02<14:54,  2.14s/it]

Error extracting text from https://www.timesofisrael.com/study-covid-recovery-gave-israelis-longer-lasting-delta-defense-than-vaccines/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/study-covid-recovery-gave-israelis-longer-lasting-delta-defense-than-vaccines/


Processing URLs:  58%|█████▊    | 584/1000 [26:08<15:32,  2.24s/it]

Error extracting text from http://www.baltimoresun.com/sports/olympics/88090481-157.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/sports/olympics/88090481-157.html


Processing URLs:  59%|█████▊    | 586/1000 [26:10<12:09,  1.76s/it]

Error extracting text from http://nameexoworlds.iau.org/names: 404 Client Error: Not Found for url: https://www.nameexoworlds.iau.org/names


Processing URLs:  59%|█████▊    | 587/1000 [26:12<11:29,  1.67s/it]

Error extracting text from http://www.airforcetimes.com/story/military/2016/02/23/up-dozen-iraqi-brigades-may-need-us-combat-support-battle-mosul/80804462/: 404 Client Error: Not Found for url: https://www.airforcetimes.com/story/military/2016/02/23/up-dozen-iraqi-brigades-may-need-us-combat-support-battle-mosul/80804462/


Processing URLs:  59%|█████▉    | 591/1000 [26:20<14:45,  2.17s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-29/putin-sees-assad-chance-to-stay-as-u-s-pursues-syria-settlement


Processing URLs:  60%|█████▉    | 599/1000 [26:31<07:27,  1.11s/it]

Error extracting text from http://www.ntiindex.org/wp-content/uploads/2016/03/NTI_2016-Index-Report_MAR-25-2.pdf: 404 Client Error: Not Found for url: https://www.ntiindex.org/wp-content/uploads/2016/03/NTI_2016-Index-Report_MAR-25-2.pdf
URL filtered: https://www.youtube.com/watch?v=S1Cuekbklkg
Error extracting text from https://www.jns.org/1800-former-israeli-generals-service-members-urge-biden-not-to-return-to-jcpoa/: 403 Client Error: Forbidden for url: https://www.jns.org/1800-former-israeli-generals-service-members-urge-biden-not-to-return-to-jcpoa/


Processing URLs:  60%|██████    | 600/1000 [26:34<11:15,  1.69s/it]

URL filtered: http://www.bloombergview.com/articles/2015-11-30/you-re-the-fed-chairman-what-would-you-do-


Processing URLs:  60%|██████    | 602/1000 [26:35<07:26,  1.12s/it]

Error extracting text from https://www.houstonchronicle.com/business/energy/article/Rig-count-leaps-by-nine-as-crude-climbs-above-74-16485238.php: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/energy/article/Rig-count-leaps-by-nine-as-crude-climbs-above-74-16485238.php
URL filtered: http://www.bloomberg.com/news/articles/2015-11-19/pboc-doubles-reverse-repos-to-inject-funds-before-ipos-resume


Processing URLs:  61%|██████    | 606/1000 [26:39<07:45,  1.18s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-anc-leader-zuma-to-be-dealt-with-over-time-idUSKBN1F30GH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-anc-leader-zuma-to-be-dealt-with-over-time-idUSKBN1F30GH?il=0


Processing URLs:  61%|██████    | 609/1000 [26:41<06:39,  1.02s/it]

Error extracting text from http://www.business-standard.com/article/international/vladimir-putin-says-new-elections-key-for-ending-syrian-crisis-116061701007_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/international/vladimir-putin-says-new-elections-key-for-ending-syrian-crisis-116061701007_1.html


Processing URLs:  61%|██████    | 611/1000 [26:42<04:38,  1.39it/s]

Error extracting text from https://m.ladbrokes.com.au/sports/politics/23450981-australian-politics-federal/23450981-australian-politics-federal/: 403 Client Error: Forbidden for url: https://www.ladbrokes.com.au/mobile-download


Processing URLs:  61%|██████    | 612/1000 [26:42<04:23,  1.47it/s]

Error extracting text from https://www.un.org/press/en/disarmament: 403 Client Error: Forbidden for url: https://www.un.org/press/en/disarmament


Processing URLs:  61%|██████▏   | 614/1000 [26:44<04:07,  1.56it/s]

Error extracting text from http://www.reuters.com/article/us-montenegro-protests-idUSKCN0SC0SR20151018: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-montenegro-protests-idUSKCN0SC0SR20151018


Processing URLs:  62%|██████▏   | 616/1000 [26:44<03:06,  2.06it/s]

Error extracting text from http://www.wsj.com/: 403 Client Error: Forbidden for url: https://www.wsj.com/


Processing URLs:  62%|██████▏   | 618/1000 [26:46<03:36,  1.77it/s]

Error extracting text from http://corporatenews.pressroom.toyota.com/releases/toyota+fortune+2016+most+admired.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/toyota+fortune+2016+most+admired/
Error extracting text from http://www.reuters.com/article/us-germany-election-idUSKCN1BA21B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-idUSKCN1BA21B


Processing URLs:  62%|██████▏   | 620/1000 [26:49<06:20,  1.00s/it]

URL filtered: https://www.youtube.com/watch?v=KLKW-_pAY9k


Processing URLs:  62%|██████▎   | 625/1000 [26:56<07:11,  1.15s/it]

Error extracting text from http://adage.com/article/media/time-s-acquisition-myspace-parent-company-paying/305324/: 403 Client Error: Forbidden for url: https://adage.com/article/media/time-s-acquisition-myspace-parent-company-paying/305324/


Processing URLs:  63%|██████▎   | 627/1000 [26:57<05:41,  1.09it/s]

Error extracting text from http://www.tradingeconomics.com/commodity/crude-oil: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/commodity/crude-oil


Processing URLs:  63%|██████▎   | 629/1000 [27:34<50:40,  8.20s/it]  

Error extracting text from http://www.aflcio.org/Blog/Global-Action/What-Colombia-Can-Teach-Us-About-the-TPP: 403 Client Error: Forbidden for url: https://aflcio.org/2016/4/12/what-colombia-can-teach-us-about-tpp


Processing URLs:  63%|██████▎   | 630/1000 [27:36<38:57,  6.32s/it]

Error extracting text from http://www.startribune.com/obama-makes-fundraising-plea-for-feingold-in-senate-race/384535311/: 404 Client Error: Not Found for url: https://www.startribune.com/obama-makes-fundraising-plea-for-feingold-in-senate-race/384535311/


Processing URLs:  63%|██████▎   | 632/1000 [27:36<20:11,  3.29s/it]

Error extracting text from http://www.nytimes.com/2016/01/13/world/middleeast/iran-holds-us-navy-boats-crew.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/13/world/middleeast/iran-holds-us-navy-boats-crew.html


Processing URLs:  63%|██████▎   | 633/1000 [27:37<15:41,  2.56s/it]

Error extracting text from https://unama.unmissions.org/civil-society-northeast-dialogue-local-authorities-upcoming-elections: HTTPSConnectionPool(host='unama.unmissions.org', port=443): Max retries exceeded with url: /civil-society-northeast-dialogue-local-authorities-upcoming-elections (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  63%|██████▎   | 634/1000 [27:38<11:27,  1.88s/it]

URL filtered: https://www.thebanker.com/Editor-s-Blog/Facebook-s-Diem-joins-the-regulatory-fold


Processing URLs:  64%|██████▍   | 643/1000 [28:50<1:51:19, 18.71s/it]

Error extracting text from https://money.usnews.com/investing/news/articles/2021-02-24/imf-chief-urges-strong-g20-action-to-reverse-dangerous-divergence-in-global-economy: HTTPSConnectionPool(host='money.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  64%|██████▍   | 645/1000 [28:57<1:04:04, 10.83s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-06-22/nearly-half-of-sanders-supporters-won-t-support-clinton


Processing URLs:  65%|██████▍   | 648/1000 [29:00<29:28,  5.02s/it]  

Error extracting text from https://www.tennisworldusa.org/tennis/news/Rafael_Nadal/113471/diego-schwartzman-gives-thoughts-on-rafael-nadal-s-french-open-title-chances/: 403 Client Error: Forbidden for url: https://www.tennisworldusa.org/tennis/news/Rafael_Nadal/113471/diego-schwartzman-gives-thoughts-on-rafael-nadal-s-french-open-title-chances/


Processing URLs:  65%|██████▌   | 652/1000 [29:04<11:32,  1.99s/it]

Error extracting text from http://gbtimes.com/business/china-wants-conclude-rcep-trade-talks-year: 403 Client Error: Forbidden for url: http://gbtimes.com/business/china-wants-conclude-rcep-trade-talks-year
Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-idUSKBN0U311E20151221: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-idUSKBN0U311E20151221


Processing URLs:  66%|██████▌   | 656/1000 [29:21<26:45,  4.67s/it]

URL filtered: https://www.youtube.com/watch?v=OiNzaL6QyMo&amp;feature=youtu.be


Processing URLs:  66%|██████▌   | 658/1000 [29:23<16:03,  2.82s/it]

Error extracting text from https://news.sky.com/story/politics-live-labour-conference-key-starmer-andy-burnham-tory-scum-row-12418424: 404 Client Error: Not Found for url: https://news.sky.com/story/politics-live-labour-conference-key-starmer-andy-burnham-tory-scum-row-12418424


Processing URLs:  66%|██████▌   | 659/1000 [29:23<12:52,  2.27s/it]

Error extracting text from https://www.gazt.gov.sa/en/value-added-tax-vat: HTTPSConnectionPool(host='www.gazt.gov.sa', port=443): Max retries exceeded with url: /en/value-added-tax-vat (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x30153ac30>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  66%|██████▌   | 661/1000 [29:26<10:32,  1.86s/it]

Error extracting text from https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/Tq8qeRxQ4pB3b5RKg: 403 Client Error: Forbidden for url: https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/Tq8qeRxQ4pB3b5RKg


Processing URLs:  66%|██████▌   | 662/1000 [29:27<09:54,  1.76s/it]

Error extracting text from http://www.reuters.com/article/us-usa-court-ginsburg-idUSKCN11D2U9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-ginsburg-idUSKCN11D2U9


Processing URLs:  67%|██████▋   | 666/1000 [29:33<08:51,  1.59s/it]

Error extracting text from http://www.businessinsider.com/russia-is-sending-advanced-air-defenses-to-syria-2015-9: 404 Client Error: Not Found for url: https://www.businessinsider.com/russia-is-sending-advanced-air-defenses-to-syria-2015-9


Processing URLs:  67%|██████▋   | 668/1000 [29:38<10:52,  1.97s/it]

Error extracting text from http://futuresproject.net/2016/03/14/saudi-arabia-and-iran-a-complex-relationship-evolves-part-ii/: HTTPConnectionPool(host='futuresproject.net', port=80): Max retries exceeded with url: /2016/03/14/saudi-arabia-and-iran-a-complex-relationship-evolves-part-ii/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fcd340>: Failed to resolve 'futuresproject.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  67%|██████▋   | 671/1000 [29:40<06:57,  1.27s/it]

Error extracting text from http://www.reuters.com/article/us-russia-usa-media/russia-not-currently-planning-actions-against-u-s-media-or-social-networks-ria-idUSKBN1D14ID: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-media/russia-not-currently-planning-actions-against-u-s-media-or-social-networks-ria-idUSKBN1D14ID


Processing URLs:  67%|██████▋   | 674/1000 [29:41<04:45,  1.14it/s]

Error extracting text from http://www.wsj.com/articles/the-backlash-against-xi-jinping-1459204928: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-backlash-against-xi-jinping-1459204928


Processing URLs:  68%|██████▊   | 675/1000 [29:42<03:53,  1.39it/s]

Error extracting text from http://world.kbs.co.kr/english/news/news_In_detail.htm?No=117639: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_In_detail.htm?No=117639


Processing URLs:  68%|██████▊   | 678/1000 [29:54<11:32,  2.15s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-11/brazil-s-corruption-scandal-grows-ever-nearer-to-rousseff


Processing URLs:  68%|██████▊   | 681/1000 [29:57<07:25,  1.40s/it]

Error extracting text from https://ahvalnews.com/turkey-debt/indebted-turkish-firms-left-exposed-weak-policy-fed-taper-fitch: 403 Client Error: Forbidden for url: https://ahvalnews.com/turkey-debt/indebted-turkish-firms-left-exposed-weak-policy-fed-taper-fitch


Processing URLs:  68%|██████▊   | 682/1000 [29:57<06:29,  1.22s/it]

Error extracting text from http://www.rs.nato.int/news-center/press-releases/2018-press-releases/taliban-red-unit-flees--surrendering-explosives-cache-to-commandos.aspx: HTTPConnectionPool(host='www.rs.nato.int', port=80): Max retries exceeded with url: /news-center/press-releases/2018-press-releases/taliban-red-unit-flees--surrendering-explosives-cache-to-commandos.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fcda60>: Failed to resolve 'www.rs.nato.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  69%|██████▉   | 688/1000 [30:11<10:25,  2.00s/it]

Error extracting text from https://www.nord-stream2.com/company/shareholder-and-financial-investors/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /company/shareholder-and-financial-investors/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300db1220>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  69%|██████▉   | 693/1000 [30:18<07:05,  1.39s/it]

Error extracting text from http://www.latimes.com/nation/la-na-us-intervention-foreign-elections-20161213-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-na-us-intervention-foreign-elections-20161213-story.html


Processing URLs:  70%|██████▉   | 699/1000 [30:25<03:35,  1.40it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-infrastru-idUSKCN0XM1J2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-infrastru-idUSKCN0XM1J2
Error extracting text from http://www.wsj.com/articles/global-markets-rebound-on-yellen-speech-1443167180: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/global-markets-rebound-on-yellen-speech-1443167180


Processing URLs:  70%|███████   | 700/1000 [30:25<02:44,  1.83it/s]

Error extracting text from https://apps.fcc.gov/edocs_public/recentReleases.do: 403 Client Error: Forbidden for url: https://apps.fcc.gov/edocs_public/recentReleases.do


Processing URLs:  70%|███████   | 701/1000 [30:27<04:16,  1.17it/s]

Error extracting text from http://www.ibtimes.com/hydrogen-bomb-vs-atomic-bomb-fusion-powered-weapon-more-destructive-fission-powered-2251137: 403 Client Error: Forbidden for url: https://www.ibtimes.com/hydrogen-bomb-vs-atomic-bomb-fusion-powered-weapon-more-destructive-fission-powered-2251137


Processing URLs:  70%|███████   | 703/1000 [30:31<08:14,  1.66s/it]

Error extracting text from http://www.reuters.com/article/us-britain-scotland-independence-idUSKCN11F18W?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-scotland-independence-idUSKCN11F18W?il=0


Processing URLs:  71%|███████   | 706/1000 [30:34<06:13,  1.27s/it]

Error extracting text from https://science.sciencemag.org/content/371/6534/1103: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.abg7404


Processing URLs:  71%|███████   | 707/1000 [30:35<05:27,  1.12s/it]

Error extracting text from http://aranews.net/2016/06/12000-isis-militants-fighting-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/06/12000-isis-militants-fighting-mosul/


Processing URLs:  71%|███████   | 708/1000 [30:36<06:03,  1.24s/it]

Error extracting text from http://www.ibtimes.com/eu-brexit-debate-2016-iain-duncan-smith-says-resignation-not-about-referendum-2339816: 403 Client Error: Forbidden for url: https://www.ibtimes.com/eu-brexit-debate-2016-iain-duncan-smith-says-resignation-not-about-referendum-2339816


Processing URLs:  71%|███████   | 710/1000 [30:39<05:33,  1.15s/it]

Error extracting text from http://www.reuters.com/article/russia-putin-turkey-nuclear-idUSR4N12R01Y20151217: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/russia-putin-turkey-nuclear-idUSR4N12R01Y20151217


Processing URLs:  71%|███████   | 712/1000 [30:45<10:49,  2.26s/it]

Error extracting text from http://www.northkoreatech.org/2015/01/03/us-sanctions-north-korea-for-sony-hack/: 403 Client Error: Forbidden for url: http://www.northkoreatech.org/2015/01/03/us-sanctions-north-korea-for-sony-hack/


Processing URLs:  72%|███████▏  | 718/1000 [30:52<04:41,  1.00it/s]

Error extracting text from https://jamiemetzl.com/origins-of-sars-cov-2/: 403 Client Error: Forbidden for url: https://jamiemetzl.com/origins-of-sars-cov-2/
Error extracting text from http://www.reuters.com/article/us-turkey-eu-parliament-idUSKBN19R194?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-parliament-idUSKBN19R194?il=0


Processing URLs:  72%|███████▏  | 721/1000 [30:53<03:01,  1.53it/s]

Error extracting text from http://www.wsj.com/articles/tesla-misses-its-2016-sales-goal-despite-27-fourth-quarter-rise-1483479562: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tesla-misses-its-2016-sales-goal-despite-27-fourth-quarter-rise-1483479562


Processing URLs:  72%|███████▎  | 725/1000 [31:04<06:51,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-usa-court-garland-idUSKCN0WJ251: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-garland-idUSKCN0WJ251


Processing URLs:  73%|███████▎  | 726/1000 [31:04<05:05,  1.12s/it]

Error extracting text from http://www.wsj.com/articles/brazil-speaker-eduardo-cunha-is-defiant-as-high-court-orders-him-tried-1457043379: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-speaker-eduardo-cunha-is-defiant-as-high-court-orders-him-tried-1457043379
URL filtered: http://www.bloomberg.com/news/articles/2015-10-28/brexit-odds-double-to-36-percent-at-bookmaker


Processing URLs:  73%|███████▎  | 728/1000 [31:05<03:12,  1.41it/s]

Error extracting text from http://sondeos.elperiodic.ad/segundo-sondeo-elecciones-generales-26j.html: 404 Client Error: Not Found for url: http://sondeos.elperiodic.ad/segundo-sondeo-elecciones-generales-26j.html


Processing URLs:  73%|███████▎  | 729/1000 [31:05<03:05,  1.46it/s]

Error extracting text from http://www.tradingeconomics.com/united-states/inflation-cpi: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-states/inflation-cpi


Processing URLs:  73%|███████▎  | 730/1000 [31:06<03:35,  1.25it/s]

Error extracting text from https://www.instituteforsupplymanagement.org/ismreport/mfgrob.cfm?SSO=1: HTTPSConnectionPool(host='www.instituteforsupplymanagement.org', port=443): Max retries exceeded with url: /ismreport/mfgrob.cfm?SSO=1 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  73%|███████▎  | 731/1000 [31:07<03:39,  1.22it/s]

Error extracting text from https://www.radioiowa.com/2017/04/18/senator-grassley-not-confident-of-quick-obamacare-repeal-and-replacement/: 403 Client Error: Forbidden for url: https://www.radioiowa.com/2017/04/18/senator-grassley-not-confident-of-quick-obamacare-repeal-and-replacement/


Processing URLs:  73%|███████▎  | 732/1000 [31:18<15:34,  3.49s/it]

Error extracting text from https://www.france24.com/en/europe/20210212-kremlin-critic-navalny-back-in-moscow-court-for-slander-trial: 403 Client Error: Forbidden for url: https://www.france24.com/en/europe/20210212-kremlin-critic-navalny-back-in-moscow-court-for-slander-trial


Processing URLs:  74%|███████▎  | 735/1000 [31:26<13:05,  2.96s/it]

Error extracting text from http://articles.latimes.com/1991-06-13/news/mn-717_1_foreseeable-future: 403 Client Error: Forbidden for url: https://www.latimes.com/archives/la-xpm-1991-06-13-mn-717-story.html


Processing URLs:  74%|███████▎  | 736/1000 [31:28<11:53,  2.70s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/war-crimes-court-to-investigate-deadly-violence-in-burundi/article29747459/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/war-crimes-court-to-investigate-deadly-violence-in-burundi/article29747459/


Processing URLs:  74%|███████▍  | 738/1000 [31:30<08:18,  1.90s/it]

Error extracting text from https://tradingeconomics.com/commodity/brent-crude-oil: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/brent-crude-oil


Processing URLs:  74%|███████▍  | 742/1000 [31:37<07:05,  1.65s/it]

URL filtered: http://www.bloomberg.com/view/articles/2016-06-17/britain-s-elites-ignore-the-masses-at-their-peril


Processing URLs:  74%|███████▍  | 745/1000 [31:38<04:07,  1.03it/s]

Error extracting text from https://www.yahoo.com/news/doctors-upsurge-paralysis-condition-accompanies-050831706.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/doctors-upsurge-paralysis-condition-accompanies-050831706.html


Processing URLs:  75%|███████▍  | 747/1000 [31:42<04:56,  1.17s/it]

Error extracting text from https://www.today.ng/news/nigeria/18177/aid-chief-urges-sustained-aid-response-north-east: 403 Client Error: Forbidden for url: https://www.today.ng/news/nigeria/18177/aid-chief-urges-sustained-aid-response-north-east


Processing URLs:  75%|███████▍  | 748/1000 [31:42<03:46,  1.11it/s]

Error extracting text from http://www.barrons.com/articles/venezuela-bondholders-on-default-watch-1510158935: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-bondholders-on-default-watch-1510158935


Processing URLs:  75%|███████▌  | 752/1000 [31:48<05:36,  1.36s/it]

URL filtered: https://www.instagram.com/davidseamandoing/?hl=en


Processing URLs:  75%|███████▌  | 754/1000 [31:49<03:14,  1.27it/s]

Error extracting text from http://www.wsj.com/articles/greece-suspends-legislative-package-after-concerns-from-creditors-1450371197: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/greece-suspends-legislative-package-after-concerns-from-creditors-1450371197


Processing URLs:  76%|███████▌  | 756/1000 [31:49<02:25,  1.68it/s]

Error extracting text from https://www.courtlistener.com/docket/18506139/1/securities-and-exchange-commission-v-mcafee/: 403 Client Error: Forbidden for url: https://www.courtlistener.com/docket/18506139/1/securities-and-exchange-commission-v-mcafee/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdozedefatima.com.br/%3Fp%3D8247&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdozedefatima.com.br/%3Fp%3D8247&amp;prev=search


Processing URLs:  76%|███████▌  | 761/1000 [31:56<04:26,  1.11s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/272959-white-house-set-to-send-iran-cyber-message: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/272959-white-house-set-to-send-iran-cyber-message/


Processing URLs:  77%|███████▋  | 766/1000 [32:05<06:04,  1.56s/it]

Error extracting text from http://www.ibtimes.com/iran-offered-syrian-president-bashar-assad-shelter-during-civil-war-he-refused-2189763: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iran-offered-syrian-president-bashar-assad-shelter-during-civil-war-he-refused-2189763


Processing URLs:  77%|███████▋  | 770/1000 [32:09<03:51,  1.01s/it]

Error extracting text from http://www.reuters.com/article/2015/09/29/us-icahn-fed-idUSKCN0RT09120150929#CEyTmxXxKeE83UYl.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/29/us-icahn-fed-idUSKCN0RT09120150929#CEyTmxXxKeE83UYl.97
Error extracting text from http://www.wsj.com/articles/petroleos-de-venezuela-weighs-bond-refinancing-plan-1446849347?mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/petroleos-de-venezuela-weighs-bond-refinancing-plan-1446849347?mg=id-wsj


Processing URLs:  77%|███████▋  | 774/1000 [32:14<04:07,  1.10s/it]

Error extracting text from https://www.flightglobal.com/airframers/first-flight-of-booms-xb-1-demonstrator-could-happen-next-year-ceo/143485.article: 403 Client Error: Forbidden for url: https://www.flightglobal.com/airframers/first-flight-of-booms-xb-1-demonstrator-could-happen-next-year-ceo/143485.article


Processing URLs:  78%|███████▊  | 775/1000 [32:14<03:30,  1.07it/s]

Error extracting text from http://seekingalpha.com/article/4021758-trump-impact-auto-industry-good-tesla: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4021758-trump-impact-auto-industry-good-tesla


Processing URLs:  78%|███████▊  | 777/1000 [32:17<04:18,  1.16s/it]

Error extracting text from https://housegop.leadpages.co/healthcare/: 404 Client Error: Not Found for url: https://housegop.leadpages.co/healthcare/


Processing URLs:  78%|███████▊  | 778/1000 [32:18<03:57,  1.07s/it]

Error extracting text from https://academic.oup.com/ejil/advance-article-abstract/doi/10.1093/ejil/chaa061/5908084?redirectedFrom=fulltext: 403 Client Error: Forbidden for url: https://academic.oup.com/ejil/advance-article-abstract/doi/10.1093/ejil/chaa061/5908084?redirectedFrom=fulltext


Processing URLs:  78%|███████▊  | 783/1000 [32:25<04:44,  1.31s/it]

Error extracting text from http://www.wsj.com/articles/ge-to-shift-jobs-to-canada-in-ex-im-bank-protest-1443449286: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/ge-to-shift-jobs-to-canada-in-ex-im-bank-protest-1443449286
URL filtered: http://www.bloomberg.com/news/articles/2016-04-08/iran-steps-up-offense-in-oil-market-battle-with-pricing-discount


Processing URLs:  79%|███████▉  | 790/1000 [32:31<02:59,  1.17it/s]

Error extracting text from https://www.reuters.com/article/us-saudi-cyber/saudi-agency-says-country-targeted-in-cyber-spying-campaign-idUSKBN1DK27M?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-cyber/saudi-agency-says-country-targeted-in-cyber-spying-campaign-idUSKBN1DK27M?il=0
Error extracting text from http://www.khaama.com/terrorist-with-links-to-pakistani-military-arrested-by-afghan-forces-01757: 403 Client Error: Forbidden for url: http://www.khaama.com/terrorist-with-links-to-pakistani-military-arrested-by-afghan-forces-01757


Processing URLs:  79%|███████▉  | 791/1000 [32:32<03:07,  1.12it/s]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160722/1305196312.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160722/1305196312.html


Processing URLs:  79%|███████▉  | 794/1000 [32:40<05:32,  1.62s/it]

Error extracting text from https://www.forbes.com/sites/heshmatalavi/2018/02/15/how-the-world-views-irans-role-in-syria/#5cf0f7002365: 410 Client Error: Gone for url: https://www.forbes.com/sites/heshmatalavi/2018/02/15/how-the-world-views-irans-role-in-syria/#5cf0f7002365
Error extracting text from https://news.yahoo.com/perfect-james-webb-telescope-track-233510711.html: 404 Client Error: Not Found for url: https://news.yahoo.com/perfect-james-webb-telescope-track-233510711.html


Processing URLs:  80%|███████▉  | 795/1000 [32:41<04:26,  1.30s/it]

Error extracting text from https://www.predictit.org/Contract/2091/Will-the-Senate-confirm-any-SCOTUS-nominee-before-Obama-leaves-office#openoffers: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/2091/Will-the-Senate-confirm-any-SCOTUS-nominee-before-Obama-leaves-office#openoffers


Processing URLs:  80%|███████▉  | 797/1000 [32:42<03:09,  1.07it/s]

Error extracting text from http://www.parliament.uk/mps-lords-and-offices/lords/composition-of-the-lords/: 403 Client Error: Forbidden for url: http://www.parliament.uk/mps-lords-and-offices/lords/composition-of-the-lords/


Processing URLs:  80%|███████▉  | 799/1000 [32:44<02:28,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/colombian-peace-plan-heads-for-plebiscite-1472166364: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/colombian-peace-plan-heads-for-plebiscite-1472166364


Processing URLs:  80%|████████  | 800/1000 [32:45<03:03,  1.09it/s]

Error extracting text from http://in.reuters.com/article/bbc-cybersecurity-idINKBN0UE0U520151231: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  80%|████████  | 801/1000 [32:46<02:52,  1.16it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/12/super-friday-night/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/12/super-friday-night/


Processing URLs:  80%|████████  | 802/1000 [32:46<02:39,  1.24it/s]

Error extracting text from https://www.macnn.com/articles/16/01/18/apples.homepage.offers.quote.from.civil.rights.leader.on.federal.holiday.132079/: HTTPSConnectionPool(host='www.macnn.com', port=443): Max retries exceeded with url: /articles/16/01/18/apples.homepage.offers.quote.from.civil.rights.leader.on.federal.holiday.132079/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  81%|████████  | 806/1000 [32:50<02:26,  1.33it/s]

Error extracting text from http://www.nasdaq.com/article/is-grexit-back-on-the-table-cm549685: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/is-grexit-back-on-the-table-cm549685
Error extracting text from https://www.reuters.com/world/defiant-odessa-is-seen-vulnerable-russian-sea-assault-2022-03-18/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/defiant-odessa-is-seen-vulnerable-russian-sea-assault-2022-03-18/


Processing URLs:  81%|████████  | 807/1000 [32:51<02:19,  1.39it/s]

Error extracting text from http://aranews.net/2016/06/us-led-coalition-bombs-isis-weapons-warehouse-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/06/us-led-coalition-bombs-isis-weapons-warehouse-mosul/


Processing URLs:  81%|████████  | 808/1000 [32:56<06:19,  1.98s/it]

Error extracting text from http://illinoistimes.com/article-18184-lawmakers-propose-tax-on-sugary-beverages.html: 404 Client Error: Not Found for url: https://www.illinoistimes.com/springfield/Content?url=https://www.illinoistimes.com/article-18184-lawmakers-propose-tax-on-sugary-beverages.html
Error extracting text from http://www.business-standard.com/article/current-affairs/quietly-india-is-helping-build-world-s-largest-nuclear-fusion-reactor-115081800576_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/current-affairs/quietly-india-is-helping-build-world-s-largest-nuclear-fusion-reactor-115081800576_1.html
URL filtered: http://www.bloomberg.com/news/articles/2016-10-17/what-makes-a-hard-brexit-harder-than-a-soft-one-quicktake-q-a


Processing URLs:  81%|████████▏ | 813/1000 [33:01<03:53,  1.25s/it]

Error extracting text from https://www.whitehouse.gov/trade-deals-working-all-americans: 404 Client Error: Not Found for url: https://www.whitehouse.gov/trade-deals-working-all-americans


Processing URLs:  82%|████████▏ | 818/1000 [33:13<06:13,  2.05s/it]

Error extracting text from http://schrts.co/jfwHnT: 404 Client Error: Not Found for url: https://schrts.co:443/jfwHnT


Processing URLs:  82%|████████▏ | 819/1000 [33:16<06:23,  2.12s/it]

Error extracting text from http://www.spacex.com/falcon-heavy: 404 Client Error: The requested content does not exist. for url: https://www.spacex.com/falcon-heavy
Error extracting text from https://www.reuters.com/article/us-germany-politics-merkel-factbox/factbox-who-could-wield-the-knife-scenarios-for-a-merkel-exit-idUSKBN1FW1ZW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-merkel-factbox/factbox-who-could-wield-the-knife-scenarios-for-a-merkel-exit-idUSKBN1FW1ZW?il=0
URL filtered: https://twitter.com/jbloom_lab/status/1407445643547746305


Processing URLs:  82%|████████▏ | 822/1000 [33:16<02:56,  1.01it/s]

Error extracting text from http://www.scientificamerican.com/article/nuclear-confusion-the-data-suggest-north-korea-s-h-bomb-isn-t/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/nuclear-confusion-the-data-suggest-north-korea-s-h-bomb-isn-t/


Processing URLs:  82%|████████▎ | 825/1000 [33:23<04:42,  1.61s/it]

Error extracting text from http://www.aina.org/news/20160202203033.htm: 404 Client Error:  for url: http://www.aina.org/news/20160202203033.htm


Processing URLs:  83%|████████▎ | 829/1000 [33:23<01:45,  1.62it/s]

Error extracting text from http://www.wsj.com/articles/enrique-pena-nieto-put-in-tough-spot-by-donald-trump-1485556405: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/enrique-pena-nieto-put-in-tough-spot-by-donald-trump-1485556405
URL filtered: http://www.bloomberg.com/news/articles/2015-11-30/opec-rivals-become-unwitting-allies-in-push-for-oil-market-share
URL filtered: http://www.bloomberg.com/news/articles/2015-12-09/nobel-laureate-says-fed-shouldn-t-raise-interest-rates-next-week
Error extracting text from http://www.nytimes.com/2016/10/20/world/europe/prague-russian-hacker.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/20/world/europe/prague-russian-hacker.html?_r=0


Processing URLs:  83%|████████▎ | 830/1000 [33:24<01:48,  1.56it/s]

Error extracting text from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1040076/Technical_Briefing_31.pdf.: 404 Client Error: Not Found for url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1040076/Technical_Briefing_31.pdf.


Processing URLs:  84%|████████▎ | 837/1000 [33:35<04:09,  1.53s/it]

Error extracting text from https://www.aa.com.tr/en/africa/russia-ethiopia-ink-military-cooperation-agreement/2302337: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  84%|████████▍ | 840/1000 [33:38<03:20,  1.25s/it]

Error extracting text from http://www.politicalbots.org: 404 Client Error: Not Found for url: http://www.politicalbots.org/


Processing URLs:  84%|████████▍ | 841/1000 [33:39<02:53,  1.09s/it]

Error extracting text from http://www.democraticaudit.com/?p=16238: 403 Client Error: Forbidden for url: http://www.democraticaudit.com/?p=16238


Processing URLs:  84%|████████▍ | 843/1000 [33:42<03:12,  1.22s/it]

Error extracting text from http://www.tandfonline.com/doi/abs/10.1080/00472336.2013.771942#.VrMxiVWLTnA: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/abs/10.1080/00472336.2013.771942#.VrMxiVWLTnA


Processing URLs:  85%|████████▌ | 850/1000 [33:57<05:00,  2.00s/it]

Error extracting text from https://reut.rs/3d60CR9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/china-says-it-will-take-necessary-measures-safeguard-chinese-firms-interests-2021-06-24/
URL filtered: https://www.bloomberg.com/view/articles/2017-02-24/the-immigration-number-that-ignores-citizens-interests


Processing URLs:  85%|████████▌ | 854/1000 [34:01<03:23,  1.39s/it]

Error extracting text from http://www.fairmotoring.com/index.php?entry_id=1466445358: 500 Server Error: Internal Server Error for url: http://www.fairmotoring.com/index.php?entry_id=1466445358
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0WZ0O5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0WZ0O5


Processing URLs:  86%|████████▌ | 856/1000 [34:03<02:57,  1.23s/it]

Error extracting text from http://www.newkerala.com/news/2016/fullnews-22397.html: 404 Client Error: Not Found for url: https://www.newkerala.com/news/2016/fullnews-22397.html


Processing URLs:  86%|████████▌ | 859/1000 [34:06<02:45,  1.17s/it]

Error extracting text from https://finance.yahoo.com/news/covid-wuhan-lab-leak-hypotheses-143344170.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/covid-wuhan-lab-leak-hypotheses-143344170.html


Processing URLs:  86%|████████▋ | 865/1000 [34:16<02:29,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-china-silkroad-india-idUSKBN18A07L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-silkroad-india-idUSKBN18A07L


Processing URLs:  87%|████████▋ | 867/1000 [34:18<02:16,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-zarif-idUSKCN0UU0C7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-zarif-idUSKCN0UU0C7
Error extracting text from http://www.reuters.com/article/us-libya-security-france-idUSKCN0VX1C3?feedType=RSS&amp;feedName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-libya-security-france-idUSKCN0VX1C3?feedType=RSS&amp;feedName=worldNews


Processing URLs:  87%|████████▋ | 869/1000 [34:25<04:21,  2.00s/it]

Error extracting text from http://theiranproject.com/blog/2016/02/18/irans-top-commander-irgcs-missile-drill-forthcoming/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=irans-top-commander-irgcs-missile-drill-forthcoming


Processing URLs:  87%|████████▋ | 870/1000 [34:25<03:34,  1.65s/it]

Error extracting text from http://www.c-span.org/video/?325192-3/discussion-president-george-w-bush-afghanistan-iraq-wars: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?325192-3/discussion-president-george-w-bush-afghanistan-iraq-wars


Processing URLs:  87%|████████▋ | 873/1000 [34:26<01:46,  1.19it/s]

Error extracting text from https://www.congress.gov/bill/116th-congress/house-bill/8235: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/116th-congress/house-bill/8235
Error extracting text from http://www.reuters.com/article/us-britain-politics-scotland-sturgeon-idUSKBN19I0Z7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-politics-scotland-sturgeon-idUSKBN19I0Z7


Processing URLs:  88%|████████▊ | 876/1000 [34:31<03:10,  1.53s/it]

Error extracting text from http://www.defensenews.com/articles/israeli-satellite-imagary-shows-russian-nuclear-capable-missiles-in-syria: 404 Client Error: Not Found for url: https://www.defensenews.com/articles/israeli-satellite-imagery-shows-russian-nuclear-capable-missiles-in-syria/


Processing URLs:  88%|████████▊ | 877/1000 [34:32<02:57,  1.44s/it]

Error extracting text from http://gizadeathstar.com/2015/03/the-mind-control-scrapbook-russias-psychotronic-weapons-to-become-part-of-its-arsenal/: 403 Client Error: Forbidden for url: https://gizadeathstar.com/2015/03/the-mind-control-scrapbook-russias-psychotronic-weapons-to-become-part-of-its-arsenal/


Processing URLs:  88%|████████▊ | 879/1000 [35:06<21:20, 10.58s/it]

Error extracting text from http://gas2.org/2015/02/25/the-toyota-mirais-biggest-problem-is-that-its-boring/: 522 Server Error:  for url: https://gas2.org/2015/02/25/the-toyota-mirais-biggest-problem-is-that-its-boring/


Processing URLs:  88%|████████▊ | 880/1000 [35:07<15:12,  7.60s/it]

Error extracting text from http://www.hybridcars.com/smart-transitioning-over-to-electric-only-vehicles-in-north-america/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/smart-transitioning-over-to-electric-only-vehicles-in-north-america/


Processing URLs:  88%|████████▊ | 881/1000 [35:09<11:47,  5.95s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=weekend&amp;id=jurassicpark4.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=weekend&amp;id=jurassicpark4.htm


Processing URLs:  89%|████████▊ | 887/1000 [35:21<05:06,  2.71s/it]

URL filtered: https://twitter.com/zerohedge/status/826115005703655425


Processing URLs:  89%|████████▉ | 889/1000 [35:21<02:51,  1.54s/it]

Error extracting text from http://www.amazon.com/gp/new-releases/books/3/ref=zg_bsnr_unv_b_2_2675_1#3: 503 Server Error: Service Unavailable for url: https://www.amazon.com/gp/new-releases/books/3/ref=zg_bsnr_unv_b_2_2675_1#3


Processing URLs:  89%|████████▉ | 893/1000 [35:28<02:41,  1.51s/it]

Error extracting text from http://www.oxitec.com/wpcms/wp-content/uploads/Information-about-Florida-trial-EA-2015.pdf?db0f11: 404 Client Error: Not Found for url: https://www.oxitec.com/wpcms/wp-content/uploads/Information-about-Florida-trial-EA-2015.pdf?db0f11
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-imf-lagarde-idUSKCN0UT0N0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-imf-lagarde-idUSKCN0UT0N0
URL filtered: https://www.buzzfeed.com/jimwaterson/british-mps-are-targeting-facebook-with-fake-news-inquiry?utm_term=.sgmLJ0KYV#.gg3malBEX
Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-cyber-idUSKBN19U0P4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-cyber-idUSKBN19U0P4


Processing URLs:  90%|████████▉ | 897/1000 [35:29<01:11,  1.45it/s]

Error extracting text from http://www.thelocal.de/20160818/germany-starts-sending-weapons-to-kurds-again: 403 Client Error: Forbidden for url: https://www.thelocal.de/20160818/germany-starts-sending-weapons-to-kurds-again


Processing URLs:  90%|████████▉ | 898/1000 [35:30<01:11,  1.42it/s]

Error extracting text from http://unama.unmissions.org/afghanistan-record-level-civilian-casualties-sustained-first-half-2016-un-report: HTTPSConnectionPool(host='unama.unmissions.org', port=443): Max retries exceeded with url: /afghanistan-record-level-civilian-casualties-sustained-first-half-2016-un-report (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  90%|████████▉ | 899/1000 [35:33<02:23,  1.42s/it]

URL filtered: https://seekingalpha.com/article/4451402-will-facebook-stock-ever-split


Processing URLs:  90%|█████████ | 902/1000 [35:36<01:40,  1.02s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-24/opec-seen-holding-the-line-as-40-oil-looms-over-vienna-meeting?cmpid=wsdemand
Error extracting text from http://www.reuters.com/article/us-libya-security-sirte-idUSKCN12L1QD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-libya-security-sirte-idUSKCN12L1QD
URL filtered: https://www.youtube.com/watch?v=zei3xnivwFk


Processing URLs:  91%|█████████ | 908/1000 [35:38<00:47,  1.93it/s]

Error extracting text from http://english.aawsat.com/2016/08/article55355537/hrw-calls-keep-pmf-mosul: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/08/article55355537/hrw-calls-keep-pmf-mosul
Error extracting text from http://www.nrc.gov/reading-rm/doc-collections/cfr/part073/part073-0054.html: 403 Client Error: Forbidden for url: http://www.nrc.gov/reading-rm/doc-collections/cfr/part073/part073-0054.html


Processing URLs:  91%|█████████ | 911/1000 [35:44<02:01,  1.36s/it]

Error extracting text from http://uk.reuters.com/article/2015/12/02/uk-iran-nuclear-iaea-idUKKBN0TL24620151202: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  91%|█████████ | 912/1000 [35:45<01:54,  1.30s/it]

Error extracting text from https://providencemag.com/2021/02/america-might-need-rejoin-tpp-trans-pacific-partnership-joe-biden/: 403 Client Error: Forbidden for url: https://providencemag.com/2021/02/america-might-need-rejoin-tpp-trans-pacific-partnership-joe-biden/


Processing URLs:  91%|█████████▏| 914/1000 [35:47<01:31,  1.06s/it]

Error extracting text from https://www.reuters.com/world/europe/un-security-council-members-look-act-ukraine-doomed-fail-2022-02-24/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/un-security-council-members-look-act-ukraine-doomed-fail-2022-02-24/


Processing URLs:  92%|█████████▏| 923/1000 [36:04<02:22,  1.86s/it]

Error extracting text from http://www.thenational.ae/world/middle-east/iran-says-it-will-expand-missile-programme-in-response-to-us-sanctions-threat: 404 Client Error: Not Found for url: https://www.thenationalnews.com/mena/iran-says-it-will-expand-missile-programme-in-response-to-us-sanctions-threat/


Processing URLs:  93%|█████████▎| 930/1000 [36:27<03:38,  3.12s/it]

Error extracting text from https://www.hindustantimes.com/world-news/new-start-all-you-need-to-know-about-us-russia-nuclear-arms-control-treaty-101611755500309.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/new-start-all-you-need-to-know-about-us-russia-nuclear-arms-control-treaty-101611755500309.html


Processing URLs:  93%|█████████▎| 932/1000 [36:27<01:59,  1.76s/it]

Error extracting text from http://www.thelocal.es/20160828/rajoy-says-forming-new-government-a-wish: 403 Client Error: Forbidden for url: https://www.thelocal.es/20160828/rajoy-says-forming-new-government-a-wish


Processing URLs:  93%|█████████▎| 934/1000 [36:31<02:00,  1.83s/it]



Processing URLs:  94%|█████████▎| 937/1000 [36:34<01:27,  1.39s/it]

Error extracting text from http://europe.newsweek.com/fight-take-back-mosul-isis-already-started-us-envoy-437448?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/fight-take-back-mosul-isis-already-started-us-envoy-437448


Processing URLs:  94%|█████████▍| 938/1000 [36:35<01:16,  1.24s/it]

Error extracting text from http://www.businessinsider.com/r-china-air-force-holds-drills-in-western-pacific-for-second-time-this-month-2016-9?r=UK&amp;IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-china-air-force-holds-drills-in-western-pacific-for-second-time-this-month-2016-9?r=UK&amp;IR=T


Processing URLs:  95%|█████████▌| 952/1000 [36:58<01:08,  1.44s/it]

Error extracting text from http://middle-east-online.com/english/?id=76164: 404 Client Error: Not Found for url: https://middle-east-online.com/english/?id=76164


Processing URLs:  96%|█████████▌| 956/1000 [37:03<00:57,  1.31s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-03-05/chasing-c


Processing URLs:  96%|█████████▌| 959/1000 [37:05<00:36,  1.12it/s]

Error extracting text from https://www.reuters.com/world/us/us-sikh-group-demands-probe-possible-hate-bias-deadly-indianapolis-fedex-rampage-2021-04-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/us-sikh-group-demands-probe-possible-hate-bias-deadly-indianapolis-fedex-rampage-2021-04-17/
Error extracting text from http://www.reuters.com/article/us-southchinasea-usa-hongkong-idUSKCN0XQ1RM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-usa-hongkong-idUSKCN0XQ1RM


Processing URLs:  97%|█████████▋| 966/1000 [38:22<10:51, 19.16s/it]

Error extracting text from http://www.irantracker.org/iran-news-round-may-11-2016: HTTPConnectionPool(host='www.irantracker.org', port=80): Max retries exceeded with url: /iran-news-round-may-11-2016 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe840260>, 'Connection to www.irantracker.org timed out. (connect timeout=60)'))


Processing URLs:  97%|█████████▋| 970/1000 [38:26<02:42,  5.41s/it]

Error extracting text from http://www.publications.parliament.uk/pa/cm201617/cmselect/cmdfence/668/66804.htm: 403 Client Error: Forbidden for url: https://publications.parliament.uk/pa/cm201617/cmselect/cmdfence/668/66804.htm


Processing URLs:  97%|█████████▋| 971/1000 [38:26<01:52,  3.87s/it]

Error extracting text from http://www.geekwire.com/2016/spacexs-elon-musk-wants-to-go-into-space-by-2021-and-start-sending-people-to-mars-by-2025/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2016/spacexs-elon-musk-wants-to-go-into-space-by-2021-and-start-sending-people-to-mars-by-2025/


Processing URLs:  97%|█████████▋| 972/1000 [38:27<01:20,  2.89s/it]

Error extracting text from http://aranews.net/2016/04/dozens-experts-media-workers-desert-isis-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/04/dozens-experts-media-workers-desert-isis-mosul/


Processing URLs:  97%|█████████▋| 974/1000 [38:29<00:51,  1.99s/it]

Error extracting text from http://www.ndb.int/New-Development-Bank-plans-rupee-rouble-bonds.php: 403 Client Error: Forbidden for url: https://www.ndb.int/New-Development-Bank-plans-rupee-rouble-bonds.php


Processing URLs:  98%|█████████▊| 978/1000 [38:34<00:27,  1.25s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X119T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X119T


Processing URLs:  98%|█████████▊| 979/1000 [38:34<00:21,  1.02s/it]

Error extracting text from http://evobsession.com/electric-car-sales-germany-july-2016/: 403 Client Error: Forbidden for url: http://evobsession.com/electric-car-sales-germany-july-2016/


Processing URLs:  98%|█████████▊| 980/1000 [38:36<00:26,  1.30s/it]

Error extracting text from http://www.38north.org/topics/satellite-analysis/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  98%|█████████▊| 982/1000 [38:37<00:17,  1.03it/s]

Error extracting text from https://www.gwjusticejournal.com/single-post/2017/04/09/Redrawing-Districts-and-Redefining-Precedent: 406 Client Error: Not Acceptable for url: https://www.gwjusticejournal.com/single-post/2017/04/09/Redrawing-Districts-and-Redefining-Precedent


Processing URLs:  98%|█████████▊| 984/1000 [38:40<00:17,  1.06s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/50673663.cms?utm_source=contentofinterest&amp;utm_medium=text&amp;utm_campaign=cppst: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/50673663.cms?utm_source=contentofinterest&amp;utm_medium=text&amp;utm_campaign=cppst
URL filtered: http://www.bloombergview.com/articles/2015-10-12/relax-we-ll-survive-china-s-sales-of-u-s-debt-


Processing URLs:  99%|█████████▉| 994/1000 [38:56<00:10,  1.70s/it]

Error extracting text from https://investors.pxd.com/static-files/e33ef32a-6588-4a7d-aad9-9e5ba73c5a49: 403 Client Error: Forbidden for url: https://investors.pxd.com/static-files/e33ef32a-6588-4a7d-aad9-9e5ba73c5a49


Processing URLs: 100%|█████████▉| 999/1000 [39:00<00:00,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-arak-idUSKCN0UQ19G20160112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-arak-idUSKCN0UQ19G20160112


Processing URLs: 100%|██████████| 1000/1000 [39:02<00:00,  2.34s/it]
Processing URLs:   0%|          | 1/1000 [00:00<03:09,  5.28it/s]

Error extracting text from https://www.wsj.com/articles/saudis-boxed-in-by-low-oil-prices-1498141357: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudis-boxed-in-by-low-oil-prices-1498141357


Processing URLs:   0%|          | 5/1000 [00:04<11:47,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-chemicalweapons-idUSKCN0WB1ZI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-chemicalweapons-idUSKCN0WB1ZI


Processing URLs:   1%|          | 7/1000 [01:04<4:19:00, 15.65s/it]

Error extracting text from http://www.jitunews.com/read/29914/baku-tembak-bakal-terjadi-di-perairan-laut-sumbar-ada-apa: HTTPSConnectionPool(host='www.jitunews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   1%|          | 11/1000 [01:09<1:24:35,  5.13s/it]

Error extracting text from http://thehill.com/policy/energy-environment/256513-house-votes-to-end-ban-on-oil-exports: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/256513-house-votes-to-end-ban-on-oil-exports/


Processing URLs:   1%|          | 12/1000 [01:10<1:02:23,  3.79s/it]

Error extracting text from http://currentaffairs.gktoday.in/tags/saarc: 404 Client Error: Not Found for url: https://currentaffairs.gktoday.in/tags/saarc


Processing URLs:   2%|▏         | 15/1000 [01:13<31:46,  1.94s/it]  

Error extracting text from http://www.reuters.com/article/us-eu-dataprotection-usa-accord-idUSKCN0VB1RN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-dataprotection-usa-accord-idUSKCN0VB1RN


Processing URLs:   2%|▏         | 16/1000 [01:13<24:57,  1.52s/it]

Error extracting text from https://theconversation.com/scottish-election-2016-disaster-for-labour-reality-check-for-the-snp-and-the-tories-are-back-59007: 403 Client Error: Forbidden for url: https://theconversation.com/scottish-election-2016-disaster-for-labour-reality-check-for-the-snp-and-the-tories-are-back-59007


Processing URLs:   2%|▏         | 17/1000 [01:14<18:31,  1.13s/it]

Error extracting text from http://www.infotep.gov.do/noticias.php?id=1547: 403 Client Error: Forbidden for url: http://www.infotep.gov.do/noticias.php?id=1547


Processing URLs:   2%|▏         | 19/1000 [01:20<34:12,  2.09s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1318605/000119312516629286/d40004d425.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1318605/000119312516629286/d40004d425.htm


Processing URLs:   2%|▏         | 20/1000 [01:22<36:04,  2.21s/it]

Error extracting text from https://www.reuters.com/world/middle-east/turkish-lira-nears-record-low-after-cenbank-comment-us-genocide-move-2021-04-25/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/turkish-lira-nears-record-low-after-cenbank-comment-us-genocide-move-2021-04-25/


Processing URLs:   2%|▏         | 23/1000 [01:25<22:11,  1.36s/it]

Error extracting text from http://insideevs.com/nextev-electric-hypercar-targets-mclaren-levels-performance/: 404 Client Error: Not Found for url: https://insideevs.com:443/nextev-electric-hypercar-targets-mclaren-levels-performance/


Processing URLs:   3%|▎         | 26/1000 [01:28<20:41,  1.27s/it]

Error extracting text from http://fuelfix.com/blog/2015/12/04/opec-to-keep-pumping-crude/#36898101=0: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2015/12/04/opec-to-keep-pumping-crude/#36898101=0


Processing URLs:   3%|▎         | 29/1000 [01:32<21:42,  1.34s/it]

Error extracting text from http://www.infectioncontroltoday.com/news/2015/08/ihr-emergency-committee-meets-on-international-spread-of-wild-poliovirus.aspx: 404 Client Error: Not Found for url: https://www.infectioncontroltoday.com/news/2015/08/ihr-emergency-committee-meets-on-international-spread-of-wild-poliovirus.aspx


Processing URLs:   3%|▎         | 31/1000 [01:33<16:19,  1.01s/it]

Error extracting text from http://www.balkaninsight.com/en/article/albanians-use-dutch-ports-to-sneak-into-britain-02-17-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/albanians-use-dutch-ports-to-sneak-into-britain-02-17-2016


Processing URLs:   3%|▎         | 33/1000 [01:35<13:41,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-usa-election-trump-koch-exclusive-idUSMTZSAPEC32FPEGCF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-trump-koch-exclusive-idUSMTZSAPEC32FPEGCF


Processing URLs:   3%|▎         | 34/1000 [01:36<13:08,  1.22it/s]

Error extracting text from http://pressroom.toyota.com/releases/tfs-environmental-green-bond-expansion-may13.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/tfs-environmental-green-bond-expansion-may13/


Processing URLs:   4%|▎         | 36/1000 [01:39<19:14,  1.20s/it]

Error extracting text from https://theconversation.com/barnaby-joyce-scores-dismal-ratings-in-resolve-poll-while-berejiklian-government-easily-in-front-despite-nsw-lockdown-164936: 403 Client Error: Forbidden for url: https://theconversation.com/barnaby-joyce-scores-dismal-ratings-in-resolve-poll-while-berejiklian-government-easily-in-front-despite-nsw-lockdown-164936


Processing URLs:   4%|▎         | 37/1000 [01:40<16:59,  1.06s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Japan-protests-to-China-as-over-200-vessels-spotted-near-Senkakus: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Japan-protests-to-China-as-over-200-vessels-spotted-near-Senkakus


Processing URLs:   4%|▍         | 44/1000 [01:46<10:17,  1.55it/s]

Error extracting text from http://blogs.barrons.com/stockstowatchtoday/2016/07/05/tesla-oh-no-not-again/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/stockstowatchtoday/2016/07/05/tesla-oh-no-not-again/


Processing URLs:   4%|▍         | 45/1000 [01:47<09:13,  1.72it/s]

Error extracting text from https://thehill.com/homenews/administration/546838-state-department-denies-an-olympics-boycott-is-under-consideration: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/546838-state-department-denies-an-olympics-boycott-is-under-consideration/


Processing URLs:   5%|▍         | 47/1000 [01:48<12:32,  1.27it/s]

URL filtered: https://twitter.com/usambun/status/1378340005345685505


Processing URLs:   5%|▍         | 49/1000 [01:49<07:43,  2.05it/s]

Error extracting text from https://fortunedotcom.files.wordpress.com/2015/08/adrev-600x423.jpg?quality=80: 404 Client Error: Not Found for url: https://fortunedotcom.files.wordpress.com/2015/08/adrev-600x423.jpg?quality=80
Error extracting text from http://greece.greekreporter.com/2016/07/28/citigroup-puts-grexit-back-on-the-table-for-next-1-3-years/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/07/28/citigroup-puts-grexit-back-on-the-table-for-next-1-3-years/


Processing URLs:   5%|▌         | 51/1000 [01:49<05:19,  2.97it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN1600V0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKBN1600V0


Processing URLs:   5%|▌         | 53/1000 [01:51<08:55,  1.77it/s]

Error extracting text from https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2017/05/passenger-economy.pdf?cid=em-elq-26916&utm_source=elq&utm_medium=email&utm_campaign=26916&elq_cid=1494219: 404 Client Error: Not Found for url: https://www.intel.com/content/www/us/en/404.html?ref=https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2017/05/passenger-economy.pdf?cid=em-elq-26916&utm_source=elq&utm_medium=email&utm_campaign=26916&elq_cid=1494219
Error extracting text from http://www.reuters.com/article/venezuela-debt-idUSL2N1631PV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-debt-idUSL2N1631PV


Processing URLs:   5%|▌         | 54/1000 [01:53<15:21,  1.03it/s]

Error extracting text from https://www.sciencenews.org/article/supersymmetry%E2%80%99s-absence-lhc-puzzles-physicists: 404 Client Error: Not Found for url: https://www.sciencenews.org/article/supersymmetry%E2%80%99s-absence-lhc-puzzles-physicists


Processing URLs:   6%|▌         | 57/1000 [02:06<37:42,  2.40s/it]  

Error extracting text from https://www.wsj.com/articles/nyse-executive-seeks-to-calm-firms-over-stock-trading-curbs-1453334007: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nyse-executive-seeks-to-calm-firms-over-stock-trading-curbs-1453334007
URL filtered: https://www.bloomberg.com/news/articles/2017-08-31/venezuelan-bonds-get-harder-to-trade-as-sanctions-spur-caution


Processing URLs:   6%|▌         | 60/1000 [02:10<27:07,  1.73s/it]

Error extracting text from http://inserbia.info/today/2015/11/montenegro-opposition-well-protest-as-long-as-it-takes/: 404 Client Error: Not Found for url: https://inserbia.info/today/2015/11/montenegro-opposition-well-protest-as-long-as-it-takes/


Processing URLs:   6%|▋         | 65/1000 [02:19<32:35,  2.09s/it]

Error extracting text from http://www.centralamericalink.com/en/News/Panama_Canal_expansion_now_96_percent_complete/: 404 Client Error: Not Found for url: https://www.centralamericalink.com/en/News/Panama_Canal_expansion_now_96_percent_complete/


Processing URLs:   7%|▋         | 69/1000 [03:22<4:52:58, 18.88s/it]

Error extracting text from https://www.massroots.com/news/is-canada-legalizing-cannabis: HTTPSConnectionPool(host='www.massroots.com', port=443): Max retries exceeded with url: /news/is-canada-legalizing-cannabis (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30153a480>, 'Connection to www.massroots.com timed out. (connect timeout=60)'))


Processing URLs:   7%|▋         | 70/1000 [03:22<3:27:54, 13.41s/it]

Error extracting text from http://www.rand.org/blog/2016/09/when-will-we-know-self-driving-cars-are-safe.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2016/09/when-will-we-know-self-driving-cars-are-safe.html
URL filtered: https://www.bloomberg.com/news/articles/2017-07-14/debt-limit-increase-bill-is-said-to-be-voted-on-first-by-senate


Processing URLs:   7%|▋         | 74/1000 [03:26<1:09:19,  4.49s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-manbij-idUSKBN16D25L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-manbij-idUSKBN16D25L
Error extracting text from http://greece.greekreporter.com/2016/12/02/greeces-disabled-people-take-to-the-streets-in-protest-against-cutbacks/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/12/02/greeces-disabled-people-take-to-the-streets-in-protest-against-cutbacks/


Processing URLs:   8%|▊         | 76/1000 [03:27<39:06,  2.54s/it]  

Error extracting text from https://www.scotsman.com/news/politics/scottish-election-2021-poll-shows-nicola-sturgeons-snp-set-to-win-majority-3222029: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-2021-poll-shows-nicola-sturgeons-snp-set-to-win-majority-3222029
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O2M9X96TTDSA01-1N35HC3FJ2TR3HBD1T9VOQP9R6


Processing URLs:   8%|▊         | 79/1000 [03:29<23:13,  1.51s/it]

Error extracting text from http://news.asiaone.com/news/malaysia/malaysian-pm-presents-revised-budget-2016: 404 Client Error: Not Found for url: https://www.asiaone.com/news/news/malaysia/malaysian-pm-presents-revised-budget-2016


Processing URLs:   9%|▊         | 87/1000 [03:37<11:20,  1.34it/s]

URL filtered: https://www.youtube.com/watch?v=Yj718A7_s4A
Error extracting text from https://www.ajot.com/news/a-note-from-the-panama-cana-authority-administrator: 403 Client Error: Forbidden for url: https://www.ajot.com/news/a-note-from-the-panama-cana-authority-administrator


Processing URLs:   9%|▉         | 90/1000 [03:40<14:02,  1.08it/s]

Error extracting text from http://elections.huffingtonpost.com/pollster/polls/arg-23409: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/polls/arg-23409


Processing URLs:   9%|▉         | 92/1000 [03:42<13:35,  1.11it/s]

Error extracting text from http://www.sciencemag.org/news/2014/07/tibetans-inherited-high-altitude-gene-ancient-human: 403 Client Error: Forbidden for url: https://www.science.org/content/article/tibetans-inherited-high-altitude-gene-ancient-human-rev2
URL filtered: http://www.bloomberg.com/markets/economic-calendar


Processing URLs:   9%|▉         | 94/1000 [03:44<13:25,  1.12it/s]

Error extracting text from http://www.israelhayom.com/site/newsletter_article.php?id=38853: 403 Client Error: Forbidden for url: https://www.israelhayom.com/site/newsletter_article.php?id=38853


Processing URLs:  10%|▉         | 98/1000 [03:48<15:04,  1.00s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-06/investors-remain-cautious-as-treasuries-gain-markets-wrap
URL filtered: https://www.youtube.com/watch?v=pWBjl-jPcVM


Processing URLs:  10%|█         | 101/1000 [03:48<08:17,  1.81it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/286128-trump-campaign-vetting-new-jersey-gov-christie-for-vp: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/286128-trump-campaign-vetting-new-jersey-gov-christie-for-vp/


Processing URLs:  11%|█         | 107/1000 [04:08<32:36,  2.19s/it]

Error extracting text from https://balkaneu.com/north-macedonia-census-boycotters-to-face-5-year-imprisonment/: 404 Client Error: Not Found for url: https://balkaneu.com/north-macedonia-census-boycotters-to-face-5-year-imprisonment/


Processing URLs:  11%|█▏        | 113/1000 [04:16<18:49,  1.27s/it]

Error extracting text from http://www.autonews.com/article/20160407/OEM05/304079906/tesla-model-3-preorders-surge-to-325000: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20160407/OEM05/304079906/tesla-model-3-preorders-surge-to-325000


Processing URLs:  12%|█▏        | 118/1000 [04:24<21:13,  1.44s/it]

Error extracting text from http://www.newsweek.com/greece-bailout-tsipras-eurogroup-summit-creditors-austerity-527622: 403 Client Error: Forbidden for url: https://www.newsweek.com/greece-bailout-tsipras-eurogroup-summit-creditors-austerity-527622


Processing URLs:  12%|█▏        | 119/1000 [04:25<17:42,  1.21s/it]

Error extracting text from http://www.businessinsider.com.au/chinese-missiles-can-hit-entirety-of-us-2015-5?r=US&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/chinese-missiles-can-hit-entirety-of-us-2015-5?r=US&amp;IR=T


Processing URLs:  12%|█▏        | 121/1000 [04:25<10:09,  1.44it/s]

Error extracting text from https://www.wsj.com/articles/trumps-trade-war-will-be-left-for-biden-to-win-11609682580: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trumps-trade-war-will-be-left-for-biden-to-win-11609682580
Error extracting text from http://globalnation.inquirer.net/148681/ph-coast-guard-to-test-the-waters-in-panatag-shoal#ixzz4P36ETLFp: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/148681/ph-coast-guard-to-test-the-waters-in-panatag-shoal#ixzz4P36ETLFp
URL filtered: https://www.youtube.com/watch?v=Y6rVbJX120I


Processing URLs:  12%|█▏        | 124/1000 [04:28<13:33,  1.08it/s]

URL filtered: https://www.youtube.com/watch?v=2HUCkUDLz3Q


Processing URLs:  13%|█▎        | 128/1000 [04:35<17:53,  1.23s/it]

Error extracting text from http://www.cnbc.com/2016/09/02/reuters-america-update-1-south-africas-cabinet-seeks-inquiry-on-banks-treatment-of-zuma-friends.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/09/02/reuters-america-update-1-south-africas-cabinet-seeks-inquiry-on-banks-treatment-of-zuma-friends.html


Processing URLs:  13%|█▎        | 133/1000 [04:42<18:14,  1.26s/it]

Error extracting text from http://www.nytimes.com/2015/08/01/opinion/revenge-of-the-ideologues-killing-the-export-import-bank.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/01/opinion/revenge-of-the-ideologues-killing-the-export-import-bank.html?_r=0


Processing URLs:  14%|█▎        | 136/1000 [04:45<13:06,  1.10it/s]

Error extracting text from https://www.opendemocracy.net/francis-ghil-s/france-in-syria-under-pressure: 403 Client Error: Forbidden for url: https://www.opendemocracy.net/francis-ghil-s/france-in-syria-under-pressure


Processing URLs:  14%|█▍        | 139/1000 [04:47<09:08,  1.57it/s]

Error extracting text from http://www.reuters.com/article/2015/11/04/us-venezuela-election-advantage-idUSKCN0ST25X20151104: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/04/us-venezuela-election-advantage-idUSKCN0ST25X20151104


Processing URLs:  14%|█▍        | 141/1000 [04:50<14:06,  1.02it/s]

Error extracting text from https://nationalinterest.org/blog/buzz/what-known-about-russia%E2%80%99s-secret-advanced-%E2%80%9Ckedr%E2%80%9D-nuclear-missile-184674: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/buzz/what-known-about-russia%E2%80%99s-secret-advanced-%E2%80%9Ckedr%E2%80%9D-nuclear-missile-184674


Processing URLs:  14%|█▍        | 143/1000 [04:54<24:30,  1.72s/it]

Error extracting text from http://www.scpr.org/news/2016/09/30/65145/brown-s-2016-signing-tsunami-key-bills-the-governo/: 403 Client Error: Forbidden for url: https://www.kpcc.org/news/2016/09/30/65145/brown-s-2016-signing-tsunami-key-bills-the-governo/


Processing URLs:  15%|█▍        | 146/1000 [04:58<19:13,  1.35s/it]

Error extracting text from http://www.hematology.org/Newsroom/Press-Releases/2016/6949.aspx: 403 Client Error: Forbidden for url: https://www.hematology.org/Newsroom/Press-Releases/2016/6949.aspx


Processing URLs:  15%|█▍        | 147/1000 [04:58<14:31,  1.02s/it]

Error extracting text from http://www.caranddriver.com/toyota/mirai: 403 Client Error: Forbidden for url: http://www.caranddriver.com/toyota/mirai


Processing URLs:  15%|█▌        | 151/1000 [05:00<08:16,  1.71it/s]

Error extracting text from http://www.nato.int/cps/en/natohq/news_123958.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_123958.htm
URL filtered: https://www.youtube.com/watch?v=bvC_0foemLY
Error extracting text from http://www.reuters.com/article/2015/11/22/us-iran-usa-trial-idUSKBN0TB0GG20151122: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/22/us-iran-usa-trial-idUSKBN0TB0GG20151122


Processing URLs:  15%|█▌        | 152/1000 [05:01<06:50,  2.06it/s]

Error extracting text from http://www.nytimes.com/2016/02/17/world/asia/china-is-arming-south-china-sea-island-us-says.html?emc=edit_th_20160217&amp;nl=todaysheadlines&amp;nlid=71898624&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/17/world/asia/china-is-arming-south-china-sea-island-us-says.html?emc=edit_th_20160217&amp;nl=todaysheadlines&amp;nlid=71898624&amp;_r=0


Processing URLs:  15%|█▌        | 153/1000 [05:01<06:25,  2.20it/s]

Error extracting text from http://www.wsj.com/articles/some-look-to-turnout-figures-for-clues-in-u-k-referendum-vote-1466514810: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/some-look-to-turnout-figures-for-clues-in-u-k-referendum-vote-1466514810
Error extracting text from https://www.france24.com/en/france/20210128-macron-weighs-up-a-third-lockdown-despite-signs-the-french-can-t-take-it-anymore: 403 Client Error: Forbidden for url: https://www.france24.com/en/france/20210128-macron-weighs-up-a-third-lockdown-despite-signs-the-french-can-t-take-it-anymore


Processing URLs:  16%|█▌        | 158/1000 [05:07<15:00,  1.07s/it]

Error extracting text from http://movieweb.com/movies/2016/may/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  16%|█▌        | 160/1000 [05:08<12:16,  1.14it/s]

Error extracting text from http://www.tradingeconomics.com/united-kingdom/inflation-cpi/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-kingdom/inflation-cpi/forecast


Processing URLs:  16%|█▋        | 164/1000 [05:13<14:21,  1.03s/it]

URL filtered: http://www.bloomberg.com/news/articles/2011-12-14/opec-agrees-to-higher-output-limit-to-match-production-accommodate-libya


Processing URLs:  17%|█▋        | 167/1000 [05:16<14:17,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-colombia-rebels-idUSKCN0VX23N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-colombia-rebels-idUSKCN0VX23N


Processing URLs:  17%|█▋        | 171/1000 [05:25<27:23,  1.98s/it]

Error extracting text from http://www.icrc.org/ihl-nat.nsf/162d151af444ded44125673e00508141/aba339f342ad7493c1256bc8004c2772/%24file/constitution%20-%20korea%20-%20en.pdf: 403 Client Error: Forbidden for url: https://www.icrc.org/ihl-nat.nsf/162d151af444ded44125673e00508141/aba339f342ad7493c1256bc8004c2772/%24file/constitution%20-%20korea%20-%20en.pdf


Processing URLs:  17%|█▋        | 174/1000 [05:31<24:55,  1.81s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/60240468.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/60240468.cms


Processing URLs:  18%|█▊        | 175/1000 [05:32<22:31,  1.64s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-09/south-africa-s-gordhan-seeks-to-appease-investors-on-economy


Processing URLs:  18%|█▊        | 178/1000 [05:35<16:10,  1.18s/it]

Error extracting text from http://www.consilium.europa.eu/en/meetings/european-council/2015/12/17-18/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/meetings/european-council/2015/12/17-18/
URL filtered: http://www.bloomberg.com/news/articles/2015-11-10/opec-said-to-consider-new-output-ceiling-as-indonesia-rejoins-igtfm5nj


Processing URLs:  18%|█▊        | 181/1000 [05:35<09:07,  1.50it/s]

Error extracting text from https://www.nytimes.com/2021/09/11/world/un-ambassadors-myanmar-afghanistan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/09/11/world/un-ambassadors-myanmar-afghanistan.html


Processing URLs:  19%|█▊        | 186/1000 [05:53<27:08,  2.00s/it]

Error extracting text from http://www.wsj.com/articles/pro-europe-camp-in-u-k-referendum-tries-to-mobilize-the-youth-vote-1462199117With: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pro-europe-camp-in-u-k-referendum-tries-to-mobilize-the-youth-vote-1462199117With
Error extracting text from http://www.reuters.com/article/us-britain-election-idUSKBN17L0LU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-idUSKBN17L0LU


Processing URLs:  19%|█▊        | 187/1000 [05:55<28:01,  2.07s/it]

Error extracting text from http://www.globalpost.com/article/6652858/2015/09/19/yonhap-interview-us-nuclear-envoy-willing-hold-talks-n-korea-pyongyang: 404 Client Error: Not Found for url: https://theworld.org/article/6652858/2015/09/19/yonhap-interview-us-nuclear-envoy-willing-hold-talks-n-korea-pyongyang
URL filtered: https://www.bloomberg.com/news/articles/2021-06-12/egypt-addresses-letter-to-un-security-council


Processing URLs:  19%|█▉        | 190/1000 [05:56<12:23,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-japan-yen-idUSKCN0ZF2SA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-japan-yen-idUSKCN0ZF2SA


Processing URLs:  19%|█▉        | 192/1000 [06:01<23:20,  1.73s/it]

Error extracting text from http://www.reuters.com/article/us-opec-meeting-idUSKBN0TM30B20151204: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting-idUSKBN0TM30B20151204


Processing URLs:  19%|█▉        | 194/1000 [06:02<15:21,  1.14s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/other/trump_favorableunfavorable-5493.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/other/trump_favorableunfavorable-5493.html


Processing URLs:  20%|█▉        | 196/1000 [06:05<16:19,  1.22s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0YA0QX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0YA0QX


Processing URLs:  20%|█▉        | 198/1000 [06:05<09:42,  1.38it/s]

Error extracting text from https://www.nytimes.com/2017/01/26/world/europe/uk-theresa-may-brexit-bill.html?action=click&contentCollection=Europe&module=RelatedCoverage&region=EndOfArticle&pgtype=article: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/26/world/europe/uk-theresa-may-brexit-bill.html?action=click&contentCollection=Europe&module=RelatedCoverage&region=EndOfArticle&pgtype=article
Error extracting text from http://www.nytimes.com/2016/12/31/world/europe/russia-hacking-alisa-shevchenko.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/31/world/europe/russia-hacking-alisa-shevchenko.html


Processing URLs:  20%|██        | 202/1000 [06:13<19:08,  1.44s/it]

Error extracting text from http://www.dh.gov.hk/english/main/main_cgs/files/Know_more_genetic_diseases_en.pdf: 404 Client Error: Not Found for url: https://www.dh.gov.hk/english/main/main_cgs/files/Know_more_genetic_diseases_en.pdf


Processing URLs:  20%|██        | 203/1000 [06:14<16:50,  1.27s/it]

Error extracting text from http://allafrica.com/stories/201607290097.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607290097.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3024e6a80>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  20%|██        | 205/1000 [06:16<15:30,  1.17s/it]

Error extracting text from https://bit.ly/3mQfjuv: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2020/11/30/beijings_line_on_the_south_china_sea_nothing_to_see_here_651344.html


Processing URLs:  21%|██        | 206/1000 [06:17<14:43,  1.11s/it]

Error extracting text from http://nationalinterest.org/feature/china-should-push-american-style-hegemony-17238: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/china-should-push-american-style-hegemony-17238


Processing URLs:  21%|██        | 208/1000 [06:29<51:14,  3.88s/it]

URL filtered: https://twitter.com/Arianespace/status/1473406135185154053


Processing URLs:  22%|██▏       | 222/1000 [07:05<29:04,  2.24s/it]

Error extracting text from https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/August/REINZ%20Monthly%20Property%20Report%20-%20August%202021-1.pdf: 404 Client Error: Not Found for url: https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/August/REINZ%20Monthly%20Property%20Report%20-%20August%202021-1.pdf
URL filtered: https://www.bloomberg.com/news/articles/2017-03-12/why-scotland-s-independence-is-back-on-the-table-quicktake-q-a


Processing URLs:  22%|██▎       | 225/1000 [07:07<15:57,  1.24s/it]

Error extracting text from https://tradingeconomics.com/qatar/exports: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/qatar/exports


Processing URLs:  23%|██▎       | 226/1000 [07:07<14:49,  1.15s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-05-04/urgent-nobel-literature-prize-will-not-be-awarded-this-year


Processing URLs:  23%|██▎       | 231/1000 [07:15<16:09,  1.26s/it]

Error extracting text from http://uk.businessinsider.com/r-trump-invites-philippines-duterte-to-white-house-during-animated-phone-call-aide-2016-12: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-trump-invites-philippines-duterte-to-white-house-during-animated-phone-call-aide-2016-12


Processing URLs:  23%|██▎       | 233/1000 [07:25<34:48,  2.72s/it]

Error extracting text from http://www.barrons.com/articles/here-comes-20-oil-1454736627: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/here-comes-20-oil-1454736627
URL filtered: https://www.bloombergquint.com/business/iran-says-it-has-seized-an-oil-tanker-in-persian-gulf


Processing URLs:  24%|██▎       | 235/1000 [07:28<28:08,  2.21s/it]

URL filtered: https://twitter.com/leonidvolkov/status/966611666178924544


Processing URLs:  24%|██▍       | 240/1000 [07:35<18:59,  1.50s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O02L4A6S972A01-4PKVIH6LIS97LLRHJ6FCP284P9


Processing URLs:  24%|██▍       | 243/1000 [07:38<14:22,  1.14s/it]

Error extracting text from http://www.wsj.com/articles/inside-verizons-gamble-on-digital-media-1470099397: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/inside-verizons-gamble-on-digital-media-1470099397


Processing URLs:  24%|██▍       | 244/1000 [07:38<12:06,  1.04it/s]

Error extracting text from https://www.whitehouse.gov/the-press-office/2017/01/25/executive-order-border-security-and-immigration-enforcement-improvements: 404 Client Error: Not Found for url: https://www.whitehouse.gov/the-press-office/2017/01/25/executive-order-border-security-and-immigration-enforcement-improvements


Processing URLs:  25%|██▍       | 246/1000 [07:39<09:45,  1.29it/s]

Error extracting text from http://www.gunviolencearchive.org/reports/mass-shooting: 403 Client Error: Forbidden for url: http://www.gunviolencearchive.org/reports/mass-shooting


Processing URLs:  25%|██▌       | 250/1000 [07:44<11:43,  1.07it/s]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Abe-eyes-Russia-visit-in-hopes-of-breakthrough?page=2: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Abe-eyes-Russia-visit-in-hopes-of-breakthrough?page=2


Processing URLs:  25%|██▌       | 251/1000 [07:44<11:13,  1.11it/s]

Error extracting text from https://www.newsweek.com/china-russia-north-korea-iran-build-ties-un-friends-feud-us-1578169: 403 Client Error: Forbidden for url: https://www.newsweek.com/china-russia-north-korea-iran-build-ties-un-friends-feud-us-1578169
URL filtered: http://www.bloomberg.com/energy


Processing URLs:  25%|██▌       | 253/1000 [07:48<15:31,  1.25s/it]

Error extracting text from http://www.accuweather.com/en/weather-news/major-winter-storm-snow-turkey-istanbul/54536271: 403 Client Error: Forbidden for url: http://www.accuweather.com/en/weather-news/major-winter-storm-snow-turkey-istanbul/54536271


Processing URLs:  26%|██▌       | 256/1000 [07:49<10:39,  1.16it/s]

Error extracting text from http://wirelessgoodness.com/2015/10/30/admirals-for-tense-talks-on-warship6603/: 404 Client Error: Not Found for url: http://wirelessgoodness.com/2015/10/30/admirals-for-tense-talks-on-warship6603/


Processing URLs:  26%|██▌       | 257/1000 [07:49<08:52,  1.39it/s]

Error extracting text from https://www.nytimes.com/2017/07/26/us/politics/jeff-sessions-trump-mccabe.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/26/us/politics/jeff-sessions-trump-mccabe.html


Processing URLs:  26%|██▌       | 262/1000 [07:57<17:51,  1.45s/it]

Error extracting text from http://russia-insider.com/en/users/andrew-korybko: 503 Server Error: Service Unavailable for url: https://russia-insider.com/en/users/andrew-korybko


Processing URLs:  27%|██▋       | 266/1000 [08:02<14:00,  1.15s/it]

Error extracting text from https://www.theranos.com/: HTTPSConnectionPool(host='www.theranos.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2feeb5d90>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  27%|██▋       | 267/1000 [08:03<13:32,  1.11s/it]

Error extracting text from http://theloadstar.co.uk/coolstar/perishables-shippers-expect-an-enlarged-panama-canal-to-offer-fresh-transhipment-options/: 404 Client Error: Not Found for url: https://theloadstar.com/coolstar/perishables-shippers-expect-an-enlarged-panama-canal-to-offer-fresh-transhipment-options/
URL filtered: https://www.youtube.com/watch?v=t-wFKNy0MZQ


Processing URLs:  27%|██▋       | 272/1000 [08:08<13:57,  1.15s/it]

URL filtered: https://www.youtube.com/watch?v=eesUdFlYMh8
Error extracting text from https://www.nord-stream2.com/nord-stream-2-is-a-european-collaboration/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /nord-stream-2-is-a-european-collaboration/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300c926f0>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  28%|██▊       | 277/1000 [08:11<10:45,  1.12it/s]

Error extracting text from https://isp.netscape.com/politics/story/0002/20160324/L5N16W39V_1342247189: 404 Client Error: 404 for url: https://isp.netscape.com/politics/story/0002/20160324/L5N16W39V_1342247189


Processing URLs:  28%|██▊       | 281/1000 [08:14<07:37,  1.57it/s]

Error extracting text from http://www.nytimes.com/2016/02/06/world/asia/myanmar-military-aung-san-suu-kyi-president.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/06/world/asia/myanmar-military-aung-san-suu-kyi-president.html


Processing URLs:  29%|██▉       | 289/1000 [08:30<15:04,  1.27s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-politics-erdogan-europe-idUSKBN17R2GG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-politics-erdogan-europe-idUSKBN17R2GG


Processing URLs:  29%|██▉       | 291/1000 [08:33<16:44,  1.42s/it]

Error extracting text from http://www.thenational.ae/opinion/comment/jordan-ponders-a-change-of-course-on-syria: 404 Client Error: Not Found for url: https://www.thenationalnews.com/opinion/comment/jordan-ponders-a-change-of-course-on-syria/
URL filtered: https://www.youtube.com/watch?v=DkRDmJpthXg


Processing URLs:  29%|██▉       | 293/1000 [08:35<13:49,  1.17s/it]

Error extracting text from https://uk.reuters.com/article/uk-germany-politics/german-spd-leaders-aim-to-improve-on-coalition-deal-with-merkel-idUKKBN1F30KM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  30%|██▉       | 296/1000 [08:36<08:52,  1.32it/s]

Error extracting text from https://www.wsj.com/articles/oil-watchers-see-little-price-gains-for-2017-1491392085: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-watchers-see-little-price-gains-for-2017-1491392085
Error extracting text from https://www.reuters.com/article/us-art-auction-da-vinci-abudhabi/abu-dhabi-to-acquire-leonardo-da-vincis-salvator-mundi-christies-idUSKBN1E22IN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-art-auction-da-vinci-abudhabi/abu-dhabi-to-acquire-leonardo-da-vincis-salvator-mundi-christies-idUSKBN1E22IN


Processing URLs:  30%|██▉       | 298/1000 [08:41<16:59,  1.45s/it]

Error extracting text from http://www.buenosairesherald.com/article/222172/colombians-expected-to-say-%E2%80%98yes%E2%80%99-to-peace: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/222172/colombians-expected-to-say-%E2%80%98yes%E2%80%99-to-peace


Processing URLs:  30%|███       | 305/1000 [08:59<22:26,  1.94s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014359/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014359/


Processing URLs:  31%|███       | 309/1000 [09:11<25:06,  2.18s/it]

Error extracting text from https://www.yahoo.com/news/egypts-sisi-says-putin-ready-host-mideast-peace-095146679.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/egypts-sisi-says-putin-ready-host-mideast-peace-095146679.html


Processing URLs:  32%|███▏      | 316/1000 [09:26<19:54,  1.75s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/china-urges-us-to-act-and-speak-cautiously-on-south-china-sea/3462708.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/china-urges-us-to-act-and-speak-cautiously-on-south-china-sea/3462708.html
Error extracting text from http://www.reuters.com/article/us-usa-fcc-neutrality-idUSKBN19X268: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fcc-neutrality-idUSKBN19X268
Error extracting text from http://www.reuters.com/article/us-brazil-security-idUSKCN0XC01G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-security-idUSKCN0XC01G


Processing URLs:  32%|███▏      | 319/1000 [09:27<11:35,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/11/13/world/asia/taliban-shooting-policemen.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/13/world/asia/taliban-shooting-policemen.html
URL filtered: https://www.youtube.com/watch?v=a01QQZyl-_I


Processing URLs:  32%|███▏      | 322/1000 [09:29<07:41,  1.47it/s]

Error extracting text from https://www.transportation.gov/briefing-room/us-transportation-secretary-anthony-foxx-announces-unmanned-aircraft-registration: 403 Client Error: Forbidden for url: https://www.transportation.gov/briefing-room/us-transportation-secretary-anthony-foxx-announces-unmanned-aircraft-registration


Processing URLs:  32%|███▏      | 323/1000 [09:31<11:55,  1.06s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-03/venezuela-leader-hooked-on-china-cash-spurs-9-000-mile-journey


Processing URLs:  33%|███▎      | 328/1000 [09:39<14:40,  1.31s/it]

Error extracting text from https://nyti.ms/3rFC0oI: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/22/world/europe/ukraine-russia-coup-britain.html


Processing URLs:  33%|███▎      | 331/1000 [09:42<15:08,  1.36s/it]

URL filtered: https://mobile.reuters.com/article/amp/idUSKBN28W214?il=0&amp;__twitter_impression=true


Processing URLs:  33%|███▎      | 334/1000 [09:45<12:45,  1.15s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-03-31/china-seen-surpassing-u-s-as-top-oil-importer-this-year-chart


Processing URLs:  34%|███▎      | 336/1000 [09:47<10:45,  1.03it/s]

Error extracting text from http://www.newsweek.com/increased-diversity-sparked-voters-implicit-racial-biases-study-539089?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/increased-diversity-sparked-voters-implicit-racial-biases-study-539089?rx=us


Processing URLs:  34%|███▎      | 337/1000 [09:48<11:43,  1.06s/it]

Error extracting text from https://osesgy.unmissions.org/briefing-un-special-envoy-yemen-open-session-security-council: HTTPSConnectionPool(host='osesgy.unmissions.org', port=443): Max retries exceeded with url: /briefing-un-special-envoy-yemen-open-session-security-council (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  34%|███▍      | 340/1000 [09:55<15:56,  1.45s/it]

Error extracting text from http://www.nytimes.com/2016/04/21/upshot/the-loophole-that-could-cost-donald-trump-the-nomination.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/21/upshot/the-loophole-that-could-cost-donald-trump-the-nomination.html
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;ie=UTF8&amp;prev=_t&amp;rurl=translate.google.com&amp;sl=auto&amp;tl=en&amp;u=http://ceo.gov.af/fa/news/69343&amp;usg=ALkJrhiEfK_pBFphQSiyf0IP22OA0VWeuA: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;ie=UTF8&amp;prev=_t&amp;rurl=translate.google.com&amp;sl=auto&amp;tl=en&amp;u=http://ceo.gov.af/fa/news/69343&amp;usg=ALkJrhiEfK_pBFphQSiyf0IP22OA0VWeuA


Processing URLs:  34%|███▍      | 342/1000 [09:56<11:55,  1.09s/it]

Error extracting text from https://bit.ly/3jmm06r: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/monetary-policy-report/2021/february-2021


Processing URLs:  35%|███▍      | 349/1000 [10:02<10:01,  1.08it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/359014-nato-pressing-forward-on-cyber-defense-official-says: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/359014-nato-pressing-forward-on-cyber-defense-official-says/


Processing URLs:  35%|███▌      | 350/1000 [10:04<12:10,  1.12s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-15/venezuela-s-maduro-digs-in-for-a-fight-bondholders-are-worried-


Processing URLs:  35%|███▌      | 354/1000 [10:10<16:37,  1.54s/it]

Error extracting text from https://www.ad-hoc-news.de/wirtschaft/berlin-auf-dem-weg-zur-endgueltigen-inbetriebnahme-der-umstrittenen/62084134: 404 Client Error: Not Found for url: https://www.ad-hoc-news.de/wirtschaft/berlin-auf-dem-weg-zur-endgueltigen-inbetriebnahme-der-umstrittenen/62084134


Processing URLs:  36%|███▌      | 358/1000 [10:13<09:03,  1.18it/s]

Error extracting text from http://thehill.com/policy/energy-environment/319488-trump-signs-repeal-of-transparency-rule-for-oil-companies: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/319488-trump-signs-repeal-of-transparency-rule-for-oil-companies/
Error extracting text from http://www.reuters.com/article/us-switzerland-malaysia-investigation-idUSKCN0V8069: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-switzerland-malaysia-investigation-idUSKCN0V8069


Processing URLs:  36%|███▌      | 360/1000 [10:15<09:04,  1.17it/s]

Error extracting text from http://www.nytimes.com/2017/01/04/us/affordable-care-act-congress-repeal-plan.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/01/04/us/affordable-care-act-congress-repeal-plan.html?_r=0


Processing URLs:  36%|███▋      | 363/1000 [10:19<10:03,  1.05it/s]

Error extracting text from https://go.berniesanders.com/page/event/detail/canvass/4rdjm: 403 Client Error: Forbidden for url: https://berniesanders.com/?nosplash


Processing URLs:  36%|███▋      | 364/1000 [10:19<09:11,  1.15it/s]

URL filtered: https://www.bloomberg.com/news/articles/2016-05-25/key-questions-raised-by-the-2-trillion-saudi-wealth-fund-plan


Processing URLs:  37%|███▋      | 370/1000 [10:27<11:39,  1.11s/it]

URL filtered: https://www.zdnet.com/article/trump-takes-aim-at-section-230-with-lawsuit-against-facebook-twitter-and-google/


Processing URLs:  37%|███▋      | 373/1000 [10:29<08:45,  1.19it/s]

Error extracting text from http://news.xinhuanet.com/english/2016-02/16/c_135101222.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-02/16/c_135101222.htm


Processing URLs:  38%|███▊      | 376/1000 [10:35<13:28,  1.30s/it]

Error extracting text from https://www.ipsos.com/ipsos-mori/en-uk/six-ten-britons-think-keir-starmer-has-done-bad-job-setting-out-clear-alternative-government: 403 Client Error: Forbidden for url: https://www.ipsos.com/ipsos-mori/en-uk/six-ten-britons-think-keir-starmer-has-done-bad-job-setting-out-clear-alternative-government


Processing URLs:  38%|███▊      | 382/1000 [10:42<10:02,  1.03it/s]

Error extracting text from https://www.nytimes.com/2021/04/01/world/asia/myanmar-journalists-arrests.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/01/world/asia/myanmar-journalists-arrests.html


Processing URLs:  38%|███▊      | 384/1000 [15:50<15:49:01, 92.44s/it]

Error extracting text from http://www.japantoday.com/category/politics/view/abe-to-seek-broader-support-for-security-laws-in-upcoming-summits: 404 Client Error: Not Found for url: https://japantoday.com/category/politics/abe-to-seek-broader-support-for-security-laws-in-upcoming-summits


Processing URLs:  39%|███▊      | 386/1000 [15:55<7:54:33, 46.37s/it]

Error extracting text from http://www.tradingeconomics.com/euro-area/rating: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/euro-area/rating


Processing URLs:  39%|███▉      | 390/1000 [16:59<4:30:44, 26.63s/it]

Error extracting text from http://www.seattletimes.com/nation-world/muslim-labour-leader-favored-to-win-london-mayoral-race/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  39%|███▉      | 392/1000 [17:00<2:34:23, 15.24s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-19/oil-holds-below-49-as-u-s-rig-gain-counters-opec-output-cuts


Processing URLs:  39%|███▉      | 394/1000 [17:03<1:33:00,  9.21s/it]

URL filtered: https://www.facebook.com/NotoWahabism/photos/a.446760528785868.1073741828.446357405492847/835280296600554/?type=3&amp;theater


Processing URLs:  40%|███▉      | 397/1000 [17:04<46:47,  4.66s/it]

Error extracting text from http://www.consilium.europa.eu/en/meetings/european-council/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/meetings/european-council/


Processing URLs:  40%|████      | 401/1000 [17:10<24:15,  2.43s/it]

Error extracting text from http://www.citizenside.com/en/videos/news/2016-05-17/130546/spain-activists-hang-banner-against-the-ttip-in.html: 404 Client Error: Not Found for url: https://citizenside.com/en/videos/news/2016-05-17/130546/spain-activists-hang-banner-against-the-ttip-in.html


Processing URLs:  40%|████      | 402/1000 [17:12<22:26,  2.25s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/30/751015/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/30/751015/story.html


Processing URLs:  41%|████      | 408/1000 [17:16<07:32,  1.31it/s]

Error extracting text from http://thehill.com/policy/defense/297618-us-marine-general-predicts-mosul-offensive-is-near: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/297618-us-marine-general-predicts-mosul-offensive-is-near/
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16P00D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16P00D


Processing URLs:  41%|████      | 409/1000 [17:19<14:15,  1.45s/it]

Error extracting text from http://www.wrapsnet.org/Reports/InteractiveReporting/tabid/393/EnumType/Report/Default.aspx?ItemPath=/rpt_WebArrivalsReports/MX%20-%20Arrivals%20by%20Nationality%20and%20Religion: 404 Client Error: Not Found for url: https://www.wrapsnet.org/Reports/InteractiveReporting/tabid/393/EnumType/Report/Default.aspx?ItemPath=/rpt_WebArrivalsReports/MX%20-%20Arrivals%20by%20Nationality%20and%20Religion


Processing URLs:  41%|████      | 410/1000 [17:21<13:46,  1.40s/it]

Error extracting text from https://reut.rs/3p4p75e: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-eu/eu-states-should-recognise-guaido-as-venezuelas-leader-eu-lawmakers-to-say-idUSKBN29Q21E?il=0


Processing URLs:  41%|████▏     | 413/1000 [17:25<14:28,  1.48s/it]

URL filtered: https://www.youtube.com/watch?v=jb2XxImw24c
Error extracting text from http://blogs.wsj.com/washwire/2015/10/08/white-house-touts-joe-biden-as-hillary-clinton-diverges-from-obama-legacy/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/10/08/white-house-touts-joe-biden-as-hillary-clinton-diverges-from-obama-legacy/


Processing URLs:  42%|████▏     | 421/1000 [17:38<15:13,  1.58s/it]

Error extracting text from http://www.reuters.com/article/us-iran-missiles-usa-idUSKCN0WD2IC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-usa-idUSKCN0WD2IC


Processing URLs:  42%|████▏     | 423/1000 [17:39<10:34,  1.10s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=54268#.V2kUypPhBIZ: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=54268#.V2kUypPhBIZ


Processing URLs:  43%|████▎     | 427/1000 [17:42<07:52,  1.21it/s]

URL filtered: https://www.youtube.com/watch?v=WBnSv3a6Nh4
Error extracting text from https://thenewdaily.com.au/news/coronavirus/2021/07/04/coronavirus-origin-covid-19/: 403 Client Error: Forbidden for url: https://thenewdaily.com.au/news/coronavirus/2021/07/04/coronavirus-origin-covid-19/


Processing URLs:  43%|████▎     | 428/1000 [17:44<09:19,  1.02it/s]

Error extracting text from http://www.foxbusiness.com/markets/2015/11/23/americans-to-spend-25-more-over-thanksgiving-weekend-versus-last-year-says/: 404 Client Error: Not Found for url: https://www.foxbusiness.com/markets/2015/11/23/americans-to-spend-25-more-over-thanksgiving-weekend-versus-last-year-says/


Processing URLs:  43%|████▎     | 431/1000 [17:45<05:48,  1.63it/s]

URL filtered: https://www.bloomberg.com/professional/blog/etf-approval-times-shortened-costs-slashed-sec-2019-plan/


Processing URLs:  44%|████▎     | 435/1000 [17:52<15:00,  1.59s/it]

Error extracting text from http://www.isn.ethz.ch/Digital-Library/Articles/Detail/?ots591=4888caa0-b3db-1461-98b9-e20e7b9c13d4&amp;lng=en&amp;id=196286: 404 Client Error: Not found UA for url: https://css.ethz.ch/en/services.html


Processing URLs:  44%|████▎     | 436/1000 [17:53<12:37,  1.34s/it]

Error extracting text from http://aranews.net/2016/03/islamic-state-kills-us-soldier-rocket-attack/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/islamic-state-kills-us-soldier-rocket-attack/


Processing URLs:  44%|████▎     | 437/1000 [17:53<09:44,  1.04s/it]

Error extracting text from https://www.wsj.com/articles/top-u-s-general-readies-military-plan-for-north-korea-but-pushes-for-diplomacy-1502626832: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/top-u-s-general-readies-military-plan-for-north-korea-but-pushes-for-diplomacy-1502626832


Processing URLs:  44%|████▍     | 438/1000 [17:54<10:06,  1.08s/it]

Error extracting text from http://www.nytimes.com/2016/09/29/world/middleeast/obama-troops-iraq.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/29/world/middleeast/obama-troops-iraq.html


Processing URLs:  44%|████▍     | 442/1000 [18:03<18:38,  2.00s/it]

Error extracting text from http://www.state.gov/t/avc/rls/2015/242610.htm: 404 Client Error: Not Found for url: https://www.state.gov/t/avc/rls/2015/242610.htm


Processing URLs:  44%|████▍     | 443/1000 [18:05<17:29,  1.88s/it]

Error extracting text from http://www.ibtimes.com/russian-hacking-group-sandworm-targeted-us-knocking-out-power-ukraine-2257194: 403 Client Error: Forbidden for url: https://www.ibtimes.com/russian-hacking-group-sandworm-targeted-us-knocking-out-power-ukraine-2257194
Error extracting text from http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016#sthash.zoQQmpVQ.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016#sthash.zoQQmpVQ.dpuf
URL filtered: https://twitter.com/VerdictRetail/status/745173429448409088


Processing URLs:  45%|████▌     | 452/1000 [18:24<20:39,  2.26s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-astana-idUSKBN15L0E9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-astana-idUSKBN15L0E9


Processing URLs:  45%|████▌     | 453/1000 [18:24<15:23,  1.69s/it]

Error extracting text from http://www.wsj.com/articles/polarized-over-polls-internet-vs-phone-1454063404: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/polarized-over-polls-internet-vs-phone-1454063404


Processing URLs:  45%|████▌     | 454/1000 [18:26<13:58,  1.54s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-02/manufacturing-pmis-fed-rate-hike-odds-increase-turkey-election-what-people-in-markets-are-talking-about


Processing URLs:  46%|████▌     | 459/1000 [18:37<16:41,  1.85s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/North-Korea-notifies-UN-agency-of-satellite-launch: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/North-Korea-notifies-UN-agency-of-satellite-launch


Processing URLs:  46%|████▌     | 460/1000 [18:39<16:18,  1.81s/it]

Error extracting text from http://triblive.com/news/regional/10999212-74/testing-ohio-driving: 403 Client Error: Forbidden for url: http://triblive.com/news/regional/10999212-74/testing-ohio-driving


Processing URLs:  47%|████▋     | 467/1000 [18:50<12:16,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=zCorSqB-UQw


Processing URLs:  47%|████▋     | 472/1000 [18:58<12:02,  1.37s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/congress-final-stage-talks-massive-budget-tax-bills-35709143: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/congress-final-stage-talks-massive-budget-tax-bills-35709143


Processing URLs:  47%|████▋     | 474/1000 [19:01<12:53,  1.47s/it]

Error extracting text from http://nationalinterest.org/feature/keep-troop-levels-steady-afghanistan-16450: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/keep-troop-levels-steady-afghanistan-16450


Processing URLs:  48%|████▊     | 475/1000 [19:02<11:06,  1.27s/it]

Error extracting text from http://www.israelpolicyforum.org/news/ipf-briefing-gary-samore: 403 Client Error: Forbidden for url: http://www.israelpolicyforum.org/news/ipf-briefing-gary-samore


Processing URLs:  48%|████▊     | 479/1000 [19:10<17:21,  2.00s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-06/saudi-aramco-ipo-will-offer-stake-in-all-of-company-s-operations


Processing URLs:  48%|████▊     | 483/1000 [19:14<11:08,  1.29s/it]

Error extracting text from http://warontherocks.com/2016/01/10000-wont-do-it-the-mathematics-of-an-american-deployment-to-fight-isil/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/01/10000-wont-do-it-the-mathematics-of-an-american-deployment-to-fight-isil/


Processing URLs:  49%|████▉     | 490/1000 [19:26<10:57,  1.29s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN1A202Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN1A202Z


Processing URLs:  49%|████▉     | 491/1000 [19:26<08:16,  1.02it/s]

Error extracting text from http://english.alarabiya.net/en/perspective/features/2016/03/18/Syria-s-Alawites-bear-brunt-of-Assad-after-soaring-death-toll-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/perspective/features/2016/03/18/Syria-s-Alawites-bear-brunt-of-Assad-after-soaring-death-toll-.html


Processing URLs:  49%|████▉     | 493/1000 [19:29<08:49,  1.04s/it]

Error extracting text from http://www.un.org/esa/desa/papers/2014/wp133_2014.pdf: 403 Client Error: Forbidden for url: https://www.un.org/esa/desa/papers/2014/wp133_2014.pdf


Processing URLs:  49%|████▉     | 494/1000 [19:30<08:46,  1.04s/it]

Error extracting text from https://www.newsweek.com/joe-biden-was-chairman-who-ruled-out-tighter-controls-hunters-china-business-tony-bobulinski-1542687: 403 Client Error: Forbidden for url: https://www.newsweek.com/joe-biden-was-chairman-who-ruled-out-tighter-controls-hunters-china-business-tony-bobulinski-1542687


Processing URLs:  50%|████▉     | 495/1000 [20:30<2:37:31, 18.72s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/election/article48406280.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  50%|████▉     | 496/1000 [20:31<1:53:13, 13.48s/it]

Error extracting text from https://www.lesswrong.com/posts/XfHXQPPKNY8BXkn72/honoring-petrov-day-on-lesswrong-in-2020?commentId=CviMXu8BciCqcSMKJ#Relating_to_the_End_of_Humanity: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/XfHXQPPKNY8BXkn72/honoring-petrov-day-on-lesswrong-in-2020?commentId=CviMXu8BciCqcSMKJ#Relating_to_the_End_of_Humanity


Processing URLs:  50%|█████     | 501/1000 [20:54<54:32,  6.56s/it]  

Error extracting text from http://thehill.com/homenews/administration/344443-trump-russia-was-against-trump-in-the-2016-election: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/344443-trump-russia-was-against-trump-in-the-2016-election/


Processing URLs:  50%|█████     | 503/1000 [20:58<33:58,  4.10s/it]

Error extracting text from https://www.yahoo.com/news/iran-guards-warn-saudis-over-gulf-war-games-093502033.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/iran-guards-warn-saudis-over-gulf-war-games-093502033.html


Processing URLs:  50%|█████     | 504/1000 [20:59<25:06,  3.04s/it]

Error extracting text from http://thehill.com/homenews/senate/342222-warner-unbelievable-kushner-and-trump-jr-didnt-tell-trump-about-meeting: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/342222-warner-unbelievable-kushner-and-trump-jr-didnt-tell-trump-about-meeting/


Processing URLs:  50%|█████     | 505/1000 [20:59<18:14,  2.21s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/09/06/asia-pacific/kim-stresses-need-bolster-nuclear-forces-north-test-fires-missiles-sea-japan/#.V87la_krK00: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/09/06/asia-pacific/kim-stresses-need-bolster-nuclear-forces-north-test-fires-missiles-sea-japan/#.V87la_krK00


Processing URLs:  51%|█████     | 506/1000 [21:00<13:49,  1.68s/it]

Error extracting text from http://seekingalpha.com/article/4018774-iran-fails-attract-international-oil-companies-oil-markets-daily: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4018774-iran-fails-attract-international-oil-companies-oil-markets-daily


Processing URLs:  51%|█████▏    | 514/1000 [21:10<07:09,  1.13it/s]

Error extracting text from https://www.betbrain.nl/cricket/world/icc-world-twenty20/: HTTPSConnectionPool(host='www.betbrain.nl', port=443): Max retries exceeded with url: /cricket/world/icc-world-twenty20/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3054a40e0>: Failed to resolve 'www.betbrain.nl' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-usa-sanctions-iran-idUSKCN0Y92WI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-sanctions-iran-idUSKCN0Y92WI


Processing URLs:  52%|█████▏    | 516/1000 [21:13<09:55,  1.23s/it]

Error extracting text from http://www.ibtimes.com/iraqi-forces-capture-ramadi-isis-look-toward-fallujah-mosul-2240563&gt: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iraqi-forces-capture-ramadi-isis-look-toward-fallujah-mosul-2240563&gt


Processing URLs:  52%|█████▏    | 518/1000 [21:16<10:58,  1.37s/it]

Error extracting text from http://www.aol.com/article/2014/01/28/ipo-profit-without-being-rich-insider/20814862/: 404 Client Error: Not Found for url: https://www.aol.com/article/2014/01/28/ipo-profit-without-being-rich-insider/20814862/


Processing URLs:  52%|█████▏    | 520/1000 [21:18<09:40,  1.21s/it]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-syria-resolution-idUKKBN17702O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  52%|█████▏    | 522/1000 [21:20<07:04,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-dijsselbloem-idUSKBN1321SO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-dijsselbloem-idUSKBN1321SO


Processing URLs:  52%|█████▏    | 524/1000 [21:21<05:45,  1.38it/s]

Error extracting text from http://www.wsj.com/articles/u-s-pursues-several-paths-in-volkswagen-probe-1444938924: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-pursues-several-paths-in-volkswagen-probe-1444938924


Processing URLs:  52%|█████▎    | 525/1000 [21:21<04:36,  1.72it/s]

Error extracting text from http://www.wsj.com/articles/china-sets-economic-growth-target-of-6-5-to-7-for-2016-1457137605: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-sets-economic-growth-target-of-6-5-to-7-for-2016-1457137605


Processing URLs:  53%|█████▎    | 527/1000 [21:27<14:53,  1.89s/it]

Error extracting text from http://www.buenosairesherald.com/article/214590/recount-confirms-fujimori%E2%80%99s-majority-in-peru-congress: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/214590/recount-confirms-fujimori%E2%80%99s-majority-in-peru-congress


Processing URLs:  53%|█████▎    | 528/1000 [21:28<13:21,  1.70s/it]

Error extracting text from https://www.naij.com/823481-food-scarcity-imminent-nigeria-constant-fulani-herdsmen-attacks.html: 410 Client Error: Gone for url: https://www.legit.ng/823481-food-scarcity-imminent-nigeria-constant-fulani-herdsmen-attacks.html


Processing URLs:  53%|█████▎    | 530/1000 [21:32<13:45,  1.76s/it]

Error extracting text from http://www.gov.me/en/News/154541/Podgorica-Montenegro-18-November-2015-Montenegro-s-membership-in-the-European-Union-and-NATO-are-our-strategic-priorities-They-a.html: 404 Client Error: not found for url: https://www.gov.me/en/News/154541/Podgorica-Montenegro-18-November-2015-Montenegro-s-membership-in-the-European-Union-and-NATO-are-our-strategic-priorities-They-a.html
Error extracting text from http://same.mind: HTTPConnectionPool(host='same.mind', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3052c3800>: Failed to resolve 'same.mind' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  53%|█████▎    | 534/1000 [22:35<1:07:45,  8.72s/it]

Error extracting text from http://www.foxnews.com/world/2015/10/11/iran-reportedly-test-fires-new-long-range-ballistic-missile/: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/10/11/iran-reportedly-test-fires-new-long-range-ballistic-missile/
Error extracting text from http://www.keeptalkinggreece.com/2016/06/08/greek-finmin-once-qe-for-greek-bonds-grexit-will-be-off-the-table/: 403 Client Error: Forbidden for url: http://www.keeptalkinggreece.com/2016/06/08/greek-finmin-once-qe-for-greek-bonds-grexit-will-be-off-the-table/


Processing URLs:  54%|█████▎    | 536/1000 [22:36<36:31,  4.72s/it]  

Error extracting text from http://www.reuters.com/article/us-spacex-blast-idUSKCN11T26P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spacex-blast-idUSKCN11T26P


Processing URLs:  54%|█████▎    | 537/1000 [22:36<26:45,  3.47s/it]

Error extracting text from https://psmag.com/brazil-s-billion-dollar-gym-experiment-ca00f3f4091b: 403 Client Error: Forbidden for url: https://psmag.com/brazil-s-billion-dollar-gym-experiment-ca00f3f4091b


Processing URLs:  54%|█████▍    | 538/1000 [22:40<26:49,  3.48s/it]

Error extracting text from http://ewn.co.za/2018/01/16/opinion-ramaphosa-should-end-presidential-merry-go-round-in-south-africa: 404 Client Error: Not Found for url: https://www.ewn.co.za/2018/01/16/opinion-ramaphosa-should-end-presidential-merry-go-round-in-south-africa


Processing URLs:  54%|█████▍    | 541/1000 [22:45<18:26,  2.41s/it]

Error extracting text from https://www.iea.org/oilmarketreport/omrpublic/currentreport/: 404 Client Error: Not Found for url: https://www.iea.org/oilmarketreport/omrpublic/currentreport/


Processing URLs:  54%|█████▍    | 544/1000 [22:47<09:15,  1.22s/it]

Error extracting text from https://www.nytimes.com/2021/10/14/science/bat-coronaviruses-lab-leak.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/10/14/science/bat-coronaviruses-lab-leak.html
Error extracting text from http://www.nytimes.com/2015/10/30/us/politics/trump-and-carson-underwhelm-iowa-republicans-in-debate.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/30/us/politics/trump-and-carson-underwhelm-iowa-republicans-in-debate.html?_r=0


Processing URLs:  55%|█████▍    | 549/1000 [22:56<12:12,  1.62s/it]

Error extracting text from https://www.justsecurity.org/30205/deterrence-indictment/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/30205/deterrence-indictment/


Processing URLs:  55%|█████▌    | 550/1000 [22:58<12:21,  1.65s/it]

Error extracting text from https://www.reuters.com/world/middle-east/gaza-ceasefire-holding-egyptian-mediators-consult-hamas-israel-2021-05-22/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/gaza-ceasefire-holding-egyptian-mediators-consult-hamas-israel-2021-05-22/


Processing URLs:  55%|█████▌    | 554/1000 [23:02<09:17,  1.25s/it]

Error extracting text from http://www.valuewalk.com/2016/12/venezuela-liquidity-crisis/: 404 Client Error: Not Found for url: https://www.valuewalk.com/venezuela-liquidity-crisis
Error extracting text from https://www.npd.com/wps/portal/npd/us/news/press-releases/2021/2021-is-shaping-up-to-be-a-very-good-year-for-young-adult-fiction/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /wps/portal/npd/us/news/press-releases/2021/2021-is-shaping-up-to-be-a-very-good-year-for-young-adult-fiction/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))


Processing URLs:  56%|█████▌    | 555/1000 [23:04<10:49,  1.46s/it]

Error extracting text from http://www.ibtimes.com/egyptair-mechanic-suspected-russian-plane-crash-sources-2285675: 403 Client Error: Forbidden for url: https://www.ibtimes.com/egyptair-mechanic-suspected-russian-plane-crash-sources-2285675


Processing URLs:  56%|█████▌    | 556/1000 [23:06<11:27,  1.55s/it]

Error extracting text from http://en.abna24.com/service/europe/archive/2016/05/31/757509/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/europe/archive/2016/05/31/757509/story.html


Processing URLs:  56%|█████▌    | 557/1000 [23:07<09:42,  1.31s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2016/02/07/0401000000AEN20160207001700315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  56%|█████▌    | 561/1000 [23:12<08:36,  1.18s/it]

Error extracting text from https://finance.yahoo.com/news/oil-prices-climb-strong-fuel-190000118.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/oil-prices-climb-strong-fuel-190000118.html


Processing URLs:  56%|█████▌    | 562/1000 [23:12<06:28,  1.13it/s]

Error extracting text from http://www.barrons.com/articles/russia-venezuela-citgo-debt-national-security-issue-mnuchin-says-1495134008: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/russia-venezuela-citgo-debt-national-security-issue-mnuchin-says-1495134008


Processing URLs:  56%|█████▋    | 565/1000 [23:16<08:37,  1.19s/it]

Error extracting text from http://remotecontrolproject.org/wp-content/uploads/2016/01/Hostile-use-of-drones-report_open-briefing.pdf: 401 Client Error: Unauthorized for url: https://www.oxfordresearchgroup.org.uk/wp-content/uploads/2016/01/Hostile-use-of-drones-report_open-briefing.pdf


Processing URLs:  57%|█████▋    | 566/1000 [23:20<14:54,  2.06s/it]

Error extracting text from http://www.bna.com/environmental-riders-omnibus-n57982063504/: 403 Client Error: Forbidden for url: https://www.bloombergindustry.com/


Processing URLs:  57%|█████▋    | 567/1000 [23:21<11:52,  1.65s/it]

Error extracting text from http://www.newsroompanama.com/business/panama-4/predatory-billing-by-canal-expansion-contractor: HTTPSConnectionPool(host='newsroompanama.combusiness', port=443): Max retries exceeded with url: /panama-4/predatory-billing-by-canal-expansion-contractor (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3052c3830>: Failed to resolve 'newsroompanama.combusiness' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  57%|█████▋    | 570/1000 [23:22<05:47,  1.24it/s]

Error extracting text from http://www.nejm.org/doi/full/10.1056/NEJMoa1602412?query=featured_home: 403 Client Error: Forbidden for url: http://www.nejm.org/doi/full/10.1056/NEJMoa1602412?query=featured_home
Error extracting text from http://adage.com/article/media/time-magazine-redesigns-website-newsweek-returns-print/292000/: 403 Client Error: Forbidden for url: https://adage.com/article/media/time-magazine-redesigns-website-newsweek-returns-print/292000/


Processing URLs:  57%|█████▋    | 572/1000 [23:24<05:32,  1.29it/s]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-qatar-explainer-2017-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-qatar-explainer-2017-story.html


Processing URLs:  57%|█████▋    | 573/1000 [23:24<04:56,  1.44it/s]

Error extracting text from http://fr.allafrica.com/stories/201510280571.html: HTTPConnectionPool(host='fr.allafrica.com', port=80): Max retries exceeded with url: /stories/201510280571.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3054a5640>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  57%|█████▊    | 575/1000 [23:27<06:07,  1.16it/s]

Error extracting text from http://www.wsj.com/articles/sen-kaine-seen-as-clintons-vp-pick-1469144626: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sen-kaine-seen-as-clintons-vp-pick-1469144626


Processing URLs:  58%|█████▊    | 578/1000 [23:31<09:03,  1.29s/it]

Error extracting text from http://screenrant.com/captain-america-civil-war-opening-weekend/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  58%|█████▊    | 581/1000 [23:35<07:59,  1.14s/it]

Error extracting text from http://www.straitstimes.com/business/economy/imf-urges-fed-to-delay-rate-hike-until-inflation-evident: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  58%|█████▊    | 582/1000 [23:36<06:03,  1.15it/s]

Error extracting text from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2303048: 403 Client Error: Forbidden for url: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2303048


Processing URLs:  58%|█████▊    | 585/1000 [23:39<07:06,  1.03s/it]

Error extracting text from http://in.reuters.com/article/burundi-politics-idINKCN0VM0JC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
URL filtered: https://www.bloomberg.com/news/articles/2017-03-02/oil-holds-losses-after-u-s-crude-inventories-expand-to-record


Processing URLs:  59%|█████▉    | 590/1000 [23:43<05:18,  1.29it/s]

Error extracting text from http://www.scotsman.com/news/politics/second-scottish-independence-referendum-key-questions-answered-1-4263813: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/second-scottish-independence-referendum-key-questions-answered-1-4263813


Processing URLs:  59%|█████▉    | 593/1000 [23:47<07:04,  1.04s/it]

Error extracting text from https://www.afghanistan-analysts.org/a-matter-of-registration-factional-tensions-in-hezb-e-islami/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/a-matter-of-registration-factional-tensions-in-hezb-e-islami/
Error extracting text from https://www.reuters.com/business/energy/exclusive-germany-seeks-regulatory-assurances-nord-stream-2-cant-rule-out-2021-10-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/exclusive-germany-seeks-regulatory-assurances-nord-stream-2-cant-rule-out-2021-10-04/


Processing URLs:  60%|█████▉    | 598/1000 [23:53<10:26,  1.56s/it]

Error extracting text from http://www.poandpo.com/politics/france-sees-little-chance-for-ttip-deal-by-yearend-28-4-2016/: 404 Client Error: Not Found for url: https://www.poandpo.com/politics/france-sees-little-chance-for-ttip-deal-by-yearend-28-4-2016/


Processing URLs:  60%|█████▉    | 599/1000 [23:54<07:54,  1.18s/it]

Error extracting text from https://au.news.yahoo.com/world/a/33386546/1-600-kurdish-peshmerga-killed-in-mosul-offensive-spokesman/#page1: 404 Client Error: Not Found for url: https://au.news.yahoo.com/1-600-kurdish-peshmerga-killed-in-mosul-offensive-spokesman-33386546.html#page1


Processing URLs:  60%|██████    | 602/1000 [24:00<10:42,  1.61s/it]

Error extracting text from https://phys.org/news/2017-05-lawmakers-internet-privacy-repealing.html#jCp: 400 Client Error: Bad request for url: https://phys.org/news/2017-05-lawmakers-internet-privacy-repealing.html#jCp


Processing URLs:  60%|██████    | 605/1000 [24:05<09:44,  1.48s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC515119/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC515119/


Processing URLs:  61%|██████    | 608/1000 [24:08<08:29,  1.30s/it]

Error extracting text from http://luraypagefreepress.com/world/iraqi-forces-launche-offensive-to-recapture-mosul/1432: 404 Client Error: Not Found for url: https://www.luraypagefreepress.com/world/iraqi-forces-launche-offensive-to-recapture-mosul/1432


Processing URLs:  61%|██████    | 610/1000 [25:10<2:03:21, 18.98s/it]

Error extracting text from https://www.google.com/amp/s/www.usnews.com/news/world-report/articles/2021-04-26/putin-agrees-to-meet-biden-as-west-seeks-to-deescalate-russian-aggression%3fcontext=amp: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▏   | 615/1000 [25:17<27:32,  4.29s/it]  

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-extradites-belgian-drugs-trafficing-suspect-11-23-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-extradites-belgian-drugs-trafficing-suspect-11-23-2015
URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/myanmar-president-assures-suu-kyi-of-peaceful-power-transfer


Processing URLs:  62%|██████▏   | 619/1000 [25:23<14:42,  2.32s/it]

Error extracting text from http://www.modestoradiomuseum.org/images/the%20shadow.JPG: 404 Client Error: Not Found for url: https://modestoradiomuseum.org/images/the%20shadow.JPG
Error extracting text from http://www.reuters.com/article/us-usa-congress-oilexports-idUSKBN0TU2SX20151212#3P4tbaruFIt00Zym.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-oilexports-idUSKBN0TU2SX20151212#3P4tbaruFIt00Zym.97


Processing URLs:  62%|██████▏   | 621/1000 [25:24<10:08,  1.61s/it]

URL filtered: https://twitter.com/pilotmsv/status/1513131559834079234
URL filtered: https://www.youtube.com/watch?v=2uDsnvznyMw


Processing URLs:  62%|██████▏   | 624/1000 [25:27<07:33,  1.21s/it]

Error extracting text from https://www.reuters.com/article/us-russia-nato/russia-may-have-tested-cyber-warfare-on-latvia-western-officials-say-idUSKBN1CA142: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-nato/russia-may-have-tested-cyber-warfare-on-latvia-western-officials-say-idUSKBN1CA142


Processing URLs:  63%|██████▎   | 631/1000 [25:38<11:20,  1.84s/it]

Error extracting text from https://www.wsj.com/articles/amid-low-prices-oil-giants-gush-about-breaking-even-1509010205: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/amid-low-prices-oil-giants-gush-about-breaking-even-1509010205


Processing URLs:  63%|██████▎   | 632/1000 [25:39<08:53,  1.45s/it]

Error extracting text from https://www.whitehouse.gov/the-press-office/2017/01/30/presidential-executive-order-reducing-regulation-and-controlling: 404 Client Error: Not Found for url: https://www.whitehouse.gov/the-press-office/2017/01/30/presidential-executive-order-reducing-regulation-and-controlling


Processing URLs:  64%|██████▎   | 636/1000 [25:41<04:30,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/the-race-to-create-elon-musks-hyperloop-heats-up-1448899356?: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-race-to-create-elon-musks-hyperloop-heats-up-1448899356
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-syria-assad-idUSKCN0WV1RG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-syria-assad-idUSKCN0WV1RG


Processing URLs:  64%|██████▍   | 638/1000 [25:48<12:38,  2.10s/it]

Error extracting text from https://rbth.com/news/2016/05/28/us-eroding-inf-treaty-with-intermediate-range-missiles-putin_598145: 404 Client Error: Not Found for url: https://www.rbth.com/news/2016/05/28/us-eroding-inf-treaty-with-intermediate-range-missiles-putin_598145


Processing URLs:  64%|██████▍   | 639/1000 [25:51<15:08,  2.52s/it]

Error extracting text from https://www.reuters.com/article/us-usa-stocks-tesla/teslas-market-value-zooms-past-that-of-gm-and-ford-combined-idUSKBN1Z72MU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks-tesla/teslas-market-value-zooms-past-that-of-gm-and-ford-combined-idUSKBN1Z72MU


Processing URLs:  64%|██████▍   | 642/1000 [25:53<08:18,  1.39s/it]

Error extracting text from https://balkaninsight.com/2021/09/16/north-macedonias-headcount-slowed-by-technical-glitches/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/09/16/north-macedonias-headcount-slowed-by-technical-glitches/
URL filtered: https://www.youtube.com/watch?v=alkhi0jouFE


Processing URLs:  64%|██████▍   | 644/1000 [25:53<05:12,  1.14it/s]

Error extracting text from https://www.nytimes.com/2021/05/03/nyregion/new-york-city-subway-reopen.html?campaign_id=57&amp;emc=edit_ne_20210503&amp;instance_id=30164&amp;nl=evening-briefing&amp;regi_id=54222971&amp;segment_id=57137&amp;te=1&amp;user_id=9c6bc82f1d7e69dac5b2dde75ca776a7: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/05/03/nyregion/new-york-city-subway-reopen.html?campaign_id=57&amp;emc=edit_ne_20210503&amp;instance_id=30164&amp;nl=evening-briefing&amp;regi_id=54222971&amp;segment_id=57137&amp;te=1&amp;user_id=9c6bc82f1d7e69dac5b2dde75ca776a7


Processing URLs:  64%|██████▍   | 645/1000 [26:54<1:25:03, 14.38s/it]

Error extracting text from http://kremlin.ru/events/president/transcripts/56354: HTTPConnectionPool(host='kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  65%|██████▍   | 649/1000 [27:01<30:45,  5.26s/it]  

Error extracting text from https://www.cia.gov/the-world-factbook/countries/venezuela/#economy: 403 Client Error: Forbidden for url: https://www.cia.gov/the-world-factbook/countries/venezuela/#economy


Processing URLs:  65%|██████▌   | 650/1000 [27:01<22:21,  3.83s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-voters-insight-idUSKCN1AU164: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-voters-insight-idUSKCN1AU164


Processing URLs:  66%|██████▌   | 655/1000 [27:08<09:53,  1.72s/it]

Error extracting text from http://news.yahoo.com/latest-turkey-ground-troops-syria-not-agenda-130740850.html: 404 Client Error: Not Found for url: http://news.yahoo.com/latest-turkey-ground-troops-syria-not-agenda-130740850.html


Processing URLs:  66%|██████▌   | 656/1000 [27:10<09:15,  1.61s/it]

Error extracting text from https://obamawhitehouse.archives.gov/node/323681: 404 Client Error: Not Found for url: https://obamawhitehouse.archives.gov/node/323681


Processing URLs:  67%|██████▋   | 666/1000 [27:36<10:28,  1.88s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20150312/1105150686.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20150312/1105150686.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKCN0VY17G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKCN0VY17G


Processing URLs:  67%|██████▋   | 669/1000 [27:38<05:42,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/10/23/us-mideast-crisis-talks-idUSKCN0SH1LN20151023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/us-mideast-crisis-talks-idUSKCN0SH1LN20151023


Processing URLs:  67%|██████▋   | 672/1000 [27:43<07:16,  1.33s/it]

Error extracting text from http://www.manilatimes.net/breaking_news/italys-renzi-to-make-landmark-visit-to-iran/: 404 Client Error: Not Found for url: https://www.manilatimes.net/breaking_news/italys-renzi-to-make-landmark-visit-to-iran/
URL filtered: https://www.youtube.com/watch?v=lP5Xv7QqXiM


Processing URLs:  67%|██████▋   | 674/1000 [27:43<04:13,  1.28it/s]

Error extracting text from http://afghanistantimes.af/over-52pc-people-wont-vote-in-parliamentary-polls-survey/: 403 Client Error: Forbidden for url: https://afghanistantimes.af/over-52pc-people-wont-vote-in-parliamentary-polls-survey/


Processing URLs:  68%|██████▊   | 675/1000 [27:46<07:02,  1.30s/it]

Error extracting text from http://smile.amazon.com/Missing-Man-American-Vanished-Iran/dp/0374210454: 500 Server Error: Internal Server Error for url: https://www.amazon.com:443/Missing-Man-American-Vanished-Iran/dp/0374210454
Error extracting text from https://www.reuters.com/article/us-usa-russia-meeting-idUSKBN19K2OF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-meeting-idUSKBN19K2OF


Processing URLs:  68%|██████▊   | 678/1000 [27:56<11:53,  2.21s/it]

Error extracting text from http://www.nytimes.com/2016/01/15/world/nuclear-threat-initiative-cyberattack-study.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/15/world/nuclear-threat-initiative-cyberattack-study.html?_r=0


Processing URLs:  69%|██████▉   | 688/1000 [28:12<05:51,  1.13s/it]

Error extracting text from https://www.wsj.com/articles/venezuelan-bond-prices-slide-as-nicolas-maduro-seeks-to-restructure-debt-1509722070: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelan-bond-prices-slide-as-nicolas-maduro-seeks-to-restructure-debt-1509722070
Error extracting text from http://www.latimes.com/nation/la-na-republicans-iowa-caucuses-20160201-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-na-republicans-iowa-caucuses-20160201-story.html


Processing URLs:  69%|██████▉   | 691/1000 [28:14<04:03,  1.27it/s]

Error extracting text from http://www.todayonline.com/world/swiss-detain-brazilian-citizen-petrobras-probe: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/swiss-detain-brazilian-citizen-petrobras-probe
Error extracting text from http://www.reuters.com/article/us-mideast-security-carrier-idUSKBN16T2CZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-security-carrier-idUSKBN16T2CZ


Processing URLs:  70%|██████▉   | 695/1000 [29:20<1:36:52, 19.06s/it]

Error extracting text from http://www.usnews.com/news/articles/2015/12/09/pentagon-admits-syrian-russian-opposition-scuttles-no-fly-zone: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  70%|██████▉   | 698/1000 [29:29<43:05,  8.56s/it]  

Error extracting text from https://reaganlibrary.archives.gov/archives/speeches/1984/102184b.htm: 404 Client Error: Not Found for url: https://www.reaganlibrary.gov/archives/speeches/1984/102184b.htm


Processing URLs:  70%|██████▉   | 699/1000 [29:30<31:42,  6.32s/it]

Error extracting text from http://phys.org/news/2016-08-physicists-discovery-nature.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-08-physicists-discovery-nature.html


Processing URLs:  70%|███████   | 702/1000 [29:50<25:54,  5.21s/it]

Error extracting text from https://www.nytimes.com/2021/02/19/world/asia/china-olympics-boycott.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/19/world/asia/china-olympics-boycott.html


Processing URLs:  70%|███████   | 703/1000 [29:50<18:53,  3.82s/it]

Error extracting text from https://www.un.org/en/ga/76/meetings/: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/76/meetings/
URL filtered: http://www.bloomberg.com/news/articles/2016-01-27/montenegro-pm-wins-vote-of-confidence-after-offering-concessions
URL filtered: https://www.bloomberg.com/news/articles/2021-06-30/u-s-taiwan-to-talk-chips-vaccines-as-long-stalled-talks-begin
Error extracting text from http://blogs.barrons.com/techtraderdaily/2015/12/14/apple-hit-by-trifecta-of-iphone-estimate-cuts-but-bluefin-is-upbeat/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/techtraderdaily/2015/12/14/apple-hit-by-trifecta-of-iphone-estimate-cuts-but-bluefin-is-upbeat/
URL filtered: https://www.linkedin.com/in/warren-hatch/


Processing URLs:  71%|███████   | 709/1000 [29:54<07:34,  1.56s/it]

Error extracting text from http://cherna.gora.me/news/montenegros-accession-protocol-presented-in-the-parliament-of-the-united-kingdom/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/montenegros-accession-protocol-presented-in-the-parliament-of-the-united-kingdom/


Processing URLs:  71%|███████▏  | 713/1000 [30:59<1:17:24, 16.18s/it]

Error extracting text from http://kremlin.ru/acts/constitution: HTTPConnectionPool(host='kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  72%|███████▏  | 715/1000 [31:03<46:01,  9.69s/it]  

Error extracting text from http://tass.ru/en/defense/873724: 404 Client Error: Not Found for url: https://tass.ru/en/defense/873724


Processing URLs:  72%|███████▏  | 716/1000 [31:04<33:41,  7.12s/it]

Error extracting text from https://constitutioncenter.org/blog/10-huge-supreme-court-cases-about-the-14th-amendment: 403 Client Error: Forbidden for url: https://constitutioncenter.org/blog/10-huge-supreme-court-cases-about-the-14th-amendment


Processing URLs:  72%|███████▏  | 721/1000 [31:14<14:34,  3.14s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-07/south-korea-to-inflict-k-pop-blasts-on-kim-jong-un-for-nuke-test


Processing URLs:  73%|███████▎  | 726/1000 [31:19<06:18,  1.38s/it]

Error extracting text from https://www.timesofisrael.com/jewish-homes-bennett-says-hell-run-for-pm-after-netanyahu/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/jewish-homes-bennett-says-hell-run-for-pm-after-netanyahu/


Processing URLs:  73%|███████▎  | 728/1000 [32:20<1:21:51, 18.06s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2021-02-18/biden-administration-ready-to-meet-with-iran: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  73%|███████▎  | 729/1000 [32:22<1:00:48, 13.46s/it]

Error extracting text from http://www.globeatnight.org/infographic: 404 Client Error: Not Found for url: https://globeatnight.org/infographic/


Processing URLs:  73%|███████▎  | 733/1000 [32:32<20:59,  4.72s/it]  

URL filtered: http://lobelog.com/whatever-happened-to-the-iranian-parliaments-jcpoas-review/?utm_content=buffer958a4&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  74%|███████▍  | 741/1000 [32:43<05:35,  1.30s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-migrants-europe-comment-20b1ecd6-e561-11e5-a9ce-681055c7a05f-20160308-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-migrants-europe-comment-20b1ecd6-e561-11e5-a9ce-681055c7a05f-20160308-story.html
Error extracting text from http://www.nytimes.com/2015/12/11/business/international/vw-emissions-scandal.html?emc=edit_th_20151211&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/11/business/international/vw-emissions-scandal.html?emc=edit_th_20151211&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  74%|███████▍  | 743/1000 [32:47<06:57,  1.63s/it]

Error extracting text from https://www.sec.gov/cgi-bin/browse-edgar?company=AirBnB&match=&CIK=&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany: 403 Client Error: Forbidden for url: https://www.sec.gov/cgi-bin/browse-edgar?company=AirBnB&match=&CIK=&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany


Processing URLs:  74%|███████▍  | 745/1000 [33:01<16:45,  3.94s/it]

Error extracting text from http://www.heraldsun.com.au/news/world/russia-is-sending-ships-loaded-with-tanks-and-trucks-to-support-regime-of-president-assad-in-syria/story-: 404 Client Error: Not Found for url: https://www.heraldsun.com.au/news/world/russia-is-sending-ships-loaded-with-tanks-and-trucks-to-support-regime-of-president-assad-in-syria/story-?nk=23631955cae6c62e59a8b89ce563ce89-1706837248


Processing URLs:  75%|███████▍  | 747/1000 [33:04<10:53,  2.58s/it]

Error extracting text from http://www.imdb.com/title/tt0475784/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0475784/
URL filtered: https://www.youtube.com/watch?v=R6_eWWfNB54


Processing URLs:  75%|███████▌  | 752/1000 [33:11<06:49,  1.65s/it]

Error extracting text from http://news.yahoo.com/saudi-executes-47-including-top-shiite-cleric-075812543.html?soc_src=mediacontentsharebuttons&amp;soc_trk=tw#: 404 Client Error: Not Found for url: http://news.yahoo.com/saudi-executes-47-including-top-shiite-cleric-075812543.html?soc_src=mediacontentsharebuttons&amp;soc_trk=tw
URL filtered: https://www.bloomberg.com/news/articles/2021-04-07/snp-seen-winning-record-majority-in-scottish-vote-poll-shows


Processing URLs:  76%|███████▌  | 757/1000 [33:17<04:53,  1.21s/it]

Error extracting text from https://www.reuters.com/business/energy/us-shale-producers-signal-more-oil-coming-opec-counts-restraint-2021-11-03/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/us-shale-producers-signal-more-oil-coming-opec-counts-restraint-2021-11-03/


Processing URLs:  76%|███████▌  | 759/1000 [33:20<05:15,  1.31s/it]

URL filtered: https://twitter.com/unhumanrights/status/1438435977224146948


Processing URLs:  76%|███████▌  | 761/1000 [33:21<03:53,  1.03it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-05/tesla-challenger-in-china-promises-to-debut-106-000-e-roadster
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-agreement-idUSKCN0VK2NT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-agreement-idUSKCN0VK2NT


Processing URLs:  77%|███████▋  | 767/1000 [33:25<03:36,  1.07it/s]

Error extracting text from http://thehill.com/homenews/campaign/363874-trump-looks-to-boost-moore-with-friday-rally: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/363874-trump-looks-to-boost-moore-with-friday-rally/


Processing URLs:  77%|███████▋  | 768/1000 [33:26<03:20,  1.16it/s]

Error extracting text from http://www.sify.com/news/pakistan-sri-lanka-urge-end-to-saarc-stalemate-news-others-rcjmamahahdji.html: 403 Client Error: Forbidden for url: https://www.sify.com/news/pakistan-sri-lanka-urge-end-to-saarc-stalemate-news-others-rcjmamahahdji.html


Processing URLs:  77%|███████▋  | 769/1000 [33:26<02:54,  1.33it/s]

Error extracting text from http://www.truthdig.com/report/item/nearly_500_us_troops_sent_iraq_mosul_attack_advance_election_day_20160910: 403 Client Error: Forbidden for url: http://www.truthdig.com/report/item/nearly_500_us_troops_sent_iraq_mosul_attack_advance_election_day_20160910


Processing URLs:  77%|███████▋  | 772/1000 [33:29<03:00,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-italy-politics-iran-idUSKCN0X119T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics-iran-idUSKCN0X119T


Processing URLs:  77%|███████▋  | 773/1000 [33:29<02:24,  1.57it/s]

Error extracting text from https://www.nytimes.com/2017/07/06/science/cern-quarks-charm-baryon.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/06/science/cern-quarks-charm-baryon.html


Processing URLs:  78%|███████▊  | 778/1000 [33:46<11:55,  3.22s/it]

Error extracting text from https://www.espn.com/olympics/story/_/id/31514222/major-japan-newspaper-demands-tokyo-olympics-canceled: 403 Client Error: Forbidden for url: https://www.espn.com/olympics/story/_/id/31514222/major-japan-newspaper-demands-tokyo-olympics-canceled


Processing URLs:  78%|███████▊  | 784/1000 [33:50<03:08,  1.15it/s]

Error extracting text from https://medium.com/@geekcoders/building-a-social-network-with-scala-neo4j-ffba884f5120#.icd3yy2dz: 403 Client Error: Forbidden for url: https://medium.com/@geekcoders/building-a-social-network-with-scala-neo4j-ffba884f5120#.icd3yy2dz
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-exclusive-idUSKBN0UC0JP20151229: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-exclusive-idUSKBN0UC0JP20151229


Processing URLs:  78%|███████▊  | 785/1000 [33:50<02:19,  1.54it/s]

Error extracting text from http://www.reuters.com/article/2015/11/04/us-brazil-rousseff-accounts-idUSKCN0ST2AN20151104#ufmUlVfHG2Hd2GdT.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/04/us-brazil-rousseff-accounts-idUSKCN0ST2AN20151104#ufmUlVfHG2Hd2GdT.97


Processing URLs:  79%|███████▊  | 786/1000 [33:52<03:30,  1.02it/s]

Error extracting text from http://www.pravdareport.com/video/26-10-2016/136000-usa-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/video/26-10-2016/136000-usa-0/


Processing URLs:  79%|███████▉  | 789/1000 [33:55<02:54,  1.21it/s]

Error extracting text from http://blogs.wsj.com/korearealtime/2014/11/05/speculation-about-ban-ki-moon-running-for-south-korea-presidency-heats-up/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/korearealtime/2014/11/05/speculation-about-ban-ki-moon-running-for-south-korea-presidency-heats-up/
Error extracting text from https://www.rottentomatoes.com/m/logan_2017: 403 Client Error: Forbidden for url: https://www.rottentomatoes.com/m/logan_2017


Processing URLs:  79%|███████▉  | 792/1000 [33:59<03:44,  1.08s/it]

Error extracting text from https://www.smithandcrown.com/index/: 404 Client Error: Not Found for url: https://www.smithandcrown.com/index


Processing URLs:  79%|███████▉  | 793/1000 [34:02<05:30,  1.60s/it]

Error extracting text from http://rbth.com/news/2016/12/02/turkish-parliament-ratifies-deal-with-russian-on-building-turkish-stream_652887: 404 Client Error: Not Found for url: https://www.rbth.com/news/2016/12/02/turkish-parliament-ratifies-deal-with-russian-on-building-turkish-stream_652887


Processing URLs:  80%|███████▉  | 795/1000 [34:04<04:23,  1.28s/it]

Error extracting text from http://icitech.org/know-your-enemies-2-0/: 404 Client Error: Not Found for url: https://www.icitech.org/know-your-enemies-2-0
Error extracting text from http://www.reuters.com/article/us-france-iran-idUSKBN15E29Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-france-iran-idUSKBN15E29Q


Processing URLs:  80%|████████  | 800/1000 [34:10<04:40,  1.40s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-26/saudi-arabia-says-aramco-ipo-on-track-as-it-weighs-best-approach


Processing URLs:  80%|████████  | 803/1000 [34:13<03:18,  1.01s/it]

Error extracting text from http://www.wsj.com/articles/taliban-reels-from-leaders-death-in-u-s-drone-strike-1464008252: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/taliban-reels-from-leaders-death-in-u-s-drone-strike-1464008252


Processing URLs:  80%|████████  | 805/1000 [34:23<10:36,  3.27s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-11/morgan-stanley-sees-20-a-barrel-oil-on-u-s-dollar-appreciation-
Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-24-april-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-24-april-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761f050>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  81%|████████  | 809/1000 [34:36<09:50,  3.09s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-january-5-2016: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-january-5-2016


Processing URLs:  81%|████████  | 811/1000 [34:37<06:37,  2.10s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKBN13T09N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKBN13T09N


Processing URLs:  81%|████████  | 812/1000 [34:40<07:21,  2.35s/it]

Error extracting text from http://38north.org/2016/09/wmckinney091516/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  82%|████████▏ | 817/1000 [34:49<05:57,  1.95s/it]

Error extracting text from http://www.orion-strategies.com/pages/about.html: 404 Client Error: Not Found for url: https://www.orion-strategies.com/pages/about.html


Processing URLs:  82%|████████▏ | 818/1000 [34:49<04:25,  1.46s/it]

Error extracting text from https://www.nytimes.com/2017/01/17/us/politics/congressional-budget-office-affordable-care-act.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/17/us/politics/congressional-budget-office-affordable-care-act.html
URL filtered: https://twitter.com/ESA_Webb/status/1474290218178170894


Processing URLs:  82%|████████▏ | 821/1000 [34:50<02:28,  1.20it/s]

Error extracting text from http://www.greaterkashmir.com/news/world/india-says-no-change-in-position-on-indus-water-treaty/244604.html: 403 Client Error: Forbidden for url: https://www.greaterkashmir.com/news/world/india-says-no-change-in-position-on-indus-water-treaty/244604.html


Processing URLs:  82%|████████▏ | 823/1000 [34:52<02:24,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-panama-canal-idUSKCN0VD07T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-panama-canal-idUSKCN0VD07T


Processing URLs:  82%|████████▎ | 825/1000 [34:55<03:37,  1.24s/it]

URL filtered: https://www.rferl.org/a/facebook-advertisement-russia-us-election/28720809.html


Processing URLs:  83%|████████▎ | 827/1000 [34:57<03:06,  1.08s/it]

Error extracting text from https://www.dni.gov/files/documents/Newsroom/Testimonies/2018-ATA: 404 Client Error: Not Found for url: https://www.dni.gov/files/documents/Newsroom/Testimonies/2018-ATA


Processing URLs:  83%|████████▎ | 829/1000 [34:57<01:59,  1.43it/s]

Error extracting text from https://www.yahoo.com/news/furore-over-brexit-vote-registration-crash-084829208.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/furore-over-brexit-vote-registration-crash-084829208.html
Error extracting text from https://www.axios.com/biden-student-debt-50k-10k-cancel-forgive-klain-e41886f7-fc1a-42d6-b2bc-ce4cc1c3ac2b.html: 403 Client Error: Forbidden for url: https://www.axios.com/biden-student-debt-50k-10k-cancel-forgive-klain-e41886f7-fc1a-42d6-b2bc-ce4cc1c3ac2b.html


Processing URLs:  84%|████████▎ | 837/1000 [35:10<04:28,  1.65s/it]

Error extracting text from http://www.commoditytrademantra.com/crude-oil-trading/doha-is-done-saudi-prince-says-no-oil-deal-without-iran/: 406 Client Error: Not Acceptable for url: http://www.commoditytrademantra.com/crude-oil-trading/doha-is-done-saudi-prince-says-no-oil-deal-without-iran/


Processing URLs:  84%|████████▍ | 839/1000 [36:11<50:39, 18.88s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2017-02-07/assad-says-eu-should-have-no-role-in-syrias-reconstruction: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  84%|████████▍ | 843/1000 [36:17<14:49,  5.66s/it]

Error extracting text from https://abc7.com/terrorism-threat-terror-attacks-extremists/10638417/: 404 Client Error: Not Found for url: https://abc7.com/terrorism-threat-terror-attacks-extremists/10638417/


Processing URLs:  85%|████████▍ | 848/1000 [36:22<04:54,  1.94s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13940713001523: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940713001523 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30601fe60>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  86%|████████▌ | 858/1000 [36:28<01:01,  2.33it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics-spd/german-spd-leader-quits-in-bid-to-calm-party-after-coalition-deal-with-merkel-idUSKBN1FX1AA?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-spd/german-spd-leader-quits-in-bid-to-calm-party-after-coalition-deal-with-merkel-idUSKBN1FX1AA?il=0
Error extracting text from http://news.yahoo.com/rouhani-iran-ready-hold-talks-syria-us-saudi-084102993.html: 404 Client Error: Not Found for url: http://news.yahoo.com/rouhani-iran-ready-hold-talks-syria-us-saudi-084102993.html
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-eurogroup-idUSKCN0XP2O8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-eurogroup-idUSKCN0XP2O8


Processing URLs:  87%|████████▋ | 866/1000 [36:42<03:32,  1.59s/it]

Error extracting text from http://www.g8.utoronto.ca/blogs/160526-welch.html: 404 Client Error: Not Found for url: http://www.g8.utoronto.ca/blogs/160526-welch.html


Processing URLs:  87%|████████▋ | 872/1000 [36:48<02:12,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/clock-is-ticking-for-time-inc-s-ceo-1438051150: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/clock-is-ticking-for-time-inc-s-ceo-1438051150


Processing URLs:  87%|████████▋ | 873/1000 [36:49<01:51,  1.14it/s]

Error extracting text from https://www.chathamhouse.org/expert/comment/brexit-clouds-ttip-negotiations-may-not-scupper-deal: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/expert/comment/brexit-clouds-ttip-negotiations-may-not-scupper-deal


Processing URLs:  88%|████████▊ | 875/1000 [36:53<03:06,  1.49s/it]

Error extracting text from https://www.reuters.com/article/us-usa-afghanistan-trump-idUSKBN1AM0F5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-afghanistan-trump-idUSKBN1AM0F5


Processing URLs:  88%|████████▊ | 878/1000 [36:58<03:27,  1.70s/it]

Error extracting text from http://thebulletin.org/implementing-iran-deal-will-banks-do-business: 404 Client Error: Not Found for url: https://thebulletin.org/implementing-iran-deal-will-banks-do-business/


Processing URLs:  88%|████████▊ | 879/1000 [37:58<34:14, 16.98s/it]

Error extracting text from http://www.miamiherald.com/opinion/op-ed/article95671077.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  88%|████████▊ | 885/1000 [38:07<06:24,  3.35s/it]

Error extracting text from http://www.wsj.com/articles/japan-protests-after-chinese-vessels-sail-near-disputed-east-china-sea-islands-1470468543: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/japan-protests-after-chinese-vessels-sail-near-disputed-east-china-sea-islands-1470468543


Processing URLs:  89%|████████▉ | 888/1000 [38:13<03:52,  2.08s/it]

Error extracting text from http://www.nytimes.com/2013/02/14/business/global/obama-pledges-trade-pact-talks-with-eu.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2013/02/14/business/global/obama-pledges-trade-pact-talks-with-eu.html


Processing URLs:  89%|████████▉ | 890/1000 [38:15<03:02,  1.66s/it]

Error extracting text from http://www.lseg.com/resources/media-centre/press-releases/british-columbia-issues-world%E2%80%99s-first-masala-bond-foreign-government-entity-london-stock-exchange: 404 Client Error: Not Found for url: https://www.lseg.com/resources/media-centre/press-releases/british-columbia-issues-world%E2%80%99s-first-masala-bond-foreign-government-entity-london-stock-exchange


Processing URLs:  89%|████████▉ | 892/1000 [38:20<03:24,  1.90s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-18/facebook-and-google-helped-anti-refugee-campaign-in-swing-states


Processing URLs:  90%|████████▉ | 895/1000 [38:20<01:29,  1.18it/s]

Error extracting text from https://www.atptour.com/en/news/nadal-alcaraz-madrid-2022-friday: 403 Client Error: Forbidden for url: https://www.atptour.com/en/news/nadal-alcaraz-madrid-2022-friday
Error extracting text from http://www.reuters.com/article/us-un-secretarygeneral-portugal-idUSKCN0W22LZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-secretarygeneral-portugal-idUSKCN0W22LZ


Processing URLs:  90%|████████▉ | 896/1000 [38:20<01:08,  1.52it/s]

Error extracting text from http://www.nytimes.com/2016/04/19/world/africa/burundi-is-torturing-prisoners-in-crackdown-on-dissent-united-nations-says.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/19/world/africa/burundi-is-torturing-prisoners-in-crackdown-on-dissent-united-nations-says.html


Processing URLs:  90%|████████▉ | 899/1000 [38:36<05:34,  3.31s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-02/china-s-korea-agree-to-summit-with-japan-by-november-yonhap


Processing URLs:  90%|█████████ | 902/1000 [38:36<02:24,  1.47s/it]

Error extracting text from http://www.wsj.com/articles/brexit-to-complicate-trans-atlantic-cooperation-1467064847: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brexit-to-complicate-trans-atlantic-cooperation-1467064847
Error extracting text from http://www.basnews.com/index.php/en/news/iras/267057: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iras/267057


Processing URLs:  91%|█████████ | 906/1000 [38:43<02:43,  1.74s/it]

Error extracting text from https://www.crunchbase.com/organization/insilico-medicine/funding_rounds/funding_rounds_list#section-funding-rounds: 403 Client Error: Forbidden for url: https://www.crunchbase.com/organization/insilico-medicine/funding_rounds/funding_rounds_list#section-funding-rounds


Processing URLs:  91%|█████████ | 909/1000 [38:48<02:22,  1.57s/it]

Error extracting text from https://www.brookings.edu/book/cyber-security-at-civil-nuclear-facilities-understanding-the-risks/: 404 Client Error: Not Found for url: https://www.brookings.edu/books/cyber-security-at-civil-nuclear-facilities-understanding-the-risks


Processing URLs:  91%|█████████ | 911/1000 [38:51<02:02,  1.38s/it]

Error extracting text from https://rbth.com/news/2017/04/17/citizens-of-18-countries-can-visit-russias-far-east-without-visas_744357: 404 Client Error: Not Found for url: https://www.rbth.com/news/2017/04/17/citizens-of-18-countries-can-visit-russias-far-east-without-visas_744357
Error extracting text from http://www.reuters.com/article/us-usa-trump-inauguration-idUSKBN1540I0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-inauguration-idUSKBN1540I0


Processing URLs:  91%|█████████▏| 914/1000 [38:57<02:33,  1.79s/it]

Error extracting text from http://www.ibtimes.com/under-kim-jong-un-north-korea-slams-failed-talks-south-amid-decline-tourism-sector-2226034: 403 Client Error: Forbidden for url: https://www.ibtimes.com/under-kim-jong-un-north-korea-slams-failed-talks-south-amid-decline-tourism-sector-2226034


Processing URLs:  92%|█████████▏| 916/1000 [38:57<01:21,  1.04it/s]

Error extracting text from http://www.wsj.com/articles/u-s-steel-accuses-china-of-hacking-1461859201: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-steel-accuses-china-of-hacking-1461859201
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-turkey-idUSKBN0TW0EU20151213: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-turkey-idUSKBN0TW0EU20151213
Error extracting text from https://ycharts.com/indicators/total_world_rotary_rigs: 403 Client Error: Forbidden for url: https://ycharts.com/indicators/total_world_rotary_rigs


Processing URLs:  92%|█████████▏| 922/1000 [39:00<00:44,  1.75it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/senate-races/278354-portman-attacks-stricklands-record-as-governor-in-new-round-of: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/senate-races/278354-portman-attacks-stricklands-record-as-governor-in-new-round-of/
Error extracting text from http://www.visualcapitalist.com/u-s-military-personnel-deployments-country/: 403 Client Error: Forbidden for url: http://www.visualcapitalist.com/u-s-military-personnel-deployments-country/


Processing URLs:  92%|█████████▏| 923/1000 [39:02<01:01,  1.25it/s]

Error extracting text from http://townhall.com/tipsheet/guybenson/2016/01/19/oh-my-is-sarah-palin-about-to-endorse-trump-n2106499: 403 Client Error: Forbidden for url: https://townhall.com/tipsheet/guybenson/2016/01/19/oh-my-is-sarah-palin-about-to-endorse-trump-n2106499


Processing URLs:  92%|█████████▎| 925/1000 [39:03<00:57,  1.30it/s]

Error extracting text from http://www.nytimes.com/2015/10/01/world/europe/putin-military-syria.html?smprod=nytcore-iphone&amp;smid=nytcore-iphone-share: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/01/world/europe/putin-military-syria.html?smprod=nytcore-iphone&amp;smid=nytcore-iphone-share


Processing URLs:  93%|█████████▎| 928/1000 [39:08<01:30,  1.26s/it]



Processing URLs:  93%|█████████▎| 932/1000 [39:19<02:29,  2.20s/it]

Error extracting text from http://maritimeawarenessproject.org/2016/04/27/forecasting-the-south-china-sea-arbitration-merits-award/: 403 Client Error: Forbidden for url: https://map.nbr.org/


Processing URLs:  94%|█████████▎| 937/1000 [39:26<01:21,  1.30s/it]

Error extracting text from https://theconversation.com/inside-venezuelas-economic-collapse-80597: 403 Client Error: Forbidden for url: https://theconversation.com/inside-venezuelas-economic-collapse-80597


Processing URLs:  94%|█████████▍| 941/1000 [39:30<01:15,  1.28s/it]

Error extracting text from http://www.myanmar.com/16-history-timeline/37-myanmar-s-president-will-not-run-in-2015-elections.html: 406 Client Error: Not Acceptable for url: http://www.myanmar.com/16-history-timeline/37-myanmar-s-president-will-not-run-in-2015-elections.html


Processing URLs:  94%|█████████▍| 942/1000 [39:30<01:01,  1.06s/it]

Error extracting text from https://www.oig.dhs.gov/sites/default/files/assets/pr/2018/oigpr-010918-screening-protocol-aliens-who-may-be-known-suspected-terrorists-limited-risks-national.pdf: 403 Client Error: Forbidden for url: https://www.oig.dhs.gov/sites/default/files/assets/pr/2018/oigpr-010918-screening-protocol-aliens-who-may-be-known-suspected-terrorists-limited-risks-national.pdf


Processing URLs:  94%|█████████▍| 943/1000 [39:31<00:50,  1.13it/s]

Error extracting text from http://blogs.reuters.com/breakingviews/2015/10/15/hong-kong-ipos-leaning-too-heavily-on-cornerstones/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /breakingviews/2015/10/15/hong-kong-ipos-leaning-too-heavily-on-cornerstones/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3082fff50>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  95%|█████████▍| 946/1000 [39:35<01:13,  1.36s/it]

Error extracting text from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2801951/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2801951/


Processing URLs:  95%|█████████▍| 949/1000 [39:53<03:09,  3.71s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/portugals-former-pm-antonio-guterres-clears-hurdle-to-be-next-united-nations-chief/articleshow/54701916.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/portugals-former-pm-antonio-guterres-clears-hurdle-to-be-next-united-nations-chief/articleshow/54701916.cms


Processing URLs:  96%|█████████▌| 958/1000 [40:07<01:33,  2.23s/it]

Error extracting text from https://www.cnbc.com/2020/11/13/china-faces-the-challenge-of-keeping-big-tech-in-check.ht: 404 Client Error: Not Found for url: https://www.cnbc.com/2020/11/13/china-faces-the-challenge-of-keeping-big-tech-in-check.ht


Processing URLs:  96%|█████████▌| 959/1000 [40:07<01:06,  1.62s/it]

Error extracting text from http://www.wsj.com/articles/brexit-vote-clouds-eu-u-s-trade-deal-1468971211: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brexit-vote-clouds-eu-u-s-trade-deal-1468971211


Processing URLs:  96%|█████████▌| 962/1000 [41:11<12:12, 19.27s/it]

Error extracting text from http://www.charlotteobserver.com/news/business/article63357522.html: HTTPConnectionPool(host='www.charlotteobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  97%|█████████▋| 966/1000 [41:16<03:13,  5.69s/it]

URL filtered: https://www.youtube.com/watch?v=dRB4MfhtGS4


Processing URLs:  97%|█████████▋| 969/1000 [41:18<01:23,  2.68s/it]

Error extracting text from http://splash247.com/contractor-says-new-panama-canal-locks-will-be-repaired-in-january/: 403 Client Error: Forbidden for url: https://splash247.com/contractor-says-new-panama-canal-locks-will-be-repaired-in-january/


Processing URLs:  97%|█████████▋| 972/1000 [41:19<00:37,  1.34s/it]

Error extracting text from http://www.wsj.com/articles/north-korea-blamed-for-nuclear-power-plant-hack-1426589324: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-blamed-for-nuclear-power-plant-hack-1426589324


Processing URLs:  98%|█████████▊| 975/1000 [41:21<00:19,  1.29it/s]

Error extracting text from http://www.reuters.com/article/2015/11/06/us-china-ipo-idUSKCN0SV1AK20151106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-china-ipo-idUSKCN0SV1AK20151106


Processing URLs:  98%|█████████▊| 978/1000 [41:24<00:17,  1.23it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://panoramasvs.blogspot.com/2016/02/fhc-defende-o-imediato-impeachment-de.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://panoramasvs.blogspot.com/2016/02/fhc-defende-o-imediato-impeachment-de.html&amp;prev=search


Processing URLs:  99%|█████████▊| 986/1000 [41:41<00:25,  1.83s/it]

Error extracting text from http://english.alarabiya.net/en/business/economy/2015/12/14/IMF-must-stay-Grexit-risk-not-over-EU-bailout-chief.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/economy/2015/12/14/IMF-must-stay-Grexit-risk-not-over-EU-bailout-chief.html


Processing URLs:  99%|█████████▉| 991/1000 [41:49<00:14,  1.61s/it]

URL filtered: https://twitter.com/ianbremmer/status/695443976195706881


Processing URLs:  99%|█████████▉| 994/1000 [41:51<00:06,  1.09s/it]

Error extracting text from http://www.newsweek.com/why-trump-immigrant-ban-unlawful-chapter-and-verse-531044: 403 Client Error: Forbidden for url: https://www.newsweek.com/why-trump-immigrant-ban-unlawful-chapter-and-verse-531044


Processing URLs: 100%|█████████▉| 997/1000 [41:55<00:03,  1.08s/it]

Error extracting text from https://media.nti.org/pdfs/NTI-Hruby_FINAL.PDF: 403 Client Error: Forbidden for url: https://media.nti.org/pdfs/NTI-Hruby_FINAL.PDF


Processing URLs: 100%|██████████| 1000/1000 [41:57<00:00,  2.52s/it]
Processing URLs:   0%|          | 2/1000 [00:02<20:04,  1.21s/it]

URL filtered: https://www.youtube.com/watch?v=lRnVJ7Mgt0I


Processing URLs:   0%|          | 4/1000 [00:03<14:24,  1.15it/s]

Error extracting text from http://www.channelnewsasia.com/news/world/iraqi-shiite-cleric-sadr-urges-assad-to-step-down/3663222.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/iraqi-shiite-cleric-sadr-urges-assad-to-step-down/3663222.html


Processing URLs:   1%|          | 6/1000 [00:04<09:54,  1.67it/s]

Error extracting text from http://thehill.com/policy/national-security/fbi/314892-fbi-other-agencies-probe-possible-russia-aid-to-trump-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/fbi/314892-fbi-other-agencies-probe-possible-russia-aid-to-trump-report/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/brasil/2016/03/pelo-menos-13-paises-farao-manifestacoes-pro-impeachment-de-dilma-00833825.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/brasil/2016/03/pelo-menos-13-paises-farao-manifestacoes-pro-impeachment-de-dilma-00833825.html&amp;prev=search


Processing URLs:   1%|          | 10/1000 [00:18<46:53,  2.84s/it] 

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN1640J5?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN1640J5?mod=related&amp;channelName=worldNews


Processing URLs:   1%|          | 11/1000 [00:25<1:05:53,  4.00s/it]

Error extracting text from http://www.frostbrowntodd.com/resources-could-president-elect-trump-kill-nafta.html: 404 Client Error: Not Found for url: https://frostbrowntodd.com/resources-could-president-elect-trump-kill-nafta.html


Processing URLs:   1%|          | 12/1000 [00:25<47:50,  2.90s/it]  

Error extracting text from http://13dble.legion-etrangere.com/: HTTPConnectionPool(host='13dble.legion-etrangere.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3082ffdd0>: Failed to resolve '13dble.legion-etrangere.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/view/articles/2017-11-16/softbank-thinks-some-uber-shares-are-worth-more-than-others


Processing URLs:   1%|▏         | 14/1000 [00:26<27:09,  1.65s/it]

Error extracting text from http://www.insidecounsel.com/2016/01/21/cyber-attacks-and-litigation-expected-to-jump-in-2: HTTPConnectionPool(host='www.insidecounsel.com', port=80): Max retries exceeded with url: /2016/01/21/cyber-attacks-and-litigation-expected-to-jump-in-2 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761cec0>: Failed to resolve 'www.insidecounsel.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/11/16/us-brazil-politics-cunha-idUSKCN0T51HV20151116: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/16/us-brazil-politics-cunha-idUSKCN0T51HV20151116


Processing URLs:   2%|▏         | 16/1000 [00:26<18:39,  1.14s/it]

Error extracting text from https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/6114301bc89bd902013024ed/1628713030425/REINZ+Monthly+Property+Report+-+July+2021.pdf: 403 Client Error: Forbidden for url: https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/6114301bc89bd902013024ed/1628713030425/REINZ+Monthly+Property+Report+-+July+2021.pdf


Processing URLs:   2%|▏         | 17/1000 [00:27<18:13,  1.11s/it]

Error extracting text from http://www.who.int/influenza/human_animal_interface/avian_influenza/riskassessment_AH5N8_201611/en/: 404 Client Error: Not Found for url: https://www.who.int/influenza/human_animal_interface/avian_influenza/riskassessment_AH5N8_201611/en/


Processing URLs:   2%|▏         | 19/1000 [00:29<14:00,  1.17it/s]

Error extracting text from http://www.iran-daily.com/News/151733.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
Error extracting text from http://www.reuters.com/article/us-philippines-usa-aid-idUSKBN14516E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-usa-aid-idUSKBN14516E


Processing URLs:   2%|▏         | 22/1000 [00:31<12:01,  1.36it/s]

Error extracting text from https://www.cruz.senate.gov/?p=press_release&amp;id=5878: 403 Client Error: Forbidden for url: https://www.cruz.senate.gov/?p=press_release&amp;id=5878
Error extracting text from http://www.autonews.com/article/20150713/OEM11/150719948/gm-ignition-switch-death-toll-reaches-124: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20150713/OEM11/150719948/gm-ignition-switch-death-toll-reaches-124


Processing URLs:   3%|▎         | 30/1000 [00:50<26:22,  1.63s/it]

Error extracting text from https://onlinelibrary.wiley.com/doi/abs/10.1111/pafo.12142: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/abs/10.1111/pafo.12142


Processing URLs:   3%|▎         | 32/1000 [00:54<26:29,  1.64s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-24/two-u-k-eu-phone-polls-show-remain-ahead-with-reduced-lead


Processing URLs:   4%|▎         | 37/1000 [01:00<21:04,  1.31s/it]

Error extracting text from http://www.investors.com/news/technology/tesla-appears-on-track-to-meet-production-goals-though-risks-remain/?ven=DJCP&amp;src=AURLABO: 403 Client Error: Forbidden for url: https://www.investors.com/news/technology/tesla-appears-on-track-to-meet-production-goals-though-risks-remain/?ven=DJCP&amp;src=AURLABO


Processing URLs:   4%|▍         | 38/1000 [01:01<16:17,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/03/23/world/asia/afghanistan-taliban-helmand-sangin.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/23/world/asia/afghanistan-taliban-helmand-sangin.html


Processing URLs:   4%|▍         | 39/1000 [01:01<13:52,  1.15it/s]

Error extracting text from https://www.predictit.org/Market/1460/Who-will-win-the-2016-New-Hampshire-Democratic-primary: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1460/Who-will-win-the-2016-New-Hampshire-Democratic-primary


Processing URLs:   4%|▍         | 40/1000 [01:02<12:25,  1.29it/s]

Error extracting text from http://business.financialpost.com/news/fp-street/book-excerpt-superforecasting-the-art-and-science-of-prediction: 403 Client Error: Forbidden for url: https://financialpost.com/news/fp-street/book-excerpt-superforecasting-the-art-and-science-of-prediction


Processing URLs:   4%|▍         | 43/1000 [01:10<29:27,  1.85s/it]

Error extracting text from http://carnegie.ru/2015/09/09/russian-regime-in-2015-all-tactics-no-strategy/ih3t: 403 Client Error: Forbidden for url: http://carnegie.ru/2015/09/09/russian-regime-in-2015-all-tactics-no-strategy/ih3t


Processing URLs:   4%|▍         | 44/1000 [01:11<23:44,  1.49s/it]

Error extracting text from http://nationalinterest.org/blog/north-korea-test-nuclear-missile-could-strike-america-year-20609: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/north-korea-test-nuclear-missile-could-strike-america-year-20609


Processing URLs:   5%|▍         | 48/1000 [01:15<17:27,  1.10s/it]

Error extracting text from http://thehill.com/policy/finance/258188-ryan-says-budget-process-stinks: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/258188-ryan-says-budget-process-stinks/


Processing URLs:   5%|▌         | 50/1000 [01:20<24:51,  1.57s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-02/floods-stall-ivory-coast-cocoa-harvest-as-farmers-fret-over-rot


Processing URLs:   5%|▌         | 52/1000 [01:26<37:12,  2.36s/it]

Error extracting text from http://www.buenosairesherald.com/article/220772/spain%E2%80%99s-socialists-say-no-to-rajoy-as-pm: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/220772/spain%E2%80%99s-socialists-say-no-to-rajoy-as-pm


Processing URLs:   5%|▌         | 53/1000 [01:26<29:12,  1.85s/it]

Error extracting text from http://www.mb.com.ph/north-korea-could-return-to-6-party-nuclear-talks/: 403 Client Error: Forbidden for url: https://mb.com.ph/north-korea-could-return-to-6-party-nuclear-talks/


Processing URLs:   6%|▌         | 56/1000 [01:33<34:51,  2.22s/it]

Error extracting text from http://stateofobesity.org/healthcare-costs-obesity/: 404 Client Error: Not Found for url: https://stateofchildhoodobesity.org/healthcare-costs-obesity/


Processing URLs:   6%|▌         | 61/1000 [01:42<31:39,  2.02s/it]

Error extracting text from http://themalaysianreserve.com/new/story/ringgit-remain-under-pressure: 404 Client Error: Not Found for url: https://themalaysianreserve.com/new/story/ringgit-remain-under-pressure


Processing URLs:   6%|▌         | 62/1000 [01:42<23:17,  1.49s/it]

Error extracting text from https://english.alarabiya.net/News/world/2021/08/15/Taliban-begin-offensive-on-Kabul-Interior-Ministry: 403 Client Error: Forbidden for url: https://english.alarabiya.net/News/world/2021/08/15/Taliban-begin-offensive-on-Kabul-Interior-Ministry
Error extracting text from https://www.predictit.org/Contract/4490/Will-the-ACA-individual-mandate-be-repealed-by-the-end-of-2017#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/4490/Will-the-ACA-individual-mandate-be-repealed-by-the-end-of-2017#data
URL filtered: https://www.bloomberg.com/news/articles/2017-10-02/gm-pledges-electric-future-with-20-all-electric-models-by-2023


Processing URLs:   7%|▋         | 67/1000 [01:45<13:27,  1.16it/s]

Error extracting text from https://www.nytimes.com/2017/12/17/world/asia/afghanistan-taliban-attacks.html?rref=collection%2Ftimestopic%2FTaliban&amp;action=click&amp;contentCollection=timestopics&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=3&amp;pgtype=collection: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/17/world/asia/afghanistan-taliban-attacks.html?rref=collection%2Ftimestopic%2FTaliban&amp;action=click&amp;contentCollection=timestopics&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=3&amp;pgtype=collection


Processing URLs:   7%|▋         | 72/1000 [01:53<21:53,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-bill-negotiators-idUSKBN17Z0RK?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-bill-negotiators-idUSKBN17Z0RK?il=0


Processing URLs:   7%|▋         | 73/1000 [01:57<29:59,  1.94s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges/news/ttip-negotiators-reiterate-2016-goal-while-noting-market-access-gaps: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges/news/ttip-negotiators-reiterate-2016-goal-while-noting-market-access-gaps


Processing URLs:   7%|▋         | 74/1000 [03:57<8:33:39, 33.28s/it]

Error extracting text from http://www.mda.mil/global/documents/pdf/budgetfy17.pdf: HTTPConnectionPool(host='www.mda.mil', port=80): Max retries exceeded with url: /global/documents/pdf/budgetfy17.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3065cf1d0>, 'Connection to www.mda.mil timed out. (connect timeout=60)'))


Processing URLs:   8%|▊         | 76/1000 [03:58<4:30:21, 17.56s/it]

Error extracting text from http://www.opec.org/opec_web/en/923.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/923.htm


Processing URLs:   8%|▊         | 83/1000 [04:10<47:31,  3.11s/it]  

Error extracting text from https://www.worthynews.com/24308-erdogan-putin-agree-multi-billion-energy-deal: 403 Client Error: Forbidden for url: https://www.worthynews.com/24308-erdogan-putin-agree-multi-billion-energy-deal


Processing URLs:   9%|▊         | 86/1000 [04:13<22:39,  1.49s/it]

Error extracting text from http://www.nytimes.com/2016/03/09/world/middleeast/irans-revolutionary-guards-test-nationwide-ballistic-missiles.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/09/world/middleeast/irans-revolutionary-guards-test-nationwide-ballistic-missiles.html?_r=0


Processing URLs:   9%|▉         | 88/1000 [04:17<27:42,  1.82s/it]

Error extracting text from https://reut.rs/2IYVItr: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   9%|▉         | 93/1000 [04:22<14:51,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKBN15W0WA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKBN15W0WA


Processing URLs:  10%|▉         | 99/1000 [04:31<16:43,  1.11s/it]

Error extracting text from http://awesci.com/cambridge-university-says-spelling-does-not-matter/: 406 Client Error: Not Acceptable for url: http://awesci.com/cambridge-university-says-spelling-does-not-matter/
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/o-subekte-i-obekte-infosfery&amp;usg=ALkJrhiQMHlw9BJtLycCTkhReXtcWjdv0w: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/o-subekte-i-obekte-infosfery&amp;usg=ALkJrhiQMHlw9BJtLycCTkhReXtcWjdv0w


Processing URLs:  10%|█         | 104/1000 [04:50<34:07,  2.29s/it]

Error extracting text from http://www.cnbc.com/2016/01/26/reuters-america-rouhani-says-italys-renzi-will-visit-iran-in-coming-months.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/01/26/reuters-america-rouhani-says-italys-renzi-will-visit-iran-in-coming-months.html


Processing URLs:  11%|█         | 109/1000 [04:53<11:04,  1.34it/s]

Error extracting text from https://www.wsj.com/articles/worst-flooding-in-decades-raises-concerns-over-chinas-three-gorges-dam-11595337875: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/worst-flooding-in-decades-raises-concerns-over-chinas-three-gorges-dam-11595337875
URL filtered: https://www.youtube.com/watch?v=5Ue4_MWwKY8
Error extracting text from http://www.reuters.com/article/us-tesla-china-crash-idUSKCN10L0P4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-china-crash-idUSKCN10L0P4


Processing URLs:  11%|█         | 111/1000 [04:54<08:17,  1.79it/s]

Error extracting text from http://english.alarabiya.net/en/News/2015/09/26/Russia-flying-reconnaissance-over-Syria-no-strikes-yet-U-S-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/2015/09/26/Russia-flying-reconnaissance-over-Syria-no-strikes-yet-U-S-.html


Processing URLs:  12%|█▏        | 115/1000 [04:57<11:44,  1.26it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/353410-hewlett-packard-enterprise-let-russia-review-software-used-by-pentagon: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/353410-hewlett-packard-enterprise-let-russia-review-software-used-by-pentagon/


Processing URLs:  12%|█▏        | 117/1000 [06:00<4:32:55, 18.55s/it]

Error extracting text from https://www.teamusa.org/about-the-usopc: HTTPSConnectionPool(host='www.teamusa.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  12%|█▏        | 119/1000 [06:01<2:19:25,  9.50s/it]

Error extracting text from http://www.fda.gov/AnimalVeterinary/DevelopmentApprovalProcess/GeneticEngineering/GeneticallyEngineeredAnimals/ucm446529.htm: 404 Client Error: Not Found for url: https://www.fda.gov/AnimalVeterinary/DevelopmentApprovalProcess/GeneticEngineering/GeneticallyEngineeredAnimals/ucm446529.htm
Error extracting text from https://www.middleeastmonitor.com/20160517-pa-prime-minister-confirms-security-cooperation-with-israel-still-ongoing/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20160517-pa-prime-minister-confirms-security-cooperation-with-israel-still-ongoing/


Processing URLs:  12%|█▏        | 124/1000 [06:07<38:28,  2.64s/it]  

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-eu-idUSKBN19430R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-eu-idUSKBN19430R


Processing URLs:  13%|█▎        | 126/1000 [06:09<24:06,  1.65s/it]

Error extracting text from https://globalguessing.com/tag/elections/: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /tag/elections/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  13%|█▎        | 127/1000 [06:10<20:19,  1.40s/it]

Error extracting text from https://www.ecb.europa.eu/ecb/legal/pdf/en_con_2016_26_f__sign.pdf: 404 Client Error: Not Found for url: https://www.ecb.europa.eu/ecb/legal/pdf/en_con_2016_26_f__sign.pdf


Processing URLs:  13%|█▎        | 131/1000 [06:13<13:43,  1.06it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKCN1B70X5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKCN1B70X5


Processing URLs:  13%|█▎        | 133/1000 [06:15<12:19,  1.17it/s]

Error extracting text from http://www.wsj.com/articles/opec-cuts-forecast-of-oil-supplies-from-nonmembers-in-2015-1442228400: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-cuts-forecast-of-oil-supplies-from-nonmembers-in-2015-1442228400


Processing URLs:  14%|█▎        | 136/1000 [06:21<27:12,  1.89s/it]

URL filtered: https://www.youtube.com/watch?v=uHkvD7-u7y8


Processing URLs:  14%|█▍        | 138/1000 [06:22<15:27,  1.08s/it]

Error extracting text from https://www.nytimes.com/2021/07/02/world/europe/labor-by-election-victory-batley-and-spen.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/02/world/europe/labor-by-election-victory-batley-and-spen.html


Processing URLs:  14%|█▍        | 142/1000 [06:25<11:27,  1.25it/s]

Error extracting text from http://ir.avisbudgetgroup.com/releasedetail.cfm?ReleaseID=1047813: 403 Client Error: Forbidden for url: http://ir.avisbudgetgroup.com/releasedetail.cfm?ReleaseID=1047813


Processing URLs:  14%|█▍        | 143/1000 [06:27<16:53,  1.18s/it]

Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-24-july-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-24-july-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306259f70>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▍        | 146/1000 [07:30<3:42:14, 15.61s/it]

Error extracting text from https://massroots.com/blog/trump-admin-not-concerned-about-canadian-legal-marijuana: HTTPSConnectionPool(host='massroots.com', port=443): Max retries exceeded with url: /blog/trump-admin-not-concerned-about-canadian-legal-marijuana (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x306258710>, 'Connection to massroots.com timed out. (connect timeout=60)'))
Error extracting text from https://www.reuters.com/article/us-china-thaad-russia-idUSKBN19O0N8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-thaad-russia-idUSKBN19O0N8


Processing URLs:  15%|█▍        | 149/1000 [07:31<1:44:10,  7.34s/it]

Error extracting text from http://www.nytimes.com/2016/07/02/us/politics/loretta-lynch-hillary-clinton-email-server.html?smprod=nytcore-ipad&amp;smid=nytcore-ipad-share: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/02/us/politics/loretta-lynch-hillary-clinton-email-server.html?smprod=nytcore-ipad&amp;smid=nytcore-ipad-share


Processing URLs:  15%|█▌        | 150/1000 [07:33<1:23:18,  5.88s/it]

Error extracting text from https://www.newsday.com/long-island/columnists/dan-janison/afghanistan-biden-trump-troops-1.50213807: 404 Client Error: Not Found for url: https://www.newsday.com/long-island/columnists/dan-janison/afghanistan-biden-trump-troops-1.50213807


Processing URLs:  15%|█▌        | 154/1000 [07:41<37:12,  2.64s/it]  

Error extracting text from https://www.snp.org/our_vision: 403 Client Error: Forbidden for url: https://www.snp.org/our_vision


Processing URLs:  16%|█▋        | 164/1000 [07:55<19:07,  1.37s/it]

Error extracting text from https://www.tuko.co.ke/247484-second-opinion-poll-raila-lead-uhuru-kenyatta.html: 410 Client Error: Gone for url: https://www.tuko.co.ke/247484-second-opinion-poll-raila-lead-uhuru-kenyatta.html


Processing URLs:  18%|█▊        | 178/1000 [08:25<24:26,  1.78s/it]

Error extracting text from https://www.opensecrets.org/overview/reelect.php: 403 Client Error: Forbidden for url: https://www.opensecrets.org/overview/reelect.php


Processing URLs:  18%|█▊        | 180/1000 [13:32<14:53:52, 65.41s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/china-sent-monster-ship-roam-the-south-china-sea-20608: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/china-sent-monster-ship-roam-the-south-china-sea-20608


Processing URLs:  19%|█▊        | 187/1000 [13:44<1:28:56,  6.56s/it] 

Error extracting text from http://www.straitstimes.com/asia/east-asia/graft-buster-wang-qishan-not-on-chinas-new-leadership-line-up-report: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  19%|█▉        | 190/1000 [13:50<46:14,  3.43s/it]  

Error extracting text from http://in.reuters.com/article/philippines-southchinasea-idINKBN0UA0DL20151227: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  19%|█▉        | 194/1000 [13:54<18:33,  1.38s/it]

Error extracting text from https://www.nytimes.com/2017/03/25/world/americas/nafta-renegotiation-mexico.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/25/world/americas/nafta-renegotiation-mexico.html?_r=0
Error extracting text from http://www.balkaninsight.com/en/article/montenegrins-vote-online-to-keep-military-neutrality-04-12-2016#sthash.P1Zp1OSQ.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegrins-vote-online-to-keep-military-neutrality-04-12-2016#sthash.P1Zp1OSQ.dpuf


Processing URLs:  20%|█▉        | 199/1000 [14:02<14:35,  1.09s/it]

Error extracting text from https://events.google.com/alphago2017/: 404 Client Error: Not Found for url: https://events.google.com/alphago2017/


Processing URLs:  20%|██        | 200/1000 [14:02<11:36,  1.15it/s]

Error extracting text from https://www.nytimes.com/2020/07/07/opinion/biden-trump-debate.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/07/07/opinion/biden-trump-debate.html


Processing URLs:  20%|██        | 201/1000 [14:05<21:36,  1.62s/it]

Error extracting text from http://en.trend.az/iran/politics/2509252.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2509252.html


Processing URLs:  20%|██        | 203/1000 [14:08<17:39,  1.33s/it]

Error extracting text from https://www.axios.com/xinjiang-forced-labor-uyghurs-a3b58b6e-c98f-4ce4-ae52-7b4a37fa61f5.html: 403 Client Error: Forbidden for url: https://www.axios.com/xinjiang-forced-labor-uyghurs-a3b58b6e-c98f-4ce4-ae52-7b4a37fa61f5.html


Processing URLs:  21%|██        | 209/1000 [14:14<15:16,  1.16s/it]

Error extracting text from http://www.amazon.com/Where-Am-Heaven-Eternity-Beyond/dp/0718042220/ref=zg_bsnr_books_44: 500 Server Error:  for url: https://www.amazon.com/Where-Am-Heaven-Eternity-Beyond/dp/0718042220/ref=zg_bsnr_books_44


Processing URLs:  21%|██        | 210/1000 [14:15<11:34,  1.14it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/12/17/U-N-war-crimes-team-will-not-investigate-foreign-air-strikes-in-Syria.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/12/17/U-N-war-crimes-team-will-not-investigate-foreign-air-strikes-in-Syria.html


Processing URLs:  22%|██▏       | 216/1000 [14:22<14:24,  1.10s/it]

Error extracting text from http://www.nytimes.com/2015/10/06/business/energy-environment/oil-industry-gaining-in-push-for-repeal-of-us-ban-on-petroleum-exports.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/06/business/energy-environment/oil-industry-gaining-in-push-for-repeal-of-us-ban-on-petroleum-exports.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  22%|██▏       | 219/1000 [14:26<12:53,  1.01it/s]

Error extracting text from http://www.npr.org/sections/goatsandsoda/2015/10/26/451908297/next-year-could-mark-the-end-of-polio: 500 Server Error: Internal Server Error for url: https://www.npr.org/sections/goatsandsoda/2015/10/26/451908297/next-year-could-mark-the-end-of-polio
Error extracting text from https://www.irrawaddy.com/news/burma/myanmars-national-unity-government-deeply-concerned-with-china.html: 403 Client Error: Forbidden for url: https://www.irrawaddy.com/news/burma/myanmars-national-unity-government-deeply-concerned-with-china.html


Processing URLs:  22%|██▏       | 223/1000 [14:38<32:44,  2.53s/it]

URL filtered: http://www.nasdaq.com/article/gunmen-attack-aid-convoy-in-south-sudan-killing-two-iom-20170316-00802?utm_content=buffer2ac0e&amp;utm_medium=social&amp;utm_source=linkedin.com&amp;utm_campaign=buffer


Processing URLs:  22%|██▎       | 225/1000 [14:39<20:29,  1.59s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-06-03/u-k-services-left-out-of-brexit-trade-talks-think-tank-warns


Processing URLs:  23%|██▎       | 227/1000 [15:39<2:46:04, 12.89s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-06-27/spain-conservatives-win-vote-but-face-problems-to-form-govt: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  23%|██▎       | 229/1000 [16:43<4:53:31, 22.84s/it]

Error extracting text from http://english.irib.ir/news/iran1/item/220793-leader-s-top-aide-iran-to-reciprocate-us-sanctions: HTTPConnectionPool(host='english.irib.ir', port=80): Max retries exceeded with url: /news/iran1/item/220793-leader-s-top-aide-iran-to-reciprocate-us-sanctions (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x300fd4f80>, 'Connection to english.irib.ir timed out. (connect timeout=60)'))


Processing URLs:  23%|██▎       | 232/1000 [16:46<2:03:29,  9.65s/it]

Error extracting text from http://www.france24.com/en/20161124-venezuela-crisis-talks-wobble-december-meeting-key: 403 Client Error: Forbidden for url: http://www.france24.com/en/20161124-venezuela-crisis-talks-wobble-december-meeting-key


Processing URLs:  24%|██▍       | 238/1000 [16:51<28:52,  2.27s/it]  

Error extracting text from https://www.reuters.com/article/us-germany-politics/german-parties-regroup-for-last-ditch-coalition-push-idUSKBN1DI0DE?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-parties-regroup-for-last-ditch-coalition-push-idUSKBN1DI0DE?il=0


Processing URLs:  24%|██▍       | 241/1000 [16:54<17:19,  1.37s/it]

Error extracting text from http://www.latimes.com/politics/washington/la-na-essential-washington-updates-white-house-still-lacks-the-votes-on-1490291939-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/washington/la-na-essential-washington-updates-white-house-still-lacks-the-votes-on-1490291939-htmlstory.html


Processing URLs:  24%|██▍       | 243/1000 [16:56<16:24,  1.30s/it]

URL filtered: https://twitter.com/britainelects/status/742722487167946752


Processing URLs:  24%|██▍       | 245/1000 [16:57<10:13,  1.23it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-30/hong-kong-bourse-wants-to-expand-shanghai-link-to-ipos-li-says


Processing URLs:  25%|██▍       | 247/1000 [16:58<10:38,  1.18it/s]

Error extracting text from https://www.reuters.com/article/us-russia-usa-missiles-putin/putin-accuses-u-s-of-plotting-to-break-landmark-arms-control-pact-idUSKBN1EG1CG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-missiles-putin/putin-accuses-u-s-of-plotting-to-break-landmark-arms-control-pact-idUSKBN1EG1CG


Processing URLs:  25%|██▌       | 250/1000 [16:59<07:12,  1.74it/s]

Error extracting text from http://www.nytimes.com/2016/04/13/science/alpha-centauri-breakthrough-starshot-yuri-milner-stephen-hawking.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/13/science/alpha-centauri-breakthrough-starshot-yuri-milner-stephen-hawking.html


Processing URLs:  25%|██▌       | 253/1000 [17:03<09:09,  1.36it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-security-germany-idUSKCN10S0X9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-germany-idUSKCN10S0X9


Processing URLs:  25%|██▌       | 254/1000 [17:07<20:17,  1.63s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-13/canada-should-permit-retail-marijuana-sales-panel-says


Processing URLs:  26%|██▌       | 256/1000 [17:07<13:07,  1.06s/it]

Error extracting text from http://thehill.com/policy/finance/259218-house-rejects-efforts-to-derail-export-import-bank: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/259218-house-rejects-efforts-to-derail-export-import-bank/


Processing URLs:  26%|██▌       | 259/1000 [17:11<12:55,  1.05s/it]

Error extracting text from http://www.nytimes.com/2016/02/25/opinion/the-party-of-no-way.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/25/opinion/the-party-of-no-way.html


Processing URLs:  26%|██▌       | 260/1000 [17:12<11:43,  1.05it/s]

Error extracting text from https://phys.org/news/2017-05-atlas-results-weakly-interacting-supersymmetric-particles.html: 400 Client Error: Bad request for url: https://phys.org/news/2017-05-atlas-results-weakly-interacting-supersymmetric-particles.html


Processing URLs:  26%|██▋       | 265/1000 [17:37<49:04,  4.01s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-trump-interview-highlights-idUSKBN1622RG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-interview-highlights-idUSKBN1622RG


Processing URLs:  27%|██▋       | 266/1000 [17:38<36:26,  2.98s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-trio-exclusive-idUSKBN18L302: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-trio-exclusive-idUSKBN18L302


Processing URLs:  27%|██▋       | 270/1000 [18:01<1:07:45,  5.57s/it]

URL filtered: https://www.theguardian.com/technology/2017/jun/19/social-media-proganda-manipulating-public-opinion-bots-accounts-facebook-twitter?CMP=share_btn_tw


Processing URLs:  27%|██▋       | 274/1000 [18:08<35:19,  2.92s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2016-12-05/china-transforms-frontier-neighbors-with-cash-for-rails-to-power
URL filtered: https://twitter.com/ChristopherJM/status/1497258980040589313,
URL filtered: https://twitter.com/realDonaldTrump/status/819164172781060096?ref_src=twsrc%5Etfw


Processing URLs:  28%|██▊       | 281/1000 [18:13<12:56,  1.08s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-polls-phones-idUSKCN0YB1VO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-polls-phones-idUSKCN0YB1VO
Error extracting text from http://www.nytimes.com/2016/08/21/world/europe/moscow-kremlin-silence-critics-poison.html?smid=fb-share&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/21/world/europe/moscow-kremlin-silence-critics-poison.html?smid=fb-share&amp;_r=0


Processing URLs:  28%|██▊       | 284/1000 [18:17<15:12,  1.27s/it]

Error extracting text from http://www.timescolonist.com/lawmakers-seoul-spy-agency-says-n-korea-preparing-for-nuke-test-but-not-in-immediate-future-1.2090055#sthash.WKGpCMNg.dpuf: 404 Client Error: Not Found for url: http://www.timescolonist.com/lawmakers-seoul-spy-agency-says-n-korea-preparing-for-nuke-test-but-not-in-immediate-future-1.2090055#sthash.WKGpCMNg.dpuf


Processing URLs:  29%|██▉       | 290/1000 [18:40<26:53,  2.27s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-10/29/c_134763526.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-10/29/c_134763526.htm


Processing URLs:  29%|██▉       | 292/1000 [18:42<18:59,  1.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-25/saudi-central-bank-gives-banks-5-3-billion-to-boost-stability


Processing URLs:  29%|██▉       | 294/1000 [18:45<19:33,  1.66s/it]

Error extracting text from http://38north.org/2016/01/sinpo010516/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  30%|██▉       | 295/1000 [18:46<17:17,  1.47s/it]

Error extracting text from http://seekingalpha.com/article/4052021-t-time-warner-deal-blocked-doj-believe-hype: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4052021-t-time-warner-deal-blocked-doj-believe-hype


Processing URLs:  30%|██▉       | 299/1000 [18:50<10:05,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-usa-fiscal-vote-idUSKBN0TS2WF20151210#D2hEHuIqXrXM1Ddr.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fiscal-vote-idUSKBN0TS2WF20151210#D2hEHuIqXrXM1Ddr.97
Error extracting text from https://www.latimes.com/world-nation/story/2021-04-09/us-urges-arms-embargo-and-sanctions-against-myanmar-military: 403 Client Error: Forbidden for url: https://www.latimes.com/world-nation/story/2021-04-09/us-urges-arms-embargo-and-sanctions-against-myanmar-military


Processing URLs:  30%|███       | 300/1000 [18:51<10:18,  1.13it/s]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/12/14/0200000000AEN20151214002951315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: http://www.bloomberg.com/news/articles/2015-11-30/south-africa-edging-closer-to-junk-as-credit-rating-reviews-loom


Processing URLs:  31%|███       | 309/1000 [19:02<09:08,  1.26it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-01/brazil-company-illegally-funded-2010-rousseff-campaign-folha
Error extracting text from http://greece.greekreporter.com/2016/08/25/citigroup-grexit-back-on-table-with-escalation-of-crisis-in-2018/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/08/25/citigroup-grexit-back-on-table-with-escalation-of-crisis-in-2018/


Processing URLs:  31%|███       | 310/1000 [19:02<07:15,  1.58it/s]

Error extracting text from http://www.reuters.com/article/2015/09/28/markets-money-idUSL1N11Y1VP20150928: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/28/markets-money-idUSL1N11Y1VP20150928


Processing URLs:  31%|███       | 311/1000 [19:03<07:16,  1.58it/s]

Error extracting text from http://gcaptain.com/panama-canal-expansion-completion-delayed-to-end-june/: 403 Client Error: Forbidden for url: http://gcaptain.com/panama-canal-expansion-completion-delayed-to-end-june/
Error extracting text from https://www.reuters.com/article/us-asia-storm-japan/weakening-typhoon-talim-brings-heavy-rain-to-southwestern-japan-idUSKCN1BR05M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asia-storm-japan/weakening-typhoon-talim-brings-heavy-rain-to-southwestern-japan-idUSKCN1BR05M


Processing URLs:  31%|███▏      | 313/1000 [19:03<04:40,  2.45it/s]

Error extracting text from https://www.iol.co.za/news/politics/president-zuma-will-deliver-sona-says-magashule-13092694: 403 Client Error: Forbidden for url: https://www.iol.co.za/news/politics/president-zuma-will-deliver-sona-says-magashule-13092694


Processing URLs:  32%|███▏      | 317/1000 [20:08<3:15:55, 17.21s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-01-20/plan-to-drive-isis-from-mosul-missing-one-thing-troops: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  32%|███▏      | 321/1000 [21:16<4:25:01, 23.42s/it]

Error extracting text from https://www.ikn.army.mil/apps/IKNWMS/Home/WebSite/MIPB: HTTPSConnectionPool(host='www.ikn.army.mil', port=443): Max retries exceeded with url: /apps/IKNWMS/Home/WebSite/MIPB (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x305b23b00>, 'Connection to www.ikn.army.mil timed out. (connect timeout=60)'))


Processing URLs:  33%|███▎      | 327/1000 [21:24<44:10,  3.94s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-08-24/tiny-issuer-thinks-it-just-got-the-edge-in-race-for-bitcoin-etf?utm_source=twitter&amp;utm_medium=social&amp;utm_content=business&amp;utm_campaign=socialflow-organic&amp;cmpid=socialflow-twitter-business


Processing URLs:  34%|███▎      | 335/1000 [21:48<28:42,  2.59s/it]

Error extracting text from http://capone.mtsu.edu/studskl/hd/hemispheric_dominance.html: 404 Client Error: Not Found for url: http://capone.mtsu.edu/studskl/hd/hemispheric_dominance.html


Processing URLs:  34%|███▍      | 339/1000 [21:51<13:29,  1.22s/it]

Error extracting text from https://www.nytimes.com/2017/03/09/world/europe/britain-theresa-may-brexit-european-union.html?ribbon-ad-idx=5&amp;rref=world/europe: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/09/world/europe/britain-theresa-may-brexit-european-union.html?ribbon-ad-idx=5&amp;rref=world/europe


Processing URLs:  34%|███▍      | 341/1000 [21:52<08:52,  1.24it/s]

Error extracting text from http://www.wsj.com/articles/nigeria-grapples-with-abrupt-end-to-rapid-growth-1460418207: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nigeria-grapples-with-abrupt-end-to-rapid-growth-1460418207


Processing URLs:  34%|███▍      | 343/1000 [21:57<16:40,  1.52s/it]

Error extracting text from http://insideevs.com/anti-tesla-bill-indiana/: 404 Client Error: Not Found for url: https://insideevs.com:443/anti-tesla-bill-indiana/


Processing URLs:  34%|███▍      | 344/1000 [21:59<19:40,  1.80s/it]

Error extracting text from http://tass.ru/en/politics/834299: 404 Client Error: Not Found for url: https://tass.ru/en/politics/834299


Processing URLs:  34%|███▍      | 345/1000 [21:59<14:33,  1.33s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-kerry-idUSKCN0ZM2GU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-kerry-idUSKCN0ZM2GU


Processing URLs:  35%|███▌      | 351/1000 [22:12<22:31,  2.08s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-10/venezuela-bond-traders-only-care-about-oil-as-correlation-jumps


Processing URLs:  35%|███▌      | 353/1000 [22:15<19:02,  1.77s/it]

URL filtered: https://www.washingtonpost.com/technology/2021/01/07/trump-twitter-ban/


Processing URLs:  36%|███▌      | 355/1000 [22:16<13:48,  1.28s/it]

Error extracting text from http://www.globaltimes.cn/content/1009094.shtml: 404 Client Error: Not Found for url: https://www.globaltimes.cn/content/1009094.shtml


Processing URLs:  36%|███▌      | 358/1000 [22:28<27:51,  2.60s/it]

Error extracting text from https://www.el19digital.com/articulos/ver/titulo:112122-companera-rosario-murillo-en-multinoticias-22-01-21: 403 Client Error: Forbidden for url: https://www.el19digital.com/articulos/ver/titulo:112122-companera-rosario-murillo-en-multinoticias-22-01-21


Processing URLs:  36%|███▌      | 359/1000 [22:29<23:16,  2.18s/it]

Error extracting text from https://bit.ly/3BXu8mQ: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/reboot/russia%E2%80%99s-silo-based-rs-28-sarmat-icbm-looks-killer-182293


Processing URLs:  36%|███▌      | 360/1000 [22:31<24:26,  2.29s/it]

Error extracting text from https://www.uefa.com/uefachampionsleague/fixtures-results/#/rd/2001309-2: 403 Client Error: Forbidden for url: https://www.uefa.com/uefachampionsleague/fixtures-results/#/rd/2001309-2


Processing URLs:  36%|███▋      | 364/1000 [22:45<33:38,  3.17s/it]

Error extracting text from https://reut.rs/3h1r7Z3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/us-targets-five-chinese-companies-over-alleged-forced-labour-xinjiang-2021-06-24/


Processing URLs:  36%|███▋      | 365/1000 [22:50<40:26,  3.82s/it]

URL filtered: http://bloomberg.econoday.com/byshoweventfull.asp?fid=467001&amp;cust=bloomberg-us&amp;year=2015&amp;lid=0&amp;prev=/bymonth.asp#top


Processing URLs:  37%|███▋      | 369/1000 [22:58<30:06,  2.86s/it]

URL filtered: https://www.vox.com/future-perfect/2020/2/14/21137882/prediction-markets-bloomberg-sanders-president


Processing URLs:  37%|███▋      | 371/1000 [23:01<21:59,  2.10s/it]

Error extracting text from https://www.reuters.com/business/cop/outline-carbon-markets-deal-emerges-un-climate-summit-2021-11-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/cop/outline-carbon-markets-deal-emerges-un-climate-summit-2021-11-13/


Processing URLs:  37%|███▋      | 373/1000 [23:01<15:26,  1.48s/it]

Error extracting text from https://uk.finance.yahoo.com/news/bank-of-england-andrew-bailey-negative-interest-rates-group-of-thirty-074209112.html: 404 Client Error: Not Found for url: https://uk.finance.yahoo.com/news/bank-of-england-andrew-bailey-negative-interest-rates-group-of-thirty-074209112.html
URL filtered: http://qz.com/865964/facebook-fb-could-face-e500000-fines-for-each-fake-news-post-in-germany/


Processing URLs:  38%|███▊      | 377/1000 [23:09<15:18,  1.47s/it]

Error extracting text from https://www.middleeastmonitor.com/news/europe/23724-moscow-senior-military-advisor-killed-in-syria: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/news/europe/23724-moscow-senior-military-advisor-killed-in-syria


Processing URLs:  38%|███▊      | 379/1000 [23:11<12:24,  1.20s/it]

Error extracting text from https://www.nytimes.com/2021/06/14/science/covid-lab-leak-fauci-kristian-andersen.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/14/science/covid-lab-leak-fauci-kristian-andersen.html


Processing URLs:  38%|███▊      | 380/1000 [23:12<12:26,  1.20s/it]

Error extracting text from https://www.kpbs.org/news/2021/jul/26/san-diego-county-will-not-reinstate-mask-mandates-/: 403 Client Error: Forbidden for url: https://www.kpbs.org/news/2021/jul/26/san-diego-county-will-not-reinstate-mask-mandates-/


Processing URLs:  38%|███▊      | 382/1000 [23:13<08:10,  1.26it/s]

Error extracting text from http://www.tradingeconomics.com/south-korea/government-budget/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/south-korea/government-budget/forecast
Error extracting text from http://www.reuters.com/article/us-trump-asia-japan/trump-says-japan-would-shoot-north-korean-missiles-out-of-sky-if-it-bought-u-s-weaponry-idUSKBN1D602F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trump-asia-japan/trump-says-japan-would-shoot-north-korean-missiles-out-of-sky-if-it-bought-u-s-weaponry-idUSKBN1D602F


Processing URLs:  38%|███▊      | 383/1000 [23:15<14:28,  1.41s/it]

Error extracting text from https://www.tvnz.co.nz/one-news/world/fighting-resumes-in-ethiopias-tigray-region: 404 Client Error: Not Found for url: https://www.1news.co.nz/one-news/world/fighting-resumes-in-ethiopias-tigray-region/


Processing URLs:  39%|███▊      | 386/1000 [23:20<14:01,  1.37s/it]

Error extracting text from http://www.tradingeconomics.com/france/youth-unemployment-rate: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/france/youth-unemployment-rate


Processing URLs:  39%|███▉      | 388/1000 [23:22<11:23,  1.12s/it]

Error extracting text from https://www.yahoo.com/news/scotlands-sturgeon-says-independence-happen-172203027.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/scotlands-sturgeon-says-independence-happen-172203027.html


Processing URLs:  39%|███▉      | 389/1000 [23:23<10:18,  1.01s/it]

Error extracting text from https://www.amazon.com/Dark-War-Officers-Account-Evacuation/dp/1642934712: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Dark-War-Officers-Account-Evacuation/dp/1642934712
URL filtered: https://twitter.com/EU_TTIP_team


Processing URLs:  40%|███▉      | 395/1000 [23:27<07:04,  1.43it/s]

Error extracting text from https://www.reuters.com/article/us-somalia-attacks-protests/somalis-defy-police-to-protest-against-deadly-truck-bombings-idUSKBN1CN13E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-attacks-protests/somalis-defy-police-to-protest-against-deadly-truck-bombings-idUSKBN1CN13E


Processing URLs:  40%|███▉      | 398/1000 [23:29<05:03,  1.98it/s]

Error extracting text from http://www.reuters.com/video/2017/04/17/oil-prices-force-riyadh-to-cut-billions?videoId=371502778&amp;videoChannel=-13668: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2017/04/17/oil-prices-force-riyadh-to-cut-billions?videoId=371502778&amp;videoChannel=-13668
Error extracting text from https://www.reuters.com/article/us-italy-politics-draghi/draghi-forms-new-italian-government-names-politicians-technocrats-as-ministers-idUSKBN2AC2AG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics-draghi/draghi-forms-new-italian-government-names-politicians-technocrats-as-ministers-idUSKBN2AC2AG?il=0


Processing URLs:  41%|████      | 407/1000 [24:56<3:19:35, 20.19s/it]

Error extracting text from http://www.usnews.com/news/articles/2017-01-24/donald-trump-to-sign-executive-orders-limiting-immigration-refugees-report: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  41%|████      | 409/1000 [24:58<1:41:16, 10.28s/it]

Error extracting text from http://middle-east-online.com/english/?id=75427: 404 Client Error: Not Found for url: https://middle-east-online.com/english/?id=75427
Error extracting text from http://www.reuters.com/article/us-iran-usa-idUSKBN15H253: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-idUSKBN15H253


Processing URLs:  41%|████      | 410/1000 [24:58<1:11:08,  7.23s/it]

Error extracting text from https://www.france24.com/en/live-news/20210302-yemen-step-away-from-devastating-famine-un: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210302-yemen-step-away-from-devastating-famine-un


Processing URLs:  41%|████      | 412/1000 [25:00<38:16,  3.91s/it]  

Error extracting text from https://www.wsj.com/articles/venezuela-in-crisis-1509541108?mod=cx_picks&amp;cx_navSource=cx_picks&amp;cx_tag=contextual&amp;cx_artPos=2#cxrecs_s: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuela-in-crisis-1509541108?mod=cx_picks&amp;cx_navSource=cx_picks&amp;cx_tag=contextual&amp;cx_artPos=2#cxrecs_s


Processing URLs:  41%|████▏     | 414/1000 [25:02<25:08,  2.57s/it]

Error extracting text from https://www.reuters.com/world/hamas-radio-reports-israeli-air-strike-gaza-2021-06-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/hamas-radio-reports-israeli-air-strike-gaza-2021-06-15/


Processing URLs:  42%|████▏     | 417/1000 [25:06<17:07,  1.76s/it]

Error extracting text from https://scontent-mia1-1.xx.fbcdn.net/v/t1.0-9/13315716_10153837546873740_5034133144957347609_n.jpg?oh=917a417e5e53bc24e451c2e16430bfd7&amp;oe=57DA2744: HTTPSConnectionPool(host='scontent-mia1-1.xx.fbcdn.net', port=443): Max retries exceeded with url: /v/t1.0-9/13315716_10153837546873740_5034133144957347609_n.jpg?oh=917a417e5e53bc24e451c2e16430bfd7&amp;oe=57DA2744 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30352b530>: Failed to resolve 'scontent-mia1-1.xx.fbcdn.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 419/1000 [25:09<15:00,  1.55s/it]

Error extracting text from http://www.nzherald.co.nz/the-country/news/article.cfm?c_id=16&amp;objectid=11800192: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/the-country/news/article.cfm?c_id=16&amp;objectid=11800192


Processing URLs:  42%|████▏     | 421/1000 [25:14<18:13,  1.89s/it]

Error extracting text from https://tribecacitizen.com/2018/02/26/self-driving-cars-are-coming-to-lower-manhattan/: 403 Client Error: Forbidden for url: https://tribecacitizen.com/2018/02/26/self-driving-cars-are-coming-to-lower-manhattan/


Processing URLs:  43%|████▎     | 426/1000 [25:19<10:03,  1.05s/it]

Error extracting text from http://www.automobilwoche.de/article/20160704/AGENTURMELDUNGEN/307049943/elektroauto-kaufpramie-der-gro%C3%9Fe-ansturm-bleibt-aus: 403 Client Error: Forbidden for url: https://www.automobilwoche.de/article/20160704/AGENTURMELDUNGEN/307049943/elektroauto-kaufpramie-der-gro%C3%9Fe-ansturm-bleibt-aus


Processing URLs:  43%|████▎     | 427/1000 [25:21<11:59,  1.26s/it]

Error extracting text from https://www.enca.com/africa/burundi-opposition-says-not-invited-to-peace-talks: 404 Client Error: Not Found for url: https://www.enca.com/africa/burundi-opposition-says-not-invited-to-peace-talks


Processing URLs:  43%|████▎     | 431/1000 [25:29<13:48,  1.46s/it]

Error extracting text from http://time.com/5051659/steve-bannon-roy-moore-rally-mitt-romney/: 404 Client Error: Not Found for url: https://time.com/5051659/steve-bannon-roy-moore-rally-mitt-romney/


Processing URLs:  43%|████▎     | 432/1000 [25:30<11:00,  1.16s/it]

Error extracting text from http://www.thestandard.com.hk/breaking_news_detail.asp?id=69315&amp;icid=1&amp;d_str=: 403 Client Error: Forbidden for url: https://www.thestandard.com.hk/breaking_news_detail.asp?id=69315&amp;icid=1&amp;d_str=


Processing URLs:  43%|████▎     | 433/1000 [26:30<2:57:26, 18.78s/it]

Error extracting text from https://www.usnews.com/news/best-states/arizona/articles/2017-10-12/arizona-election-chief-says-state-not-targeted-by-hackers: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  44%|████▍     | 444/1000 [26:51<19:30,  2.10s/it]  

URL filtered: https://www.youtube.com/watch?v=cvE7w1tFj7M


Processing URLs:  45%|████▍     | 446/1000 [26:52<11:57,  1.30s/it]

Error extracting text from https://www.intelligence.senate.gov/hearings: 403 Client Error: Forbidden for url: https://www.intelligence.senate.gov/hearings


Processing URLs:  45%|████▌     | 450/1000 [26:57<11:19,  1.24s/it]

Error extracting text from https://www.miit.gov.cn/: 403 Client Error: Forbidden for url: https://www.miit.gov.cn/


Processing URLs:  45%|████▌     | 454/1000 [27:02<10:06,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN15S2KH?ex_cid=SigDig: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN15S2KH?ex_cid=SigDig


Processing URLs:  46%|████▌     | 456/1000 [27:05<09:40,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/turkish-police-detain-boydak-executives-in-gulen-investigation-1457095909: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkish-police-detain-boydak-executives-in-gulen-investigation-1457095909


Processing URLs:  46%|████▌     | 461/1000 [27:20<22:38,  2.52s/it]

Error extracting text from http://www.isightpartners.com/2014/10/cve-2014-4114/: 403 Client Error: Forbidden for url: https://www.isightpartners.com/2014/10/cve-2014-4114/


Processing URLs:  46%|████▋     | 463/1000 [27:22<13:26,  1.50s/it]

Error extracting text from https://www.msn.com/en-us/news/world/russia-plans-security-talks-with-us-before-nato-meeting/: 404 Client Error: Not Found for url: https://www.msn.com/en-us/news/world/russia-plans-security-talks-with-us-before-nato-meeting/


Processing URLs:  47%|████▋     | 466/1000 [27:25<10:33,  1.19s/it]

Error extracting text from https://www.c-span.org/video/?c4632358/donald-trump-un-headquarters-renovation: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?c4632358/donald-trump-un-headquarters-renovation


Processing URLs:  47%|████▋     | 468/1000 [27:27<09:36,  1.08s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/multimidia/video/594-dos-brasileiros-sao-favoraveis-ao-impeachment-de-dilma&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/multimidia/video/594-dos-brasileiros-sao-favoraveis-ao-impeachment-de-dilma&amp;prev=search


Processing URLs:  47%|████▋     | 471/1000 [27:30<09:10,  1.04s/it]

Error extracting text from http://www.edmond-de-rothschild.com/site/International/en/news/asset-management/8925-financing-infrastructures-to-build-the-future: 520 Server Error: Unknown Code for url: https://www.edmond-de-rothschild.com/site/International/en/news/asset-management/8925-financing-infrastructures-to-build-the-future


Processing URLs:  47%|████▋     | 474/1000 [27:33<08:27,  1.04it/s]

Error extracting text from http://www.wsj.com/articles/brazils-interim-president-michel-temer-linked-to-corruption-probe-in-plea-bargain-1466022380: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-interim-president-michel-temer-linked-to-corruption-probe-in-plea-bargain-1466022380


Processing URLs:  48%|████▊     | 477/1000 [27:40<16:33,  1.90s/it]

Error extracting text from https://mashable.com/2012/11/07/nate-silver-wins/: 404 Client Error: Not Found for url: https://mashable.com/2012/11/07/nate-silver-wins/


Processing URLs:  48%|████▊     | 480/1000 [27:46<16:01,  1.85s/it]

Error extracting text from http://oceansidepost.com/2015/12/29/tear-gas-fired-at-montenegro-anti-government-protest.html: 404 Client Error: Not Found for url: https://oceansidepost.com/2015/12/29/tear-gas-fired-at-montenegro-anti-government-protest.html


Processing URLs:  48%|████▊     | 482/1000 [27:48<12:26,  1.44s/it]

Error extracting text from https://finance.yahoo.com/news/test-flights-for-superfast-jets-will-be-in-the-air-in-just-a-matter-of-months-boom-supersonic-ceo-110933170.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/test-flights-for-superfast-jets-will-be-in-the-air-in-just-a-matter-of-months-boom-supersonic-ceo-110933170.html


Processing URLs:  48%|████▊     | 484/1000 [27:50<10:39,  1.24s/it]

Error extracting text from http://bostinno.streetwise.co/2015/10/26/apple-aapl-q4-earnings-iphone-6s-sales-stock-price-movements/: HTTPConnectionPool(host='bostinno.streetwise.co', port=80): Max retries exceeded with url: /2015/10/26/apple-aapl-q4-earnings-iphone-6s-sales-stock-price-movements/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebeed0>: Failed to resolve 'bostinno.streetwise.co' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 485/1000 [27:51<08:16,  1.04it/s]

Error extracting text from http://www.wsj.com/articles/the-xi-jinping-ascendancy-1467748121: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-xi-jinping-ascendancy-1467748121


Processing URLs:  49%|████▉     | 489/1000 [27:56<08:27,  1.01it/s]

Error extracting text from https://www.reuters.com/article/us-tesla-truck-research/teslas-unfettered-ambition-will-drain-finances-analysts-idUSKBN1DH1M4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-truck-research/teslas-unfettered-ambition-will-drain-finances-analysts-idUSKBN1DH1M4


Processing URLs:  49%|████▉     | 491/1000 [27:59<10:19,  1.22s/it]

Error extracting text from https://missilethreat.csis.org/country/dprk/): 403 Client Error: Forbidden for url: https://missilethreat.csis.org/country/dprk/)


Processing URLs:  49%|████▉     | 493/1000 [28:01<09:41,  1.15s/it]

Error extracting text from https://www.nytimes.com/2020/11/15/business/china-trade-rcep.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/11/15/business/china-trade-rcep.html


Processing URLs:  50%|████▉     | 498/1000 [28:10<14:54,  1.78s/it]

Error extracting text from http://www.reuters.com/article/us-iran-missiles-idUSKCN0WB0I9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-idUSKCN0WB0I9


Processing URLs:  50%|█████     | 503/1000 [28:16<10:33,  1.27s/it]

URL filtered: https://twitter.com/navalny/status/952827886897123329


Processing URLs:  50%|█████     | 505/1000 [28:17<07:25,  1.11it/s]

URL filtered: https://www.youtube.com/watch?v=J7SAvi3fnL8


Processing URLs:  51%|█████     | 507/1000 [28:19<07:43,  1.06it/s]

Error extracting text from https://af.reuters.com/article/commoditiesNews/idAFL8N1MP0DI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  51%|█████     | 508/1000 [28:19<06:58,  1.17it/s]

Error extracting text from https://www3.nhk.or.jp/nhkworld/en/news/20210416_20/: 404 Client Error: Not Found for url: https://www3.nhk.or.jp/nhkworld/en/news/20210416_20/


Processing URLs:  51%|█████     | 510/1000 [28:23<10:09,  1.24s/it]

Error extracting text from http://www.wsj.com/articles/gop-lawmakers-advance-bipartisan-effort-to-reauthorize-ex-im-bank-1444406397?mod=djemalertNEWS: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gop-lawmakers-advance-bipartisan-effort-to-reauthorize-ex-im-bank-1444406397?mod=djemalertNEWS


Processing URLs:  51%|█████     | 512/1000 [28:26<09:45,  1.20s/it]

Error extracting text from http://www.businessinsider.com/r-un-security-council-to-vote-on-new-north-korea-sanctions-tuesday-us-2016-2: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-un-security-council-to-vote-on-new-north-korea-sanctions-tuesday-us-2016-2


Processing URLs:  52%|█████▏    | 515/1000 [28:30<10:08,  1.26s/it]

Error extracting text from http://africanbusinessmagazine.com/sectors/development/malawis-drone-corridor-challenges-scepticism-towards-uavs/: 403 Client Error: Forbidden for url: https://african.business/sectors/development/malawis-drone-corridor-challenges-scepticism-towards-uavs/


Processing URLs:  52%|█████▏    | 519/1000 [28:37<11:02,  1.38s/it]

Error extracting text from http://globalnation.inquirer.net/146098/duterte-foreign-policy-to-undermine-us-influence-in-south-china-sea-study: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/146098/duterte-foreign-policy-to-undermine-us-influence-in-south-china-sea-study


Processing URLs:  52%|█████▏    | 521/1000 [28:37<06:38,  1.20it/s]

Error extracting text from http://www.tradingeconomics.com: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/


Processing URLs:  52%|█████▏    | 522/1000 [28:38<06:08,  1.30it/s]

Error extracting text from http://postimg.org/image/p3j9mdadr/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/p3j9mdadr/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761fa10>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.youtube.com/watch?v=5UXnulANF8g


Processing URLs:  53%|█████▎    | 527/1000 [28:40<03:44,  2.11it/s]

Error extracting text from https://www.reuters.com/business/aerospace-defense/south-korea-sees-imminent-prospect-north-icbm-test-newspaper-2022-03-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/aerospace-defense/south-korea-sees-imminent-prospect-north-icbm-test-newspaper-2022-03-14/
Error extracting text from http://www.nytimes.com/2015/09/03/world/middleeast/michigan-imam-visits-amir-hekmati-longest-held-american-in-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/03/world/middleeast/michigan-imam-visits-amir-hekmati-longest-held-american-in-iran.html
Error extracting text from https://2sjjwunnql41ia7ki31qqub1-wpengine.netdna-ssl.com/wp-content/uploads/2021/03/SavantaComRes_TheScotsman_March2021Tracker_110321_Voting.pdf: HTTPSConnectionPool(host='2sjjwunnql41ia7ki31qqub1-wpengine.netdna-ssl.com', port=443): Max retries exceeded with url: /wp-content/uploads/2021/03/SavantaComRes_TheScotsman_March2021Tracker_110321_Voting.

Processing URLs:  53%|█████▎    | 533/1000 [28:51<13:39,  1.75s/it]

Error extracting text from http://entnemdept.ufl.edu/creatures/aquatic/aedes_aegypti.htm: HTTPSConnectionPool(host='entnemdept.ufl.edu', port=443): Max retries exceeded with url: /creatures/aquatic/aedes_aegypti.htm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  53%|█████▎    | 534/1000 [28:53<13:18,  1.71s/it]

Error extracting text from http://www.infowars.com/bizarre-message-plays-on-hacked-radio-station-trump-will-go-26th/: 404 Client Error: Not Found for url: https://www.infowars.com/bizarre-message-plays-on-hacked-radio-station-trump-will-go-26th/


Processing URLs:  54%|█████▎    | 535/1000 [28:54<11:00,  1.42s/it]

Error extracting text from https://bit.ly/3njDnHe: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-2021-hopes-for-snp-majority-continue-to-fade-as-more-support-slips-away-shows-poll-3209589


Processing URLs:  54%|█████▍    | 540/1000 [29:00<08:41,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-envoy-exclusive-idUSKCN0WY4L2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-envoy-exclusive-idUSKCN0WY4L2
URL filtered: https://www.forbes.com/sites/rachelsandler/2021/01/29/facebooks-oversight-board-is-taking-public-comments-on-trumps-ban/?sh=50c411d5ddc0
Error extracting text from http://www.nato.int/cps/en/natolive/topics_50115.htm#: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_50115.htm


Processing URLs:  55%|█████▍    | 546/1000 [29:08<11:39,  1.54s/it]

Error extracting text from https://uk.reuters.com/article/uk-somalia-military-exclusive/exclusive-u-s-suspends-aid-to-somalias-battered-military-over-graft-idUKKBN1E81XW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  55%|█████▌    | 551/1000 [29:18<10:50,  1.45s/it]

Error extracting text from http://wire.seenews.com/news/qatari-diar-plans-to-open-luxury-resort-in-montenegro-in-2018-fin-min-478157: HTTPConnectionPool(host='wire.seenews.com', port=80): Max retries exceeded with url: /news/qatari-diar-plans-to-open-luxury-resort-in-montenegro-in-2018-fin-min-478157 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x304ebce00>: Failed to establish a new connection: [Errno 61] Connection refused'))
URL filtered: https://twitter.com/Waymo/status/955563835422687232


Processing URLs:  55%|█████▌    | 553/1000 [29:20<08:41,  1.17s/it]

Error extracting text from http://english.farsnews.com/newstext.aspx?nn=13930702000879: HTTPConnectionPool(host='english.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13930702000879 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761d2e0>: Failed to resolve 'english.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  56%|█████▌    | 555/1000 [29:22<07:22,  1.01it/s]

Error extracting text from https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-summary-and-minutes/mpcvoting.xlsx?la=en&amp;hash=BD20B2AB676F5EB8FFBE568C8568C5A34FF4A742: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/-/media/boe/files/monetary-policy-summary-and-minutes/mpcvoting.xlsx?la=en&amp;hash=BD20B2AB676F5EB8FFBE568C8568C5A34FF4A742


Processing URLs:  56%|█████▋    | 563/1000 [29:29<05:25,  1.34it/s]

Error extracting text from http://news.yahoo.com/pentagon-russia-us-agree-minimize-risks-syrian-skies-175557742--politics.html: 404 Client Error: Not Found for url: http://news.yahoo.com/pentagon-russia-us-agree-minimize-risks-syrian-skies-175557742--politics.html
Error extracting text from http://www.nytimes.com/reuters/2015/10/14/business/14reuters-usa-economy.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2015/10/14/business/14reuters-usa-economy.html


Processing URLs:  57%|█████▋    | 568/1000 [29:37<07:47,  1.08s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/anc-orders-zuma-to-step-down-as-south-african-president-idUSKBN1FW028?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/anc-orders-zuma-to-step-down-as-south-african-president-idUSKBN1FW028?il=0


Processing URLs:  57%|█████▋    | 574/1000 [29:46<08:42,  1.23s/it]

Error extracting text from http://www.nytimes.com/2015/12/19/world/middleeast/syria-talks-isis.html?emc=edit_th_20151219&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/19/world/middleeast/syria-talks-isis.html?emc=edit_th_20151219&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  57%|█████▊    | 575/1000 [29:47<08:15,  1.17s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/putin-russian-state-involved-hacking-47763799: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/putin-russian-state-involved-hacking-47763799


Processing URLs:  58%|█████▊    | 576/1000 [29:48<09:14,  1.31s/it]

URL filtered: https://www.youtube.com/watch?v=whHySar4EoY


Processing URLs:  58%|█████▊    | 581/1000 [30:02<20:01,  2.87s/it]

Error extracting text from http://www.redistrictingmajorityproject.com/?p=646: 406 Client Error: Not Acceptable for url: http://www.redistrictingmajorityproject.com/?p=646


Processing URLs:  59%|█████▉    | 588/1000 [30:13<11:48,  1.72s/it]

Error extracting text from http://www.newsweek.com/ethnic-violence-dr-congo-kills-least-21-un-424649?piano_t=1: 403 Client Error: Forbidden for url: https://www.newsweek.com/ethnic-violence-dr-congo-kills-least-21-un-424649?piano_t=1


Processing URLs:  59%|█████▉    | 589/1000 [30:13<08:53,  1.30s/it]

Error extracting text from http://www.wsj.com/articles/austrias-hofer-says-eu-exit-referendum-only-last-resort-1468066407: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/austrias-hofer-says-eu-exit-referendum-only-last-resort-1468066407


Processing URLs:  59%|█████▉    | 592/1000 [30:17<08:06,  1.19s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2015/11/22/720998/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2015/11/22/720998/story.html
Error extracting text from http://www.reuters.com/article/us-usa-economy-idUSKCN0Z01I7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-economy-idUSKCN0Z01I7


Processing URLs:  60%|█████▉    | 595/1000 [30:18<03:36,  1.87it/s]

Error extracting text from http://www.financialexpress.com/world-news/nafta-has-been-a-catastrophe-for-the-us-donald-trump/535600/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/world-news/nafta-has-been-a-catastrophe-for-the-us-donald-trump/535600/
Error extracting text from https://www.reuters.com/business/energy/germanys-nord-stream-2-gatekeeper-long-road-until-gas-flows-2021-10-15/: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  60%|██████    | 601/1000 [30:22<03:30,  1.89it/s]

URL filtered: https://twitter.com/navalny/status/957581631921033216
URL filtered: https://fivethirtyeight.com/features/six-charts-to-help-americans-understand-the-upcoming-german-election/?ex_cid=story-twitter
Error extracting text from https://www.researchgate.net/publication/227378486_India%27s_International_Reserves_How_Large_and_How_Diversified: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/227378486_India%27s_International_Reserves_How_Large_and_How_Diversified


Processing URLs:  60%|██████    | 602/1000 [30:22<03:05,  2.14it/s]

Error extracting text from https://www.nytimes.com/2017/02/13/opinion/a-rare-republican-call-to-climate-action.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/13/opinion/a-rare-republican-call-to-climate-action.html


Processing URLs:  60%|██████    | 605/1000 [30:28<08:24,  1.28s/it]

Error extracting text from http://atimes.com/2016/06/a-pax-sinica-in-the-middle-east-redux/: 404 Client Error: Not Found for url: https://atimes.com/2016/06/a-pax-sinica-in-the-middle-east-redux/


Processing URLs:  61%|██████    | 606/1000 [30:28<06:53,  1.05s/it]

Error extracting text from http://www.thesun.co.uk/sol/homepage/news/politics/6799367/IMF-boss-tells-David-Cameron-to-speed-up-EU-referendum.html: 404 Client Error: Not Found for url: https://www.thesun.co.uk/sol/homepage/news/politics/6799367/IMF-boss-tells-David-Cameron-to-speed-up-EU-referendum.html


Processing URLs:  61%|██████    | 608/1000 [30:31<07:31,  1.15s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732130/pdf/nihms71775.pdf: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732130/pdf/nihms71775.pdf


Processing URLs:  61%|██████▏   | 613/1000 [30:36<08:13,  1.28s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20151117/1305178647.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20151117/1305178647.html


Processing URLs:  61%|██████▏   | 614/1000 [30:38<09:29,  1.48s/it]

Error extracting text from http://esa.un.org/unpd/wpp/Publications/Files/Key_Findings_WPP_2015.pdf: HTTPSConnectionPool(host='esa.un.org', port=443): Max retries exceeded with url: /unpd/wpp/Publications/Files/Key_Findings_WPP_2015.pdf (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  62%|██████▏   | 617/1000 [30:41<06:32,  1.03s/it]

Error extracting text from http://www.worldbulletin.net/news/167878/montenegro-nato-dispute-prompts-vote-of-confidence: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/news/167878/montenegro-nato-dispute-prompts-vote-of-confidence


Processing URLs:  62%|██████▏   | 619/1000 [30:43<05:47,  1.10it/s]

Error extracting text from https://www.afghanistan-analysts.org/mehwar-e-mardom-e-afghanistan-new-opposition-group-with-an-ambiguous-link-to-karzai/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/mehwar-e-mardom-e-afghanistan-new-opposition-group-with-an-ambiguous-link-to-karzai/
Error extracting text from http://www.nato.int/cps/en/natohq/news_123859.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_123859.htm


Processing URLs:  62%|██████▏   | 620/1000 [31:00<36:05,  5.70s/it]

Error extracting text from https://www.investopedia.com/terms/m/meanreversion.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/m/meanreversion.asp


Processing URLs:  62%|██████▏   | 622/1000 [31:02<21:04,  3.35s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/23/early-christmas-gift-accidental-clustering/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/23/early-christmas-gift-accidental-clustering/


Processing URLs:  62%|██████▏   | 624/1000 [31:04<13:06,  2.09s/it]

Error extracting text from http://ndb.int/New%20Development-Bank-end-2-5-billion-next-year-KV-Kamath.php#parentHorizontalTab2: HTTPConnectionPool(host='ndb.int', port=80): Max retries exceeded with url: /New%20Development-Bank-end-2-5-billion-next-year-KV-Kamath.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3019912e0>: Failed to resolve 'ndb.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  63%|██████▎   | 627/1000 [31:09<11:38,  1.87s/it]

Error extracting text from http://www.channelnewsasia.com/news/singapore/chinese-president-xi/2189520.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/singapore/chinese-president-xi/2189520.html


Processing URLs:  63%|██████▎   | 629/1000 [31:13<11:42,  1.89s/it]

Error extracting text from https://reut.rs/3eknxbX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/saudi-arabia-uae-reach-compromise-oil-output-deal-opec-source-2021-07-14/


Processing URLs:  63%|██████▎   | 632/1000 [31:28<22:08,  3.61s/it]

Error extracting text from http://www.brainpreservation.org/tech-prize/: 406 Client Error: Not Acceptable for url: http://www.brainpreservation.org/tech-prize/


Processing URLs:  63%|██████▎   | 634/1000 [31:37<26:54,  4.41s/it]

Error extracting text from http://news.trust.org/item/20160414154410-i3usn/?source=hpMostPopularTheWire: 404 Client Error:  for url: https://news.trust.org:443/item/20160414154410-i3usn/?source=hpMostPopularTheWire


Processing URLs:  64%|██████▍   | 641/1000 [31:49<08:11,  1.37s/it]

Error extracting text from http://www.agweb.com/article/opening-of-panama-canal-expansion-delayed-naa-alison-rice/: 403 Client Error: Forbidden for url: http://www.agweb.com/article/opening-of-panama-canal-expansion-delayed-naa-alison-rice/
Error extracting text from http://www.nytimes.com/2015/09/22/world/middleeast/france-opens-trade-office-in-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/22/world/middleeast/france-opens-trade-office-in-iran.html


Processing URLs:  64%|██████▍   | 642/1000 [31:50<07:42,  1.29s/it]

Error extracting text from https://www.reuters.com/article/us-petrobras-bolsonaro-ceo-idUSKBN2AJ2KA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-petrobras-bolsonaro-ceo-idUSKBN2AJ2KA


Processing URLs:  64%|██████▍   | 643/1000 [31:51<05:48,  1.02it/s]

Error extracting text from http://www.infotep.gov.do/pdf_prog_form/organigrama_infotep.pdf: 403 Client Error: Forbidden for url: http://www.infotep.gov.do/pdf_prog_form/organigrama_infotep.pdf


Processing URLs:  64%|██████▍   | 645/1000 [31:51<03:50,  1.54it/s]

Error extracting text from http://www.reuters.com/article/us-colombia-peace-idUSKCN12204M?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-colombia-peace-idUSKCN12204M?il=0


Processing URLs:  65%|██████▍   | 646/1000 [31:52<04:01,  1.46it/s]

Error extracting text from http://www.amazon.com/Red-Notice-Finance-Murder-Justice/dp/147675571X: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Red-Notice-Finance-Murder-Justice/dp/147675571X


Processing URLs:  65%|██████▍   | 648/1000 [31:54<04:36,  1.27it/s]

URL filtered: https://en.wikipedia.org/wiki/Michael_Bloomberg


Processing URLs:  65%|██████▌   | 654/1000 [32:00<06:18,  1.09s/it]

Error extracting text from https://www.theregreview.org/2022/01/13/khodor-support-waiver-covid-19-vaccine-patents/: 403 Client Error: Forbidden for url: https://www.theregreview.org/2022/01/13/khodor-support-waiver-covid-19-vaccine-patents/
Error extracting text from http://www.france24.com/en/20160114-sarkozy-urges-arab-powers-send-troops-syria-calls-conflict-world-war-three: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160114-sarkozy-urges-arab-powers-send-troops-syria-calls-conflict-world-war-three


Processing URLs:  66%|██████▌   | 660/1000 [32:04<04:44,  1.19it/s]

Error extracting text from http://www.foxnews.com/world/2015/09/25/north-korean-long-range-rocket-launch-unlikely-at-anniversary-institute-says/: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/09/25/north-korean-long-range-rocket-launch-unlikely-at-anniversary-institute-says/


Processing URLs:  66%|██████▌   | 661/1000 [32:07<07:58,  1.41s/it]

Error extracting text from http://38north.org/2015/10/sohae100915/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  67%|██████▋   | 666/1000 [32:13<05:28,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-britain-election-polls-idUSKBN1820RQ: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  67%|██████▋   | 668/1000 [32:14<04:13,  1.31it/s]

Error extracting text from https://www.instituteforgovernment.org.uk/brexit-explained/eu-divorce-bill: 404 Client Error: Not Found for url: https://www.instituteforgovernment.org.uk/brexit-explained/eu-divorce-bill


Processing URLs:  67%|██████▋   | 669/1000 [32:17<06:51,  1.24s/it]

Error extracting text from http://38north.org/2017/05/missile050217/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  67%|██████▋   | 673/1000 [32:23<06:24,  1.17s/it]

Error extracting text from http://www.wsj.com/articles/saudi-oil-ministers-exit-tightens-princes-grip-1462709963: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-oil-ministers-exit-tightens-princes-grip-1462709963


Processing URLs:  68%|██████▊   | 678/1000 [32:27<04:07,  1.30it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-ttip-idUSKCN0ZA2V0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-ttip-idUSKCN0ZA2V0


Processing URLs:  68%|██████▊   | 683/1000 [32:33<06:41,  1.27s/it]

Error extracting text from http://www.fvs-ri.com/files/m_a.pdf: 404 Client Error: Not Found for url: https://www.flossbachvonstorch-researchinstitute.com/de/files/m_a.pdf


Processing URLs:  68%|██████▊   | 684/1000 [32:37<09:47,  1.86s/it]

Error extracting text from https://ccdcoe.org/research.html: 404 Client Error: Not Found for url: https://ccdcoe.org/research.html


Processing URLs:  68%|██████▊   | 685/1000 [32:37<07:10,  1.37s/it]

Error extracting text from http://apps.azsos.gov/election/2014/General/Canvass2014GE.pdf: 403 Client Error: Forbidden for url: https://apps.azsos.gov/election/2014/General/Canvass2014GE.pdf
URL filtered: http://www.bloomberg.com/quote/GGGB10YR:IND


Processing URLs:  69%|██████▊   | 687/1000 [32:38<05:20,  1.02s/it]

Error extracting text from https://www.mid.ru/en/foreign_policy/news/-/asset_publisher/cKNonkJE02Bw/content/id/4816379: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  69%|██████▉   | 690/1000 [32:40<03:40,  1.41it/s]

Error extracting text from https://www.dhs.gov/national-terrorism-advisory-system: 403 Client Error: Forbidden for url: https://www.dhs.gov/national-terrorism-advisory-system
Error extracting text from http://www.reuters.com/article/us-iran-europe-eu-idUSKCN0WB1HV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-europe-eu-idUSKCN0WB1HV


Processing URLs:  69%|██████▉   | 691/1000 [32:40<02:46,  1.85it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-raids-idUSKBN0UM0PX20160108: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-raids-idUSKBN0UM0PX20160108


Processing URLs:  70%|██████▉   | 698/1000 [32:54<08:21,  1.66s/it]

Error extracting text from https://www.nytimes.com/2020/11/17/world/middleeast/iran-biden-trump-nuclear-sanctions.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/11/17/world/middleeast/iran-biden-trump-nuclear-sanctions.html


Processing URLs:  70%|███████   | 701/1000 [32:57<05:19,  1.07s/it]

Error extracting text from http://thehill.com/regulation/court-battles/289643-clintons-court-shortlist-emerges: 403 Client Error: Forbidden for url: https://thehill.com/regulation/court-battles/289643-clintons-court-shortlist-emerges/
Error extracting text from https://www.wsj.com/articles/north-korea-is-accelerating-plan-to-land-missiles-in-u-s-1494792608: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-is-accelerating-plan-to-land-missiles-in-u-s-1494792608
Error extracting text from https://www.reuters.com/lifestyle/sports/nadal-skips-barcelona-open-return-date-still-uncertain-2022-04-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/nadal-skips-barcelona-open-return-date-still-uncertain-2022-04-12/
URL filtered: https://www.linkedin.com/in/larswe/recent-activity/


Processing URLs:  70%|███████   | 705/1000 [32:59<03:43,  1.32it/s]

Error extracting text from https://www.nytimes.com/2021/06/25/opinion/coronavirus-lab.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/25/opinion/coronavirus-lab.html


Processing URLs:  71%|███████   | 708/1000 [33:02<03:54,  1.24it/s]

Error extracting text from http://i3.kym-cdn.com/entries/icons/original/000/007/423/untitle.JPG: 404 Client Error: Not Found for url: http://i3.kym-cdn.com/entries/icons/original/000/007/423/untitle.JPG


Processing URLs:  71%|███████▏  | 713/1000 [33:11<05:42,  1.19s/it]

Error extracting text from http://uk.reuters.com/article/uk-haiti-election-idUKKCN0XC0D6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from https://werideai.medium.com/weride-ceo-tony-han-committed-to-long-term-strategy-in-autonomous-driving-7037eb838cd3: 403 Client Error: Forbidden for url: https://werideai.medium.com/weride-ceo-tony-han-committed-to-long-term-strategy-in-autonomous-driving-7037eb838cd3


Processing URLs:  72%|███████▏  | 715/1000 [33:12<03:58,  1.19it/s]

Error extracting text from http://www.reuters.com/article/us-lemieux-scotus-commentary/commentary-why-supreme-court-must-tell-anti-gay-baker-his-cakes-arent-art-idUSKBN1D60B1?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-11-07&amp;utm_term=US%20Reuters%20News%20Now: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-lemieux-scotus-commentary/commentary-why-supreme-court-must-tell-anti-gay-baker-his-cakes-arent-art-idUSKBN1D60B1?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-11-07&amp;utm_term=US%20Reuters%20News%20Now


Processing URLs:  72%|███████▏  | 716/1000 [33:16<08:24,  1.78s/it]

Error extracting text from https://www.conservativehome.com/thetorydiary/2021/08/sunak-leads-our-first-next-tory-leader-survey-in-two-years.html: 403 Client Error: Forbidden for url: https://conservativehome.com/thetorydiary/2021/08/sunak-leads-our-first-next-tory-leader-survey-in-two-years.html


Processing URLs:  72%|███████▏  | 718/1000 [33:26<17:29,  3.72s/it]

Error extracting text from http://usacac.army.mil/sites/default/files/documents/ufmcs/The_Applied_Critical_Thinking_Handbook_v7.0.pdf: 404 Client Error: Not Found for url: https://usacac.army.mil/sites/default/files/documents/ufmcs/The_Applied_Critical_Thinking_Handbook_v7.0.pdf


Processing URLs:  72%|███████▏  | 722/1000 [33:31<08:31,  1.84s/it]

Error extracting text from http://www.oxforddictionaries.com/us/definition/american_english/sanction: 403 Client Error: Forbidden for url: https://languages.oup.com/


Processing URLs:  72%|███████▎  | 725/1000 [33:37<07:48,  1.70s/it]



Processing URLs:  73%|███████▎  | 728/1000 [33:40<04:53,  1.08s/it]

Error extracting text from https://info.kpmg.us/ma-survey-2017.html: 404 Client Error: Not Found for url: https://info.kpmg.us/ma-survey-2017.html
Error extracting text from http://www.reuters.com/article/us-exxon-mobil-deals-permian-idUSKBN15120F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-exxon-mobil-deals-permian-idUSKBN15120F


Processing URLs:  73%|███████▎  | 734/1000 [34:19<16:53,  3.81s/it]

Error extracting text from https://www.france24.com/en/live-news/20210610-fourteen-palestinians-arrested-after-israeli-mp-s-rally-police: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210610-fourteen-palestinians-arrested-after-israeli-mp-s-rally-police


Processing URLs:  74%|███████▍  | 738/1000 [34:25<09:25,  2.16s/it]

Error extracting text from https://www.sacurrent.com/the-daily/archives/2018/10/04/we-are-not-sin-city-houston-city-council-bans-proposed-sex-robot-brothel: 404 Client Error: Not Found for url: https://www.sacurrent.com/the-daily/archives/2018/10/04/we-are-not-sin-city-houston-city-council-bans-proposed-sex-robot-brothel


Processing URLs:  74%|███████▍  | 741/1000 [34:30<06:41,  1.55s/it]

Error extracting text from http://www.wsj.com/articles/irs-says-cyberattacks-on-taxpayer-accounts-more-extensive-than-previously-reported-1456514909: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/irs-says-cyberattacks-on-taxpayer-accounts-more-extensive-than-previously-reported-1456514909


Processing URLs:  74%|███████▍  | 742/1000 [34:35<10:50,  2.52s/it]

Error extracting text from http://thephilippinestar.ph/articles/2016-03-19/news/chinese-exclusion-zone-looms-in-spratlys/144536: HTTPConnectionPool(host='thephilippinestar.ph', port=80): Max retries exceeded with url: /articles/2016-03-19/news/chinese-exclusion-zone-looms-in-spratlys/144536 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3047239e0>: Failed to resolve 'thephilippinestar.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  74%|███████▍  | 745/1000 [34:38<06:13,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN14Q1QI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-idUSKBN14Q1QI
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN14H0WE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN14H0WE
URL filtered: http://mobile.reuters.com/article/idUSKBN1582XQ?feedType=RSS&amp;feedName=topNews&amp;utm_source=twitter&amp;utm_medium=Social


Processing URLs:  75%|███████▍  | 749/1000 [34:48<09:43,  2.33s/it]

Error extracting text from http://www.newsweek.com/2016/03/04/iraqi-forces-fighting-isis-ramadi-fallujah-mosul-430042.html: 403 Client Error: Forbidden for url: https://www.newsweek.com/2016/03/04/iraqi-forces-fighting-isis-ramadi-fallujah-mosul-430042.html


Processing URLs:  75%|███████▌  | 750/1000 [34:49<07:38,  1.83s/it]

Error extracting text from http://www.sandiegouniontribune.com/news/politics/sd-me-issa-maher-20170224-story.html: 403 Client Error: Forbidden for url: https://www.sandiegouniontribune.com/news/politics/sd-me-issa-maher-20170224-story.html


Processing URLs:  76%|███████▌  | 755/1000 [34:54<03:54,  1.04it/s]

Error extracting text from http://www.wsj.com/articles/iran-wants-humanitarian-solution-for-jailed-u-s-reporter-jason-rezaian-1445091276: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-wants-humanitarian-solution-for-jailed-u-s-reporter-jason-rezaian-1445091276
Error extracting text from http://www.reuters.com/article/us-iran-election-rouhani-idUSKBN17G0XU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-rouhani-idUSKBN17G0XU


Processing URLs:  76%|███████▌  | 759/1000 [35:01<05:40,  1.41s/it]

Error extracting text from http://www.nytimes.com/2016/02/10/us/politics/new-hampshire-primary.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/10/us/politics/new-hampshire-primary.html


Processing URLs:  76%|███████▌  | 761/1000 [36:03<1:16:12, 19.13s/it]

Error extracting text from https://archive.li/6svYX: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  76%|███████▌  | 762/1000 [36:05<55:30, 13.99s/it]

Error extracting text from http://www.ajc.com/news/lifestyles/carter-center-launches-near-real-time-conflict-map/nqhPZ/: 404 Client Error: Not Found for url: https://www.ajc.com/news/lifestyles/carter-center-launches-near-real-time-conflict-map/nqhPZ/


Processing URLs:  76%|███████▋  | 764/1000 [36:11<31:55,  8.12s/it]

Error extracting text from http://www.democraticaudit.com/?p=16668: 403 Client Error: Forbidden for url: http://www.democraticaudit.com/?p=16668


Processing URLs:  76%|███████▋  | 765/1000 [36:11<23:00,  5.87s/it]

Error extracting text from http://nationalinterest.org/blog/the-skeptics/the-battle-mosul-doomed-18740: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-skeptics/the-battle-mosul-doomed-18740


Processing URLs:  78%|███████▊  | 776/1000 [36:34<06:10,  1.65s/it]

Error extracting text from http://allafrica.com/stories/201707210223.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201707210223.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3028e0470>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  78%|███████▊  | 778/1000 [36:37<05:11,  1.40s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/10/02/0200000000AEN20151002003151315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  78%|███████▊  | 783/1000 [36:47<06:17,  1.74s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1AB0E7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1AB0E7


Processing URLs:  78%|███████▊  | 785/1000 [36:48<03:31,  1.02it/s]

Error extracting text from http://www.scientificamerican.com/article/computer-beats-go-champion-for-first-time/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/computer-beats-go-champion-for-first-time/


Processing URLs:  79%|███████▉  | 789/1000 [36:54<04:21,  1.24s/it]

Error extracting text from http://www.wsj.com/articles/brazils-supreme-court-clears-way-for-impeachment-of-president-1450394252: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-supreme-court-clears-way-for-impeachment-of-president-1450394252


Processing URLs:  79%|███████▉  | 790/1000 [36:55<03:47,  1.08s/it]

Error extracting text from http://warontherocks.com/2016/06/a-guide-to-stepping-it-up-in-the-south-china-sea/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/06/a-guide-to-stepping-it-up-in-the-south-china-sea/


Processing URLs:  79%|███████▉  | 793/1000 [36:59<04:28,  1.30s/it]

Error extracting text from https://www.macrotrends.net/stocks/charts/AMZN/amazon/market-cap: 403 Client Error: Forbidden for url: https://www.macrotrends.net/stocks/charts/AMZN/amazon/market-cap


Processing URLs:  80%|███████▉  | 797/1000 [37:07<04:51,  1.44s/it]

Error extracting text from http://dealbreaker.com/2015/09/joe-biden-is-either-running-for-president-or-wasting-robert-wolfs-time/: 403 Client Error: Forbidden for url: http://dealbreaker.com/2015/09/joe-biden-is-either-running-for-president-or-wasting-robert-wolfs-time/


Processing URLs:  80%|███████▉  | 798/1000 [37:08<04:30,  1.34s/it]

Error extracting text from http://uvb-76.net/: 406 Client Error: Not Acceptable for url: http://uvb-76.net/


Processing URLs:  80%|████████  | 800/1000 [37:10<03:55,  1.18s/it]

Error extracting text from https://www.yahoo.com/finance/news/russia-gas-limits-pose-increasing-203016711.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/russia-gas-limits-pose-increasing-203016711.html


Processing URLs:  80%|████████  | 801/1000 [37:12<04:19,  1.30s/it]

URL filtered: https://www.youtube.com/results?search_query=franz+ferdinand+demagogue


Processing URLs:  80%|████████  | 804/1000 [37:13<02:45,  1.18it/s]

Error extracting text from http://www.wsj.com/articles/when-should-a-company-write-down-assets-1474064470: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/when-should-a-company-write-down-assets-1474064470


Processing URLs:  81%|████████  | 806/1000 [37:15<02:40,  1.21it/s]

Error extracting text from https://www.barrons.com/articles/chinas-services-sector-falls-to-lowest-level-since-pandemics-start-51630414072: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/chinas-services-sector-falls-to-lowest-level-since-pandemics-start-51630414072


Processing URLs:  81%|████████  | 807/1000 [37:18<04:09,  1.29s/it]

Error extracting text from http://blogs.wsj.com/briefly/2016/03/19/the-eu-turkey-migrants-deal-at-a-glance/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/briefly/2016/03/19/the-eu-turkey-migrants-deal-at-a-glance/


Processing URLs:  81%|████████  | 809/1000 [38:18<44:50, 14.09s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-09-19/trump-choice-for-russia-ambassador-no-question-russia-meddled: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  81%|████████▏ | 813/1000 [38:27<16:57,  5.44s/it]

Error extracting text from http://news.yahoo.com/french-us-host-defence-minister-talks-fight-against-023605233.html: 404 Client Error: Not Found for url: http://news.yahoo.com/french-us-host-defence-minister-talks-fight-against-023605233.html


Processing URLs:  81%|████████▏ | 814/1000 [38:27<12:18,  3.97s/it]

Error extracting text from http://www.reuters.com/article/2015/09/18/us-mideast-crisis-usa-russia-idUKKCN0RI1T520150918: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/18/us-mideast-crisis-usa-russia-idUKKCN0RI1T520150918


Processing URLs:  82%|████████▏ | 820/1000 [38:36<05:01,  1.68s/it]

Error extracting text from http://www.reuters.com/article/2015/10/06/imf-g20-japan-idUSL3N1251R920151006: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/06/imf-g20-japan-idUSL3N1251R920151006


Processing URLs:  82%|████████▏ | 822/1000 [38:43<06:31,  2.20s/it]

Error extracting text from http://www.investopedia.com/articles/insights/020917/scottish-independence-talk-regains-momentum.asp: 405 Client Error: Signal - Not Acceptable for url: http://www.investopedia.com/articles/insights/020917/scottish-independence-talk-regains-momentum.asp
Error extracting text from http://www.wsj.com/articles/china-japan-south-korea-to-hold-summit-this-year-1444821235: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-japan-south-korea-to-hold-summit-this-year-1444821235


Processing URLs:  82%|████████▏ | 824/1000 [38:43<03:24,  1.16s/it]

Error extracting text from https://www.nytimes.com/2017/07/26/world/europe/uk-diesel-petrol-emissions.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/26/world/europe/uk-diesel-petrol-emissions.html
Error extracting text from http://www.reuters.com/article/us-eu-politics-tusk-idUSKCN1241NY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-politics-tusk-idUSKCN1241NY


Processing URLs:  83%|████████▎ | 830/1000 [38:52<04:17,  1.51s/it]

Error extracting text from https://www.jsg.utexas.edu/lacp/2016/03/deepening-default-fears-cast-shadow-over-venezuelas-oil-flows/: 410 Client Error: Gone for url: https://www.jsg.utexas.edu/lacp/2016/03/deepening-default-fears-cast-shadow-over-venezuelas-oil-flows/


Processing URLs:  83%|████████▎ | 831/1000 [38:52<03:25,  1.22s/it]

Error extracting text from http://thehill.com/policy/finance/255461-gop-lawmaker-plots-ex-im-power-play: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/255461-gop-lawmaker-plots-ex-im-power-play/


Processing URLs:  83%|████████▎ | 832/1000 [38:54<03:50,  1.37s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-24/economy-in-u-s-expands-more-than-first-estimated-on-inventories


Processing URLs:  84%|████████▎ | 837/1000 [38:58<02:06,  1.29it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-30/venezuela-cut-deeper-into-junk-by-fitch-on-probable-default
Error extracting text from http://www.washingtontimes.com/news/2016/jan/19/joseph-detraini-north-korea-may-be-ready-to-talk/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/19/joseph-detraini-north-korea-may-be-ready-to-talk/


Processing URLs:  84%|████████▍ | 838/1000 [38:58<01:48,  1.49it/s]

Error extracting text from http://news.asiaone.com/news/singapore/keppel-unit-targeted-brazils-biggest-corruption-probe: 404 Client Error: Not Found for url: https://www.asiaone.com/news/news/singapore/keppel-unit-targeted-brazils-biggest-corruption-probe


Processing URLs:  84%|████████▍ | 842/1000 [39:01<02:18,  1.14it/s]

Error extracting text from https://drive.google.com/file/d/0BwkmbNcJlrgYYzBaNmtVMUVnT1U/view?pref=2&amp;pli=1: 404 Client Error: Not Found for url: https://drive.google.com/file/d/0BwkmbNcJlrgYYzBaNmtVMUVnT1U/view?pref=2&amp;pli=1
URL filtered: https://www.bloomberg.com/news/articles/2017-02-28/rutte-expects-41-seats-in-dutch-vote-giving-him-a-path-to-power
URL filtered: http://foreignpolicy.com/2016/02/08/nigeria-is-coming-apart-at-the-seamsbiafra/?utm_content=buffer21c83&amp;utm_medium=social&amp;utm_source=facebook.com&amp;utm_campaign=buffer


Processing URLs:  85%|████████▍ | 849/1000 [39:13<04:50,  1.92s/it]

Error extracting text from http://www.selectagents.gov/index.html: 404 Client Error: Not Found for url: https://www.selectagents.gov/index.html
URL filtered: http://www.bloomberg.com/news/articles/2015-11-04/yellen-and-dudley-signal-december-is-still-live-for-rate-hike


Processing URLs:  85%|████████▌ | 852/1000 [39:23<06:44,  2.73s/it]

Error extracting text from http://vestnikkavkaza.net/analysis/Turkey-EU-relations-are-sinking.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/analysis/Turkey-EU-relations-are-sinking.html


Processing URLs:  86%|████████▌ | 857/1000 [39:28<03:12,  1.35s/it]

Error extracting text from https://www.devex.com/news/us-congress-to-keep-pressure-on-ethiopia-after-unilateral-cease-fire-100272: 403 Client Error: Forbidden for url: https://www.devex.com/news/us-congress-to-keep-pressure-on-ethiopia-after-unilateral-cease-fire-100272


Processing URLs:  86%|████████▌ | 861/1000 [39:32<02:20,  1.01s/it]

Error extracting text from http://www.latimes.com/politics/la-na-trump-polls-20151221-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-trump-polls-20151221-story.html


Processing URLs:  86%|████████▌ | 862/1000 [39:34<03:03,  1.33s/it]

Error extracting text from http://pilotonline.com/news/military/local/norfolk-based-uss-san-antonio-came-under-missile-attack-with/article_75be0991-0713-5cd8-b3ee-e8fa08533133.html: 404 Client Error: Not Found for url: https://www.pilotonline.com/news/military/local/norfolk-based-uss-san-antonio-came-under-missile-attack-with/article_75be0991-0713-5cd8-b3ee-e8fa08533133.html


Processing URLs:  86%|████████▋ | 864/1000 [39:37<03:01,  1.33s/it]

URL filtered: http://www.welt.de/politik/deutschland/article152460616/200-000-Fluechtlinge-warten-auf-Ueberfahrt-nach-Europa.html?wtrid=socialmedia.socialflow....socialflow_twitter


Processing URLs:  87%|████████▋ | 866/1000 [39:37<01:59,  1.12it/s]

Error extracting text from http://investmentguruindia.com/StockMarket/rare-metals-investment-blow-up-shows-risks-lurking-in-chinas-financial-system: HTTPSConnectionPool(host='investmentguruindia.com', port=443): Max retries exceeded with url: /StockMarket/rare-metals-investment-blow-up-shows-risks-lurking-in-chinas-financial-system (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: https://www.bloomberg.com/news/articles/2017-11-07/pdvsa-credit-default-swaps-hit-record-as-time-runs-out-to-pay
URL filtered: https://www.engadget.com/2016/12/11/facebook-adds-a-fake-news-reporting-option/


Processing URLs:  87%|████████▋ | 872/1000 [39:40<01:08,  1.87it/s]

URL filtered: https://twitter.com/ioannZH/status/1395012653836349441
Error extracting text from https://www.predictit.org/home/browse?Search=dutch&amp;isSearch=true: 403 Client Error: Forbidden for url: https://www.predictit.org/home/browse?Search=dutch&amp;isSearch=true
URL filtered: http://www.bloomberg.com/graphics/2016-china-debt/


Processing URLs:  87%|████████▋ | 874/1000 [39:40<00:47,  2.68it/s]

Error extracting text from http://www.iol.co.za/news/politics/zuma-under-fire-as-dossier-leaked-2056520: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/zuma-under-fire-as-dossier-leaked-2056520


Processing URLs:  88%|████████▊ | 882/1000 [39:46<00:52,  2.26it/s]

URL filtered: https://www.bloomberg.com/news/articles/2020-12-15/supersonic-jet-startup-boom-technology-is-now-a-unicorn
Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15M2DU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15M2DU
Error extracting text from http://www.reuters.com/article/2015/10/28/us-usa-fed-idUSKCN0SM0BJ20151028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/us-usa-fed-idUSKCN0SM0BJ20151028


Processing URLs:  88%|████████▊ | 884/1000 [39:47<00:57,  2.01it/s]

Error extracting text from http://www.fxstreet.com/news/forex-news/article.aspx?storyid=561fc252-6a13-4d71-b2be-6be53845ca44: 410 Client Error: Gone for url: http://www.fxstreet.com/news/561fc252-6a13-4d71-b2be-6be53845ca44


Processing URLs:  89%|████████▊ | 886/1000 [39:49<01:20,  1.42it/s]

Error extracting text from http://www.guardian.com/politics2c: 404 Client Error: Not Found for url: http://www.guardian.com/politics2c


Processing URLs:  89%|████████▊ | 887/1000 [39:50<01:26,  1.30it/s]

Error extracting text from https://www.yahoo.com/finance/news/u-house-not-tackle-healthcare-ryan-says-reuters-133558840--business.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/u-house-not-tackle-healthcare-ryan-says-reuters-133558840--business.html
URL filtered: http://www.bloomberg.com/politics/trackers/2015-12-29/trump-says-he-will-spend-big-in-iowa-n-h-s-c-


Processing URLs:  89%|████████▉ | 893/1000 [39:58<02:07,  1.19s/it]

Error extracting text from http://internal.uk.mobile.reuters.com/article/idUKKCN0ZO22K: HTTPConnectionPool(host='internal.uk.mobile.reuters.com', port=80): Max retries exceeded with url: /article/idUKKCN0ZO22K (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffe69010>: Failed to resolve 'internal.uk.mobile.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|████████▉ | 895/1000 [40:00<01:51,  1.06s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-28/most-of-china-s-electric-car-startups-face-wipeout-by-new-rules


Processing URLs:  90%|████████▉ | 897/1000 [40:01<01:16,  1.35it/s]

URL filtered: http://www.reuters.com/article/us-mideast-crisis-syria-assad-iduskcn0yt18z?utm_campaign=trueAnthem:+Trending+Content&amp;utm_content=5756d48e04d30128af738fa8&amp;utm_medium=trueAnthem&amp;utm_source=twitter


Processing URLs:  90%|█████████ | 900/1000 [40:04<01:43,  1.04s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges/news/eu-trade-development-ministers-hold-first-joint-gathering-eyeing-increased: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges/news/eu-trade-development-ministers-hold-first-joint-gathering-eyeing-increased


Processing URLs:  90%|█████████ | 901/1000 [41:05<24:49, 15.05s/it]

Error extracting text from http://ewp.dali.dartmouth.edu/questions/28: HTTPConnectionPool(host='ewp.dali.dartmouth.edu', port=80): Max retries exceeded with url: /questions/28 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ffe6ad20>, 'Connection to ewp.dali.dartmouth.edu timed out. (connect timeout=60)'))


Processing URLs:  90%|█████████ | 903/1000 [41:06<14:01,  8.67s/it]

Error extracting text from http://talkingpointsmemo.com/election/2016/us-senate: 404 Client Error: Not Found for url: https://talkingpointsmemo.com/election/2016/us-senate


Processing URLs:  90%|█████████ | 904/1000 [42:07<36:37, 22.89s/it]

Error extracting text from http://www.ledger-enquirer.com/news/business/article131254869.html: HTTPConnectionPool(host='www.ledger-enquirer.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  90%|█████████ | 905/1000 [42:07<26:09, 16.52s/it]

Error extracting text from https://www.wsj.com/articles/saudi-king-leads-hundreds-of-princes-clerics-military-officials-on-asia-trip-1488197009: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-king-leads-hundreds-of-princes-clerics-military-officials-on-asia-trip-1488197009
URL filtered: https://www.youtube.com/watch?v=36fxbG_HDyo


Processing URLs:  91%|█████████ | 908/1000 [42:09<11:14,  7.33s/it]

Error extracting text from http://www.counterpunch.org/2015/09/18/re-opening-american-and-iranian-embassies-expected-soon/: 403 Client Error: Forbidden for url: http://www.counterpunch.org/2015/09/18/re-opening-american-and-iranian-embassies-expected-soon/


Processing URLs:  91%|█████████ | 912/1000 [42:15<04:20,  2.96s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/southeast-asian-nations-oppose-arms-embargo-myanmar-report-2021-05-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/southeast-asian-nations-oppose-arms-embargo-myanmar-report-2021-05-28/


Processing URLs:  92%|█████████▏| 918/1000 [42:26<02:49,  2.06s/it]

Error extracting text from https://missilethreat.csis.org/russia-to-deploy-sarmat-icbm-in-2021/: 403 Client Error: Forbidden for url: https://missilethreat.csis.org/russia-to-deploy-sarmat-icbm-in-2021/


Processing URLs:  92%|█████████▏| 919/1000 [42:29<02:57,  2.20s/it]

Error extracting text from http://www.reuters.com/video/2016/03/03/harsh-sanctions-imposed-on-north-korea?videoId=367601758: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2016/03/03/harsh-sanctions-imposed-on-north-korea?videoId=367601758


Processing URLs:  92%|█████████▏| 921/1000 [42:29<01:42,  1.30s/it]

Error extracting text from https://www.nytimes.com/2017/06/10/us/politics/trump-comey-russia-fbi.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/10/us/politics/trump-comey-russia-fbi.html?_r=0


Processing URLs:  92%|█████████▏| 923/1000 [42:30<01:10,  1.09it/s]

Error extracting text from http://www.latimes.com/world/mexico-americas/la-fg-us-colombia-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/mexico-americas/la-fg-us-colombia-story.html


Processing URLs:  92%|█████████▏| 924/1000 [42:31<01:12,  1.04it/s]

Error extracting text from http://dragonproductsltd.com/iran-take-reins-domestic-oil-gas-drilling-projects/: 403 Client Error: Forbidden for url: http://dragonproductsltd.com/iran-take-reins-domestic-oil-gas-drilling-projects/


Processing URLs:  92%|█████████▎| 925/1000 [42:31<00:58,  1.28it/s]

Error extracting text from http://www.thestreet.com/story/13301131/1/yellen-indicates-fed-is-poised-to-grant-bill-gross-s-wish-on-interest-rates.html: 403 Client Error: Forbidden for url: https://www.thestreet.com/story/13301131/1/yellen-indicates-fed-is-poised-to-grant-bill-gross-s-wish-on-interest-rates.html


Processing URLs:  93%|█████████▎| 932/1000 [42:40<01:01,  1.11it/s]

Error extracting text from http://www.france24.com/en/20160713-spanish-pm-without-allies-form-government-after-party-talks: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160713-spanish-pm-without-allies-form-government-after-party-talks
Error extracting text from http://www.nato.int/cps/en/natohq/topics_49727.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/topics_49727.htm


Processing URLs:  94%|█████████▎| 937/1000 [42:43<00:39,  1.59it/s]

URL filtered: http://www.bloomberg.com/politics/articles/2015-09-23/bloomberg-poll-joe-biden-now-top-presidential-choice-for-1-in-4-democrats


Processing URLs:  94%|█████████▍| 941/1000 [43:46<15:41, 15.95s/it]

Error extracting text from http://www.itv.com/news/border/story/2017-03-13/sturgeon-to-seek-approval-for-second-independence-referendum/: HTTPConnectionPool(host='www.itv.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  94%|█████████▍| 945/1000 [43:50<04:31,  4.94s/it]

Error extracting text from http://www.wsj.com/articles/venezuela-regime-fights-back-against-surging-opposition-as-elections-near-1449191000: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuela-regime-fights-back-against-surging-opposition-as-elections-near-1449191000


Processing URLs:  95%|█████████▍| 949/1000 [43:55<01:47,  2.10s/it]

Error extracting text from https://www.nytimes.com/2017/07/09/technology/att-time-warner-merger.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/09/technology/att-time-warner-merger.html


Processing URLs:  95%|█████████▌| 952/1000 [44:14<04:51,  6.07s/it]

Error extracting text from https://www.almasdarnews.com/article/egypt-promises-new-air-terminal-russian-tourists/: 522 Server Error:  for url: https://www.almasdarnews.com/article/egypt-promises-new-air-terminal-russian-tourists/


Processing URLs:  95%|█████████▌| 954/1000 [44:16<02:38,  3.44s/it]

Error extracting text from http://www.wsj.com/articles/saudi-arabia-executes-dozens-for-terrorism-1451726342: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-arabia-executes-dozens-for-terrorism-1451726342


Processing URLs:  96%|█████████▌| 955/1000 [44:17<01:58,  2.63s/it]

Error extracting text from http://www.amazon.com/The-Fall-UBS-Reasons-Switzerland/dp/0944188206: 500 Server Error: Internal Server Error for url: https://www.amazon.com/The-Fall-UBS-Reasons-Switzerland/dp/0944188206


Processing URLs:  96%|█████████▌| 959/1000 [44:22<01:10,  1.71s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/afghanistan-welcomes/2864604.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/afghanistan-welcomes/2864604.html


Processing URLs:  96%|█████████▌| 960/1000 [44:23<00:51,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-usa-court-garland-idUSKCN11C2HM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-garland-idUSKCN11C2HM


Processing URLs:  96%|█████████▌| 962/1000 [44:25<00:41,  1.09s/it]

Error extracting text from https://www.afghanistan-analysts.org/the-afghanistan-election-conundrum-3-the-dilemma-of-electoral-constituencies/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/the-afghanistan-election-conundrum-3-the-dilemma-of-electoral-constituencies/


Processing URLs:  96%|█████████▋| 964/1000 [44:28<00:53,  1.47s/it]

Error extracting text from http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/NKorea%20SRES%201718.pdf: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/NKorea%20SRES%201718.pdf


Processing URLs:  97%|█████████▋| 966/1000 [44:30<00:37,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKCN0X81YA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKCN0X81YA


Processing URLs:  97%|█████████▋| 967/1000 [44:30<00:27,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-tesla-product-idUSKCN10Y1R2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-product-idUSKCN10Y1R2


Processing URLs:  97%|█████████▋| 970/1000 [44:38<00:44,  1.47s/it]

Error extracting text from http://www.realclearpolitics.com/articles/2015/12/22/a_brokered_convention_in_2016_why_it_might_happen_what_it_might_mean_129119.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2015/12/22/a_brokered_convention_in_2016_why_it_might_happen_what_it_might_mean_129119.html
Error extracting text from http://www.reuters.com/article/us-usa-pipeline-keystone-transcanada-idUSKBN16V1CN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-pipeline-keystone-transcanada-idUSKBN16V1CN


Processing URLs:  97%|█████████▋| 971/1000 [44:39<00:40,  1.41s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-poll-idUKKCN1002A0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  98%|█████████▊| 977/1000 [44:52<01:00,  2.64s/it]

Error extracting text from https://apollo.auto/robotaxi/index.html: 404 Client Error: Not Found for url: https://www.apollo.auto/robotaxi/index.html


Processing URLs:  98%|█████████▊| 978/1000 [44:53<00:42,  1.91s/it]

Error extracting text from http://news.yahoo.com/brazils-rousseff-vows-never-resign-172516303.html: 404 Client Error: Not Found for url: http://news.yahoo.com/brazils-rousseff-vows-never-resign-172516303.html


Processing URLs:  98%|█████████▊| 983/1000 [44:59<00:24,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN17Q1KO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN17Q1KO


Processing URLs:  99%|█████████▊| 987/1000 [45:06<00:19,  1.53s/it]

Error extracting text from http://www.worldbulletin.net/burundi/169458/dozen-wounded-in-burundi-grenade-attacks: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/burundi/169458/dozen-wounded-in-burundi-grenade-attacks


Processing URLs:  99%|█████████▉| 989/1000 [45:46<02:12, 12.08s/it]

Error extracting text from http://www.uawire.org/news/polish-president-signs-law-on-demolition-of-communist-monuments: 502 Server Error: Bad Gateway for url: https://www.uawire.org/news/polish-president-signs-law-on-demolition-of-communist-monuments


Processing URLs:  99%|█████████▉| 990/1000 [46:46<04:24, 26.48s/it]

Error extracting text from http://www.miamiherald.com/news/business/biz-monday/article49418020.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs: 100%|█████████▉| 995/1000 [46:54<00:28,  5.72s/it]

Error extracting text from http://www.nytimes.com/2016/04/12/business/tesla-recalls-2700-model-x-suvs-for-a-seat-back-issue.html?emc=edit_th_20160412&amp;nl=todaysheadlines&amp;nlid=45205797: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/12/business/tesla-recalls-2700-model-x-suvs-for-a-seat-back-issue.html?emc=edit_th_20160412&amp;nl=todaysheadlines&amp;nlid=45205797


Processing URLs: 100%|█████████▉| 997/1000 [46:56<00:09,  3.19s/it]

Error extracting text from https://tradingeconomics.com/iran/crude-oil-production: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/iran/crude-oil-production


Processing URLs: 100%|██████████| 1000/1000 [47:00<00:00,  2.82s/it]
Processing URLs:   0%|          | 3/1000 [00:04<22:54,  1.38s/it]

Error extracting text from http://www.un.org/en/ga/search/view_doc.asp?symbol=S/RES/2254%282015%29: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/search/view_doc.asp?symbol=S/RES/2254%282015%29


Processing URLs:   1%|          | 6/1000 [00:08<19:12,  1.16s/it]

Error extracting text from http://www.reuters.com/article/iran-oil-exports-idUSL3N1JQ2LK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/iran-oil-exports-idUSL3N1JQ2LK


Processing URLs:   1%|▏         | 13/1000 [00:19<18:03,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-usa-election-trump-idUSKCN0V00C9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-trump-idUSKCN0V00C9


Processing URLs:   1%|▏         | 14/1000 [00:19<14:44,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKCN0WB1J0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKCN0WB1J0


Processing URLs:   2%|▏         | 15/1000 [00:32<1:12:31,  4.42s/it]

Error extracting text from https://www.google.ca/amp/s/www.washingtonpost.com/amphtml/news/post-nation/wp/2016/06/30/adnan-syed-granted-new-trial-in-serial-case-attorney-says/?client=ms-android-rogers-ca#: 404 Client Error: Not Found for url: https://www.washingtonpost.com/amphtml/news/post-nation/wp/2016/06/30/adnan-syed-granted-new-trial-in-serial-case-attorney-says/


Processing URLs:   2%|▎         | 25/1000 [00:54<16:38,  1.02s/it]  

Error extracting text from http://www.nytimes.com/2016/03/11/us/politics/senate-judiciary-committee-supreme-court-nominee.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/11/us/politics/senate-judiciary-committee-supreme-court-nominee.html


Processing URLs:   3%|▎         | 28/1000 [00:58<19:55,  1.23s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-07/iran-s-crude-oil-exports-increase-to-level-last-seen-in-1970s


Processing URLs:   4%|▎         | 35/1000 [01:11<23:40,  1.47s/it]

Error extracting text from https://www.nytimes.com/2017/06/27/technology/eu-google-fine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/27/technology/eu-google-fine.html
URL filtered: https://www.youtube.com/watch?v=SUbqykXVx0A


Processing URLs:   4%|▍         | 40/1000 [01:16<17:15,  1.08s/it]

URL filtered: https://www.youtube.com/results?search_query=ted+cruz+speech
Error extracting text from https://www.reuters.com/world/europe/russia-finds-navalny-ally-sobol-guilty-breaching-covid-19-health-rules-2021-08-03/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/russia-finds-navalny-ally-sobol-guilty-breaching-covid-19-health-rules-2021-08-03/


Processing URLs:   4%|▍         | 44/1000 [06:27<22:10:30, 83.50s/it]

Error extracting text from http://www.japantoday.com/category/politics/view/abe-plans-busy-overseas-travel-schedule-for-next-three-months: 404 Client Error: Not Found for url: https://japantoday.com/category/politics/abe-plans-busy-overseas-travel-schedule-for-next-three-months


Processing URLs:   4%|▍         | 45/1000 [07:27<20:25:17, 76.98s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-03-25/trump-holds-delegate-lead-but-cruz-maneuvers-to-outflank-him-at-convention: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   5%|▍         | 46/1000 [07:29<14:45:24, 55.69s/it]

Error extracting text from https://www.sec.gov/edgar/browse/?CIK=1418091&owner=exclude: 403 Client Error: Forbidden for url: https://www.sec.gov/edgar/browse/?CIK=1418091&owner=exclude


Processing URLs:   5%|▍         | 48/1000 [07:33<7:37:10, 28.81s/it] 

Error extracting text from http://www.jsonline.com/news/statepolitics/big-names-filling-ron-johnsons-russ-feingolds-coffers-b99672922z1-369600651.html: 404 Client Error: OK for url: https://www.jsonline.com/news/statepolitics/big-names-filling-ron-johnsons-russ-feingolds-coffers-b99672922z1-369600651.html/


Processing URLs:   5%|▌         | 53/1000 [07:42<1:33:57,  5.95s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-peacetalks-idUSKCN12I0O2?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-peacetalks-idUSKCN12I0O2?il=0


Processing URLs:   6%|▌         | 58/1000 [07:51<40:24,  2.57s/it]  

Error extracting text from https://blog.osvdb.org/category/vulnerability-statistics/: HTTPSConnectionPool(host='blog.osvdb.org', port=443): Max retries exceeded with url: /category/vulnerability-statistics/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'blog.osvdb.org'. (_ssl.c:1000)")))


Processing URLs:   6%|▌         | 61/1000 [07:59<41:16,  2.64s/it]

URL filtered: https://www.bloomberg.com/quicktake/iran-s-oil


Processing URLs:   7%|▋         | 66/1000 [08:06<27:42,  1.78s/it]

Error extracting text from http://news.xinhuanet.com/english/2008-04/11/content_7959627.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2008-04/11/content_7959627.htm


Processing URLs:   7%|▋         | 67/1000 [08:06<21:50,  1.40s/it]

Error extracting text from https://www.sciencedirect.com/science/article/pii/S0048969721004812?via%3Dihub: 403 Client Error: Forbidden for url: https://www.sciencedirect.com/science/article/pii/S0048969721004812?via%3Dihub


Processing URLs:   7%|▋         | 68/1000 [08:08<23:34,  1.52s/it]

Error extracting text from http://yahoolabs.tumblr.com/post/120705107556/a-novel-diverse-dataset-for-automatic-video: 404 Client Error: Not Found for url: https://yahoolabs.tumblr.com/post/120705107556/a-novel-diverse-dataset-for-automatic-video


Processing URLs:   7%|▋         | 71/1000 [08:10<13:32,  1.14it/s]

URL filtered: http://pro.boxoffice.com/twitter/today/
Error extracting text from http://www.nytimes.com/2016/02/26/world/africa/burundi-violence.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/26/world/africa/burundi-violence.html?_r=0


Processing URLs:   8%|▊         | 75/1000 [08:12<08:50,  1.74it/s]

Error extracting text from http://www.reuters.com/article/southchinasea-china-idUSKBN0UI1ZX20160105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/southchinasea-china-idUSKBN0UI1ZX20160105
Error extracting text from http://www.nytimes.com/reuters/2016/02/10/world/asia/10reuters-southchinasea-india-usa.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/02/10/world/asia/10reuters-southchinasea-india-usa.html


Processing URLs:   8%|▊         | 78/1000 [08:28<43:31,  2.83s/it]  

Error extracting text from https://www.nytimes.com/2017/11/03/us/politics/republicans-tax-cut-obamacare-mandate-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/03/us/politics/republicans-tax-cut-obamacare-mandate-trump.html


Processing URLs:   8%|▊         | 79/1000 [08:30<38:45,  2.53s/it]

Error extracting text from http://www.ibtimes.com/iran-ramps-presence-syria-deploys-troops-ceasefire-breaks-down-2354965: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iran-ramps-presence-syria-deploys-troops-ceasefire-breaks-down-2354965


Processing URLs:   8%|▊         | 80/1000 [09:30<5:02:41, 19.74s/it]

Error extracting text from http://archive.is/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:   8%|▊         | 81/1000 [09:33<3:45:31, 14.72s/it]

Error extracting text from https://fcw.com/articles/2017/09/28/dhs-jeh-johnson-election-attack-forum.aspx: 404 Client Error: NOT FOUND for url: https://www.nextgov.com/articles/2017/09/28/dhs-jeh-johnson-election-attack-forum.aspx/


Processing URLs:   8%|▊         | 85/1000 [09:43<1:18:34,  5.15s/it]

Error extracting text from http://www.pressdemocrat.com/news/4456885-181/north-coast-congressman-jared-huffman: 403 Client Error: Forbidden for url: https://www.pressdemocrat.com/news/4456885-181/north-coast-congressman-jared-huffman


Processing URLs:   9%|▊         | 86/1000 [09:45<1:03:17,  4.16s/it]

Error extracting text from https://slatestarcodex.com/2019/01/04/preregistration-of-investigations-for-the-2019-ssc-survey/: 403 Client Error: Forbidden for url: https://slatestarcodex.com/2019/01/04/preregistration-of-investigations-for-the-2019-ssc-survey/


Processing URLs:   9%|▉         | 89/1000 [09:46<25:43,  1.69s/it]  

Error extracting text from https://www.hindustantimes.com/india-news/delhi-hc-asks-libgen-sci-hub-to-stop-uploading-articles-as-they-face-copyright-infringement-charges/story-cRWCB1sGs1yMqR3TCpuvmL.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/india-news/delhi-hc-asks-libgen-sci-hub-to-stop-uploading-articles-as-they-face-copyright-infringement-charges/story-cRWCB1sGs1yMqR3TCpuvmL.html
Error extracting text from https://www.reuters.com/article/us-usa-fed-powell-idUSKBN2C12HZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fed-powell-idUSKBN2C12HZ


Processing URLs:   9%|▉         | 90/1000 [09:48<23:04,  1.52s/it]

Error extracting text from http://in.reuters.com/article/2015/09/24/china-oil-soereform-idINL4N11T56320150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
Error extracting text from https://www.reuters.com/article/us-britain-politics-scotland/scottish-support-for-independence-drops-poll-shows-idUSKBN2AB146: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-politics-scotland/scottish-support-for-independence-drops-poll-shows-idUSKBN2AB146


Processing URLs:  10%|▉         | 95/1000 [10:22<2:20:57,  9.34s/it]

Error extracting text from http://www.todayszaman.com/diplomacy_dutch-pm-rutte-urges-ankara-to-cut-migrant-flows-towards-zero_413917.html: 522 Server Error:  for url: http://www.todayszaman.com/diplomacy_dutch-pm-rutte-urges-ankara-to-cut-migrant-flows-towards-zero_413917.html


Processing URLs:  10%|▉         | 96/1000 [10:23<1:42:23,  6.80s/it]

Error extracting text from https://www.nytimes.com/2021/01/26/world/europe/biden-putin-nuclear-treaty.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/26/world/europe/biden-putin-nuclear-treaty.html
URL filtered: https://www.facebook.com/XinhuaNewsAgency/posts/the-russian-navy-will-join-the-aman-2021-drills-off-the-coast-of-pakistan-with-n/4611956642165077/


Processing URLs:  11%|█         | 107/1000 [10:44<33:09,  2.23s/it] 

Error extracting text from https://www.nytimes.com/2017/11/29/world/europe/brexit-divorce-demands.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/29/world/europe/brexit-divorce-demands.html
URL filtered: https://twitter.com/SoberLook/status/654454067348029446?lang=en
URL filtered: http://www.sandiegouniontribune.com/opinion/the-conversation/sd-how-much-money-russians-spent-twitter-facebook-ads-20170928-htmlstory.html


Processing URLs:  11%|█         | 110/1000 [10:45<18:20,  1.24s/it]

URL filtered: https://twitter.com/ArmsControlWonk/status/1482029904753577988
URL filtered: https://www.youtube.com/watch?v=ejwrxGs_Y_I


Processing URLs:  11%|█▏        | 114/1000 [10:47<11:59,  1.23it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/pa/pennsylvania_senate_toomey_vs_sestak-3928.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/pa/pennsylvania_senate_toomey_vs_sestak-3928.html


Processing URLs:  12%|█▏        | 118/1000 [10:55<23:38,  1.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-12/bank-of-america-has-some-good-news-for-apple-and-it-comes-from-china


Processing URLs:  12%|█▏        | 121/1000 [10:58<17:39,  1.20s/it]

Error extracting text from http://heather-maclean.com/how-the-new-york-times-bestseller-list-works/: 404 Client Error: Not Found for url: http://heather-maclean.com/how-the-new-york-times-bestseller-list-works/


Processing URLs:  13%|█▎        | 127/1000 [12:16<4:43:05, 19.46s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-01-12/iraqi-pm-vows-to-expel-is-after-deadly-mall-attack: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  13%|█▎        | 129/1000 [12:17<2:25:37, 10.03s/it]

Error extracting text from http://www.tradingeconomics.com/united-states/corruption-rank: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-states/corruption-rank


Processing URLs:  13%|█▎        | 131/1000 [12:20<1:20:13,  5.54s/it]

Error extracting text from http://seekingalpha.com/article/4056248-lithium-evs-teslas-outsourcing-vs-toyotas-vertical-integration?app=1&amp;auth_param=1dvndc:1ccpn5s:b97325622e131947d8b9f228386eed20&amp;dr=1#alt2: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4056248-lithium-evs-teslas-outsourcing-vs-toyotas-vertical-integration?app=1&amp;auth_param=1dvndc:1ccpn5s:b97325622e131947d8b9f228386eed20&amp;dr=1#alt2


Processing URLs:  13%|█▎        | 133/1000 [12:24<54:05,  3.74s/it]  

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&amp;s=wcrfpus2&amp;f=4).: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  14%|█▎        | 137/1000 [12:27<22:15,  1.55s/it]

Error extracting text from https://www.nytimes.com/2017/11/18/world/asia/afghanistan-taliban-army-recruitment.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/18/world/asia/afghanistan-taliban-army-recruitment.html


Processing URLs:  14%|█▍        | 140/1000 [12:34<29:05,  2.03s/it]

Error extracting text from http://www.stripes.com/news/pacific/island-standoff-clouds-china-japan-ties-in-run-up-to-g-20-1.426216: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/asia_pacific/island-standoff-clouds-china-japan-ties-in-run-up-to-g-20-1.426216


Processing URLs:  15%|█▍        | 148/1000 [12:44<16:33,  1.17s/it]

Error extracting text from http://www.nti.org/treaties-and-regimes/association-southeast-asian-nations-asean/: 403 Client Error: Forbidden for url: https://www.nti.org/treaties-and-regimes/association-southeast-asian-nations-asean/


Processing URLs:  16%|█▌        | 158/1000 [13:03<26:07,  1.86s/it]

Error extracting text from http://www.frontiermyanmar.net/en/old-regime-prepares-hand-over-power-national-league-democracy: 404 Client Error: Not Found for url: https://www.frontiermyanmar.net/en/old-regime-prepares-hand-over-power-national-league-democracy


Processing URLs:  16%|█▌        | 161/1000 [13:11<31:12,  2.23s/it]

Error extracting text from http://www.lowyinterpreter.org/post/2016/03/11/After-40-years-Five-eyes-out-in-the-open.aspx: 404 Client Error: Not Found for url: https://www.lowyinstitute.org/the-interpreter/post/2016/03/11/After-40-years-Five-eyes-out-in-the-open.aspx


Processing URLs:  16%|█▋        | 163/1000 [13:14<27:24,  1.96s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/05/14/754030/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/05/14/754030/story.html


Processing URLs:  17%|█▋        | 168/1000 [13:26<29:01,  2.09s/it]

Error extracting text from http://www.wsj.com/articles/nato-set-to-invite-montenegro-to-join-in-first-expansion-of-alliance-since-2009-1448908484: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nato-set-to-invite-montenegro-to-join-in-first-expansion-of-alliance-since-2009-1448908484


Processing URLs:  17%|█▋        | 171/1000 [13:28<13:23,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-alphabet-uber-ruling/u-s-judge-says-uber-withheld-evidence-delays-waymo-trial-idUSKBN1DS26X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-alphabet-uber-ruling/u-s-judge-says-uber-withheld-evidence-delays-waymo-trial-idUSKBN1DS26X
Error extracting text from http://www.reuters.com/article/us-opec-venezuela-idUSKCN0VS2PY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-venezuela-idUSKCN0VS2PY


Processing URLs:  17%|█▋        | 173/1000 [13:29<10:01,  1.37it/s]

Error extracting text from http://www.nationmultimedia.com/breakingnews/Myanmar-brings-forward-nomination-election-of-next-30280518.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/breakingnews/Myanmar-brings-forward-nomination-election-of-next-30280518.html


Processing URLs:  18%|█▊        | 180/1000 [13:40<18:42,  1.37s/it]

Error extracting text from https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1484699123&amp;sr=1-1&amp;keywords=the+dictator%27s+handbook: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1484699123&amp;sr=1-1&amp;keywords=the+dictator%27s+handbook


Processing URLs:  18%|█▊        | 185/1000 [13:50<23:01,  1.69s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-14/brazil-congress-to-revive-impeachment-following-record-protests


Processing URLs:  19%|█▉        | 191/1000 [13:55<16:45,  1.24s/it]

Error extracting text from http://www.joc.com/maritime-news/panama-learn-december-whether-canal-opening-stays-schedule_20151118.html: 404 Client Error: Not Found for url: https://www.joc.com/article/panama-learn-december-whether-canal-opening-stays-schedule_20151118.html


Processing URLs:  20%|█▉        | 195/1000 [14:00<13:13,  1.02it/s]

Error extracting text from http://www.nytimes.com/2016/08/04/business/teslas-big-loss-reflects-its-costly-ambitions.html?emc=edit_th_20160804&amp;amp;nl=todaysheadlines&amp;amp;nlid=28699183&amp;amp;_r=5: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/04/business/teslas-big-loss-reflects-its-costly-ambitions.html?emc=edit_th_20160804&amp;amp;nl=todaysheadlines&amp;amp;nlid=28699183&amp;amp;_r=5


Processing URLs:  20%|██        | 200/1000 [14:07<12:15,  1.09it/s]

Error extracting text from https://www.nti.org/analysis/articles/cns-north-korea-missile-test-database/: 403 Client Error: Forbidden for url: https://www.nti.org/analysis/articles/cns-north-korea-missile-test-database/
Error extracting text from http://www.latimes.com/science/sciencenow/la-sci-sn-fda-drugs-safety-20170509-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/science/sciencenow/la-sci-sn-fda-drugs-safety-20170509-story.html


Processing URLs:  20%|██        | 202/1000 [14:10<15:54,  1.20s/it]

URL filtered: https://twitter.com/thedailybeast/status/1497071843848638485


Processing URLs:  20%|██        | 204/1000 [14:12<13:27,  1.01s/it]

Error extracting text from https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/609c4eb73fc8f9142091b468/1620856545962/REINZ+Monthly+Property+Report+-+April+2021.pdf: 403 Client Error: Forbidden for url: https://static1.squarespace.com/static/5ce1fd700bf20400017d3a30/t/609c4eb73fc8f9142091b468/1620856545962/REINZ+Monthly+Property+Report+-+April+2021.pdf


Processing URLs:  21%|██        | 207/1000 [15:13<2:37:50, 11.94s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-06-02/the-iranian-fingerprints-on-iraqs-fallujah-plan: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/2015/07/14/venezuela-opposition-idUSL2N0ZU1T420150714: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/07/14/venezuela-opposition-idUSL2N0ZU1T420150714


Processing URLs:  21%|██        | 210/1000 [15:16<1:05:54,  5.01s/it]

Error extracting text from https://www.wsj.com/articles/five-big-players-steer-trumps-foreign-policy-toward-the-mainstream-1491839286: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/five-big-players-steer-trumps-foreign-policy-toward-the-mainstream-1491839286


Processing URLs:  22%|██▏       | 217/1000 [15:27<20:37,  1.58s/it]  

Error extracting text from https://www.transparency.org/country/#COG: 404 Client Error: Not Found for url: https://www.transparency.org/en/country/#COG


Processing URLs:  22%|██▏       | 220/1000 [15:35<32:17,  2.48s/it]

Error extracting text from http://latinvex.com/mobile/article.aspx?id=2683: 404 Client Error: Not Found for url: https://latinvex.com/mobile/article.aspx?id=2683


Processing URLs:  22%|██▏       | 223/1000 [15:37<16:25,  1.27s/it]

Error extracting text from http://www.cdm.me/english/research-by-ipsos-for-nato-52-percent-of-citizens: 403 Client Error: Forbidden for url: https://www.cdm.me/english/research-by-ipsos-for-nato-52-percent-of-citizens


Processing URLs:  23%|██▎       | 226/1000 [15:38<07:48,  1.65it/s]

Error extracting text from http://business.financialpost.com/news/agriculture/game-changer-canada-recreational-pot-sales-could-reach-6-billion-by-2021-analyst-says: 403 Client Error: Forbidden for url: https://financialpost.com/news/agriculture/game-changer-canada-recreational-pot-sales-could-reach-6-billion-by-2021-analyst-says
Error extracting text from http://www.reuters.com/article/us-yemen-security-idUSKCN1B60OG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-idUSKCN1B60OG?il=0
Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13961125001085: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13961125001085 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300c93860>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://techcrunch.com/2021/09/21/the-oversight-board-wants-f

Processing URLs:  23%|██▎       | 228/1000 [15:38<05:19,  2.41it/s]

Error extracting text from https://www.nytimes.com/2020/11/03/us/politics/qanon-candidates-marjorie-taylor-greene.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/11/03/us/politics/qanon-candidates-marjorie-taylor-greene.html


Processing URLs:  23%|██▎       | 233/1000 [15:45<14:54,  1.17s/it]

Error extracting text from http://europe.newsweek.com/mosul-offensive-against-isis-begin-soon-says-french-defense-minister-504679?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/mosul-offensive-against-isis-begin-soon-says-french-defense-minister-504679


Processing URLs:  23%|██▎       | 234/1000 [15:48<18:37,  1.46s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-18/bangladesh-reserves-theft-a-threat-to-philippines-lawmaker-says


Processing URLs:  24%|██▍       | 238/1000 [15:49<08:18,  1.53it/s]

Error extracting text from https://www.reuters.com/article/us-oil-opec-iran/iran-says-majority-of-opec-members-support-extending-oil-output-cuts-idUSKBN1DK0X2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-opec-iran/iran-says-majority-of-opec-members-support-extending-oil-output-cuts-idUSKBN1DK0X2


Processing URLs:  24%|██▍       | 239/1000 [15:50<09:46,  1.30it/s]

Error extracting text from http://www.cnbc.com/2015/11/09/upbeat-rosengren-points-to-december-fed-rate-hike.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/11/09/upbeat-rosengren-points-to-december-fed-rate-hike.html


Processing URLs:  24%|██▍       | 241/1000 [15:52<11:00,  1.15it/s]

Error extracting text from http://www.who.int/csr/don/07-august-2017-ah7n9-china/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/07-august-2017-ah7n9-china/en/


Processing URLs:  24%|██▍       | 242/1000 [15:53<11:46,  1.07it/s]

Error extracting text from http://www.trackingterrorism.org/: 403 Client Error: Forbidden for url: https://trackingterrorism.org/


Processing URLs:  25%|██▍       | 246/1000 [15:59<18:37,  1.48s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-03-23/turkey-the-politics-behind-erdogan-s-central-bank-decision


Processing URLs:  25%|██▌       | 252/1000 [16:04<10:39,  1.17it/s]

Error extracting text from http://thehill.com/policy/energy-environment/266339-trump-calls-for-higher-ethanol-mandate: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/266339-trump-calls-for-higher-ethanol-mandate/


Processing URLs:  26%|██▌       | 256/1000 [16:09<13:40,  1.10s/it]

Error extracting text from http://csis.org/ppp/index.htm: HTTPConnectionPool(host='jcpoatimeline.csis.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffe6a240>: Failed to resolve 'jcpoatimeline.csis.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  26%|██▌       | 257/1000 [16:11<18:40,  1.51s/it]

Error extracting text from https://www.reuters.com/article/us-britain-scotland-independence/scottish-nationalists-announce-plans-for-new-independence-referendum-idUSKBN25S5SX?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-scotland-independence/scottish-nationalists-announce-plans-for-new-independence-referendum-idUSKBN25S5SX?il=0


Processing URLs:  26%|██▌       | 262/1000 [16:18<20:37,  1.68s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-22/pound-bears-brunt-of-brussels-tragedy-seen-raising-brexit-risk


Processing URLs:  27%|██▋       | 266/1000 [16:23<14:39,  1.20s/it]

Error extracting text from https://www.reuters.com/world/kremlin-critic-navalny-discloses-details-extremism-ruling-2021-06-23/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/kremlin-critic-navalny-discloses-details-extremism-ruling-2021-06-23/
URL filtered: https://www.npr.org/sections/alltechconsidered/2018/03/21/591622450/section-230-a-key-legal-shield-for-facebook-google-is-about-to-change


Processing URLs:  27%|██▋       | 270/1000 [16:28<13:51,  1.14s/it]

Error extracting text from http://www.reuters.com/article/saudi-ipo-idUSL8N1AZ4LT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/saudi-ipo-idUSL8N1AZ4LT


Processing URLs:  27%|██▋       | 274/1000 [17:21<1:35:02,  7.86s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-pdvsa-bonds-idUSKCN18D2DI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-pdvsa-bonds-idUSKCN18D2DI


Processing URLs:  28%|██▊       | 285/1000 [18:38<3:48:38, 19.19s/it]

Error extracting text from http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  29%|██▊       | 287/1000 [18:49<2:28:10, 12.47s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/afghan-official-no-plans-to-peace-effort-with-the-taliban/2016/07/14/7b470316-4989-11e6-8dac-0c6e4accc5b1_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/afghan-official-no-plans-to-peace-effort-with-the-taliban/2016/07/14/7b470316-4989-11e6-8dac-0c6e4accc5b1_story.html


Processing URLs:  29%|██▉       | 288/1000 [18:50<1:46:10,  8.95s/it]

Error extracting text from https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=6800: 404 Client Error: Not Found for url: https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=6800


Processing URLs:  29%|██▉       | 290/1000 [18:54<1:05:05,  5.50s/it]

Error extracting text from http://www.parl.gc.ca/LegisInfo/BillDetails.aspx?Language=E&amp;Mode=1&amp;billId=8886269: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
URL filtered: https://www.bloomberg.com/news/articles/2017-10-18/no-3-gop-senator-joins-those-opening-door-to-tax-revamp-in-2018


Processing URLs:  30%|██▉       | 297/1000 [19:01<16:29,  1.41s/it]  

Error extracting text from http://news.riskadvisory.net/2015/11/iran-progress-on-nuclear-deal-but-sanctions-relief-months-off/: 409 Client Error: Conflict for url: http://news.riskadvisory.net/2015/11/iran-progress-on-nuclear-deal-but-sanctions-relief-months-off/
URL filtered: https://www.youtube.com/watch?v=amoFqJTDurU


Processing URLs:  30%|███       | 304/1000 [19:54<1:16:12,  6.57s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-22/u-s-is-hiding-treasury-bond-data-that-s-suddenly-become-crucial


Processing URLs:  31%|███       | 307/1000 [19:55<35:30,  3.07s/it]  

Error extracting text from http://www.reuters.com/article/2015/11/17/us-mideast-crisis-syria-advance-idUSKCN0T611C20151117#VhDTZ1DGoOAgkW3y.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/17/us-mideast-crisis-syria-advance-idUSKCN0T611C20151117#VhDTZ1DGoOAgkW3y.97


Processing URLs:  31%|███       | 312/1000 [20:05<23:07,  2.02s/it]

Error extracting text from http://www.ibtimes.com/china-will-not-recklessly-use-force-south-china-sea-wants-avoid-unexpected-conflicts-2145167: 403 Client Error: Forbidden for url: https://www.ibtimes.com/china-will-not-recklessly-use-force-south-china-sea-wants-avoid-unexpected-conflicts-2145167


Processing URLs:  32%|███▏      | 316/1000 [20:17<29:32,  2.59s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-07-26/u-s-can-t-blame-russia-as-easily-as-north-korea-in-latest-hack


Processing URLs:  32%|███▏      | 318/1000 [20:20<22:47,  2.01s/it]

Error extracting text from http://www.tor.com/2016/01/13/winds-of-winter-book-publishing-process/: 403 Client Error: Forbidden for url: https://reactormag.com/2016/01/13/winds-of-winter-book-publishing-process/


Processing URLs:  32%|███▏      | 319/1000 [20:22<22:29,  1.98s/it]

Error extracting text from http://www.waaytv.com/redstone_alabama/as-america-s-icbm-defense-system-gets-ready-for-first/article_49c4f666-1b27-11e7-b029-b3f11e0dc44f.html: 404 Client Error: Not Found for url: https://www.waaytv.com/redstone_alabama/as-america-s-icbm-defense-system-gets-ready-for-first/article_49c4f666-1b27-11e7-b029-b3f11e0dc44f.html


Processing URLs:  32%|███▏      | 323/1000 [20:27<15:59,  1.42s/it]

URL filtered: https://www.youtube.com/watch?v=PBnO9dw3n6A
URL filtered: https://www.youtube.com/watch?v=ienp4J3pW7U


Processing URLs:  33%|███▎      | 328/1000 [20:29<07:53,  1.42it/s]

Error extracting text from https://theconversation.com/how-to-protest-chinas-human-rights-violations-without-boycotting-the-2022-olympics-149344: 403 Client Error: Forbidden for url: https://theconversation.com/how-to-protest-chinas-human-rights-violations-without-boycotting-the-2022-olympics-149344


Processing URLs:  33%|███▎      | 330/1000 [20:30<07:49,  1.43it/s]

Error extracting text from http://www.amazon.com/best-sellers-books-Amazon/zgbs/books/ref=pd_dp_ts_b_1: 503 Server Error: Service Unavailable for url: https://www.amazon.com/best-sellers-books-Amazon/zgbs/books/ref=pd_dp_ts_b_1


Processing URLs:  34%|███▎      | 335/1000 [20:38<12:26,  1.12s/it]

Error extracting text from http://www.migrationpolicy.org/programs/data-hub/charts/largest-immigrant-groups-over-time: 403 Client Error: Forbidden for url: https://www.migrationpolicy.org/programs/data-hub/charts/largest-immigrant-groups-over-time


Processing URLs:  34%|███▍      | 341/1000 [20:45<09:12,  1.19it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/ia/iowa_democratic_presidential_caucus-3195.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/ia/iowa_democratic_presidential_caucus-3195.html
Error extracting text from http://www.unhcr.org/en-us/news/briefing/2016/7/5784b1ef4/unhcr-calls-open-borders-possible-south-sudan-refugee-outflows.html: 403 Client Error: Forbidden for url: http://www.unhcr.org/us/news/briefing/2016/7/5784b1ef4/unhcr-calls-open-borders-possible-south-sudan-refugee-outflows.html


Processing URLs:  34%|███▍      | 343/1000 [20:46<05:18,  2.06it/s]

Error extracting text from http://www.scotsman.com/news/opinion/brian-monteith-snp-and-the-death-of-the-case-for-independence-1-4325638: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/opinion/brian-monteith-snp-and-the-death-of-the-case-for-independence-1-4325638
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-drills-idUSKCN1080O8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-drills-idUSKCN1080O8


Processing URLs:  35%|███▍      | 347/1000 [20:56<29:40,  2.73s/it]

Error extracting text from http://www.askitas.com/2016/06/01/to-brexit-or-not-to-brexit-june-update/: 404 Client Error: Not Found for url: https://askitas.com/2016/06/01/to-brexit-or-not-to-brexit-june-update/


Processing URLs:  35%|███▍      | 349/1000 [20:59<21:51,  2.01s/it]

Error extracting text from http://eng.chinamil.com.cn/view/2017-11/07/content_7815184.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/view/2017-11/07/content_7815184.htm


Processing URLs:  35%|███▌      | 351/1000 [21:00<13:48,  1.28s/it]

Error extracting text from https://www.reuters.com/world/europe/ukraine-sets-ceasefire-goal-new-russia-talks-breakthrough-looks-distant-2022-03-29/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/ukraine-sets-ceasefire-goal-new-russia-talks-breakthrough-looks-distant-2022-03-29/


Processing URLs:  35%|███▌      | 354/1000 [21:01<08:29,  1.27it/s]

Error extracting text from http://english.aawsat.com/2016/09/article55357887/france-deploys-artillery-reinforce-iraqi-army: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/09/article55357887/france-deploys-artillery-reinforce-iraqi-army


Processing URLs:  36%|███▌      | 356/1000 [21:02<06:38,  1.62it/s]

Error extracting text from https://www.consilium.europa.eu/en/meetings/calendar/?Category=meeting&amp;Page=1&amp;dateFrom=2020%2F10%2F01&amp;dateTo=2022%2F02%2F03: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/meetings/calendar/?Category=meeting&amp;Page=1&amp;dateFrom=2020%2F10%2F01&amp;dateTo=2022%2F02%2F03


Processing URLs:  36%|███▌      | 358/1000 [21:05<09:19,  1.15it/s]

Error extracting text from http://m.thenational.ae/arts-life/art/the-uae-based-atassi-foundation-works-to-preserve-syrias-modern-artworks: 400 Client Error: Bad Request for url: http://m.thenational.ae/arts-life/art/the-uae-based-atassi-foundation-works-to-preserve-syrias-modern-artworks
URL filtered: https://www.youtube.com/watch?v=SAQQ7kdqmlU


Processing URLs:  36%|███▋      | 363/1000 [21:10<08:55,  1.19it/s]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160815/0905197261.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160815/0905197261.html
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-may-referendum-idUSKBN16K2QK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-may-referendum-idUSKBN16K2QK


Processing URLs:  37%|███▋      | 366/1000 [21:12<08:50,  1.19it/s]

Error extracting text from http://allafrica.com/stories/201607250594.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607250594.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x305a38f50>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  37%|███▋      | 368/1000 [21:15<11:23,  1.08s/it]

Error extracting text from https://www.reuters.com/world/americas/mail-in-voting-set-soar-canada-election-could-undermine-trudeau-new-democratic-2021-08-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/mail-in-voting-set-soar-canada-election-could-undermine-trudeau-new-democratic-2021-08-17/


Processing URLs:  38%|███▊      | 377/1000 [21:25<07:41,  1.35it/s]

Error extracting text from http://www.reuters.com/article/us-iran-usa-idUSKBN0UH0JZ20160104: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-idUSKBN0UH0JZ20160104


Processing URLs:  38%|███▊      | 382/1000 [21:32<11:57,  1.16s/it]

Error extracting text from https://www.nytimes.com/2021/01/31/us/politics/gerrymander-census-democrats-republicans.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/31/us/politics/gerrymander-census-democrats-republicans.html


Processing URLs:  38%|███▊      | 385/1000 [21:35<09:38,  1.06it/s]

URL filtered: http://aranews.net/2016/04/kurdish-leaders-thank-united-states-415m-aid-peshmerga-forces/?utm_source=twitter
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear/north-korea-earthquake-suggests-sixth-nuclear-test-as-trump-abe-discuss-escalating-crisis-idUSKCN1BD0VW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear/north-korea-earthquake-suggests-sixth-nuclear-test-as-trump-abe-discuss-escalating-crisis-idUSKCN1BD0VW


Processing URLs:  39%|███▊      | 386/1000 [21:36<10:44,  1.05s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/us-giving-638m-aid-yemen-somalia-nigeria-south-48520948: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/us-giving-638m-aid-yemen-somalia-nigeria-south-48520948


Processing URLs:  39%|███▊      | 387/1000 [21:38<12:30,  1.22s/it]

Error extracting text from https://www.stripes.com/news/us-philippines-cancel-annual-amphibious-landing-drill-1.443993: 404 Client Error: Not Found for url: https://www.stripes.com/theaters/us-philippines-cancel-annual-amphibious-landing-drill-1.443993


Processing URLs:  39%|███▉      | 392/1000 [21:42<06:24,  1.58it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-oil-output-idUSKCN0XU22R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-output-idUSKCN0XU22R
URL filtered: http://www.bloomberg.com/news/articles/2015-12-18/boj-keeps-stimulus-unchanged-makes-adjustments-to-jgbs-etfs
Error extracting text from http://www.nytimes.com/2016/01/15/us/politics/ted-cruz-gop-debate.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/15/us/politics/ted-cruz-gop-debate.html


Processing URLs:  39%|███▉      | 394/1000 [21:44<10:18,  1.02s/it]

Error extracting text from http://www.mineweb.com/news/energy/venezuela-bond-traders-only-care-about-oil-as-correlation-jumps/: 523 Server Error:  for url: http://www.mineweb.com/news/energy/venezuela-bond-traders-only-care-about-oil-as-correlation-jumps/
URL filtered: https://twitter.com/OYCar/status/1473369148361035776


Processing URLs:  40%|████      | 400/1000 [21:49<07:17,  1.37it/s]

Error extracting text from http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?emc=edit_th_20160527&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/27/business/dealbook/north-korea-linked-to-digital-thefts-from-global-banks.html?emc=edit_th_20160527&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  40%|████      | 405/1000 [21:55<10:01,  1.01s/it]

Error extracting text from http://www.un.org/en/ga/search/view_doc.asp?symbol=S/RES/2270%282016%29: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/search/view_doc.asp?symbol=S/RES/2270%282016%29


Processing URLs:  41%|████      | 410/1000 [22:04<17:21,  1.76s/it]

Error extracting text from http://www.wsj.com/articles/chinese-jets-intercept-u-s-spy-plane-over-east-china-sea-1465360954: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chinese-jets-intercept-u-s-spy-plane-over-east-china-sea-1465360954


Processing URLs:  41%|████▏     | 413/1000 [22:08<13:58,  1.43s/it]

Error extracting text from https://www.reuters.com/article/us-usa-southsudan-arms-exclusive/exclusive-u-s-to-impose-arms-embargo-on-south-sudan-to-end-conflict-sources-idUSKBN1FM0ZE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-southsudan-arms-exclusive/exclusive-u-s-to-impose-arms-embargo-on-south-sudan-to-end-conflict-sources-idUSKBN1FM0ZE


Processing URLs:  42%|████▏     | 421/1000 [22:21<11:23,  1.18s/it]

Error extracting text from https://www.timesofisrael.com/olympic-committee-accused-of-ignoring-human-rights-for-2022-beijing-games/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/olympic-committee-accused-of-ignoring-human-rights-for-2022-beijing-games/


Processing URLs:  42%|████▏     | 423/1000 [22:26<17:59,  1.87s/it]

Error extracting text from http://www.hanford.gov/files.cfm/frenchesp.pdf: HTTPSConnectionPool(host='www.hanford.gov', port=443): Max retries exceeded with url: /files.cfm/frenchesp.pdf (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  43%|████▎     | 429/1000 [22:33<12:32,  1.32s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/19/gitrep-18mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/19/gitrep-18mar16pm/


XRef object at 1061319 can not be read, some object may be missing
Processing URLs:  44%|████▎     | 436/1000 [22:43<11:15,  1.20s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-turkey-idUSKBN0TO0AU20151205: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-turkey-idUSKBN0TO0AU20151205
URL filtered: https://www.bloomberg.com/features/2016-elon-musk-companies/


Processing URLs:  44%|████▍     | 439/1000 [22:47<11:56,  1.28s/it]

Error extracting text from https://www.nytimes.com/2017/05/07/world/europe/german-state-vote-schleswig-holstein-merkel.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/07/world/europe/german-state-vote-schleswig-holstein-merkel.html


Processing URLs:  44%|████▍     | 441/1000 [22:50<11:40,  1.25s/it]

Error extracting text from https://bit.ly/3p5bIce: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210209-how-long-with-italy-s-honeymoon-with-draghi-last


Processing URLs:  44%|████▍     | 442/1000 [22:51<10:13,  1.10s/it]

Error extracting text from http://www.maritime-executive.com/article/acp-new-panama-canal-locks-to-be-completed-by-june: 404 Client Error: Not Found for url: https://www.maritime-executive.com/403.shtml


Processing URLs:  44%|████▍     | 445/1000 [22:54<08:28,  1.09it/s]

Error extracting text from http://www.nytimes.com/2015/09/17/us/politics/obama-hints-at-sanctions-against-china-over-cyberattacks.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/17/us/politics/obama-hints-at-sanctions-against-china-over-cyberattacks.html


Processing URLs:  45%|████▍     | 449/1000 [22:59<09:59,  1.09s/it]

Error extracting text from http://pressroom.toyota.com/releases/2016-aiada-annual-meeting-lentz.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/2016-aiada-annual-meeting-lentz/


Processing URLs:  46%|████▌     | 457/1000 [23:13<11:38,  1.29s/it]

Error extracting text from http://dailykanban.com/2014/11/brokerage-tesla-sitting-3000-unsold-cars/: 403 Client Error: Forbidden for url: https://dailykanban.com/2014/11/brokerage-tesla-sitting-3000-unsold-cars/


Processing URLs:  46%|████▌     | 458/1000 [23:13<09:54,  1.10s/it]

Error extracting text from https://www.france24.com/en/europe/20210321-several-german-states-seek-to-extend-restrictions-amid-third-wave-of-covid-19: 403 Client Error: Forbidden for url: https://www.france24.com/en/europe/20210321-several-german-states-seek-to-extend-restrictions-amid-third-wave-of-covid-19


Processing URLs:  46%|████▌     | 462/1000 [23:20<10:49,  1.21s/it]

Error extracting text from http://www.reuters.com/video/2016/05/31/north-korea-missile-launch-attempt-fails?videoId=368706402: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2016/05/31/north-korea-missile-launch-attempt-fails?videoId=368706402
Error extracting text from http://www.reuters.com/article/us-peru-election-mining-idUSKCN0YM1R1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-mining-idUSKCN0YM1R1


Processing URLs:  46%|████▋     | 465/1000 [23:23<09:23,  1.05s/it]

Error extracting text from http://www.nytimes.com/reuters/2016/03/09/world/middleeast/09reuters-iran-missiles.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/03/09/world/middleeast/09reuters-iran-missiles.html


Processing URLs:  47%|████▋     | 471/1000 [23:32<10:22,  1.18s/it]

Error extracting text from http://www.reuters.com/article/2015/06/12/us-ukraine-crisis-nato-poland-idUSKBN0OS0Y120150612: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/06/12/us-ukraine-crisis-nato-poland-idUSKBN0OS0Y120150612


Processing URLs:  48%|████▊     | 477/1000 [23:39<11:12,  1.29s/it]

Error extracting text from https://kelo.com/2021/11/11/pressure-mounts-as-u-n-climate-negotiations-enter-final-day/: 403 Client Error: Forbidden for url: https://kelo.com/2021/11/11/pressure-mounts-as-u-n-climate-negotiations-enter-final-day/


Processing URLs:  48%|████▊     | 479/1000 [23:40<07:43,  1.12it/s]

Error extracting text from http://www.thelocal.es/20160407/three-way-coalition-talks-launched-as-clock-ticks-in-spain: 403 Client Error: Forbidden for url: https://www.thelocal.es/20160407/three-way-coalition-talks-launched-as-clock-ticks-in-spain


Processing URLs:  48%|████▊     | 482/1000 [23:54<20:48,  2.41s/it]

Error extracting text from https://www.maritime-executive.com/article/china-plans-to-loosen-its-coast-guard-s-rules-on-use-of-force: 403 Client Error: Forbidden for url: https://www.maritime-executive.com/article/china-plans-to-loosen-its-coast-guard-s-rules-on-use-of-force
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-shoigu-idUSKBN12W3EE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-shoigu-idUSKBN12W3EE


Processing URLs:  48%|████▊     | 484/1000 [23:55<12:30,  1.45s/it]

Error extracting text from http://www.realcleardefense.com/articles/2016/02/24/connecting_the_dots_su-35s_over_taipei_dprk_rockets_over_okinawa_109077.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/02/24/connecting_the_dots_su-35s_over_taipei_dprk_rockets_over_okinawa_109077.html


Processing URLs:  49%|████▊     | 487/1000 [23:58<09:06,  1.06s/it]

Error extracting text from http://www.scotsman.com/news/politics/scottish-independence-would-bring-five-years-of-cuts-says-snp-mp-1-4181483: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-independence-would-bring-five-years-of-cuts-says-snp-mp-1-4181483


Processing URLs:  49%|████▉     | 489/1000 [24:00<07:17,  1.17it/s]

Error extracting text from https://www.nytimes.com/2017/07/05/world/asia/private-hansen-kirkpatrick-killed-afghanistan.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/05/world/asia/private-hansen-kirkpatrick-killed-afghanistan.html?_r=0


Processing URLs:  49%|████▉     | 492/1000 [24:04<10:06,  1.19s/it]

URL filtered: https://www.youtube.com/watch?v=hlmsjstN6Aw


Processing URLs:  49%|████▉     | 494/1000 [24:06<09:47,  1.16s/it]

Error extracting text from https://www.justsecurity.org/46663/public-lesson-russian-strategic-deception-its-hear/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/46663/public-lesson-russian-strategic-deception-its-hear/


Processing URLs:  50%|████▉     | 498/1000 [24:25<35:20,  4.22s/it]

Error extracting text from https://www.elisascience.org/articles/elisa-mission/mission-concept: 404 Client Error: Not Found for url: https://www.elisascience.org/articles/elisa-mission/mission-concept


Processing URLs:  50%|█████     | 500/1000 [24:28<23:11,  2.78s/it]

Error extracting text from https://www.axios.com/trump-softens-on-nafta-1515969352-841775cc-6a83-44a3-85b4-cb5beea01c2f.html: 403 Client Error: Forbidden for url: https://www.axios.com/trump-softens-on-nafta-1515969352-841775cc-6a83-44a3-85b4-cb5beea01c2f.html


Processing URLs:  51%|█████     | 510/1000 [24:58<18:23,  2.25s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-israel-russia-idUSKBN16C0H5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-israel-russia-idUSKBN16C0H5


Processing URLs:  51%|█████▏    | 514/1000 [25:00<08:19,  1.03s/it]

Error extracting text from https://uk.reuters.com/article/us-alphabet-uber-lawsuit/u-s-judge-deals-setback-to-waymo-damage-claim-in-uber-lawsuit-idUKKBN1D32J0?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://twitter.com/MosulEye
Error extracting text from http://www.reuters.com/article/us-china-japan-idUSKCN0XR02H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-japan-idUSKCN0XR02H


Processing URLs:  52%|█████▏    | 515/1000 [25:01<06:30,  1.24it/s]

Error extracting text from https://borneobulletin.com.bn/south-africas-zuma-faces-another-no-confidence-vote/: 403 Client Error: Forbidden for url: https://borneobulletin.com.bn/south-africas-zuma-faces-another-no-confidence-vote/


Processing URLs:  52%|█████▏    | 517/1000 [25:04<09:15,  1.15s/it]

Error extracting text from https://oic2016istanbulsummit.org/programme-of-work/: 404 Client Error: Not Found for url: https://oic2016istanbulsummit.org/programme-of-work/


Processing URLs:  52%|█████▏    | 522/1000 [25:13<13:01,  1.64s/it]

Error extracting text from http://www.ibtimes.co.uk/ethiopia-oromo-protests-will-continue-unless-government-ceases-killings-torture-1545199: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/ethiopia-oromo-protests-will-continue-unless-government-ceases-killings-torture-1545199


Processing URLs:  52%|█████▏    | 524/1000 [25:14<08:34,  1.08s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-gigafactory-idUSKCN10G2E2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-gigafactory-idUSKCN10G2E2


Processing URLs:  53%|█████▎    | 530/1000 [25:37<24:44,  3.16s/it]

Error extracting text from http://mobile.tasnimnews.com/en/news/1393/01/01/860587: HTTPConnectionPool(host='mobile.tasnimnews.com', port=80): Max retries exceeded with url: /en/news/1393/01/01/860587 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x305866630>: Failed to resolve 'mobile.tasnimnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  53%|█████▎    | 533/1000 [25:41<15:58,  2.05s/it]

Error extracting text from https://www.reuters.com/world/middle-east/iran-fails-fully-honour-agreement-monitoring-equipment-iaea-says-2021-09-26/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/iran-fails-fully-honour-agreement-monitoring-equipment-iaea-says-2021-09-26/


Processing URLs:  54%|█████▍    | 541/1000 [25:53<08:23,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-credit-idUSKCN11D2RZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-credit-idUSKCN11D2RZ
URL filtered: https://www.bloomberg.com/news/articles/2022-05-06/china-rejects-its-exclusion-from-wto-vaccine-waiver-proposal


Processing URLs:  55%|█████▌    | 550/1000 [26:14<22:17,  2.97s/it]

Error extracting text from https://thehill.com/policy/healthcare/569773-us-intel-review-inconclusive-on-covid-19-origin: 403 Client Error: Forbidden for url: https://thehill.com/policy/healthcare/569773-us-intel-review-inconclusive-on-covid-19-origin/
URL filtered: http://www.bloomberg.com/news/articles/2016-03-13/iran-on-oil-freeze-leave-us-alone-until-production-higher


Processing URLs:  56%|█████▌    | 556/1000 [26:21<09:42,  1.31s/it]

Error extracting text from https://en.zamanalwsl.net/mobile/readNews.php?id=12940: 403 Client Error: Forbidden for url: https://en.zamanalwsl.net/mobile/readNews.php?id=12940


Processing URLs:  56%|█████▌    | 559/1000 [26:25<08:40,  1.18s/it]

Error extracting text from http://dallasmorningviewsblog.dallasnews.com/2016/01/is-ted-cruz-who-he-says-he-is-or-is-he-posing.html/: 404 Client Error: Not Found for url: http://dallasmorningviewsblog.dallasnews.com/2016/01/is-ted-cruz-who-he-says-he-is-or-is-he-posing.html/


Processing URLs:  56%|█████▌    | 562/1000 [26:29<08:50,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-images-idUSKCN10K08P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-images-idUSKCN10K08P


Processing URLs:  56%|█████▋    | 564/1000 [26:33<12:53,  1.77s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-17/yellen-may-emulate-taper-template-and-raise-rates-in-december


Processing URLs:  57%|█████▋    | 571/1000 [26:45<16:00,  2.24s/it]

URL filtered: https://www.youtube.com/watch?v=KKIABKa0ISM


Processing URLs:  57%|█████▋    | 573/1000 [26:45<09:43,  1.37s/it]

URL filtered: https://www.politico.com/amp/news/2020/12/16/federal-reserve-recession-vaccine-446774?__twitter_impression=true


Processing URLs:  57%|█████▊    | 575/1000 [26:46<06:51,  1.03it/s]

Error extracting text from https://www.vesselfinder.com/news/4778-Panama-Canal-Expansion-95-Percent-Complete-Video: 403 Client Error: Forbidden for url: https://www.vesselfinder.com/news/4778-Panama-Canal-Expansion-95-Percent-Complete-Video


Processing URLs:  58%|█████▊    | 577/1000 [26:48<06:34,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-pmdb-idUSKCN0VQ2LX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-pmdb-idUSKCN0VQ2LX


Processing URLs:  58%|█████▊    | 581/1000 [26:50<05:07,  1.36it/s]

Error extracting text from http://n.neurology.org/content/43/2/301: 403 Client Error: Forbidden for url: http://n.neurology.org/content/43/2/301


Processing URLs:  58%|█████▊    | 582/1000 [26:51<04:43,  1.47it/s]

Error extracting text from http://www.japantimes.co.jp/news/2017/07/13/asia-pacific/moon-adviser-proposes-five-way-talks-north-koreas-nuclear-program/#.WWf-n9Pytt8: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2017/07/13/asia-pacific/moon-adviser-proposes-five-way-talks-north-koreas-nuclear-program/#.WWf-n9Pytt8


Processing URLs:  58%|█████▊    | 583/1000 [26:51<03:50,  1.81it/s]

Error extracting text from http://news.yahoo.com/ex-tanzania-president-named-burundi-crisis-mediator-153038239.html: 404 Client Error: Not Found for url: http://news.yahoo.com/ex-tanzania-president-named-burundi-crisis-mediator-153038239.html


Processing URLs:  58%|█████▊    | 585/1000 [26:54<05:35,  1.24it/s]

Error extracting text from http://www.iol.co.za/news/politics/zuma-is-staying-put-says-anc-nec-2056806: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/zuma-is-staying-put-says-anc-nec-2056806


Processing URLs:  59%|█████▉    | 588/1000 [26:58<08:40,  1.26s/it]

Error extracting text from http://www.consensuseconomics.com/Forecast_Surveys/Real_Interest_Rate_Forecasts.htm: 403 Client Error: Forbidden for url: https://www.consensuseconomics.com/forecast-surveys/real-interest-rate-forecasts/


Processing URLs:  59%|█████▉    | 591/1000 [27:02<09:20,  1.37s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-usa-congress/u-s-congress-in-sprint-to-fund-government-approve-covid-19-emergency-aid-idUSKBN28O19Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-usa-congress/u-s-congress-in-sprint-to-fund-government-approve-covid-19-emergency-aid-idUSKBN28O19Z


Processing URLs:  59%|█████▉    | 593/1000 [27:09<15:43,  2.32s/it]

Error extracting text from http://www.parl.gc.ca/housechamberbusiness/chambercalendar.aspx?Key=2017&amp;Language=E&amp;Mode=1&amp;Parl=42&amp;Ses=1: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  59%|█████▉    | 594/1000 [27:11<15:02,  2.22s/it]

Error extracting text from http://www.sec.gov/Archives/edgar/data/1591517/000159151716000039/timeinc4q2015.htm: 403 Client Error: Forbidden for url: http://www.sec.gov/Archives/edgar/data/1591517/000159151716000039/timeinc4q2015.htm


Processing URLs:  60%|██████    | 601/1000 [27:20<09:21,  1.41s/it]

Error extracting text from https://shar.es/1YbzbM: 404 Client Error: Not Found for url: https://shar.es/1YbzbM/


Processing URLs:  60%|██████    | 602/1000 [27:20<07:07,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/07/16/us/politics/health-care-vote-john-mccain.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/16/us/politics/health-care-vote-john-mccain.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  60%|██████    | 604/1000 [27:24<08:52,  1.34s/it]

Error extracting text from https://www.barrons.com/articles/a-bitcoin-etf-is-still-in-the-works-here-are-your-options-in-the-meantime-51618014033: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/a-bitcoin-etf-is-still-in-the-works-here-are-your-options-in-the-meantime-51618014033


Processing URLs:  61%|██████    | 607/1000 [28:27<2:03:43, 18.89s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2018-01-15/russian-officials-move-to-shut-navalnys-foundation: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  61%|██████    | 611/1000 [28:30<33:00,  5.09s/it]

Error extracting text from http://www.reuters.com/article/us-usa-russia-cyber-idUSKBN14I1TY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-cyber-idUSKBN14I1TY


Processing URLs:  62%|██████▏   | 616/1000 [28:42<15:37,  2.44s/it]

Error extracting text from http://www.thecipherbrief.com/article/europe/montenegro-needs-us-nato-help-fend-russians-1091?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=9e2f0dbf76-EMAIL_CAMPAIGN_2017_05_24&amp;utm_medium=email&amp;utm_term=0_02cbee778d-9e2f0dbf76-122492589: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/europe/montenegro-needs-us-nato-help-fend-russians-1091?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=9e2f0dbf76-EMAIL_CAMPAIGN_2017_05_24&amp;utm_medium=email&amp;utm_term=0_02cbee778d-9e2f0dbf76-122492589


Processing URLs:  62%|██████▏   | 618/1000 [28:42<08:31,  1.34s/it]

Error extracting text from https://www.predictit.org/Browse/Group/54/Congress: 403 Client Error: Forbidden for url: https://www.predictit.org/Browse/Group/54/Congress


Processing URLs:  62%|██████▏   | 623/1000 [29:03<34:53,  5.55s/it]

Error extracting text from http://www.almasdarnews.com/article/complete-battle-map-of-syria-october-2015-update/: 522 Server Error:  for url: https://www.almasdarnews.com/article/complete-battle-map-of-syria-october-2015-update/


Processing URLs:  62%|██████▏   | 624/1000 [29:09<36:25,  5.81s/it]

Error extracting text from http://thessismun.org/2014/wp-content/uploads/2012/11/NAC-A.pdf: 404 Client Error: Not Found for url: https://www.thessismun.org/2014/wp-content/uploads/2012/11/NAC-A.pdf


Processing URLs:  63%|██████▎   | 626/1000 [29:15<27:10,  4.36s/it]

Error extracting text from http://www.eaipchina.cn/Version/201508/Html/MaterialVersion.htm#: 404 Client Error: Not Found for url: https://www.eaipchina.cn:443/Version/201508/Html/MaterialVersion.htm


Processing URLs:  63%|██████▎   | 627/1000 [29:15<19:52,  3.20s/it]

Error extracting text from https://theconversation.com/new-zealand-approves-pfizer-vaccine-for-young-people-from-12-to-15-but-theyll-have-to-wait-their-turn-163158: 403 Client Error: Forbidden for url: https://theconversation.com/new-zealand-approves-pfizer-vaccine-for-young-people-from-12-to-15-but-theyll-have-to-wait-their-turn-163158


Processing URLs:  63%|██████▎   | 628/1000 [29:16<14:45,  2.38s/it]

Error extracting text from https://www.theafricareport.com/86352/ethiopia-egypt-the-impressive-feat-of-engineering-that-is-the-gerd/: 403 Client Error: Forbidden for url: https://www.theafricareport.com/86352/ethiopia-egypt-the-impressive-feat-of-engineering-that-is-the-gerd/
URL filtered: https://twitter.com/jbloom_lab/status/1407445604029009923


Processing URLs:  63%|██████▎   | 634/1000 [29:25<10:33,  1.73s/it]

Error extracting text from https://global.handelsblatt.com/politics/a-north-korean-meditator-828469: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/politics/a-north-korean-meditator-828469


Processing URLs:  64%|██████▎   | 635/1000 [29:27<10:22,  1.71s/it]

Error extracting text from http://www.understandingwar.org/sites/default/files/Campaign%20for%20Mosul%20Map%20Turkey.pdf: 404 Client Error: Not Found for url: https://www.understandingwar.org/sites/default/files/Campaign%20for%20Mosul%20Map%20Turkey.pdf


Processing URLs:  64%|██████▎   | 636/1000 [29:28<09:47,  1.61s/it]

Error extracting text from https://chinapower.csis.org/military-spending/: 403 Client Error: Forbidden for url: https://chinapower.csis.org/military-spending/


Processing URLs:  64%|██████▎   | 637/1000 [29:30<10:23,  1.72s/it]

URL filtered: http://www.bloombergview.com/articles/2016-01-17/prisoner-swap-may-help-iran-arm-assad


Processing URLs:  64%|██████▍   | 639/1000 [29:32<08:18,  1.38s/it]

Error extracting text from http://www.ibtimes.com/singles-day-2015-iphone-6s-apple-watch-could-make-apple-inc-big-winner-chinas-black-2177799: 403 Client Error: Forbidden for url: https://www.ibtimes.com/singles-day-2015-iphone-6s-apple-watch-could-make-apple-inc-big-winner-chinas-black-2177799


Processing URLs:  64%|██████▍   | 645/1000 [29:49<15:34,  2.63s/it]

Error extracting text from http://Satan.Khanenei: HTTPConnectionPool(host='satan.khanenei', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe55d430>: Failed to resolve 'satan.khanenei' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  65%|██████▍   | 649/1000 [29:51<06:42,  1.15s/it]

Error extracting text from http://www.wsj.com/articles/u-s-china-divided-over-response-to-north-koreas-nuclear-program-1453884250: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-china-divided-over-response-to-north-koreas-nuclear-program-1453884250


Processing URLs:  65%|██████▌   | 650/1000 [30:51<1:39:23, 17.04s/it]

Error extracting text from http://www.seattletimes.com/business/boeing-aerospace/analysis-boeing-jobs-threat-over-ex-im-bank-stalemate-isnt-all-bluster/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  65%|██████▌   | 651/1000 [30:56<1:19:03, 13.59s/it]

Error extracting text from http://www.theaustralian.com.au/business/wall-street-journal/iran-nuclear-deals-promise-is-fading-as-middle-east-rifts-widen/story-fnay3ubk-1227559479874: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/business/wall-street-journal/iran-nuclear-deals-promise-is-fading-as-middle-east-rifts-widen/story-fnay3ubk-1227559479874?nk=ccb31bf4f533f97561f376eb77da1cdf-1706842460


Processing URLs:  65%|██████▌   | 654/1000 [30:57<29:43,  5.15s/it]

Error extracting text from http://www.reuters.com/article/us-opec-meeting-idUSKBN0TM30B20151205: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting-idUSKBN0TM30B20151205


Processing URLs:  66%|██████▌   | 656/1000 [31:00<17:55,  3.13s/it]

Error extracting text from https://www.stripes.com/news/trump-administration-plans-to-certify-iranian-compliance-with-nuclear-agreement-1.477878#.WWfSpzMfn-Y: 404 Client Error: Not Found for url: https://www.stripes.com/news/trump-administration-plans-to-certify-iranian-compliance-with-nuclear-agreement-1.477878#.WWfSpzMfn-Y


Processing URLs:  66%|██████▌   | 658/1000 [31:02<11:35,  2.03s/it]

Error extracting text from http://boingboing.net/2016/02/17/russian-central-bank-shutting.html: 403 Client Error: Forbidden for url: https://boingboing.net/2016/02/17/russian-central-bank-shutting.html


Processing URLs:  66%|██████▌   | 660/1000 [31:05<09:50,  1.74s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0047272798000425: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0047272798000425


Processing URLs:  67%|██████▋   | 666/1000 [31:23<14:18,  2.57s/it]

Error extracting text from http://www.asahi.com/sp/ajw/articles/AJ201605190028.html: 404 Client Error: Not Found for url: https://www.asahi.com/sp/ajw/articles/AJ201605190028.html
URL filtered: https://twitter.com/realdonaldtrump/status/144148110799675392?lang=en


Processing URLs:  67%|██████▋   | 669/1000 [31:27<10:44,  1.95s/it]

Error extracting text from http://tass.ru/en/world/893204: 404 Client Error: Not Found for url: https://tass.ru/en/world/893204


Processing URLs:  67%|██████▋   | 670/1000 [31:28<08:34,  1.56s/it]

Error extracting text from https://www.ipsos.com/en/kenya-2017-poll-two-horse-race?language_content_entity=en: 403 Client Error: Forbidden for url: https://www.ipsos.com/en/kenya-2017-poll-two-horse-race?language_content_entity=en


Processing URLs:  67%|██████▋   | 674/1000 [31:40<12:32,  2.31s/it]

URL filtered: https://www.metaculus.com/questions/4733/will-stefan-molyneux-receive-a-long-term-twitter-ban-before-2021/


Processing URLs:  68%|██████▊   | 677/1000 [31:41<06:07,  1.14s/it]

Error extracting text from http://www.un.org/en/sc/about/rules/chapter1.shtml: 403 Client Error: Forbidden for url: https://www.un.org/en/sc/about/rules/chapter1.shtml
Error extracting text from http://www.reuters.com/article/us-usa-china-southchinasea-exclusive-idUSKBN1AQ0YK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-china-southchinasea-exclusive-idUSKBN1AQ0YK


Processing URLs:  68%|██████▊   | 678/1000 [31:41<04:42,  1.14it/s]

Error extracting text from https://www.sciencedaily.com/releases/2015/08/150817132556.htm: 403 Client Error: Forbidden for url: https://www.sciencedaily.com/releases/2015/08/150817132556.htm


Processing URLs:  68%|██████▊   | 679/1000 [31:41<04:08,  1.29it/s]

Error extracting text from http://www.huffingtonpost.ca/2017/03/08/marijuana-stock-prices-canada_n_15239898.html: 502 Server Error: Bad Gateway for url: https://www.huffingtonpost.ca/2017/03/08/marijuana-stock-prices-canada_n_15239898.html
URL filtered: https://twitter.com/DmytroKuleba/status/1497496218359898116


Processing URLs:  68%|██████▊   | 682/1000 [32:42<1:17:57, 14.71s/it]

Error extracting text from https://www2.southeastern.edu/Academics/Faculty/rallain/plab194/error.html: HTTPSConnectionPool(host='www2.southeastern.edu', port=443): Max retries exceeded with url: /Academics/Faculty/rallain/plab194/error.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30601caa0>, 'Connection to www2.southeastern.edu timed out. (connect timeout=60)'))


Processing URLs:  68%|██████▊   | 684/1000 [32:45<45:42,  8.68s/it]

Error extracting text from http://www.local10.com/news/politics/us-official-navy-aircraft-threatened-with-shoot-down-by-iran: 404 Client Error: Not Found for url: https://www.local10.com/news/politics/us-official-navy-aircraft-threatened-with-shoot-down-by-iran/
Error extracting text from http://www.financialexpress.com/article/economy/john-kerry-us-wont-block-foreign-business-deals-under-iran-nuke-deal/241853/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/economy/john-kerry-us-wont-block-foreign-business-deals-under-iran-nuke-deal/241853/


Processing URLs:  69%|██████▉   | 688/1000 [32:52<16:43,  3.22s/it]

Error extracting text from http://www.reuters.com/article/us-russia-turkey-europe-minister-idUSKCN10L0S9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-turkey-europe-minister-idUSKCN10L0S9


Processing URLs:  69%|██████▉   | 693/1000 [33:10<13:31,  2.64s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-24/iran-says-opec-action-on-output-cuts-must-address-libya-nigeria


Processing URLs:  70%|██████▉   | 695/1000 [33:10<07:30,  1.48s/it]

Error extracting text from https://www.reuters.com/article/us-peru-politics-idUSKBN29N1HT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-politics-idUSKBN29N1HT


Processing URLs:  70%|██████▉   | 698/1000 [33:13<05:45,  1.14s/it]

Error extracting text from http://www.latimes.com/sports/sportsnow/la-sp-sn-zika-virus-advisory-board-20160304-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/sports/sportsnow/la-sp-sn-zika-virus-advisory-board-20160304-story.html


Processing URLs:  70%|███████   | 701/1000 [33:15<04:12,  1.19it/s]

Error extracting text from http://thehill.com/policy/defense/265059-dems-urge-sanctions-after-iran-missile-tests: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/265059-dems-urge-sanctions-after-iran-missile-tests/


Processing URLs:  70%|███████   | 702/1000 [33:16<04:14,  1.17it/s]

Error extracting text from http://www.businessinsider.com/petrobras-scandal-arrests-hurt-brazil-president-dilma-rousseff-2015-11: 404 Client Error: Not Found for url: https://www.businessinsider.com/petrobras-scandal-arrests-hurt-brazil-president-dilma-rousseff-2015-11


Processing URLs:  71%|███████   | 706/1000 [33:21<04:40,  1.05it/s]

Error extracting text from http://data.unhcr.org/mediterranean/country.php?id=83: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/country.php?id=83
Error extracting text from http://www.nytimes.com/2016/09/18/world/middleeast/his-position-still-secure-bashar-al-assad-smiles-as-syria-burns.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/18/world/middleeast/his-position-still-secure-bashar-al-assad-smiles-as-syria-burns.html?_r=0


Processing URLs:  71%|███████   | 707/1000 [33:22<05:10,  1.06s/it]

Error extracting text from http://www.murc.jp/thinktank/economy/fncm/commodity/como_1607.pdf: 404 Client Error: Not Found for url: https://www.murc.jp/thinktank/economy/fncm/commodity/como_1607.pdf


Processing URLs:  71%|███████   | 709/1000 [33:24<04:56,  1.02s/it]

Error extracting text from https://www.npd.com/news/press-releases/2021/after-a-slow-start--u-s--print-book-sales-rose-8-2-percent-in-2020--the-npd-group-says/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /news/press-releases/2021/after-a-slow-start--u-s--print-book-sales-rose-8-2-percent-in-2020--the-npd-group-says/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))


Processing URLs:  71%|███████▏  | 714/1000 [33:30<05:13,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-china-northkorea-idUSKCN0X20U4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-northkorea-idUSKCN0X20U4


Processing URLs:  72%|███████▏  | 719/1000 [33:36<04:54,  1.05s/it]

Error extracting text from http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/research_report_2_secretary_general_appointment2015.pdf: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/research_report_2_secretary_general_appointment2015.pdf


Processing URLs:  72%|███████▏  | 723/1000 [33:39<03:40,  1.26it/s]

Error extracting text from http://news.softpedia.com/news/apple-sees-weaker-demand-for-iphone-6s-reduces-component-orders-497077.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/apple-sees-weaker-demand-for-iphone-6s-reduces-component-orders-497077.shtml
Error extracting text from http://www.nytimes.com/2016/06/02/world/asia/treasury-imposes-sanctions-on-north-korea.html?smprod=nytcore-iphone&amp;smid=nytcore-iphone-share: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/02/world/asia/treasury-imposes-sanctions-on-north-korea.html?smprod=nytcore-iphone&amp;smid=nytcore-iphone-share


Processing URLs:  73%|███████▎  | 729/1000 [33:51<08:05,  1.79s/it]

Error extracting text from https://reut.rs/34M2rhn: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/china-us-will-work-issues-producers-consumers-next-step-ministry-2021-06-03/
URL filtered: https://www.youtube.com/watch?v=8e2fJfiddx4


Processing URLs:  73%|███████▎  | 731/1000 [33:53<06:17,  1.40s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-21/oil-holds-slide-into-bear-market-as-libya-to-u-s-raise-supply


Processing URLs:  74%|███████▎  | 736/1000 [34:00<06:52,  1.56s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-21/imf-may-not-approve-ukraine-s-tranche-in-july-minister-says


Processing URLs:  74%|███████▍  | 738/1000 [34:08<10:35,  2.42s/it]

Error extracting text from http://www.dprktoday.com: HTTPConnectionPool(host='www.dprktoday.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3081a4aa0>: Failed to establish a new connection: [Errno 51] Network is unreachable'))


Processing URLs:  74%|███████▍  | 740/1000 [34:09<07:06,  1.64s/it]

Error extracting text from http://www.reuters.com/article/us-usa-fed-rosengren-idUSKBN16T06T?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fed-rosengren-idUSKBN16T06T?il=0
Error extracting text from http://www.nato.int/cps/en/natolive/topics_49187.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_49187.htm
URL filtered: http://www.bloomberg.com/news/articles/2014-06-25/turkey-sells-200-tons-of-secret-gold-to-iran


Processing URLs:  75%|███████▍  | 747/1000 [34:15<04:39,  1.10s/it]

Error extracting text from http://www.nytimes.com/2015/09/29/world/middleeast/obama-and-putin-clash-at-un-on-mideast-crisis.html?ref=middleeast: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/29/world/middleeast/obama-and-putin-clash-at-un-on-mideast-crisis.html?ref=middleeast


Processing URLs:  75%|███████▍  | 749/1000 [34:16<03:54,  1.07it/s]

Error extracting text from https://www.wsj.com/articles/trump-administration-says-iran-complying-with-nuclear-deal-1492573995: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-administration-says-iran-complying-with-nuclear-deal-1492573995


Processing URLs:  75%|███████▌  | 752/1000 [34:18<02:19,  1.77it/s]

Error extracting text from http://www.reuters.com/article/us-usa-court-gorsuch-idUSKBN16S10L?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-gorsuch-idUSKBN16S10L?il=0
Error extracting text from https://www.barrons.com/articles/federal-reserve-stocks-tapering-51627939038: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/federal-reserve-stocks-tapering-51627939038


Processing URLs:  76%|███████▌  | 755/1000 [34:22<04:34,  1.12s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=22947&amp;Kw1=Polisario+Front&amp;Kw2=Morocco&amp;Kw3=#.WeZdGIRX9FI: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=22947&amp;Kw1=Polisario+Front&amp;Kw2=Morocco&amp;Kw3=#.WeZdGIRX9FI
Error extracting text from https://www.nord-stream2.com/media-info/news-events/the-first-nord-stream-2-string-filled-with-technical-gas-154/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /media-info/news-events/the-first-nord-stream-2-string-filled-with-technical-gas-154/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3082fc7d0>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  76%|███████▌  | 757/1000 [34:23<03:00,  1.35it/s]

Error extracting text from http://thehill.com/blogs/pundits-blog/international-affairs/312796-trump-will-need-democrats-help-to-rework-nafta: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/international-affairs/312796-trump-will-need-democrats-help-to-rework-nafta/


Processing URLs:  76%|███████▌  | 760/1000 [34:26<03:29,  1.15it/s]

Error extracting text from https://www.reuters.com/article/us-russia-usa-ambassador-idUSKBN2B92N1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-ambassador-idUSKBN2B92N1


Processing URLs:  76%|███████▋  | 763/1000 [34:30<04:22,  1.11s/it]

Error extracting text from https://www.scotsman.com/news/politics/scottish-election-2021-alex-salmonds-party-on-track-to-win-eight-seats-poll-suggests-3215409: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-2021-alex-salmonds-party-on-track-to-win-eight-seats-poll-suggests-3215409


Processing URLs:  76%|███████▋  | 765/1000 [34:32<03:36,  1.09it/s]

Error extracting text from http://m.washingtontimes.com/news/2015/dec/22/l-todd-wood-north-korea-shows-new-icbm/: 403 Client Error: Forbidden for url: http://m.washingtontimes.com/news/2015/dec/22/l-todd-wood-north-korea-shows-new-icbm/


Processing URLs:  77%|███████▋  | 769/1000 [34:38<04:51,  1.26s/it]

Error extracting text from http://nationalinterest.org/feature/tanmen-militia-china%E2%80%99s-maritime-rights-protection-vanguard-12816?page=show: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/tanmen-militia-china%E2%80%99s-maritime-rights-protection-vanguard-12816?page=show


Processing URLs:  77%|███████▋  | 770/1000 [34:39<04:44,  1.23s/it]

Error extracting text from https://www.reuters.com/world/europe/regret-defiance-europes-vaccine-shy-east-covid-19-rages-2021-10-21).: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/regret-defiance-europes-vaccine-shy-east-covid-19-rages-2021-10-21).


Processing URLs:  78%|███████▊  | 778/1000 [34:57<10:14,  2.77s/it]

Error extracting text from https://www.almasdarnews.com/article/syrian-army-doubles-its-territory-in-aleppo-over-5-months-due-to-russian-airstrikes-map-update/: 522 Server Error:  for url: https://www.almasdarnews.com/article/syrian-army-doubles-its-territory-in-aleppo-over-5-months-due-to-russian-airstrikes-map-update/
URL filtered: https://twitter.com/faisalislam/status/1469350565616496642
Error extracting text from https://www.yahoo.com/news/polands-ruling-party-leader-vague-support-eus-tusk-142931455.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/polands-ruling-party-leader-vague-support-eus-tusk-142931455.html


Processing URLs:  78%|███████▊  | 781/1000 [35:02<07:18,  2.00s/it]

Error extracting text from http://www.npr.org/sections/goatsandsoda/2017/03/24/520547986/nobody-wants-to-drop-food-from-a-plane-but-its-happening: 500 Server Error: Internal Server Error for url: https://www.npr.org/sections/goatsandsoda/2017/03/24/520547986/nobody-wants-to-drop-food-from-a-plane-but-its-happening


Processing URLs:  78%|███████▊  | 783/1000 [35:04<05:19,  1.47s/it]

Error extracting text from https://www.predictit.org/Contract/523/Will-Joe-Biden-run-for-president-in-2016#openoffers1: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/523/Will-Joe-Biden-run-for-president-in-2016#openoffers1


Processing URLs:  79%|███████▉  | 789/1000 [35:14<05:28,  1.56s/it]

Error extracting text from http://www.cfr.org/human-rights/understanding-myanmar/p14385: 404 Client Error: Not Found for url: https://www.cfr.org/human-rights/understanding-myanmar/p14385


Processing URLs:  79%|███████▉  | 792/1000 [35:15<02:28,  1.40it/s]

Error extracting text from http://www.nytimes.com/2013/01/30/business/fda-approves-genetic-drug-to-treat-rare-disease.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2013/01/30/business/fda-approves-genetic-drug-to-treat-rare-disease.html
Error extracting text from https://www.axios.com/tax-reform-shocker-the-white-house-actually-has-a-plan-2460447660.html: 403 Client Error: Forbidden for url: https://www.axios.com/tax-reform-shocker-the-white-house-actually-has-a-plan-2460447660.html
URL filtered: https://www.bloomberg.com/news/articles/2021-08-19/u-s-court-drops-trade-duty-reviews-as-lumber-dispute-roars-on
Error extracting text from https://www.ameren.com/: 403 Client Error: Forbidden for url: https://www.ameren.com/


Processing URLs:  81%|████████  | 810/1000 [35:54<02:58,  1.06it/s]

Error extracting text from https://www.predictit.org/Contract/5495/Will-Uhuru-Kenyatta-be-re-elected-president-of-Kenya-in-2017#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/5495/Will-Uhuru-Kenyatta-be-re-elected-president-of-Kenya-in-2017#data
Error extracting text from http://www.nytimes.com/2004/10/03/washington/us/the-nuclear-card-the-aluminum-tube-story-a-special-report-how.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2004/10/03/washington/us/the-nuclear-card-the-aluminum-tube-story-a-special-report-how.html?_r=0


Processing URLs:  81%|████████  | 811/1000 [35:54<02:47,  1.13it/s]

Error extracting text from http://churchandstate.org.uk/2016/12/vaticans-political-influence-on-population-growth-control/: 403 Client Error: Forbidden for url: http://churchandstate.org.uk/2016/12/vaticans-political-influence-on-population-growth-control/


Processing URLs:  81%|████████  | 812/1000 [36:54<51:10, 16.33s/it]

Error extracting text from http://www.seattletimes.com/nation-world/senator-predicts-sanctions-against-russia-warns-trump-against-veto/: HTTPConnectionPool(host='www.seattletimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  81%|████████▏ | 813/1000 [36:55<37:19, 11.98s/it]

Error extracting text from http://www.reuters.com/article/2015/10/20/us-usa-oilexports-refiners-idUSKCN0SE2VW20151020#GGWFh4f0B73PEiMV.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/20/us-usa-oilexports-refiners-idUSKCN0SE2VW20151020#GGWFh4f0B73PEiMV.97
Error extracting text from https://www.reuters.com/world/russia-opens-new-investigations-into-jailed-kremlin-critic-navalny-navalnys-2021-05-25/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/russia-opens-new-investigations-into-jailed-kremlin-critic-navalny-navalnys-2021-05-25/


Processing URLs:  82%|████████▏ | 817/1000 [37:03<15:33,  5.10s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/china-military-news/2016-02/22/content_6922983.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/china-military-news/2016-02/22/content_6922983.htm


Processing URLs:  82%|████████▏ | 821/1000 [37:13<08:28,  2.84s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-03/tesla-bulls-can-exhale-as-musk-on-time-with-dawn-of-model-3-era


Processing URLs:  83%|████████▎ | 827/1000 [37:19<03:29,  1.21s/it]

Error extracting text from http://www.chicagotribune.com/sns-wp-abe-fd7303d6-8949-11e5-be39-0034bb576eee-20151112-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/sns-wp-abe-fd7303d6-8949-11e5-be39-0034bb576eee-20151112-story.html


Processing URLs:  83%|████████▎ | 831/1000 [37:21<01:51,  1.51it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-07/u-s-army-corps-to-grant-dakota-access-oil-pipeline-easement
Error extracting text from http://asiafoundation.org/in-asia/2015/09/09/tpp-and-rcep-boon-or-bane-for-asean/: 403 Client Error: Forbidden for url: http://asiafoundation.org/in-asia/2015/09/09/tpp-and-rcep-boon-or-bane-for-asean/


Processing URLs:  84%|████████▎ | 835/1000 [37:22<01:05,  2.51it/s]

URL filtered: https://www.businessinsider.com/facebook-digital-currency-to-finally-launch-q1-2021-2
Error extracting text from http://www.nytimes.com/2016/12/16/your-money/wall-streets-annual-stock-forecasts-bullish-and-often-wrong.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/16/your-money/wall-streets-annual-stock-forecasts-bullish-and-often-wrong.html


Processing URLs:  84%|████████▎ | 837/1000 [37:23<00:49,  3.29it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://noticias.terra.com.br/brasil/politica/lava-jato/delacao-de-delcidio-confirma-nossa-tese-diz-autora-do-pedido-de-impeachment-de-dilma,5593a80bdfe258c43767432409ee0954s06xtgyr.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://noticias.terra.com.br/brasil/politica/lava-jato/delacao-de-delcidio-confirma-nossa-tese-diz-autora-do-pedido-de-impeachment-de-dilma,5593a80bdfe258c43767432409ee0954s06xtgyr.html&amp;prev=search


Processing URLs:  84%|████████▍ | 838/1000 [37:23<00:45,  3.57it/s]

Error extracting text from https://www.yahoo.com/news/pentagon-sends-legendary-b-52-bomber-action-against-182402057.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/pentagon-sends-legendary-b-52-bomber-action-against-182402057.html


Processing URLs:  84%|████████▍ | 839/1000 [37:26<02:33,  1.05it/s]

Error extracting text from http://www.kabultribune.com/index.php/2016/09/18/wj-summons-icoic-members-on-electoral-reform-decree/: 436 Client Error:  for url: http://ww16.kabultribune.com/index.php/2016/09/18/wj-summons-icoic-members-on-electoral-reform-decree/?sub1=20240202-1400-5135-af0b-fac4ba156e05


Processing URLs:  84%|████████▍ | 840/1000 [37:29<04:08,  1.55s/it]

Error extracting text from http://www.ambafrance-ir.org/: 429 Client Error: Too Many Requests for url: https://ir.ambafrance.org/


Processing URLs:  84%|████████▍ | 843/1000 [37:38<06:48,  2.60s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/keiko-fujimori-acepta-debate-presidencial-ppk-piura-noticia-1900816: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/keiko-fujimori-acepta-debate-presidencial-ppk-piura-noticia-1900816/


Processing URLs:  84%|████████▍ | 845/1000 [37:40<04:37,  1.79s/it]

Error extracting text from https://www.reuters.com/business/energy/oil-prices-march-again-tight-market-us-crude-7-yr-high-2021-10-25/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-prices-march-again-tight-market-us-crude-7-yr-high-2021-10-25/


Processing URLs:  85%|████████▍ | 848/1000 [37:46<04:14,  1.67s/it]

Error extracting text from http://www.pnas.org/content/112/49/E6736.abstract: 403 Client Error: Forbidden for url: https://www.pnas.org/content/112/49/E6736.abstract
URL filtered: https://www.bloomberg.com/news/articles/2017-08-23/trump-shutdown-threat-complicates-congress-s-debt-ceiling-plans


Processing URLs:  86%|████████▋ | 863/1000 [38:07<02:17,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-iran-europe-rouhani-economy-idUSKCN0V411Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-europe-rouhani-economy-idUSKCN0V411Q


Processing URLs:  86%|████████▋ | 864/1000 [38:08<02:20,  1.03s/it]

URL filtered: https://twitter.com/eci_ttip


Processing URLs:  87%|████████▋ | 868/1000 [38:16<03:24,  1.55s/it]

Error extracting text from http://www.kurdistan24.net/en/news/f591f624-70e0-4f4a-9d30-ca8772ad9cae/Masrour-Barzani---Fight-against-IS-a-long-battle-: 403 Client Error: Forbidden for url: https://www.kurdistan24.net/en/news/f591f624-70e0-4f4a-9d30-ca8772ad9cae/Masrour-Barzani---Fight-against-IS-a-long-battle-


Processing URLs:  87%|████████▋ | 870/1000 [38:32<11:15,  5.19s/it]

Error extracting text from http://www.foreigndesknews.com/news/politics/china-builds-new-military/: 404 Client Error: Not Found for url: https://foreigndesknews.com/news/politics/china-builds-new-military/


Processing URLs:  87%|████████▋ | 873/1000 [38:36<05:27,  2.58s/it]

Error extracting text from http://www.nytimes.com/2016/05/25/business/international/greece-debt-relief-imf-eurozone-bailout.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/25/business/international/greece-debt-relief-imf-eurozone-bailout.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  88%|████████▊ | 878/1000 [38:45<03:08,  1.55s/it]

Error extracting text from https://www.nytimes.com/2021/07/07/health/delta-variant-cdc.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/07/health/delta-variant-cdc.html


Processing URLs:  88%|████████▊ | 879/1000 [38:45<02:27,  1.22s/it]

Error extracting text from http://www.bangkokpost.com/business/news/1080409/rcep-leaders-to-skip-2016-completion-target: 404 Client Error: Not Found for url: https://www.bangkokpost.com/business/general/1080409/rcep-leaders-to-skip-2016-completion-target


Processing URLs:  88%|████████▊ | 882/1000 [38:48<02:10,  1.11s/it]

Error extracting text from https://www.barrons.com/articles/stock-market-forecast-2022-bank-of-america-51638234854: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/stock-market-forecast-2022-bank-of-america-51638234854


Processing URLs:  89%|████████▉ | 890/1000 [38:59<01:53,  1.04s/it]



Processing URLs:  90%|████████▉ | 896/1000 [39:09<02:59,  1.73s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-conflict-un/saudi-arabia-should-fund-all-humanitarian-aid-to-yemen-wfp-idUSKCN1BF22R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-conflict-un/saudi-arabia-should-fund-all-humanitarian-aid-to-yemen-wfp-idUSKCN1BF22R


Processing URLs:  90%|████████▉ | 897/1000 [39:10<02:22,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=d93_u1HHgM4


Processing URLs:  90%|████████▉ | 899/1000 [39:15<03:16,  1.94s/it]

Error extracting text from https://news.brown.edu/articles/2016/08/costs-war: 404 Client Error: Not Found for url: https://www.brown.edu/news/2016-08-09/Costs-of-War


Processing URLs:  90%|█████████ | 903/1000 [39:20<02:11,  1.35s/it]

Error extracting text from https://www.educba.com/aws-competitors/: 403 Client Error: Forbidden for url: https://www.educba.com/aws-competitors/


Processing URLs:  90%|█████████ | 904/1000 [39:21<02:07,  1.33s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-14/wilders-party-slumps-in-shock-dutch-poll-on-eve-of-election


Processing URLs:  91%|█████████ | 907/1000 [39:24<01:40,  1.08s/it]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/geos/dr.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook/geos/dr.html


Processing URLs:  91%|█████████ | 910/1000 [39:29<01:36,  1.08s/it]

Error extracting text from http://www.reuters.com/article/us-eu-turkey-hahn-idUSKCN11F1D7?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-turkey-hahn-idUSKCN11F1D7?il=0


Processing URLs:  91%|█████████ | 911/1000 [39:30<01:49,  1.23s/it]

Error extracting text from http://en.abna24.com/cultural/archive/2016/08/01/769341/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/cultural/archive/2016/08/01/769341/story.html


Processing URLs:  92%|█████████▏| 916/1000 [39:35<01:22,  1.02it/s]

Error extracting text from http://www.iol.co.za/dailynews/news/heavy-price-to-pay-for-freedom-1988897: 403 Client Error: Forbidden for url: http://www.iol.co.za/dailynews/news/heavy-price-to-pay-for-freedom-1988897


Processing URLs:  92%|█████████▏| 919/1000 [39:37<00:58,  1.39it/s]

Error extracting text from https://ottawacitizen.com/opinion/hui-and-li-beijing-is-a-risky-place-for-our-athletes-move-the-2022-winter-olympics: 403 Client Error: Forbidden for url: https://ottawacitizen.com/opinion/hui-and-li-beijing-is-a-risky-place-for-our-athletes-move-the-2022-winter-olympics


Processing URLs:  92%|█████████▏| 920/1000 [39:38<00:57,  1.38it/s]

Error extracting text from https://www.bbc.co.uk/news/magazine-28262541this: 404 Client Error: Not Found for url: https://www.bbc.co.uk/news/magazine-28262541this


Processing URLs:  92%|█████████▏| 921/1000 [39:38<00:52,  1.49it/s]

Error extracting text from http://evobsession.com/europe-electric-car-sales-report-may-2016/: 403 Client Error: Forbidden for url: http://evobsession.com/europe-electric-car-sales-report-may-2016/


Processing URLs:  93%|█████████▎| 927/1000 [39:45<01:20,  1.10s/it]

Error extracting text from http://www.nytimes.com/2015/10/17/world/asia/south-korea-japan-summit-park-geun-hye-shinzo-abe.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/17/world/asia/south-korea-japan-summit-park-geun-hye-shinzo-abe.html


Processing URLs:  93%|█████████▎| 929/1000 [39:47<01:08,  1.04it/s]

Error extracting text from http://www.cdc.gov/zika/geo/active-countries.html: 404 Client Error: Not Found for url: https://www.cdc.gov/zika/geo/active-countries.html


Processing URLs:  93%|█████████▎| 933/1000 [40:00<02:13,  1.99s/it]

Error extracting text from http://www.humanosphere.org/world-politics/2016/09/guterres-extends-lead-in-race-for-next-u-n-secretary-general/: 404 Client Error: Not Found for url: http://www.humanosphere.org/world-politics/2016/09/guterres-extends-lead-in-race-for-next-u-n-secretary-general/


Processing URLs:  94%|█████████▍| 938/1000 [40:27<03:03,  2.95s/it]

Error extracting text from http://www.barrons.com/articles/venezuela-oil-giant-pdvsa-get-s-p-debt-upgrade-1478273616: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-oil-giant-pdvsa-get-s-p-debt-upgrade-1478273616


Processing URLs:  94%|█████████▍| 939/1000 [40:29<02:28,  2.43s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN0TT2PD20151210: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN0TT2PD20151210


Processing URLs:  95%|█████████▍| 949/1000 [40:44<01:14,  1.46s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/lawmakers-russias-violation-nuke-treaty-worsened-42919663: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/lawmakers-russias-violation-nuke-treaty-worsened-42919663


Processing URLs:  95%|█████████▌| 950/1000 [40:46<01:18,  1.58s/it]

Error extracting text from https://www.france24.com/en/live-news/20211225-saudi-led-coalition-launches-large-scale-yemen-operation-after-deadly-strike: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20211225-saudi-led-coalition-launches-large-scale-yemen-operation-after-deadly-strike
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NZGG4Y6KLVR501-2H9DA2TCMKEVSGGSNRC32GGNR5


Processing URLs:  96%|█████████▌| 956/1000 [41:06<03:07,  4.27s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/diplomats-nato-to-invite-montenegro-to-join-alliance/2015/12/01/8fbd3060-9820-11e5-aca6-1ae3be6f06d2_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/diplomats-nato-to-invite-montenegro-to-join-alliance/2015/12/01/8fbd3060-9820-11e5-aca6-1ae3be6f06d2_story.html


Processing URLs:  96%|█████████▌| 958/1000 [41:10<02:13,  3.17s/it]

Error extracting text from http://www.wsj.com/articles/pound-slides-on-interest-rate-expectations-eu-referendum-fears-1451410270: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pound-slides-on-interest-rate-expectations-eu-referendum-fears-1451410270


Processing URLs:  96%|█████████▌| 960/1000 [41:12<01:23,  2.10s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/09/20/world/obama-says-mosul-can-taken-quickly-plans-strategize-iraqi-leader/#.WAExVNxXeEc: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/09/20/world/obama-says-mosul-can-taken-quickly-plans-strategize-iraqi-leader/#.WAExVNxXeEc


Processing URLs:  96%|█████████▌| 961/1000 [41:14<01:18,  2.02s/it]

Error extracting text from http://cphpost.dk/opinion/trading-kingdoms-britains-brexit-dilemma-exit-or-reform.html: 404 Client Error: Not Found for url: http://cphpost.dk/opinion/trading-kingdoms-britains-brexit-dilemma-exit-or-reform.html


Processing URLs:  96%|█████████▋| 963/1000 [41:16<00:54,  1.48s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2016-01-06/cruz-wins-praise-from-pro-ethanol-group-protesting-him


Processing URLs:  97%|█████████▋| 966/1000 [41:19<00:42,  1.24s/it]

URL filtered: https://www.youtube.com/watch?v=-21vccBEZxM


Processing URLs:  97%|█████████▋| 969/1000 [41:20<00:26,  1.19it/s]

Error extracting text from http://bigstory.ap.org/article/ba03d8d925cf4d42b8449f11c3decd40/chinese-military-reaches-out-amid-south-china-sea-tensions: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/ba03d8d925cf4d42b8449f11c3decd40/chinese-military-reaches-out-amid-south-china-sea-tensions (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3021da840>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  97%|█████████▋| 970/1000 [41:21<00:23,  1.30it/s]

Error extracting text from http://thehill.com/homenews/campaign/363336-alabama-poll-jones-leads-moore-by-4-points: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/363336-alabama-poll-jones-leads-moore-by-4-points/


Processing URLs:  97%|█████████▋| 971/1000 [41:24<00:40,  1.41s/it]

Error extracting text from http://www.startribune.com/french-troops-to-increase-for-the-1st-time-in-10-years/365318091/: 404 Client Error: Not Found for url: https://www.startribune.com/french-troops-to-increase-for-the-1st-time-in-10-years/365318091/


Processing URLs:  98%|█████████▊| 979/1000 [41:56<00:56,  2.68s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/international-affairs/286243-the-russian-plan-to-oust-bashar-al-assad: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/international-affairs/286243-the-russian-plan-to-oust-bashar-al-assad/


Processing URLs:  98%|█████████▊| 981/1000 [41:59<00:33,  1.79s/it]

Error extracting text from http://crofsblogs.typepad.com/h5n1/2016/08/how-polio-returned-to-nigeria.html: 403 Client Error: Forbidden for url: https://crofsblogs.typepad.com/h5n1/2016/08/how-polio-returned-to-nigeria.html


Processing URLs:  99%|█████████▊| 986/1000 [42:03<00:11,  1.21it/s]

Error extracting text from http://www.nigeriatoday.ng/2016/07/4-dead-as-fulani-herdsmen-farmers-face-off/: HTTPConnectionPool(host='www.nigeriatoday.ng', port=80): Max retries exceeded with url: /2016/07/4-dead-as-fulani-herdsmen-farmers-face-off/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe55c470>: Failed to resolve 'www.nigeriatoday.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  99%|█████████▉| 988/1000 [42:05<00:09,  1.21it/s]

Error extracting text from http://gawker.com/brazilian-conservatives-respond-to-zika-spread-with-pla-1760671686: 404 Client Error: Not Found for url: https://gawker.com/brazilian-conservatives-respond-to-zika-spread-with-pla-1760671686


Processing URLs:  99%|█████████▉| 992/1000 [42:09<00:07,  1.01it/s]

Error extracting text from https://finance.yahoo.com/news/signs-demand-weakness-emerge-oil-190000042.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/signs-demand-weakness-emerge-oil-190000042.html


Processing URLs:  99%|█████████▉| 994/1000 [42:11<00:05,  1.11it/s]

Error extracting text from https://www.reuters.com/world/middle-east/israel-permit-right-wing-march-through-jerusalems-old-city-2021-06-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/israel-permit-right-wing-march-through-jerusalems-old-city-2021-06-08/


Processing URLs: 100%|█████████▉| 996/1000 [42:14<00:04,  1.10s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://josiasdesouza.blogosfera.uol.com.br/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://josiasdesouza.blogosfera.uol.com.br/&amp;prev=search


Processing URLs: 100%|█████████▉| 997/1000 [43:14<00:56, 18.80s/it]

Error extracting text from https://money.usnews.com/investing/news/articles/2017-11-11/venezuela-debt-meeting-set-for-monday-afternoon: HTTPSConnectionPool(host='money.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs: 100%|██████████| 1000/1000 [43:18<00:00,  2.60s/it]
Processing URLs:   0%|          | 1/1000 [00:00<05:29,  3.04it/s]

Error extracting text from http://www.wsj.com/articles/feds-dudley-still-likely-on-track-for-2015-rate-rise-1443448239: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-dudley-still-likely-on-track-for-2015-rate-rise-1443448239


Processing URLs:   0%|          | 3/1000 [00:00<05:00,  3.32it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-aleppo-ceasefire-idUSKCN0Z20F5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-aleppo-ceasefire-idUSKCN0Z20F5


Processing URLs:   1%|          | 10/1000 [00:09<14:48,  1.11it/s]

Error extracting text from http://www.reuters.com/article/2015/09/08/indonesia-opec-idUSL5N11E3HX20150908: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/08/indonesia-opec-idUSL5N11E3HX20150908
URL filtered: https://www.youtube.com/watch?v=QJxed2-NuKY


Processing URLs:   2%|▏         | 19/1000 [00:22<16:42,  1.02s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/14/gitrep-13mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/14/gitrep-13mar16pm/
Error extracting text from http://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN1685QS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN1685QS


Processing URLs:   2%|▏         | 21/1000 [00:25<20:06,  1.23s/it]

Error extracting text from https://gcaptain.com/2015/12/22/april-opening-of-panama-canal-expansion-appears-unlikely/#.VodNPkorLIU: 403 Client Error: Forbidden for url: https://gcaptain.com/2015/12/22/april-opening-of-panama-canal-expansion-appears-unlikely/#.VodNPkorLIU


Processing URLs:   2%|▏         | 23/1000 [00:27<16:56,  1.04s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/thaad-more-useful-stick-against-china-north-korean-missiles-15243: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/thaad-more-useful-stick-against-china-north-korean-missiles-15243


Processing URLs:   3%|▎         | 28/1000 [00:32<12:49,  1.26it/s]

Error extracting text from http://www.ihsairport360.com/article/6370/asean-single-aviation-market-entails-atm-changes: HTTPConnectionPool(host='www.janesairport360.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3082fe390>: Failed to resolve 'www.janesairport360.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.inews163.com/2016/01/25/japanese-media-abe-or-because-of-very-concerned-trends-in-china-intends-to-visit-iran-139935.html: HTTPConnectionPool(host='www.inews163.com', port=80): Max retries exceeded with url: /2016/01/25/japanese-media-abe-or-because-of-very-concerned-trends-in-china-intends-to-visit-iran-139935.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3082ff0b0>: Failed to resolve 'www.inews163.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.co

Processing URLs:   3%|▎         | 34/1000 [00:40<17:10,  1.07s/it]

Error extracting text from https://www.cisa.gov/news/2021/01/05/joint-statement-federal-bureau-investigation-fbi-cybersecurity-and-infrastructure: 403 Client Error: Forbidden for url: https://www.cisa.gov/news/2021/01/05/joint-statement-federal-bureau-investigation-fbi-cybersecurity-and-infrastructure


Processing URLs:   4%|▎         | 36/1000 [00:44<22:24,  1.39s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN1AD1I4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN1AD1I4
Error extracting text from https://www.reuters.com/article/us-usa-court-election/supreme-court-blocks-redrawing-of-north-carolina-congressional-maps-idUSKBN1F73C1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-election/supreme-court-blocks-redrawing-of-north-carolina-congressional-maps-idUSKBN1F73C1


Processing URLs:   4%|▍         | 39/1000 [00:48<20:50,  1.30s/it]

Error extracting text from http://www.gc.noaa.gov/gcil_maritime.html#contiguous: 404 Client Error: Not Found for url: https://www.gc.noaa.gov/gcil_maritime.html#contiguous


Processing URLs:   4%|▍         | 44/1000 [01:53<4:53:39, 18.43s/it]

Error extracting text from https://www.aa.com.tr/en/americas/goldman-sachs-raises-us-gdp-forecast-for-2021/2102968: HTTPSConnectionPool(host='www.aa.com.tr', port=443): Read timed out. (read timeout=60)


Processing URLs:   5%|▌         | 50/1000 [02:01<55:22,  3.50s/it]

Error extracting text from http://www.bostonmagazine.com/news/blog/2017/02/03/general-electric-border-adjustment-tax/: 404 Client Error: Not Found for url: https://www.bostonmagazine.com/news/blog/2017/02/03/general-electric-border-adjustment-tax/


Processing URLs:   5%|▌         | 52/1000 [02:01<28:35,  1.81s/it]

Error extracting text from http://www.wsj.com/articles/iranian-oil-set-to-return-to-europe-next-month-say-officials-1453462239: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iranian-oil-set-to-return-to-europe-next-month-say-officials-1453462239
Error extracting text from http://www.reuters.com/article/us-turkey-israel-russia-idUSKCN0ZD29U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-israel-russia-idUSKCN0ZD29U


Processing URLs:   5%|▌         | 53/1000 [02:03<28:04,  1.78s/it]

Error extracting text from http://www.ibtimes.com/apple-inc-aapl-cuts-supply-chain-orders-slowing-demand-iphone-6s-2206595: 403 Client Error: Forbidden for url: https://www.ibtimes.com/apple-inc-aapl-cuts-supply-chain-orders-slowing-demand-iphone-6s-2206595


Processing URLs:   5%|▌         | 54/1000 [02:03<22:02,  1.40s/it]

Error extracting text from http://amti.csis.org/ArbitrationTL/index.html: 403 Client Error: Forbidden for url: http://amti.csis.org/ArbitrationTL/index.html


Processing URLs:   6%|▌         | 57/1000 [02:08<22:53,  1.46s/it]

Error extracting text from http://yournewswire.com/wikileaks-pedophile-code-words-podesta/: 410 Client Error: Gone for url: http://yournewswire.com/wikileaks-pedophile-code-words-podesta/


Processing URLs:   6%|▌         | 60/1000 [03:12<3:33:31, 13.63s/it]

Error extracting text from http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/soybean.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/us-afghanistan-security-idUSKBN18U05U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-security-idUSKBN18U05U


Processing URLs:   6%|▌         | 62/1000 [03:14<1:52:12,  7.18s/it]

URL filtered: https://www.youtube.com/watch?v=Nuz7fcgSbRs


Processing URLs:   7%|▋         | 66/1000 [03:19<47:54,  3.08s/it]

Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-idUSKCN1B10V6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-idUSKCN1B10V6


Processing URLs:   7%|▋         | 68/1000 [03:20<32:24,  2.09s/it]

Error extracting text from http://www.cfr.org/world/lost-logic-deterrence/p30092: 404 Client Error: Not Found for url: https://www.cfr.org/world/lost-logic-deterrence/p30092


Processing URLs:   7%|▋         | 72/1000 [03:25<23:06,  1.49s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.diariodopoder.com.br/noticia.php%3Fi%3D49905264168&amp;usg=ALkJrhjV7nRdFq0KSq34Q3L6tLXQ2-ouzg: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.diariodopoder.com.br/noticia.php%3Fi%3D49905264168&amp;usg=ALkJrhjV7nRdFq0KSq34Q3L6tLXQ2-ouzg


Processing URLs:   7%|▋         | 74/1000 [03:27<18:55,  1.23s/it]

Error extracting text from http://en.abna24.com/service/iran/archive/2016/08/28/775031/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/iran/archive/2016/08/28/775031/story.html
Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-burgum-idUSKBN14X16L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-burgum-idUSKBN14X16L


Processing URLs:   8%|▊         | 81/1000 [03:34<11:54,  1.29it/s]

Error extracting text from http://www.wsj.com/articles/u-n-says-afghan-civilian-casualties-near-record-high-1469427316: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-n-says-afghan-civilian-casualties-near-record-high-1469427316
Error extracting text from https://www.fxempire.com/forecasts/article/crude-oil-price-forecast-crude-oil-markets-recover-on-tuesday-2-747384: 403 Client Error: Forbidden for url: https://www.fxempire.com/forecasts/article/crude-oil-price-forecast-crude-oil-markets-recover-on-tuesday-2-747384


Processing URLs:   8%|▊         | 83/1000 [03:35<11:30,  1.33it/s]

Error extracting text from http://www.realcleardefense.com/articles/2016/10/18/the_battle_for_mosul_begins_110222.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/10/18/the_battle_for_mosul_begins_110222.html


Processing URLs:   8%|▊         | 84/1000 [03:37<16:48,  1.10s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-05/tesla-falls-after-deliveries-miss-amid-extreme-output-boost


Processing URLs:   9%|▉         | 89/1000 [03:42<16:54,  1.11s/it]

Error extracting text from http://newsbout.com/id/16337184584: HTTPConnectionPool(host='newsbout.com', port=80): Max retries exceeded with url: /id/16337184584 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303884ce0>: Failed to resolve 'newsbout.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  10%|▉         | 96/1000 [03:49<12:55,  1.16it/s]

Error extracting text from https://www.nytimes.com/2020/07/04/us/politics/south-china-sea-aircraft-carrier.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/07/04/us/politics/south-china-sea-aircraft-carrier.html


Processing URLs:  10%|▉         | 98/1000 [03:49<08:49,  1.70it/s]

Error extracting text from http://blog.moneysavingexpert.com/2016/06/05/how-to-vote-in-the-eu-referendum/: 403 Client Error: Forbidden for url: https://blog.moneysavingexpert.com/2016/06/05/how-to-vote-in-the-eu-referendum/


Processing URLs:  10%|█         | 102/1000 [03:58<33:35,  2.24s/it]

Error extracting text from http://www.trtworld.com/turkey/russian-foreign-minister-visits-turkey-to-discuss-regional-problems-242563: 404 Client Error: Not Found for url: https://www.trtworld.com:443/turkey/russian-foreign-minister-visits-turkey-to-discuss-regional-problems-242563


Processing URLs:  10%|█         | 103/1000 [04:29<2:41:39, 10.81s/it]

Error extracting text from http://news.morningstar.com/articlenet/article.aspx?id=715303: 504 Server Error: Gateway Time-out for url: http://news.morningstar.com/articlenet/article.aspx?id=715303


Processing URLs:  11%|█         | 107/1000 [04:33<47:57,  3.22s/it]  

Error extracting text from https://www.uyghurcongress.org/en/wp-content/uploads/2020/12/Beijing2020-JointLetter-JASamaranch-December2020.pdf: 403 Client Error: Forbidden for url: https://www.uyghurcongress.org/en/wp-content/uploads/2020/12/Beijing2020-JointLetter-JASamaranch-December2020.pdf


Processing URLs:  11%|█         | 109/1000 [04:36<31:39,  2.13s/it]

Error extracting text from http://www.amazon.com/Africa-2016-2017-World-Today-Stryker/dp/1475829027/ref=sr_1_2?s=books&amp;ie=UTF8&amp;qid=1478114131&amp;sr=1-2&amp;keywords=africa+the+world+today: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Africa-2016-2017-World-Today-Stryker/dp/1475829027/ref=sr_1_2?s=books&amp;ie=UTF8&amp;qid=1478114131&amp;sr=1-2&amp;keywords=africa+the+world+today


Processing URLs:  11%|█         | 112/1000 [04:45<34:42,  2.34s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2013-03
Error extracting text from http://www.washingtontimes.com/news/2016/may/13/obama-urged-to-sanction-iranian-hackers/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/may/13/obama-urged-to-sanction-iranian-hackers/


Processing URLs:  11%|█▏        | 114/1000 [04:46<23:12,  1.57s/it]

Error extracting text from https://www.gop.com/the-official-guide-to-the-2016-republican-nominating-process/: 403 Client Error: Forbidden for url: https://www.gop.com/the-official-guide-to-the-2016-republican-nominating-process/


Processing URLs:  12%|█▏        | 115/1000 [04:49<28:38,  1.94s/it]

URL filtered: https://www.youtube.com/watch?v=CXVu_DeB4wo


Processing URLs:  12%|█▏        | 119/1000 [04:57<29:32,  2.01s/it]

URL filtered: https://www.bloomberg.com/gadfly/articles/2016-06-26/iran-s-oil-boom-fizzles-out


Processing URLs:  12%|█▏        | 122/1000 [05:01<23:16,  1.59s/it]

Error extracting text from https://www.sfgate.com/news/article/Push-to-end-Daylight-Saving-Time-in-California-12997311.php?utm_campaign=reddit-desktop&utm_source=CMS%20Sharing%20Button&utm_medium=social&utm_campaign=reddit-desktop&utm_source=CMS%20Sharing%20Button&utm_medium=social: 403 Client Error: Forbidden for url: https://www.sfgate.com/news/article/Push-to-end-Daylight-Saving-Time-in-California-12997311.php?utm_campaign=reddit-desktop&utm_source=CMS%20Sharing%20Button&utm_medium=social&utm_campaign=reddit-desktop&utm_source=CMS%20Sharing%20Button&utm_medium=social


Processing URLs:  12%|█▏        | 123/1000 [05:01<18:13,  1.25s/it]

Error extracting text from http://www.wsj.com/articles/north-korea-launches-five-projectiles-into-sea-1458549029: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-launches-five-projectiles-into-sea-1458549029


Processing URLs:  13%|█▎        | 126/1000 [05:06<18:36,  1.28s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/15/gitrep-15mar16am/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/15/gitrep-15mar16am/


Processing URLs:  13%|█▎        | 127/1000 [05:06<14:19,  1.02it/s]

Error extracting text from https://www.us-cert.gov/HIDDEN-COBRA-North-Korean-Malicious-Cyber-Activity: 403 Client Error: Forbidden for url: https://www.us-cert.gov/HIDDEN-COBRA-North-Korean-Malicious-Cyber-Activity


Processing URLs:  13%|█▎        | 130/1000 [05:10<16:15,  1.12s/it]

Error extracting text from http://www.ft.com/intl/cms/s/0/f372528e-2: 404 Client Error: Not Found for url: https://www.ft.com/cms/s/0/f372528e-2


Processing URLs:  13%|█▎        | 133/1000 [05:15<21:42,  1.50s/it]

Error extracting text from https://tradingeconomics.com/commodity/carbon: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/carbon


Processing URLs:  14%|█▎        | 137/1000 [05:20<18:04,  1.26s/it]

Error extracting text from https://www.khaama.com/isis-militants-kill-taliban-commander-and-his-fighters-in-nangarhar-04124: 403 Client Error: Forbidden for url: https://www.khaama.com/isis-militants-kill-taliban-commander-and-his-fighters-in-nangarhar-04124


Processing URLs:  14%|█▍        | 141/1000 [05:25<19:46,  1.38s/it]

Error extracting text from https://www.wirelessweek.com/news/2017/01/all-eyes-doj-time-warner-deal-approval-following-trump-t-meeting: 403 Client Error: Forbidden for url: https://www.5gtechnologyworld.com/wirelessweek-signup/?utm_source=wirelessweek&utm_medium=url&utm_campaign=WirelessWeek&utm_term=WWW


Processing URLs:  14%|█▍        | 144/1000 [05:29<16:46,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-usa-iran-cyber-idUSKCN0WP2NM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-cyber-idUSKCN0WP2NM


Processing URLs:  14%|█▍        | 145/1000 [05:29<12:52,  1.11it/s]

Error extracting text from http://www.wsj.com/articles/putin-plans-first-visit-to-iran-since-2007-1447423597: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/putin-plans-first-visit-to-iran-since-2007-1447423597


Processing URLs:  15%|█▍        | 147/1000 [05:31<11:21,  1.25it/s]

Error extracting text from http://atimes.com/2016/03/russia-signals-interest-to-defrost-ties-with-turkey/: 404 Client Error: Not Found for url: https://atimes.com/2016/03/russia-signals-interest-to-defrost-ties-with-turkey/
Error extracting text from http://www.nytimes.com/2016/04/22/world/middleeast/europe-says-us-regulations-keeping-it-from-trade-with-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/22/world/middleeast/europe-says-us-regulations-keeping-it-from-trade-with-iran.html


Processing URLs:  15%|█▍        | 148/1000 [05:33<15:06,  1.06s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-03/how-venezuela-bondholders-finally-ran-out-of-time-quicktake-q-a


Processing URLs:  15%|█▌        | 151/1000 [05:33<08:10,  1.73it/s]

Error extracting text from http://www.reuters.com/article/2015/10/30/us-mideast-crisis-syria-iran-idUSKCN0SN2Z620151030: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/30/us-mideast-crisis-syria-iran-idUSKCN0SN2Z620151030
URL filtered: https://www.youtube.com/watch?v=FFqb1I-hiHE


Processing URLs:  15%|█▌        | 154/1000 [05:36<09:28,  1.49it/s]

Error extracting text from http://www.reuters.com/article/2015/10/11/us-trade-tpp-rcep-idUSKCN0S500220151011: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/11/us-trade-tpp-rcep-idUSKCN0S500220151011


Processing URLs:  16%|█▌        | 157/1000 [05:40<11:38,  1.21it/s]

Error extracting text from http://www.nytimes.com/2015/12/03/world/europe/kerry-nato-syria-russia.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/world/europe/kerry-nato-syria-russia.html?_r=0


Processing URLs:  17%|█▋        | 167/1000 [06:11<25:06,  1.81s/it]  

Error extracting text from https://bit.ly/37hBzHK: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  17%|█▋        | 168/1000 [06:12<21:30,  1.55s/it]

Error extracting text from http://www.comres.co.uk/polls/itv-news-march-2016-eu-referendum-poll/: 403 Client Error: Forbidden for url: http://comresglobal.com/polls/itv-news-march-2016-eu-referendum-poll/


Processing URLs:  17%|█▋        | 169/1000 [06:13<20:34,  1.49s/it]

Error extracting text from https://www.cbsnews.com/amp/news/car-t-leukemia-cancer-gene-therapy-fda/G: 404 Client Error: Not Found for url: https://www.cbsnews.com/amp/news/car-t-leukemia-cancer-gene-therapy-fda/G


Processing URLs:  18%|█▊        | 176/1000 [06:21<13:02,  1.05it/s]

Error extracting text from http://www.timesofisrael.com/iran-seeking-illegal-nuke-missile-technology-says-german-intel-report/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/iran-seeking-illegal-nuke-missile-technology-says-german-intel-report/


Processing URLs:  18%|█▊        | 180/1000 [06:22<05:18,  2.57it/s]

Error extracting text from http://www.nti.org/gmap/biological_kazakhstan.html: 403 Client Error: Forbidden for url: https://www.nti.org/gmap/biological_kazakhstan.html
URL filtered: https://www.bloomberg.com/news/articles/2016-08-03/venezuelan-pride-keeps-cheap-oil-flowing-as-economy-collapses
Error extracting text from https://www.reuters.com/article/us-britain-eu-settlement/eu-and-britain-agree-settlement-post-brexit-senior-eu-official-idUSKBN1DU2KO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-settlement/eu-and-britain-agree-settlement-post-brexit-senior-eu-official-idUSKBN1DU2KO


Processing URLs:  19%|█▊        | 186/1000 [06:36<26:49,  1.98s/it]

Error extracting text from http://nr2.com.ua/blogs/Ksenija_Kirillova/V-Bryussele-rasskazali-o-gibridnoy-voyne-Rossii-protiv-Ukrainy--122075.html: 404 Client Error: Not Found for url: https://nr2.com.ua/blogs/Ksenija_Kirillova/V-Bryussele-rasskazali-o-gibridnoy-voyne-Rossii-protiv-Ukrainy--122075.html
Error extracting text from https://www.reuters.com/article/us-southkorea-china-russia-idUSKBN28W139: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-china-russia-idUSKBN28W139


Processing URLs:  19%|█▉        | 193/1000 [06:58<22:19,  1.66s/it]  

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-obama-hollande-isis-20151124-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-obama-hollande-isis-20151124-story.html


Processing URLs:  20%|█▉        | 195/1000 [06:59<13:53,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-cohen/cohen-close-trump-business-adviser-to-testify-in-senate-on-tuesday-idUSKCN1BS0XW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-cohen/cohen-close-trump-business-adviser-to-testify-in-senate-on-tuesday-idUSKCN1BS0XW
URL filtered: https://twitter.com/AmyEGardner/status/1336836103454797824


Processing URLs:  20%|█▉        | 197/1000 [06:59<09:29,  1.41it/s]

URL filtered: http://bloomberg.econoday.com/byshoweventfull.asp?fid=467025&amp;cust=bloomberg-us&amp;year=2015&amp;lid=0&amp;prev=/byweek.asp#top


Processing URLs:  20%|█▉        | 199/1000 [07:00<08:03,  1.66it/s]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&amp;s=RBRTE&amp;f=M: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  20%|██        | 200/1000 [07:00<06:52,  1.94it/s]

Error extracting text from http://www.wsj.com/articles/senate-passes-government-funding-bill-prior-to-midnight-deadline-1443623598: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/senate-passes-government-funding-bill-prior-to-midnight-deadline-1443623598
Error extracting text from https://www.reuters.com/business/energy/oil-falls-us-gasoline-stock-draw-raises-prospect-spr-release-2021-11-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-falls-us-gasoline-stock-draw-raises-prospect-spr-release-2021-11-17/


Processing URLs:  20%|██        | 202/1000 [07:01<05:55,  2.24it/s]

URL filtered: https://www.bloomberg.com/view/articles/2017-05-02/why-people-care-about-the-estate-tax


Processing URLs:  20%|██        | 204/1000 [07:02<07:10,  1.85it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/turkey-battle-iraqs-mosul-spark-sectarian-strife-42550551: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/turkey-battle-iraqs-mosul-spark-sectarian-strife-42550551


Processing URLs:  21%|██        | 208/1000 [07:07<11:11,  1.18it/s]

Error extracting text from http://www.cdc.gov/zika/pregnancy/question-answers.html: 404 Client Error: Not Found for url: https://www.cdc.gov/zika/pregnancy/question-answers.html


Processing URLs:  21%|██        | 210/1000 [07:08<10:32,  1.25it/s]

Error extracting text from https://www.un.org/press/en/2017/db170623.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2017/db170623.doc.htm


Processing URLs:  22%|██▏       | 215/1000 [07:23<35:00,  2.68s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3931692/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3931692/


Processing URLs:  22%|██▏       | 219/1000 [07:28<19:09,  1.47s/it]

Error extracting text from https://thehill.com/policy/energy-environment/578627-3-issues-to-watch-at-climate-summit: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/578627-3-issues-to-watch-at-climate-summit/


Processing URLs:  22%|██▏       | 220/1000 [07:28<14:32,  1.12s/it]

Error extracting text from https://www.gov.il/en/departments/prime_ministers_office: 403 Client Error: Forbidden for url: https://www.gov.il/en/departments/prime_ministers_office


Processing URLs:  22%|██▏       | 224/1000 [07:33<15:49,  1.22s/it]

Error extracting text from http://www.freenewspos.com/en/pop-news-article/c/4563164/ebola/polio-resurfaces-in-mali-from-ebola-hit-guinea-who: HTTPSConnectionPool(host='sbobetmobile5758.com', port=443): Max retries exceeded with url: /en/pop-news-article/c/4563164/ebola/polio-resurfaces-in-mali-from-ebola-hit-guinea-who (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff0b9730>: Failed to resolve 'sbobetmobile5758.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  23%|██▎       | 227/1000 [07:39<26:15,  2.04s/it]

URL filtered: https://www.theguardian.com/technology/2017/jan/06/facebook-hires-campbell-brown-media-partnerships


Processing URLs:  23%|██▎       | 229/1000 [07:41<17:24,  1.35s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VQ152: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VQ152


Processing URLs:  23%|██▎       | 233/1000 [07:47<18:08,  1.42s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-18/saudis-said-to-weigh-raising-gasoline-prices-by-end-of-november


Processing URLs:  24%|██▎       | 235/1000 [07:48<13:13,  1.04s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/topics_31803.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/topics_31803.htm


Processing URLs:  24%|██▎       | 237/1000 [07:49<12:15,  1.04it/s]

Error extracting text from http://www.stripes.com/news/pentagon-china-deploys-16-fighter-jets-to-disputed-south-china-sea-island-1.404391: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/pentagon-china-deploys-16-fighter-jets-to-disputed-south-china-sea-island-1.404391


Processing URLs:  25%|██▍       | 248/1000 [08:10<19:05,  1.52s/it]

Error extracting text from https://www.google.com/amp/www.ibtimes.com/south-china-sea-controversy-indonesia-deploy-f-16-fighter-jets-natuna-islands-2346657%3famp=1?client=safari#: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-controversy-indonesia-deploy-f-16-fighter-jets-natuna-islands-2346657?amp=1


Processing URLs:  25%|██▌       | 252/1000 [08:15<17:46,  1.43s/it]

Error extracting text from http://the-japan-news.com/news/article/0002455021: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002455021


Processing URLs:  26%|██▌       | 257/1000 [08:21<14:51,  1.20s/it]

Error extracting text from http://www.cnbc.com/2017/03/02/reuters-america-brics-bank-to-lend-between-25-3-bln-in-2017--china-daily.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/03/02/reuters-america-brics-bank-to-lend-between-25-3-bln-in-2017--china-daily.html


Processing URLs:  26%|██▌       | 259/1000 [09:24<3:57:08, 19.20s/it]

Error extracting text from http://www.itv.com/news/2016-06-12/claims-that-uk-is-set-to-open-doors-to-1m-turkish-citizens-completely-untrue-says-government/: HTTPConnectionPool(host='www.itv.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  26%|██▌       | 260/1000 [09:24<2:47:51, 13.61s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/350035-fbi-investigating-russian-news-agency-report: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/350035-fbi-investigating-russian-news-agency-report/


Processing URLs:  26%|██▌       | 261/1000 [09:27<2:05:46, 10.21s/it]

Error extracting text from http://www.38north.org/2017/10/sinpo101117/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  26%|██▌       | 262/1000 [09:28<1:34:17,  7.67s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/gop-moved-back-moore-allegations-51616817: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/gop-moved-back-moore-allegations-51616817


Processing URLs:  26%|██▋       | 263/1000 [09:29<1:08:08,  5.55s/it]

Error extracting text from https://www.un.org/en/observances/delegates-day: 403 Client Error: Forbidden for url: https://www.un.org/en/observances/delegates-day


Processing URLs:  26%|██▋       | 264/1000 [09:30<50:57,  4.15s/it]  

Error extracting text from http://www2.politicalbetting.com/index.php/archives/2016/05/02/if-zac-loses-london-and-the-brexiters-fail-it-will-say-a-lot-about-the-declining-influence-of-the-press/: 404 Client Error: Not Found for url: http://www2.politicalbetting.com/index.php/archives/2016/05/02/if-zac-loses-london-and-the-brexiters-fail-it-will-say-a-lot-about-the-declining-influence-of-the-press/


Processing URLs:  26%|██▋       | 265/1000 [09:31<38:25,  3.14s/it]



Processing URLs:  27%|██▋       | 269/1000 [09:37<23:37,  1.94s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-idUSKCN0WC0VE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-idUSKCN0WC0VE


Processing URLs:  27%|██▋       | 270/1000 [09:40<24:41,  2.03s/it]

Error extracting text from http://www.mediapost.com/publications/article/284320/smart-publishers-should-move-beyond-header-bidding.html: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  28%|██▊       | 275/1000 [09:47<16:03,  1.33s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN10B10G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN10B10G


Processing URLs:  28%|██▊       | 277/1000 [09:48<13:25,  1.11s/it]

Error extracting text from http://www.ibtimes.co.uk/yemen-crisis-who-are-houthis-what-do-they-want-1493573: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/yemen-crisis-who-are-houthis-what-do-they-want-1493573


Processing URLs:  28%|██▊       | 279/1000 [09:53<22:19,  1.86s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-28/ringgit-gain-leads-emerging-markets-budget-review-in-spotlight


Processing URLs:  28%|██▊       | 283/1000 [09:57<13:32,  1.13s/it]

Error extracting text from https://www.opendemocracy.net/en/democraciaabierta/nicaragua-s-failed-coup/: 403 Client Error: Forbidden for url: https://www.opendemocracy.net/en/democraciaabierta/nicaragua-s-failed-coup/


Processing URLs:  29%|██▉       | 288/1000 [10:05<18:46,  1.58s/it]

URL filtered: http://uk.businessinsider.com/uk-government-pressure-facebook-russian-interference-twitter-brexit-2017-11


Processing URLs:  29%|██▉       | 291/1000 [10:07<12:17,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missile-japan-idUSKCN0V70IB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-japan-idUSKCN0V70IB


Processing URLs:  29%|██▉       | 292/1000 [10:09<14:42,  1.25s/it]

URL filtered: https://www.youtube.com/watch?v=I4vJM4L2D2U&amp;list=RDI4vJM4L2D2U&amp;t=316


Processing URLs:  29%|██▉       | 294/1000 [10:30<1:02:03,  5.27s/it]

URL filtered: http://www.bloombergview.com/articles/2015-12-22/assad-is-reaching-out-to-washington-insiders?cmpid=wsdemand


Processing URLs:  30%|██▉       | 297/1000 [10:36<41:50,  3.57s/it]  

URL filtered: https://www.youtube.com/watch?v=2FWBTaIgO1s


Processing URLs:  30%|███       | 300/1000 [10:39<25:13,  2.16s/it]

Error extracting text from http://www.tradingeconomics.com/country-list/rating: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/country-list/rating


Processing URLs:  30%|███       | 303/1000 [10:42<17:16,  1.49s/it]

URL filtered: https://www.youtube.com/watch?v=jv3JvJGlKWU


Processing URLs:  30%|███       | 305/1000 [10:42<10:54,  1.06it/s]

Error extracting text from https://seekingalpha.com/article/4106050-t-time-warner-coming-home-stretch: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4106050-t-time-warner-coming-home-stretch


Processing URLs:  31%|███       | 306/1000 [10:45<15:57,  1.38s/it]

Error extracting text from https://www.fhu.com/hegelian.html: 404 Client Error: Not Found for url: https://www.fhu.com/hegelian.html


Processing URLs:  31%|███       | 308/1000 [10:46<11:41,  1.01s/it]

Error extracting text from http://www.nytimes.com/2015/11/30/world/middleeast/hope-for-nefertitis-tomb-and-egypts-economy.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/30/world/middleeast/hope-for-nefertitis-tomb-and-egypts-economy.html?_r=0
URL filtered: https://twitter.com/DonnaBow/status/730692705517441026


Processing URLs:  31%|███       | 312/1000 [10:53<17:50,  1.56s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-09/china-to-resume-ipos-by-year-end-as-stocks-enter-bull-market


Processing URLs:  31%|███▏      | 314/1000 [10:53<11:24,  1.00it/s]

Error extracting text from http://eng.mod.gov.cn/MilitaryExercises/#: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/MilitaryExercises/


Processing URLs:  32%|███▏      | 316/1000 [10:54<07:22,  1.55it/s]

Error extracting text from http://www.wsj.com/articles/time-inc-plans-significant-reorganization-to-generate-non-print-revenue-1467395384: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-plans-significant-reorganization-to-generate-non-print-revenue-1467395384


Processing URLs:  32%|███▏      | 320/1000 [11:01<17:36,  1.55s/it]

URL filtered: https://twitter.com/RichardHanania/status/1470999607194767368


Processing URLs:  32%|███▏      | 323/1000 [11:09<26:11,  2.32s/it]

Error extracting text from http://www.diplomatie.gouv.fr/en/country-files/democratic-republic-of-the-congo/events/article/democratic-republic-of-congo-situation-in-the-kasai-region-20-02-17: 404 Client Error: Not Found for url: https://www.diplomatie.gouv.fr/en/country-files/democratic-republic-of-the-congo/news/article/democratic-republic-of-congo-situation-in-the-kasai-region-20-02-17


Processing URLs:  33%|███▎      | 326/1000 [11:13<19:34,  1.74s/it]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-iraq-mosul-idUKKCN0WH1DU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  33%|███▎      | 329/1000 [11:18<17:43,  1.58s/it]

Error extracting text from http://nanonews.org/irans-uranium-stockpile-turned-over-to-russia-under-nuclear/: 500 Server Error: Internal Server Error for url: https://nanonews.org/irans-uranium-stockpile-turned-over-to-russia-under-nuclear/
URL filtered: https://www.cnet.com/news/facebook-libra-cryptocurrency-hearings-with-congress-day-1-watch-here-live/
URL filtered: https://www.instagram.com/nike/


Processing URLs:  34%|███▎      | 337/1000 [11:26<13:26,  1.22s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/haiti-will-miss-election/2698588.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/haiti-will-miss-election/2698588.html


Processing URLs:  34%|███▍      | 338/1000 [11:26<11:20,  1.03s/it]

Error extracting text from http://marginalrevolution.com/marginalrevolution/2016/04/claims-about-brexit.html: 403 Client Error: Forbidden for url: http://marginalrevolution.com/marginalrevolution/2016/04/claims-about-brexit.html


Processing URLs:  35%|███▍      | 348/1000 [11:48<28:18,  2.60s/it]

Error extracting text from https://www.c-span.org/video/?297168-1/senate-session&amp;start=749: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?297168-1/senate-session&amp;start=749


Processing URLs:  35%|███▌      | 350/1000 [11:50<19:44,  1.82s/it]

Error extracting text from https://www.reuters.com/business/energy/german-regulator-says-touch-with-nord-stream-2-over-certification-issues-2021-10-22/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/german-regulator-says-touch-with-nord-stream-2-over-certification-issues-2021-10-22/


Processing URLs:  35%|███▌      | 353/1000 [12:53<3:27:06, 19.21s/it]

Error extracting text from http://olympic.org/: HTTPConnectionPool(host='olympic.org', port=80): Read timed out. (read timeout=60)


Processing URLs:  36%|███▌      | 355/1000 [12:56<1:46:51,  9.94s/it]

Error extracting text from http://www.reuters.com/article/us-russia-oil-idUSKBN17Q1K2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-oil-idUSKBN17Q1K2


Processing URLs:  36%|███▌      | 356/1000 [12:57<1:17:48,  7.25s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-09-20/self-driving-cars-must-meet-15-benchmarks-in-new-u-s-guidance


Processing URLs:  36%|███▌      | 360/1000 [12:58<25:59,  2.44s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-court-nominee-idUSKBN158288?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-nominee-idUSKBN158288?il=0


Processing URLs:  36%|███▋      | 363/1000 [13:04<21:33,  2.03s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-24/sales-of-new-u-s-homes-in-january-were-slower-than-forecast


Processing URLs:  37%|███▋      | 367/1000 [13:09<15:51,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YM0U5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YM0U5


Processing URLs:  37%|███▋      | 368/1000 [13:10<14:39,  1.39s/it]

Error extracting text from http://www.iran-daily.com/News/166972.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  37%|███▋      | 371/1000 [14:15<3:14:55, 18.59s/it]

Error extracting text from http://www.usnews.com/news/articles/2015-12-31/us-readies-sanctions-on-iran-for-missile-tests: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  37%|███▋      | 373/1000 [14:20<1:50:04, 10.53s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges/news/eu-us-trade-chiefs-press-for-ttip-advances-despite-election-brexit-dynamic: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges/news/eu-us-trade-chiefs-press-for-ttip-advances-despite-election-brexit-dynamic


Processing URLs:  37%|███▋      | 374/1000 [14:21<1:22:38,  7.92s/it]

URL filtered: https://twitter.com/TBowmanNPR/status/1426163798063337473?s=19


Processing URLs:  38%|███▊      | 377/1000 [14:23<37:29,  3.61s/it]  

Error extracting text from http://seekingalpha.com/article/4028253-teslas-december-remember-model-3-problems: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4028253-teslas-december-remember-model-3-problems


Processing URLs:  38%|███▊      | 379/1000 [14:33<39:39,  3.83s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/iranian-parliament-speaker-urges-decision-nuclear-deal-34234047: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/iranian-parliament-speaker-urges-decision-nuclear-deal-34234047


Processing URLs:  38%|███▊      | 380/1000 [14:33<29:22,  2.84s/it]

Error extracting text from http://www.reuters.com/article/iran-energy-tender-idUSL8N1JH13B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/iran-energy-tender-idUSL8N1JH13B


Processing URLs:  39%|███▊      | 386/1000 [14:36<07:35,  1.35it/s]

Error extracting text from http://www.nytimes.com/aponline/2015/11/24/world/europe/ap-eu-turkey-syria-plane.html?ref=world: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/11/24/world/europe/ap-eu-turkey-syria-plane.html?ref=world


Processing URLs:  39%|███▉      | 391/1000 [14:45<09:54,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-yemen-security-contest-idUSKBN1872RO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-contest-idUSKBN1872RO


Processing URLs:  39%|███▉      | 393/1000 [14:47<11:14,  1.11s/it]

Error extracting text from https://www.who.int/blueprint/priority-diseases/key-action/novel-coronavirus-landscape-ncov.pdf?ua=1: 404 Client Error: Not Found for url: https://www.who.int/blueprint/priority-diseases/key-action/novel-coronavirus-landscape-ncov.pdf?ua=1


Processing URLs:  40%|███▉      | 395/1000 [15:06<44:06,  4.38s/it]  

Error extracting text from https://www.nytimes.com/2017/01/23/health/malnutrition-nigeria-children.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/23/health/malnutrition-nigeria-children.html?_r=0


Processing URLs:  40%|███▉      | 397/1000 [16:08<3:27:33, 20.65s/it]

Error extracting text from http://aa.com.tr/en/economy/turkey-says-russia-is-keeping-its-promises-on-syria/713897http://www.reuters.com/article/us-mideast-crisis-syria-russia-idUSKBN14D0JA: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  40%|███▉      | 398/1000 [16:26<3:21:10, 20.05s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-11-09/switzerland-s-central-bank-holds-u-s-stocks-worth-127-billion


Processing URLs:  40%|████      | 401/1000 [16:29<1:26:58,  8.71s/it]

Error extracting text from https://www.reuters.com/article/us-iran-oil-opec-cut-idUSKBN19C1LJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-opec-cut-idUSKBN19C1LJ


Processing URLs:  40%|████      | 404/1000 [16:31<42:02,  4.23s/it]  

Error extracting text from http://www.reuters.com/article/us-southchinasea-usa-china-idUSKCN0YR01D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-usa-china-idUSKCN0YR01D


Processing URLs:  41%|████      | 410/1000 [16:43<17:44,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1AE0JO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1AE0JO?il=0


Processing URLs:  42%|████▏     | 416/1000 [16:52<09:43,  1.00it/s]

Error extracting text from http://www.wsj.com/articles/teslas-autopilot-vexes-some-drivers-even-its-fans-1467827084: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/teslas-autopilot-vexes-some-drivers-even-its-fans-1467827084
Error extracting text from http://www.straitstimes.com/world/europe/russia-slams-un-envoy-for-sabotaging-resolution-on-syria-peace-talks: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  42%|████▏     | 419/1000 [16:57<12:32,  1.30s/it]

Error extracting text from http://abcnews.go.com/Business/wireStory/us-budget-deficit-widened-november-35697500: 404 Client Error: Not Found for url: https://abcnews.go.com/Business/wireStory/us-budget-deficit-widened-november-35697500


Processing URLs:  42%|████▏     | 423/1000 [16:58<05:04,  1.89it/s]

URL filtered: https://www.bloomberg.com/news/articles/2020-07-10/elon-musk-rockets-past-warren-buffett-on-billionaires-ranking?sref=DOTC0U32&utm_source=twitter&utm_content=business&utm_medium=social&utm_campaign=socialflow-organic&cmpid=socialflow-twitter-business
Error extracting text from http://www.reuters.com/article/2015/09/15/us-usa-ge-eximbank-idUSKCN0RF1KF20150915: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/15/us-usa-ge-eximbank-idUSKCN0RF1KF20150915
URL filtered: http://www.bloomberg.com/news/articles/2016-06-09/east-china-sea-tensions-rise-over-chinese-ships-plane-intercept


Processing URLs:  43%|████▎     | 426/1000 [17:02<09:35,  1.00s/it]

Error extracting text from http://www.middleeasteye.net/news/nusra-aleppo-ceasefire-1772863533http://aranews.net/2016/04/syrian-islamist-rebels-capture-two-key-towns-south-aleppo/: 404 Client Error: Not Found for url: https://www.middleeasteye.net/news/nusra-aleppo-ceasefire-1772863533http:/aranews.net/2016/04/syrian-islamist-rebels-capture-two-key-towns-south-aleppo/


Processing URLs:  43%|████▎     | 427/1000 [17:04<12:15,  1.28s/it]

Error extracting text from https://www.reuters.com/article/us-apple-stocks/apple-market-value-we-may-need-a-bigger-chart-idUSKBN1D20BQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apple-stocks/apple-market-value-we-may-need-a-bigger-chart-idUSKBN1D20BQ


Processing URLs:  43%|████▎     | 431/1000 [17:09<12:28,  1.32s/it]

Error extracting text from https://www.feedbaac.com/whats-new/1569/amazon-performs-its-first-public-prime-air-drone-delivery-demonstration-dubai-uae: 404 Client Error: Not Found for url: https://www.feedbaac.com/whats-new/1569/amazon-performs-its-first-public-prime-air-drone-delivery-demonstration-dubai-uae


Processing URLs:  43%|████▎     | 432/1000 [17:10<10:05,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/joe-biden-affirms-commitment-to-isis-fight-if-syria-talks-fail-1453565763: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/joe-biden-affirms-commitment-to-isis-fight-if-syria-talks-fail-1453565763


Processing URLs:  44%|████▍     | 438/1000 [17:17<10:38,  1.14s/it]

Error extracting text from http://cs.ro/: HTTPConnectionPool(host='cs.ro', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffa50e30>: Failed to resolve 'cs.ro' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  44%|████▍     | 439/1000 [17:19<12:13,  1.31s/it]

Error extracting text from http://www.ibtimes.com/will-britain-leave-eu-over-panama-papers-how-david-cameron-may-affect-brexit-vote-2349347: 403 Client Error: Forbidden for url: https://www.ibtimes.com/will-britain-leave-eu-over-panama-papers-how-david-cameron-may-affect-brexit-vote-2349347


Processing URLs:  44%|████▍     | 441/1000 [17:20<09:05,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-usa-election-clinton-syria-idUSKCN0RZ1C020151005: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-clinton-syria-idUSKCN0RZ1C020151005


Processing URLs:  45%|████▍     | 446/1000 [17:26<09:18,  1.01s/it]

Error extracting text from http://www.nytimes.com/2016/04/17/opinion/sunday/a-challenge-to-polands-anti-democratic-drift.html?emc=edit_th_20160417&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/17/opinion/sunday/a-challenge-to-polands-anti-democratic-drift.html?emc=edit_th_20160417&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  45%|████▍     | 447/1000 [17:26<07:35,  1.22it/s]

Error extracting text from https://www.axios.com/lina-khan-ftc-chair-tech-24fa3aa0-d3ba-47b0-bbda-db87f401ac25.html: 403 Client Error: Forbidden for url: https://www.axios.com/lina-khan-ftc-chair-tech-24fa3aa0-d3ba-47b0-bbda-db87f401ac25.html


Processing URLs:  45%|████▍     | 449/1000 [17:27<05:17,  1.74it/s]

Error extracting text from https://english.aawsat.com/home/article/2981776/eu-top-envoy-iran-nuclear-talks-says-confident-deal-will-be-reached: 403 Client Error: Forbidden for url: https://english.aawsat.com/home/article/2981776/eu-top-envoy-iran-nuclear-talks-says-confident-deal-will-be-reached
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16X11O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16X11O


Processing URLs:  45%|████▌     | 453/1000 [17:37<13:53,  1.52s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-peacetalks-idUSKCN12I0O2?feedType=RSS&amp;feedName=topNews&amp;rpc=932: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-peacetalks-idUSKCN12I0O2?feedType=RSS&amp;feedName=topNews&amp;rpc=932


Processing URLs:  45%|████▌     | 454/1000 [17:38<13:15,  1.46s/it]

Error extracting text from https://warisboring.com/the-united-states-is-getting-more-and-more-irritated-at-russias-nuke-treaty-violation-4feab0fa631e#.k8ug11rfx: 403 Client Error: Forbidden for url: https://warisboring.com/the-united-states-is-getting-more-and-more-irritated-at-russias-nuke-treaty-violation-4feab0fa631e#.k8ug11rfx


Processing URLs:  46%|████▌     | 455/1000 [17:40<12:48,  1.41s/it]

Error extracting text from http://www.fpri.org/articles/2016/02/quiet-frenchman-why-francois-hollande-staying-silent-brexit: 403 Client Error: Forbidden for url: https://www.fpri.org/articles/2016/02/quiet-frenchman-why-francois-hollande-staying-silent-brexit


Processing URLs:  46%|████▌     | 456/1000 [17:50<36:09,  3.99s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/syria-mission-demonstrates-russias-new-prowess/2015/10/24/ae6b07a2-7a3a-11e5-a5e2-40d6b2ad18dd_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/syria-mission-demonstrates-russias-new-prowess/2015/10/24/ae6b07a2-7a3a-11e5-a5e2-40d6b2ad18dd_story.html


Processing URLs:  46%|████▌     | 457/1000 [17:50<26:20,  2.91s/it]

Error extracting text from http://greece.greekreporter.com/2016/04/11/imf-chief-greece-just-sits-and-waits-government-must-assume-responsibility/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/04/11/imf-chief-greece-just-sits-and-waits-government-must-assume-responsibility/


Processing URLs:  46%|████▌     | 458/1000 [17:51<20:01,  2.22s/it]

Error extracting text from http://www.yenisafak.com/en/news/smart-tower-construction-begins-on-turkeys-syrian-border-2481465: 422 Client Error:  for url: http://www.yenisafak.com/en/news/smart-tower-construction-begins-on-turkeys-syrian-border-2481465


Processing URLs:  46%|████▌     | 461/1000 [17:55<13:23,  1.49s/it]

Error extracting text from http://www.hbo.com/game-of-thrones/index.html: 404 Client Error: Not Found for url: https://www.hbo.com/game-of-thrones/index.html


Processing URLs:  46%|████▌     | 462/1000 [17:57<15:04,  1.68s/it]

Error extracting text from http://www.newstatesman.com/world/2016/05/great-wall-sand: 404 Client Error: Not Found for url: https://www.newstatesman.com/world/2016/05/great-wall-sand


Processing URLs:  46%|████▋     | 463/1000 [17:58<13:11,  1.47s/it]

Error extracting text from http://nationalinterest.org/feature/4-ways-north-koreas-nukes-may-actually-be-used-21790: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/4-ways-north-koreas-nukes-may-actually-be-used-21790


Processing URLs:  46%|████▋     | 464/1000 [17:59<13:44,  1.54s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/tunneling-nkorea-nuclear-test-site-website-35537511: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/tunneling-nkorea-nuclear-test-site-website-35537511


Processing URLs:  47%|████▋     | 467/1000 [18:02<09:00,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-germany-politics-merkel-idUSKBN15J0LN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-merkel-idUSKBN15J0LN


Processing URLs:  47%|████▋     | 468/1000 [18:02<07:50,  1.13it/s]

Error extracting text from http://thehill.com/homenews/campaign/361588-flake-trump-made-a-big-mistake-backing-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361588-flake-trump-made-a-big-mistake-backing-moore/


Processing URLs:  48%|████▊     | 475/1000 [18:13<09:46,  1.12s/it]

Error extracting text from http://www.nytimes.com/2011/06/09/business/global/09opec.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2011/06/09/business/global/09opec.html


Processing URLs:  48%|████▊     | 477/1000 [18:14<06:20,  1.37it/s]

Error extracting text from https://www.longroom.com/discussion/44901/computer-virus-found-in-bavarian-nuclear-plant: HTTPSConnectionPool(host='www.longroom.com', port=443): Max retries exceeded with url: /discussion/44901/computer-virus-found-in-bavarian-nuclear-plant (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3042e5f10>: Failed to resolve 'www.longroom.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 480/1000 [18:21<14:30,  1.67s/it]

Error extracting text from http://www.parl.gc.ca/housechamberbusiness/ChamberCalendar.aspx?Key=2017&amp;Language=E&amp;Mode=1&amp;Parl=42&amp;Ses=1: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  49%|████▉     | 488/1000 [18:37<09:39,  1.13s/it]

Error extracting text from https://commonslibrary.parliament.uk/research-briefings/cbp-7886/: 403 Client Error: Forbidden for url: https://commonslibrary.parliament.uk/research-briefings/cbp-7886/
Error extracting text from http://www.reuters.com/article/us-usa-debt-moody-s-idUSKBN16K09L?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-debt-moody-s-idUSKBN16K09L?il=0


Processing URLs:  49%|████▉     | 490/1000 [18:38<06:18,  1.35it/s]

Error extracting text from http://www.nytimes.com/2016/12/24/world/asia/pakistan-israel-khawaja-asif-fake-news-nuclear.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/24/world/asia/pakistan-israel-khawaja-asif-fake-news-nuclear.html


Processing URLs:  49%|████▉     | 491/1000 [18:39<06:38,  1.28it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57331#.WY6-XFGGOUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57331#.WY6-XFGGOUk
Error extracting text from http://www.reuters.com/article/us-iran-oil-idUSKBN19X0SL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-idUSKBN19X0SL


Processing URLs:  49%|████▉     | 494/1000 [18:41<05:21,  1.57it/s]

Error extracting text from http://news.nationalpost.com/news/canada/canadian-politics/john-ivison-liberals-looking-to-implement-roadside-tests-for-pot-smokers-says-trudeaus-marijuana-czar: 403 Client Error: Forbidden for url: https://nationalpost.com/category/news//
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-surplus-idUSKBN17N1G9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-surplus-idUSKBN17N1G9


Processing URLs:  50%|████▉     | 498/1000 [18:49<12:15,  1.47s/it]

URL filtered: https://www.fool.com.au/2021/06/18/is-facebook-about-to-finally-launch-its-cryptocurrency/


Processing URLs:  50%|█████     | 502/1000 [18:56<13:44,  1.66s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/uss-abraham-lincoln-the-navys-first-carrier-equipped-carry-20600: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/uss-abraham-lincoln-the-navys-first-carrier-equipped-carry-20600


Processing URLs:  50%|█████     | 505/1000 [18:58<08:52,  1.08s/it]

Error extracting text from http://electionlawblog.org/: 403 Client Error: Forbidden for url: http://electionlawblog.org/


Processing URLs:  51%|█████     | 506/1000 [19:59<2:30:39, 18.30s/it]

Error extracting text from http://www.aa.com.tr/en/turkey/russia-bombs-turkish-aid-agency-bakery-in-syria/483161: HTTPConnectionPool(host='www.aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  51%|█████     | 509/1000 [20:02<58:31,  7.15s/it]  

Error extracting text from http://ekurd.net/kurdish-control-manbij-syria-2016-08-06: 403 Client Error: Forbidden for url: https://ekurd.net/kurdish-control-manbij-syria-2016-08-06


Processing URLs:  51%|█████     | 511/1000 [20:04<32:15,  3.96s/it]

URL filtered: https://www.youtube.com/watch?v=BBvIweCIgwk


Processing URLs:  51%|█████▏    | 514/1000 [20:06<16:27,  2.03s/it]

Error extracting text from http://english.alarabiya.net/en/webtv/programs/special-interview/2016/04/25/Deputy-Crown-Prince-This-is-the-Saudi-vision-2030.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/webtv/programs/special-interview/2016/04/25/Deputy-Crown-Prince-This-is-the-Saudi-vision-2030.html


Processing URLs:  52%|█████▏    | 516/1000 [20:09<14:01,  1.74s/it]

Error extracting text from https://www.nato.int/cps/en/natohq/topics_52121.htm: 403 Client Error: Forbidden for url: https://www.nato.int/cps/en/natohq/topics_52121.htm


Processing URLs:  52%|█████▏    | 518/1000 [21:10<1:54:27, 14.25s/it]

Error extracting text from http://www.historycommons.org/timeline.jsp?timeline=neoconinfluence&amp;neoconinfluence_prominent_neoconservatives=neoconinfluence_paul_wolfowitz: HTTPConnectionPool(host='www.historycommons.org', port=80): Max retries exceeded with url: /timeline.jsp?timeline=neoconinfluence&amp;neoconinfluence_prominent_neoconservatives=neoconinfluence_paul_wolfowitz (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3042e41d0>, 'Connection to www.historycommons.org timed out. (connect timeout=60)'))


Processing URLs:  52%|█████▏    | 520/1000 [21:12<1:07:17,  8.41s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-missiles-idUSKCN0VP2VT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-missiles-idUSKCN0VP2VT


Processing URLs:  52%|█████▏    | 523/1000 [21:18<36:31,  4.59s/it]  

Error extracting text from http://tass.ru/en/economy/838503: 404 Client Error: Not Found for url: https://tass.ru/en/economy/838503


Processing URLs:  52%|█████▎    | 525/1000 [21:19<20:20,  2.57s/it]

Error extracting text from http://nyti.ms/1KNN37i: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/26/world/asia/north-korea-sanctions.html
Error extracting text from http://www.reuters.com/article/us-venezuela-india-oil-exclusive/exclusive-venezuelas-pdvsa-misses-debt-payments-to-indias-top-oil-producer-idUSKBN1D90L4SDN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-india-oil-exclusive/exclusive-venezuelas-pdvsa-misses-debt-payments-to-indias-top-oil-producer-idUSKBN1D90L4SDN


Processing URLs:  53%|█████▎    | 528/1000 [21:22<13:04,  1.66s/it]

Processing URLs:  53%|█████▎    | 532/1000 [22:34<2:34:09, 19.76s/it]

Error extracting text from http://www.itv.com/news/border/2017-03-13/second-independence-referendum-what-you-need-to-know/: HTTPConnectionPool(host='www.itv.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  53%|█████▎    | 534/1000 [22:34<1:18:10, 10.07s/it]

Error extracting text from http://adage.com/article/media/ikea-sponsoring-time-s-experiment-social-content/296558/: 403 Client Error: Forbidden for url: https://adage.com/article/media/ikea-sponsoring-time-s-experiment-social-content/296558/


Processing URLs:  54%|█████▎    | 537/1000 [22:40<36:45,  4.76s/it]  

Error extracting text from https://www.swarmproject.info/about: 436 Client Error: status code 436 for url: https://www.swarmproject.info/about


Processing URLs:  54%|█████▍    | 539/1000 [22:47<31:25,  4.09s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-02-10/spd-stands-for-solid-finances-senior-party-member-tells-spiegel


Processing URLs:  54%|█████▍    | 541/1000 [22:48<18:35,  2.43s/it]

Error extracting text from http://sustainabletransport.org/one-goal-two-approaches-electric-buses-in-china-and-german: 404 Client Error: Not Found for url: https://sustainabletransport.org/one-goal-two-approaches-electric-buses-in-china-and-german


Processing URLs:  54%|█████▍    | 544/1000 [22:55<16:19,  2.15s/it]

Error extracting text from http://www.newsweek.com/bernie-sanders-i-would-withdraw-merrick-garland-supreme-court-nominee-438537: 403 Client Error: Forbidden for url: https://www.newsweek.com/bernie-sanders-i-would-withdraw-merrick-garland-supreme-court-nominee-438537
Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/04/14/2-reasons-venezuela-wont-default/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/04/14/2-reasons-venezuela-wont-default/
URL filtered: https://variety.com/2020/digital/news/facebook-permanent-work-from-home-1234613548/


Processing URLs:  55%|█████▍    | 547/1000 [22:56<08:30,  1.13s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2015/12/14/77/0301000000AEN20151214002951315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  55%|█████▌    | 550/1000 [22:58<06:54,  1.08it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-10/opec-attains-record-90-of-output-cuts-as-demand-grows-iea-says


Processing URLs:  55%|█████▌    | 554/1000 [23:00<05:16,  1.41it/s]

Error extracting text from http://www.businessinsider.com/iraqi-forces-mosul-isis-2016-11: 404 Client Error: Not Found for url: https://www.businessinsider.com/iraqi-forces-mosul-isis-2016-11


Processing URLs:  56%|█████▌    | 556/1000 [23:04<08:17,  1.12s/it]

Error extracting text from http://www.wsj.com/articles/germany-says-bank-risks-must-be-cut-before-saver-deposit-scheme-agreed-1466157655?mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germany-says-bank-risks-must-be-cut-before-saver-deposit-scheme-agreed-1466157655?mg=id-wsj


Processing URLs:  56%|█████▌    | 558/1000 [23:09<11:46,  1.60s/it]

Error extracting text from http://csbcorrespondent.com/market-update-october-30-2015: 403 Client Error: Forbidden for url: https://www.southstatecorrespondent.com


Processing URLs:  56%|█████▌    | 562/1000 [23:13<08:23,  1.15s/it]

Error extracting text from https://www.yahoo.com/news/trump-kaine-head-wisconsin-elections-waning-days-144734003.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/trump-kaine-head-wisconsin-elections-waning-days-144734003.html
URL filtered: https://www.theverge.com/2021/6/3/22474738/facebook-ending-political-figure-exemption-moderation-policy


Processing URLs:  56%|█████▋    | 564/1000 [23:14<05:24,  1.35it/s]

Error extracting text from http://www.scotsman.com/news/politics/could-a-second-scottish-independence-referendum-be-called-1-4186826: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/could-a-second-scottish-independence-referendum-be-called-1-4186826


Processing URLs:  56%|█████▋    | 565/1000 [23:15<05:51,  1.24it/s]

Error extracting text from http://www.nrc.no/?did=9214715: 404 Client Error: Not Found for url: https://www.nrc.no/?did=9214715


Processing URLs:  57%|█████▋    | 569/1000 [23:31<21:14,  2.96s/it]

URL filtered: https://www.youtube.com/results?search_query=donald+trump+speech


Processing URLs:  58%|█████▊    | 576/1000 [23:41<13:36,  1.93s/it]

Error extracting text from http://en.trend.az/business/energy/2458629.html: 404 Client Error: Not Found for url: https://www.trend.az/business/energy/2458629.html


Processing URLs:  58%|█████▊    | 578/1000 [23:41<07:29,  1.07s/it]

Error extracting text from https://www.predictit.org/Contract/523/Will-Joe-Biden-run-for-president-in-2016#data1: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/523/Will-Joe-Biden-run-for-president-in-2016#data1
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-aleppo-idUSKCN0X70CW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-aleppo-idUSKCN0X70CW


Processing URLs:  58%|█████▊    | 579/1000 [23:41<05:32,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-israel-palestinians-hamas-netanyahu-idUSKBN1830YX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-palestinians-hamas-netanyahu-idUSKBN1830YX
URL filtered: https://www.bloomberg.com/news/articles/2017-06-20/jacob-zuma-blamed-for-south-africa-s-woes


Processing URLs:  58%|█████▊    | 583/1000 [23:43<03:32,  1.97it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/some-bondholders-have-received-late-pdvsa-bond-payment-sources-idUSKBN1D15Y1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/some-bondholders-have-received-late-pdvsa-bond-payment-sources-idUSKBN1D15Y1


Processing URLs:  58%|█████▊    | 584/1000 [23:43<03:01,  2.29it/s]

Error extracting text from https://www.nytimes.com/2021/08/16/arts/design/nyc-museums-vaccine-required.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/16/arts/design/nyc-museums-vaccine-required.html


Processing URLs:  59%|█████▉    | 588/1000 [23:46<03:33,  1.93it/s]

URL filtered: https://twitter.com/ThabisoSithole/status/963059085284466690
Error extracting text from http://www.reuters.com/article/us-venezuela-india-oil-exclusive/exclusive-venezuelas-pdvsa-misses-debt-payments-to-indias-top-oil-producer-idUSKBN1D90L4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-india-oil-exclusive/exclusive-venezuelas-pdvsa-misses-debt-payments-to-indias-top-oil-producer-idUSKBN1D90L4


Processing URLs:  59%|█████▉    | 589/1000 [23:47<05:16,  1.30it/s]

Error extracting text from https://www.38north.org/2018/01/no17factory180130/: 403 Client Error: Forbidden for url: https://www.38north.org/2018/01/no17factory180130/


Processing URLs:  59%|█████▉    | 591/1000 [23:50<05:50,  1.17it/s]

Error extracting text from http://www.huffingtonpost.com.au/2016/08/08/erdogan-threatens-eu-turkey-refugee-deal-could-collapse/: 404 Client Error: Not Found for url: https://www.huffpost.com/archive/au/entry/2016/08/08/erdogan-threatens-eu-turkey-refugee-deal-could-collapse/
Error extracting text from https://www.nato.int/nato-welcome/index.html: 403 Client Error: Forbidden for url: https://www.nato.int/nato-welcome/index.html


Processing URLs:  60%|█████▉    | 596/1000 [23:59<09:16,  1.38s/it]

Error extracting text from http://www.rsaconference.com/writable/presentations/file_upload/ht1-403.pdf: 403 Client Error: Forbidden for url: https://www.rsaconference.com/library#q=ht1-403.pdf


Processing URLs:  60%|█████▉    | 597/1000 [24:00<09:06,  1.35s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/22334-parliament-rejects-ghanis-decree-on-caretaker-law: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/22334-parliament-rejects-ghanis-decree-on-caretaker-law


Processing URLs:  60%|█████▉    | 598/1000 [24:01<08:12,  1.23s/it]

Error extracting text from http://pasj.oxfordjournals.org/content/65/3/49.full.pdf+html: 403 Client Error: Forbidden for url: http://pasj.oxfordjournals.org/content/65/3/49.full.pdf+html


Processing URLs:  60%|██████    | 602/1000 [24:07<08:35,  1.30s/it]

URL filtered: https://www.youtube.com/watch?v=cE4lpSFNFUE


Processing URLs:  60%|██████    | 604/1000 [24:09<07:57,  1.21s/it]

Error extracting text from http://www.sabc.co.za/news/a/e8f570804c0c960d87588f86af38e8be/Moodys-wont-downgrade-SAs-economy-to-junk-status:-Economist---20160316: 404 Client Error: Not Found for url: https://www.sabc.co.za:443/news/a/e8f570804c0c960d87588f86af38e8be/Moodys-wont-downgrade-SAs-economy-to-junk-status:-Economist---20160316


Processing URLs:  60%|██████    | 605/1000 [24:11<08:04,  1.23s/it]

Error extracting text from http://www.securitycouncilreport.org/south-sudan/: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/south-sudan/
URL filtered: https://www.youtube.com/watch?v=Kas0tIxDvrg


Processing URLs:  61%|██████    | 607/1000 [24:12<06:46,  1.03s/it]

Error extracting text from http://www.lseg.com/sites/default/files/content/documents/20170105%20Masala%20Bonds%20Presentation_1.pdf: 404 Client Error: Not Found for url: https://www.lseg.com/sites/default/files/content/documents/20170105%20masala%20bonds%20presentation_1.pdf


Processing URLs:  61%|██████    | 611/1000 [24:16<05:33,  1.17it/s]

Error extracting text from http://nyti.ms/RuF8A3: 403 Client Error: Forbidden for url: http://www.nytimes.com/2012/12/15/us/politics/redistricting-helped-republicans-hold-onto-congress.html?smid=tw-share
Error extracting text from http://www.nasdaq.com/earnings/report/goog: 403 Client Error: Forbidden for url: http://www.nasdaq.com/earnings/report/goog
Error extracting text from http://www.nytimes.com/aponline/2015/11/03/world/europe/ap-eu-russia-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/11/03/world/europe/ap-eu-russia-syria.html


Processing URLs:  61%|██████▏   | 613/1000 [24:21<08:55,  1.38s/it]

Error extracting text from https://www.neweurope.eu/article/poland-no-point-dialogue-venice-commission/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/poland-no-point-dialogue-venice-commission/


Processing URLs:  62%|██████▏   | 615/1000 [24:23<06:52,  1.07s/it]

Error extracting text from http://finviz.com/forex_charts.ashx?t=EURGBP&amp;tf=mo: 403 Client Error: Forbidden for url: https://finviz.com/forex_charts.ashx?t=EURGBP&amp;tf=mo


Processing URLs:  62%|██████▏   | 619/1000 [24:26<05:27,  1.16it/s]

Error extracting text from http://data.unhcr.org/mediterranean/regional.html: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/regional.html


Processing URLs:  62%|██████▏   | 622/1000 [24:31<08:37,  1.37s/it]

URL filtered: http://www.janes.com/article/59305/indonesia-to-deploy-skyshield-air-defence-system-in-south-china-sea?utm_content=buffer1cd9d&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  62%|██████▎   | 625/1000 [24:33<05:04,  1.23it/s]

Error extracting text from http://warisboring.com/articles/these-are-the-wars-that-will-rage-in-africa-in-2016/?mc_cid=ecf95c6678&amp;mc_eid=0467f21653: 403 Client Error: Forbidden for url: http://warisboring.com/articles/these-are-the-wars-that-will-rage-in-africa-in-2016/?mc_cid=ecf95c6678&amp;mc_eid=0467f21653


Processing URLs:  63%|██████▎   | 629/1000 [24:38<05:57,  1.04it/s]

Error extracting text from http://www.nytimes.com/2015/12/26/world/middleeast/zahran-alloush-syria-rebel-leader-reported-killed.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/26/world/middleeast/zahran-alloush-syria-rebel-leader-reported-killed.html?_r=0


Processing URLs:  63%|██████▎   | 633/1000 [24:42<04:54,  1.25it/s]

Error extracting text from http://www.nytimes.com/2015/10/03/business/economy/jobs-report-hiring-unemployment-wages-fed-rates.html?emc=edit_th_20151003&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/03/business/economy/jobs-report-hiring-unemployment-wages-fed-rates.html?emc=edit_th_20151003&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  65%|██████▍   | 647/1000 [25:03<06:56,  1.18s/it]

Error extracting text from https://www.army.mil/article/196311/active_army_cyber_teams_fully_operational_a_year_plus_ahead_of_schedule: 403 Client Error: Forbidden for url: https://www.army.mil/article/196311/active_army_cyber_teams_fully_operational_a_year_plus_ahead_of_schedule


Processing URLs:  65%|██████▍   | 649/1000 [25:06<07:43,  1.32s/it]

Error extracting text from https://www.google.ca/amp/www.iraqinews.com/iraq-war/gunmen-attack-isis-headquarters-in-mosul-34-casualties/amp/?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/gunmen-attack-isis-headquarters-in-mosul-34-casualties/amp/


Processing URLs:  65%|██████▌   | 650/1000 [25:06<05:56,  1.02s/it]

Error extracting text from https://www.oddschecker.com/politics/british-politics/boris-johnson-exit-date: 403 Client Error: Forbidden for url: https://www.oddschecker.com/politics/british-politics/boris-johnson-exit-date


Processing URLs:  65%|██████▌   | 652/1000 [25:08<05:47,  1.00it/s]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3209653/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3209653/


Processing URLs:  65%|██████▌   | 654/1000 [25:12<08:09,  1.42s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-02-02/merkel-sees-serious-gaps-to-bridge-in-german-coalition-endgame


Processing URLs:  66%|██████▌   | 656/1000 [25:12<04:59,  1.15it/s]

Error extracting text from https://theconversation.com/qanda-thomas-piketty-responds-to-surprise-greek-election-result-47873: 403 Client Error: Forbidden for url: https://theconversation.com/qanda-thomas-piketty-responds-to-surprise-greek-election-result-47873


Processing URLs:  66%|██████▌   | 660/1000 [25:14<02:54,  1.95it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57335#.WYyDYFWGOUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57335#.WYyDYFWGOUk
URL filtered: https://twitter.com/GalloVOA/status/1483921671778185225
Error extracting text from https://english.alarabiya.net/News/world/2022/02/28/-Very-probable-that-Switzerland-will-freeze-Russian-assets-President: 403 Client Error: Forbidden for url: https://english.alarabiya.net/News/world/2022/02/28/-Very-probable-that-Switzerland-will-freeze-Russian-assets-President


Processing URLs:  66%|██████▋   | 663/1000 [25:16<02:57,  1.89it/s]

Error extracting text from https://www.timesofisrael.com/us-official-says-gulf-remains-in-iran-nuclear-talks-but-still-bullish-on-deal/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/us-official-says-gulf-remains-in-iran-nuclear-talks-but-still-bullish-on-deal/


Processing URLs:  67%|██████▋   | 666/1000 [25:19<04:20,  1.28it/s]

Error extracting text from http://www.ibtimes.co.uk/iraqi-government-running-out-money-combat-isis-terror-group-1549209: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/iraqi-government-running-out-money-combat-isis-terror-group-1549209
Error extracting text from http://www.reuters.com/article/us-venezuela-pdvsa-contract-exclusive-idUSKCN1112D7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-pdvsa-contract-exclusive-idUSKCN1112D7


Processing URLs:  68%|██████▊   | 678/1000 [25:39<09:29,  1.77s/it]

Error extracting text from https://warontherocks.com/2021/06/does-iran-actually-want-to-rejoin-the-nuclear-deal/: 403 Client Error: Forbidden for url: https://warontherocks.com/2021/06/does-iran-actually-want-to-rejoin-the-nuclear-deal/


Processing URLs:  68%|██████▊   | 679/1000 [25:41<10:12,  1.91s/it]

URL filtered: https://twitter.com/DominicRaab/status/1422280080752091147


Processing URLs:  68%|██████▊   | 681/1000 [25:42<06:06,  1.15s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/gop-primaries/259469-carson-thanks-biased-media-for-35m-fundraising-haul: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/gop-primaries/259469-carson-thanks-biased-media-for-35m-fundraising-haul/


Processing URLs:  68%|██████▊   | 682/1000 [25:47<10:49,  2.04s/it]

Error extracting text from http://theiranproject.com/blog/2015/11/26/a-look-at-iranian-newspaper-front-pages-on-nov-26-2/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=a-look-at-iranian-newspaper-front-pages-on-nov-26-2


Processing URLs:  68%|██████▊   | 683/1000 [25:47<09:04,  1.72s/it]

Error extracting text from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol-61-no-4/a-call-for-humility.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol-61-no-4/a-call-for-humility.html


Processing URLs:  69%|██████▊   | 686/1000 [25:52<08:03,  1.54s/it]

Error extracting text from http://in.reuters.com/article/2015/12/02/opec-meeting-non-opec-idINKBN0TL2EV20151202: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
URL filtered: https://www.youtube.com/watch?v=70RmF0rPj9o


Processing URLs:  69%|██████▉   | 689/1000 [25:54<05:41,  1.10s/it]

Error extracting text from http://aranews.net/2016/07/sweden-increase-support-kurdish-peshmerga-part-anti-isis-campaign/: 404 Client Error: Not Found for url: http://aranews.net/2016/07/sweden-increase-support-kurdish-peshmerga-part-anti-isis-campaign/


Processing URLs:  69%|██████▉   | 690/1000 [25:58<08:46,  1.70s/it]

Error extracting text from https://reut.rs/3pfHbsC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1A20BR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1A20BR


Processing URLs:  69%|██████▉   | 694/1000 [27:11<1:27:42, 17.20s/it]

Error extracting text from http://www.usnews.com/news/articles/2014/03/11/a-brokered-gop-convention-in…: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  70%|███████   | 700/1000 [27:17<16:20,  3.27s/it]  

Error extracting text from http://www.reuters.com/article/us-britain-eu-arguments-idUSKCN18A0D7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-arguments-idUSKCN18A0D7
URL filtered: http://www.bloomberg.com/news/articles/2015-09-22/brazil-s-currency-tumbles-to-record-on-pessimism-over-budget


Processing URLs:  70%|███████   | 704/1000 [27:21<07:55,  1.61s/it]

Error extracting text from http://www.cdm.me/english/kacin-london-and-ankara-wont-stall-the-ratification: 403 Client Error: Forbidden for url: https://www.cdm.me/english/kacin-london-and-ankara-wont-stall-the-ratification


Processing URLs:  71%|███████   | 706/1000 [27:25<08:55,  1.82s/it]

Error extracting text from https://www.clickondetroit.com/news/international/the-chaos-in-venezuela-could-get-much-: 404 Client Error: Not Found for url: https://www.clickondetroit.com/news/international/the-chaos-in-venezuela-could-get-much-/


Processing URLs:  71%|███████   | 708/1000 [28:31<1:34:59, 19.52s/it]

Error extracting text from https://www.journal-topics.com/articles/moylan-to-back-era-ratification/: HTTPSConnectionPool(host='www.journal-topics.com', port=443): Max retries exceeded with url: /articles/moylan-to-back-era-ratification/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fedb4980>, 'Connection to www.journal-topics.com timed out. (connect timeout=60)'))


Processing URLs:  71%|███████   | 709/1000 [28:31<1:07:42, 13.96s/it]

Error extracting text from http://frankwilczek.com/: 406 Client Error: Not Acceptable for url: http://frankwilczek.com/


Processing URLs:  71%|███████   | 711/1000 [28:33<35:26,  7.36s/it]  

Error extracting text from http://egc2015.cz/results/mainwall: 404 Client Error: Not Found for url: http://egc2015.cz/results/mainwall


Processing URLs:  72%|███████▏  | 715/1000 [28:38<12:44,  2.68s/it]

Error extracting text from https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?crid=3DTB6ZAB81TJV&amp;dchild=1&amp;keywords=dictators+handbook&amp;qid=1623714626&amp;sprefix=dictators%2Caps%2C186&amp;sr=8-1: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?crid=3DTB6ZAB81TJV&amp;dchild=1&amp;keywords=dictators+handbook&amp;qid=1623714626&amp;sprefix=dictators%2Caps%2C186&amp;sr=8-1


Processing URLs:  72%|███████▏  | 717/1000 [28:42<10:14,  2.17s/it]

Error extracting text from http://www.skynews.com.au/news/world/mideast/2016/11/07/islamic-state-slow-mosul-offensive.html: 404 Client Error: Not Found for url: https://www.skynews.com.au/news/world/mideast/2016/11/07/islamic-state-slow-mosul-offensive.html?nk=e1187372b0a26f7afe7b0c100cda765c-1706844926


Processing URLs:  72%|███████▏  | 722/1000 [29:47<45:26,  9.81s/it]  

Error extracting text from https://www.nytimes.com/2021/03/16/technology/amazon-unions-virginia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/16/technology/amazon-unions-virginia.html
Error extracting text from http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347875A1.pdf: 403 Client Error: Forbidden for url: http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347875A1.pdf


Processing URLs:  72%|███████▎  | 725/1000 [30:49<1:40:24, 21.91s/it]

Error extracting text from https://archive.is/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  73%|███████▎  | 728/1000 [30:56<40:38,  8.97s/it]  

URL filtered: https://twitter.com/kaltoons?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  73%|███████▎  | 731/1000 [30:59<18:53,  4.21s/it]

Error extracting text from https://www.washingtontimes.com/news/2018/feb/18/the-new-great-game-in-syria/: 403 Client Error: Forbidden for url: https://www.washingtontimes.com/news/2018/feb/18/the-new-great-game-in-syria/
Error extracting text from http://www.france24.com/en/20160301-iraq-major-anti-islamic-state-group-operation-north-baghdad: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160301-iraq-major-anti-islamic-state-group-operation-north-baghdad


Processing URLs:  74%|███████▎  | 737/1000 [31:04<05:51,  1.34s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/venezuela-creditors-skeptical-despite-pledges-on-caracas-meeting-idUSKBN1D9291: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/venezuela-creditors-skeptical-despite-pledges-on-caracas-meeting-idUSKBN1D9291
Error extracting text from https://defence-blog.com/news/army/russia-deploys-iskander-systems-with-extended-range-missiles-to-ukrainian-border.html: 403 Client Error: Forbidden for url: https://defence-blog.com/news/army/russia-deploys-iskander-systems-with-extended-range-missiles-to-ukrainian-border.html


Processing URLs:  74%|███████▍  | 739/1000 [32:04<55:20, 12.72s/it]  

Error extracting text from http://www.usnews.com/opinion/world-report/articles/2016-12-30/donald-trump-shouldnt-try-for-a-grand-bargain-with-russia: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)
Error extracting text from https://news.usni.org/2017/02/10/manila-predicts-beijing-will-build-base-on-scarborough-shoal?utm_source=USNI+News&amp;utm_campaign=80663826d9-USNI_NEWS_DAILY&amp;utm_medium=email&amp;utm_term=0_0dd4a1450b-80663826d9-231045397&amp;mc_cid=80663826d9&amp;mc_eid=2394c94b5b: 403 Client Error: Forbidden for url: https://news.usni.org/2017/02/10/manila-predicts-beijing-will-build-base-on-scarborough-shoal?utm_source=USNI+News&amp;utm_campaign=80663826d9-USNI_NEWS_DAILY&amp;utm_medium=email&amp;utm_term=0_0dd4a1450b-80663826d9-231045397&amp;mc_cid=80663826d9&amp;mc_eid=2394c94b5b


Processing URLs:  74%|███████▍  | 740/1000 [32:05<39:53,  9.21s/it]

Error extracting text from http://usat.ly/1LkCUyq&quot: 404 Client Error: Not Found for url: http://usat.ly/1LkCUyq&quot


Processing URLs:  74%|███████▍  | 743/1000 [32:18<25:28,  5.95s/it]

Error extracting text from https://www.espn.com/mlb/story/_/id/33260279/mlb-commissioner-rob-manfred-losing-games-lockout-disastrous-outcome: 403 Client Error: Forbidden for url: https://www.espn.com/mlb/story/_/id/33260279/mlb-commissioner-rob-manfred-losing-games-lockout-disastrous-outcome


Processing URLs:  74%|███████▍  | 745/1000 [32:24<18:29,  4.35s/it]

Error extracting text from http://www.cepr.net/blogs/beat-the-press/washington-post-ed-board-federal-reserve-board-cultists: 404 Client Error: Not Found for url: https://www.cepr.net/blogs/beat-the-press/washington-post-ed-board-federal-reserve-board-cultists
URL filtered: https://twitter.com/Waymo/status/846438598421336064


Processing URLs:  75%|███████▍  | 748/1000 [32:30<12:19,  2.94s/it]

Error extracting text from http://www.bignewsnetwork.com/news/239370491/uk-should-hold-referendum-on-eu-membership-soon-says-imf-chief: 403 Client Error: Forbidden for url: https://www.bignewsnetwork.com/news/239370491/uk-should-hold-referendum-on-eu-membership-soon-says-imf-chief


Processing URLs:  75%|███████▍  | 749/1000 [32:33<11:45,  2.81s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-may-6-2016: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-may-6-2016


Processing URLs:  76%|███████▌  | 756/1000 [32:47<07:13,  1.78s/it]

Error extracting text from https://www.porttechnology.org/news/panama_canal_passes_lock_gate_test: 403 Client Error: Forbidden for url: https://www.porttechnology.org/news/panama_canal_passes_lock_gate_test
Error extracting text from http://www.nytimes.com/2015/11/12/world/middleeast/on-the-road-in-syria-struggle-all-around.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/12/world/middleeast/on-the-road-in-syria-struggle-all-around.html?_r=0


Processing URLs:  76%|███████▌  | 759/1000 [33:21<31:24,  7.82s/it]

Error extracting text from http://38north.org/2017/04/jdelury040417/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  76%|███████▌  | 762/1000 [33:24<13:07,  3.31s/it]

Error extracting text from https://www.reuters.com/article/us-opec-meeting/opec-russia-agree-oil-cut-extension-to-end-of-2018-idUSKBN1DU0WW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting/opec-russia-agree-oil-cut-extension-to-end-of-2018-idUSKBN1DU0WW


Processing URLs:  77%|███████▋  | 767/1000 [33:29<05:23,  1.39s/it]

Error extracting text from http://www.reuters.com/article/us-china-defence-navy-idUSKBN16500P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-defence-navy-idUSKBN16500P


Processing URLs:  77%|███████▋  | 768/1000 [33:32<07:12,  1.87s/it]

Error extracting text from https://bit.ly/2PHhSDs: 404 Client Error: Not Found for url: https://salten.cz/2021/03/07/two-polls-find-scots-would-vote-against-independence/


Processing URLs:  77%|███████▋  | 769/1000 [33:34<06:48,  1.77s/it]

Error extracting text from http://aaj.tv/2016/01/iran-reformists-demand-review-after-candidates-rejected/: 404 Client Error: Not Found for url: https://www.aaj.tv/2016/01/iran-reformists-demand-review-after-candidates-rejected/


Processing URLs:  77%|███████▋  | 770/1000 [33:34<05:38,  1.47s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/china-strikes-back-the-south-china-sea-adiz-style-14234: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/china-strikes-back-the-south-china-sea-adiz-style-14234


Processing URLs:  78%|███████▊  | 776/1000 [33:45<05:42,  1.53s/it]

Error extracting text from http://www.reuters.com/article/2015/11/28/us-northkorea-missile-idUSKBN0TH09M20151128#J2TT0LIPb8YuzbGW.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/28/us-northkorea-missile-idUSKBN0TH09M20151128#J2TT0LIPb8YuzbGW.97


Processing URLs:  78%|███████▊  | 777/1000 [33:47<05:22,  1.45s/it]

Error extracting text from https://tnsr.org/roundtable/policy-roundtable-close-look-2018-national-defense-strategy/: 403 Client Error: Forbidden for url: https://tnsr.org/roundtable/policy-roundtable-close-look-2018-national-defense-strategy/


Processing URLs:  78%|███████▊  | 778/1000 [33:47<04:20,  1.17s/it]

Error extracting text from http://thehill.com/homenews/news/362252-how-abortion-could-tip-the-scales-in-alabama: 403 Client Error: Forbidden for url: https://thehill.com/homenews/news/362252-how-abortion-could-tip-the-scales-in-alabama/


Processing URLs:  78%|███████▊  | 780/1000 [33:49<03:17,  1.11it/s]

Error extracting text from https://www.middleeastmonitor.com/20170423-poll-netanyahus-party-most-popular/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20170423-poll-netanyahus-party-most-popular/


Processing URLs:  78%|███████▊  | 783/1000 [34:04<10:27,  2.89s/it]

Error extracting text from http://www.reuters.com/article/us-philippines-china-arbitration-idUSKCN0SN26320151030: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-china-arbitration-idUSKCN0SN26320151030


Processing URLs:  78%|███████▊  | 785/1000 [34:04<06:07,  1.71s/it]

Error extracting text from http://thehill.com/homenews/house/253744-house-conservatives-warm-to-mccarthy-as-speaker: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/253744-house-conservatives-warm-to-mccarthy-as-speaker/


Processing URLs:  79%|███████▊  | 787/1000 [34:11<07:53,  2.23s/it]

Error extracting text from http://www.tradingeconomics.com/egypt/tourist-arrivals: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/egypt/tourist-arrivals


Processing URLs:  79%|███████▉  | 788/1000 [34:14<08:51,  2.51s/it]

Error extracting text from http://www.ethnologue.com/country/US/status: 404 Client Error: Not Found for url: https://www.ethnologue.com/country/US/status


Processing URLs:  79%|███████▉  | 790/1000 [34:15<05:04,  1.45s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-tillerson-idUSKBN15E2SW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-tillerson-idUSKBN15E2SW
Error extracting text from http://blogs.wsj.com/brussels/2015/09/24/will-montenegro-become-natos-29th-member/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/brussels/2015/09/24/will-montenegro-become-natos-29th-member/


Processing URLs:  79%|███████▉  | 792/1000 [34:16<03:10,  1.09it/s]

Error extracting text from http://www.carscoops.com/2015/01/toyota-gets-1500-orders-for-mirai-in.html: 403 Client Error: Forbidden for url: https://www.carscoops.com:443/2015/01/toyota-gets-1500-orders-for-mirai-in.html


Processing URLs:  80%|███████▉  | 795/1000 [34:20<04:28,  1.31s/it]

Error extracting text from http://thephilippinestar.ph/articles/2016-03-02/news/un-security-council-to-vote-on-new-nokor-sanctions/142318: HTTPConnectionPool(host='thephilippinestar.ph', port=80): Max retries exceeded with url: /articles/2016-03-02/news/un-security-council-to-vote-on-new-nokor-sanctions/142318 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306e0d0a0>: Failed to resolve 'thephilippinestar.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  80%|████████  | 801/1000 [34:32<05:11,  1.57s/it]

URL filtered: https://en.wikipedia.org/wiki/List_of_most-subscribed_YouTube_channels#Most-subscribed_channels


Processing URLs:  80%|████████  | 803/1000 [34:34<04:03,  1.24s/it]

URL filtered: https://www.youtube.com/watch?v=noU7KeA5PWY


Processing URLs:  81%|████████  | 809/1000 [34:42<04:16,  1.34s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16122B?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16122B?il=0


Processing URLs:  81%|████████  | 810/1000 [34:44<04:48,  1.52s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-september-29-2015: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-september-29-2015


Processing URLs:  81%|████████  | 812/1000 [34:46<03:37,  1.16s/it]

Error extracting text from http://www.france24.com/en/20151130-nato-set-invite-montenegro-join-alliance-sources: 403 Client Error: Forbidden for url: http://www.france24.com/en/20151130-nato-set-invite-montenegro-join-alliance-sources


Processing URLs:  81%|████████▏ | 813/1000 [34:46<02:53,  1.08it/s]

Error extracting text from https://seekingalpha.com/amp/article/4463214-amazon-time-to-spin-off-aws: 403 Client Error: Forbidden for url: https://seekingalpha.com/amp/article/4463214-amazon-time-to-spin-off-aws


Processing URLs:  82%|████████▏ | 815/1000 [34:50<03:57,  1.29s/it]

Error extracting text from http://www.phac-aspc.gc.ca/tmp-pmv/notices-avis/index-eng.php: 404 Client Error: Not Found for url: http://www.phac-aspc.gc.ca/tmp-pmv/notices-avis/index-eng.php


Processing URLs:  82%|████████▏ | 817/1000 [34:53<04:35,  1.51s/it]

URL filtered: https://www.youtube.com/watch?v=W_tbrlkDJsc


Processing URLs:  82%|████████▏ | 821/1000 [34:58<04:02,  1.35s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=weekly&amp;id=jurassicpark4.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=weekly&amp;id=jurassicpark4.htm


Processing URLs:  82%|████████▏ | 823/1000 [35:00<03:31,  1.19s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-idUKKCN0VC0XW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  82%|████████▎ | 825/1000 [35:02<02:51,  1.02it/s]

Error extracting text from http://www.businessinsider.com.au/us-hacker-army-stuxnet-2016-7: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/us-hacker-army-stuxnet-2016-7


Processing URLs:  83%|████████▎ | 828/1000 [35:06<04:09,  1.45s/it]

Error extracting text from http://thefederalist.com/2016/05/23/the-iran-deal-wasnt-about-nukes-at-all/: 403 Client Error: Forbidden for url: http://thefederalist.com/2016/05/23/the-iran-deal-wasnt-about-nukes-at-all/


Processing URLs:  83%|████████▎ | 830/1000 [35:10<04:35,  1.62s/it]

Error extracting text from http://www.realclearpolitics.com/articles/2016/05/24/gop_senators_praise_corker_as_potential_trump_vp_130654.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2016/05/24/gop_senators_praise_corker_as_potential_trump_vp_130654.html


Processing URLs:  83%|████████▎ | 831/1000 [35:11<03:27,  1.23s/it]

Error extracting text from https://www.wsj.com/articles/iran-nuclear-deal-biden-blinken-jcpoa-ebrahim-raisi-khamenei-11628539180: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-nuclear-deal-biden-blinken-jcpoa-ebrahim-raisi-khamenei-11628539180


Processing URLs:  84%|████████▍ | 840/1000 [35:25<03:54,  1.46s/it]

Error extracting text from http://www.middle-east-online.com/english/?id=75449: 404 Client Error: Not Found for url: https://www.middle-east-online.com/english/?id=75449


Processing URLs:  84%|████████▍ | 842/1000 [35:27<03:38,  1.38s/it]

Error extracting text from http://v.intr: HTTPConnectionPool(host='v.intr', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306e0f8f0>: Failed to resolve 'v.intr' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  85%|████████▍ | 847/1000 [35:32<03:07,  1.23s/it]

Error extracting text from http://www.cissm.umd.edu/sites/default/files/Iranian%20Attitudes%20in%20Advance%20of%20the%20Parliamentary%20Elections%20-%20020116%20-%20FINAL%20-%20sm.pdf: 404 Client Error: Not Found for url: https://www.cissm.umd.edu/sites/default/files/Iranian%20Attitudes%20in%20Advance%20of%20the%20Parliamentary%20Elections%20-%20020116%20-%20FINAL%20-%20sm.pdf


Processing URLs:  85%|████████▍ | 849/1000 [35:33<02:19,  1.08it/s]

Error extracting text from http://english.ahram.org.eg/NewsContent/1/64/216540/Egypt/Politics-/Germany-resumes-direct-flights-to-Sharm-ElSheikh-M.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/1/64/216540/Egypt/Politics-/Germany-resumes-direct-flights-to-Sharm-ElSheikh-M.aspx


Processing URLs:  85%|████████▌ | 851/1000 [35:34<01:21,  1.82it/s]

Error extracting text from http://allafrica.com/stories/201602250881.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201602250881.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fe74c320>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-idUSKCN0ZK06E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-idUSKCN0ZK06E


Processing URLs:  85%|████████▌ | 853/1000 [35:37<02:12,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-russia-putin-trump-meeting-idUSKBN14C1FK?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-putin-trump-meeting-idUSKBN14C1FK?il=0
URL filtered: http://www.bloomberg.com/news/articles/2015-07-02/iran-s-nod-to-iaea-monitoring-rights-sets-path-for-nuclear-deal


Processing URLs:  86%|████████▌ | 855/1000 [35:37<01:19,  1.83it/s]

Error extracting text from https://www.nytimes.com/2017/06/07/us/politics/christopher-wray-bio.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/07/us/politics/christopher-wray-bio.html


Processing URLs:  86%|████████▌ | 860/1000 [35:44<03:10,  1.36s/it]

Error extracting text from http://www.thelibertybeacon.com/2016/02/20/secret-ttip-talks-resume-monday-eu-us-rifts-deepen/: 404 Client Error: Not Found for url: http://www.thelibertybeacon.com/2016/02/20/secret-ttip-talks-resume-monday-eu-us-rifts-deepen/
URL filtered: https://twitter.com/FT/status/692895262029688832


Processing URLs:  86%|████████▌ | 862/1000 [35:44<02:08,  1.07it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-19/north-korean-cyber-capability-among-world-s-best-brooks-says


Processing URLs:  86%|████████▋ | 865/1000 [35:46<01:44,  1.29it/s]

Error extracting text from https://borderlex.net/2021/05/26/comment-eu-taiwan-relations-in-the-indo-pacific-time-for-an-upgrade/: 403 Client Error: Forbidden for url: https://borderlex.net/2021/05/26/comment-eu-taiwan-relations-in-the-indo-pacific-time-for-an-upgrade/


Processing URLs:  87%|████████▋ | 869/1000 [35:49<01:22,  1.59it/s]

Error extracting text from http://english.aawsat.com/2016/02/article55347867/shiite-leaderships-obstruct-mosul-dam-maintenance: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/02/article55347867/shiite-leaderships-obstruct-mosul-dam-maintenance
Error extracting text from http://www.reuters.com/article/us-eu-usa-ttip-europeancommission-idUSKCN114143?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-ttip-europeancommission-idUSKCN114143?il=0


Processing URLs:  87%|████████▋ | 874/1000 [35:56<02:33,  1.22s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/21/gitrep-20mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/21/gitrep-20mar16pm/


Processing URLs:  88%|████████▊ | 876/1000 [35:57<01:46,  1.16it/s]

Error extracting text from http://seekingalpha.com/article/3679746-even-light-oil-is-heavy: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3679746-even-light-oil-is-heavy


Processing URLs:  88%|████████▊ | 878/1000 [35:57<01:03,  1.93it/s]

Error extracting text from http://english.alarabiya.net/en/News/2016/01/28/Italy-PM-Germany-France-can-t-solve-refugee-issue-without-me.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/2016/01/28/Italy-PM-Germany-France-can-t-solve-refugee-issue-without-me.html


Processing URLs:  88%|████████▊ | 880/1000 [36:00<01:51,  1.08it/s]

Error extracting text from http://www.ibtimes.co.uk/nigeria-zaria-killings-imn-releases-names-700-missing-shias-zakzaky-returns-abuja-1540076: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/nigeria-zaria-killings-imn-releases-names-700-missing-shias-zakzaky-returns-abuja-1540076


Processing URLs:  88%|████████▊ | 885/1000 [36:06<01:40,  1.15it/s]

Error extracting text from https://www.nytimes.com/2017/03/09/us/politics/justice-dept-declines-to-back-claim-trump-is-not-under-investigation.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/09/us/politics/justice-dept-declines-to-back-claim-trump-is-not-under-investigation.html?_r=0
Error extracting text from https://www.reuters.com/article/us-usa-trump-capitol-arrests/oath-keepers-militia-members-arrested-for-role-in-u-s-capitol-siege-idUSKBN29O275: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-capitol-arrests/oath-keepers-militia-members-arrested-for-role-in-u-s-capitol-siege-idUSKBN29O275


Processing URLs:  89%|████████▉ | 891/1000 [36:19<03:32,  1.95s/it]

Error extracting text from http://essay.utwente.nl/62005/1/bachelor_thesis_Galdiga_s1006959.pdf: PyCryptodome is required for AES algorithm


Processing URLs:  89%|████████▉ | 893/1000 [36:22<03:18,  1.85s/it]

Error extracting text from https://mobile.nytimes.com/2017/11/29/business/gm-driverless-cars.html?referer=http://www.google.co.uk/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/29/business/gm-driverless-cars.html?referer=http://www.google.co.uk/


Processing URLs:  89%|████████▉ | 894/1000 [36:34<08:38,  4.89s/it]

URL filtered: https://www.youtube.com/watch?v=JRMOMjCoR58


Processing URLs:  90%|████████▉ | 899/1000 [38:50<1:01:17, 36.41s/it]

Error extracting text from https://www.yang2020.com/policies/carbon-fee-dividend/: HTTPSConnectionPool(host='www.yang2020.com', port=443): Max retries exceeded with url: /policies/carbon-fee-dividend/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30647ade0>, 'Connection to www.yang2020.com timed out. (connect timeout=60)'))


Processing URLs:  90%|█████████ | 902/1000 [38:52<22:16, 13.64s/it]  

Error extracting text from http://www.reuters.com/article/usa-threats-idUSL1N0C44CQ20130312: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/usa-threats-idUSL1N0C44CQ20130312
Error extracting text from http://www.reuters.com/article/us-eu-usa-trade-belgium-idUSKCN1190L0?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-trade-belgium-idUSKCN1190L0?il=0


Processing URLs:  90%|█████████ | 903/1000 [38:52<15:43,  9.73s/it]

Error extracting text from https://www.nytimes.com/2017/02/16/us/politics/affordable-care-act-congress.html?emc=edit_th_20170217&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/16/us/politics/affordable-care-act-congress.html?emc=edit_th_20170217&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  90%|█████████ | 904/1000 [38:53<11:25,  7.14s/it]

Error extracting text from http://uk.reuters.com/article/venezuela-pdvsa-contract-idUKL2N1BZ0O5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  91%|█████████ | 909/1000 [38:59<03:31,  2.33s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/us-envoy-to-thailand/2332148.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/us-envoy-to-thailand/2332148.html


Processing URLs:  91%|█████████▏| 913/1000 [39:13<03:28,  2.40s/it]

Error extracting text from https://www.bbc.co.uk/sounds/play/w3cszl4w&amp;gt: 404 Client Error: Not Found for url: https://www.bbc.co.uk/sounds/play/w3cszl4w&amp;gt


Processing URLs:  92%|█████████▏| 915/1000 [39:16<02:48,  1.98s/it]

Error extracting text from http://www.cfr.org/iran/irans-revolutionary-guards/p14324: 404 Client Error: Not Found for url: https://www.cfr.org/iran/irans-revolutionary-guards/p14324


Processing URLs:  92%|█████████▏| 919/1000 [39:20<01:45,  1.30s/it]

Error extracting text from http://tass.com/defense/927769: 502 Server Error: Bad Gateway for url: https://tass.com/defense/927769


Processing URLs:  93%|█████████▎| 928/1000 [39:37<01:24,  1.17s/it]

URL filtered: https://www.newsweek.com/ahmad-alissas-facebook-posts-islam-kickboxing-girlfriend-1578167
Error extracting text from http://www.reuters.com/article/us-israel-netanyahu-idUSKBN14S0EX?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-idUSKBN14S0EX?il=0


Processing URLs:  93%|█████████▎| 930/1000 [39:39<01:04,  1.08it/s]

Error extracting text from http://www.geekwire.com/2016/drone-delivery-flying-blimp-amazon/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2016/drone-delivery-flying-blimp-amazon/


Processing URLs:  93%|█████████▎| 933/1000 [39:42<01:04,  1.04it/s]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/variant-info.html#Concern: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/variant-surveillance/variant-info.html#Concern


Processing URLs:  94%|█████████▎| 935/1000 [39:45<01:17,  1.20s/it]

Error extracting text from http://pa.oxfordjournals.org/content/60/4/548.short: 403 Client Error: Forbidden for url: http://pa.oxfordjournals.org/content/60/4/548.short


Processing URLs:  94%|█████████▎| 936/1000 [39:47<01:27,  1.36s/it]

Error extracting text from http://en.trend.az/world/turkey/2467302.html: 404 Client Error: Not Found for url: https://www.trend.az/world/turkey/2467302.html


Processing URLs:  94%|█████████▍| 942/1000 [40:04<02:16,  2.36s/it]

Error extracting text from http://www.nytimes.com/2016/06/17/world/middleeast/syria-assad-obama-airstrikes-diplomats-memo.html?mabReward=A6&amp;module=WelcomeBackModal&amp;contentCollection=Middle%20East&amp;region=FixedCenter&amp;action=click&amp;src=recg&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/17/world/middleeast/syria-assad-obama-airstrikes-diplomats-memo.html?mabReward=A6&amp;module=WelcomeBackModal&amp;contentCollection=Middle%20East&amp;region=FixedCenter&amp;action=click&amp;src=recg&amp;pgtype=article


Processing URLs:  94%|█████████▍| 944/1000 [40:07<01:40,  1.79s/it]

Error extracting text from https://www.bankofengland.co.uk/statistics/yield-curves: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/statistics/yield-curves


Processing URLs:  94%|█████████▍| 945/1000 [40:07<01:17,  1.41s/it]

Error extracting text from http://www.thesun.co.uk/sol/homepage/showbiz/tv/6712846/True-story-of-gem-thieves-The-Pink-Panthers.html: 404 Client Error: Not Found for url: https://www.thesun.co.uk/sol/homepage/showbiz/tv/6712846/True-story-of-gem-thieves-The-Pink-Panthers.html


Processing URLs:  95%|█████████▍| 948/1000 [40:20<02:26,  2.83s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-01-16/merkel-says-europe-controls-its-own-fate-in-rebuff-to-trump


Processing URLs:  96%|█████████▌| 955/1000 [40:29<00:56,  1.26s/it]

Error extracting text from http://www.dbresearch.com/PROD/DBR_INTERNET_EN-PROD/PROD0000000000329687/The+GCC+going+East%253A+Economic+ties+with+developing+Asia+on+the+rise.pdf: 404 Client Error: Not Found for url: https://www.dbresearch.com/PROD/DBR_INTERNET_EN-PROD/PROD0000000000329687/The+GCC+going+East%253A+Economic+ties+with+developing+Asia+on+the+rise.pdf
Error extracting text from http://www.nytimes.com/2016/02/05/us/politics/obama-praises-colombias-peace-efforts-with-rebels-and-seeks-big-aid-increase.html?ref=world: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/05/us/politics/obama-praises-colombias-peace-efforts-with-rebels-and-seeks-big-aid-increase.html?ref=world


Processing URLs:  96%|█████████▌| 959/1000 [40:32<00:27,  1.46it/s]

Error extracting text from https://www.debka.com/200-russian-advisers-killed-last-weeks-clash-us-forces-syria/ : HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /200-russian-advisers-killed-last-weeks-clash-us-forces-syria/%20 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: https://www.youtube.com/watch?v=_bnbJFeVOr4&amp;t=7s
Error extracting text from http://www.reuters.com/article/us-brazil-corruption-nike-idUSKCN0WJ29O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-nike-idUSKCN0WJ29O


Processing URLs:  96%|█████████▌| 960/1000 [40:32<00:24,  1.66it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-redcross-idUSKCN1090TL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-redcross-idUSKCN1090TL


Processing URLs:  96%|█████████▋| 963/1000 [40:38<00:49,  1.35s/it]

Error extracting text from http://space.mit.edu/home/tegmark/brain.html: HTTPSConnectionPool(host='space.mit.edu', port=443): Max retries exceeded with url: /home/tegmark/brain.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  96%|█████████▋| 965/1000 [40:41<00:46,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-apple-suppliers-idUSKCN0V00V5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apple-suppliers-idUSKCN0V00V5


Processing URLs:  97%|█████████▋| 967/1000 [40:42<00:27,  1.22it/s]

Error extracting text from https://medium.com/opacity/the-syrian-war-condensed-a-more-rigorous-way-to-look-at-the-conflict-f841404c3b1d#.h6dyma67g: 403 Client Error: Forbidden for url: https://medium.com/opacity/the-syrian-war-condensed-a-more-rigorous-way-to-look-at-the-conflict-f841404c3b1d#.h6dyma67g


Processing URLs:  97%|█████████▋| 970/1000 [41:16<05:04, 10.15s/it]

Error extracting text from http://gas2.org/2016/11/04/video-tesla-fire-shows-bursting-battery-cells/: 522 Server Error:  for url: https://gas2.org/2016/11/04/video-tesla-fire-shows-bursting-battery-cells/


Processing URLs:  97%|█████████▋| 972/1000 [41:20<02:45,  5.91s/it]

Error extracting text from http://realclearpolitics.com/: HTTPSConnectionPool(host='realclearpolitics.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  97%|█████████▋| 974/1000 [41:37<03:26,  7.96s/it]

Error extracting text from http://www.investopedia.com/articles/investing/111913/investing-foreign-stocks-adrs-and-gdrs.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/investing/111913/investing-foreign-stocks-adrs-and-gdrs.asp


Processing URLs:  98%|█████████▊| 978/1000 [41:42<00:59,  2.70s/it]

Error extracting text from http://www.nytimes.com/2016/11/26/world/europe/finger-pointed-at-russians-in-alleged-coup-plot-in-montenegro.html?emc=edit_th_20161127&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/26/world/europe/finger-pointed-at-russians-in-alleged-coup-plot-in-montenegro.html?emc=edit_th_20161127&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  98%|█████████▊| 981/1000 [41:47<00:32,  1.69s/it]

Error extracting text from http://www.straitstimes.com/business/companies-markets/keppel-reiterates-its-zero-tolerance-stance-against-bribery-amid-fresh: 403 Client Error: Forbidden for url: https://www.straitstimes.com/business/companies-markets/keppel-reiterates-its-zero-tolerance-stance-against-bribery-amid-fresh


Processing URLs:  98%|█████████▊| 984/1000 [41:51<00:20,  1.31s/it]

Error extracting text from http://www.whig.com/article/20161128/AP/311289865#: 404 Client Error: Not Found for url: https://www.whig.com/article/20161128/ap/311289865/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-falluja-idUSKCN0YP0MI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-falluja-idUSKCN0YP0MI


Processing URLs:  99%|█████████▉| 989/1000 [41:55<00:07,  1.44it/s]

Error extracting text from http://pulitzercenter.org/reporting/iraq-front-lines: 403 Client Error: Forbidden for url: http://pulitzercenter.org/reporting/iraq-front-lines
Error extracting text from http://www.reuters.com/article/us-ukraine-cyber/new-cyber-attacks-hit-airport-metro-in-ukraine-idUSKBN1CT21F?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-cyber/new-cyber-attacks-hit-airport-metro-in-ukraine-idUSKBN1CT21F?il=0


Processing URLs:  99%|█████████▉| 992/1000 [41:58<00:07,  1.06it/s]

Error extracting text from https://jamanetwork.com/journals/jama/fullarticle/2762028: 403 Client Error: Forbidden for url: https://jamanetwork.com/journals/jama/fullarticle/2762028


Processing URLs:  99%|█████████▉| 994/1000 [42:00<00:04,  1.32it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.brasil247.com/pt/247/poder/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.brasil247.com/pt/247/poder/&amp;prev=search


Processing URLs: 100%|█████████▉| 995/1000 [42:00<00:03,  1.66it/s]

Error extracting text from https://www.reuters.com/world/europe/putin-orders-military-operations-ukraine-demands-kyiv-forces-surrender-2022-02-24/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/putin-orders-military-operations-ukraine-demands-kyiv-forces-surrender-2022-02-24/


Processing URLs: 100%|█████████▉| 996/1000 [42:00<00:02,  1.80it/s]

Error extracting text from https://www.google.ca/amp/s/www.yahoo.com/amphtml/news/weakening-inside-iraqi-city-mosul-pentagon-203809199.html?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: https://www.yahoo.com/amphtml/news/weakening-inside-iraqi-city-mosul-pentagon-203809199.html


Processing URLs: 100%|█████████▉| 998/1000 [42:02<00:01,  1.49it/s]

Error extracting text from https://www.thelocal.fr/20210205/it-makes-no-sense-french-hospital-chiefs-fear-macrons-lockdown-gamble-will-backfire: 403 Client Error: Forbidden for url: https://www.thelocal.fr/20210205/it-makes-no-sense-french-hospital-chiefs-fear-macrons-lockdown-gamble-will-backfire


Processing URLs: 100%|██████████| 1000/1000 [42:04<00:00,  2.52s/it]
Processing URLs:   0%|          | 1/1000 [00:00<02:22,  7.02it/s]

Error extracting text from http://www.pakistantoday.com.pk/2017/02/01/afghan-government-and-taliban/: 403 Client Error: Forbidden for url: http://www.pakistantoday.com.pk/2017/02/01/afghan-government-and-taliban/


Processing URLs:   0%|          | 2/1000 [00:00<04:00,  4.16it/s]

Error extracting text from http://www.wsj.com/articles/u-s-encourages-firms-to-make-deals-with-iran-in-bid-to-cement-nuclear-deal-1466727183: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-encourages-firms-to-make-deals-with-iran-in-bid-to-cement-nuclear-deal-1466727183


Processing URLs:   0%|          | 3/1000 [00:00<03:55,  4.23it/s]

Error extracting text from https://www.axios.com/putin-will-double-down-on-active-measures-2512821627.html?utm_source=sidebar: 403 Client Error: Forbidden for url: https://www.axios.com/putin-will-double-down-on-active-measures-2512821627.html?utm_source=sidebar


Processing URLs:   0%|          | 4/1000 [00:00<03:58,  4.18it/s]

Error extracting text from http://www.reuters.com/article/imf-ukraine-idUSL1N1A00VO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/imf-ukraine-idUSL1N1A00VO


Processing URLs:   0%|          | 5/1000 [00:01<06:31,  2.54it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/361083-retired-alabama-officer-roy-moore-rumors-treated-like-a-joke: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/361083-retired-alabama-officer-roy-moore-rumors-treated-like-a-joke/


Processing URLs:   1%|          | 8/1000 [00:03<08:54,  1.86it/s]

Error extracting text from http://espn.go.com/nfl/story/_/id/14538048/commissioner-roger-goodell-says-plans-keep-chargers-raiders-rams-inadequate-current-home-markets: 403 Client Error: Forbidden for url: http://espn.go.com/nfl/story/_/id/14538048/commissioner-roger-goodell-says-plans-keep-chargers-raiders-rams-inadequate-current-home-markets
Error extracting text from https://www.reuters.com/business/energy/us-shale-industry-tempers-output-even-oil-price-jumps-2021-06-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/us-shale-industry-tempers-output-even-oil-price-jumps-2021-06-28/
URL filtered: https://twitter.com/TGhazniwal/status/1425304020336394243


Processing URLs:   1%|▏         | 13/1000 [00:11<20:46,  1.26s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-12-09/bloomberg-politics-poll-trump-muslim-ban-proposal


Processing URLs:   2%|▏         | 15/1000 [00:11<13:21,  1.23it/s]

Error extracting text from https://www.nytimes.com/2017/04/06/us/politics/neil-gorsuch-supreme-court-senate.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/06/us/politics/neil-gorsuch-supreme-court-senate.html?_r=0


Processing URLs:   2%|▏         | 21/1000 [00:19<21:40,  1.33s/it]

Error extracting text from http://www.us-iran.org/news/2016/3/3/iran-digest-week-of-february-26-march-4-2016: 404 Client Error: Not Found for url: http://www.us-iran.org/news/2016/3/3/iran-digest-week-of-february-26-march-4-2016


Processing URLs:   2%|▏         | 24/1000 [00:24<20:55,  1.29s/it]

Error extracting text from http://www.nytimes.com/2015/05/30/us/us-removes-cuba-from-state-terrorism-list.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/05/30/us/us-removes-cuba-from-state-terrorism-list.html


Processing URLs:   2%|▎         | 25/1000 [00:25<22:07,  1.36s/it]

Error extracting text from http://chicago.suntimes.com/news/illinois-at-center-of-revived-equal-rights-amendment-fight/: 404 Client Error: Not Found for url: https://chicago.suntimes.com/news/illinois-at-center-of-revived-equal-rights-amendment-fight/


Processing URLs:   3%|▎         | 26/1000 [00:56<2:46:08, 10.23s/it]

Error extracting text from http://www.internationallawoffice.com/Newsletters/International-Trade/European-Union/King-Spalding-LLP/Energy-in-TTIP-negotiations-an-EU-perspective: 522 Server Error:  for url: http://www.internationallawoffice.com/Newsletters/International-Trade/European-Union/King-Spalding-LLP/Energy-in-TTIP-negotiations-an-EU-perspective


Processing URLs:   3%|▎         | 30/1000 [01:01<51:29,  3.19s/it]  

Error extracting text from http://warontherocks.com/2016/05/talking-to-the-islamic-state-co-opting-jihadists-into-a-political-process/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/05/talking-to-the-islamic-state-co-opting-jihadists-into-a-political-process/
URL filtered: https://www.youtube.com/watch?v=GAGWap3uSqM


Processing URLs:   3%|▎         | 34/1000 [01:04<25:29,  1.58s/it]

Error extracting text from https://www.lesswrong.com/posts/XJxwFMSL5TPN2usC6/modes-of-petrov-day: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/XJxwFMSL5TPN2usC6/modes-of-petrov-day


Processing URLs:   4%|▎         | 36/1000 [01:07<25:19,  1.58s/it]

Error extracting text from https://www.dw.com/de/28082021-langsam-gesprochene-nachrichten/a-59010622: 404 Client Error: Not Found for url: https://www.dw.com/de/28082021-langsam-gesprochene-nachrichten/a-59010622


Processing URLs:   4%|▎         | 37/1000 [01:08<20:39,  1.29s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/361475-gop-lawmaker-says-fbi-seeking-interview-about-assange-meeting: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/361475-gop-lawmaker-says-fbi-seeking-interview-about-assange-meeting/


Processing URLs:   4%|▍         | 43/1000 [01:29<30:44,  1.93s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-02-18/how-many-vaccine-doses-are-available-u-s-should-see-a-surge?sref=x7nYEkiY


Processing URLs:   5%|▍         | 47/1000 [01:32<18:15,  1.15s/it]

Error extracting text from http://uk.mobile.reuters.com/article/idUKKCN0XE0OR?irpc=932: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUKKCN0XE0OR?irpc=932
URL filtered: https://amp.cnn.com/cnn/2021/07/16/politics/biden-intel-review-covid-origins/index.html?__twitter_impression=true
URL filtered: https://www.youtube.com/watch?v=3IXjqWi2Ci8


Processing URLs:   5%|▌         | 52/1000 [01:37<17:15,  1.09s/it]

Error extracting text from http://intelligencebriefs.com/suicide-car-bomb-target-amisom-convoy-near-mogadishu/: 406 Client Error: Not Acceptable for url: http://intelligencebriefs.com/suicide-car-bomb-target-amisom-convoy-near-mogadishu/


Processing URLs:   5%|▌         | 54/1000 [01:38<12:26,  1.27it/s]

Error extracting text from https://www.wsj.com/articles/merkel-begins-last-ditch-effort-to-form-coalition-government-1515338103: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/merkel-begins-last-ditch-effort-to-form-coalition-government-1515338103


Processing URLs:   6%|▌         | 57/1000 [01:42<15:18,  1.03it/s]

Error extracting text from https://www.consilium.europa.eu/media/21766/directives-for-the-negotiation-xt21016-ad01re02en17.pdf: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/media/21766/directives-for-the-negotiation-xt21016-ad01re02en17.pdf


Processing URLs:   6%|▋         | 64/1000 [01:56<26:29,  1.70s/it]

Error extracting text from http://www.businessinsider.com.au/faraday-future-production-vehicle-release-date-2016-11?r=US&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/faraday-future-production-vehicle-release-date-2016-11?r=US&amp;IR=T


Processing URLs:   7%|▋         | 67/1000 [02:08<41:55,  2.70s/it]

Error extracting text from http://taskandpurpose.com/intel-officials-knew-flynn-vulnerable-blackmail-still-shared-sensitive-info/: 404 Client Error: Not Found for url: https://taskandpurpose.com/intel-officials-knew-flynn-vulnerable-blackmail-still-shared-sensitive-info/


Processing URLs:   7%|▋         | 70/1000 [02:13<29:16,  1.89s/it]

Error extracting text from http://www.undispatch.com/elections-in-the-democratic-republic-of-congo-could-mean-trouble/: 403 Client Error: Forbidden for url: http://undispatch.com/elections-in-the-democratic-republic-of-congo-could-mean-trouble/


Processing URLs:   7%|▋         | 71/1000 [02:17<39:28,  2.55s/it]

URL filtered: https://twitter.com/hashemi_rfsjn


Processing URLs:   7%|▋         | 73/1000 [02:18<25:49,  1.67s/it]

Error extracting text from http://articles.latimes.com/2013/may/01/world/la-fg-wn-russia-japan-peace-treaty-20130430: 403 Client Error: Forbidden for url: https://www.latimes.com/archives/la-xpm-2013-may-01-la-fg-wn-russia-japan-peace-treaty-20130430-story.html
Error extracting text from https://ycharts.com/indicators/iran_crude_oil_production: 403 Client Error: Forbidden for url: https://ycharts.com/indicators/iran_crude_oil_production


Processing URLs:   8%|▊         | 79/1000 [02:53<1:16:55,  5.01s/it]

URL filtered: https://twitter.com/ThisWeekABC/status/686201894025445380


Processing URLs:   8%|▊         | 83/1000 [03:56<4:22:44, 17.19s/it]

Error extracting text from https://betting.betfair.com/politics/brexit/brexit-betting-odds-eu-referendum-betting-may-29-270516-204.html: HTTPSConnectionPool(host='betting.betfair.com', port=443): Max retries exceeded with url: /politics/brexit/brexit-betting-odds-eu-referendum-betting-may-29-270516-204.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30060eea0>, 'Connection to betting.betfair.com timed out. (connect timeout=60)'))


Processing URLs:   8%|▊         | 84/1000 [03:57<3:18:49, 13.02s/it]

URL filtered: https://www.youtube.com/watch?v=nE0mKpShJSU


Processing URLs:   9%|▉         | 88/1000 [04:04<1:21:27,  5.36s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/nh/new_hampshire_democratic_presidential_primary-3351.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/nh/new_hampshire_democratic_presidential_primary-3351.html


Processing URLs:   9%|▉         | 92/1000 [04:09<39:16,  2.60s/it]  

Error extracting text from http://newsok.com/article/5485578: 404 Client Error: OK for url: https://www.oklahoman.com/article/5485578


Processing URLs:  10%|▉         | 96/1000 [04:17<33:06,  2.20s/it]

Error extracting text from http://thecipherbrief.com/column/strategic-view/integrated-agile-intelligence-key-combatting-dynamic-threats-1093: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/column/strategic-view/integrated-agile-intelligence-key-combatting-dynamic-threats-1093


Processing URLs:  10%|▉         | 97/1000 [04:18<30:06,  2.00s/it]

Error extracting text from http://uavcoach.com/zipline-raises-25-million-for-work-in-africa/: 404 Client Error: Not Found for url: https://uavcoach.com/zipline-raises-25-million-for-work-in-africa/


Processing URLs:  10%|▉         | 99/1000 [04:20<20:16,  1.35s/it]

Error extracting text from http://thehill.com/policy/finance/260966-syria-refugee-fight-emerges-as-government-shutdown-threat: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/260966-syria-refugee-fight-emerges-as-government-shutdown-threat/


Processing URLs:  10%|█         | 100/1000 [04:24<32:53,  2.19s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-13/pdvsa-bonds-surge-amid-speculation-bond-swap-announcement-near


Processing URLs:  10%|█         | 103/1000 [04:27<21:11,  1.42s/it]

Error extracting text from http://www.consilium.europa.eu/en/policies/sanctions/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/policies/sanctions/


Processing URLs:  11%|█         | 110/1000 [04:44<30:29,  2.06s/it]

URL filtered: https://twitter.com/IranDataPortal


Processing URLs:  11%|█▏        | 114/1000 [04:46<17:12,  1.17s/it]

Error extracting text from https://apple.news/AISK726HHTv2cgNWOtD4S3A: 404 Client Error: Not Found for url: https://apple.news/AISK726HHTv2cgNWOtD4S3A


Processing URLs:  12%|█▏        | 119/1000 [04:56<22:19,  1.52s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-stone/trump-ally-stone-flatly-rejects-allegations-of-russia-collusion-idUSKCN1C103S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-stone/trump-ally-stone-flatly-rejects-allegations-of-russia-collusion-idUSKCN1C103S


Processing URLs:  12%|█▏        | 121/1000 [04:58<19:12,  1.31s/it]

Error extracting text from http://nationalinterest.org/feature/iraqs-silver-bullet-the-isis-fight-17053?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/iraqs-silver-bullet-the-isis-fight-17053?page=2


Processing URLs:  12%|█▎        | 125/1000 [05:03<15:11,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0WM0SL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0WM0SL


Processing URLs:  13%|█▎        | 128/1000 [05:17<39:28,  2.72s/it]  

Error extracting text from http://english.shafaaq.com/security/18149-mosul-battle-plans-ready,-could-be-concluded-by-year-end-kurdish-leader: HTTPConnectionPool(host='english.shafaaq.com', port=80): Max retries exceeded with url: /security/18149-mosul-battle-plans-ready,-could-be-concluded-by-year-end-kurdish-leader (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307c815e0>: Failed to resolve 'english.shafaaq.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  13%|█▎        | 130/1000 [05:20<33:03,  2.28s/it]

Error extracting text from https://www.reuters.com/business/healthcare-pharmaceuticals/merck-says-research-shows-its-covid-19-pill-works-against-variants-2021-09-29/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/merck-says-research-shows-its-covid-19-pill-works-against-variants-2021-09-29/


Processing URLs:  14%|█▍        | 138/1000 [06:15<2:14:55,  9.39s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950603001289: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950603001289 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302e0b890>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  14%|█▍        | 143/1000 [06:20<39:52,  2.79s/it]  

Error extracting text from http://news.yahoo.com/brazils-congress-start-examining-rousseff-impeachment-031620740.html: 404 Client Error: Not Found for url: http://news.yahoo.com/brazils-congress-start-examining-rousseff-impeachment-031620740.html


Processing URLs:  14%|█▍        | 144/1000 [06:24<42:50,  3.00s/it]

Error extracting text from http://bangladeshchronicle.net/2016/06/11684-arrested-in-four-days/: 404 Client Error: Not Found for url: https://bangladeshchronicle.net/2016/06/11684-arrested-in-four-days/


Processing URLs:  14%|█▍        | 145/1000 [06:24<32:46,  2.30s/it]

URL filtered: https://www.buzzfeed.com/jimwaterson/british-mps-are-targeting-facebook-with-fake-news-inquiry


Processing URLs:  15%|█▍        | 149/1000 [06:29<23:28,  1.66s/it]

Error extracting text from https://polcms.secure.europarl.europa.eu/cmsdata/113840/Study.pdf: 403 Client Error: Forbidden for url: https://telacms.europarl.europa.eu/cmsdata/113840/Study.pdf


Processing URLs:  15%|█▌        | 150/1000 [06:29<18:46,  1.33s/it]

Error extracting text from https://www.venable.com/new-sanctions-imposed-on-north-korea-following-cyber-attack/: 403 Client Error: Forbidden for url: https://www.venable.com/new-sanctions-imposed-on-north-korea-following-cyber-attack/


Processing URLs:  15%|█▌        | 154/1000 [06:34<14:15,  1.01s/it]

Error extracting text from https://www.thestreet.com/etffocus/market-intelligence/vaneck-launch-bitcoin-etf-again#:~:text=The%20regulatory%20body%20has%20refused,product%20in%20an%20ETF%20wrapper.&amp;text=The%20coin%2Dbased%20funds%20were,about%20the%20futures%2Dbased%20products: 403 Client Error: Forbidden for url: https://www.thestreet.com/etffocus/market-intelligence/vaneck-launch-bitcoin-etf-again#:~:text=The%20regulatory%20body%20has%20refused,product%20in%20an%20ETF%20wrapper.&amp;text=The%20coin-based%20funds%20were,about%20the%20futures-based%20products


Processing URLs:  16%|█▌        | 155/1000 [06:35<14:56,  1.06s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-05-22/israel-hamas-war-in-gaza-showcased-iran-s-rocket-threat


Processing URLs:  16%|█▌        | 158/1000 [06:37<09:10,  1.53it/s]

Error extracting text from https://www.fire.ca.gov/incidents/2020/8/16/czu-lightning-complex-including-warnella-fire/: 403 Client Error: Forbidden for url: https://www.fire.ca.gov/incidents/2020/8/16/czu-lightning-complex-including-warnella-fire/
Error extracting text from https://www.wsj.com/articles/goldman-sachs-bought-venezuelan-oil-co-bonds-last-week-1496020176: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/goldman-sachs-bought-venezuelan-oil-co-bonds-last-week-1496020176


Processing URLs:  16%|█▌        | 161/1000 [06:44<22:29,  1.61s/it]

Error extracting text from https://weather.com/storms/hurricane/video/watching-extremely-dangerous-matthew-in-caribbean-0: 404 Client Error: Not Found for url: https://weather.com/storms/hurricane/video/watching-extremely-dangerous-matthew-in-caribbean-0


Processing URLs:  16%|█▌        | 162/1000 [06:46<24:18,  1.74s/it]

Error extracting text from http://www.coastweek.com/4102-South-Africa-ruling-party-denies-reports-about-president-Zumas-demands-for-stepping-down.htm: 404 Client Error: Not Found for url: https://www.coastweek.com/4102-South-Africa-ruling-party-denies-reports-about-president-Zumas-demands-for-stepping-down.htm


Processing URLs:  16%|█▋        | 164/1000 [06:49<23:02,  1.65s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-19/venezuelans-paralyze-capital-to-oppose-a-constitutional-rewrite


Processing URLs:  17%|█▋        | 167/1000 [07:01<41:32,  2.99s/it]

Error extracting text from http://missoulian.com/news/opinion/obama-s-band-aid-for-syria/article_aa43ff5f-5671-5185-8898-b0274ce02d01.html: 404 Client Error: Not Found for url: https://missoulian.com/news/opinion/obama-s-band-aid-for-syria/article_aa43ff5f-5671-5185-8898-b0274ce02d01.html


Processing URLs:  17%|█▋        | 169/1000 [07:06<34:27,  2.49s/it]

Error extracting text from https://medium.com/dfrlab/revealing-the-next-big-17460ced1fdd: 403 Client Error: Forbidden for url: https://medium.com/dfrlab/revealing-the-next-big-17460ced1fdd
URL filtered: http://www.bloomberg.com/news/articles/2016-05-25/new-york-london-on-notice-as-china-targets-commodities-pricing


Processing URLs:  18%|█▊        | 175/1000 [08:10<4:01:45, 17.58s/it]

Error extracting text from https://www.mcclatchydc.com/news/politics-government/white-house/article210477439.html: HTTPSConnectionPool(host='www.mcclatchydc.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  18%|█▊        | 176/1000 [08:12<3:00:03, 13.11s/it]

URL filtered: http://www.npr.org/sections/thetwo-way/2016/06/30/484225226/adnan-syed-subject-of-serial-podcast-will-get-a-new-trial?utm_source=twitter.com&amp;utm_campaign=npr&amp;utm_medium=social&amp;utm_term=nprnews


Processing URLs:  18%|█▊        | 184/1000 [08:21<29:35,  2.18s/it]  

Error extracting text from http://m.fredericksburg.com/news/news-wire/turkey-iran-pledge-cooperation-over-syria/article_b73565b8-4d5b-5769-9d8f-4fc69f1bb9a3.html?mode=jqm: 404 Client Error: 404 Not Found for url: http://m.fredericksburg.com/news/news-wire/turkey-iran-pledge-cooperation-over-syria/article_b73565b8-4d5b-5769-9d8f-4fc69f1bb9a3.html?mode=jqm


Processing URLs:  19%|█▊        | 186/1000 [08:23<20:58,  1.55s/it]

Error extracting text from https://www.fao.org/worldfoodsituation/foodpricesindex/en/): 404 Client Error: Not Found for url: https://www.fao.org/worldfoodsituation/foodpricesindex/en/)


Processing URLs:  19%|█▉        | 188/1000 [08:24<13:30,  1.00it/s]

Error extracting text from http://www.themarketbusiness.com/2015-10-24-fed-interest-rate-hike-may-not-happen-in-2015-experts: 403 Client Error: Forbidden for url: https://www.themarketbusiness.com/2015-10-24-fed-interest-rate-hike-may-not-happen-in-2015-experts
Error extracting text from http://www.reuters.com/article/us-yemen-security-saudi-un/saudi-arabia-says-u-n-report-on-yemen-inaccurate-and-misleading-idUSKBN1CB2BL?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-saudi-un/saudi-arabia-says-u-n-report-on-yemen-inaccurate-and-misleading-idUSKBN1CB2BL?il=0


Processing URLs:  19%|█▉        | 189/1000 [08:25<12:04,  1.12it/s]

URL filtered: http://www.bloomberg.com/news/videos/2016-02-09/goldman-s-cohn-i-m-worried-about-liquidity


Processing URLs:  19%|█▉        | 193/1000 [08:28<12:24,  1.08it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.pps.org.br/2016/02/06/impeachment-de-dilma-devolveria-a-esperanca-aos-brasileiros-diz-freire/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.pps.org.br/2016/02/06/impeachment-de-dilma-devolveria-a-esperanca-aos-brasileiros-diz-freire/&amp;prev=search
URL filtered: http://www.bloomberg.com/news/articles/2016-09-08/saudi-arabia-said-poised-to-add-boutique-adviser-for-aramco-ipo


Processing URLs:  20%|█▉        | 197/1000 [08:29<07:31,  1.78it/s]

Error extracting text from https://thehill.com/homenews/senate/575631-democrats-signal-they-will-accept-mcconnell-offer: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/575631-democrats-signal-they-will-accept-mcconnell-offer/


Processing URLs:  20%|█▉        | 198/1000 [08:31<11:57,  1.12it/s]

Error extracting text from https://reut.rs/3aGrhlx: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-scotland-referendum-explainer/explainer-can-scotland-hold-another-independence-referendum-idUSKBN2A51LQ?il=0


Processing URLs:  21%|██        | 206/1000 [08:46<15:41,  1.19s/it]



Processing URLs:  21%|██        | 209/1000 [08:49<12:22,  1.07it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-13/iran-sanctions-seen-lifted-by-monday-as-nuclear-deal-implemented


Processing URLs:  21%|██        | 211/1000 [08:50<11:58,  1.10it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-03/venezuela-is-seeking-debt-relief-and-confusing-its-bondholders?utm_source=yahoo&amp;utm_medium=bd&amp;utm_campaign=headline&amp;cmpId=yhoo.headline&amp;yptr=yahoo


Processing URLs:  21%|██▏       | 214/1000 [08:54<14:23,  1.10s/it]

URL filtered: http://www.bloomberg.com/politics/videos/2015-11-04/james-carville-on-how-to-stop-donald-trump-wait-on-gravity


Processing URLs:  22%|██▏       | 216/1000 [08:55<10:58,  1.19it/s]

Error extracting text from http://www.maritime-executive.com/article/panama-canal-expansion-95-percent-complete: 404 Client Error: Not Found for url: https://www.maritime-executive.com/403.shtml


Processing URLs:  22%|██▏       | 219/1000 [09:00<18:05,  1.39s/it]

Error extracting text from http://www.foxnews.com/politics/2015/10/19/ambassador-sought-security-staffing-before-benghazi-attack-email-shows/: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/10/19/ambassador-sought-security-staffing-before-benghazi-attack-email-shows/


Processing URLs:  22%|██▏       | 220/1000 [09:54<3:18:06, 15.24s/it]

Error extracting text from http://ewn.co.za/2016/04/25/Zim-police-mulling-plans-to-garnish-bank-accounts-of-traffic-offenders: 500 Server Error: Internal Server Error for url: https://www.ewn.co.za/2016/04/25/Zim-police-mulling-plans-to-garnish-bank-accounts-of-traffic-offenders


Processing URLs:  22%|██▏       | 221/1000 [09:58<2:35:34, 11.98s/it]

Error extracting text from https://science.nasa.gov/science-news/science-at-nasa/2006/10may_longrange/: 404 Client Error: Page not found: /science-news/science-at-nasa/2006/10may_longrange/ for url: https://science.nasa.gov/science-news/science-at-nasa/2006/10may_longrange/
URL filtered: http://www.bloomberg.com/news/articles/2015-09-20/iaea-s-top-inspector-gains-access-to-iran-s-parchin-site-iesqmrqc


Processing URLs:  22%|██▏       | 223/1000 [10:01<1:36:43,  7.47s/it]

Error extracting text from http://buenosairesherald.com/article/208652/venezuela-expands-military-influence-over-oil-and-mining: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/208652/venezuela-expands-military-influence-over-oil-and-mining


Processing URLs:  22%|██▏       | 224/1000 [10:02<1:16:58,  5.95s/it]

Error extracting text from https://reut.rs/3slnbr5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-boe-bailey-negative/boes-bailey-says-there-are-a-lot-of-issues-with-negative-rates-idUSKBN29H12Z


Processing URLs:  23%|██▎       | 226/1000 [10:04<48:05,  3.73s/it]  

Error extracting text from http://www.novelrank.com/asin/0804136696: HTTPSConnectionPool(host='www.novelrank.com', port=443): Max retries exceeded with url: /asin/0804136696 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  23%|██▎       | 232/1000 [10:26<30:19,  2.37s/it]  

Error extracting text from http://www.basnews.com/index.php/en/news/kurdistan/290516: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/kurdistan/290516
URL filtered: https://twitter.com/WarintheFuture/status/1503499716209754115
URL filtered: https://twitter.com/CNBCnow/status/690573471949135872


Processing URLs:  24%|██▎       | 236/1000 [10:28<15:26,  1.21s/it]

Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)31766-4/fulltext?elsca1=etoc: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)31766-4/fulltext?elsca1=etoc


Processing URLs:  24%|██▍       | 238/1000 [10:49<56:25,  4.44s/it]  

Error extracting text from https://www.chathamhouse.org/sites/files/chathamhouse/field/field_document/20150305IranOilGasStevens.pdf: 404 Client Error: Not Found for url: https://www.chathamhouse.org/sites/files/chathamhouse/field/field_document/20150305IranOilGasStevens.pdf
URL filtered: http://www.bloomberg.com/news/articles/2016-02-25/australia-boosts-defense-spend-as-south-china-sea-tensions-rise


Processing URLs:  24%|██▍       | 242/1000 [10:53<28:37,  2.27s/it]

Error extracting text from http://af.reuters.com/article/worldNews/idAFKCN0T00CQ20151111: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  25%|██▍       | 246/1000 [10:58<16:20,  1.30s/it]

Error extracting text from http://www.wsj.com/articles/european-leaders-weigh-options-to-halt-migration-flow-1453633203: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/european-leaders-weigh-options-to-halt-migration-flow-1453633203


Processing URLs:  25%|██▍       | 248/1000 [11:02<19:47,  1.58s/it]

Error extracting text from http://www.wsj.com/articles/time-inc-revenue-falls-digital-cant-offset-print-declines-1478173930: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-revenue-falls-digital-cant-offset-print-declines-1478173930


Processing URLs:  25%|██▍       | 249/1000 [11:03<18:27,  1.48s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-23/carmaker-cheating-on-emissions-almost-as-old-as-pollution-tests


Processing URLs:  25%|██▌       | 253/1000 [11:07<13:06,  1.05s/it]

URL filtered: https://www.youtube.com/watch?v=s2IaFaJrmno


Processing URLs:  26%|██▌       | 255/1000 [11:08<10:38,  1.17it/s]

Error extracting text from https://www.politics.co.uk/blogs/2011/12/13/want-to-fire-your-mp-here-s-how: 403 Client Error: Forbidden for url: https://www.politics.co.uk/blogs/2011/12/13/want-to-fire-your-mp-here-s-how


Processing URLs:  26%|██▌       | 257/1000 [11:10<11:19,  1.09it/s]

Error extracting text from http://news.yahoo.com/least-140-killed-ethiopia-protests-over-land-plan-065740622.html: 404 Client Error: Not Found for url: http://news.yahoo.com/least-140-killed-ethiopia-protests-over-land-plan-065740622.html


Processing URLs:  26%|██▌       | 260/1000 [11:14<12:06,  1.02it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/249392-senate-votes-to-reauthorize-export-import-bank: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/249392-senate-votes-to-reauthorize-export-import-bank/


Processing URLs:  26%|██▋       | 264/1000 [12:46<2:55:03, 14.27s/it]

Error extracting text from https://www.nytimes.com/2022/01/08/us/politics/us-sanctions-russia-ukraine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/08/us/politics/us-sanctions-russia-ukraine.html


Processing URLs:  27%|██▋       | 266/1000 [12:48<1:31:12,  7.46s/it]

Error extracting text from https://www.un.org/press/en/2021/sc14488.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2021/sc14488.doc.htm


Processing URLs:  27%|██▋       | 268/1000 [12:48<46:20,  3.80s/it]  

Error extracting text from http://www.wsj.com/articles/economists-overwhelmingly-expect-fed-to-raise-interest-rates-in-december-1447340397: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/economists-overwhelmingly-expect-fed-to-raise-interest-rates-in-december-1447340397
Error extracting text from http://www.reuters.com/article/us-venezuela-economy-china-idUSKCN0Y80PY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-china-idUSKCN0Y80PY


Processing URLs:  27%|██▋       | 273/1000 [12:54<18:06,  1.49s/it]

Error extracting text from http://www.economiccalendar.com/2016/07/22/canadas-cpi-steady-at-2-1-for-june/: 404 Client Error: Not Found for url: http://www.economiccalendar.com/2016/07/22/canadas-cpi-steady-at-2-1-for-june/


Processing URLs:  28%|██▊       | 275/1000 [12:56<16:56,  1.40s/it]

Error extracting text from http://www.who.int/csr/don/archive/disease/poliomyelitis/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/archive/disease/poliomyelitis/en/


Processing URLs:  28%|██▊       | 277/1000 [12:57<10:41,  1.13it/s]

Error extracting text from http://news.sky.com/story/1701753/ex-general-tells-pm-to-bugger-off-over-brexit: 404 Client Error: Not Found for url: https://news.sky.com/story/1701753/ex-general-tells-pm-to-bugger-off-over-brexit
Error extracting text from http://www.reuters.com/article/2015/09/15/us-china-southchinasea-airstrips-idUSKCN0RE28220150915: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/15/us-china-southchinasea-airstrips-idUSKCN0RE28220150915


Processing URLs:  28%|██▊       | 281/1000 [13:05<17:58,  1.50s/it]

Error extracting text from https://in.reuters.com/article/germany-politics/few-cheers-at-home-for-germanys-loveless-coalition-idINKBN1FS0ZZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  28%|██▊       | 283/1000 [13:05<09:44,  1.23it/s]

Error extracting text from http://www.pionline.com/article/20171207/ONLINE/171209871/moodys-proposes-upping-pension-debt-weighting-in-credit-rating-decisions: 403 Client Error: Forbidden for url: https://www.pionline.com/article/20171207/ONLINE/171209871/moodys-proposes-upping-pension-debt-weighting-in-credit-rating-decisions
Error extracting text from http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html?_r=0


Processing URLs:  28%|██▊       | 284/1000 [13:06<11:02,  1.08it/s]

Error extracting text from https://thefederalist.com/2020/12/01/uyghur-leader-were-worried-about-a-biden-administration/: 403 Client Error: Forbidden for url: https://thefederalist.com/2020/12/01/uyghur-leader-were-worried-about-a-biden-administration/


Processing URLs:  29%|██▊       | 287/1000 [13:08<06:37,  1.80it/s]

Error extracting text from https://www.reuters.com/article/us-usa-cyber-russia-mcafee/mcafee-says-it-no-longer-will-permit-government-source-code-reviews-idUSKBN1CV2MP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-mcafee/mcafee-says-it-no-longer-will-permit-government-source-code-reviews-idUSKBN1CV2MP
Error extracting text from https://www.reuters.com/article/us-china-djibouti/china-sends-troops-to-open-first-overseas-military-base-in-djibouti-idUSKBN19X049: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-djibouti/china-sends-troops-to-open-first-overseas-military-base-in-djibouti-idUSKBN19X049


Processing URLs:  29%|██▉       | 289/1000 [13:10<07:45,  1.53it/s]

Error extracting text from http://www.independent.mk/articles/42177/Serbian+Analysts+No+Proof+that+Genocide+on+Albanians+Occurred: 403 Client Error: Forbidden for url: http://www.independent.mk/articles/42177/Serbian+Analysts+No+Proof+that+Genocide+on+Albanians+Occurred


Processing URLs:  29%|██▉       | 290/1000 [13:10<06:23,  1.85it/s]

Error extracting text from http://www.reuters.com/article/us-saudi-oil-naimi-idUSKBN0UD15H20151230: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-naimi-idUSKBN0UD15H20151230


Processing URLs:  30%|██▉       | 298/1000 [13:32<34:24,  2.94s/it]

Error extracting text from https://abcnews.go.com/International/wireStory/venezuelans-wait-results-boycotted-congress-vote-74577392: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/venezuelans-wait-results-boycotted-congress-vote-74577392


Processing URLs:  30%|██▉       | 299/1000 [13:34<29:59,  2.57s/it]

Error extracting text from http://www.who.int/features/factfiles/polio/facts/en/index3.html: 404 Client Error: Not Found for url: https://www.who.int/features/factfiles/polio/facts/en/index3.html
Error extracting text from http://www.reuters.com/article/us-southkorea-usa-thaad-china-idUSKBN16709W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-usa-thaad-china-idUSKBN16709W
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-kurds-idUSKCN0UU0AO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-kurds-idUSKCN0UU0AO


Processing URLs:  30%|███       | 302/1000 [13:36<17:24,  1.50s/it]

Error extracting text from https://www.google.ca/amp/www.bbc.co.uk/news/amp/37674693?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: https://www.bbc.co.uk/news/37674693.amp
Error extracting text from https://www.reuters.com/world/switzerland-is-most-likely-venue-putin-biden-summit-russias-kommersant-2021-05-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/switzerland-is-most-likely-venue-putin-biden-summit-russias-kommersant-2021-05-17/


Processing URLs:  31%|███       | 308/1000 [13:40<09:07,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-security-syria-idUSKCN0YX05I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-syria-idUSKCN0YX05I


Processing URLs:  32%|███▏      | 317/1000 [13:56<15:01,  1.32s/it]

Error extracting text from https://www.reuters.com/world/ukraines-president-russia-stop-your-bombs-before-ceasefire-talks-can-start-2022-03-01/?fbclid=IwAR34NOcRPOuFnGgvLKNO1p4H802vZtgw3jXa7LfQjqYf7zXCq1u0e1bi-pk: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/ukraines-president-russia-stop-your-bombs-before-ceasefire-talks-can-start-2022-03-01/?fbclid=IwAR34NOcRPOuFnGgvLKNO1p4H802vZtgw3jXa7LfQjqYf7zXCq1u0e1bi-pk


Processing URLs:  32%|███▏      | 318/1000 [13:59<18:12,  1.60s/it]

Error extracting text from http://www.reuters.com.bbc.5:02: HTTPConnectionPool(host='www.reuters.com.bbc.5', port=2): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30748c8f0>: Failed to resolve 'www.reuters.com.bbc.5' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.youtube.com/watch?v=_W4eHVlPCMY


Processing URLs:  32%|███▏      | 321/1000 [14:00<11:20,  1.00s/it]

Error extracting text from http://yosemite.epa.gov/opa/admpress.nsf/a883dc3da7094f97852572a00065d7d8/dfc8e33b5ab162b985257ec40057813b!OpenDocument: HTTPSConnectionPool(host='yosemite.epa.gov', port=443): Max retries exceeded with url: /opa/admpress.nsf/a883dc3da7094f97852572a00065d7d8/dfc8e33b5ab162b985257ec40057813b!OpenDocument (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  32%|███▏      | 322/1000 [14:01<10:10,  1.11it/s]

Error extracting text from http://amti.csis.org/arbitration-support-tracker/: 403 Client Error: Forbidden for url: http://amti.csis.org/arbitration-support-tracker/


Processing URLs:  32%|███▎      | 325/1000 [14:05<12:57,  1.15s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/vietnam-moves-rocket/3029974.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/vietnam-moves-rocket/3029974.html


Processing URLs:  33%|███▎      | 327/1000 [14:09<18:52,  1.68s/it]

Error extracting text from http://news.az/articles/economy/102584: 404 Client Error: Not Found for url: https://news.az/articles/economy/102584


Processing URLs:  33%|███▎      | 334/1000 [14:22<17:30,  1.58s/it]

Error extracting text from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol53no1/the-cia-and-the-culture-of-failure-u.s..html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol53no1/the-cia-and-the-culture-of-failure-u.s..html


Processing URLs:  35%|███▍      | 347/1000 [14:45<10:21,  1.05it/s]

Error extracting text from http://www.truthdig.com/report/item/islamic_state_sends_families_out_mosul_as_kurdish_shiite_forces_20160830: 403 Client Error: Forbidden for url: http://www.truthdig.com/report/item/islamic_state_sends_families_out_mosul_as_kurdish_shiite_forces_20160830


Processing URLs:  35%|███▍      | 349/1000 [14:48<13:41,  1.26s/it]

Error extracting text from http://brussels.cta.int/index.php?option=com_k2&amp;id=12615:troubles-in-burundi-the-eu-is-preparing-to-suspend-its-aid&amp;view=item&amp;Itemid=54: 403 Client Error: Forbidden for url: http://brussels.cta.int/index.php?option=com_k2&amp;id=12615:troubles-in-burundi-the-eu-is-preparing-to-suspend-its-aid&amp;view=item&amp;Itemid=54


Processing URLs:  35%|███▌      | 353/1000 [14:52<13:36,  1.26s/it]

Error extracting text from http://worldmaritimenews.com/archives/179152/shipping-confidence-continues-downward-trend/: HTTPConnectionPool(host='worldmaritimenews.com', port=80): Max retries exceeded with url: /archives/179152/shipping-confidence-continues-downward-trend/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d95e0>: Failed to resolve 'worldmaritimenews.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.youtube.com/watch?v=aTT-Jmi1nOc
URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/consumer-comfort-at-one-year-low-as-u-s-buying-climate-dims
URL filtered: http://www.rt.com/news/311758-russia-embargo-one-year/#.Veh8PE66o9E.twitter


Processing URLs:  36%|███▋      | 364/1000 [14:58<07:20,  1.44it/s]

Error extracting text from https://larswericson.wordpress.com/2016/01/02/book-length-comment-thread-preserved-for-posterity/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/01/02/book-length-comment-thread-preserved-for-posterity/


Processing URLs:  37%|███▋      | 369/1000 [15:09<17:34,  1.67s/it]

Error extracting text from http://in.reuters.com/article/2015/11/09/us-eurozone-greece-idINKCN0SY1D420151109: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  37%|███▋      | 370/1000 [15:10<14:02,  1.34s/it]

Error extracting text from http://thehill.com/policy/national-security/351639-report-manafort-offered-private-briefings-to-russian-billionaire: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/351639-report-manafort-offered-private-briefings-to-russian-billionaire/


Processing URLs:  37%|███▋      | 371/1000 [16:10<3:17:16, 18.82s/it]

Error extracting text from http://www.post-gazette.com/opinion/editorials/2016/05/28/Another-NATO-newbie-It-s-a-long-way-to-Montenegro/stories/201605260033: HTTPConnectionPool(host='www.post-gazette.com', port=80): Max retries exceeded with url: /opinion/editorials/2016/05/28/Another-NATO-newbie-It-s-a-long-way-to-Montenegro/stories/201605260033 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3051dbcb0>, 'Connection to www.post-gazette.com timed out. (connect timeout=60)'))


Processing URLs:  37%|███▋      | 372/1000 [16:10<2:19:47, 13.36s/it]

Error extracting text from http://www.espn.com/mlb/player/_/id/28963/clayton-kershaw: 403 Client Error: Forbidden for url: http://www.espn.com/mlb/player/_/id/28963/clayton-kershaw


Processing URLs:  37%|███▋      | 374/1000 [16:28<2:02:10, 11.71s/it]

Error extracting text from https://www.almasdarnews.com/article/complete-battlefield-map-of-syria-and-implemented-ceasefire-zones/: 522 Server Error:  for url: https://www.almasdarnews.com/article/complete-battlefield-map-of-syria-and-implemented-ceasefire-zones/


Processing URLs:  38%|███▊      | 375/1000 [16:29<1:27:25,  8.39s/it]

Error extracting text from https://nationalinterest.org/blog/reboot/russias-new-rs-28-sarmat-icbm-will-enter-combat-duty-next-year-184818: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/reboot/russias-new-rs-28-sarmat-icbm-will-enter-combat-duty-next-year-184818


Processing URLs:  38%|███▊      | 376/1000 [16:29<1:01:57,  5.96s/it]

Error extracting text from https://www.nytimes.com/2017/10/15/world/asia/north-korea-hacking-cyber-sony.html?action=click&amp;contentCollection=world&amp;region=rank&amp;module=package&amp;version=highlights&amp;con: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/15/world/asia/north-korea-hacking-cyber-sony.html?action=click&amp;contentCollection=world&amp;region=rank&amp;module=package&amp;version=highlights&amp;con


Processing URLs:  38%|███▊      | 377/1000 [16:29<44:53,  4.32s/it]  

Error extracting text from http://www.riverkeeper.org/campaigns/stop-polluters/indian-point/radioactive-waste/: 403 Client Error: Forbidden for url: http://www.riverkeeper.org/campaigns/stop-polluters/indian-point/radioactive-waste/


Processing URLs:  38%|███▊      | 378/1000 [16:30<33:09,  3.20s/it]

Error extracting text from http://www.pravdareport.com/news/world/asia/syria/20-10-2016/135935-nato_russia_warships-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/news/world/asia/syria/20-10-2016/135935-nato_russia_warships-0/


Processing URLs:  38%|███▊      | 379/1000 [16:32<28:25,  2.75s/it]

Error extracting text from http://www.ibtimes.co.uk/chinese-patrol-vessels-including-armed-ship-enters-japanese-waters-1534977: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/chinese-patrol-vessels-including-armed-ship-enters-japanese-waters-1534977


Processing URLs:  38%|███▊      | 380/1000 [16:33<23:29,  2.27s/it]

Error extracting text from http://www.ibtimes.co.uk/clinton-us-should-use-military-response-fight-cyberattacks-russia-china-1579187: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/clinton-us-should-use-military-response-fight-cyberattacks-russia-china-1579187


Processing URLs:  38%|███▊      | 381/1000 [16:33<17:08,  1.66s/it]

Error extracting text from http://news.yahoo.com/un-nuclear-watchdog-report-iran-probe-next-week-145921569.html: 404 Client Error: Not Found for url: http://news.yahoo.com/un-nuclear-watchdog-report-iran-probe-next-week-145921569.html


Processing URLs:  38%|███▊      | 383/1000 [16:35<13:39,  1.33s/it]

Error extracting text from https://docs.google.com/spreadsheets/d/1AS9OrUDS7-8Wl76E96XKTxX3470zyrtIEJZr1bLpeks/edit?usp=sharing: 410 Client Error: Gone for url: https://docs.google.com/spreadsheets/d/1AS9OrUDS7-8Wl76E96XKTxX3470zyrtIEJZr1bLpeks/edit?usp=sharing


Processing URLs:  38%|███▊      | 384/1000 [16:55<1:11:24,  6.95s/it]

URL filtered: https://www.youtube.com/watch?v=M0ELAa02TUY


Processing URLs:  39%|███▊      | 387/1000 [16:59<37:51,  3.71s/it]  

Error extracting text from http://www.isn.ethz.ch/Digital-Library/Articles/Detail/?lng=en&amp;id=195974: 404 Client Error: Not found UA for url: https://css.ethz.ch/en/services.html


Processing URLs:  39%|███▉      | 389/1000 [17:00<21:50,  2.15s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/09/13/business/bus-subsidies-fraud-shakes-chinas-electric-vehicle-industry/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/09/13/business/bus-subsidies-fraud-shakes-chinas-electric-vehicle-industry/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-talafar-idUSKCN12S0PM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-talafar-idUSKCN12S0PM


Processing URLs:  39%|███▉      | 393/1000 [17:01<07:42,  1.31it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/germanys-conservatives-spd-start-talks-jan-7-on-another-grand-coalition-idUSKBN1EE27K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/germanys-conservatives-spd-start-talks-jan-7-on-another-grand-coalition-idUSKBN1EE27K


Processing URLs:  40%|███▉      | 395/1000 [17:02<05:02,  2.00it/s]

Error extracting text from http://www.wsj.com/articles/house-passes-highway-bill-with-export-import-bank-renewal-1446740235?alg=y&amp;mg=id-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-passes-highway-bill-with-export-import-bank-renewal-1446740235?alg=y&amp;mg=id-wsj
Error extracting text from http://www.reuters.com/article/us-iran-europe-rouhani-renzi-idUSKCN0V415T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-europe-rouhani-renzi-idUSKCN0V415T


Processing URLs:  40%|███▉      | 396/1000 [17:02<03:59,  2.52it/s]

Error extracting text from http://www.reuters.com/article/us-usa-russia-missiles-idUSKBN16F23V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-missiles-idUSKBN16F23V


Processing URLs:  40%|███▉      | 398/1000 [17:05<08:01,  1.25it/s]

Error extracting text from https://www.naij.com/983666-fulani-herdsmen-go-rampage-burn-kill-locals-kaduna.html: 410 Client Error: Gone for url: https://www.legit.ng/983666-fulani-herdsmen-go-rampage-burn-kill-locals-kaduna.html
Error extracting text from http://news.usni.org/2016/02/26/document-2016-u-s-european-command-posture-statement: 403 Client Error: Forbidden for url: http://news.usni.org/2016/02/26/document-2016-u-s-european-command-posture-statement


Processing URLs:  40%|████      | 401/1000 [17:06<05:08,  1.94it/s]

Error extracting text from http://internal.beta.mobile.reutersmedia.net/article/idUSKCN0WW1AS: HTTPConnectionPool(host='internal.beta.mobile.reutersmedia.net', port=80): Max retries exceeded with url: /article/idUSKCN0WW1AS (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307572570>: Failed to resolve 'internal.beta.mobile.reutersmedia.net' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.reuters.com/article/us-germany-politics/few-cheers-at-home-for-germanys-loveless-coalition-idUSKBN1FS103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/few-cheers-at-home-for-germanys-loveless-coalition-idUSKBN1FS103


Processing URLs:  40%|████      | 402/1000 [17:09<11:27,  1.15s/it]

Error extracting text from https://www.rand.org/pubs/research_reports/RR1140.html?adbsc=social_20160730_939261&amp;adbid=UPDATE-c165654-6165234354294386688&amp;adbpl=li&amp;adbpr=165654: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/research_reports/RR1140.html?adbsc=social_20160730_939261&amp;adbid=UPDATE-c165654-6165234354294386688&amp;adbpl=li&amp;adbpr=165654


Processing URLs:  41%|████      | 407/1000 [17:23<28:01,  2.84s/it]

Error extracting text from http://blogs.wsj.com/economics/2016/09/29/soybeans-are-fueling-u-s-economic-growth-but-not-for-long/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/economics/2016/09/29/soybeans-are-fueling-u-s-economic-growth-but-not-for-long/


Processing URLs:  41%|████      | 408/1000 [17:25<24:31,  2.49s/it]

Error extracting text from https://tass.com/search?searchStr=Navalny&amp;sort=date: 502 Server Error: Bad Gateway for url: https://tass.com/search?searchStr=Navalny&amp;sort=date


Processing URLs:  41%|████      | 411/1000 [17:27<12:55,  1.32s/it]

Error extracting text from https://www.cfr.org/blog/top-conflictswatch-2021-economic-political-and-humanitarian-catastrophe-venezuela: 404 Client Error: Not Found for url: https://www.cfr.org/blog/top-conflicts%02watch-2021-economic-political-and-humanitarian-catastrophe-venezuela
Error extracting text from http://www.thegazzete.com: HTTPConnectionPool(host='www.thegazzete.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe0b6150>: Failed to resolve 'www.thegazzete.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://diariodonordeste.verdesmares.com.br/cadernos/policia/online/policia-federal-cumpre-mandados-no-ceara-em-investigacao-de-desvio-de-recursos-publicos-1.1511167&amp;usg=ALkJrhiF328oc_Q-Azgyay8ks7tQvUX8mA: 404 Client Er

Processing URLs:  41%|████▏     | 413/1000 [17:27<07:44,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-yemen-security-usa-saudiarabia-idUSKCN10U1TL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-usa-saudiarabia-idUSKCN10U1TL
Error extracting text from https://www.reuters.com/article/uk-column-russell-lng-asia-idUSKBN2FP0JE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-column-russell-lng-asia-idUSKBN2FP0JE


Processing URLs:  41%|████▏     | 414/1000 [17:27<05:59,  1.63it/s]

Error extracting text from http://www.reuters.com/article/2015/11/24/us-iran-nuclear-idUSKBN0TD23R20151124#wb2pPjcMTjmDUtm5.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/24/us-iran-nuclear-idUSKBN0TD23R20151124#wb2pPjcMTjmDUtm5.97


Processing URLs:  42%|████▏     | 417/1000 [17:33<14:18,  1.47s/it]

Error extracting text from http://www.dailystar.com.lb/Opinion/Columnist/2015/Jul-09/305789-when-israel-gave-bashar-assad-a-lifeline.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/Opinion/Columnist/2015/Jul-09/305789-when-israel-gave-bashar-assad-a-lifeline.ashx


Processing URLs:  42%|████▏     | 419/1000 [18:33<2:07:15, 13.14s/it]

Error extracting text from https://money.usnews.com/investing/news/articles/2017-11-08/us-warns-bondholders-that-negotiating-with-venezuela-may-be-illegal: HTTPSConnectionPool(host='money.usnews.com', port=443): Read timed out. (read timeout=60)
Error extracting text from https://www.reuters.com/article/us-usa-election-poll/half-of-republicans-say-biden-won-because-of-a-rigged-election-reuters-ipsos-poll-idUSKBN27Y1AJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-poll/half-of-republicans-say-biden-won-because-of-a-rigged-election-reuters-ipsos-poll-idUSKBN27Y1AJ


Processing URLs:  42%|████▏     | 420/1000 [18:34<1:31:22,  9.45s/it]

Error extracting text from http://www.amazon.com/House-Bush-Saud-Relationship-Dynasties/dp/0743253396: 500 Server Error: Internal Server Error for url: https://www.amazon.com/House-Bush-Saud-Relationship-Dynasties/dp/0743253396


Processing URLs:  43%|████▎     | 426/1000 [19:03<58:38,  6.13s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2017-06-30/controversial-pdvsa-bonds-said-to-make-their-way-onto-the-market


Processing URLs:  43%|████▎     | 430/1000 [19:08<25:03,  2.64s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-are-eight-stages-genocide-applicable-nations-spiralling-violence-1552071: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-are-eight-stages-genocide-applicable-nations-spiralling-violence-1552071


Processing URLs:  43%|████▎     | 434/1000 [19:11<11:50,  1.26s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_49736.htm#: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_49736.htm


Processing URLs:  44%|████▎     | 436/1000 [19:11<07:20,  1.28it/s]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html),: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html),
Error extracting text from https://news.usni.org/2020/12/22/report-on-u-s-china-competition-in-east-south-china-sea-4: 403 Client Error: Forbidden for url: https://news.usni.org/2020/12/22/report-on-u-s-china-competition-in-east-south-china-sea-4


Processing URLs:  44%|████▎     | 437/1000 [19:11<05:26,  1.72it/s]

Error extracting text from http://www.reuters.com/article/2015/11/24/us-gdp-usa-idUSKBN0TD1M520151124: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/24/us-gdp-usa-idUSKBN0TD1M520151124


Processing URLs:  44%|████▍     | 438/1000 [19:13<07:35,  1.23it/s]

Error extracting text from http://www.theblaze.com/stories/2015/10/05/for-the-record-chinas-cyber-warfare-unit-targeting-american-power-grid/: 404 Client Error: Not Found for url: https://www.theblaze.com/stories/2015/10/05/for-the-record-chinas-cyber-warfare-unit-targeting-american-power-grid/


Processing URLs:  44%|████▍     | 441/1000 [19:16<07:55,  1.18it/s]

URL filtered: https://www.cnbc.com/2021/05/12/facebook-backed-diem-is-moving-from-switzerland-to-the-us.html


Processing URLs:  45%|████▍     | 446/1000 [19:20<07:27,  1.24it/s]

Error extracting text from http://www.realclearpolitics.com/video/2017/02/08/gabbard_any_hope_for_peace_in_syria_means_engaging_with_assad.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/video/2017/02/08/gabbard_any_hope_for_peace_in_syria_means_engaging_with_assad.html
Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0UV12B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0UV12B


Processing URLs:  45%|████▍     | 448/1000 [20:21<2:06:25, 13.74s/it]

Error extracting text from https://www.powernext.com/spot-market-data: HTTPSConnectionPool(host='www.powernext.com', port=443): Max retries exceeded with url: /spot-market-data (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x307571b80>, 'Connection to www.powernext.com timed out. (connect timeout=60)'))


Processing URLs:  45%|████▌     | 454/1000 [20:27<26:03,  2.86s/it]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-idUSKBN2BB0K0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-idUSKBN2BB0K0


Processing URLs:  46%|████▌     | 456/1000 [20:30<18:46,  2.07s/it]

Error extracting text from http://goodjudgment.com/superforecasting/index.php/2017/08/02/its-official-superforecasters-have-superior-judgment/: 403 Client Error: Forbidden for url: http://goodjudgment.com/superforecasting/index.php/2017/08/02/its-official-superforecasters-have-superior-judgment/


Processing URLs:  46%|████▌     | 457/1000 [20:30<13:47,  1.52s/it]

Error extracting text from https://www.wsj.com/articles/for-aramco-insiders-princes-2-trillion-ipo-valuation-doesnt-add-up-1493064170: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-aramco-insiders-princes-2-trillion-ipo-valuation-doesnt-add-up-1493064170


Processing URLs:  46%|████▌     | 459/1000 [20:34<16:18,  1.81s/it]

Error extracting text from http://www.boxofficemojo.com/weekly/chart/?yr=2015&amp;wk=46&amp;p=.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/weekly/chart/?yr=2015&amp;wk=46&amp;p=.htm


Processing URLs:  46%|████▌     | 460/1000 [20:36<16:27,  1.83s/it]

Error extracting text from http://www.ibtimes.com/what-natos-article-4-why-turkey-called-consultations-under-rarely-used-provision-2026060: 403 Client Error: Forbidden for url: https://www.ibtimes.com/what-natos-article-4-why-turkey-called-consultations-under-rarely-used-provision-2026060


Processing URLs:  46%|████▋     | 464/1000 [20:43<16:11,  1.81s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&amp;s=MGFUPUS1&amp;f=M: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  46%|████▋     | 465/1000 [20:44<12:17,  1.38s/it]

Error extracting text from http://www.rand.org/pubs/research_reports/RR423.html: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/research_reports/RR423.html


Processing URLs:  47%|████▋     | 469/1000 [20:50<11:48,  1.33s/it]

Error extracting text from https://www.nytimes.com/2017/06/04/business/vnesheconombank-veb-bank-russia-trump-kushner.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/04/business/vnesheconombank-veb-bank-russia-trump-kushner.html?_r=0


Processing URLs:  48%|████▊     | 475/1000 [21:00<13:13,  1.51s/it]

Error extracting text from http://afghanistantimes.af/govt-blamed-for-electoral-reforms-delay/: 403 Client Error: Forbidden for url: https://afghanistantimes.af/govt-blamed-for-electoral-reforms-delay/


Processing URLs:  48%|████▊     | 477/1000 [21:03<12:55,  1.48s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57051#.WVFX7FJjL-Y: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57051#.WVFX7FJjL-Y
Error extracting text from http://support.Shell: HTTPConnectionPool(host='support.shell', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307572f60>: Failed to resolve 'support.shell' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 479/1000 [21:04<07:54,  1.10it/s]

Error extracting text from http://www.wsj.com/articles/fbi-won-t-recommend-clinton-be-indicted-over-private-email-use-1467731774: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fbi-won-t-recommend-clinton-be-indicted-over-private-email-use-1467731774


Processing URLs:  48%|████▊     | 480/1000 [21:04<06:26,  1.35it/s]

Error extracting text from https://www.nytimes.com/2021/08/12/us/marines-evacuation-afghanistan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/12/us/marines-evacuation-afghanistan.html


Processing URLs:  48%|████▊     | 481/1000 [21:07<11:42,  1.35s/it]

URL filtered: https://www.youtube.com/watch?v=AqZceAQSJvc


Processing URLs:  48%|████▊     | 485/1000 [21:12<11:37,  1.36s/it]

URL filtered: https://www.youtube.com/watch?v=yWHB8PKOzOw


Processing URLs:  49%|████▉     | 490/1000 [21:17<10:09,  1.20s/it]

Error extracting text from http://flurrymobile.tumblr.com/post/136133865650/pha-la-la-a-real-phabulous-holiday: 404 Client Error: Not Found for url: https://www.flurry.com/blog/post/136133865650/pha-la-la-a-real-phabulous-holiday/


Processing URLs:  50%|████▉     | 496/1000 [21:30<14:12,  1.69s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/07/14/France-to-redeploy-aircraft-carrier-in-Mosul-fight.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/07/14/France-to-redeploy-aircraft-carrier-in-Mosul-fight.html


Processing URLs:  50%|████▉     | 497/1000 [21:31<12:52,  1.53s/it]

Error extracting text from http://www.crisis.acleddata.com/dissent-and-protest-in-zimbabwe-mass-mobilisation-in-the-face-of-economic-and-political-crisis/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /dissent-and-protest-in-zimbabwe-mass-mobilisation-in-the-face-of-economic-and-political-crisis/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307571430>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|█████     | 503/1000 [21:35<06:36,  1.25it/s]

Error extracting text from https://www.wsj.com/articles/fbi-probe-could-hurt-donald-trumps-clout-in-congress-1490053226: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fbi-probe-could-hurt-donald-trumps-clout-in-congress-1490053226


Processing URLs:  51%|█████     | 507/1000 [21:43<13:51,  1.69s/it]

Error extracting text from http://www.icc-cricket.com/team-rankings/test: 404 Client Error: Not Found for url: https://www.icc-cricket.com/team-rankings/test


Processing URLs:  51%|█████     | 509/1000 [22:38<2:20:05, 17.12s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-07/apple-falls-third-day-as-iphone-woe-cuts-40-billion-in-value


Processing URLs:  51%|█████     | 511/1000 [22:39<1:15:52,  9.31s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN19A029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN19A029


Processing URLs:  51%|█████     | 512/1000 [22:41<1:00:43,  7.47s/it]

Error extracting text from https://www.shrm.org/resourcesandtools/legal-and-compliance/employment-law/pages/nlrb-rule-more-time-before-union-elections.aspx: 404 Client Error: Not Found for url: https://www.shrm.org/topics-tools/legal-compliance/nlrb-rule-gives-employers-time-union-elections


Processing URLs:  51%|█████▏    | 513/1000 [22:41<45:25,  5.60s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-capitalcontrols-idUSKCN0Z40BR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-capitalcontrols-idUSKCN0Z40BR


Processing URLs:  52%|█████▏    | 517/1000 [22:46<19:03,  2.37s/it]

Error extracting text from https://www.ipsos.com/en-ca/news-polls/one-in-eight-canadians-is-completely-undecided-on-how-to-vote: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-ca/news-polls/one-in-eight-canadians-is-completely-undecided-on-how-to-vote
URL filtered: http://www.bloomberg.com/news/articles/2016-08-09/effort-to-oust-venezuela-s-maduro-derailed-as-recall-hindered


Processing URLs:  52%|█████▏    | 519/1000 [22:49<15:16,  1.91s/it]

Error extracting text from http://www.nwitimes.com/business/ap-explains-how-to-transform-gop-health-care-plan-into/article_4a2404ca-ff5d-5e00-bfa2-7424eb2d5ce9.html: 404 Client Error: Not Found for url: https://www.nwitimes.com/business/ap-explains-how-to-transform-gop-health-care-plan-into/article_4a2404ca-ff5d-5e00-bfa2-7424eb2d5ce9.html


Processing URLs:  52%|█████▏    | 520/1000 [22:51<15:01,  1.88s/it]

Error extracting text from https://uk.reuters.com/article/uk-britain-eu-may-progress/brexit-talks-progressing-but-issues-remain-may-idUKKBN1DO268: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  52%|█████▏    | 521/1000 [22:52<13:02,  1.63s/it]

Error extracting text from https://www.newsweek.com/trump-ready-putins-invasion-belarus-russian-forces-are-gathering-664225: 403 Client Error: Forbidden for url: https://www.newsweek.com/trump-ready-putins-invasion-belarus-russian-forces-are-gathering-664225
URL filtered: https://www.bloomberg.com/news/articles/2019-04-29/investors-focus-on-spanish-government-coalitions-street-wrap


Processing URLs:  52%|█████▏    | 523/1000 [22:52<08:15,  1.04s/it]

Error extracting text from http://www.theridgeschool.org/: 404 Client Error: Not Found for url: http://www.theridgeschool.org/
URL filtered: http://www.bloomberg.com/news/articles/2016-03-01/germany-said-flexible-on-greece-s-pensions-amid-refugee-crisis


Processing URLs:  53%|█████▎    | 526/1000 [22:54<06:38,  1.19it/s]

Error extracting text from http://www.oecd.org/economic-outlook/june-2020/: 403 Client Error: Forbidden for url: https://www.oecd.org/economic-outlook/june-2020/


Processing URLs:  53%|█████▎    | 531/1000 [23:01<08:18,  1.06s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S1877042814066579: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S1877042814066579


Processing URLs:  54%|█████▎    | 537/1000 [23:08<07:20,  1.05it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-peshmerga-idUSKCN0XU0NE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-peshmerga-idUSKCN0XU0NE


Processing URLs:  54%|█████▍    | 543/1000 [23:14<05:07,  1.49it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdoantero.com.br/politica/petistas-dizem-que-lula-aceitou-ser-ministro-de-dilma-rousseff/33859&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdoantero.com.br/politica/petistas-dizem-que-lula-aceitou-ser-ministro-de-dilma-rousseff/33859&amp;prev=search
Error extracting text from http://WWW.ASEM11.MN: HTTPConnectionPool(host='www.asem11.mn', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301307680>: Failed to resolve 'www.asem11.mn' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  55%|█████▍    | 547/1000 [23:22<10:59,  1.46s/it]

Error extracting text from http://darksky.org/our-work/public-policy/: 403 Client Error: Forbidden for url: http://darksky.org/our-work/public-policy/


Processing URLs:  55%|█████▍    | 548/1000 [23:22<08:09,  1.08s/it]

Error extracting text from http://licensing.fcc.gov/cgi-bin/ws.exe/prod/ib/forms/reports/swr031b.hts?q_set=V_SITE_ANTENNA_FREQ.file_numberC/File+Number/%3D/SATLOA2016111500118&prepare=&column=V_SITE_ANTENNA_FREQ.file_numberC/File+Number: 403 Client Error: Forbidden for url: http://licensing.fcc.gov/cgi-bin/ws.exe/prod/ib/forms/reports/swr031b.hts?q_set=V_SITE_ANTENNA_FREQ.file_numberC/File+Number/%3D/SATLOA2016111500118&prepare=&column=V_SITE_ANTENNA_FREQ.file_numberC/File+Number
URL filtered: https://twitter.com/NYCMayor/status/1384871684147122179


Processing URLs:  55%|█████▌    | 551/1000 [23:24<05:41,  1.31it/s]

Error extracting text from https://www.macrotrends.net/stocks/charts/AAPL/apple/market-cap: 403 Client Error: Forbidden for url: https://www.macrotrends.net/stocks/charts/AAPL/apple/market-cap


Processing URLs:  56%|█████▌    | 557/1000 [23:31<08:18,  1.13s/it]

Error extracting text from http://services.parliament.uk/bills/2015-16/europeanunionreferendum.html: 403 Client Error: Forbidden for url: https://services.parliament.uk/bills/2015-16/europeanunionreferendum.html


Processing URLs:  56%|█████▌    | 561/1000 [23:34<04:49,  1.52it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-trust-analysis-idUSKBN0TP06Y20151206#VrPE2Li8cu5MCPqX.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-trust-analysis-idUSKBN0TP06Y20151206#VrPE2Li8cu5MCPqX.97
Error extracting text from http://www.reuters.com/article/2015/11/06/us-southchinasea-usa-warship-idUSKCN0SV05420151106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-southchinasea-usa-warship-idUSKCN0SV05420151106


Processing URLs:  56%|█████▌    | 562/1000 [23:34<03:37,  2.01it/s]

Error extracting text from http://www.timesofisrael.com/putin-orders-surprise-russian-military-pullout-from-syria/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/putin-orders-surprise-russian-military-pullout-from-syria/


Processing URLs:  56%|█████▋    | 563/1000 [24:34<2:13:25, 18.32s/it]

Error extracting text from https://archive.is/tqonU: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  57%|█████▋    | 570/1000 [24:55<24:12,  3.38s/it]

URL filtered: https://thehill.com/policy/technology/556884-facebook-suspending-trump-until-at-least-2023


Processing URLs:  57%|█████▋    | 572/1000 [24:55<14:02,  1.97s/it]

Error extracting text from http://www.peruviantimes.com/14/mano-a-mano-between-kuczynski-and-fujimori/26240/: 406 Client Error: Not Acceptable for url: http://www.peruviantimes.com/14/mano-a-mano-between-kuczynski-and-fujimori/26240/


Processing URLs:  57%|█████▋    | 573/1000 [24:56<11:17,  1.59s/it]

Error extracting text from https://www.scotsman.com/news/politics/scottish-parliament-election-2021-when-will-each-holyrood-constituency-declare-a-result-and-when-will-we-know-whether-nicola-sturgeon-has-won-a-majority-3214902: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-parliament-election-2021-when-will-each-holyrood-constituency-declare-a-result-and-when-will-we-know-whether-nicola-sturgeon-has-won-a-majority-3214902


Processing URLs:  58%|█████▊    | 579/1000 [25:13<15:25,  2.20s/it]

Error extracting text from http://www.reuters.com/article/us-libya-security-sirte-idUSKBN13V15R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-libya-security-sirte-idUSKBN13V15R


Processing URLs:  58%|█████▊    | 583/1000 [25:17<10:18,  1.48s/it]

Error extracting text from http://en.trend.az/iran/politics/2496318.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2496318.html


Processing URLs:  58%|█████▊    | 584/1000 [25:18<08:36,  1.24s/it]

Error extracting text from http://aranews.net/2016/03/western-backed-forces-report-new-gains-eastern-syria-approaching-raqqa/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/western-backed-forces-report-new-gains-eastern-syria-approaching-raqqa/


Processing URLs:  59%|█████▊    | 587/1000 [25:22<07:51,  1.14s/it]

Error extracting text from http://mashable.com/2016/02/24/europe-refugee-crisis-growing/: 404 Client Error: Not Found for url: https://mashable.com/2016/02/24/europe-refugee-crisis-growing/


Processing URLs:  59%|█████▉    | 589/1000 [25:24<07:14,  1.06s/it]

Error extracting text from https://www.voanews.com/extremism-watch/afghan-forces-vow-no-break-fighting-during-winter: 404 Client Error: Not Found for url: https://www.voanews.com/extremism-watch/afghan-forces-vow-no-break-fighting-during-winter


Processing URLs:  59%|█████▉    | 593/1000 [25:37<14:25,  2.13s/it]

URL filtered: https://www.reuters.com/article/us-facebook-propaganda/facebook-says-some-russian-ads-during-u-s-election-promoted-live-events-idUSKCN1BN2VG


Processing URLs:  60%|█████▉    | 596/1000 [25:43<13:41,  2.03s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-september-23-2015: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-september-23-2015


Processing URLs:  60%|██████    | 602/1000 [25:50<06:33,  1.01it/s]

Error extracting text from https://www.nytimes.com/2021/01/15/us/politics/inauguration-security.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/15/us/politics/inauguration-security.html


Processing URLs:  60%|██████    | 603/1000 [25:50<05:11,  1.28it/s]

Error extracting text from https://www.nytimes.com/article/wisconsin-parade-attack-what-we-know.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/article/wisconsin-parade-attack-what-we-know.html


Processing URLs:  61%|██████    | 608/1000 [25:58<10:43,  1.64s/it]

Error extracting text from http://tass.com/politics/901459: 502 Server Error: Bad Gateway for url: https://tass.com/politics/901459


Processing URLs:  61%|██████    | 610/1000 [26:58<1:27:26, 13.45s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/article167582502.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.nytimes.com/2016/11/02/opinion/can-turkeys-democracy-survive-president-erdogan.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/02/opinion/can-turkeys-democracy-survive-president-erdogan.html


Processing URLs:  61%|██████▏   | 613/1000 [26:59<36:29,  5.66s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-idlib/syrian-army-and-iranian-backed-militias-push-towards-idlib-province-idUSKBN1E500R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idlib/syrian-army-and-iranian-backed-militias-push-towards-idlib-province-idUSKBN1E500R


Processing URLs:  62%|██████▏   | 616/1000 [27:04<19:40,  3.07s/it]

Error extracting text from http://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118#KmbkJgCRQCuusCFM.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118#KmbkJgCRQCuusCFM.97


Processing URLs:  62%|██████▏   | 618/1000 [27:05<11:22,  1.79s/it]

Error extracting text from http://www.nytimes.com/2016/11/04/world/middleeast/iraq-mosul-islamic-state.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/04/world/middleeast/iraq-mosul-islamic-state.html?_r=0


Processing URLs:  62%|██████▏   | 621/1000 [27:11<11:40,  1.85s/it]

Error extracting text from https://reut.rs/2PLamI1.However: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/


Processing URLs:  62%|██████▎   | 625/1000 [27:16<09:04,  1.45s/it]

Error extracting text from https://globalriskinsights.com/2021/01/ethiopia-what-next-for-abiy-ahmed/: 403 Client Error: Forbidden for url: https://globalriskinsights.com/2021/01/ethiopia-what-next-for-abiy-ahmed/
URL filtered: https://www.youtube.com/watch?v=TvVDVFf4okQ


Processing URLs:  63%|██████▎   | 630/1000 [27:21<07:13,  1.17s/it]

Error extracting text from http://www.ibtimes.co.uk/south-korean-presidential-favourite-ban-ki-moon-announce-political-plans-soon-1600840: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/south-korean-presidential-favourite-ban-ki-moon-announce-political-plans-soon-1600840


Processing URLs:  63%|██████▎   | 631/1000 [27:22<06:39,  1.08s/it]

Error extracting text from http://election.princeton.edu/history-of-meta-analysis/: HTTPSConnectionPool(host='election.princeton.eduhistory-of-meta-analysis', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x306bfd2e0>: Failed to resolve 'election.princeton.eduhistory-of-meta-analysis' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  63%|██████▎   | 632/1000 [27:23<05:39,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-iran-missiles-clinton-idUSKCN0WB2GL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-clinton-idUSKCN0WB2GL


Processing URLs:  63%|██████▎   | 634/1000 [27:38<27:13,  4.46s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20150114/0905144510.html: 504 Server Error: Gateway Time-out for url: http://www.caam.org.cn/AutomotivesStatistics/20150114/0905144510.html


Processing URLs:  64%|██████▎   | 636/1000 [27:40<16:23,  2.70s/it]

Error extracting text from http://www.counterpunch.org/2015/08/14/is-putin-planning-to-sell-out-assad/: 403 Client Error: Forbidden for url: http://www.counterpunch.org/2015/08/14/is-putin-planning-to-sell-out-assad/


Processing URLs:  64%|██████▍   | 640/1000 [29:48<3:46:45, 37.79s/it]

Error extracting text from https://protonex.com/blog/what-do-soldiers-carry-and-whats-its-weight/: HTTPSConnectionPool(host='protonex.com', port=443): Max retries exceeded with url: /blog/what-do-soldiers-carry-and-whats-its-weight/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x306e0e3f0>, 'Connection to protonex.com timed out. (connect timeout=60)'))


Processing URLs:  64%|██████▍   | 641/1000 [29:49<2:40:11, 26.77s/it]

Error extracting text from http://www.econtalk.org/archives/2017/02/gary_taubes_on.html: 403 Client Error: Forbidden for url: http://www.econtalk.org/archives/2017/02/gary_taubes_on.html
URL filtered: http://www.youtube.com/watch?v=DJ6Dotht6pM&amp;t=0m31s


Processing URLs:  64%|██████▍   | 644/1000 [29:52<1:08:01, 11.46s/it]

Error extracting text from http://www.caam.org.cn/hangye/20161229/1605203369.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/hangye/20161229/1605203369.html


Processing URLs:  65%|██████▍   | 648/1000 [29:56<22:45,  3.88s/it]

URL filtered: http://www.foxbusiness.com/features/2016/12/18/german-courts-could-go-after-fake-news-on-facebook.html
Error extracting text from http://www.reuters.com/article/us-usa-trump-putin-idUSKBN15S0YH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-putin-idUSKBN15S0YH?il=0


Processing URLs:  65%|██████▍   | 649/1000 [29:56<17:34,  3.00s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/12/11/Assad-says-won-t-attend-negotiations-majority-of-Syrians-support-him-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/12/11/Assad-says-won-t-attend-negotiations-majority-of-Syrians-support-him-.html


Processing URLs:  65%|██████▌   | 652/1000 [29:59<09:56,  1.71s/it]

Error extracting text from https://www.nytimes.com/2017/07/18/world/middleeast/trump-iran-sanctions-nuclear.html?action=click&amp;contentCollection=Middle%20East&amp;module=RelatedCoverage&amp;region=Marginalia&amp;pgtype=article: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/18/world/middleeast/trump-iran-sanctions-nuclear.html?action=click&amp;contentCollection=Middle%20East&amp;module=RelatedCoverage&amp;region=Marginalia&amp;pgtype=article


Processing URLs:  65%|██████▌   | 653/1000 [30:00<09:11,  1.59s/it]

Error extracting text from https://thinkprogress.org/republicans-were-fine-with-bush-acting-on-immigration-reform-without-congress-ae2ae00d2130#.4lg27oczz: 403 Client Error: Forbidden for url: https://thinkprogress.org/republicans-were-fine-with-bush-acting-on-immigration-reform-without-congress-ae2ae00d2130#.4lg27oczz


Processing URLs:  66%|██████▌   | 656/1000 [30:03<06:51,  1.20s/it]

Error extracting text from https://www.defenseindustrydaily.com/thaad-reach-out-and-touch-ballistic-missiles-updated-02924/: 403 Client Error: Forbidden for url: https://www.defenseindustrydaily.com/thaad-reach-out-and-touch-ballistic-missiles-updated-02924/


Processing URLs:  66%|██████▌   | 657/1000 [30:05<07:20,  1.29s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-10-19/the-candidates-with-the-most-ads-are-not-doing-well-in-the-polls


Processing URLs:  66%|██████▌   | 661/1000 [30:08<04:49,  1.17it/s]

Error extracting text from https://www.nytimes.com/2021/05/22/world/middleeast/netanyahu-israel-gaza.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/05/22/world/middleeast/netanyahu-israel-gaza.html


Processing URLs:  66%|██████▌   | 662/1000 [30:10<06:47,  1.21s/it]

Error extracting text from http://www.thenational.ae/business/energy/opec-ministers-may-discuss-output-quota-levels-at-upcoming-meeting-says-uae-minister-of-energy: 404 Client Error: Not Found for url: https://www.thenationalnews.com/business/energy/opec-ministers-may-discuss-output-quota-levels-at-upcoming-meeting-says-uae-minister-of-energy/


Processing URLs:  66%|██████▋   | 665/1000 [30:15<07:08,  1.28s/it]

Error extracting text from http://thehill.com/opinion/mark-mellman/258307-mark-mellman-clintons-primary-advantage: 403 Client Error: Forbidden for url: https://thehill.com/opinion/mark-mellman/258307-mark-mellman-clintons-primary-advantage/


Processing URLs:  67%|██████▋   | 668/1000 [30:17<05:11,  1.07it/s]

Error extracting text from http://www.wsj.com/articles/u-k-vote-on-europe-poses-dilemma-for-scotland-1464300803: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-k-vote-on-europe-poses-dilemma-for-scotland-1464300803


Processing URLs:  67%|██████▋   | 669/1000 [30:18<04:27,  1.24it/s]

Error extracting text from https://www.reuters.com/world/europe/russia-says-first-phase-ukraine-operation-mostly-complete-focus-now-donbass-2022-03-25/.: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/russia-says-first-phase-ukraine-operation-mostly-complete-focus-now-donbass-2022-03-25/


Processing URLs:  67%|██████▋   | 672/1000 [30:20<03:59,  1.37it/s]

Error extracting text from http://www.dailystar.com.lb/News/World/2016/Mar-26/344222-burundi-rebel-group-says-behind-killing-of-senior-army-officer.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/World/2016/Mar-26/344222-burundi-rebel-group-says-behind-killing-of-senior-army-officer.ashx


Processing URLs:  67%|██████▋   | 674/1000 [30:22<04:15,  1.28it/s]

Error extracting text from http://thehill.com/homenews/news/335864-trump-admin-weighing-return-of-seized-russian-compounds-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/news/335864-trump-admin-weighing-return-of-seized-russian-compounds-report/


Processing URLs:  68%|██████▊   | 676/1000 [30:25<06:42,  1.24s/it]

Error extracting text from https://iemweb.biz.uiowa.edu/graphs/graph_RConv08.cfm: 404 Client Error: Not Found for url: https://iemweb.biz.uiowa.edu/graphs/graph_RConv08.cfm


Processing URLs:  68%|██████▊   | 678/1000 [30:27<06:22,  1.19s/it]

URL filtered: https://www.nytimes.com/2016/11/28/technology/facebook-germany-hate-speech-fake-news.html?pagewanted=all&amp;_r=0


Processing URLs:  68%|██████▊   | 680/1000 [30:27<04:03,  1.32it/s]

Error extracting text from http://www.businessinsider.com.au/media-society-event-polling-eu-referendum-brexit-statistics-2016-5: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/media-society-event-polling-eu-referendum-brexit-statistics-2016-5


Processing URLs:  68%|██████▊   | 682/1000 [30:33<08:40,  1.64s/it]

Error extracting text from https://michiganpeninsulanews.com/featured/1330-michigan-house-committee-passes-autonomous-vehicles-bills-following-new-federal-policy/: 404 Client Error: Not Found for url: https://lienmultimedia.com/bandarqq/featured/1330-michigan-house-committee-passes-autonomous-vehicles-bills-following-new-federal-policy/


Processing URLs:  68%|██████▊   | 683/1000 [30:34<06:43,  1.27s/it]

Error extracting text from http://www.reuters.com/article/us-iran-usa-sanctions-idUSKBN1762OR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-sanctions-idUSKBN1762OR


Processing URLs:  68%|██████▊   | 684/1000 [30:34<06:02,  1.15s/it]

Error extracting text from https://www.humboldtforum.org/de/jobs/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/de/jobs/


Processing URLs:  68%|██████▊   | 685/1000 [30:35<04:45,  1.10it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/02/20/world/iraq-tribesmen-battle-is-inside-fallujah-for-second-day/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/02/20/world/iraq-tribesmen-battle-is-inside-fallujah-for-second-day/


Processing URLs:  69%|██████▉   | 689/1000 [30:46<09:55,  1.92s/it]

Error extracting text from https://www.lawfareblog.com/lawfare-podcast-trouble-ukraine-and-kazakhstan).: 500 Server Error: Internal Server Error for url: https://www.lawfaremedia.org/lawfare-podcast-trouble-ukraine-and-kazakhstan).


Processing URLs:  69%|██████▉   | 691/1000 [30:50<10:21,  2.01s/it]

Error extracting text from http://www.trust.org/item/20151111193923-7gpxk: 404 Client Error:  for url: https://www.trust.org:443/item/20151111193923-7gpxk


Processing URLs:  69%|██████▉   | 693/1000 [30:50<05:51,  1.15s/it]

Error extracting text from https://news.usni.org/2021/05/26/russian-navy-surveillance-ship-quietly-operating-off-hawaii: 403 Client Error: Forbidden for url: https://news.usni.org/2021/05/26/russian-navy-surveillance-ship-quietly-operating-off-hawaii


Processing URLs:  69%|██████▉   | 694/1000 [30:53<07:58,  1.56s/it]

Error extracting text from http://www.bcv.org.ve/excel/2_1_1.xls?id=28: HTTPSConnectionPool(host='www.bcv.org.ve', port=443): Max retries exceeded with url: /excel/2_1_1.xls?id=28 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  70%|██████▉   | 697/1000 [30:55<04:34,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-pipeline-idUSKBN15820N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-pipeline-idUSKBN15820N


Processing URLs:  70%|███████   | 700/1000 [31:02<10:11,  2.04s/it]

Error extracting text from https://so.usmission.gov/security-message-u-s-citizens-specific-threat-information-u-s-citizens-u-s-mission-somalia-november-4-2017/: 404 Client Error: Not Found for url: https://so.usembassy.gov/security-message-u-s-citizens-specific-threat-information-u-s-citizens-u-s-mission-somalia-november-4-2017/


Processing URLs:  70%|███████   | 701/1000 [31:03<08:08,  1.63s/it]

Error extracting text from http://www.huffingtonpost.ca/navid-hassibi/re-engagement-with-iran_b_9169446.html: 502 Server Error: Bad Gateway for url: https://www.huffingtonpost.ca/navid-hassibi/re-engagement-with-iran_b_9169446.html


Processing URLs:  70%|███████   | 703/1000 [31:08<11:38,  2.35s/it]

Error extracting text from http://www.mid.ru/ru/foreign_policy/news/-/asset_publisher/cKNonkJE02Bw/content/id/2623713: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  71%|███████   | 706/1000 [31:17<13:50,  2.83s/it]

Error extracting text from http://peacekeeper.ru/en/?module=news&amp;action=view&amp;id=29155: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  71%|███████   | 707/1000 [31:18<11:36,  2.38s/it]

Error extracting text from http://www.newsweek.com/will-assad-and-hezbollah-get-pick-lebanons-president-405969: 403 Client Error: Forbidden for url: https://www.newsweek.com/will-assad-and-hezbollah-get-pick-lebanons-president-405969


Processing URLs:  71%|███████   | 710/1000 [31:20<05:19,  1.10s/it]

Error extracting text from http://www.reuters.com/article/2013/04/10/us-syria-crisis-iraq-idUSBRE9390OB20130410: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2013/04/10/us-syria-crisis-iraq-idUSBRE9390OB20130410
Error extracting text from http://www.oantagonista.com/posts/pedala-renan: 403 Client Error: Forbidden for url: http://www.oantagonista.com/posts/pedala-renan


Processing URLs:  71%|███████   | 712/1000 [31:22<05:29,  1.15s/it]

Error extracting text from https://www.transparency.org/country/#MNE: 404 Client Error: Not Found for url: https://www.transparency.org/en/country/#MNE


Processing URLs:  72%|███████▏  | 717/1000 [31:26<02:54,  1.62it/s]

Error extracting text from http://gbtimes.com/business/asian-nations-call-swift-completion-rcep-trade-talks: 403 Client Error: Forbidden for url: http://gbtimes.com/business/asian-nations-call-swift-completion-rcep-trade-talks
Error extracting text from http://www.nytimes.com/2016/02/12/world/middleeast/us-and-russia-announce-plan-for-humanitarian-aid-and-a-cease-fire-in-syria.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/12/world/middleeast/us-and-russia-announce-plan-for-humanitarian-aid-and-a-cease-fire-in-syria.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  72%|███████▏  | 719/1000 [31:30<05:28,  1.17s/it]

Error extracting text from https://play.google.com/store/apps/details?id=org.servalproject&amp;hl=en: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.servalproject&amp;hl=en


Processing URLs:  72%|███████▏  | 724/1000 [31:43<12:16,  2.67s/it]

Error extracting text from http://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118


Processing URLs:  72%|███████▎  | 725/1000 [31:43<08:58,  1.96s/it]

Error extracting text from http://www.reuters.com/article/iran-oil-output-idUSL5N1785SA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/iran-oil-output-idUSL5N1785SA


Processing URLs:  73%|███████▎  | 732/1000 [31:57<07:11,  1.61s/it]

Error extracting text from https://reut.rs/3pjBECH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/china-vice-premier-holds-talks-with-us-treasury-secretary-2021-06-02/


Processing URLs:  73%|███████▎  | 733/1000 [31:58<06:30,  1.46s/it]

Error extracting text from https://www.france24.com/en/live-news/20210501-un-fails-to-agree-on-myanmar-statement-diplomats-blame-china-russia: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210501-un-fails-to-agree-on-myanmar-statement-diplomats-blame-china-russia


Processing URLs:  74%|███████▎  | 736/1000 [32:07<11:18,  2.57s/it]

URL filtered: https://www.youtube.com/watch?v=SkyK3TB0ONc
URL filtered: https://www.youtube.com/results?search_query=the+march+bbc


Processing URLs:  74%|███████▍  | 740/1000 [32:10<06:26,  1.49s/it]

Error extracting text from https://reut.rs/33COPFj: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  74%|███████▍  | 744/1000 [32:13<04:18,  1.01s/it]

Error extracting text from http://www.nytimes.com/2016/07/01/business/self-driving-tesla-fatal-crash-investigation.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/01/business/self-driving-tesla-fatal-crash-investigation.html?_r=0


Processing URLs:  74%|███████▍  | 745/1000 [32:14<03:24,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-china-autos-green-idUSKCN11767X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-autos-green-idUSKCN11767X


Processing URLs:  75%|███████▍  | 746/1000 [32:14<03:27,  1.23it/s]

URL filtered: https://www.youtube.com/embed/iAgKHSNqxa8&quot


Processing URLs:  75%|███████▍  | 748/1000 [32:16<03:16,  1.28it/s]

Error extracting text from https://www.stanfordlawreview.org/print/article/three-tests-for-practical-evaluation-of-partisan-gerrymandering/: 403 Client Error: Forbidden for url: https://www.stanfordlawreview.org/print/article/three-tests-for-practical-evaluation-of-partisan-gerrymandering/


Processing URLs:  75%|███████▌  | 754/1000 [32:24<04:03,  1.01it/s]

Error extracting text from http://eng.mod.gov.cn/TopNews/2016-02/02/content_4639111.htm: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/TopNews/2016-02/02/content_4639111.htm


Processing URLs:  76%|███████▌  | 762/1000 [32:34<03:41,  1.08it/s]

Error extracting text from http://allafrica.com/stories/201602280120.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201602280120.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303201f40>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  76%|███████▋  | 763/1000 [32:37<05:37,  1.42s/it]

Error extracting text from https://www.reuters.com/article/us-russian-politics-navalny-sanctions-idUSKBN2AI1WU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russian-politics-navalny-sanctions-idUSKBN2AI1WU


Processing URLs:  77%|███████▋  | 769/1000 [32:44<04:48,  1.25s/it]

Error extracting text from http://www.wsj.com/articles/u-s-intelligence-chief-suggests-russia-was-behind-election-linked-hacks-1474416647: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-intelligence-chief-suggests-russia-was-behind-election-linked-hacks-1474416647


Processing URLs:  77%|███████▋  | 773/1000 [32:51<04:32,  1.20s/it]

Error extracting text from http://www.france24.com/en/20160825-colombia-peace-deal-bitter-referendum-fight-santos-uribe: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160825-colombia-peace-deal-bitter-referendum-fight-santos-uribe


Processing URLs:  78%|███████▊  | 776/1000 [32:52<02:15,  1.65it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.folhapolitica.org/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.folhapolitica.org/&amp;prev=search


Processing URLs:  78%|███████▊  | 785/1000 [33:05<03:22,  1.06it/s]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130


Processing URLs:  79%|███████▊  | 786/1000 [33:08<04:42,  1.32s/it]

Error extracting text from http://www.news8000.com/iranian-vessels-intercept-us-destroyer/41355684: 404 Client Error: Not Found for url: https://www.news8000.com/iranian-vessels-intercept-us-destroyer/41355684/


Processing URLs:  79%|███████▉  | 788/1000 [33:10<03:56,  1.12s/it]

Error extracting text from http://africanbusinessmagazine.com/sectors/energy/going-nuclear-africas-energy-future/: 403 Client Error: Forbidden for url: https://african.business/sectors/energy/going-nuclear-africas-energy-future/


Processing URLs:  79%|███████▉  | 789/1000 [34:10<1:06:38, 18.95s/it]

Error extracting text from https://www.landlordology.com/order-of-applicants/: HTTPSConnectionPool(host='www.landlordology.com', port=443): Read timed out. (read timeout=60)
Error extracting text from http://dailypoliticsusa.com/breaking-rnc-says-will-deny-trump-even-gets-1237-delegates/: HTTPConnectionPool(host='dailypoliticsusa.com', port=80): Max retries exceeded with url: /breaking-rnc-says-will-deny-trump-even-gets-1237-delegates/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303b86030>: Failed to resolve 'dailypoliticsusa.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▉  | 792/1000 [34:12<27:58,  8.07s/it]

Error extracting text from http://www.foreign.senate.gov/hearings/nato-expansion-examining-the-accession-of-montenegro_091416p: 403 Client Error: Forbidden for url: http://www.foreign.senate.gov/hearings/nato-expansion-examining-the-accession-of-montenegro_091416p


Processing URLs:  80%|███████▉  | 799/1000 [34:27<07:54,  2.36s/it]

Error extracting text from https://www.wsj.com/articles/u-s-covid-19-case-counts-have-doubled-in-recent-weeks-11626198501: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-covid-19-case-counts-have-doubled-in-recent-weeks-11626198501


Processing URLs:  80%|████████  | 800/1000 [35:27<1:05:09, 19.55s/it]

Error extracting text from https://www.olympic.org/michael-phelps: HTTPSConnectionPool(host='www.olympic.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  80%|████████  | 804/1000 [35:35<19:37,  6.01s/it]

Error extracting text from http://www.wsj.com/articles/russia-signs-cooperation-agreement-with-anti-immigrant-party-in-austria-1482170810: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-signs-cooperation-agreement-with-anti-immigrant-party-in-austria-1482170810


Processing URLs:  81%|████████  | 812/1000 [35:46<04:21,  1.39s/it]

Error extracting text from https://www.yahoo.com/news/m/fc8f83b4-288a-3e5d-8362-8fd5707079b3/ss_heavy-clashes-in-kirkuk.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/m/fc8f83b4-288a-3e5d-8362-8fd5707079b3/ss_heavy-clashes-in-kirkuk.html


Processing URLs:  81%|████████▏ | 814/1000 [35:48<02:58,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-taiwan-idUSKCN0WP0IH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-taiwan-idUSKCN0WP0IH


Processing URLs:  82%|████████▏ | 815/1000 [35:49<02:56,  1.05it/s]

Error extracting text from https://bit.ly/3vGdYuS: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/monetary-policy-summary-and-minutes/2021/march-2021


Processing URLs:  82%|████████▏ | 820/1000 [35:55<03:04,  1.02s/it]

Error extracting text from http://thehill.com/policy/energy-environment/315819-trump-to-advance-keystone-dakota-access-pipelines-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/315819-trump-to-advance-keystone-dakota-access-pipelines-report/


Processing URLs:  82%|████████▎ | 825/1000 [36:03<03:36,  1.24s/it]

Error extracting text from http://www.nytimes.com/2016/04/23/world/middleeast/russian-military-buildup-near-aleppo-threatens-truce-kerry-warns.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/23/world/middleeast/russian-military-buildup-near-aleppo-threatens-truce-kerry-warns.html?_r=0
Error extracting text from http://www.nytimes.com/2016/08/04/world/asia/china-communist-youth-league.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/04/world/asia/china-communist-youth-league.html?_r=0


Processing URLs:  83%|████████▎ | 828/1000 [36:06<03:03,  1.07s/it]

Error extracting text from https://bit.ly/2Pjaezm: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-elections-2021-scots-set-to-experience-electoral-groundhog-day-3209594


Processing URLs:  83%|████████▎ | 830/1000 [36:10<03:28,  1.23s/it]

Error extracting text from https://www.fcc.gov/document/fcc-proposes-ending-utility-style-regulation-internet/orielly-statement: 403 Client Error: Forbidden for url: https://www.fcc.gov/document/fcc-proposes-ending-utility-style-regulation-internet/orielly-statement


Processing URLs:  84%|████████▍ | 839/1000 [36:28<03:26,  1.28s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-05/hedge-funds-wary-of-shorting-euro-yen-as-currencies-defy-easing


Processing URLs:  84%|████████▍ | 841/1000 [36:29<02:20,  1.13it/s]

Error extracting text from https://www.reuters.com/article/britain-eu-deal-parliament/eu-parliament-says-it-will-decide-on-brexit-deal-in-new-year-idUSKBN28Y1JK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-eu-deal-parliament/eu-parliament-says-it-will-decide-on-brexit-deal-in-new-year-idUSKBN28Y1JK


Processing URLs:  84%|████████▍ | 844/1000 [36:33<03:14,  1.25s/it]

Error extracting text from https://www.ibtimes.com/rafael-nadal-injury-when-will-21-time-grand-slam-champion-play-next-3472437: 403 Client Error: Forbidden for url: https://www.ibtimes.com/rafael-nadal-injury-when-will-21-time-grand-slam-champion-play-next-3472437


Processing URLs:  84%|████████▍ | 845/1000 [36:34<02:41,  1.04s/it]

Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=8675: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=8675


Processing URLs:  85%|████████▍ | 846/1000 [36:34<02:08,  1.20it/s]

Error extracting text from http://www.businessinsider.com.au/trump-presidency-bad-tesla-solarcity-2016-11?r=US&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/trump-presidency-bad-tesla-solarcity-2016-11?r=US&amp;IR=T
URL filtered: https://twitter.com/CarolineLucas/status/1382006359235952642
URL filtered: https://twitter.com/search?q=%23SecondDoseDelay&src=typeahead_click


Processing URLs:  85%|████████▌ | 850/1000 [36:37<02:00,  1.24it/s]

Error extracting text from https://finance.yahoo.com/quote/FB/chart?p=FB: 404 Client Error: Not Found for url: https://finance.yahoo.com/quote/FB/chart?p=FB


Processing URLs:  85%|████████▌ | 853/1000 [36:40<02:11,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-idUSKCN0WK1IF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-idUSKCN0WK1IF


Processing URLs:  86%|████████▌ | 856/1000 [36:47<04:10,  1.74s/it]

Error extracting text from https://www.top500.org/statistics/details/country/CN: 404 Client Error: Not Found for url: https://www.top500.org/statistics/details/country/CN/


Processing URLs:  86%|████████▌ | 858/1000 [36:50<03:46,  1.60s/it]

Error extracting text from http://www.france24.com/fr/20151210-le-chef-renseignement-demissionne-desaccords-ghani-ashraf-taliban-rahmatullah-nabil: 403 Client Error: Forbidden for url: http://www.france24.com/fr/20151210-le-chef-renseignement-demissionne-desaccords-ghani-ashraf-taliban-rahmatullah-nabil


Processing URLs:  86%|████████▌ | 860/1000 [36:54<03:56,  1.69s/it]

Error extracting text from http://celebcafe.org/senate-panel-passes-bill-to-lift-united-states-oil-export-ban-future-uncertain-8806/: 500 Server Error: Internal Server Error for url: https://celebcafe.org/senate-panel-passes-bill-to-lift-united-states-oil-export-ban-future-uncertain-8806/


Processing URLs:  86%|████████▌ | 862/1000 [36:57<03:34,  1.55s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/307081: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/307081
URL filtered: https://platform.twitter.com/widgets.js&quot


Processing URLs:  86%|████████▋ | 865/1000 [36:59<02:17,  1.02s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2008/president/nh/new_hampshire_democratic_primary-194.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2008/president/nh/new_hampshire_democratic_primary-194.html


Processing URLs:  87%|████████▋ | 867/1000 [37:01<01:51,  1.20it/s]

Error extracting text from http://www.nytimes.com/2015/12/01/opinion/campaign-stops/bernie-sanders-your-cool-socialist-grandpa.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/01/opinion/campaign-stops/bernie-sanders-your-cool-socialist-grandpa.html?_r=0


Processing URLs:  87%|████████▋ | 868/1000 [37:01<01:29,  1.47it/s]

Error extracting text from http://247wallst.com/commodities-metals/2014/10/31/u-s-meat-prices-rise-sharply-in-october/: 403 Client Error: Forbidden for url: https://247wallst.com/commodities-metals/2014/10/31/u-s-meat-prices-rise-sharply-in-october/


Processing URLs:  87%|████████▋ | 870/1000 [37:03<01:35,  1.36it/s]

Error extracting text from http://www.nytimes.com/2015/09/20/world/warily-eyeing-china-philippines-may-invite-us-back-to-subic-bay.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/20/world/warily-eyeing-china-philippines-may-invite-us-back-to-subic-bay.html?_r=0


Processing URLs:  87%|████████▋ | 872/1000 [37:04<01:22,  1.55it/s]

Error extracting text from http://syriancivilwarmap.com/: 404 Client Error: Not Found for url: http://syriancivilwarmap.com/
URL filtered: https://apnews.com/9ad490e00ff5458daa98edb9745aa27e?utm_source=Twitter&utm_medium=AP&utm_campaign=SocialFlow


Processing URLs:  88%|████████▊ | 875/1000 [37:07<01:53,  1.11it/s]

Error extracting text from https://penziniplus.com/deuda-externa-frena-la-capacidad-de-incrementar-importaciones/490526: HTTPSConnectionPool(host='penziniplus.com', port=443): Max retries exceeded with url: /deuda-externa-frena-la-capacidad-de-incrementar-importaciones/490526 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x306959fd0>: Failed to resolve 'penziniplus.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  88%|████████▊ | 878/1000 [37:10<01:51,  1.09it/s]

Error extracting text from https://www.ccjdigital.com/economic-trends/indicators/article/15066174/truck-tonnage-dipped-for-second-consecutive-month-while-still-strong-yearoveryear: 403 Client Error: Forbidden for url: https://www.ccjdigital.com/economic-trends/indicators/article/15066174/truck-tonnage-dipped-for-second-consecutive-month-while-still-strong-yearoveryear


Processing URLs:  89%|████████▊ | 887/1000 [37:31<03:48,  2.03s/it]

Error extracting text from https://information.tv5monde.com/info/australie-un-groupe-de-television-suspend-la-diffusion-de-programmes-de-chaines-chinoises: 404 Client Error: Not Found for url: https://information.tv5monde.com/info/australie-un-groupe-de-television-suspend-la-diffusion-de-programmes-de-chaines-chinoises


Processing URLs:  89%|████████▉ | 889/1000 [37:38<04:41,  2.54s/it]

Error extracting text from http://en.delfi.lt/lithuania/defence/lithuanian-president-on-nato-much-done-still-more-to-do.d?id=69474042: 403 Client Error: Forbidden for url: https://www.delfi.lt/en/lithuania/defence/lithuanian-president-on-nato-much-done-still-more-to-do.d?id=69474042


Processing URLs:  89%|████████▉ | 894/1000 [37:45<02:24,  1.36s/it]

URL filtered: https://twitter.com/edconwaysky/status/1357299310258491394?s=21
URL filtered: https://www.bloomberg.com/news/articles/2017-04-20/canada-s-plan-to-be-the-world-leader-in-legal-weed
Error extracting text from http://www.washingtontimes.com/news/2017/apr/21/iran-cheat-deal-nuclear-research-opposition-group/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/apr/21/iran-cheat-deal-nuclear-research-opposition-group/


Processing URLs:  90%|█████████ | 901/1000 [38:03<02:18,  1.40s/it]

Error extracting text from http://www.reuters.com/article/us-oil-meeting-idUSKCN0VO2FJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-meeting-idUSKCN0VO2FJ


Processing URLs:  90%|█████████ | 902/1000 [40:03<58:31, 35.83s/it]

Error extracting text from https://zimnews.net/kasukuwere-moyo-g40-zimbabwe-corruption/: HTTPSConnectionPool(host='zimnews.net', port=443): Max retries exceeded with url: /kasukuwere-moyo-g40-zimbabwe-corruption/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fae61400>, 'Connection to zimnews.net timed out. (connect timeout=60)'))


Processing URLs:  90%|█████████ | 903/1000 [40:04<41:34, 25.72s/it]

Error extracting text from http://af.reuters.com/article/investingNews/idAFKCN0WB0U5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  90%|█████████ | 904/1000 [41:04<57:24, 35.88s/it]

Error extracting text from http://www.intelljobs.com: HTTPConnectionPool(host='www.intelljobs.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fae635f0>, 'Connection to www.intelljobs.com timed out. (connect timeout=60)'))


Processing URLs:  91%|█████████ | 906/1000 [41:08<29:05, 18.57s/it]

URL filtered: https://www.engadget.com/2017/11/05/russia-funded-facebook-and-twitter-investments-through-tech-magn/


Processing URLs:  91%|█████████ | 911/1000 [41:21<08:31,  5.75s/it]

Error extracting text from http://www.wsj.com/article_email/obamas-middle-east-escapism-1446769562-lMyQjAxMTI1MzA4NjgwNjY5Wj: 403 Client Error: Forbidden for url: https://www.wsj.com/article_email/obamas-middle-east-escapism-1446769562-lMyQjAxMTI1MzA4NjgwNjY5Wj


Processing URLs:  91%|█████████▏| 914/1000 [41:21<03:29,  2.44s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-04/ivory-coast-said-to-sell-1-million-tons-of-next-main-cocoa-crop
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-iran-idUSKBN0U12OM20151218: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-iran-idUSKBN0U12OM20151218


Processing URLs:  92%|█████████▏| 915/1000 [41:31<05:50,  4.12s/it]

Error extracting text from http://news.usni.org/2015/04/07/norad-chief-north-korea-has-ability-to-reach-u-s-with-nuclear-warhead-on-mobile-icbm: 403 Client Error: Forbidden for url: http://news.usni.org/2015/04/07/norad-chief-north-korea-has-ability-to-reach-u-s-with-nuclear-warhead-on-mobile-icbm


Processing URLs:  92%|█████████▏| 922/1000 [41:40<02:13,  1.71s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-18/wef-boss-schwab-warns-commodities-rout-could-spur-more-migration


Processing URLs:  92%|█████████▏| 924/1000 [42:40<18:03, 14.26s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-08-25/us-state-election-officials-still-in-the-dark-on-russian-hacking: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 929/1000 [42:46<04:30,  3.81s/it]

Error extracting text from https://www.nytimes.com/2020/08/29/health/coronavirus-testing.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/08/29/health/coronavirus-testing.html


Processing URLs:  93%|█████████▎| 932/1000 [42:51<02:44,  2.42s/it]

Error extracting text from http://ca.reuters.com/article/technologyNews/idCAKBN18I1EV-OCATC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  94%|█████████▎| 936/1000 [42:59<02:14,  2.10s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57509#.WcIl34xSyUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57509#.WcIl34xSyUk


Processing URLs:  94%|█████████▎| 937/1000 [43:01<02:03,  1.96s/it]

Error extracting text from http://ec.europa.eu/trade/policy/in-focus/ttip/about-ttip/process/#_state-of-play: 404 Client Error: (Not Found) for url: https://ec.europa.eu/policy/in-focus/ttip/about-ttip/process/#_state-of-play
Error extracting text from http://freenews.xyz/2015/11/17/the-japanese-foreign-ministry-2016-is-the-best-time-for-putins-visit/: HTTPConnectionPool(host='freenews.xyz', port=80): Max retries exceeded with url: /2015/11/17/the-japanese-foreign-ministry-2016-is-the-best-time-for-putins-visit/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304dd56a0>: Failed to resolve 'freenews.xyz' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/news/articles/2017-02-21/the-next-financial-crisis-might-be-in-your-driveway


Processing URLs:  94%|█████████▍| 940/1000 [43:02<01:06,  1.12s/it]

Error extracting text from http://jerseyeveningpost.com/news/uk-news/2017/02/27/may-fears-fresh-scottish-independence-referendum-after-brexit-triggered/: 404 Client Error: Not Found for url: https://jerseyeveningpost.com/news/uk-news/2017/02/27/may-fears-fresh-scottish-independence-referendum-after-brexit-triggered/


Processing URLs:  94%|█████████▍| 943/1000 [43:21<03:26,  3.63s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-turkey-idUSKCN12J1U5?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-turkey-idUSKCN12J1U5?il=0


Processing URLs:  94%|█████████▍| 945/1000 [43:23<02:10,  2.37s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-iran-houthis-idUSKBN16S22R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-iran-houthis-idUSKBN16S22R


Processing URLs:  95%|█████████▍| 946/1000 [44:27<17:31, 19.48s/it]

Error extracting text from http://government.ru/media/files/Csm0ZoYPiDqAzLQLLeAOGFYbJud9Vagc.pdf: HTTPConnectionPool(host='government.ru', port=80): Read timed out. (read timeout=60)
URL filtered: http://www.bloomberg.com/news/articles/2016-11-13/iran-pumps-more-oil-as-saudi-minister-calls-for-opec-output-cuts
URL filtered: https://www.bloomberg.com/news/articles/2017-10-17/how-north-korea-built-an-army-of-cyber-warriors-quicktake-q-a


Processing URLs:  95%|█████████▌| 950/1000 [44:29<06:08,  7.38s/it]

Error extracting text from http://abcnews.go.com/Business/wireStory/data-years-biggest-ipo-prices-estimates-34495304: 404 Client Error: Not Found for url: https://abcnews.go.com/Business/wireStory/data-years-biggest-ipo-prices-estimates-34495304


Processing URLs:  95%|█████████▌| 953/1000 [44:34<03:13,  4.11s/it]

Error extracting text from https://maps.everytownresearch.org/gunfire-in-school/: 403 Client Error: Forbidden for url: https://everytownresearch.org/maps/gunfire-in-school/


Processing URLs:  95%|█████████▌| 954/1000 [44:35<02:30,  3.28s/it]

Error extracting text from https://www.projekt-u5.de/en/construction-procedure/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  96%|█████████▌| 956/1000 [44:38<01:39,  2.26s/it]

Error extracting text from http://www.greaterkashmir.com/news/front-page/kashmir-shuts-against-pm-s-visit/245398.html: 403 Client Error: Forbidden for url: https://www.greaterkashmir.com/news/front-page/kashmir-shuts-against-pm-s-visit/245398.html
Error extracting text from http://www.reuters.com/article/us-tesla-offering-idUSKBN1AN13I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-offering-idUSKBN1AN13I


Processing URLs:  96%|█████████▋| 963/1000 [44:49<01:14,  2.01s/it]

URL filtered: http://www.chicagotribune.com/news/nationworld/politics/ct-russian-facebook-ads-20171101-story.html
URL filtered: http://www.bloomberg.com/news/articles/2016-04-22/canada-inflation-slows-for-2nd-month-on-cheaper-cars-and-gas


Processing URLs:  97%|█████████▋| 967/1000 [44:52<00:38,  1.15s/it]

Error extracting text from http://tass.com/world/934839: 502 Server Error: Bad Gateway for url: https://tass.com/world/934839


Processing URLs:  97%|█████████▋| 969/1000 [44:56<00:45,  1.48s/it]

URL filtered: https://podcasts.apple.com/cr/podcast/bonus-understanding-facebook-oversight-board-decision/id1460055316?i=1000520470444&amp;l=en


Processing URLs:  97%|█████████▋| 971/1000 [44:57<00:35,  1.23s/it]

Error extracting text from http://news.az/articles/iran/107031: 404 Client Error: Not Found for url: https://news.az/articles/iran/107031


Processing URLs:  98%|█████████▊| 975/1000 [45:07<01:05,  2.63s/it]

Error extracting text from http://www.ansamed.info/ansamed/en/news/sections/politics/2016/02/17/montenegro-to-join-nato-in-2017_eee4d6f9-19c8-46a9-8c22-1d9a6d155ef2.html: 404 Client Error: Not Found for url: https://www.ansa.it/ansamed/en/news/sections/politics/2016/02/17/montenegro-to-join-nato-in-2017_eee4d6f9-19c8-46a9-8c22-1d9a6d155ef2.html
URL filtered: https://twitter.com/338Canada/status/1430334963514216448/photo/1


Processing URLs:  98%|█████████▊| 978/1000 [45:14<00:52,  2.37s/it]

Error extracting text from http://thehill.com/homenews/campaign/364132-booker-patrick-stump-for-jones-ahead-of-alabama-election: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/364132-booker-patrick-stump-for-jones-ahead-of-alabama-election/


Processing URLs:  98%|█████████▊| 979/1000 [45:15<00:45,  2.17s/it]

Error extracting text from http://www.newtimes.co.rw/section/article/2016-03-31/198548/: 404 Client Error: Not Found for url: https://www.newtimes.co.rw/article/128580/Latest%20News/government-demands-explanation-over-bihozagaras-death


Processing URLs:  98%|█████████▊| 982/1000 [45:23<00:43,  2.42s/it]

Error extracting text from http://ncr-iran.org/en/news/human-rights/20717-iran-regime-hangs-18-people-over-the-weekend: 403 Client Error: Forbidden for url: https://ncr-iran.org/en/news/human-rights/20717-iran-regime-hangs-18-people-over-the-weekend


Processing URLs:  98%|█████████▊| 983/1000 [45:25<00:35,  2.09s/it]

Error extracting text from http://www.naval-technology.com/news/newsrussia-to-conduct-joint-exercises-with-china-india-and-egypt-in-2016-4791153: 404 Client Error: Not Found for url: https://www.naval-technology.com/news/newsrussia-to-conduct-joint-exercises-with-china-india-and-egypt-in-2016-4791153


Processing URLs:  98%|█████████▊| 985/1000 [45:28<00:27,  1.80s/it]

Error extracting text from http://syriadirect.org/news/east-ghouta-loses-its-breadbasket-%E2%80%98the-regime-exploited-the-ongoing-infighting%E2%80%99/: 404 Client Error: Not Found for url: http://syriadirect.org/news/east-ghouta-loses-its-breadbasket-%E2%80%98the-regime-exploited-the-ongoing-infighting%E2%80%99/


Processing URLs:  99%|█████████▉| 992/1000 [45:38<00:12,  1.61s/it]

Error extracting text from http://www.oddschecker.com/politics/british-politics: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/british-politics


Processing URLs:  99%|█████████▉| 993/1000 [45:39<00:09,  1.35s/it]

URL filtered: https://twitter.com/jbloom_lab/status/1407445638636208130


Processing URLs: 100%|█████████▉| 999/1000 [45:43<00:00,  1.39it/s]

Error extracting text from http://www.reuters.com/subjects/volkswagen-scandal: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/subjects/volkswagen-scandal
Error extracting text from http://www.reuters.com/article/us-usa-burundi-sanctions-idUSKCN0YO2M5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-burundi-sanctions-idUSKCN0YO2M5


Processing URLs: 100%|██████████| 1000/1000 [45:44<00:00,  2.74s/it]
Processing URLs:   0%|          | 2/1000 [00:03<32:32,  1.96s/it]

Error extracting text from http://tpu.ru/en/structure/labs/ipe/: 404 Client Error: Not Found for url: https://tpu.ru:443/en/structure/labs/ipe/


Processing URLs:   1%|          | 7/1000 [00:09<15:23,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-greens-idUSKBN15L13E?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-greens-idUSKBN15L13E?il=0
Error extracting text from http://www.nytimes.com/2016/08/26/technology/apple-software-vulnerability-ios-patch.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/26/technology/apple-software-vulnerability-ios-patch.html


Processing URLs:   1%|          | 8/1000 [00:09<11:37,  1.42it/s]

Error extracting text from http://www.vanguardngr.com/2016/05/639043/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/05/639043/


Processing URLs:   1%|          | 9/1000 [00:11<16:55,  1.02s/it]

Error extracting text from http://en.abna24.com/news/middle-east/un-special-envoy-for-yemen-arrives-in-tehran-to-discuss-yemeni-conflict_847709.html: 404 Client Error: Not Found for url: https://en.abna24.com/news/middle-east/un-special-envoy-for-yemen-arrives-in-tehran-to-discuss-yemeni-conflict_847709.html


Processing URLs:   1%|          | 11/1000 [00:13<16:43,  1.01s/it]

Error extracting text from https://www.dogpile.com/serp?q=spannband: 403 Client Error: Forbidden for url: https://www.dogpile.com/captcha?url=https%3A%2F%2Fwww.dogpile.com%2Fserp%3Fq%3Dspannband


Processing URLs:   1%|          | 12/1000 [00:15<23:15,  1.41s/it]

Error extracting text from http://huff.to/1N6twhj: HTTPConnectionPool(host='huff.to', port=80): Max retries exceeded with url: /1N6twhj (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fb774950>: Failed to resolve 'huff.to' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   1%|▏         | 13/1000 [00:16<21:04,  1.28s/it]

Error extracting text from http://www.thewhatandthewhy.com/catalonia-spains-internal-affair/: 404 Client Error: Not Found for url: https://www.thewhatandthewhy.com/catalonia-spains-internal-affair/
URL filtered: http://www.bloomberg.com/politics/articles/2015-12-14/house-democrats-said-to-be-open-to-lifting-u-s-oil-export-ban-ii6dytba


Processing URLs:   2%|▏         | 15/1000 [00:16<12:24,  1.32it/s]

Error extracting text from http://www.wsj.com/articles/european-parliament-votes-to-suspend-talks-on-turkey-joining-the-eu-1479989641: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/european-parliament-votes-to-suspend-talks-on-turkey-joining-the-eu-1479989641


Processing URLs:   2%|▏         | 19/1000 [00:21<17:03,  1.04s/it]

Error extracting text from http://www.sunstar.com.ph/manila/local-news/2016/10/12/duterte-no-more-us-ph-balikatan-exercises-2017-503155: 404 Client Error: Not Found for url: https://www.sunstar.com.ph/manila/local-news/2016/10/12/duterte-no-more-us-ph-balikatan-exercises-2017-503155


Processing URLs:   2%|▏         | 23/1000 [00:35<34:20,  2.11s/it]

Error extracting text from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/cia-and-the-vietnam-policymakers-three-epis: 403 Client Error: Forbidden for url: https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/cia-and-the-vietnam-policymakers-three-epis


Processing URLs:   2%|▎         | 25/1000 [00:36<22:32,  1.39s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/290626-research-critical-infrastructure-easy-to-hack-a-little-slow-to-patch: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/290626-research-critical-infrastructure-easy-to-hack-a-little-slow-to-patch/


Processing URLs:   3%|▎         | 26/1000 [01:36<5:06:34, 18.89s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2022-04-01/yemen-govt-to-help-with-release-of-prisoners-open-sanaa-airport-in-truce-moves: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   3%|▎         | 29/1000 [01:46<2:22:38,  8.81s/it]

Error extracting text from http://www.acleddata.com/asia-data/: 404 Client Error: Not Found for url: https://acleddata.com/asia-data/


Processing URLs:   3%|▎         | 30/1000 [01:47<1:42:36,  6.35s/it]

URL filtered: https://www.bloomberg.com/?sref=i2Bc5OtW


Processing URLs:   4%|▎         | 35/1000 [02:10<1:17:35,  4.82s/it]

Error extracting text from https://www.porttechnology.org/news/panama_canal_expansion_april_update: 403 Client Error: Forbidden for url: https://www.porttechnology.org/news/panama_canal_expansion_april_update
URL filtered: http://www.bloomberg.com/news/articles/2016-08-10/iran-expects-25-billion-oil-contracts-signed-within-two-years


Processing URLs:   4%|▍         | 39/1000 [02:18<46:56,  2.93s/it]  

Error extracting text from http://www.newsweek.com/kurds-aim-take-mosul-isis-490245: 403 Client Error: Forbidden for url: https://www.newsweek.com/kurds-aim-take-mosul-isis-490245


Processing URLs:   4%|▍         | 42/1000 [02:24<37:29,  2.35s/it]

URL filtered: https://twitter.com/elonmusk/status/1083575233423003648
Error extracting text from http://www.nasdaq.com/article/equinix-newmont-mining-halcn-resources-seadrill-partners-and-cobalt-international-energy-highlighted-as-zacks-bull-and-bear-of-the-day-cm519657#ixzz3lpaFq4Uw: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/equinix-newmont-mining-halcn-resources-seadrill-partners-and-cobalt-international-energy-highlighted-as-zacks-bull-and-bear-of-the-day-cm519657#ixzz3lpaFq4Uw


Processing URLs:   5%|▍         | 47/1000 [02:25<13:07,  1.21it/s]

Error extracting text from https://www.senate.gov/legislative/nominations/SupremeCourtNominations1789present.htm: 403 Client Error: Forbidden for url: https://www.senate.gov/legislative/nominations/SupremeCourtNominations1789present.htm
URL filtered: https://twitter.com/ABC?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor
Error extracting text from https://www.reuters.com/article/us-myanmar-politics-new/eleven-killed-as-myanmar-protesters-fight-troops-with-handmade-guns-firebombs-media-idUSKBN2BV1RO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-new/eleven-killed-as-myanmar-protesters-fight-troops-with-handmade-guns-firebombs-media-idUSKBN2BV1RO


Processing URLs:   5%|▍         | 48/1000 [02:26<14:03,  1.13it/s]

Error extracting text from http://www.crisisgroup.org/en/publication-type/crisiswatch/crisiswatch-database.aspx?CountryIDs=%7bC076CDFE-2B2D-4642-8895-5EF27AE4E416%7d#results: 404 Client Error: Not Found for url: https://www.crisisgroup.org/en/publication-type/crisiswatch/crisiswatch-database.aspx?CountryIDs=%7BC076CDFE-2B2D-4642-8895-5EF27AE4E416%7D#results


Processing URLs:   5%|▌         | 50/1000 [02:28<13:11,  1.20it/s]

Error extracting text from http://www.asean.org/news/item/asean-framework-for-regional-comprehensive-economic-partnership: 403 Client Error: Forbidden for url: http://www.asean.org/news/item/asean-framework-for-regional-comprehensive-economic-partnership


Processing URLs:   6%|▌         | 55/1000 [02:37<21:14,  1.35s/it]

URL filtered: https://twitter.com/StephenClark1/status/704690601074229249


Processing URLs:   6%|▌         | 57/1000 [02:39<19:22,  1.23s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_120534.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_120534.htm


Processing URLs:   6%|▋         | 63/1000 [02:47<20:16,  1.30s/it]

Error extracting text from https://www.nytimes.com/2016/09/29/technology/yahoo-data-breach-hacking.html?action=click&amp;contentCollection=Technology&amp;module=RelatedCoverage&amp;region=Marginalia&amp;pgtype: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/09/29/technology/yahoo-data-breach-hacking.html?action=click&amp;contentCollection=Technology&amp;module=RelatedCoverage&amp;region=Marginalia&amp;pgtype


Processing URLs:   6%|▋         | 64/1000 [02:48<15:56,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/05/04/us/politics/house-russia-investigation-comey-rogers.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/04/us/politics/house-russia-investigation-comey-rogers.html?_r=1


Processing URLs:   7%|▋         | 71/1000 [03:00<22:03,  1.42s/it]

Error extracting text from http://www.13wmaz.com/news/nation/pentagon-ramps-up-cyberwar-against-islamic-state/65012288: 404 Client Error: Not Found for url: https://www.13wmaz.com/news/nation/pentagon-ramps-up-cyberwar-against-islamic-state/65012288


Processing URLs:   7%|▋         | 73/1000 [03:03<22:23,  1.45s/it]

Error extracting text from http://www.latimes.com/business/la-fi-export-import-bank-20150929-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-export-import-bank-20150929-story.html
URL filtered: https://twitter.com/LeaveEUOfficial/status/720602933117763584


Processing URLs:   8%|▊         | 75/1000 [03:04<14:20,  1.08it/s]

Error extracting text from http://www.wsj.com/articles/eu-leaders-try-to-reach-deal-on-u-k-relationship-with-bloc-at-key-summit-1455798088: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-leaders-try-to-reach-deal-on-u-k-relationship-with-bloc-at-key-summit-1455798088


Processing URLs:   8%|▊         | 81/1000 [03:11<19:28,  1.27s/it]

Error extracting text from https://www.eisa.org/pubChad.php: 404 Client Error: Not Found for url: https://www.eisa.org/pubChad.php
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1351JS?feedType=RSS&amp;feedName=worldNews&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Reuters%2FworldNews+(Reuters+World+News): 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1351JS?feedType=RSS&amp;feedName=worldNews&amp;utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Reuters%2FworldNews+(Reuters+World+News)


Processing URLs:   9%|▉         | 90/1000 [03:28<24:49,  1.64s/it]

Error extracting text from https://www.justsecurity.org/43270/trump-campaign-collude-russia-defeat-republican-opponents-gop-primary: 403 Client Error: Forbidden for url: https://www.justsecurity.org/43270/trump-campaign-collude-russia-defeat-republican-opponents-gop-primary


Processing URLs:   9%|▉         | 93/1000 [03:35<27:15,  1.80s/it]

Error extracting text from https://www.nytimes.com/2017/08/19/technology/farhads-and-mikes-week-in-tech-apples-big-media-bet-and-more-uber-drama.html?emc=edit_th_20170820&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/19/technology/farhads-and-mikes-week-in-tech-apples-big-media-bet-and-more-uber-drama.html?emc=edit_th_20170820&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:   9%|▉         | 94/1000 [03:35<20:45,  1.37s/it]

Error extracting text from http://en.vietnamplus.vn/vietnam-condemns-inhuman-treatment-of-fishermen-spokesman/85562.vnp: HTTPSConnectionPool(host='en.vietnamplus.vn', port=443): Max retries exceeded with url: /vietnam-condemns-inhuman-treatment-of-fishermen-spokesman/85562.vnp (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  10%|▉         | 97/1000 [03:36<09:46,  1.54it/s]

URL filtered: https://www.youtube.com/watch?v=jz9_cCWPcls
Error extracting text from https://www.piie.com/blogs/trade-and-investment-policy-watch/anatomy-flop-why-trumps-us-china-phase-one-trade-deal-fell: 403 Client Error: Forbidden for url: https://www.piie.com/blogs/trade-and-investment-policy-watch/anatomy-flop-why-trumps-us-china-phase-one-trade-deal-fell
Error extracting text from https://www.reuters.com/business/chinas-economic-growth-more-than-halve-q2-more-policy-support-seen-2021-07-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/chinas-economic-growth-more-than-halve-q2-more-policy-support-seen-2021-07-13/


Processing URLs:  10%|█         | 101/1000 [04:43<3:54:34, 15.66s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2016-01-13/pentagon-lays-out-plan-to-take-back-mosul-raqqa-from-is: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  10%|█         | 103/1000 [04:44<2:11:39,  8.81s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2018/02/10/0401000000AEN20180210005100315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  11%|█         | 112/1000 [05:04<42:50,  2.89s/it]  

Error extracting text from http://bze.org.au/spain-now-producing-24-hour-solar-power-110708/: 404 Client Error: Not Found for url: https://www.bze.org.au/spain-now-producing-24-hour-solar-power-110708


Processing URLs:  11%|█▏        | 114/1000 [05:05<24:35,  1.67s/it]

Error extracting text from http://www.wsj.com/articles/u-s-agreed-to-north-korea-peace-talks-1456076019: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-agreed-to-north-korea-peace-talks-1456076019


Processing URLs:  12%|█▏        | 115/1000 [05:05<18:08,  1.23s/it]

Error extracting text from https://www.yahoo.com/news/us-threatens-end-negotiations-russia-syria-151650853.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/us-threatens-end-negotiations-russia-syria-151650853.html


Processing URLs:  12%|█▏        | 120/1000 [05:09<11:31,  1.27it/s]

Error extracting text from https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/YsdCAQaHGcjmeBGyy: 403 Client Error: Forbidden for url: https://www.lesswrong.com/s/wKPWFvdMyvgDWfusX/p/YsdCAQaHGcjmeBGyy
Error extracting text from https://www.reuters.com/article/us-eu-google-antitrust-idUSKBN18I1EV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-google-antitrust-idUSKBN18I1EV
Error extracting text from https://www.reuters.com/article/us-northkorea-missiles/north-korea-unveils-monster-new-intercontinental-ballistic-missile-at-parade-idUSKBN26V01K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles/north-korea-unveils-monster-new-intercontinental-ballistic-missile-at-parade-idUSKBN26V01K


Processing URLs:  13%|█▎        | 126/1000 [05:27<34:55,  2.40s/it]

Error extracting text from http://allafrica.com/stories/201605200928.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201605200928.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30787fc80>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  13%|█▎        | 128/1000 [06:28<4:32:22, 18.74s/it]

Error extracting text from https://www.usnews.com/news/articles/2017-02-24/trump-signals-support-for-border-adjustment-tax-plan-pushed-by-house-republicans: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  13%|█▎        | 131/1000 [06:34<1:54:24,  7.90s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VW0TY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VW0TY


Processing URLs:  13%|█▎        | 134/1000 [06:36<46:30,  3.22s/it]  

Error extracting text from http://www.nytimes.com/2015/09/25/business/dealbook/thepotential-criminal-consequences-for-volkswagen.html?ref=dealbook&amp;_r=2: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/25/business/dealbook/thepotential-criminal-consequences-for-volkswagen.html?ref=dealbook&amp;_r=2
URL filtered: https://phys.org/news/2017-01-facebook-fake-news-offensive-germany.html
Error extracting text from http://www.reuters.com/article/us-safrica-politics-idUSKBN19D0LN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics-idUSKBN19D0LN


Processing URLs:  14%|█▍        | 143/1000 [06:51<16:22,  1.15s/it]

Error extracting text from http://www.nytimes.com/2016/04/17/world/middleeast/us-plans-to-step-upmilitary-campaign-against-isis.html?google_editors_picks=true: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/17/world/middleeast/us-plans-to-step-upmilitary-campaign-against-isis.html?google_editors_picks=true


Processing URLs:  14%|█▍        | 145/1000 [07:23<2:22:41, 10.01s/it]

Error extracting text from http://www.todayszaman.com/diplomacy_turkey-slowly-accepts-syrian-migrants-amid-un-calls_411878.html: 522 Server Error:  for url: http://www.todayszaman.com/diplomacy_turkey-slowly-accepts-syrian-migrants-amid-un-calls_411878.html


Processing URLs:  15%|█▌        | 150/1000 [08:44<5:18:37, 22.49s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/congress/article163232073.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  16%|█▌        | 160/1000 [09:07<34:03,  2.43s/it]  

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20170222/0905205351.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20170222/0905205351.html


Processing URLs:  16%|█▌        | 162/1000 [09:09<23:02,  1.65s/it]

Error extracting text from http://www.nytimes.com/2016/03/30/world/americas/dilma-rousseff-brazil-governing-coalition.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/30/world/americas/dilma-rousseff-brazil-governing-coalition.html?_r=0


Processing URLs:  17%|█▋        | 166/1000 [09:13<15:36,  1.12s/it]

Error extracting text from http://www.reuters.com/article/us-japan-china-islands-idUSKCN0YV01U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-japan-china-islands-idUSKCN0YV01U


Processing URLs:  17%|█▋        | 170/1000 [09:18<15:56,  1.15s/it]

URL filtered: https://www.youtube.com/watch?v=PfiuIMwOq3s


Processing URLs:  17%|█▋        | 172/1000 [09:18<10:13,  1.35it/s]

Error extracting text from http://www.cdm.me/english/dps-first-nato-invitation-then-we-will-accept-any-date-for-elections: 403 Client Error: Forbidden for url: https://www.cdm.me/english/dps-first-nato-invitation-then-we-will-accept-any-date-for-elections


Processing URLs:  18%|█▊        | 180/1000 [09:27<19:49,  1.45s/it]

Error extracting text from http://www.reuters.com/article/us-bnp-paribas-settlement-sentencing-idUSKBN0NM41K20150501: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-bnp-paribas-settlement-sentencing-idUSKBN0NM41K20150501


Processing URLs:  18%|█▊        | 184/1000 [09:35<25:24,  1.87s/it]

Error extracting text from http://www.iranpolitik.com/2015/12/04/analysis/election-watch-2016-trends-watch-upcoming-iranian-parliamentary-elections/: 404 Client Error: Not Found for url: http://www.iranpolitik.com/2015/12/04/analysis/election-watch-2016-trends-watch-upcoming-iranian-parliamentary-elections/


Processing URLs:  18%|█▊        | 185/1000 [09:37<26:04,  1.92s/it]

Error extracting text from https://www.fcc.gov/news-events/events/2017/11/november-2017-open-commission-meeting: 403 Client Error: Forbidden for url: https://www.fcc.gov/news-events/events/2017/11/november-2017-open-commission-meeting


Processing URLs:  19%|█▉        | 188/1000 [09:40<20:43,  1.53s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trade-nafta-idUSKBN16F2JI?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trade-nafta-idUSKBN16F2JI?il=0


Processing URLs:  20%|██        | 200/1000 [10:02<18:10,  1.36s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-17/putin-s-oil-pact-with-saudis-masks-threat-of-inflaming-syria-war


Processing URLs:  20%|██        | 204/1000 [10:08<20:04,  1.51s/it]

Error extracting text from http://www.nuclearenergyinsider.com/cyber-security/conference-speakers.php: 503 Server Error: Service Temporarily Unavailable for url: https://www.nuclearenergyinsider.com:443/cyber-security/conference-speakers.php


Processing URLs:  21%|██        | 207/1000 [10:13<20:47,  1.57s/it]

Error extracting text from http://www.navytimes.com/story/military/2016/07/11/carter-iraq-announces-preliminary-plan-retake-mosul-isis/86936478/: 404 Client Error: Not Found for url: https://www.navytimes.com/story/military/2016/07/11/carter-iraq-announces-preliminary-plan-retake-mosul-isis/86936478/


Processing URLs:  21%|██        | 210/1000 [10:18<19:41,  1.50s/it]

Error extracting text from http://www.debka.com/article/25634/Syrian-Kurds-clash-with-Turkish-forces-: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25634/Syrian-Kurds-clash-with-Turkish-forces- (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  21%|██▏       | 213/1000 [10:24<23:53,  1.82s/it]

Error extracting text from http://izvestia.ru/news/653183: 403 Client Error: Forbidden for url: https://iz.ru/news/653183


Processing URLs:  22%|██▏       | 221/1000 [10:29<06:09,  2.11it/s]

Error extracting text from http://www.reuters.com/article/us-iran-election-runoff-idUSKCN0W00E8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-runoff-idUSKCN0W00E8
URL filtered: http://www.wired.com/2015/08/how-facebook-m-works/
Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-idUSKCN11D1K0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-idUSKCN11D1K0


Processing URLs:  22%|██▏       | 223/1000 [10:34<15:24,  1.19s/it]

Error extracting text from http://arxiv.org:443/find/all/1/AND+abs:+AND+Allen+Telescope+abs:+AND+OR+red+M+OR+dwarf+dwarves/0/1/0/all/0/1: ('Connection aborted.', BadStatusLine('\x15\x03\x03\x00\x02\x022\x15\x03\x03\x00\x02\x01\x00'))


Processing URLs:  23%|██▎       | 229/1000 [11:47<4:07:18, 19.25s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-02-04/will-peace-with-the-farc-destabilize-colombia: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  23%|██▎       | 234/1000 [11:55<57:36,  4.51s/it]  

URL filtered: https://twitter.com/pnkurunziza


Processing URLs:  24%|██▍       | 239/1000 [12:03<24:42,  1.95s/it]

Error extracting text from http://www.nytimes.com/2015/12/18/world/middleeast/isis-carries-out-first-serious-attack-in-northern-iraq-in-months-us-says.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/18/world/middleeast/isis-carries-out-first-serious-attack-in-northern-iraq-in-months-us-says.html?_r=0


Processing URLs:  24%|██▍       | 242/1000 [12:10<30:01,  2.38s/it]

Error extracting text from http://elections.nytimes.com/2012/results/states: 404 Client Error: Not Found for url: https://www.nytimes.com/elections/2012/results/states.html


Processing URLs:  24%|██▍       | 244/1000 [13:10<2:52:45, 13.71s/it]

Error extracting text from http://5.160.12.80/en/news/2016/07/28/1142663/iraqi-hashd-al-shaabi-to-be-granted-official-status-pm: HTTPConnectionPool(host='5.160.12.80', port=80): Max retries exceeded with url: /en/news/2016/07/28/1142663/iraqi-hashd-al-shaabi-to-be-granted-official-status-pm (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe273620>, 'Connection to 5.160.12.80 timed out. (connect timeout=60)'))
Error extracting text from http://www.nytimes.com/2016/08/25/world/middleeast/turkey-syria-isis.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/25/world/middleeast/turkey-syria-isis.html?_r=0


Processing URLs:  25%|██▍       | 247/1000 [13:14<1:07:12,  5.36s/it]

Error extracting text from https://balkaninsight.com/2021/04/06/north-macedonia-arrests-police-in-crackdown-on-passport-forgers/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/04/06/north-macedonia-arrests-police-in-crackdown-on-passport-forgers/


Processing URLs:  25%|██▍       | 249/1000 [13:15<36:25,  2.91s/it]  

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-makes-nato-operation-mandatory-for-its-troops-02-02-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-makes-nato-operation-mandatory-for-its-troops-02-02-2016


Processing URLs:  25%|██▌       | 252/1000 [13:18<20:26,  1.64s/it]

Error extracting text from http://www.nytimes.com/2016/01/04/world/middleeast/iran-saudi-arabia-execution-sheikh-nimr.html?ref=world&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/04/world/middleeast/iran-saudi-arabia-execution-sheikh-nimr.html?ref=world&amp;_r=0


Processing URLs:  25%|██▌       | 253/1000 [13:19<15:05,  1.21s/it]

Error extracting text from https://www.wsj.com/articles/ukraine-mounts-counteroffensive-to-drive-russians-back-from-kyiv-key-cities-11647428858: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/ukraine-mounts-counteroffensive-to-drive-russians-back-from-kyiv-key-cities-11647428858


Processing URLs:  26%|██▌       | 261/1000 [13:29<14:36,  1.19s/it]

Error extracting text from http://www.nytimes.com/2016/03/10/world/asia/google-alphago-lee-se-dol.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/10/world/asia/google-alphago-lee-se-dol.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=photo-spot-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  27%|██▋       | 266/1000 [13:37<17:38,  1.44s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/nato-russia-diplomats-rare-talks-ease-tensions-44278250: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/nato-russia-diplomats-rare-talks-ease-tensions-44278250


Processing URLs:  27%|██▋       | 273/1000 [13:54<28:36,  2.36s/it]

Error extracting text from https://www.argusmedia.com/News/Article/?id=1178560: 404 Client Error: Not Found for url: https://www.argusmedia.com/not-found


Processing URLs:  28%|██▊       | 277/1000 [13:59<16:16,  1.35s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-iaea-idUSKCN0UR1NH20160113: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-iaea-idUSKCN0UR1NH20160113


Processing URLs:  28%|██▊       | 278/1000 [14:04<30:07,  2.50s/it]

Error extracting text from https://www.reuters.com/article/us-afghanistan-election/afghanistan-parliament-elections-likely-delayed-until-october-idUSKBN1FO0BZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-election/afghanistan-parliament-elections-likely-delayed-until-october-idUSKBN1FO0BZ?il=0


Processing URLs:  29%|██▊       | 286/1000 [14:30<56:51,  4.78s/it]

Error extracting text from https://www.duke-energy.com/power-plants/nuclear.asp: HTTPSConnectionPool(host='www.duke-energy.com', port=443): Max retries exceeded with url: /power-plants/nuclear.asp (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300a5ecc0>: Failed to resolve 'www.duke-energy.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  29%|██▊       | 287/1000 [14:30<42:48,  3.60s/it]

URL filtered: https://twitter.com/Visa


Processing URLs:  29%|██▉       | 289/1000 [14:31<25:58,  2.19s/it]

Error extracting text from http://blogs.piie.com/realtime/?p=5373: HTTPConnectionPool(host='blogs.piie.com', port=80): Max retries exceeded with url: /realtime/?p=5373 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5e360>: Failed to resolve 'blogs.piie.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/news/articles/2016-08-31/japan-s-biggest-banks-said-to-sign-agreement-with-saudi-aramco


Processing URLs:  29%|██▉       | 294/1000 [14:35<14:42,  1.25s/it]

Error extracting text from http://thevillagessuntimes.com/2015/12/27/south-china-sea-us-bomber-angers-beijing-with-spratly/: 403 Client Error: Forbidden for url: http://thevillagessuntimes.com/2015/12/27/south-china-sea-us-bomber-angers-beijing-with-spratly/


Processing URLs:  30%|██▉       | 296/1000 [14:38<16:52,  1.44s/it]

Error extracting text from https://radiotamazuj.org/en/article/splm-n-official-reports-polio-cases-sudan%E2%80%99s-blue-nile: 404 Client Error: Not Found for url: https://radiotamazuj.org/en/article/splm-n-official-reports-polio-cases-sudan%E2%80%99s-blue-nile


Processing URLs:  30%|██▉       | 298/1000 [14:39<11:03,  1.06it/s]

Error extracting text from http://ir.sparktx.com/phoenix.zhtml?c=253900&amp;p=irol-newsArticle&amp;ID=2219135: 403 Client Error: Forbidden for url: http://ir.sparktx.com/phoenix.zhtml?c=253900&amp;p=irol-newsArticle&amp;ID=2219135


Processing URLs:  30%|██▉       | 299/1000 [14:40<10:02,  1.16it/s]

Error extracting text from http://www.talk-finance.co.uk/international/eu-and-uk-negotiations-are-suspended/: 520 Server Error:  for url: https://www.talk-finance.co.uk/international/eu-and-uk-negotiations-are-suspended/


Processing URLs:  30%|███       | 300/1000 [14:40<09:17,  1.26it/s]

URL filtered: https://twitter.com/pontemnetwork?lang=en


Processing URLs:  30%|███       | 303/1000 [14:42<07:02,  1.65it/s]

Error extracting text from http://www.wsj.com/articles/european-border-crackdown-kick-starts-migrant-smuggling-business-1459260153: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/european-border-crackdown-kick-starts-migrant-smuggling-business-1459260153


Processing URLs:  30%|███       | 305/1000 [14:44<08:01,  1.44it/s]

Error extracting text from http://www.rfi.fr/ameriques/20160413-haiti-partisans-martelly-phtk-manifestent-demander-elections: 403 Client Error: Forbidden for url: http://www.rfi.fr/ameriques/20160413-haiti-partisans-martelly-phtk-manifestent-demander-elections


Processing URLs:  31%|███       | 306/1000 [14:46<12:24,  1.07s/it]

URL filtered: http://www.reuters.com/article/us-germany-facebook-fake-idUSKBN1470CN


Processing URLs:  32%|███▏      | 315/1000 [15:03<27:49,  2.44s/it]

Error extracting text from http://www.stltoday.com/news/ap-interview-nato-expands-migrant-mission-in-aegean-sea/article_1c56e8a9-b6f3-5aa9-b16d-5720cab23d6e.html: 404 Client Error: Not Found for url: https://www.stltoday.com/news/ap-interview-nato-expands-migrant-mission-in-aegean-sea/article_1c56e8a9-b6f3-5aa9-b16d-5720cab23d6e.html


Processing URLs:  32%|███▏      | 316/1000 [15:04<21:37,  1.90s/it]

Error extracting text from http://freenews.xyz/2016/01/27/the-ministry-of-energy-of-turkey-is-counting-on-the-continuation-of-joint-projects-with-russia/: HTTPConnectionPool(host='freenews.xyz', port=80): Max retries exceeded with url: /2016/01/27/the-ministry-of-energy-of-turkey-is-counting-on-the-continuation-of-joint-projects-with-russia/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3070890a0>: Failed to resolve 'freenews.xyz' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  32%|███▏      | 321/1000 [15:07<09:26,  1.20it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/dec/18/hillary-clinton-cuts-bernie-sanders-lead-nh-poll/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/dec/18/hillary-clinton-cuts-bernie-sanders-lead-nh-poll/
Error extracting text from https://ark-invest.com/research/podcast/elon-musk-podcast: 403 Client Error: Forbidden for url: https://ark-invest.com/research/podcast/elon-musk-podcast


Processing URLs:  33%|███▎      | 329/1000 [15:22<22:58,  2.05s/it]

Error extracting text from http://www.timesofisrael.com/pas-collapse-now-seems-a-matter-of-when-not-if/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/pas-collapse-now-seems-a-matter-of-when-not-if/


Processing URLs:  34%|███▍      | 338/1000 [15:36<17:55,  1.62s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/01/09/0200000000AEN20160109003000315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: https://www.bloomberg.com/news/articles/2019-04-24/ukraine-to-cut-gas-prices-in-move-that-may-strain-imf-relations


Processing URLs:  34%|███▍      | 341/1000 [15:37<08:54,  1.23it/s]

Error extracting text from http://www.reuters.com/article/2015/09/19/us-mideastcrisis-kerrytalks-idUSKCN0RJ0FX20150919: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/19/us-mideastcrisis-kerrytalks-idUSKCN0RJ0FX20150919
Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-20-march-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-20-march-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307b2ffe0>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  35%|███▍      | 346/1000 [15:45<14:51,  1.36s/it]

Error extracting text from http://sites.stat.psu.edu/~mga/401/tables/Chi-square: 503 Server Error: Service Temporarily Unavailable for url: http://sites.stat.psu.edu/~mga/401/tables/Chi-square


Processing URLs:  35%|███▍      | 347/1000 [15:46<14:51,  1.36s/it]

Error extracting text from http://taskandpurpose.com/deploying-apaches-mosul-comes-real-risk-ground-fight/: 404 Client Error: Not Found for url: https://taskandpurpose.com/deploying-apaches-mosul-comes-real-risk-ground-fight/
URL filtered: https://www.bloomberg.com/news/articles/2016-12-19/goldman-warns-china-outflows-rising-in-both-yuan-payments-forex


Processing URLs:  35%|███▍      | 349/1000 [15:48<11:15,  1.04s/it]

Error extracting text from http://bigstory.ap.org/article/671bdbe15b134c73bb0f021218fd3302/ap-exclusive-iraq-oil-fires-could-jeopardize-mosul-mission: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/671bdbe15b134c73bb0f021218fd3302/ap-exclusive-iraq-oil-fires-could-jeopardize-mosul-mission (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307b2d430>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/elonmusk/status/1256239815256797184


Processing URLs:  36%|███▌      | 355/1000 [15:49<04:59,  2.16it/s]

Error extracting text from http://www.reuters.com/article/idUSKCN0VD07T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VD07T
Error extracting text from http://www.nytimes.com/2015/09/07/opinion/you-deserve-a-raise-today-interest-rates-dont.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/07/opinion/you-deserve-a-raise-today-interest-rates-dont.html


Processing URLs:  36%|███▌      | 360/1000 [16:00<16:47,  1.57s/it]

Error extracting text from http://nyti.ms/1L68dYw: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/08/world/middleeast/russia-syria-conflict.html?smid=pl-share


Processing URLs:  36%|███▌      | 361/1000 [16:01<16:07,  1.51s/it]

URL filtered: https://www.bloomberg.com/view/articles/2016-09-14/how-superforecasters-think-about-the-future


Processing URLs:  36%|███▋      | 364/1000 [16:03<11:21,  1.07s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/opinions_123909.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/opinions_123909.htm


Processing URLs:  37%|███▋      | 370/1000 [16:14<17:35,  1.67s/it]

Error extracting text from https://www.lesswrong.com/posts/AHTRyQJtiRin22kth/the-darwin-game-1: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/AHTRyQJtiRin22kth/the-darwin-game-1


Processing URLs:  37%|███▋      | 374/1000 [16:35<1:04:51,  6.22s/it]

Error extracting text from http://bigstory.ap.org/article/7e7e7d7bbebf460dafc2ec8e997a2271/apnewsbreak-us-sees-assad-staying-syria-until-march-2017: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/7e7e7d7bbebf460dafc2ec8e997a2271/apnewsbreak-us-sees-assad-staying-syria-until-march-2017 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fc8b7470>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 380/1000 [16:40<16:55,  1.64s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-30/central-banks-find-multiple-ways-to-tighten-in-investor-test


Processing URLs:  38%|███▊      | 383/1000 [16:43<12:42,  1.24s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-09/shale-drillers-challenging-opec-with-84-billion-spending-spree


Processing URLs:  39%|███▉      | 388/1000 [16:46<08:12,  1.24it/s]

Error extracting text from http://thehill.com/blogs/pundits-blog/finance/262843-two-cheers-for-the-imfs-christine-lagarde: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/finance/262843-two-cheers-for-the-imfs-christine-lagarde/


Processing URLs:  39%|███▉      | 389/1000 [16:47<07:54,  1.29it/s]

Error extracting text from http://www.nato.int/cps/en/natohq/opinions_125358.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/opinions_125358.htm


Processing URLs:  39%|███▉      | 391/1000 [17:48<2:58:40, 17.60s/it]

Error extracting text from https://sports.ladbrokes.com/en-gb/betting/politics/world-politics/world-politics/next-un-secretary-general/221919402/: HTTPSConnectionPool(host='sports.ladbrokes.com', port=443): Max retries exceeded with url: /en-gb/betting/politics/world-politics/world-politics/next-un-secretary-general/221919402/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fc8b6ae0>, 'Connection to sports.ladbrokes.com timed out. (connect timeout=60)'))


Processing URLs:  39%|███▉      | 392/1000 [17:48<2:08:49, 12.71s/it]

Error extracting text from http://mobile.nytimes.com/2016/03/27/world/europe/migrants-in-greece-ready-to-go-anywhere-in-europe-scramble-to-enter-eu-relocation-program.html?_r=0&amp;referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/03/27/world/europe/migrants-in-greece-ready-to-go-anywhere-in-europe-scramble-to-enter-eu-relocation-program.html?_r=0&amp;referer=https://www.google.com/
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O10CC66KLVR901-4H6R9N958BO7UOKV6IOATI8IGS


Processing URLs:  40%|███▉      | 397/1000 [17:53<36:44,  3.66s/it]

Error extracting text from https://warisboring.com/shia-militias-keep-undermining-the-campaign-for-mosul-d1aa5f92e0fe?mc_cid=6b032ca10c&amp;mc_eid=0467f21653#.nrmmx0x4e: 403 Client Error: Forbidden for url: https://warisboring.com/shia-militias-keep-undermining-the-campaign-for-mosul-d1aa5f92e0fe?mc_cid=6b032ca10c&amp;mc_eid=0467f21653#.nrmmx0x4e


Processing URLs:  40%|███▉      | 399/1000 [18:00<36:50,  3.68s/it]

URL filtered: https://www.youtube.com/watch?v=wsixsRI-Sz4


Processing URLs:  40%|████      | 401/1000 [18:16<57:13,  5.73s/it]

Error extracting text from https://www.investopedia.com/terms/n/negative-watch.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/n/negative-watch.asp
Error extracting text from https://www.reuters.com/world/uk/scottish-nationalists-unlikely-win-majority-poll-indicates-2021-05-05/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/scottish-nationalists-unlikely-win-majority-poll-indicates-2021-05-05/


Processing URLs:  40%|████      | 404/1000 [18:20<33:17,  3.35s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/suu-kyi-s-surprise/2421824.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/suu-kyi-s-surprise/2421824.html


Processing URLs:  41%|████      | 406/1000 [18:24<26:19,  2.66s/it]

Error extracting text from http://www.reuters.com/article/us-southsudan-security-un-idUSKBN14A1PS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southsudan-security-un-idUSKBN14A1PS


Processing URLs:  41%|████▏     | 413/1000 [18:36<19:27,  1.99s/it]

URL filtered: https://www.siliconrepublic.com/companies/google-crosscheck-facebook-france


Processing URLs:  42%|████▏     | 415/1000 [18:39<17:21,  1.78s/it]

Error extracting text from http://www.reuters.com/article/2015/10/27/us-mideast-crisis-syria-statement-idUSKCN0SL1AZ20151027: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/27/us-mideast-crisis-syria-statement-idUSKCN0SL1AZ20151027


Processing URLs:  42%|████▏     | 417/1000 [18:40<11:42,  1.20s/it]

Error extracting text from https://www.autoweek.com/news/technology/a36354137/beijing-olympics-robotaxi-china/: 403 Client Error: Forbidden for url: https://www.autoweek.com/news/technology/a36354137/beijing-olympics-robotaxi-china/


Processing URLs:  42%|████▏     | 419/1000 [18:43<13:25,  1.39s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-economy-idUSKBN1AK01J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-idUSKBN1AK01J


Processing URLs:  42%|████▏     | 421/1000 [18:44<09:35,  1.01it/s]

Error extracting text from http://www.reuters.com/article/2015/10/30/us-northkorea-nuclear-idUSKCN0SO09N20151030#fkPel7IT1siqWzxe.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/30/us-northkorea-nuclear-idUSKCN0SO09N20151030#fkPel7IT1siqWzxe.97


Processing URLs:  43%|████▎     | 429/1000 [18:53<09:08,  1.04it/s]

Error extracting text from https://www.leafscience.org/a-step-closer-to-regenerating-the-aging-thymus/: HTTPSConnectionPool(host='www.leafscience.org', port=443): Max retries exceeded with url: /a-step-closer-to-regenerating-the-aging-thymus/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.leafscience.org'. (_ssl.c:1000)")))


Processing URLs:  43%|████▎     | 431/1000 [18:55<08:08,  1.16it/s]

Error extracting text from http://voiceobserver.com/2016/07/26/germany-syrian-asylum-seeker-blows-himself-up-wounding-15.html: HTTPConnectionPool(host='voiceobserver.com', port=80): Max retries exceeded with url: /2016/07/26/germany-syrian-asylum-seeker-blows-himself-up-wounding-15.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761e780>: Failed to resolve 'voiceobserver.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  43%|████▎     | 432/1000 [18:56<07:51,  1.21it/s]

Error extracting text from http://www.saarc-sec.org/2017/02/01/news/53rd-Session-of-the-Programming-Committee-1-2-February-2017/161/: 404 Client Error: Not Found for url: https://www.saarc-sec.org/2017/02/01/news/53rd-Session-of-the-Programming-Committee-1-2-February-2017/161/


Processing URLs:  44%|████▎     | 436/1000 [19:00<08:37,  1.09it/s]

Error extracting text from http://trade.ec.europa.eu/doclib/docs/2016/july/tradoc_154811.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2016/july/tradoc_154811.pdf


Processing URLs:  44%|████▍     | 438/1000 [19:01<07:29,  1.25it/s]

Error extracting text from https://www.japantimes.co.jp/news/2018/02/17/asia-pacific/politics-diplomacy-asia-pacific/early-inter-korean-summit-moon-says-urging-talks-washington-pyongyang/#.WpGu4ZM-feQ: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2018/02/17/asia-pacific/politics-diplomacy-asia-pacific/early-inter-korean-summit-moon-says-urging-talks-washington-pyongyang/#.WpGu4ZM-feQ


Processing URLs:  44%|████▍     | 440/1000 [19:05<12:20,  1.32s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0T12TG20151112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0T12TG20151112


Processing URLs:  44%|████▍     | 442/1000 [19:11<22:12,  2.39s/it]

Error extracting text from http://www.ifamagazine.com/market-and-economics/bank-of-japan-opts-against-extending-stimulus-323833: 404 Client Error: Not Found for url: https://ifamagazine.com/market-and-economics/bank-of-japan-opts-against-extending-stimulus-323833


Processing URLs:  44%|████▍     | 444/1000 [19:14<16:57,  1.83s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.otempo.com.br/capa/pol%25C3%25ADtica/petistas-dizem-que-lula-aceitou-ser-ministro-de-dilma-1.1257747&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.otempo.com.br/capa/pol%25C3%25ADtica/petistas-dizem-que-lula-aceitou-ser-ministro-de-dilma-1.1257747&amp;prev=search


Processing URLs:  45%|████▍     | 446/1000 [19:16<12:08,  1.32s/it]

Error extracting text from http://www.newsweek.com/us-traffic-deaths-injuries-and-related-costs-2015-363602: 403 Client Error: Forbidden for url: https://www.newsweek.com/us-traffic-deaths-injuries-and-related-costs-2015-363602


Processing URLs:  45%|████▍     | 448/1000 [19:21<15:59,  1.74s/it]

URL filtered: http://www.telegraph.co.uk/technology/social-media/12105631/Twitter-and-YouTube-unblocked-in-Iran-for-some-users-after-sanctions-lifted.html


Processing URLs:  45%|████▌     | 453/1000 [19:24<07:22,  1.24it/s]

Error extracting text from http://www.caam.org.cn/english/newslist/a101-1.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/english/newslist/a101-1.html
Error extracting text from http://www.reuters.com/article/us-yemen-security-un-idUSKBN13O2K1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-un-idUSKBN13O2K1


Processing URLs:  45%|████▌     | 454/1000 [19:24<05:40,  1.60it/s]

Error extracting text from http://www.nytimes.com/2015/09/12/opinion/russias-risky-military-moves-in-syria.html?action=click&amp;pgtype=Homepage&amp;module=opinion-c-col-left-region&amp;region=opinion-c-col-left-region&amp;WT.nav=opinion-c-col-left-region&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/12/opinion/russias-risky-military-moves-in-syria.html?action=click&amp;pgtype=Homepage&amp;module=opinion-c-col-left-region&amp;region=opinion-c-col-left-region&amp;WT.nav=opinion-c-col-left-region&amp;_r=0


Processing URLs:  46%|████▌     | 456/1000 [19:27<09:19,  1.03s/it]

Error extracting text from http://www.mfa.gov.tr/visa-information-for-foreigners.en.mfa: HTTPSConnectionPool(host='www.mfa.gov.tr', port=443): Max retries exceeded with url: /visa-information-for-foreigners.en.mfa (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: https://www.youtube.com/watch?v=wDldUOubh0w


Processing URLs:  46%|████▋     | 464/1000 [19:45<17:53,  2.00s/it]

Error extracting text from https://www.reuters.com/article/britain-eu/moment-may-come-to-end-post-brexit-eu-trade-talks-says-uk-pm-johnson-idUSKBN28I18J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-eu/moment-may-come-to-end-post-brexit-eu-trade-talks-says-uk-pm-johnson-idUSKBN28I18J


Processing URLs:  47%|████▋     | 468/1000 [19:50<10:26,  1.18s/it]

Error extracting text from http://www.business-standard.com/article/pti-stories/rcep-may-turn-into-a-reality-soon-wadhwa-116021700424_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/pti-stories/rcep-may-turn-into-a-reality-soon-wadhwa-116021700424_1.html
Error extracting text from https://www.reuters.com/article/us-usa-security-kaspersky-russia/kremlin-says-allegations-against-kaspersky-lab-are-absurd-idUSKBN1CH1DX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-security-kaspersky-russia/kremlin-says-allegations-against-kaspersky-lab-are-absurd-idUSKBN1CH1DX


Processing URLs:  47%|████▋     | 472/1000 [19:57<14:24,  1.64s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-vaccines-attitudes/exclusive-international-covid-19-vaccine-poll-shows-higher-mistrust-of-russia-china-shots-idUSKBN29K16T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-vaccines-attitudes/exclusive-international-covid-19-vaccine-poll-shows-higher-mistrust-of-russia-china-shots-idUSKBN29K16T


Processing URLs:  47%|████▋     | 474/1000 [19:58<08:56,  1.02s/it]

Error extracting text from https://www.predictit.org/Market/2177/Who-will-win-the-Peruvian-presidential-runoff-election: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/2177/Who-will-win-the-Peruvian-presidential-runoff-election
URL filtered: https://www.bloomberg.com/features/elon-musk-goals/


Processing URLs:  48%|████▊     | 483/1000 [20:19<33:38,  3.90s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu-barnier/eus-barnier-no-agreement-yet-on-brexit-bill-irish-border-idUSKBN1DT0YL?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-barnier/eus-barnier-no-agreement-yet-on-brexit-bill-irish-border-idUSKBN1DT0YL?il=0


Processing URLs:  49%|████▊     | 486/1000 [20:21<15:55,  1.86s/it]

Error extracting text from http://www.debka.com/article/25262/Fundamentalists-and-Revolutionary-Guards-steal-Iran%E2%80%99s-elections: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25262/Fundamentalists-and-Revolutionary-Guards-steal-Iran%E2%80%99s-elections (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
Error extracting text from http://www.reuters.com/article/venezuela-bonds-idUSL2N1540UK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-bonds-idUSL2N1540UK


Processing URLs:  49%|████▉     | 491/1000 [20:32<15:43,  1.85s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN1061T2?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN1061T2?il=0


Processing URLs:  49%|████▉     | 494/1000 [20:37<16:13,  1.92s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-14/syrian-transition-plan-achieved-by-u-s-allies-kerry-says


Processing URLs:  50%|████▉     | 496/1000 [20:39<12:44,  1.52s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-18/fed-inserted-language-to-stress-potential-for-december-liftoff


Processing URLs:  50%|█████     | 502/1000 [20:51<13:37,  1.64s/it]

URL filtered: https://twitter.com/NicolaSturgeon/status/854285958535098369
URL filtered: https://www.facebook.com/MorningJoe/videos/10153833195278762/?fref=nf
URL filtered: https://www.youtube.com/watch?v=HkkIOLV7F54


Processing URLs:  51%|█████     | 507/1000 [21:00<16:58,  2.07s/it]

Error extracting text from http://www.khobreganemellat.net/en/MajlesMemberList.html?CategoryItemID=1012: HTTPConnectionPool(host='www.khobreganemellat.net', port=80): Max retries exceeded with url: /en/MajlesMemberList.html?CategoryItemID=1012 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023dfa40>: Failed to resolve 'www.khobreganemellat.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  51%|█████     | 508/1000 [21:10<31:17,  3.82s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-30/turkey-summons-russian-envoy-after-another-airspace-violation


Processing URLs:  51%|█████     | 510/1000 [21:13<24:03,  2.95s/it]

Error extracting text from http://www.thesaurus.com/browse/imprisoned?s=t: 404 Client Error: Not Found for url: https://www.thesaurus.com/browse/imprisoned?s=t


Processing URLs:  51%|█████▏    | 514/1000 [21:18<14:09,  1.75s/it]

Error extracting text from http://www.latimes.com/world/la-fg-south-africa-zuma-noconfidence-20170529-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-south-africa-zuma-noconfidence-20170529-htmlstory.html


Processing URLs:  52%|█████▏    | 516/1000 [22:18<1:51:28, 13.82s/it]

Error extracting text from https://www.betfair.com/sport/politics/scottish-parliament-election/12312083/snp-majority-2021-scottish-parliament-election/924.255883282: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /sport/politics/scottish-parliament-election/12312083/snp-majority-2021-scottish-parliament-election/924.255883282 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30787e210>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))
URL filtered: https://www.youtube.com/watch?v=_MXVolDxy-Q


Processing URLs:  52%|█████▏    | 520/1000 [22:21<45:07,  5.64s/it]

Error extracting text from https://www.afghanistan-analysts.org/working-in-a-grey-zone-icrc-forced-to-scale-back-its-work-in-afghanistan/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/working-in-a-grey-zone-icrc-forced-to-scale-back-its-work-in-afghanistan/
Error extracting text from http://english.alarabiya.net/en/News/middle-east/2017/02/26/Senior-FSA-officer-We-re-ready-for-direct-talks-with-Assad-regime.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2017/02/26/Senior-FSA-officer-We-re-ready-for-direct-talks-with-Assad-regime.html


Processing URLs:  52%|█████▏    | 521/1000 [22:22<34:34,  4.33s/it]

Error extracting text from http://www.newsletter.co.uk/news/platform-my-opposition-bill-could-radically-improve-stormont-1-7004731: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/platform-my-opposition-bill-could-radically-improve-stormont-1-7004731


Processing URLs:  52%|█████▏    | 523/1000 [22:23<19:36,  2.47s/it]

Error extracting text from http://www.visionofhumanity.org: 403 Client Error: Forbidden for url: http://www.visionofhumanity.org/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://noticias.terra.com.br/brasil/politica/lava-jato/camara-recebe-dois-novos-pedidos-de-impeachment-de-dilma,33f2b08d75f42b9b4fef6bb97ddb215dvze6mbnx.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://noticias.terra.com.br/brasil/politica/lava-jato/camara-recebe-dois-novos-pedidos-de-impeachment-de-dilma,33f2b08d75f42b9b4fef6bb97ddb215dvze6mbnx.html&amp;prev=search


Processing URLs:  52%|█████▏    | 524/1000 [22:23<15:28,  1.95s/it]

Error extracting text from http://thehill.com/policy/technology/360977-tech-beefs-up-lobbying-amid-russia-scrutiny: 403 Client Error: Forbidden for url: https://thehill.com/policy/technology/360977-tech-beefs-up-lobbying-amid-russia-scrutiny/


Processing URLs:  52%|█████▎    | 525/1000 [22:24<13:25,  1.70s/it]

Error extracting text from https://www.morningstaronline.co.uk/a-3584-Anti-imperialist-bloc-throws-weight-behind-Venezuela#.WYzX_q3Mz6k: 404 Client Error: Not Found for url: https://www.morningstaronline.co.uk/a-3584-Anti-imperialist-bloc-throws-weight-behind-Venezuela#.WYzX_q3Mz6k


Processing URLs:  53%|█████▎    | 530/1000 [22:31<10:00,  1.28s/it]

Error extracting text from http://www.brookings.edu/research/opinions/2016/02/05-russian-military-modernization-us-response-pifer: 404 Client Error: Not Found for url: https://www.brookings.edu/articles/opinions/2016/02/05-russian-military-modernization-us-response-pifer


Processing URLs:  53%|█████▎    | 533/1000 [22:35<10:10,  1.31s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-04/congo-s-central-bank-moves-to-stabilize-biac-as-it-urges-calm


Processing URLs:  54%|█████▎    | 537/1000 [22:39<09:36,  1.25s/it]

Error extracting text from http://www.usnwc.edu/getattachment/7b0d0f70-bb07-48f2-af0a-7474e92d0bb0/San-Remo-ROE-Handbook.aspx: 404 Client Error: Not Found for url: https://www.usnwc.edu:443/getattachment/7b0d0f70-bb07-48f2-af0a-7474e92d0bb0/San-Remo-ROE-Handbook.aspx


Processing URLs:  54%|█████▍    | 539/1000 [22:43<10:10,  1.32s/it]

Error extracting text from https://www.nytimes.com/2017/02/12/business/media/rupert-murdoch-donald-trump-news-corporation.html?emc=edit_ee_20170213&amp;nl=todaysheadlines-europe&amp;nlid=77825025&amp;_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/12/business/media/rupert-murdoch-donald-trump-news-corporation.html?emc=edit_ee_20170213&amp;nl=todaysheadlines-europe&amp;nlid=77825025&amp;_r=1


Processing URLs:  54%|█████▍    | 542/1000 [22:46<09:33,  1.25s/it]

Error extracting text from http://nanonews.org/major-oil-producers-to-talk-output-freeze-in-qatar-in-april/: 500 Server Error: Internal Server Error for url: https://nanonews.org/major-oil-producers-to-talk-output-freeze-in-qatar-in-april/


Processing URLs:  55%|█████▍    | 545/1000 [22:48<05:08,  1.47it/s]

Error extracting text from http://www.reuters.com/article/2015/10/29/us-volkswagen-emissions-idUSKCN0SN1XC20151029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/29/us-volkswagen-emissions-idUSKCN0SN1XC20151029
Error extracting text from http://www.nytimes.com/2016/03/25/world/middleeast/us-indicts-iranians-in-cyberattacks-on-banks-and-a-dam.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/25/world/middleeast/us-indicts-iranians-in-cyberattacks-on-banks-and-a-dam.html?_r=0


Processing URLs:  55%|█████▌    | 554/1000 [22:59<09:42,  1.31s/it]

Error extracting text from https://upside.com/articles/2016/02/25/alphago-machine-learning.aspx: 404 Client Error: Not Found for url: https://www.upside.com/articles/2016/02/25/alphago-machine-learning.aspx


Processing URLs:  56%|█████▌    | 555/1000 [23:00<09:05,  1.23s/it]

Error extracting text from http://www.france24.com/en/20151213-libya-peace-deal-national-unity-plan-government-foreign-ministers-rome: 403 Client Error: Forbidden for url: http://www.france24.com/en/20151213-libya-peace-deal-national-unity-plan-government-foreign-ministers-rome
URL filtered: https://www.youtube.com/watch?v=CuUtY-7j0EU&amp;t=307s


Processing URLs:  56%|█████▌    | 560/1000 [23:04<07:00,  1.05it/s]

Error extracting text from http://ca.reuters.com/article/topNews/idCAKCN0W21BS?pageNumber=1&amp;virtualBrandChannel=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  56%|█████▌    | 562/1000 [23:10<14:03,  1.93s/it]

Error extracting text from http://vestnikkavkaza.net/video/Mikhail-Lysenko-speaks-about-the-fate-of-the-Akkuyu-nuclear-power-plant.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/video/Mikhail-Lysenko-speaks-about-the-fate-of-the-Akkuyu-nuclear-power-plant.html
Error extracting text from https://www.nasdaq.com/market-activity/stocks/fb/earnings: 403 Client Error: Forbidden for url: https://www.nasdaq.com/market-activity/stocks/fb/earnings


Processing URLs:  57%|█████▋    | 566/1000 [23:14<10:11,  1.41s/it]

Error extracting text from http://www.gallup.com/businessjournal/184748/lure-government-jobs-saudis.aspx: 404 Client Error: Not Found for url: https://www.gallup.com/businessjournal/184748/lure-government-jobs-saudis.aspx
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O2MBFZ6JIJUX01-6EVR3H9BAO55U6SK7J52QUGG2P


Processing URLs:  57%|█████▋    | 569/1000 [23:17<08:27,  1.18s/it]



Processing URLs:  57%|█████▋    | 570/1000 [23:18<08:52,  1.24s/it]

Error extracting text from http://en.trend.az/iran/politics/2440879.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2440879.html


Processing URLs:  57%|█████▋    | 572/1000 [23:21<08:46,  1.23s/it]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2016/03/14-burundi-eu-closes-consultations-cotonou-agreement/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/03/14-burundi-eu-closes-consultations-cotonou-agreement/


Processing URLs:  58%|█████▊    | 576/1000 [23:31<13:14,  1.87s/it]

URL filtered: https://www.instagram.com/p/BAPvZ-1GhbP/?taken-by=realdonaldtrump


Processing URLs:  58%|█████▊    | 579/1000 [23:32<07:05,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-iran-turk-idUSKBN1491NG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-iran-turk-idUSKBN1491NG
URL filtered: https://twitter.com/JChengWSJ


Processing URLs:  58%|█████▊    | 582/1000 [23:34<06:14,  1.12it/s]

Error extracting text from https://www.nytimes.com/2018/01/04/opinion/gerrymandering-supreme-court.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/04/opinion/gerrymandering-supreme-court.html?_r=0


Processing URLs:  58%|█████▊    | 583/1000 [23:37<09:36,  1.38s/it]

Error extracting text from https://www.nord-stream2.com/de/media-info/neuigkeiten/antrag-auf-vorsorgliche-zertifizierung-als-unabhangiger-ubertragungsnetzbetreiber-gestellt-150/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /de/media-info/neuigkeiten/antrag-auf-vorsorgliche-zertifizierung-als-unabhangiger-ubertragungsnetzbetreiber-gestellt-150/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301304470>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  59%|█████▊    | 587/1000 [23:41<07:09,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-usa-fcc-idUSKBN19K06K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fcc-idUSKBN19K06K


Processing URLs:  59%|█████▉    | 588/1000 [23:43<08:38,  1.26s/it]

Error extracting text from http://www.cnbc.com/2016/01/18/us-recession-probability-at-highest-levels-since-fall-2011-survey.html: 503 Server Error: Service Unavailable for url: https://www.cnbc.com/2016/01/18/us-recession-probability-at-highest-levels-since-fall-2011-survey.html
URL filtered: https://www.bloomberg.com/view/articles/2016-12-14/put-forecasting-in-its-final-resting-place


Processing URLs:  59%|█████▉    | 591/1000 [23:45<06:27,  1.06it/s]

Error extracting text from http://www.wsj.com/articles/the-oxford-economist-running-the-feds-interest-rate-machine-1448322140: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-oxford-economist-running-the-feds-interest-rate-machine-1448322140
URL filtered: https://twitter.com/IbnSiqilli/status/961795478068301824?ref_src=twcamp%5Ecopy%7Ctwsrc%5Eandroid%7Ctwgr%5Ecopy%7Ctwcon%5E7090%7Ctwterm%5E3


Processing URLs:  60%|█████▉    | 595/1000 [23:47<04:47,  1.41it/s]

Error extracting text from http://www.wsj.com/articles/paramount-to-break-hollywoods-home-video-window-1436377631: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/paramount-to-break-hollywoods-home-video-window-1436377631


Processing URLs:  60%|██████    | 600/1000 [23:54<08:07,  1.22s/it]

Error extracting text from http://the-japan-news.com/news/article/0002558515: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002558515


Processing URLs:  60%|██████    | 601/1000 [23:56<08:26,  1.27s/it]

Error extracting text from https://www.reuters.com/world/putin-biden-may-meet-june-ria-cites-kremlin-aide-2021-04-25/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/putin-biden-may-meet-june-ria-cites-kremlin-aide-2021-04-25/


Processing URLs:  60%|██████    | 603/1000 [23:56<05:03,  1.31it/s]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/3937459670x0xS1564590-17-3118/1318605/filing.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/3937459670x0xS1564590-17-3118/1318605/filing.pdf
Error extracting text from http://www.motortrend.com/news/toyota-halts-production-due-explosion-suppliers-factory/: 403 Client Error: Forbidden for url: http://www.motortrend.com/news/toyota-halts-production-due-explosion-suppliers-factory/


Processing URLs:  61%|██████    | 606/1000 [24:04<10:57,  1.67s/it]

Error extracting text from http://www.nytimes.com/2016/01/13/world/middleeast/iran-arak-reactor.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/13/world/middleeast/iran-arak-reactor.html?_r=0
URL filtered: http://www.bloomberg.com/news/articles/2016-03-13/brazil-protesters-assemble-as-rousseff-s-future-hangs-in-balance


Processing URLs:  61%|██████    | 609/1000 [24:06<07:05,  1.09s/it]

Error extracting text from https://www.whitehouse.gov/briefing-room/signed-legislation: 404 Client Error: Not Found for url: https://www.whitehouse.gov/briefing-room/signed-legislation


Processing URLs:  61%|██████    | 610/1000 [24:07<07:25,  1.14s/it]

Error extracting text from https://reut.rs/3sUfXe9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us-intelligence-community-still-divided-covid-19s-origin-summary-2021-08-27/


Processing URLs:  61%|██████    | 611/1000 [24:08<05:55,  1.09it/s]

Error extracting text from http://news.yahoo.com/iran-conducts-fresh-ballistic-missile-tests-state-media-085935504.html: 404 Client Error: Not Found for url: http://news.yahoo.com/iran-conducts-fresh-ballistic-missile-tests-state-media-085935504.html


Processing URLs:  61%|██████▏   | 614/1000 [24:13<09:39,  1.50s/it]

Error extracting text from http://blogs.spectator.co.uk/2015/09/when-will-the-eu-referendum-be-held-here-are-three-possible-dates/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2015/09/when-will-the-eu-referendum-be-held-here-are-three-possible-dates/


Processing URLs:  62%|██████▏   | 616/1000 [24:14<06:42,  1.05s/it]

Error extracting text from https://news.google.com/articles/CAIiEN7eh9qWe19oGuMCBI7W_R8qGAgEKg8IACoHCAowjtSUCjC30XQw_qe5AQ?hl=en-US&amp;gl=US&amp;ceid=US%3Aen: 500 Server Error: Internal Server Error for url: https://news.google.com/articles/CAIiEN7eh9qWe19oGuMCBI7W_R8qGAgEKg8IACoHCAowjtSUCjC30XQw_qe5AQ?hl=en-US&amp;gl=US&amp;ceid=US:en&gl=US&ceid=US:en


Processing URLs:  62%|██████▏   | 618/1000 [24:35<30:46,  4.83s/it]

Error extracting text from https://www.nytimes.com/2017/07/26/us/politics/brian-benczkowski-justice-department-nominee-russia-bank.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/26/us/politics/brian-benczkowski-justice-department-nominee-russia-bank.html


Processing URLs:  62%|██████▏   | 619/1000 [25:36<2:15:49, 21.39s/it]

Error extracting text from https://www.bundesverfassungsgericht.de/SharedDocs/Entscheidungen/DE/2021/04/rs20210415_2bvr054721.html: HTTPSConnectionPool(host='www.bundesverfassungsgericht.de', port=443): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▏   | 622/1000 [25:39<49:55,  7.92s/it]  

Error extracting text from https://www.yahoo.com/news/putin-envoy-meets-iran-officials-syria-talks-133245101.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/putin-envoy-meets-iran-officials-syria-talks-133245101.html


Processing URLs:  62%|██████▏   | 623/1000 [25:40<37:10,  5.92s/it]

Error extracting text from https://www.bankofengland.co.uk/monetary-policy-summary-and-minutes/2022/february-2022: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/monetary-policy-summary-and-minutes/2022/february-2022


Processing URLs:  62%|██████▏   | 624/1000 [25:41<27:55,  4.46s/it]

URL filtered: https://www.youtube.com/watch?v=u0-oinyjsk0


Processing URLs:  63%|██████▎   | 629/1000 [25:45<08:51,  1.43s/it]

Error extracting text from http://www.reuters.com/article/2015/11/25/us-mideast-crisis-syria-turkey-impact-idUSKBN0TE04M20151125#X7wdVpr4Is1HtrhP.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/25/us-mideast-crisis-syria-turkey-impact-idUSKBN0TE04M20151125#X7wdVpr4Is1HtrhP.97
Error extracting text from http://www.reuters.com/article/us-northkorea-southkorea-cyber-idUSKCN0YZ0BE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-southkorea-cyber-idUSKCN0YZ0BE


Processing URLs:  63%|██████▎   | 630/1000 [25:46<08:25,  1.37s/it]

Error extracting text from https://tradingeconomics.com/commodity/gasoline: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/gasoline
URL filtered: https://www.facebook.com/LastWeekTonight/?fref=nf


Processing URLs:  64%|██████▎   | 637/1000 [25:55<05:58,  1.01it/s]

Error extracting text from https://www.middleeastmonitor.com/20171127-turkey-iran-and-qatar-sign-new-trade-transport-agreement/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20171127-turkey-iran-and-qatar-sign-new-trade-transport-agreement/
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-usa-senate-idUSKBN17Q1LR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-usa-senate-idUSKBN17Q1LR


Processing URLs:  64%|██████▍   | 638/1000 [25:56<05:36,  1.07it/s]

Error extracting text from http://news.sky.com/story/1630531/north-korea-carries-out-cyber-attack-on-south: 404 Client Error: Not Found for url: https://news.sky.com/story/1630531/north-korea-carries-out-cyber-attack-on-south


Processing URLs:  64%|██████▍   | 639/1000 [25:57<06:21,  1.06s/it]

Error extracting text from http://minpromtorg.gov.ru/press-centre/news/: HTTPSConnectionPool(host='minpromtorg.gov.ru', port=443): Max retries exceeded with url: /press-centre/news/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1000)')))


Processing URLs:  64%|██████▍   | 645/1000 [26:15<09:25,  1.59s/it]

Error extracting text from http://www.state.gov/t/isn/4791: 404 Client Error: Not Found for url: https://www.state.gov/t/isn/4791


Processing URLs:  65%|██████▍   | 649/1000 [26:19<06:49,  1.17s/it]

Error extracting text from http://www.businessinsider.com/iraq-launches-operation-to-retake-fallujah-2016-5: 404 Client Error: Not Found for url: https://www.businessinsider.com/iraq-launches-operation-to-retake-fallujah-2016-5


Processing URLs:  65%|██████▌   | 653/1000 [26:22<04:35,  1.26it/s]

Error extracting text from https://www.wsj.com/articles/trump-signals-change-in-syria-policy-1491412945: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-signals-change-in-syria-policy-1491412945


Processing URLs:  65%|██████▌   | 654/1000 [27:23<1:48:11, 18.76s/it]

Error extracting text from http://aa.com.tr/en/africa/nigeria-blasts-cameroon-forced-refugees-deportations/923214: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  66%|██████▌   | 656/1000 [27:24<54:00,  9.42s/it]  

Error extracting text from http://www.socialeurope.eu/2016/01/expect-uks-eu-referendum-in-june-or-july-this-year-heres-why/: 403 Client Error: Forbidden for url: http://www.socialeurope.eu/2016/01/expect-uks-eu-referendum-in-june-or-july-this-year-heres-why/


Processing URLs:  66%|██████▌   | 657/1000 [27:25<39:50,  6.97s/it]

Error extracting text from http://www.reuters.com/article/us-nireland-politics-idUSKBN1792IA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nireland-politics-idUSKBN1792IA


Processing URLs:  66%|██████▌   | 660/1000 [27:32<21:34,  3.81s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-13/china-s-economy-defies-prophets-of-doom-even-as-2017-risks-loom


Processing URLs:  66%|██████▋   | 663/1000 [27:35<13:10,  2.35s/it]

Error extracting text from http://www.worldoil.com/news/2017/3/16/iran-set-to-out-produce-qatar-at-worlds-biggest-gas-field: 404 Client Error: Not Found for url: https://worldoil.com/news/2017/3/16/iran-set-to-out-produce-qatar-at-worlds-biggest-gas-field


Processing URLs:  66%|██████▋   | 664/1000 [27:41<18:38,  3.33s/it]

Error extracting text from https://www.reuters.com/article/us-usa-stocks/wall-street-falls-on-china-nafta-concerns-idUSKBN1EZ1DU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks/wall-street-falls-on-china-nafta-concerns-idUSKBN1EZ1DU


Processing URLs:  67%|██████▋   | 668/1000 [28:45<1:32:11, 16.66s/it]

Error extracting text from http://yallpolitics.com/index.php/yp/post/43210/: HTTPConnectionPool(host='yallpolitics.com', port=80): Max retries exceeded with url: /index.php/yp/post/43210/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3029d7a40>, 'Connection to yallpolitics.com timed out. (connect timeout=60)'))


Processing URLs:  67%|██████▋   | 669/1000 [28:46<1:08:24, 12.40s/it]

Error extracting text from https://www.reuters.com/article/us-usa-trump-trade/u-s-senators-express-optimism-about-nafta-after-trump-meeting-idUSKBN1FR33S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-trade/u-s-senators-express-optimism-about-nafta-after-trump-meeting-idUSKBN1FR33S


Processing URLs:  67%|██████▋   | 672/1000 [28:53<33:25,  6.11s/it]  

URL filtered: https://twitter.com/ladpolitics


Processing URLs:  67%|██████▋   | 674/1000 [28:54<19:34,  3.60s/it]

Processing URLs:  68%|██████▊   | 677/1000 [28:58<12:01,  2.23s/it]

Error extracting text from https://blog.openai.com/dota-2/: HTTPSConnectionPool(host='blog.openai.com', port=443): Max retries exceeded with url: /dota-2/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3029d66f0>: Failed to resolve 'blog.openai.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/MahirZeynalov/status/758989255242678274


Processing URLs:  68%|██████▊   | 684/1000 [29:25<19:57,  3.79s/it]

Error extracting text from https://reut.rs/396M5mT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from https://www.wsj.com/articles/s-p-rules-venezuela-in-default-on-interest-payment-1510632714?mg=prod/accounts-wsj: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/s-p-rules-venezuela-in-default-on-interest-payment-1510632714?mg=prod/accounts-wsj


Processing URLs:  69%|██████▊   | 686/1000 [29:28<13:59,  2.67s/it]

Error extracting text from https://www.kivitv.com/newsy/waymo-gets-approval-to-launch-a-commercial-ridehailing-service?autoplay=true: 404 Client Error: Not Found for url: https://www.kivitv.com/newsy/waymo-gets-approval-to-launch-a-commercial-ridehailing-service?autoplay=true


Processing URLs:  69%|██████▊   | 687/1000 [29:29<11:44,  2.25s/it]

Error extracting text from http://www.reuters.com/article/us-poland-constitution-eu-idUSKCN0YB16R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-constitution-eu-idUSKCN0YB16R


Processing URLs:  69%|██████▉   | 689/1000 [29:30<07:02,  1.36s/it]

Error extracting text from https://www.nytimes.com/2021/08/22/world/americas/haiti-earthquake-aid.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/22/world/americas/haiti-earthquake-aid.html


Processing URLs:  69%|██████▉   | 690/1000 [29:30<05:55,  1.15s/it]

Error extracting text from http://allafrica.com/stories/201603030218.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201603030218.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3029d78c0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  69%|██████▉   | 692/1000 [30:32<1:28:00, 17.14s/it]

Error extracting text from http://www.charlotteobserver.com/living/religion/article77770417.html: HTTPConnectionPool(host='www.charlotteobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  69%|██████▉   | 693/1000 [30:33<1:04:31, 12.61s/it]

Error extracting text from http://www.businessinsider.com/ap-american-detainees-death-in-north-korea-baffles-experts-2017-6: 404 Client Error: Not Found for url: https://www.businessinsider.com/ap-american-detainees-death-in-north-korea-baffles-experts-2017-6


Processing URLs:  70%|██████▉   | 696/1000 [30:34<23:44,  4.69s/it]  

Error extracting text from http://www.wsj.com/articles/a-vast-email-conspiracy-1460069105: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-vast-email-conspiracy-1460069105
Error extracting text from http://www.reuters.com/article/us-usa-tax/trump-urges-adding-anti-obamacare-provision-to-tax-bill-lawmaker-idUSKBN1D32F2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax/trump-urges-adding-anti-obamacare-provision-to-tax-bill-lawmaker-idUSKBN1D32F2


Processing URLs:  70%|███████   | 700/1000 [30:41<11:31,  2.31s/it]

Error extracting text from http://ajw.asahi.com/article/behind_news/politics/AJ201510140049: HTTPConnectionPool(host='ajw.asahi.com', port=80): Max retries exceeded with url: /article/behind_news/politics/AJ201510140049 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3013070e0>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.business-standard.com/article/news-ians/iran-hopes-high-turn-out-in-parliamentary-elections-116021000102_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/iran-hopes-high-turn-out-in-parliamentary-elections-116021000102_1.html


Processing URLs:  71%|███████   | 707/1000 [30:48<07:05,  1.45s/it]

Error extracting text from http://www.freedomnewspaper.com/6651-2/: 404 Client Error: Not Found for url: https://ufa747.live/6651-2/
Error extracting text from http://www.nytimes.com/2016/02/05/world/middleeast/saudis-suggest-a-syria-ground-operation-led-by-us-and-its-allies.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/05/world/middleeast/saudis-suggest-a-syria-ground-operation-led-by-us-and-its-allies.html


Processing URLs:  71%|███████   | 710/1000 [31:56<1:32:28, 19.13s/it]

Error extracting text from http://origin.www.uscc.gov/sites/default/files/Research/ADIZ%20Update_0.pdf: HTTPConnectionPool(host='origin.www.uscc.gov', port=80): Max retries exceeded with url: /sites/default/files/Research/ADIZ%20Update_0.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30341bb90>, 'Connection to origin.www.uscc.gov timed out. (connect timeout=60)'))


Processing URLs:  71%|███████   | 712/1000 [31:59<50:07, 10.44s/it]  

Error extracting text from http://www.tradingeconomics.com/: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/


Processing URLs:  72%|███████▏  | 715/1000 [32:03<21:28,  4.52s/it]

Error extracting text from http://english.ahram.org.eg/NewsContent/1/64/244965/Egypt/Politics-/Russias-Egypt-airport-inspections-constructive,-at.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/1/64/244965/Egypt/Politics-/Russias-Egypt-airport-inspections-constructive,-at.aspx


Processing URLs:  72%|███████▏  | 718/1000 [32:04<09:09,  1.95s/it]

Error extracting text from http://plebiscito.registraduria.gov.co/99PL/DPLZZZZZZZZZZZZZZZZZ_L1.htm: HTTPConnectionPool(host='plebiscito.registraduria.gov.co', port=80): Max retries exceeded with url: /99PL/DPLZZZZZZZZZZZZZZZZZ_L1.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023ddb80>: Failed to resolve 'plebiscito.registraduria.gov.co' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.el-nacional.com/noticias/petroleo/deterioro-crudo-venezolano-impacta-clientes-refinacion_208692: 403 Client Error: Forbidden for url: https://www.elnacional.com/noticias/petroleo/deterioro-crudo-venezolano-impacta-clientes-refinacion_208692


Processing URLs:  72%|███████▏  | 719/1000 [33:05<1:21:00, 17.30s/it]

Error extracting text from http://arcticenergysummit.com/story/Finland_to_Host_2017_Arctic_Energy_Summit: HTTPConnectionPool(host='arcticenergysummit.com', port=80): Max retries exceeded with url: /story/Finland_to_Host_2017_Arctic_Energy_Summit (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30341ab70>, 'Connection to arcticenergysummit.com timed out. (connect timeout=60)'))
URL filtered: https://www.axios.com/sean-parker-unloads-on-facebook-2508036343.html


Processing URLs:  73%|███████▎  | 727/1000 [33:20<18:55,  4.16s/it]  

Error extracting text from http://aranews.net/2016/02/over-20-isis-jihadis-killed-in-u-s-strikes-in-iraqs-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/02/over-20-isis-jihadis-killed-in-u-s-strikes-in-iraqs-mosul/


Processing URLs:  73%|███████▎  | 728/1000 [33:21<14:36,  3.22s/it]

Error extracting text from https://apple.news/ApAfa-aA1OCO44O7z5T7GmQ: 404 Client Error: Not Found for url: https://apple.news/ApAfa-aA1OCO44O7z5T7GmQ


Processing URLs:  73%|███████▎  | 730/1000 [33:25<11:13,  2.49s/it]

Error extracting text from http://www.cfr.org/china/shanghai-cooperation-organization/p10883: 404 Client Error: Not Found for url: https://www.cfr.org/china/shanghai-cooperation-organization/p10883


Processing URLs:  73%|███████▎  | 731/1000 [33:25<08:19,  1.86s/it]

Error extracting text from https://www.reuters.com/article/us-usa-election-alabama/alabama-democrat-turns-up-attacks-on-roy-moore-in-senate-races-final-stretch-idUSKBN1E01BD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-election-alabama/alabama-democrat-turns-up-attacks-on-roy-moore-in-senate-races-final-stretch-idUSKBN1E01BD


Processing URLs:  73%|███████▎  | 734/1000 [33:40<13:21,  3.01s/it]

Error extracting text from http://www.thesun.co.uk/sol/homepage/news/politics/7095695/UK-and-America-can-better-friends-than-ever-Mr-Obama-if-we-LEAVE-the-EU-says-Boris-Johnson.html: 404 Client Error: Not Found for url: https://www.thesun.co.uk/sol/homepage/news/politics/7095695/UK-and-America-can-better-friends-than-ever-Mr-Obama-if-we-LEAVE-the-EU-says-Boris-Johnson.html


Processing URLs:  74%|███████▎  | 736/1000 [34:01<33:07,  7.53s/it]

Error extracting text from http://www.haaretz.com/israel-news/1.805112: 503 Server Error: Service Unavailable for url: https://www.haaretz.com/israel-news/2017-08-03/ty-article/netanyahu-suspected-of-bribery-and-fraud-police-tells-court/0000017f-e765-da9b-a1ff-ef6f6aec0000


Processing URLs:  74%|███████▍  | 740/1000 [34:09<14:33,  3.36s/it]

Error extracting text from http://carnegie-mec.org/2016/01/13/unhappy-marriage-civil-military-relations-in-post-saddam-iraq/isfj: 403 Client Error: Forbidden for url: http://carnegie-mec.org/2016/01/13/unhappy-marriage-civil-military-relations-in-post-saddam-iraq/isfj


Processing URLs:  74%|███████▍  | 743/1000 [34:24<18:23,  4.29s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/wi/wisconsin_senate_johnson_vs_feingold-3740.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/wi/wisconsin_senate_johnson_vs_feingold-3740.html


Processing URLs:  74%|███████▍  | 745/1000 [34:26<11:30,  2.71s/it]

Error extracting text from https://theconversation.com/survey-shows-1-in-4-new-zealanders-remain-hesitant-about-a-coronavirus-vaccine-145304: 403 Client Error: Forbidden for url: https://theconversation.com/survey-shows-1-in-4-new-zealanders-remain-hesitant-about-a-coronavirus-vaccine-145304


Processing URLs:  75%|███████▍  | 746/1000 [35:26<1:24:18, 19.91s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-08-25/iran-intercepts-us-warship-in-strait-of-hormuz: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  75%|███████▌  | 753/1000 [35:37<11:19,  2.75s/it]  

Error extracting text from http://www.reuters.com/article/us-russia-china-xi-northkorea-idUSKBN19P1Q7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-china-xi-northkorea-idUSKBN19P1Q7


Processing URLs:  76%|███████▌  | 756/1000 [35:48<12:46,  3.14s/it]

Error extracting text from https://www.wsj.com/articles/venezuelas-opposition-sets-general-strike-1500323383: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelas-opposition-sets-general-strike-1500323383


Processing URLs:  76%|███████▌  | 760/1000 [36:17<21:03,  5.27s/it]

Error extracting text from https://tradingeconomics.com/united-kingdom/employment-rate: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/united-kingdom/employment-rate


Processing URLs:  76%|███████▋  | 763/1000 [36:22<11:06,  2.81s/it]

Error extracting text from http://www.rottentomatoes.com/m/captain_america_civil_war/: 403 Client Error: Forbidden for url: http://www.rottentomatoes.com/m/captain_america_civil_war/


Processing URLs:  77%|███████▋  | 766/1000 [36:24<06:32,  1.68s/it]

Error extracting text from http://www.newsweek.com/assad-gave-russian-air-force-free-indefinite-stay-syria-416132: 403 Client Error: Forbidden for url: https://www.newsweek.com/assad-gave-russian-air-force-free-indefinite-stay-syria-416132


Processing URLs:  77%|███████▋  | 768/1000 [36:27<05:20,  1.38s/it]

Error extracting text from http://www.kashmirreader.com/2017/03/26/india-seal-border-pakistan-2018-rajnath-singh/: 404 Client Error: Not Found for url: https://kashmirreader.com/2017/03/26/india-seal-border-pakistan-2018-rajnath-singh/
Error extracting text from https://blog.openai.com/openai-five-benchmark/: HTTPSConnectionPool(host='blog.openai.com', port=443): Max retries exceeded with url: /openai-five-benchmark/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ffd5d010>: Failed to resolve 'blog.openai.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  77%|███████▋  | 774/1000 [36:39<05:55,  1.58s/it]

Error extracting text from http://www.reuters.com/article/us-usa-oil-exports-idUSKBN1AC0ER: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-oil-exports-idUSKBN1AC0ER


Processing URLs:  78%|███████▊  | 777/1000 [36:47<07:40,  2.07s/it]

URL filtered: https://www.youtube.com/watch?v=V-Bot5eoPXQ


Processing URLs:  78%|███████▊  | 780/1000 [36:50<05:40,  1.55s/it]

Error extracting text from http://www.newsweek.com/iran-most-famous-general-more-popular-president-new-threats-home-abroad-813803: 403 Client Error: Forbidden for url: https://www.newsweek.com/iran-most-famous-general-more-popular-president-new-threats-home-abroad-813803


Processing URLs:  78%|███████▊  | 781/1000 [36:51<04:43,  1.29s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/266637-trump-grabs-double-digit-lead-in-iowa-poll: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/266637-trump-grabs-double-digit-lead-in-iowa-poll/


Processing URLs:  78%|███████▊  | 783/1000 [36:55<05:44,  1.59s/it]

Error extracting text from http://tass.ru/en/economy/840692: 404 Client Error: Not Found for url: https://tass.ru/en/economy/840692
URL filtered: http://www.straitstimes.com/asia/se-asia/china-likely-to-build-more-islands-in-south-china-sea-philippines?utm_content=bufferf39d9&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  79%|███████▊  | 786/1000 [36:58<04:34,  1.28s/it]

Error extracting text from http://english.ahram.org.eg/News/244680.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/News/244680.aspx


Processing URLs:  79%|███████▉  | 789/1000 [37:01<04:25,  1.26s/it]

Error extracting text from https://www.stripes.com/news/mattis-signs-orders-to-deploy-more-forces-to-afghanistan-1.485419#.WahiJTMfm8U: 404 Client Error: Not Found for url: https://www.stripes.com/news/mattis-signs-orders-to-deploy-more-forces-to-afghanistan-1.485419#.WahiJTMfm8U


Processing URLs:  79%|███████▉  | 791/1000 [37:06<06:02,  1.73s/it]

Error extracting text from https://www.reuters.com/article/g20-summit-trump-bilaterals/at-g20-trump-to-hold-at-least-eight-side-bilateral-meetings-idUSW1N22K02E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/g20-summit-trump-bilaterals/at-g20-trump-to-hold-at-least-eight-side-bilateral-meetings-idUSW1N22K02E


Processing URLs:  79%|███████▉  | 793/1000 [37:07<04:06,  1.19s/it]

Error extracting text from https://www.google.com/selfdrivingcar/faq/#q17: 404 Client Error: Not Found for url: https://www.google.com/selfdrivingcar/faq/#q17


Processing URLs:  80%|███████▉  | 799/1000 [37:16<05:00,  1.49s/it]

Error extracting text from http://www.peruthisweek.com/news-8-million-peruvians-candidates-109318: HTTPConnectionPool(host='www.peruthisweek.com', port=80): Max retries exceeded with url: /news-8-million-peruvians-candidates-109318 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301ab49b0>: Failed to resolve 'www.peruthisweek.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  80%|████████  | 800/1000 [37:19<06:28,  1.94s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/2016-02/02/content_6885721.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/2016-02/02/content_6885721.htm


Processing URLs:  80%|████████  | 801/1000 [37:20<06:07,  1.85s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/02/744568/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/02/744568/story.html


Processing URLs:  80%|████████  | 802/1000 [37:21<04:36,  1.40s/it]

Error extracting text from http://sunnewsonline.com/buhari-doesnt-understand-nigeria-akintola/: 403 Client Error: Forbidden for url: https://sunnewsonline.com/buhari-doesnt-understand-nigeria-akintola/


Processing URLs:  80%|████████  | 805/1000 [37:28<06:58,  2.15s/it]

URL filtered: https://www.youtube.com/watch?v=sm23q4mR7r4


Processing URLs:  81%|████████  | 809/1000 [37:34<05:14,  1.65s/it]

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-trump-roy-moore-florida-rally-20171208-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/politics/ct-trump-roy-moore-florida-rally-20171208-story.html


Processing URLs:  81%|████████  | 811/1000 [37:36<04:22,  1.39s/it]

Error extracting text from http://thehill.com/policy/finance/256504-ex-im-backers-move-to-force-vote: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/256504-ex-im-backers-move-to-force-vote/


Processing URLs:  81%|████████▏ | 813/1000 [37:37<02:50,  1.10it/s]

Error extracting text from http://warisboring.com/articles/u-s-air-force-planes-jam-signals-back-up-iraqi-offensives/?mc_cid=11574eeab0&amp;mc_eid=0467f21653: 403 Client Error: Forbidden for url: http://warisboring.com/articles/u-s-air-force-planes-jam-signals-back-up-iraqi-offensives/?mc_cid=11574eeab0&amp;mc_eid=0467f21653


Processing URLs:  81%|████████▏ | 814/1000 [37:38<02:57,  1.05it/s]

Error extracting text from https://biatimes.com/2016/04/28/nigerias-second-genocide-on-biafra-authorized-by-buhari-ipob/: HTTPSConnectionPool(host='biatimes.com', port=443): Max retries exceeded with url: /2016/04/28/nigerias-second-genocide-on-biafra-authorized-by-buhari-ipob/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x307f0ca70>: Failed to resolve 'biatimes.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  82%|████████▏ | 816/1000 [37:39<02:37,  1.17it/s]

Error extracting text from https://www.iaea.org/newscenter/statements/iaea-director-generals-statement-and-road-map-clarification-past-present-outstanding-issues-regarding-irans-nuclear-program: 404 Client Error: Not Found for url: https://www.iaea.org/newscenter/statements/iaea-director-generals-statement-and-road-map-clarification-past-present-outstanding-issues-regarding-irans-nuclear-program


Processing URLs:  82%|████████▏ | 817/1000 [37:40<02:27,  1.24it/s]

Error extracting text from http://www.cdc.gov/biosafety/publications/bmbl5/BMBL5_sect_IV.pdf: 404 Client Error: Not Found for url: https://www.cdc.gov/biosafety/publications/bmbl5/BMBL5_sect_IV.pdf


Processing URLs:  82%|████████▏ | 819/1000 [38:01<17:07,  5.67s/it]

Error extracting text from http://www.ew.com/article/2015/10/29/game-thrones-season-6-premiere-date: 406 Client Error: Not Acceptable for url: https://www.ew.com/article/2015/10/29/game-thrones-season-6-premiere-date


Processing URLs:  82%|████████▏ | 820/1000 [38:01<12:38,  4.22s/it]

Error extracting text from https://www.nytimes.com/live/2021/11/04/world/covid-delta-variant-vaccine#covid-surge-europe-who: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/11/04/world/covid-delta-variant-vaccine#covid-surge-europe-who


Processing URLs:  82%|████████▏ | 822/1000 [38:03<07:13,  2.44s/it]

Error extracting text from https://reut.rs/3wdUJsb: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/finnish-parliament-backs-eu-recovery-plan-2021-05-18/
Error extracting text from http://www.realclearpolitics.com/epolls/latest_polls/governor/: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/latest_polls/governor/


Processing URLs:  83%|████████▎ | 826/1000 [38:19<09:01,  3.11s/it]

Error extracting text from https://www.reuters.com/world/kremlin-says-russia-us-talks-positive-signal-putin-biden-summit-2021-05-20/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/kremlin-says-russia-us-talks-positive-signal-putin-biden-summit-2021-05-20/


Processing URLs:  83%|████████▎ | 830/1000 [38:26<06:44,  2.38s/it]

URL filtered: https://www.youtube.com/results?search_query=bitter+lake+adam+curtis


Processing URLs:  83%|████████▎ | 832/1000 [38:27<04:26,  1.59s/it]

Error extracting text from https://weimar.bundesarchiv.de/WEIMAR/DE/Content/Artikel/Entdecken/scheidemann-rede-film-audio.html: HTTPSConnectionPool(host='weimar.bundesarchiv.de', port=443): Max retries exceeded with url: /WEIMAR/DE/Content/Artikel/Entdecken/scheidemann-rede-film-audio.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  84%|████████▍ | 845/1000 [38:45<03:01,  1.17s/it]

Error extracting text from https://www.reuters.com/article/us-usa-politics-snowden/trump-says-hes-considering-pardon-for-leaker-edward-snowden-idUSKCN25B10Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-politics-snowden/trump-says-hes-considering-pardon-for-leaker-edward-snowden-idUSKCN25B10Z


Processing URLs:  85%|████████▍ | 848/1000 [38:48<02:31,  1.00it/s]

Error extracting text from https://www.newsweek.com/volodymyr-zelensky-assassination-ukraine-russia-invasion-survive-war-1684801: 403 Client Error: Forbidden for url: https://www.newsweek.com/volodymyr-zelensky-assassination-ukraine-russia-invasion-survive-war-1684801


Processing URLs:  85%|████████▌ | 853/1000 [38:58<05:06,  2.09s/it]

Error extracting text from https://www.bna.com/dakota-access-pipeline-n57982087544/: 403 Client Error: Forbidden for url: https://www.bloombergindustry.com/


Processing URLs:  85%|████████▌ | 854/1000 [39:26<23:26,  9.64s/it]

Error extracting text from https://www.recode.net/2017/4/27/15413870/comcast-broadband-internet-pay-tv-subscribers-q1-2017: Exceeded 30 redirects.


Processing URLs:  86%|████████▌ | 857/1000 [39:29<09:26,  3.96s/it]

Error extracting text from https://inews.co.uk/news/schools-when-break-up-christmas-2021-uk-holiday-dates-go-back-january-1356122): 404 Client Error: Not Found for url: https://inews.co.uk/news/schools-when-break-up-christmas-2021-uk-holiday-dates-go-back-january-1356122)


Processing URLs:  86%|████████▌ | 860/1000 [39:38<08:36,  3.69s/it]

URL filtered: https://www.youtube.com/watch?v=fS1Q9tIfm-U


Processing URLs:  86%|████████▌ | 862/1000 [39:39<05:21,  2.33s/it]

Error extracting text from http://www.thedickinsonpress.com/energy/oil/3881458-hoeven-plans-add-lifting-oil-export-ban-highway-bill: 404 Client Error: Not Found for url: https://www.thedickinsonpress.com/energy/oil/3881458-hoeven-plans-add-lifting-oil-export-ban-highway-bill


Processing URLs:  86%|████████▋ | 864/1000 [39:42<04:23,  1.93s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/08/04/770022/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/08/04/770022/story.html


Processing URLs:  87%|████████▋ | 870/1000 [40:51<40:54, 18.88s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/venezuela/article44569893.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  87%|████████▋ | 872/1000 [40:53<20:59,  9.84s/it]

Error extracting text from http://aranews.net/2016/04/us-415-million-aid-kurdish-peshmerga-will-go-baghdad/?utm_source=: 404 Client Error: Not Found for url: http://aranews.net/2016/04/us-415-million-aid-kurdish-peshmerga-will-go-baghdad/?utm_source=


Processing URLs:  88%|████████▊ | 875/1000 [40:57<08:24,  4.03s/it]

Error extracting text from https://www.nytimes.com/2017/09/05/us/politics/trump-iran-deal-nuclear.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/05/us/politics/trump-iran-deal-nuclear.html


Processing URLs:  88%|████████▊ | 880/1000 [41:22<10:55,  5.47s/it]

Error extracting text from http://www.wsj.com/articles/u-s-second-quarter-gdp-revised-up-to-1-4-gain-1475152328: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-second-quarter-gdp-revised-up-to-1-4-gain-1475152328


Processing URLs:  89%|████████▉ | 888/1000 [41:48<04:10,  2.24s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-iraq-analysis-3b71cc14-37bc-11e6-a254-2b336e293a3c-20160621-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-iraq-analysis-3b71cc14-37bc-11e6-a254-2b336e293a3c-20160621-story.html


Processing URLs:  89%|████████▉ | 889/1000 [42:51<37:44, 20.40s/it]

Error extracting text from http://new.tse.ir/en/indices.html: HTTPConnectionPool(host='new.tse.ir', port=80): Max retries exceeded with url: /en/indices.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x304bf04a0>, 'Connection to new.tse.ir timed out. (connect timeout=60)'))


Processing URLs:  89%|████████▉ | 893/1000 [42:55<10:41,  6.00s/it]

URL filtered: https://techcrunch.com/2017/01/16/facebook-takes-its-fake-news-fight-to-germany/
Error extracting text from http://www.vanguardngr.com/2016/07/attack-christians-islamic-leaders-dismantle-doctrine-hatred-christian-elders-forum/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/07/attack-christians-islamic-leaders-dismantle-doctrine-hatred-christian-elders-forum/


Processing URLs:  90%|████████▉ | 898/1000 [43:03<03:55,  2.31s/it]

Error extracting text from http://www.nytimes.com/2016/01/19/world/middleeast/in-libya-us-courts-unreliable-allies-to-counter-isis.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/19/world/middleeast/in-libya-us-courts-unreliable-allies-to-counter-isis.html


Processing URLs:  90%|█████████ | 902/1000 [43:11<03:03,  1.87s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scenarios-idUSKBN19E1QI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scenarios-idUSKBN19E1QI


Processing URLs:  90%|█████████ | 903/1000 [43:11<02:16,  1.41s/it]

Error extracting text from https://unherd.com/2021/10/how-we-can-escape-a-lockdown-christmas/: 403 Client Error: Forbidden for url: https://unherd.com/2021/10/how-we-can-escape-a-lockdown-christmas/


Processing URLs:  90%|█████████ | 904/1000 [43:15<03:24,  2.13s/it]

Error extracting text from http://rbth.com/news/2016/10/09/rosatom-ceo-and-turkish-energy-minister-discuss-akkuyu-npp-project_637175: 404 Client Error: Not Found for url: https://www.rbth.com/news/2016/10/09/rosatom-ceo-and-turkish-energy-minister-discuss-akkuyu-npp-project_637175


Processing URLs:  91%|█████████ | 907/1000 [43:20<02:39,  1.72s/it]

Error extracting text from https://thehill.com/opinion/white-house/524615-biden-win-would-leave-gop-poised-for-2024-comeback: 403 Client Error: Forbidden for url: https://thehill.com/opinion/white-house/524615-biden-win-would-leave-gop-poised-for-2024-comeback/


Processing URLs:  91%|█████████ | 909/1000 [43:22<01:56,  1.28s/it]

Error extracting text from https://www.dhs.gov/news/2016/06/23/statement-secretary-johnson-todays-supreme-court-decision: 403 Client Error: Forbidden for url: https://www.dhs.gov/news/2016/06/23/statement-secretary-johnson-todays-supreme-court-decision


Processing URLs:  91%|█████████ | 911/1000 [43:38<06:04,  4.09s/it]

Error extracting text from http://mobile.reuters.com/article/newsOne/idUSKCN1020HK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/newsOne/idUSKCN1020HK


Processing URLs:  91%|█████████▏| 913/1000 [43:41<04:12,  2.90s/it]

Error extracting text from https://www.dailystar.com.lb/News/World/2016/May-18/352743-nato-montenegro-to-sign-accession-accord-stoltenberg.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/World/2016/May-18/352743-nato-montenegro-to-sign-accession-accord-stoltenberg.ashx


Processing URLs:  91%|█████████▏| 914/1000 [43:42<03:27,  2.41s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/rousseff-turns-to-brazil-supreme-court-to-snuff-out-impeachment


Processing URLs:  92%|█████████▏| 918/1000 [43:46<01:52,  1.37s/it]

Error extracting text from http://www.straitstimes.com/asia/se-asia/thailand-to-hold-general-election-in-mid-oct-2018-at-the-earliest-dec-2018-at-the: 403 Client Error: Forbidden for url: https://www.straitstimes.com/asia/se-asia/thailand-to-hold-general-election-in-mid-oct-2018-at-the-earliest-dec-2018-at-the


Processing URLs:  92%|█████████▏| 919/1000 [43:50<02:39,  1.96s/it]

URL filtered: http://www.nationalreview.com/article/442291/buzzfeed-facebook-fake-news-study-methodology-questioned


Processing URLs:  92%|█████████▏| 921/1000 [43:51<01:50,  1.40s/it]

Error extracting text from http://science.sciencemag.org/content/337/6096/816: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/science.1225829


Processing URLs:  92%|█████████▎| 925/1000 [43:53<00:56,  1.32it/s]

Error extracting text from http://life-span.healthgrove.com/l/63/62: HTTPConnectionPool(host='life-span.healthgrove.com', port=80): Max retries exceeded with url: /l/63/62 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304bf2f00>: Failed to resolve 'life-span.healthgrove.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.washingtontimes.com/news/2015/nov/17/donald-trump-seen-unlikely-to-win-in-iowa-despite-/?page=all: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/nov/17/donald-trump-seen-unlikely-to-win-in-iowa-despite-/?page=all


Processing URLs:  93%|█████████▎| 928/1000 [44:00<02:07,  1.77s/it]

Error extracting text from http://csis.org/files/publication/1503qnk_sk.pdf: 404 Client Error: Not Found for url: https://www.csis.org/files/publication/1503qnk_sk.pdf


Processing URLs:  93%|█████████▎| 933/1000 [44:07<01:33,  1.39s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-satellite-idUSKCN0VG00H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-satellite-idUSKCN0VG00H


Processing URLs:  94%|█████████▍| 938/1000 [44:17<02:13,  2.16s/it]

URL filtered: http://www.businessinsider.com/mueller-obtains-warrant-for-russia-linked-facebook-ads-and-accounts-2017-9


Processing URLs:  94%|█████████▍| 940/1000 [44:17<01:14,  1.24s/it]

Error extracting text from https://www.nytimes.com/2021/08/02/nyregion/covid-mask-mandate-ny.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/02/nyregion/covid-mask-mandate-ny.html


Processing URLs:  94%|█████████▍| 944/1000 [44:44<05:32,  5.94s/it]

Error extracting text from https://www.recode.net/2017/5/19/15663436/us-drone-registration-rules-faa: Exceeded 30 redirects.


Processing URLs:  94%|█████████▍| 945/1000 [44:44<04:13,  4.61s/it]

Error extracting text from https://www.nytimes.com/2018/02/13/world/asia/pakistan-taliban-commander.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/13/world/asia/pakistan-taliban-commander.html


Processing URLs:  96%|█████████▌| 958/1000 [45:06<00:58,  1.38s/it]

Error extracting text from http://www.businessinsider.my/john-mccain-slams-rand-paul-for-blocking-montenegro-from-joining-nato-2017-3/: HTTPConnectionPool(host='www.businessinsider.my', port=80): Max retries exceeded with url: /john-mccain-slams-rand-paul-for-blocking-montenegro-from-joining-nato-2017-3/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304920ec0>: Failed to resolve 'www.businessinsider.my' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  96%|█████████▌| 960/1000 [45:10<01:15,  1.88s/it]

URL filtered: https://inews.co.uk/opinion/dominic-cummings-covid-select-committee-evidence-liars-story-told-liar-1020749?ito=twitter_share_article-top


Processing URLs:  96%|█████████▋| 964/1000 [45:15<00:59,  1.64s/it]

Error extracting text from https://www.reuters.com/article/us-germany-cyber/germany-may-need-constitutional-change-to-allow-it-to-strike-back-at-hackers-idUSKBN1DR1J7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-cyber/germany-may-need-constitutional-change-to-allow-it-to-strike-back-at-hackers-idUSKBN1DR1J7


Processing URLs:  97%|█████████▋| 968/1000 [45:34<01:55,  3.60s/it]

Error extracting text from http://www.buenosairesherald.com/article/216345/boko-haram-kills-18-women-at-nigeria-funeral-: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/216345/boko-haram-kills-18-women-at-nigeria-funeral-


Processing URLs:  97%|█████████▋| 970/1000 [45:37<01:20,  2.68s/it]

Error extracting text from http://www.usdebtclock.org/: 403 Client Error: Forbidden for url: https://www.usdebtclock.org/403.shtml
URL filtered: http://www.bloomberg.com/news/articles/2015-10-17/japan-s-finance-minister-says-boj-unlikely-to-expand-easing-now


Processing URLs:  97%|█████████▋| 973/1000 [45:40<00:46,  1.72s/it]

Error extracting text from https://www.us-cert.gov/ncas/alerts/TA17-164A: 403 Client Error: Forbidden for url: https://www.us-cert.gov/ncas/alerts/TA17-164A


Processing URLs:  98%|█████████▊| 978/1000 [45:46<00:22,  1.04s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/266111-cruz-failed-to-disclose-second-loan-report: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/266111-cruz-failed-to-disclose-second-loan-report/


Processing URLs:  98%|█████████▊| 982/1000 [45:53<00:31,  1.77s/it]

Error extracting text from http://www.eurasianet.org/node/73586: 403 Client Error: Forbidden for url: http://www.eurasianet.org/node/73586


Processing URLs:  98%|█████████▊| 985/1000 [45:57<00:22,  1.49s/it]

Error extracting text from https://tass.com/science/1146889: 502 Server Error: Bad Gateway for url: https://tass.com/science/1146889


Processing URLs:  99%|█████████▊| 986/1000 [46:00<00:23,  1.71s/it]

Error extracting text from https://goo.gl/DzyT4f: 403 Client Error: Forbidden for url: https://www.un.org/Depts/Cartographic/english/htmain.htm
URL filtered: https://twitter.com/elonmusk/status/1076595190658265088


Processing URLs:  99%|█████████▉| 989/1000 [46:09<00:32,  2.97s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/russian-opposition-leader-arrested-amid-election-protests/2018/01/28/481754a0-049a-11e8-aa61-f3391373867e_story.html?utm_term=.3e2e3a9e04d6: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/russian-opposition-leader-arrested-amid-election-protests/2018/01/28/481754a0-049a-11e8-aa61-f3391373867e_story.html?utm_term=.3e2e3a9e04d6


Processing URLs:  99%|█████████▉| 991/1000 [46:13<00:24,  2.69s/it]

Error extracting text from http://www.ip-watch.org/2016/02/26/ttip-negotiations-hurrying-between-official-rounds/: 500 Server Error: Internal Server Error for url: https://www.ip-watch.org/2016/02/26/ttip-negotiations-hurrying-between-official-rounds/


Processing URLs: 100%|█████████▉| 998/1000 [46:32<00:05,  2.85s/it]

Error extracting text from https://panampost.com/adam-dubove/2016/10/21/marijuana-legalization-in-uruguay-progress-and-challenges-three-years-later/: 403 Client Error: Forbidden for url: https://panampost.com/adam-dubove/2016/10/21/marijuana-legalization-in-uruguay-progress-and-challenges-three-years-later/


Processing URLs: 100%|██████████| 1000/1000 [46:34<00:00,  2.79s/it]
Processing URLs:   0%|          | 1/1000 [00:02<43:18,  2.60s/it]

Error extracting text from https://www.reuters.com/world/americas/brazil-bolsonaro-says-sept-7-marches-will-not-be-violent-2021-08-26/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazil-bolsonaro-says-sept-7-marches-will-not-be-violent-2021-08-26/


Processing URLs:   1%|          | 11/1000 [00:12<16:03,  1.03it/s]

Error extracting text from http://www.reuters.com/article/2015/09/24/us-boeing-exim-idUSKCN0RO25H20150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/us-boeing-exim-idUSKCN0RO25H20150924


Processing URLs:   1%|▏         | 14/1000 [00:18<27:02,  1.65s/it]

Error extracting text from http://www.mediapost.com/publications/article/280946/time-inc-shakeup-continues-magazines-grouped-int.html: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:   2%|▏         | 16/1000 [00:19<17:06,  1.04s/it]

Error extracting text from http://www.reuters.com/article/2015/10/14/us-brazil-rousseff-idUSKCN0S71V520151014: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/14/us-brazil-rousseff-idUSKCN0S71V520151014
Error extracting text from http://www.reuters.com/article/2015/10/27/us-usa-fiscal-congress-idUSKCN0SK28G20151027: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/27/us-usa-fiscal-congress-idUSKCN0SK28G20151027


Processing URLs:   2%|▏         | 19/1000 [00:20<10:58,  1.49it/s]

Error extracting text from http://media.ofcom.org.uk/facts/: 530 Server Error:  for url: http://media.ofcom.org.uk/facts/


Processing URLs:   2%|▏         | 21/1000 [00:21<07:12,  2.26it/s]

Error extracting text from http://www.reuters.com/article/us-china-asean-idUSKCN0SA05420151016: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-asean-idUSKCN0SA05420151016
Error extracting text from https://www.fiercebiotech.com/biotech/merck-cans-one-covid-19-drug-scraps-a-clinical-trial-another: 403 Client Error: Forbidden for url: https://www.fiercebiotech.com/biotech/merck-cans-one-covid-19-drug-scraps-a-clinical-trial-another


Processing URLs:   3%|▎         | 29/1000 [00:32<16:23,  1.01s/it]

Error extracting text from https://www.reuters.com/world/us/treasurys-yellen-interest-rates-may-need-rise-modestly-2021-05-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/treasurys-yellen-interest-rates-may-need-rise-modestly-2021-05-04/
Error extracting text from http://www.reuters.com/article/2015/12/02/us-usa-fed-yellen-idUSKBN0TL21R20151202: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/02/us-usa-fed-yellen-idUSKBN0TL21R20151202


Processing URLs:   3%|▎         | 32/1000 [00:33<08:19,  1.94it/s]

Error extracting text from http://www.nytimes.com/aponline/2015/10/13/world/middleeast/ap-ml-iran-nuclear.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/10/13/world/middleeast/ap-ml-iran-nuclear.html
Error extracting text from http://www.reuters.com/article/us-un-risks-idUSKCN0Z31W4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-risks-idUSKCN0Z31W4


Processing URLs:   4%|▎         | 35/1000 [00:38<21:11,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-hongkong-election-economy-idUSKBN16V0CT?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-election-economy-idUSKBN16V0CT?il=0


Processing URLs:   4%|▍         | 41/1000 [00:55<28:14,  1.77s/it]

Error extracting text from http://goodjudgment.com/gjp/: 403 Client Error: Forbidden for url: http://goodjudgment.com/gjp/


Processing URLs:   4%|▍         | 44/1000 [00:57<19:10,  1.20s/it]

Error extracting text from https://www.reuters.com/article/us-hongkong-protests/china-says-troops-will-defend-hong-kongs-prosperity-ahead-of-planned-pro-democracy-march-idUSKCN1VJ06B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-protests/china-says-troops-will-defend-hong-kongs-prosperity-ahead-of-planned-pro-democracy-march-idUSKCN1VJ06B


Processing URLs:   5%|▌         | 50/1000 [01:04<18:33,  1.17s/it]

Error extracting text from https://www.infratest-dimap.de/en/analyses-results/nationwide/vote-intention/: 404 Client Error: Not Found for url: https://www.infratest-dimap.de/en/analyses-results/nationwide/vote-intention/


Processing URLs:   5%|▌         | 51/1000 [01:06<18:10,  1.15s/it]

Error extracting text from http://www.reuters.com/article/uk-usa-election-republicans-idUSKCN0XM076: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-usa-election-republicans-idUSKCN0XM076


Processing URLs:   5%|▌         | 54/1000 [01:06<10:00,  1.57it/s]

Error extracting text from http://www.reuters.com/article/us-iran-oil-idUSKBN1380FJ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-idUSKBN1380FJ?il=0


Processing URLs:   6%|▌         | 55/1000 [01:09<16:18,  1.04s/it]

Error extracting text from http://www.mofa.go.jp/press/kaiken/kaiken4e_000223.html: 403 Client Error: Forbidden for url: http://www.mofa.go.jp/press/kaiken/kaiken4e_000223.html


Processing URLs:   6%|▌         | 58/1000 [01:10<10:52,  1.44it/s]

Error extracting text from https://www.nytimes.com/2017/11/13/us/politics/roy-moore-alabama-senate.html?&amp;moduleDetail=section-news-0&amp;action=click&amp;contentCollection=Politics&amp;region=Footer&amp;module=MoreInSection&amp;version=WhatsNext&amp;contentID=WhatsNext&amp;pgtype=article: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/13/us/politics/roy-moore-alabama-senate.html?&amp;moduleDetail=section-news-0&amp;action=click&amp;contentCollection=Politics&amp;region=Footer&amp;module=MoreInSection&amp;version=WhatsNext&amp;contentID=WhatsNext&amp;pgtype=article


Processing URLs:   6%|▌         | 60/1000 [01:18<28:27,  1.82s/it]

Error extracting text from http://www.allenbwest.com/michellejesse/obama-dept-of-justice-files-emergency-motion-the-reason-is-astounding: 403 Client Error: Forbidden for url: http://www.allenbwest.com/michellejesse/obama-dept-of-justice-files-emergency-motion-the-reason-is-astounding


Processing URLs:   6%|▋         | 63/1000 [01:25<36:32,  2.34s/it]

Error extracting text from https://reut.rs/3mfw159: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   6%|▋         | 65/1000 [01:26<21:09,  1.36s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-usa-idUSKCN0VC0MU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-usa-idUSKCN0VC0MU


Processing URLs:   7%|▋         | 67/1000 [01:30<23:48,  1.53s/it]

Error extracting text from http://www.reuters.com/article/2015/05/21/us-autos-takata-probe-idUSKBN0O61T620150521: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/05/21/us-autos-takata-probe-idUSKBN0O61T620150521


Processing URLs:   7%|▋         | 68/1000 [01:33<29:08,  1.88s/it]

Error extracting text from http://ec.europa.eu/trade/policy/countries-and-regions/countries/iran/: 404 Client Error: (Not Found) for url: https://ec.europa.eu/policy/countries-and-regions/countries/iran/


Processing URLs:   7%|▋         | 69/1000 [01:34<27:04,  1.74s/it]

Error extracting text from https://www.reuters.com/article/us-usa-security-kaspersky-interpol/kasperskys-u-s-spat-a-sign-of-balkanisation-in-cyber-world-interpol-idUSKBN1CF26O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-security-kaspersky-interpol/kasperskys-u-s-spat-a-sign-of-balkanisation-in-cyber-world-interpol-idUSKBN1CF26O


Processing URLs:   8%|▊         | 75/1000 [01:47<30:35,  1.98s/it]

Error extracting text from http://reut.rs/1GYvQ3A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/us-japan-economy-idUSKCN0SH0Q020151023


Processing URLs:   8%|▊         | 80/1000 [01:55<22:29,  1.47s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-13/malaysia-reacts-coolly-to-prospect-of-tpp-trade-pact-minus-u-s


Processing URLs:   8%|▊         | 82/1000 [01:55<13:11,  1.16it/s]

Error extracting text from https://www.fireeye.com/cyber-map/threat-map.html: 530 Server Error:  for url: https://www.fireeye.com/cyber-map/threat-map.html


Processing URLs:   8%|▊         | 83/1000 [01:56<14:36,  1.05it/s]

Error extracting text from https://www.tuko.co.ke/239621-the-full-list-2017-presidential-candidates-uhuru-raila.html: 410 Client Error: Gone for url: https://www.tuko.co.ke/239621-the-full-list-2017-presidential-candidates-uhuru-raila.html


Processing URLs:   9%|▊         | 87/1000 [02:03<20:48,  1.37s/it]

Error extracting text from http://www.latimes.com/business/la-fi-hy-dmv-driverless-rules-20160920-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-hy-dmv-driverless-rules-20160920-snap-story.html


Processing URLs:   9%|▉         | 93/1000 [02:13<24:34,  1.63s/it]

Error extracting text from http://nmtransload.com/amazons-first-us-drone-delivery/: 404 Client Error: Not Found for url: https://www.nmtransload.com/amazons-first-us-drone-delivery/
Error extracting text from http://www.nytimes.com/2016/11/18/world/africa/us-south-sudan-arms-embargo-genocide.html?emc=edit_ee_20161118&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/18/world/africa/us-south-sudan-arms-embargo-genocide.html?emc=edit_ee_20161118&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0


Processing URLs:  10%|▉         | 96/1000 [02:16<14:59,  1.00it/s]

Error extracting text from http://www.nytimes.com/2015/12/14/business/the-engineering-of-volkswagens-aggressive-ambition.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/14/business/the-engineering-of-volkswagens-aggressive-ambition.html
Error extracting text from http://www.reuters.com/article/us-usa-iran-nuclear-deals-idUSKBN14212X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-nuclear-deals-idUSKBN14212X


Processing URLs:  10%|▉         | 99/1000 [02:17<06:39,  2.26it/s]

URL filtered: http://www.reuters.com/article/us-usa-trump-executiveorders-idUSKBN15B1SS?feedType=RSS&amp;feedName=topNews&amp;utm_source=twitter&amp;utm_medium=Social
Error extracting text from http://www.nytimes.com/2016/03/03/world/middleeast/iran-elections.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/03/world/middleeast/iran-elections.html?_r=0


Processing URLs:  10%|█         | 100/1000 [02:18<08:59,  1.67it/s]

Error extracting text from http://www.dailysabah.com/nation/2016/03/21/boat-problems-not-turkey-eu-deal-deter-migrants-heading-to-europe: 404 Client Error: Not Found for url: https://www.dailysabah.com/nation/2016/03/21/boat-problems-not-turkey-eu-deal-deter-migrants-heading-to-europe


Processing URLs:  10%|█         | 101/1000 [02:20<16:56,  1.13s/it]

Error extracting text from http://www.el-balad.com/2269502: 404 Client Error: Not Found for url: https://el-balad.com/2269502
URL filtered: https://www.bloomberg.com/news/articles/2016-10-31/october-smashes-merger-records-as-companies-turn-to-megadeals


Processing URLs:  10%|█         | 104/1000 [02:23<16:24,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-turkey-idUSKCN0W70PC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-turkey-idUSKCN0W70PC


Processing URLs:  11%|█         | 106/1000 [02:25<15:07,  1.01s/it]

URL filtered: https://www.youtube.com/watch?v=9xZx1lf2tvs


Processing URLs:  11%|█         | 111/1000 [02:33<19:22,  1.31s/it]

Error extracting text from http://news.sky.com/story/1694123/nigerian-president-yes-my-country-is-corrupt: 404 Client Error: Not Found for url: https://news.sky.com/story/1694123/nigerian-president-yes-my-country-is-corrupt


Processing URLs:  12%|█▏        | 117/1000 [02:41<16:25,  1.12s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0UW1LE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0UW1LE
Error extracting text from http://www.reuters.com/article/us-nigeria-security-idUSKBN1802K1?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-security-idUSKBN1802K1?il=0


Processing URLs:  13%|█▎        | 127/1000 [02:56<18:42,  1.29s/it]

Error extracting text from http://www.straitstimes.com/asia/se-asia/cambodia-and-china-to-hold-naval-drills: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  13%|█▎        | 130/1000 [02:57<09:35,  1.51it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-18/there-s-a-new-world-order-to-talk-about-at-the-davos-of-energy
Error extracting text from https://www.reuters.com/article/us-venezuela-politics-eu/eu-states-no-longer-recognise-guaido-as-venezuelas-interim-president-idUSKBN29U1A3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-eu/eu-states-no-longer-recognise-guaido-as-venezuelas-interim-president-idUSKBN29U1A3


Processing URLs:  13%|█▎        | 131/1000 [02:57<07:40,  1.89it/s]

Error extracting text from http://www.nytimes.com/aponline/2016/06/11/world/europe/ap-eu-greece-syrians-return.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/06/11/world/europe/ap-eu-greece-syrians-return.html
Error extracting text from https://www.financialexpress.com/lifestyle/science/commercial-space-programme-nasa-boeing-targeting-april-for-cst-100-starliners-test-flight/2197059/: 403 Client Error: Forbidden for url: https://www.financialexpress.com/lifestyle/science/commercial-space-programme-nasa-boeing-targeting-april-for-cst-100-starliners-test-flight/2197059/


Processing URLs:  14%|█▎        | 135/1000 [03:01<13:29,  1.07it/s]

Error extracting text from http://www.thedickinsonpress.com/energy/oil/3868860-green-incentives-seen-key-lifting-us-oil-export-ban: 404 Client Error: Not Found for url: https://www.thedickinsonpress.com/energy/oil/3868860-green-incentives-seen-key-lifting-us-oil-export-ban


Processing URLs:  14%|█▎        | 137/1000 [03:05<21:39,  1.51s/it]

Error extracting text from https://www.whistleblower.org/blog/031620-insider-threat-program-determined-avoid-accountability: 404 Client Error: Not Found for url: https://whistleblower.org/blog/031620-insider-threat-program-determined-avoid-accountability


Processing URLs:  14%|█▍        | 140/1000 [03:09<20:44,  1.45s/it]

URL filtered: https://www.youtube.com/watch?v=-ccNkksrfls


Processing URLs:  14%|█▍        | 143/1000 [03:13<23:10,  1.62s/it]

URL filtered: https://techmonitor.ai/policy/facebook-novi-digital-wallet-libra-diem
URL filtered: https://www.youtube.com/watch?v=jlPEBROvR9w
URL filtered: https://twitter.com/hendopolis/status/1386433042621685765
Error extracting text from https://ixquick-proxy.com/do/spg/show_picture.pl?l=english&amp;rais=1&amp;oiu=http%3A%2F%2Fcnbcpakistan.com%2FNewsPictures%2F201391612525.jpg&amp;sp=ce01c5d3de34ebb168f6af592178378c: HTTPSConnectionPool(host='ixquick-proxy.com', port=443): Max retries exceeded with url: /do/spg/show_picture.pl?l=english&amp;rais=1&amp;oiu=http%3A%2F%2Fcnbcpakistan.com%2FNewsPictures%2F201391612525.jpg&amp;sp=ce01c5d3de34ebb168f6af592178378c (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303419310>: Failed to resolve 'ixquick-proxy.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 152/1000 [03:20<18:20,  1.30s/it]

URL filtered: https://www.youtube.com/watch?v=t1tGM23qnOw


Processing URLs:  15%|█▌        | 154/1000 [03:22<15:15,  1.08s/it]

Error extracting text from https://t.co/2mN8ZDaObV: 406 Client Error: Not Acceptable for url: http://pzfeed.com/?p=20337


Processing URLs:  16%|█▌        | 155/1000 [03:27<27:22,  1.94s/it]

Error extracting text from http://www.arunachaltimes.in/chabahar-gateway-to-north-south-corridor/: 404 Client Error: Not Found for url: http://arunachaltimes.in/chabahar-gateway-to-north-south-corridor/


Processing URLs:  16%|█▌        | 156/1000 [03:28<24:56,  1.77s/it]



Processing URLs:  16%|█▌        | 157/1000 [03:30<25:08,  1.79s/it]

URL filtered: https://twitter.com/GlobalGuessing/


Processing URLs:  16%|█▋        | 164/1000 [03:41<23:06,  1.66s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-islamicstate-files-idUSKCN0WC0UN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-islamicstate-files-idUSKCN0WC0UN


Processing URLs:  17%|█▋        | 168/1000 [03:56<53:40,  3.87s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/venezuela-national-guard-official-charged-for-congress-raid/2017/07/10/99a6eb2c-65a4-11e7-94ab-5b1f0ff459df_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/venezuela-national-guard-official-charged-for-congress-raid/2017/07/10/99a6eb2c-65a4-11e7-94ab-5b1f0ff459df_story.html


Processing URLs:  17%|█▋        | 170/1000 [03:56<30:45,  2.22s/it]

Error extracting text from http://www.vanguardngr.com/2017/08/yemen-somalia-south-sudan-nigeria-may-risk-famine-declares-un/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2017/08/yemen-somalia-south-sudan-nigeria-may-risk-famine-declares-un/


Processing URLs:  17%|█▋        | 173/1000 [03:58<16:05,  1.17s/it]

URL filtered: https://www.youtube.com/watch?v=w5eJXXhwG5I
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://fernandorodrigues.blogosfera.uol.com.br/2016/02/05/campanha-do-impeachment-usa-marca-das-diretas-ja-para-tentar-derrubar-dilma/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://fernandorodrigues.blogosfera.uol.com.br/2016/02/05/campanha-do-impeachment-usa-marca-das-diretas-ja-para-tentar-derrubar-dilma/&amp;prev=search


Processing URLs:  17%|█▋        | 174/1000 [04:00<17:51,  1.30s/it]

Error extracting text from http://www.ibtimes.co.uk/dyre-malware-disrupted-after-russian-authorities-raid-moscow-film-company-office-1542416: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/dyre-malware-disrupted-after-russian-authorities-raid-moscow-film-company-office-1542416


Processing URLs:  18%|█▊        | 176/1000 [04:11<40:05,  2.92s/it]

URL filtered: http://www.bloomberg.com


Processing URLs:  18%|█▊        | 184/1000 [04:25<24:57,  1.83s/it]

Error extracting text from https://www.predictit.org/markets/detail/5049/Will-the-UK-announce-another-Brexit-referendum-by-Mar-31,-2019: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/5049/Will-the-UK-announce-another-Brexit-referendum-by-Mar-31,-2019


Processing URLs:  19%|█▉        | 192/1000 [04:43<21:29,  1.60s/it]

Error extracting text from http://www.wsj.com/articles/opecs-ability-to-influence-oil-supply-is-slipping-1464089971: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opecs-ability-to-influence-oil-supply-is-slipping-1464089971
Error extracting text from http://www.reuters.com/article/us-burundi-rwanda-un-idUSKCN0VD04K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-rwanda-un-idUSKCN0VD04K


Processing URLs:  19%|█▉        | 193/1000 [04:44<17:05,  1.27s/it]

Error extracting text from http://www.peacefare.net/2016/01/03/a-bad-way-to-start-the-new-year/: 406 Client Error: Not Acceptable for url: http://www.peacefare.net/2016/01/03/a-bad-way-to-start-the-new-year/


Processing URLs:  20%|█▉        | 199/1000 [05:02<46:39,  3.49s/it]

URL filtered: https://www.politico.com/amp/news/2019/10/29/kyrsten-sinema-arizona-democrats-060187?__twitter_impression=true


Processing URLs:  20%|██        | 202/1000 [05:05<27:03,  2.03s/it]

Error extracting text from http://www.mintpressnews.com/latest-leak-confirms-ttip-serious-threat-democracy-know/214883/: 403 Client Error: Forbidden for url: http://www.mintpressnews.com/latest-leak-confirms-ttip-serious-threat-democracy-know/214883/


Processing URLs:  20%|██        | 203/1000 [05:09<31:18,  2.36s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-24/saudi-arabia-planning-to-sell-49-of-aramco-eqtisadiah-reports


Processing URLs:  20%|██        | 205/1000 [05:09<19:32,  1.48s/it]

Error extracting text from http://www.shanghaidaily.com/article/article_xinhua.aspx?id=320381: 404 Client Error: Not Found for url: http://www.shanghaidaily.com/article/article_xinhua.aspx?id=320381


Processing URLs:  21%|██        | 206/1000 [05:11<19:37,  1.48s/it]

Error extracting text from https://countermeasuresgroup.wordpress.com/: 410 Client Error: Gone for url: https://countermeasuresgroup.wordpress.com/


Processing URLs:  21%|██        | 207/1000 [05:11<16:52,  1.28s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2016/01/28/0401000000AEN20160128000352315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  21%|██        | 208/1000 [05:13<18:28,  1.40s/it]

Error extracting text from http://fuelfix.com/blog/2015/11/04/end-of-the-road-for-oil-exports-on-highway-bill/: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2015/11/04/end-of-the-road-for-oil-exports-on-highway-bill/


Processing URLs:  21%|██        | 211/1000 [05:18<21:16,  1.62s/it]

Error extracting text from http://www.novelrank.com/asin/1451606303: HTTPSConnectionPool(host='www.novelrank.com', port=443): Max retries exceeded with url: /asin/1451606303 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  21%|██        | 212/1000 [05:20<20:39,  1.57s/it]

Error extracting text from http://wtop.com/government/2016/06/south-china-sea-china-willing-to-pay-the-price-of-defiance/: 404 Client Error: Not Found for url: https://wtop.com/government/2016/06/south-china-sea-china-willing-to-pay-the-price-of-defiance/


Processing URLs:  22%|██▏       | 217/1000 [05:27<17:54,  1.37s/it]

URL filtered: https://www.youtube.com/watch?v=VlAZV_EsSSE


Processing URLs:  22%|██▏       | 224/1000 [05:35<15:44,  1.22s/it]

Error extracting text from http://www.straitstimes.com/asia/se-asia/will-rcep-be-a-reality-by-the-end-of-2017: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  22%|██▎       | 225/1000 [05:37<19:08,  1.48s/it]

Error extracting text from http://reut.rs/1R6Rj2M: HTTPConnectionPool(host='feeds.reuters.com', port=80): Max retries exceeded with url: /~r/Reuters/worldNews/~3/PVKZFTjxwJ8/story01.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3009352e0>: Failed to resolve 'feeds.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  23%|██▎       | 227/1000 [05:38<12:41,  1.02it/s]

Error extracting text from https://assets.donaldjtrump.com/_landings/contract/O-TRU-102316-Contractv02.pdf: 403 Client Error: Forbidden for url: https://assets.donaldjtrump.com/_landings/contract/O-TRU-102316-Contractv02.pdf


Processing URLs:  23%|██▎       | 228/1000 [05:39<13:14,  1.03s/it]

Error extracting text from https://www.atptour.com/en/scores/current/roland-garros/520/draws: 403 Client Error: Forbidden for url: https://www.atptour.com/en/scores/current/roland-garros/520/draws


Processing URLs:  23%|██▎       | 231/1000 [05:43<13:23,  1.04s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/euro-zone-bailout-fund-approves-1-billion-euro-payout-for-greece/articleshow/50287957.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/euro-zone-bailout-fund-approves-1-billion-euro-payout-for-greece/articleshow/50287957.cms
Error extracting text from http://www.latimes.com/politics/washington/la-na-essential-washington-updates-back-in-march-felix-sater-claimed-to-1503953853-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/washington/la-na-essential-washington-updates-back-in-march-felix-sater-claimed-to-1503953853-htmlstory.html


Processing URLs:  24%|██▎       | 236/1000 [05:52<18:17,  1.44s/it]

Error extracting text from http://www.nytimes.com/2015/10/22/us/politics/joe-biden-will-not-run-for-president.html?_r=0&amp;mtrref=undefined&amp;gwh=A789D77DF4044D6D85CAC9F2365D529D&amp;gwt=pay&amp;assetType=nyt_now: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/22/us/politics/joe-biden-will-not-run-for-president.html?_r=0&amp;mtrref=undefined&amp;gwh=A789D77DF4044D6D85CAC9F2365D529D&amp;gwt=pay&amp;assetType=nyt_now


Processing URLs:  24%|██▍       | 239/1000 [05:57<20:08,  1.59s/it]

URL filtered: http://www.bloombergview.com/articles/2015-11-19/keeping-venezuela-s-elections-democratic


Processing URLs:  25%|██▍       | 246/1000 [06:03<08:48,  1.43it/s]

Error extracting text from https://www.theafricareport.com/85905/eritreas-letter-to-un-is-open-admission-of-aggression-in-ethiopias-tigray-war/: 403 Client Error: Forbidden for url: https://www.theafricareport.com/85905/eritreas-letter-to-un-is-open-admission-of-aggression-in-ethiopias-tigray-war/
Error extracting text from http://www.reuters.com/article/us-usa-trump-taiwan-idUSKBN13R2NT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-taiwan-idUSKBN13R2NT


Processing URLs:  25%|██▍       | 248/1000 [06:05<09:51,  1.27it/s]

Error extracting text from https://www.blackrock.com/investing/literature/whitepaper/bii-2017-uk-election-bulletin.pdf: 404 Client Error: Not Found for url: https://www.blackrock.com/us/individual/literature/whitepaper/bii-2017-uk-election-bulletin.pdf


Processing URLs:  25%|██▍       | 249/1000 [06:05<09:27,  1.32it/s]

Error extracting text from http://aranews.net/2016/03/syrian-army-retakes-strategic-town-al-qaeda-south-aleppo/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/syrian-army-retakes-strategic-town-al-qaeda-south-aleppo/


Processing URLs:  25%|██▌       | 250/1000 [06:05<07:29,  1.67it/s]

Error extracting text from https://www.nytimes.com/2017/01/11/us/rex-tillerson-confirmation-hearings.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/11/us/rex-tillerson-confirmation-hearings.html


Processing URLs:  25%|██▌       | 253/1000 [06:10<12:54,  1.04s/it]

Error extracting text from https://thehill.com/homenews/senate/545175-meet-the-senators-at-the-center-of-the-filibuster-fight?rnd=1616797846: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/545175-meet-the-senators-at-the-center-of-the-filibuster-fight/?rnd=1616797846


Processing URLs:  26%|██▌       | 256/1000 [06:11<07:29,  1.65it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/feb/26/vincent-stewart-intelligence-chief-doubts-readines/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/feb/26/vincent-stewart-intelligence-chief-doubts-readines/


Processing URLs:  26%|██▌       | 258/1000 [06:15<15:43,  1.27s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13940909001166: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940909001166 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3027e85c0>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  26%|██▌       | 261/1000 [06:18<11:48,  1.04it/s]

Error extracting text from https://mccoyote.wordpress.com/mkultra/: 410 Client Error: Gone for url: https://mccoyote.wordpress.com/mkultra/


Processing URLs:  26%|██▋       | 263/1000 [06:21<15:55,  1.30s/it]

Error extracting text from http://www.elmundo.com.ve/noticias/economia/mercados/venezuela-cancela-intereses-de-bonos-2019-y-2024-p.aspx#ixzz45nQ6rNkw: HTTPConnectionPool(host='www.elmundo.com.ve', port=80): Max retries exceeded with url: /noticias/economia/mercados/venezuela-cancela-intereses-de-bonos-2019-y-2024-p.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe35db50>: Failed to resolve 'www.elmundo.com.ve' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  27%|██▋       | 271/1000 [06:42<35:39,  2.94s/it]

URL filtered: https://www.youtube.com/watch?v=Pk7yqlTMvp8


Processing URLs:  28%|██▊       | 279/1000 [06:51<17:26,  1.45s/it]

URL filtered: https://twitter.com/brianstelter/status/752955749953921026


Processing URLs:  28%|██▊       | 283/1000 [06:54<10:55,  1.09it/s]

Error extracting text from https://www.nytimes.com/2017/04/26/us/politics/nafta-executive-order-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/26/us/politics/nafta-executive-order-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  28%|██▊       | 284/1000 [06:55<12:15,  1.03s/it]

Error extracting text from https://reut.rs/3pnqoUT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/russia-nato-drills/russia-says-it-will-join-drills-with-nato-member-ships-off-pakistan-idUSKBN28K1K7


Processing URLs:  29%|██▊       | 287/1000 [06:58<13:03,  1.10s/it]

Error extracting text from https://www.congress.gov/: 403 Client Error: Forbidden for url: https://www.congress.gov/
URL filtered: https://m.youtube.com/watch?v=R0xaA4dYsbQ


Processing URLs:  29%|██▉       | 289/1000 [07:00<12:15,  1.03s/it]

Error extracting text from http://newsdaily.com/2016/04/china-must-see-missile-defense-is-live-or-die-for-south-korea-seoul-official/: 403 Client Error: Forbidden for url: https://www.sciencedaily.com/2016/04/china-must-see-missile-defense-is-live-or-die-for-south-korea-seoul-official/


Processing URLs:  29%|██▉       | 290/1000 [07:01<10:17,  1.15it/s]

Error extracting text from http://www.wsj.com/articles/south-korea-and-u-s-begin-formal-talks-on-missile-shield-1454831176IC: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korea-and-u-s-begin-formal-talks-on-missile-shield-1454831176IC


Processing URLs:  29%|██▉       | 293/1000 [07:08<18:00,  1.53s/it]

Error extracting text from http://www.krdo.com/news/sources-clinton-debate-not-swaying-biden/35835650: 404 Client Error: Not Found for url: https://krdo.com/news/sources-clinton-debate-not-swaying-biden/35835650


Processing URLs:  30%|██▉       | 295/1000 [07:09<13:58,  1.19s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-03-26/banks-rip-up-turkish-lira-forecasts-and-turn-to-guesswork


Processing URLs:  30%|███       | 302/1000 [07:14<07:15,  1.60it/s]

Error extracting text from http://www.nti.org/country-profiles/north-korea/: 403 Client Error: Forbidden for url: https://www.nti.org/country-profiles/north-korea/
Error extracting text from http://www.start.umd./nuclear-facilities-attacks: HTTPConnectionPool(host='www.start.umd.', port=80): Max retries exceeded with url: /nuclear-facilities-attacks (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe843170>: Failed to resolve 'www.start.umd' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-health-zika-nerves-insight-idUSKCN0X22TP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-zika-nerves-insight-idUSKCN0X22TP


Processing URLs:  30%|███       | 303/1000 [07:15<06:11,  1.88it/s]

Error extracting text from http://www.reuters.com/article/us-china-congress-wang/chinas-xi-looks-set-to-keep-right-hand-man-on-despite-age-idUSKBN1CG0JI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-congress-wang/chinas-xi-looks-set-to-keep-right-hand-man-on-despite-age-idUSKBN1CG0JI


Processing URLs:  30%|███       | 304/1000 [07:15<06:16,  1.85it/s]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2397283&amp;CategoryId=10717&gt: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2397283&amp;CategoryId=10717&gt


Processing URLs:  31%|███       | 306/1000 [07:19<12:32,  1.08s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/04/744989/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/04/744989/story.html


Processing URLs:  31%|███       | 308/1000 [07:22<16:53,  1.46s/it]

Error extracting text from http://www.ibtimes.com/will-clinton-win-presidency-saudi-arabia-arab-countries-prefer-hillary-over-donald-2443641: 403 Client Error: Forbidden for url: https://www.ibtimes.com/will-clinton-win-presidency-saudi-arabia-arab-countries-prefer-hillary-over-donald-2443641


Processing URLs:  31%|███       | 309/1000 [07:32<43:55,  3.81s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/iran-still-pondering-what-to-do-with-excess-uranium/2015/09/10/ba761582-57c1-11e5-9f54-1ea23f6e02f3_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/iran-still-pondering-what-to-do-with-excess-uranium/2015/09/10/ba761582-57c1-11e5-9f54-1ea23f6e02f3_story.html
URL filtered: https://www.npr.org/2021/01/31/962104747/unwelcome-on-facebook-twitter-qanon-followers-flock-to-fringe-sites


Processing URLs:  32%|███▏      | 321/1000 [07:51<20:48,  1.84s/it]

Error extracting text from http://toyotanews.pressroom.toyota.com/releases/tms-september-2016-sales-chart.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/tms-september-2016-sales-chart/


Processing URLs:  32%|███▏      | 324/1000 [07:57<22:05,  1.96s/it]

Error extracting text from http://www.startribune.com/feingold-campaign-says-ad-blitz-not-helping-johnson/383643621/: 404 Client Error: Not Found for url: https://www.startribune.com/feingold-campaign-says-ad-blitz-not-helping-johnson/383643621/


Processing URLs:  33%|███▎      | 327/1000 [09:00<3:35:51, 19.24s/it]

Error extracting text from https://www.fidh.org/en/region/Africa/burundi/burundi-repression-of-a-genocidal-character-the-un-s-response-must-be: HTTPSConnectionPool(host='www.fidh.org', port=443): Max retries exceeded with url: /en/region/Africa/burundi/burundi-repression-of-a-genocidal-character-the-un-s-response-must-be (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3042e5e80>, 'Connection to www.fidh.org timed out. (connect timeout=60)'))


Processing URLs:  33%|███▎      | 328/1000 [09:02<2:37:37, 14.07s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/heres-why-us-special-forces-want-russian-machine-guns-20642: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/heres-why-us-special-forces-want-russian-machine-guns-20642


Processing URLs:  33%|███▎      | 330/1000 [09:04<1:22:26,  7.38s/it]

Error extracting text from http://www.nytimes.com/interactive/2012/11/23/us/state-government: 404 Client Error: Not Found for url: https://www.nytimes.com/interactive/2012/11/23/us/state-government
URL filtered: https://www.bloomberg.com/news/articles/2017-06-28/china-is-about-to-bury-elon-musk-in-batteries


Processing URLs:  33%|███▎      | 334/1000 [09:08<30:08,  2.72s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-27/vw-bears-blame-for-diesel-cheating-scandal-supplier-bosch-says


Processing URLs:  34%|███▎      | 337/1000 [09:09<15:41,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YT0OI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YT0OI


Processing URLs:  34%|███▍      | 342/1000 [09:15<10:00,  1.10it/s]

Error extracting text from http://www.nato.int/cps/en/natohq/news_130581.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_130581.htm
Error extracting text from https://www.france24.com/en/asia-pacific/20210704-security-forces-flee-as-as-districts-in-northern-afghanistan-fall-to-taliban: 403 Client Error: Forbidden for url: https://www.france24.com/en/asia-pacific/20210704-security-forces-flee-as-as-districts-in-northern-afghanistan-fall-to-taliban
Error extracting text from http://www.reuters.com/article/us-russia-putin-nato-idUSKCN0Z80VS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-putin-nato-idUSKCN0Z80VS


Processing URLs:  35%|███▍      | 349/1000 [09:32<16:07,  1.49s/it]

Error extracting text from http://agilisanalysis.com/home/?p=293: HTTPConnectionPool(host='agilisanalysis.com', port=80): Max retries exceeded with url: /home/?p=293 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe0b6a20>: Failed to resolve 'agilisanalysis.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-britain-eu-poll-idUSKCN0XO1GH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-poll-idUSKCN0XO1GH


Processing URLs:  35%|███▌      | 350/1000 [09:34<17:15,  1.59s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0X00WV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0X00WV


Processing URLs:  35%|███▌      | 352/1000 [09:44<31:54,  2.95s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/iranian-reformists-set-to-win-all-tehran-parliamentary-seats/2016/02/28/7fbf0d9a-ddef-11e5-8210-f0bd8de915f6_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/iranian-reformists-set-to-win-all-tehran-parliamentary-seats/2016/02/28/7fbf0d9a-ddef-11e5-8210-f0bd8de915f6_story.html


Processing URLs:  35%|███▌      | 354/1000 [09:54<39:24,  3.66s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/11/gitrep-10mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/11/gitrep-10mar16pm/


Processing URLs:  36%|███▌      | 355/1000 [09:55<32:54,  3.06s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/iran-body-reverses-ban-1500-election-candidates-36755019: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/iran-body-reverses-ban-1500-election-candidates-36755019


Processing URLs:  36%|███▌      | 356/1000 [09:58<30:53,  2.88s/it]

URL filtered: https://www.bloomberg.com/quicktake/saudi-aramco


Processing URLs:  36%|███▌      | 359/1000 [10:01<18:07,  1.70s/it]

Error extracting text from http://tass.com/world/953096: 502 Server Error: Bad Gateway for url: https://tass.com/world/953096
Error extracting text from http://www.nytimes.com/2015/11/19/world/africa/boko-haram-ranked-ahead-of-isis-for-deadliest-terror-group.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/19/world/africa/boko-haram-ranked-ahead-of-isis-for-deadliest-terror-group.html


Processing URLs:  36%|███▌      | 361/1000 [10:02<13:18,  1.25s/it]

Error extracting text from http://www.news.com.au/national/breaking-news/iraqis-troops-prepare-for-mosul-battle/news-story/bcc43c1c10afc353e5fe816185a7161e: 404 Client Error: Not Found for url: https://www.news.com.au/404.php
Error extracting text from http://www.reuters.com/article/2015/10/26/iran-banks-swift-idUSL8N12Q2BN20151026: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/26/iran-banks-swift-idUSL8N12Q2BN20151026


Processing URLs:  36%|███▌      | 362/1000 [10:02<10:16,  1.03it/s]

Error extracting text from https://www.fbi.gov/file-repository/active-shooter-incidents-us-2016-2017.pdf/view: 403 Client Error: Forbidden for url: https://www.fbi.gov/file-repository/active-shooter-incidents-us-2016-2017.pdf/view


Processing URLs:  36%|███▋      | 363/1000 [10:03<08:03,  1.32it/s]

Error extracting text from https://www.nytimes.com/2017/06/20/us/politics/mike-pompeo-cia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/20/us/politics/mike-pompeo-cia.html
URL filtered: http://ajw.asahi.com/article/asia/china/AJ201603310054?utm_content=buffer523b1&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  37%|███▋      | 367/1000 [10:05<05:52,  1.80it/s]

Error extracting text from http://www.wsj.com/articles/bayer-investor-floats-idea-of-hostile-bid-for-monsanto-1469017672: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bayer-investor-floats-idea-of-hostile-bid-for-monsanto-1469017672


Processing URLs:  37%|███▋      | 371/1000 [10:16<18:53,  1.80s/it]

Error extracting text from http://www.cdm.me/english/paris-friday-the-13th-will-not-affect-decision-on-nato-invitation-to-montenegro: 403 Client Error: Forbidden for url: https://www.cdm.me/english/paris-friday-the-13th-will-not-affect-decision-on-nato-invitation-to-montenegro
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-usa-war-analysis-idUSKBN1AQ2LM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-usa-war-analysis-idUSKBN1AQ2LM


Processing URLs:  37%|███▋      | 373/1000 [10:19<15:51,  1.52s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/why-donald-trump-was-right-the-f-35s-costs-are-out-control-18826: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/why-donald-trump-was-right-the-f-35s-costs-are-out-control-18826


Processing URLs:  38%|███▊      | 375/1000 [10:24<21:51,  2.10s/it]

Error extracting text from https://www.defense.gov/Portals/1/Documents/pubs/Tab_A_Arctic_Report_Public.pdf: 404 Client Error: Not Found for url: https://www.defense.gov/Portals/1/Documents/pubs/Tab_A_Arctic_Report_Public.pdf


Processing URLs:  38%|███▊      | 376/1000 [10:25<20:14,  1.95s/it]

URL filtered: https://www.bloomberg.com/news/features/2021-03-25/merck-mrk-molnupiravir-pill-could-change-the-fight-against-covid


Processing URLs:  38%|███▊      | 379/1000 [10:27<11:08,  1.08s/it]

Error extracting text from http://edgar.sec.gov/: HTTPConnectionPool(host='edgar.sec.gov', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fae63440>: Failed to resolve 'edgar.sec.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 380/1000 [10:28<10:55,  1.06s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/turkeys-justice-minister-contesting-referendum-moot-46952730: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/turkeys-justice-minister-contesting-referendum-moot-46952730


Processing URLs:  38%|███▊      | 381/1000 [10:30<12:38,  1.23s/it]

Error extracting text from http://gis.harvard.edu/services/products/new-hampshire-voter-turnout-map: 404 Client Error: Not Found for url: https://gis.harvard.edu/services/products/new-hampshire-voter-turnout-map


Processing URLs:  38%|███▊      | 383/1000 [10:33<15:23,  1.50s/it]

Error extracting text from http://the-numbers.com/movie/records/All-Time-Worldwide-Box-Office: 403 Client Error: Forbidden for url: https://the-numbers.com/movie/records/All-Time-Worldwide-Box-Office


Processing URLs:  39%|███▊      | 386/1000 [10:38<14:55,  1.46s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/report-russias-dangerous-iskander-m-ballistic-missiles-are-18991: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/report-russias-dangerous-iskander-m-ballistic-missiles-are-18991


Processing URLs:  39%|███▉      | 393/1000 [10:52<13:12,  1.31s/it]

Error extracting text from http://gcaptain.com/a-concrete-sample-was-pulled-from-the-new-panama-canal-locks-and-it-does-not-look-good/#.VkuilCvYjsn: 403 Client Error: Forbidden for url: http://gcaptain.com/a-concrete-sample-was-pulled-from-the-new-panama-canal-locks-and-it-does-not-look-good/#.VkuilCvYjsn
Error extracting text from http://www.washingtontimes.com/news/2016/apr/13/ted-cruz-eyes-double-agent-delegates-in-bid-to-sna/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/apr/13/ted-cruz-eyes-double-agent-delegates-in-bid-to-sna/
URL filtered: http://www.bloomberg.com/news/articles/2015-12-04/rousseff-says-brazil-s-impeachment-proceedings-are-coup-attempt


Processing URLs:  40%|███▉      | 396/1000 [10:54<09:29,  1.06it/s]

Error extracting text from https://www.nytimes.com/2018/01/04/opinion/gerrymandering-supreme-court.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/04/opinion/gerrymandering-supreme-court.html


Processing URLs:  40%|███▉      | 399/1000 [10:57<09:34,  1.05it/s]

Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-protests/thousands-march-in-tel-aviv-to-protest-against-netanyahu-corruption-idUSKBN1E30RW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-protests/thousands-march-in-tel-aviv-to-protest-against-netanyahu-corruption-idUSKBN1E30RW


Processing URLs:  40%|████      | 400/1000 [10:59<11:53,  1.19s/it]

Error extracting text from http://www.fda.gov/AboutFDA/CentersOffices/OfficeofMedicalProductsandTobacco/CBER/ucm133463.htm: 404 Client Error: Not Found for url: https://www.fda.gov/about-fda/about-center-biologics-evaluation-and-research/transfer-therapeutic-products-center-drug-evaluation-and-research-cder


Processing URLs:  40%|████      | 403/1000 [11:03<11:42,  1.18s/it]

Error extracting text from http://cepa.org/The-Russian-Cyber-Threat-Views-from-Estonia: 404 Client Error: Not Found for url: https://cepa.org/The-Russian-Cyber-Threat-Views-from-Estonia
Error extracting text from https://www.wsj.com/livecoverage/tax-bill-2017: 403 Client Error: Forbidden for url: https://www.wsj.com/livecoverage/tax-bill-2017


Processing URLs:  40%|████      | 404/1000 [11:06<15:22,  1.55s/it]

URL filtered: https://twitter.com/USNavy/status/916797092739284993
URL filtered: https://m.facebook.com/yudkowsky/posts/10153914357214228?pnref=story


Processing URLs:  41%|████      | 408/1000 [11:10<12:18,  1.25s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-09/tensions-high-on-korean-border-as-north-raises-threat-of-war


Processing URLs:  41%|████      | 411/1000 [11:10<07:41,  1.28it/s]

Error extracting text from https://www.amazon.com/Amazon-Prime-Air/b?ie=UTF8&amp;node=8037720011: 503 Server Error: Service Unavailable for url: https://www.amazon.com/Amazon-Prime-Air/b?ie=UTF8&amp;node=8037720011


Processing URLs:  41%|████      | 412/1000 [11:12<10:26,  1.07s/it]

Error extracting text from https://www.theicct.org/publications/prevalence-heavy-fuel-oil-and-black-carbon-arctic-shipping-2015-2025: 403 Client Error: Forbidden for url: http://theicct.org/publications/prevalence-heavy-fuel-oil-and-black-carbon-arctic-shipping-2015-2025


Processing URLs:  41%|████▏     | 413/1000 [11:15<12:44,  1.30s/it]

Error extracting text from https://www.rocketlaunchschedule.com/arianespace-ariane-5-eca-ses-17-syracuse-4a/: 404 Client Error: Not Found for url: https://www.rocketlaunchschedule.com/arianespace-ariane-5-eca-ses-17-syracuse-4a/


Processing URLs:  41%|████▏     | 414/1000 [11:17<14:29,  1.48s/it]

Error extracting text from http://www.boston.com/news/politics/2015/09/13/new-poll-shows-bernie-sanders-leading-hillary-clinton-record-margins-new-hampshire-and-iowa/rYcXqdkN0Uq4NDQo64pIvI/story.html: 404 Client Error: Not Found for url: https://www.boston.com/news/politics/2015/09/13/new-poll-shows-bernie-sanders-leading-hillary-clinton-record-margins-new-hampshire-and-iowa/rYcXqdkN0Uq4NDQo64pIvI/story.html


Processing URLs:  42%|████▏     | 416/1000 [11:22<21:27,  2.20s/it]

Error extracting text from https://horizon-magazine.eu/article/back-boom-supersonic-planes-get-ready-quieter-greener-comeback.html: 404 Client Error: Not Found for url: https://projects.research-and-innovation.ec.europa.eu/en/horizon-magazinearticle/back-boom-supersonic-planes-get-ready-quieter-greener-comeback.html
URL filtered: http://www.bloomberg.com/news/articles/2015-12-18/boj-unveils-2-5-billion-etf-boost-to-offset-planned-stock-sales


Processing URLs:  42%|████▏     | 420/1000 [11:29<21:11,  2.19s/it]

Error extracting text from http://www.choosingtolead.net/john-hay-blog/2016/2/1/santoss-gamble-for-peace: 404 Client Error: Not Found for url: https://choosingtolead.net/john-hay-blog/2016/2/1/santoss-gamble-for-peace


Processing URLs:  42%|████▎     | 425/1000 [11:37<15:15,  1.59s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/today-headlines/2016-08/09/content_7197122.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/today-headlines/2016-08/09/content_7197122.htm


Processing URLs:  43%|████▎     | 429/1000 [11:42<09:11,  1.04it/s]

Error extracting text from http://www.nytimes.com/2016/12/09/world/asia/hong-kong-cy-leung-reelection.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/09/world/asia/hong-kong-cy-leung-reelection.html
Error extracting text from http://www.reuters.com/article/us-northkorea-rockets-idUSKBN0TS2QF20151209#35VMKYit4y8FKAQ0.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-rockets-idUSKBN0TS2QF20151209#35VMKYit4y8FKAQ0.99


Processing URLs:  43%|████▎     | 431/1000 [11:43<06:43,  1.41it/s]

Error extracting text from https://www.timesofisrael.com/iran-announces-joint-naval-exercise-with-china-russia/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/iran-announces-joint-naval-exercise-with-china-russia/


Processing URLs:  43%|████▎     | 433/1000 [11:45<09:07,  1.04it/s]

Error extracting text from https://www.fcc.gov/news-events/blog/2020/10/21/fccs-authority-interpret-section-230-communications-act: 403 Client Error: Forbidden for url: https://www.fcc.gov/news-events/blog/2020/10/21/fccs-authority-interpret-section-230-communications-act


Processing URLs:  44%|████▎     | 436/1000 [11:47<06:19,  1.49it/s]

Error extracting text from https://www.nytimes.com/2017/02/06/us/politics/obamacare-tom-price-trump.html?emc=edit_th_20170207&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/06/us/politics/obamacare-tom-price-trump.html?emc=edit_th_20170207&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0
URL filtered: https://www.youtube.com/watch?v=PuNymhcTtSQ


Processing URLs:  44%|████▍     | 443/1000 [11:53<08:27,  1.10it/s]

Error extracting text from http://blogs.barrons.com/techtraderdaily/2015/10/07/apple-iphone-6s-sales-in-china-below-expectations-says-boutique-researcher-j-l-warren/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/techtraderdaily/2015/10/07/apple-iphone-6s-sales-in-china-below-expectations-says-boutique-researcher-j-l-warren/


Processing URLs:  44%|████▍     | 444/1000 [11:55<10:43,  1.16s/it]

Error extracting text from http://www.siliconbeat.com/2016/02/05/googles-alphago-to-take-on-world-champ/: 404 Client Error: Not Found for url: https://www.mercurynews.com/tag/siliconbeat/2016/02/05/googles-alphago-to-take-on-world-champ/


Processing URLs:  44%|████▍     | 445/1000 [11:56<11:43,  1.27s/it]

Error extracting text from http://www.newsweek.com/isis-crushes-mosul-rebellion-aimed-give-city-back-baghdad-509885: 403 Client Error: Forbidden for url: https://www.newsweek.com/isis-crushes-mosul-rebellion-aimed-give-city-back-baghdad-509885


Processing URLs:  45%|████▍     | 446/1000 [11:57<08:53,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-germany-idUSKCN0W80LK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-germany-idUSKCN0W80LK


Processing URLs:  45%|████▍     | 447/1000 [12:01<19:09,  2.08s/it]

Error extracting text from https://www.destatis.de/EN/Publications/Specialized/InternationalData/G7InFigures0000155159004.pdf?__blob=publicationFile: 404 Client Error: Not Found for url: https://www.destatis.de/EN/Publications/Specialized/InternationalData/G7InFigures0000155159004.pdf?__blob=publicationFile


Processing URLs:  45%|████▌     | 451/1000 [12:05<12:37,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=1B6oiLNyKKI
URL filtered: https://www.youtube.com/watch?v=uki4lrLzRaU


Processing URLs:  46%|████▌     | 457/1000 [13:08<2:21:15, 15.61s/it]

Error extracting text from https://www.usnews.com/news/best-states/alabama/articles/2017-11-28/alabama-senate-race-gives-gop-voters-an-uncomfortable-choice: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  46%|████▌     | 458/1000 [13:10<1:47:58, 11.95s/it]

URL filtered: https://www.youtube.com/watch?v=RFUdFKgxkFk


Processing URLs:  46%|████▌     | 461/1000 [13:12<50:13,  5.59s/it]  

Error extracting text from http://www.wsj.com/articles/germany-says-bank-risks-must-be-cut-before-saver-deposit-scheme-agreed-1466157655: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germany-says-bank-risks-must-be-cut-before-saver-deposit-scheme-agreed-1466157655


Processing URLs:  46%|████▋     | 465/1000 [13:20<26:34,  2.98s/it]

Error extracting text from http://www.newsfultoncounty.com/world/news/1015356-jailed-opposition-leader-urges-venezuelans-to-vote: 403 Client Error: Forbidden for url: https://www.newsfultoncounty.com/world/news/1015356-jailed-opposition-leader-urges-venezuelans-to-vote


Processing URLs:  47%|████▋     | 467/1000 [13:21<15:45,  1.77s/it]

Error extracting text from http://afghanistantimes.af/religious-scholars-demand-loya-jirga-call-to-solve-crisis/: 403 Client Error: Forbidden for url: https://afghanistantimes.af/religious-scholars-demand-loya-jirga-call-to-solve-crisis/


Processing URLs:  47%|████▋     | 468/1000 [13:22<14:18,  1.61s/it]

Error extracting text from https://www.sunrisemovement.org/green-new-deal/?ms=WhatistheGreenNewDeal%3F: 403 Client Error: Forbidden for url: https://www.sunrisemovement.org/green-new-deal/?ms=WhatistheGreenNewDeal%3F
Error extracting text from http://www.timesofisrael.com/pm-gifts-probe-could-drag-on-for-many-months-report/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/pm-gifts-probe-could-drag-on-for-many-months-report/


Processing URLs:  47%|████▋     | 470/1000 [13:23<09:41,  1.10s/it]

Error extracting text from https://abcnews.go.com/Technology/wireStory/us-west-prepares-1st-water-shortage-declaration-77139468: 404 Client Error: Not Found for url: https://abcnews.go.com/Technology/wireStory/us-west-prepares-1st-water-shortage-declaration-77139468


Processing URLs:  47%|████▋     | 471/1000 [13:23<07:53,  1.12it/s]

Error extracting text from https://www.nytimes.com/2017/06/13/world/asia/trump-russia-sanctions-veto.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/13/world/asia/trump-russia-sanctions-veto.html?_r=0


Processing URLs:  48%|████▊     | 479/1000 [13:39<18:15,  2.10s/it]

Error extracting text from http://www.reuters.com/article/us-usa-stocks-weekahead-idUSKBN1632A1?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks-weekahead-idUSKBN1632A1?il=0


Processing URLs:  48%|████▊     | 481/1000 [13:42<14:47,  1.71s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-15/n-korea-s-no-sail-zone-signals-possible-missile-test-yonhap


Processing URLs:  48%|████▊     | 484/1000 [13:46<13:21,  1.55s/it]

Error extracting text from http://gfs.eiu.com/Article.aspx?articleType=rf&amp;articleid=564149040&amp;secId=1: 403 Client Error: Forbidden for url: https://www.eiu.com/n/global-themes/global-forecasting-hub
Error extracting text from http://www.business-standard.com/article/pti-stories/us-intel-chief-russia-curtailed-hacking-of-us-targets-116111800001_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/pti-stories/us-intel-chief-russia-curtailed-hacking-of-us-targets-116111800001_1.html


Processing URLs:  49%|████▊     | 486/1000 [13:46<08:11,  1.05it/s]

Error extracting text from https://www.uefa.com/uefachampionsleague/draws/: 403 Client Error: Forbidden for url: https://www.uefa.com/uefachampionsleague/draws/


Processing URLs:  49%|████▉     | 489/1000 [13:49<08:19,  1.02it/s]

Error extracting text from https://southfront.org/farc-ep-chief-peace-negotiations-havent-stopped/: HTTPSConnectionPool(host='southfront.org', port=443): Max retries exceeded with url: /farc-ep-chief-peace-negotiations-havent-stopped/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x304dd77a0>: Failed to resolve 'southfront.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  49%|████▉     | 492/1000 [13:58<14:17,  1.69s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/02/03/asia-pacific/pyongyang-launch-plan-no-secret-specifics-prep-like-reading-tea-leaves/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/02/03/asia-pacific/pyongyang-launch-plan-no-secret-specifics-prep-like-reading-tea-leaves/


Processing URLs:  49%|████▉     | 493/1000 [13:59<13:28,  1.60s/it]

Error extracting text from http://www.iran-daily.com/News/135994.html: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  50%|████▉     | 495/1000 [14:01<10:18,  1.22s/it]

Error extracting text from http://www.tradingeconomics.com/iran/crude-oil-production: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/iran/crude-oil-production


Processing URLs:  50%|█████     | 502/1000 [14:09<08:40,  1.04s/it]

Error extracting text from https://transition.fcc.gov/pshs/911/Apps%20Wrkshp%202015/911_Help_SMS_WhitePaper0515.pdf: 403 Client Error: Forbidden for url: https://transition.fcc.gov/pshs/911/Apps%20Wrkshp%202015/911_Help_SMS_WhitePaper0515.pdf


Processing URLs:  50%|█████     | 504/1000 [14:14<12:42,  1.54s/it]

URL filtered: https://mobile.twitter.com/SpaceX/status/725351354537906176


Processing URLs:  51%|█████     | 506/1000 [14:16<11:13,  1.36s/it]

Error extracting text from http://www.criticalthreats.org/iran-news-round-september-17-2015: 404 Client Error: Not Found for url: https://www.criticalthreats.org/iran-news-round-september-17-2015


Processing URLs:  51%|█████     | 508/1000 [14:18<09:30,  1.16s/it]

Error extracting text from http://thehill.com/homenews/campaign/361451-gop-operative-im-donating-to-a-dem-for-the-first-time-in-alabama-senate: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361451-gop-operative-im-donating-to-a-dem-for-the-first-time-in-alabama-senate/


Processing URLs:  51%|█████     | 511/1000 [15:22<2:30:06, 18.42s/it]

Error extracting text from http://ewp.dali.dartmouth.edu/questions/24: HTTPConnectionPool(host='ewp.dali.dartmouth.edu', port=80): Max retries exceeded with url: /questions/24 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3041e1f70>, 'Connection to ewp.dali.dartmouth.edu timed out. (connect timeout=60)'))


Processing URLs:  52%|█████▏    | 515/1000 [15:27<44:27,  5.50s/it]  

Error extracting text from https://www.timesofisrael.com/a-smiling-rivlin-hints-he-believes-netanyahu-should-resign-if-indicted/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/a-smiling-rivlin-hints-he-believes-netanyahu-should-resign-if-indicted/


Processing URLs:  52%|█████▏    | 520/1000 [15:33<14:28,  1.81s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-syria-safezones-idUSKBN1592O8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-syria-safezones-idUSKBN1592O8


Processing URLs:  52%|█████▏    | 521/1000 [15:34<13:11,  1.65s/it]

Error extracting text from https://www.nytimes.com/2017/08/30/us/politics/trump-north-korea-extortion-money.html?emc=edit_mbae_20170831&amp;nl=&amp;nlid=77825025&amp;te=1&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/30/us/politics/trump-north-korea-extortion-money.html?emc=edit_mbae_20170831&amp;nl=&amp;nlid=77825025&amp;te=1&amp;_r=0


Processing URLs:  53%|█████▎    | 526/1000 [15:40<09:01,  1.14s/it]

Error extracting text from http://warontherocks.com/2016/03/the-three-faces-of-russian-spetsnaz-in-syria/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/03/the-three-faces-of-russian-spetsnaz-in-syria/


Processing URLs:  53%|█████▎    | 528/1000 [15:41<06:06,  1.29it/s]

Error extracting text from http://www.nasdaq.com/article/german-prosecutors-california-regulator-open-fresh-vw-probes-20151125-00643#ixzz3uT74n4kk: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/german-prosecutors-california-regulator-open-fresh-vw-probes-20151125-00643#ixzz3uT74n4kk


Processing URLs:  53%|█████▎    | 534/1000 [15:50<12:29,  1.61s/it]

Error extracting text from http://www.cfr.org/sanctions/economic-sanctions/p36259: 404 Client Error: Not Found for url: https://www.cfr.org/sanctions/economic-sanctions/p36259


Processing URLs:  54%|█████▍    | 538/1000 [15:53<06:43,  1.15it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XR04L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XR04L
Error extracting text from https://www.axios.com/biden-executive-order-mandating-vaccines-4ba564a0-9498-43ef-bef8-f59f36e61418.html: 403 Client Error: Forbidden for url: https://www.axios.com/biden-executive-order-mandating-vaccines-4ba564a0-9498-43ef-bef8-f59f36e61418.html


Processing URLs:  55%|█████▍    | 546/1000 [16:13<15:41,  2.07s/it]

Error extracting text from http://www.france24.com/en/20161022-turkey-hits-syrian-kurd-group-again: 403 Client Error: Forbidden for url: http://www.france24.com/en/20161022-turkey-hits-syrian-kurd-group-again


Processing URLs:  55%|█████▌    | 552/1000 [16:27<12:19,  1.65s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-cris-iraq-mosul-idUSKCN12N0CA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-cris-iraq-mosul-idUSKCN12N0CA


Processing URLs:  55%|█████▌    | 553/1000 [16:28<11:35,  1.55s/it]

Error extracting text from http://www.caam.org.cn/hangye/20170206/1005204431.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/hangye/20170206/1005204431.html


Processing URLs:  56%|█████▌    | 555/1000 [16:30<09:43,  1.31s/it]

Error extracting text from http://nyti.ms/WMCC7Q: 403 Client Error: Forbidden for url: http://www.nytimes.com/2013/02/03/opinion/sunday/the-great-gerrymander-of-2012.html?smid=tw-share


Processing URLs:  56%|█████▌    | 561/1000 [16:34<04:19,  1.69it/s]

Error extracting text from http://www.wsj.com/articles/democratic-presidential-candidates-scramble-to-lure-joe-bidens-supporters-1446384989: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/democratic-presidential-candidates-scramble-to-lure-joe-bidens-supporters-1446384989


Processing URLs:  56%|█████▋    | 563/1000 [16:40<13:24,  1.84s/it]

Error extracting text from http://www.water.ca.gov/pubs/climate/using_future_climate_projections_to_support_water_resources_decision_making_in_california/usingfutureclimateprojtosuppwater_jun09_web.pdf: 404 Client Error: Not Found for url: https://water.ca.gov/pubs/climate/using_future_climate_projections_to_support_water_resources_decision_making_in_california/usingfutureclimateprojtosuppwater_jun09_web.pdf


Processing URLs:  58%|█████▊    | 576/1000 [17:03<06:05,  1.16it/s]

URL filtered: https://twitter.com/POLITICOEurope?ref_src=twsrc^google|twcamp^serp|twgr^author
Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0XF045: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0XF045
Error extracting text from http://www.business-standard.com/article/markets/saudi-aramco-ipo-unlikely-to-require-rule-changes-regulator-says-116120601000_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/markets/saudi-aramco-ipo-unlikely-to-require-rule-changes-regulator-says-116120601000_1.html


Processing URLs:  58%|█████▊    | 579/1000 [17:06<05:33,  1.26it/s]

Error extracting text from http://www.tradingeconomics.com/malaysia/foreign-exchange-reserves: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/malaysia/foreign-exchange-reserves


Processing URLs:  58%|█████▊    | 580/1000 [17:13<16:15,  2.32s/it]

Error extracting text from http://gulftimes.ae/why-the-ttip-is-more-important-than-tpp/: 404 Client Error:  for url: https://gulftimes.ae/why-the-ttip-is-more-important-than-tpp/


Processing URLs:  58%|█████▊    | 582/1000 [17:17<14:02,  2.01s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13940617001232: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940617001232 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3063216d0>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 584/1000 [17:17<08:53,  1.28s/it]

Error extracting text from https://www.espn.com/radio/play/_/id/31519855: 403 Client Error: Forbidden for url: https://www.espn.com/radio/play/_/id/31519855


Processing URLs:  59%|█████▊    | 587/1000 [17:28<13:29,  1.96s/it]

Error extracting text from http://english.aawsat.com/2016/08/article55357175/falih-aramco-issue-underwriting-2018: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/08/article55357175/falih-aramco-issue-underwriting-2018


Processing URLs:  59%|█████▉    | 588/1000 [25:28<15:15:16, 133.29s/it]

Error extracting text from https://www.thespainreport.com/newsitems/623-160709122950-update-pablo-iglesias-says-podemos-here-to-stay-despite-slow-unsexy-parliamentary-future: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /newsitems/623-160709122950-update-pablo-iglesias-says-podemos-here-to-stay-despite-slow-unsexy-parliamentary-future (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x300db02c0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  59%|█████▉    | 589/1000 [25:28<10:56:22, 95.82s/it] 

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN12P1KT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN12P1KT


Processing URLs:  60%|█████▉    | 595/1000 [25:39<1:28:58, 13.18s/it] 

Error extracting text from https://wikileaks.org/podesta-emails/emailid/6008: 403 Client Error: Forbidden for url: https://wikileaks.org/podesta-emails/emailid/6008


Processing URLs:  60%|█████▉    | 598/1000 [26:09<1:12:15, 10.79s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/plan-colombia-how-washington-learned-to-love-latin-american-intervention-again/2016/09/18/ddaeae1c-3199-4ea3-8d0f-69ee1c: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/plan-colombia-how-washington-learned-to-love-latin-american-intervention-again/2016/09/18/ddaeae1c-3199-4ea3-8d0f-69ee1c/


Processing URLs:  60%|█████▉    | 599/1000 [26:10<51:54,  7.77s/it]  

Error extracting text from http://www.nasdaq.com/article/markets-way-ahead-of-fed-when-it-comes-to-higher-borrowing-costs-20150911-00692#ixzz3lTUT5l7S: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/markets-way-ahead-of-fed-when-it-comes-to-higher-borrowing-costs-20150911-00692#ixzz3lTUT5l7S


Processing URLs:  61%|██████    | 606/1000 [26:24<18:20,  2.79s/it]

Error extracting text from http://www.unhcr.org/news/briefing/2016/5/574d564c4/unhcr-concern-high-death-toll-204000-cross-mediterranean-first-5-months.html: 403 Client Error: Forbidden for url: http://www.unhcr.org/news/briefing/2016/5/574d564c4/unhcr-concern-high-death-toll-204000-cross-mediterranean-first-5-months.html


Processing URLs:  61%|██████    | 609/1000 [26:35<16:34,  2.54s/it]

Error extracting text from http://www.nytimes.com/2015/11/25/business/economy/us-economy-q3-growth-gdp-revision.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/25/business/economy/us-economy-q3-growth-gdp-revision.html?_r=0
Error extracting text from http://www.reuters.com/article/2015/11/05/us-opec-delegate-idUSKCN0SU2YR20151105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/05/us-opec-delegate-idUSKCN0SU2YR20151105


Processing URLs:  61%|██████    | 611/1000 [26:39<13:52,  2.14s/it]

Error extracting text from http://www.reuters.com/article/2015/11/29/us-britain-eu-talks-idUSKBN0TI0XA20151129#Uu8awcRlDApvMfqS.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/29/us-britain-eu-talks-idUSKBN0TI0XA20151129#Uu8awcRlDApvMfqS.97


Processing URLs:  61%|██████▏   | 613/1000 [26:41<10:10,  1.58s/it]

Error extracting text from https://pollytix.de/wahltrend/: 403 Client Error: Forbidden for url: https://pollytix.de/wahltrend/


Processing URLs:  62%|██████▏   | 616/1000 [26:45<08:05,  1.26s/it]

Error extracting text from https://www.wsj.com/articles/schumer-calls-on-democrats-to-block-gorsuch-confirmation-1490284872: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/schumer-calls-on-democrats-to-block-gorsuch-confirmation-1490284872


Processing URLs:  62%|██████▏   | 619/1000 [26:47<05:55,  1.07it/s]

Error extracting text from http://in.reuters.com/article/northkorea-satellite-idINKCN0VE0OC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
Error extracting text from http://blogs.barrons.com/techtraderdaily/2015/10/26/apple-drops-3-2-ahead-of-3q-earnings-on-iphone-fears/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/techtraderdaily/2015/10/26/apple-drops-3-2-ahead-of-3q-earnings-on-iphone-fears/


Processing URLs:  62%|██████▏   | 620/1000 [26:49<07:08,  1.13s/it]

URL filtered: https://twitter.com/lrozen


Processing URLs:  62%|██████▏   | 623/1000 [26:52<05:51,  1.07it/s]

Error extracting text from https://worldcrunch.com/opinion-analysis/china-and-russia-or-the-west-latin-america-must-choose-a-side-1: 403 Client Error: Forbidden for url: https://worldcrunch.com/opinion-analysis/china-and-russia-or-the-west-latin-america-must-choose-a-side-1


Processing URLs:  63%|██████▎   | 628/1000 [26:57<05:56,  1.04it/s]

Error extracting text from http://www.hi5.com: HTTPSConnectionPool(host='secure.hi5.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  63%|██████▎   | 629/1000 [26:58<06:06,  1.01it/s]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/donald-trump-to-senate-scrap-rules-if-needed-to-confirm-gorsuch/articleshow/56926468.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/donald-trump-to-senate-scrap-rules-if-needed-to-confirm-gorsuch/articleshow/56926468.cms


Processing URLs:  63%|██████▎   | 631/1000 [27:00<05:17,  1.16it/s]

Error extracting text from http://www.nytimes.com/2016/02/07/world/middleeast/iran-panel-reverses-disqualification-of-election-candidates.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/world/middleeast/iran-panel-reverses-disqualification-of-election-candidates.html?_r=0


Processing URLs:  63%|██████▎   | 634/1000 [27:01<02:39,  2.29it/s]

Error extracting text from http://www.reuters.com/article/2015/08/21/eu-trade-usa-idUSL5N10W2ER20150821: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/08/21/eu-trade-usa-idUSL5N10W2ER20150821
Error extracting text from https://www.hindustantimes.com/lifestyle/travel/dubai-airports-see-visitors-spike-in-2022-easing-of-covid-19-curbs-boost-travel-101632209962640.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/lifestyle/travel/dubai-airports-see-visitors-spike-in-2022-easing-of-covid-19-curbs-boost-travel-101632209962640.html


Processing URLs:  64%|██████▍   | 640/1000 [27:15<10:32,  1.76s/it]

Error extracting text from https://www.angrybirds.com/games/: 403 Client Error: Forbidden for url: https://www.angrybirds.com/games/


Processing URLs:  64%|██████▍   | 641/1000 [27:17<10:12,  1.71s/it]

Error extracting text from http://www.fuerzasarmadas.mil.do/detail.aspx?id=842: HTTPConnectionPool(host='www.fuerzasarmadas.mil.do', port=80): Max retries exceeded with url: /detail.aspx?id=842 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303cadca0>: Failed to resolve 'www.fuerzasarmadas.mil.do' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  65%|██████▍   | 647/1000 [27:22<07:16,  1.24s/it]

Error extracting text from http://www.ibtimes.com/iraqis-gather-baghdad-massive-anti-corruption-rally-led-shiite-cleric-muqtada-al-sadr-2324652?rel=rel1: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iraqis-gather-baghdad-massive-anti-corruption-rally-led-shiite-cleric-muqtada-al-sadr-2324652


Processing URLs:  65%|██████▌   | 650/1000 [27:25<06:03,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/nigerian-communities-can-sue-royal-dutch-shell-over-oil-spills-u-k-court-says-1456950713: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nigerian-communities-can-sue-royal-dutch-shell-over-oil-spills-u-k-court-says-1456950713
Error extracting text from http://blogs.wsj.com/bankruptcy/2016/02/05/corporate-bankruptcy-filings-rise-12-in-january/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/bankruptcy/2016/02/05/corporate-bankruptcy-filings-rise-12-in-january/


Processing URLs:  65%|██████▌   | 653/1000 [27:29<06:02,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/sec-investigating-exxon-on-valuing-of-assets-accounting-practices-1474393593: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sec-investigating-exxon-on-valuing-of-assets-accounting-practices-1474393593


Processing URLs:  66%|██████▌   | 659/1000 [27:41<09:16,  1.63s/it]

Error extracting text from http://www.newsweek.com/no-fly-zone-syria-no-no-394046: 403 Client Error: Forbidden for url: https://www.newsweek.com/no-fly-zone-syria-no-no-394046


Processing URLs:  66%|██████▌   | 662/1000 [27:45<08:15,  1.46s/it]

Error extracting text from http://www.dailystarjournal.com/opinion/article_2e6022c7-18e4-5f6f-b1e0-bee555b51719.html: 404 Client Error: Not Found for url: http://www.dailystarjournal.com/opinion/article_2e6022c7-18e4-5f6f-b1e0-bee555b51719.html


Processing URLs:  66%|██████▋   | 665/1000 [27:52<10:38,  1.91s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/25/749777/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/25/749777/story.html


Processing URLs:  67%|██████▋   | 669/1000 [27:56<06:31,  1.18s/it]

Error extracting text from https://www.theafricareport.com/79552/chad-will-the-11-april-presidential-election-be-idriss-deby-itnos-final-one/: 403 Client Error: Forbidden for url: https://www.theafricareport.com/79552/chad-will-the-11-april-presidential-election-be-idriss-deby-itnos-final-one/


Processing URLs:  67%|██████▋   | 672/1000 [27:58<03:55,  1.39it/s]

Error extracting text from http://www.latimes.com: 403 Client Error: Forbidden for url: https://www.latimes.com/
Error extracting text from http://www.vanguardngr.com/2017/03/nigeria-not-threatened-famine-minister/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2017/03/nigeria-not-threatened-famine-minister/


Processing URLs:  67%|██████▋   | 673/1000 [27:59<04:06,  1.33it/s]

Error extracting text from http://allafrica.com/stories/201607070699.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607070699.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x301ab7fe0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  68%|██████▊   | 675/1000 [29:02<1:43:19, 19.07s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2020-12-08/texas-asks-us-supreme-court-to-help-trump-upend-election: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  68%|██████▊   | 678/1000 [29:05<37:51,  7.06s/it]  

Error extracting text from http://english.yonhapnews.co.kr/national/2017/06/09/0301000000AEN20170609008151315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: http://www.bloomberg.com/news/videos/b/98e0d4e6-c3fa-442f-8736-0a02cefeae63


Processing URLs:  68%|██████▊   | 685/1000 [29:15<11:00,  2.10s/it]

Error extracting text from http://heraldvoice.com/2015/11/25/iran-confirms-sentencing-of-washington-post-reporter-jason/: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=heraldvoice.com
URL filtered: http://www.bloomberg.com/news/articles/2015-12-26/japan-coast-guard-says-three-chinese-ships-near-senkaku-islands


Processing URLs:  70%|██████▉   | 695/1000 [29:40<10:12,  2.01s/it]

Error extracting text from http://www.cctv-america.com/2016/09/19/farc-and-colombian-government-close-to-signing-peace-deal: 403 Client Error: Forbidden for url: http://america.cgtn.com/2016/09/19/farc-and-colombian-government-close-to-signing-peace-deal


Processing URLs:  70%|██████▉   | 696/1000 [29:41<09:03,  1.79s/it]

Error extracting text from http://www.payvand.com/news/15/dec/1134.html&quot: 404 Client Error: Not Found for url: http://www.payvand.com/news/15/dec/1134.html&quot
URL filtered: https://www.voanews.com/a/twitter-tells-congress-shut-down-two-hundred-accounts-linked-to-russia/4048833.html


Processing URLs:  70%|██████▉   | 698/1000 [29:43<07:23,  1.47s/it]

Error extracting text from http://tass.ru/en/defense/847901: 404 Client Error: Not Found for url: https://tass.ru/en/defense/847901


Processing URLs:  70%|███████   | 705/1000 [29:53<05:34,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-mali-attacks-idUSKCN0W42G2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mali-attacks-idUSKCN0W42G2


Processing URLs:  71%|███████   | 709/1000 [29:58<06:44,  1.39s/it]

Error extracting text from https://www.wpr.org/russian-hackers-may-disrupt-civil-society-groups-next: 404 Client Error: Not Found for url: https://www.wpr.org/russian-hackers-may-disrupt-civil-society-groups-next


Processing URLs:  71%|███████   | 712/1000 [30:01<05:06,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-jaafari-idUSKBN16B0B2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-jaafari-idUSKBN16B0B2


Processing URLs:  71%|███████▏  | 714/1000 [30:02<03:33,  1.34it/s]

Error extracting text from https://www.oddschecker.com/politics/european-politics/german-politics/next-chancellor: 403 Client Error: Forbidden for url: https://www.oddschecker.com/politics/european-politics/german-politics/next-chancellor


Processing URLs:  72%|███████▏  | 715/1000 [30:03<02:46,  1.71it/s]

Error extracting text from https://www.tribuneindia.com/news/world/china-sharpens-language-warns-taiwan-that-independence-means-war-204579: 403 Client Error: Forbidden for url: https://www.tribuneindia.com/news/world/china-sharpens-language-warns-taiwan-that-independence-means-war-204579


Processing URLs:  72%|███████▏  | 717/1000 [30:05<04:36,  1.02it/s]

Error extracting text from http://www.ibtimes.com/chinese-consumers-bought-nearly-300-more-electric-cars-year-compared-2014-2222591: 403 Client Error: Forbidden for url: https://www.ibtimes.com/chinese-consumers-bought-nearly-300-more-electric-cars-year-compared-2014-2222591


Processing URLs:  72%|███████▏  | 719/1000 [30:13<10:53,  2.32s/it]

Error extracting text from https://www.state.gov/documents/organization/202926.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/


Processing URLs:  72%|███████▏  | 720/1000 [30:15<09:40,  2.07s/it]

URL filtered: https://news.sky.com/story/ebola-third-case-of-virus-in-a-week-discovered-in-democratic-republic-of-congo-12216373?dcmp=snt-sf-twitter


Processing URLs:  72%|███████▎  | 725/1000 [30:34<13:16,  2.90s/it]

Error extracting text from https://www.nytimes.com/2017/09/04/world/asia/north-korea-nuclear-south-us-alliance.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/04/world/asia/north-korea-nuclear-south-us-alliance.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region


Processing URLs:  73%|███████▎  | 727/1000 [30:38<11:10,  2.45s/it]

Error extracting text from http://www.channelnewsasia.com/news/business/singapore/china-based-bhg-retail/2315844.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/business/singapore/china-based-bhg-retail/2315844.html


Processing URLs:  73%|███████▎  | 731/1000 [30:45<07:11,  1.60s/it]

Error extracting text from https://www.un.org/press/en/2021/ga12339.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2021/ga12339.doc.htm


Processing URLs:  73%|███████▎  | 734/1000 [30:48<04:46,  1.08s/it]

Error extracting text from https://news.mongabay.com/2021/06/the-brazilian-amazon-is-burning-again/: 403 Client Error: Forbidden for url: https://news.mongabay.com/2021/06/the-brazilian-amazon-is-burning-again/


Processing URLs:  74%|███████▎  | 737/1000 [30:55<08:14,  1.88s/it]

Error extracting text from http://africanarguments.org/2016/02/03/let-us-be-heard-burundis-refugees-tell-stories-of-ethnic-targeting/: 403 Client Error: Forbidden for url: http://africanarguments.org/2016/02/03/let-us-be-heard-burundis-refugees-tell-stories-of-ethnic-targeting/


Processing URLs:  74%|███████▍  | 740/1000 [31:00<06:51,  1.58s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-shirqat-idUSKCN11Q01Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-shirqat-idUSKCN11Q01Q


Processing URLs:  74%|███████▍  | 741/1000 [31:29<42:25,  9.83s/it]

Error extracting text from http://data.cnbc.com/quotes/@LCO.1: 504 Server Error: Gateway Time-out for url: http://data.cnbc.com/quotes/@LCO.1


Processing URLs:  74%|███████▍  | 744/1000 [31:36<20:45,  4.87s/it]

Error extracting text from https://reut.rs/2YI8a52: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-china-military/west-concerned-by-closer-russia-china-ties-top-nato-general-says-idUSKBN2A32KK?il=0


Processing URLs:  75%|███████▍  | 746/1000 [31:38<12:37,  2.98s/it]

Error extracting text from https://www.givedirectly.org/ubi-study/: 403 Client Error: Forbidden for url: https://www.givedirectly.org/ubi-study/


Processing URLs:  75%|███████▍  | 748/1000 [31:41<08:59,  2.14s/it]

Error extracting text from https://www.courtlistener.com/docket/4609586/waymo-llc-v-uber-technologies-inc/?filed_after=&amp;filed_before=&amp;entry_gte=&amp;entry_lte=&amp;order_by=desc: 403 Client Error: Forbidden for url: https://www.courtlistener.com/docket/4609586/waymo-llc-v-uber-technologies-inc/?filed_after=&amp;filed_before=&amp;entry_gte=&amp;entry_lte=&amp;order_by=desc


Processing URLs:  75%|███████▌  | 750/1000 [31:43<05:55,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-govt-idUSKCN0Z8184: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-govt-idUSKCN0Z8184


Processing URLs:  76%|███████▌  | 757/1000 [31:50<03:51,  1.05it/s]

Error extracting text from http://in.reuters.com/article/india-oil-imports-idINKBN0P41GI20150624: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  76%|███████▌  | 758/1000 [31:51<04:08,  1.03s/it]

Error extracting text from http://www.news.com.au/world/breaking-news/265-deaths-in-failed-turkey-coup/news-story/1111b2a30416fb70d83299cf505612f1: 404 Client Error: Not Found for url: https://www.news.com.au/404.php


Processing URLs:  76%|███████▌  | 760/1000 [31:53<04:31,  1.13s/it]

URL filtered: https://www.forbes.com/sites/larrydownes/2017/03/06/youtube-tv-improves-outlook-for-att-time-warner-merger/#5f2c38d3e597


Processing URLs:  77%|███████▋  | 768/1000 [32:07<05:36,  1.45s/it]

Error extracting text from https://seekingalpha.com/article/4347603-american-airlines-possible-path-to-bankruptcy: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4347603-american-airlines-possible-path-to-bankruptcy


Processing URLs:  77%|███████▋  | 773/1000 [32:22<09:45,  2.58s/it]

Error extracting text from http://www.economist.com/blogs/graphicdetail/2016/01/graphics-political-and-economic-guide-venezuela: 404 Client Error: Not Found for url: https://www.economist.com/blogs/graphicdetail/2016/01/graphics-political-and-economic-guide-venezuela
Error extracting text from http://www.opec.org/opec_web/en/press_room/3146.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/3146.htm


Processing URLs:  78%|███████▊  | 775/1000 [32:22<05:52,  1.57s/it]

Error extracting text from https://www.americaeconomia.com/analisis-opinion/brasil-ganara-bolsonaro-las-elecciones-de-2022: 404 Client Error: Not Found for url: https://www.americaeconomia.com/analisis-opinion/brasil-ganara-bolsonaro-las-elecciones-de-2022


Processing URLs:  78%|███████▊  | 777/1000 [33:25<1:03:25, 17.06s/it]

Error extracting text from http://www.usnews.com/opinion/articles/2016-09-15/natalia-gherman-should-be-the-next-un-secretary-general: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  78%|███████▊  | 780/1000 [33:28<25:18,  6.90s/it]  

Error extracting text from http://news.yahoo.com/trump-picked-stock-fraud-felon-senior-adviser-085416435--finance.html: 404 Client Error: Not Found for url: http://news.yahoo.com/trump-picked-stock-fraud-felon-senior-adviser-085416435--finance.html


Processing URLs:  78%|███████▊  | 784/1000 [33:34<10:01,  2.79s/it]

Error extracting text from http://blogs.wsj.com/chinarealtime/2016/04/18/think-chinas-xi-jinping-is-in-trouble-think-again/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/chinarealtime/2016/04/18/think-chinas-xi-jinping-is-in-trouble-think-again/


Processing URLs:  79%|███████▊  | 787/1000 [33:41<08:59,  2.53s/it]

Error extracting text from http://www.presstv.com/Detail/2016/01/07/444830/Venezuela-Maduro-cabinet-reshuffle-Luis-Salas: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2016/01/07/444830/Venezuela-Maduro-cabinet-reshuffle-Luis-Salas
URL filtered: https://m.youtube.com/watch?v=yPbNlHxkHEs


Processing URLs:  79%|███████▉  | 793/1000 [33:50<06:06,  1.77s/it]

Error extracting text from http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/targeted_attacks_against_the_energy_sector.pdf: 404 Client Error: Not Found for url: https://www.broadcom.com/404-symantec?sourceURL=http://www.symantec.com/content/en/us/enterprise/media/security_response/whitepapers/targeted_attacks_against_the_energy_sector.pdf?


Processing URLs:  80%|████████  | 800/1000 [34:01<04:39,  1.40s/it]

Error extracting text from http://en.trend.az/iran/nuclearp/2439987.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/nuclearp/2439987.html
Error extracting text from http://www.reuters.com/article/us-emirates-iran-gold-idUSBRE89M0SW20121023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-emirates-iran-gold-idUSBRE89M0SW20121023


Processing URLs:  80%|████████  | 801/1000 [34:01<03:25,  1.03s/it]

Error extracting text from http://www.nytimes.com/2015/12/30/sports/cricket/pakistan-cricket-mired-in-threefold-controversy.html?emc=edit_ee_20151230&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/30/sports/cricket/pakistan-cricket-mired-in-threefold-controversy.html?emc=edit_ee_20151230&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0


Processing URLs:  80%|████████  | 804/1000 [34:04<03:47,  1.16s/it]

URL filtered: https://www.youtube.com/watch?v=SqYc8WJzaHw


Processing URLs:  81%|████████  | 810/1000 [34:12<04:13,  1.34s/it]

Error extracting text from http://www.wsj.com/articles/the-u-s-senate-takes-on-iran-1458605304: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-u-s-senate-takes-on-iran-1458605304


Processing URLs:  81%|████████  | 812/1000 [34:14<03:18,  1.06s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-referendum-poll-idUSKBN17F0V3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-poll-idUSKBN17F0V3


Processing URLs:  81%|████████▏ | 814/1000 [34:16<03:28,  1.12s/it]

Error extracting text from http://in.reuters.com/article/2015/11/02/opec-report-idINKCN0SR1VY20151102: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  82%|████████▏ | 815/1000 [34:22<07:23,  2.40s/it]

Error extracting text from http://www.acleddata.com/data/realtime-data-2016: 404 Client Error: Not Found for url: https://acleddata.com/data/realtime-data-2016
Error extracting text from https://www.reuters.com/lifestyle/sports/biden-says-us-considering-diplomatic-boycott-beijing-olympics-2021-11-18/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/biden-says-us-considering-diplomatic-boycott-beijing-olympics-2021-11-18/


Processing URLs:  82%|████████▏ | 820/1000 [34:27<03:50,  1.28s/it]

Error extracting text from https://legal.un.org/repertory/art27.shtml: HTTPSConnectionPool(host='legal.un.org', port=443): Max retries exceeded with url: /repertory/art27.shtml (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
Error extracting text from http://www.nytimes.com/2016/05/27/opinion/a-saudi-ipo-buyer-beware.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/27/opinion/a-saudi-ipo-buyer-beware.html


Processing URLs:  82%|████████▏ | 823/1000 [34:34<04:59,  1.69s/it]

Error extracting text from http://uk.reuters.com/article/uk-germany-election-spd-idUKKBN18Z2SA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.reuters.com/article/us-global-markets-idUSKCN0YG01L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-markets-idUSKCN0YG01L


Processing URLs:  83%|████████▎ | 826/1000 [34:35<02:44,  1.06it/s]

Error extracting text from http://www.wsj.com/articles/peruvians-voting-for-a-new-president-and-congress-1460293200: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/peruvians-voting-for-a-new-president-and-congress-1460293200


Processing URLs:  83%|████████▎ | 830/1000 [34:46<05:57,  2.10s/it]

Error extracting text from http://evobsession.com/2-things-to-realize-about-2015-tesla-sales-projections/: 403 Client Error: Forbidden for url: http://evobsession.com/2-things-to-realize-about-2015-tesla-sales-projections/


Processing URLs:  83%|████████▎ | 831/1000 [34:47<04:41,  1.66s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/page/ct-perspec-page-trolls-russia-trump-blacktivist-1029-20171027-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/page/ct-perspec-page-trolls-russia-trump-blacktivist-1029-20171027-story.html


Processing URLs:  84%|████████▎ | 835/1000 [34:50<03:06,  1.13s/it]

Error extracting text from http://nyti.ms/1nUeqCo: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/world/asia/north-korea-moves-up-rocket-launching-plan.html?smid=pl-share


Processing URLs:  84%|████████▎ | 837/1000 [34:54<03:41,  1.36s/it]

Error extracting text from https://www.realclearpolitics.com/epolls/2017/house/ga/georgia_6th_district_runoff_election_handel_vs_ossoff-6202.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2017/house/ga/georgia_6th_district_runoff_election_handel_vs_ossoff-6202.html#polls


Processing URLs:  84%|████████▍ | 838/1000 [34:54<02:44,  1.02s/it]

Error extracting text from https://www.nytimes.com/reuters/2017/07/30/world/africa/30reuters-somalia-attacks-offical.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/07/30/world/africa/30reuters-somalia-attacks-offical.html


Processing URLs:  84%|████████▍ | 840/1000 [34:59<04:37,  1.73s/it]

Error extracting text from http://www.ibtimes.com/myspace-saves-time-inc-kicks-first-revenue-growth-over-year-2364623: 403 Client Error: Forbidden for url: https://www.ibtimes.com/myspace-saves-time-inc-kicks-first-revenue-growth-over-year-2364623


Processing URLs:  84%|████████▍ | 841/1000 [35:02<05:24,  2.04s/it]

Error extracting text from https://jobs.lever.co/faradayfuture: 404 Client Error: Not Found for url: https://jobs.lever.co/faradayfuture
URL filtered: https://www.bloomberg.com/news/articles/2017-07-31/potential-u-s-oil-sanctions-increase-risk-of-venezuelan-default


Processing URLs:  84%|████████▍ | 843/1000 [35:02<03:02,  1.16s/it]

Error extracting text from https://www.nytimes.com/2017/08/02/opinion/pence-trump-russia-europe.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/02/opinion/pence-trump-russia-europe.html


Processing URLs:  85%|████████▍ | 846/1000 [35:06<02:34,  1.00s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=fr&amp;u=http://www.aumilitaire.com/forum/topic/22948-l%25E2%2580%2599efficacit%25C3%25A9-discr%25C3%25A8te-des-instructeurs-militaires-fran%25C3%25A7ais-affect%25C3%25A9s-aupr%25C3%25A8s-des-forces-irakiennes/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=fr&amp;u=http://www.aumilitaire.com/forum/topic/22948-l%25E2%2580%2599efficacit%25C3%25A9-discr%25C3%25A8te-des-instructeurs-militaires-fran%25C3%25A7ais-affect%25C3%25A9s-aupr%25C3%25A8s-des-forces-irakiennes/&amp;prev=search


Processing URLs:  85%|████████▍ | 848/1000 [35:08<02:19,  1.09it/s]

Error extracting text from http://www.arabnews.com/node/1027946/saudi-arabia: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1027946/saudi-arabia


Processing URLs:  85%|████████▌ | 851/1000 [35:11<02:08,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-iexgroup-sec-idUSKBN0U123L20151218: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iexgroup-sec-idUSKBN0U123L20151218


Processing URLs:  85%|████████▌ | 853/1000 [35:14<02:41,  1.10s/it]

Error extracting text from http://www.businessinsider.com/r-philippines-plans-tracking-system-for-civilian-flights-over-disputed-sea-2016-1?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-philippines-plans-tracking-system-for-civilian-flights-over-disputed-sea-2016-1?IR=T


Processing URLs:  86%|████████▌ | 858/1000 [35:21<02:40,  1.13s/it]

Error extracting text from http://www.caracaschronicles.com/2016/09/18/pdvsa-swapeameesta/: 403 Client Error: Forbidden for url: http://www.caracaschronicles.com/2016/09/18/pdvsa-swapeameesta/


Processing URLs:  86%|████████▌ | 859/1000 [35:22<02:11,  1.07it/s]

Error extracting text from http://blogs.reuters.com/breakingviews/2015/12/03/saudi-has-chance-to-teach-hedge-funds-a-lesson/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /breakingviews/2015/12/03/saudi-has-chance-to-teach-hedge-funds-a-lesson/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe74dbe0>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  86%|████████▌ | 860/1000 [35:22<01:53,  1.24it/s]

Error extracting text from http://newsinfo.inquirer.net/780653/us-ups-pressure-on-isis-with-first-b-52-bomber-strike: 403 Client Error: Forbidden for url: https://newsinfo.inquirer.net/780653/us-ups-pressure-on-isis-with-first-b-52-bomber-strike


Processing URLs:  86%|████████▌ | 861/1000 [35:23<01:56,  1.20it/s]

Error extracting text from http://www.hydroworld.com/articles/2017/08/hindustan-construction-secures-epc-contract-for-india-s-93-mw-new-ganderbal-hydropower-plant.html: 403 Client Error: Forbidden for url: https://www.hydroreview.com/


Processing URLs:  87%|████████▋ | 871/1000 [35:47<02:57,  1.38s/it]

Error extracting text from https://www.reuters.com/business/energy/german-court-dismisses-challenge-nord-stream-2-pipeline-consortium-2021-08-25/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/german-court-dismisses-challenge-nord-stream-2-pipeline-consortium-2021-08-25/


Processing URLs:  87%|████████▋ | 874/1000 [35:50<02:03,  1.02it/s]

Error extracting text from http://www.nytimes.com/2016/07/16/nyregion/zika-virus-female-to-male-sexual-transmission.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/16/nyregion/zika-virus-female-to-male-sexual-transmission.html


Processing URLs:  88%|████████▊ | 878/1000 [35:53<01:25,  1.43it/s]

Error extracting text from https://www.sciencemag.org/news/2021/06/claim-chinese-team-hid-early-sars-cov-2-sequences-stymie-origin-hunt-sparks-furor: 403 Client Error: Forbidden for url: https://www.science.org/news/2021/06/claim-chinese-team-hid-early-sars-cov-2-sequences-stymie-origin-hunt-sparks-furor
Error extracting text from http://www.reuters.com/article/us-oil-opec-talks-idUSKCN0WI0Q2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-opec-talks-idUSKCN0WI0Q2
Error extracting text from http://www.reuters.com/article/2015/09/15/us-iran-nuclear-setad-insight-idUSKCN0RF0E920150915: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/15/us-iran-nuclear-setad-insight-idUSKCN0RF0E920150915


Processing URLs:  88%|████████▊ | 880/1000 [35:54<00:50,  2.36it/s]

Error extracting text from http://www.reuters.com/article/us-un-election-idUSKCN10E20G?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-election-idUSKCN10E20G?il=0


Processing URLs:  88%|████████▊ | 883/1000 [36:18<07:24,  3.80s/it]

Error extracting text from https://www.biblicalarchaeology.org/daily/ancient-cultures/ancient-near-eastern-world/copper-ingots-ancient-shipwreck-turkish-coast/: 403 Client Error: Forbidden for url: https://www.biblicalarchaeology.org/daily/ancient-cultures/ancient-near-eastern-world/copper-ingots-ancient-shipwreck-turkish-coast/


Processing URLs:  88%|████████▊ | 885/1000 [36:19<04:05,  2.14s/it]

Error extracting text from http://www.lockheedmartin.com/gbsd: 404 Client Error: Not Found for url: https://www.lockheedmartin.com/gbsd
Error extracting text from http://www.reuters.com/article/us-hongkong-politics-idUSKBN13Y0NA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-politics-idUSKBN13Y0NA


Processing URLs:  89%|████████▊ | 887/1000 [36:22<03:13,  1.72s/it]

URL filtered: https://www.bloomberglaw.com/product/blaw/document/OVRMTYSYF01S?bc=W1siU2VhcmNoIFJlc3VsdHMiLCIvcHJvZHVjdC9ibGF3L3NlYXJjaC9yZXN1bHRzLzg3YjljYWI3NDU4ZGI4MmQxMmQxNWQ3ZmQ0YzgyZGEyIl1d--0b21224ceb8d77c9a3ed64bed71dabb198b57e85&amp;bestStory=true&amp;headlineOnly=false&amp;highlight=PDVSA
URL filtered: http://www.bloomberg.com/news/articles/2016-05-04/two-tesla-production-chiefs-to-leave-ahead-of-biggest-challenge-yet


Processing URLs:  89%|████████▉ | 890/1000 [36:24<02:00,  1.09s/it]

Error extracting text from http://calhoun.nps.edu/bitstream/handle/10945/30190/soviettheoryofre00chot.pdf?sequence=1: 403 Client Error: Forbidden for url: https://calhoun.nps.edu/bitstream/handle/10945/30190/soviettheoryofre00chot.pdf?sequence=1


Processing URLs:  89%|████████▉ | 894/1000 [36:33<02:57,  1.67s/it]

Error extracting text from https://www.ghi.gov/wherewework/docs/dominicanrepublicstrategy.pdf: HTTPSConnectionPool(host='www.ghi.gov', port=443): Max retries exceeded with url: /wherewework/docs/dominicanrepublicstrategy.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30060e780>: Failed to resolve 'www.ghi.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|████████▉ | 897/1000 [36:35<01:49,  1.07s/it]

Error extracting text from https://www.top1000funds.com/wp-content/uploads/2020/11/Seeking-income-private-debt_MG.pdf: 403 Client Error: Forbidden for url: https://www.top1000funds.com/wp-content/uploads/2020/11/Seeking-income-private-debt_MG.pdf


Processing URLs:  90%|█████████ | 901/1000 [36:40<01:57,  1.19s/it]

URL filtered: https://twitter.com/phildstewart/status/1498422706567585794


Processing URLs:  90%|█████████ | 903/1000 [36:40<01:12,  1.35it/s]

Error extracting text from http://theiowarepublican.com/2015/trumps-impending-demise-has-been-greatly-overstated/: 404 Client Error: Not Found for url: http://theiowarepublican.com/2015/trumps-impending-demise-has-been-greatly-overstated/


Processing URLs:  90%|█████████ | 905/1000 [36:42<01:19,  1.19it/s]

Error extracting text from http://www.biznews.com/leadership/2016/10/25/south-africans-mobilise-to-save-sa-from-state-capture-can-they-eject-zuma: 404 Client Error: Not Found for url: https://www.biznews.com/leadership/2016/10/25/south-africans-mobilise-to-save-sa-from-state-capture-can-they-eject-zuma


Processing URLs:  91%|█████████ | 907/1000 [36:49<02:50,  1.84s/it]

Error extracting text from https://lfpress.com/news/local-news/omicron-moving-quicker-than-we-can-identify-cases-londons-top-doctor: 403 Client Error: Forbidden for url: https://lfpress.com/news/local-news/omicron-moving-quicker-than-we-can-identify-cases-londons-top-doctor


Processing URLs:  91%|█████████ | 908/1000 [36:51<02:45,  1.79s/it]

Error extracting text from http://tass.ru/en/defense/840427: 404 Client Error: Not Found for url: https://tass.ru/en/defense/840427


Processing URLs:  91%|█████████ | 910/1000 [37:14<08:50,  5.90s/it]

Error extracting text from http://www.aikenstandard.com/20161003/161009890/russia-cancels-plutonium-agreement-leaving-mox-future-uncertain: 404 Client Error: Not Found for url: https://www.postandcourier.com/aikenstandard/20161003/161009890/russia-cancels-plutonium-agreement-leaving-mox-future-uncertain/


Processing URLs:  91%|█████████▏| 914/1000 [37:36<05:44,  4.01s/it]

Error extracting text from http://www.nytimes.com/2015/12/03/world/americas/brazil-president-faces-prospect-of-impeachment.html?ref=world&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/world/americas/brazil-president-faces-prospect-of-impeachment.html?ref=world&amp;_r=0
URL filtered: http://www.bloomberg.com/news/articles/2015-08-10/ravaged-by-oil-s-collapse-venezuela-now-has-a-big-gold-problem


Processing URLs:  92%|█████████▏| 921/1000 [37:42<01:20,  1.02s/it]

Error extracting text from http://www.nytimes.com/2015/11/27/world/europe/paris-attacks-have-many-in-france-eager-to-join-the-fight.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/27/world/europe/paris-attacks-have-many-in-france-eager-to-join-the-fight.html
Error extracting text from http://www.reuters.com/article/us-finland-russia-idUSKCN1270ID: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-finland-russia-idUSKCN1270ID


Processing URLs:  92%|█████████▎| 925/1000 [37:49<01:32,  1.24s/it]

Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-swap-idUSL1N1CJ026: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-swap-idUSL1N1CJ026


Processing URLs:  93%|█████████▎| 927/1000 [37:53<01:59,  1.64s/it]

Error extracting text from http://www.transparency.org/cpi2015#results-table: 404 Client Error: Not Found for url: https://www.transparency.org/en/cpi2015#results-table


Processing URLs:  93%|█████████▎| 928/1000 [38:53<22:55, 19.10s/it]

Error extracting text from http://www.sacbee.com/news/state/california/water-and-drought/article54957530.html: HTTPConnectionPool(host='www.sacbee.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 931/1000 [39:00<08:59,  7.82s/it]

Error extracting text from http://www.pwccn.com/home/eng/pr_020715.html: 404 Client Error: Not Found for url: https://www.pwccn.com/home/eng/pr_020715.html


Processing URLs:  93%|█████████▎| 934/1000 [39:05<04:11,  3.82s/it]

Error extracting text from http://time.com/4714169/canada-marijuana-pot-weed-legalization-2018-justin-trudeau/: 404 Client Error: Not Found for url: https://time.com/4714169/canada-marijuana-pot-weed-legalization-2018-justin-trudeau/


Processing URLs:  94%|█████████▎| 935/1000 [39:07<03:21,  3.09s/it]

Error extracting text from http://www.newsweek.com/trump-putin-russia-interfered-presidential-election-541302: 403 Client Error: Forbidden for url: https://www.newsweek.com/trump-putin-russia-interfered-presidential-election-541302


Processing URLs:  94%|█████████▎| 937/1000 [39:11<02:39,  2.53s/it]

Error extracting text from http://debka.com/article/25333/Russian-nuclear-capable-Iskander-missiles-deployed-in-Syria: HTTPSConnectionPool(host='debka.com', port=443): Max retries exceeded with url: /article/25333/Russian-nuclear-capable-Iskander-missiles-deployed-in-Syria (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  94%|█████████▍| 938/1000 [39:12<02:08,  2.07s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-18/billionaire-ally-of-putin-socialized-with-kushner-ivanka-trump


Processing URLs:  95%|█████████▍| 946/1000 [39:26<01:53,  2.11s/it]

Error extracting text from http://www.hackmageddon.com/2014/07/07/june-2014-cyber-attacks-statistics/: 403 Client Error: Forbidden for url: http://www.hackmageddon.com/2014/07/07/june-2014-cyber-attacks-statistics/


Processing URLs:  95%|█████████▍| 947/1000 [39:28<01:41,  1.91s/it]

Error extracting text from https://tass.com/society/834027: 502 Server Error: Bad Gateway for url: https://tass.com/society/834027


Processing URLs:  96%|█████████▌| 955/1000 [39:39<01:03,  1.40s/it]

Error extracting text from http://www.ibtimes.com/ethiopia-crackdown-dissent-least-140-people-killed-during-protests-ethnic-oromia-2259225: 403 Client Error: Forbidden for url: https://www.ibtimes.com/ethiopia-crackdown-dissent-least-140-people-killed-during-protests-ethnic-oromia-2259225


Processing URLs:  96%|█████████▌| 959/1000 [40:17<04:11,  6.14s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-germany-exclusive-idUSKBN17T2IY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-germany-exclusive-idUSKBN17T2IY
URL filtered: https://twitter.com/Kriseman/status/1358108483116298240


Processing URLs:  96%|█████████▋| 963/1000 [40:20<01:24,  2.29s/it]

Error extracting text from https://www.nytimes.com/2022/01/29/world/asia/north-korea-missile-test.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/29/world/asia/north-korea-missile-test.html
Error extracting text from https://www.reuters.com/article/us-pdvsa-nustar-ener-storage-exclusive/exclusive-pdvsa-blocked-from-using-nustar-terminal-over-unpaid-bills-idUSKBN1CP23L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-pdvsa-nustar-ener-storage-exclusive/exclusive-pdvsa-blocked-from-using-nustar-terminal-over-unpaid-bills-idUSKBN1CP23L


Processing URLs:  96%|█████████▋| 965/1000 [40:24<01:20,  2.30s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-offensive-idUSKCN0YY07K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-offensive-idUSKCN0YY07K


Processing URLs:  97%|█████████▋| 968/1000 [40:25<00:38,  1.20s/it]

Error extracting text from https://www.reuters.com/article/us-palestinians-reconciliation-rafah/egypt-gaza-border-opens-under-pa-control-for-first-time-in-a-decade-idUSKBN1DI0C9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-palestinians-reconciliation-rafah/egypt-gaza-border-opens-under-pa-control-for-first-time-in-a-decade-idUSKBN1DI0C9
URL filtered: http://www.bloomberg.com/news/articles/2015-09-07/in-hungry-venezuela-a-brazil-beef-giant-has-extraordinary-power


Processing URLs:  97%|█████████▋| 974/1000 [40:29<00:22,  1.17it/s]

Error extracting text from https://www.prosieben.de/amp/tv/galileo/videos/2017258-donnerstag-die-macht-der-lobbies-ganze-folge: 404 Client Error: Not Found for url: https://www.prosieben.de/tv/galileo/videos/2017258-donnerstag-die-macht-der-lobbies-ganze-folge


Processing URLs:  98%|█████████▊| 977/1000 [40:30<00:11,  2.00it/s]

Error extracting text from http://thehill.com/homenews/administration/359894-trump-slams-former-us-intel-leaders-as-political-hacks: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/359894-trump-slams-former-us-intel-leaders-as-political-hacks/
Error extracting text from https://www.reuters.com/lifestyle/sports/beijing-2022-winter-games-will-need-spectators-ioc-2021-07-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/beijing-2022-winter-games-will-need-spectators-ioc-2021-07-21/
URL filtered: http://www.bloomberg.com/news/articles/2015-09-21/volkswagen-said-to-be-target-of-u-s-criminal-probe-on-emissions


Processing URLs:  98%|█████████▊| 980/1000 [40:34<00:16,  1.18it/s]

Error extracting text from https://www.nytimes.com/2022/03/25/us/politics/manchin-jackson-supreme-court.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/03/25/us/politics/manchin-jackson-supreme-court.html


Processing URLs:  98%|█████████▊| 981/1000 [40:35<00:17,  1.06it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKBN0TW01D20151213: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN0TW01D20151213


Processing URLs:  98%|█████████▊| 982/1000 [40:36<00:13,  1.31it/s]

Error extracting text from https://www.nytimes.com/2017/03/23/us/politics/neil-gorsuch-supreme-court-hearing.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/23/us/politics/neil-gorsuch-supreme-court-hearing.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  98%|█████████▊| 984/1000 [40:39<00:17,  1.12s/it]

Error extracting text from https://reut.rs/3c7MsyE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-china-southchinasea/philippines-says-220-chinese-boats-have-encroached-in-south-china-sea-idUSKBN2BD0DE?il=0


Processing URLs:  99%|█████████▊| 986/1000 [40:40<00:12,  1.09it/s]

Error extracting text from http://www.straitstimes.com/world/middle-east/putin-says-russia-has-no-right-to-ask-syrias-assad-to-leave-power-will-visit-iran: 403 Client Error: Forbidden for url: https://www.straitstimes.com/world/middle-east/putin-says-russia-has-no-right-to-ask-syrias-assad-to-leave-power-will-visit-iran


Processing URLs:  99%|█████████▊| 987/1000 [40:41<00:12,  1.00it/s]

Error extracting text from http://www.wsj.com/articles/eu-extends-sanctions-on-russian-officials-over-ukraine-crisis-1473937849: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-extends-sanctions-on-russian-officials-over-ukraine-crisis-1473937849
Error extracting text from https://www.reuters.com/article/us-aid-innovation-drones/its-a-bird-its-a-plane-its-an-edible-aid-drone-idUSKBN15Z1TG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-aid-innovation-drones/its-a-bird-its-a-plane-its-an-edible-aid-drone-idUSKBN15Z1TG


Processing URLs:  99%|█████████▉| 989/1000 [41:04<01:01,  5.59s/it]

Error extracting text from http://www.asianews.network/content/opinion-will-aung-san-suu-kyi-be-myanmars-next-president-9254: 404 Client Error: Not Found for url: http://asianews.network/content/opinion-will-aung-san-suu-kyi-be-myanmars-next-president-9254


Processing URLs: 100%|█████████▉| 995/1000 [41:21<00:22,  4.43s/it]

Error extracting text from https://www.business-standard.com/article/international/netanyahu-toughens-stand-as-tension-rises-between-israel-and-hamas-121052000176_1.html: 403 Client Error: Forbidden for url: https://www.business-standard.com/article/international/netanyahu-toughens-stand-as-tension-rises-between-israel-and-hamas-121052000176_1.html
URL filtered: http://www.bloomberg.com/news/articles/2016-02-17/iran-supports-oil-producer-pact-without-pledging-supply-curbs?cmpid=wsdemand


Processing URLs: 100%|█████████▉| 998/1000 [41:22<00:04,  2.09s/it]

Error extracting text from http://www.jamestown.org/programs/chinabrief/single/?tx_ttnews[tt_news]=45181&amp;tx_ttnews[backPid]=25&amp;cHash=08d97185db7e22db1968e8920fd767bd#.VuM4EvkrLIU: HTTPConnectionPool(host='www.jamestown.org', port=80): Max retries exceeded with url: /programs/chinabrief/single/?tx_ttnews%5Btt_news%5D=45181&amp;tx_ttnews%5BbackPid%5D=25&amp;cHash=08d97185db7e22db1968e8920fd767bd (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303cafdd0>: Failed to resolve 'www.jamestown.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs: 100%|██████████| 1000/1000 [41:23<00:00,  2.48s/it]


URL filtered: http://www.bloomberg.com/news/videos/2016-08-23/quicktake-corporate-china-s-global-shopping-spree


Processing URLs:   0%|          | 2/1000 [00:04<33:42,  2.03s/it]

Error extracting text from http://www.newsweek.com/uganda-elections-museveni-besigye-and-mbabazi-prepare-face-polls-414541?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/uganda-elections-museveni-besigye-and-mbabazi-prepare-face-polls-414541?rx=us


Processing URLs:   0%|          | 5/1000 [00:07<20:43,  1.25s/it]

Error extracting text from https://www.armscontrolwonk.com/archive/1213011/wasted-opportunities-with-the-jcpoa/: HTTPSConnectionPool(host='www.armscontrolwonk.com', port=443): Max retries exceeded with url: /archive/1213011/wasted-opportunities-with-the-jcpoa/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:   1%|          | 10/1000 [00:16<27:52,  1.69s/it]

Error extracting text from https://www.senate.gov/legislative/LIS/roll_call_lists/roll_call_vote_cfm.cfm?congress=115&amp;session=1&amp;vote=00168: 403 Client Error: Forbidden for url: https://www.senate.gov/legislative/LIS/roll_call_lists/roll_call_vote_cfm.cfm?congress=115&amp;session=1&amp;vote=00168


Processing URLs:   1%|▏         | 14/1000 [00:21<21:34,  1.31s/it]

Error extracting text from http://www.reuters.com/video/2017/08/11/saudis-favor-new-york-for-aramco-ipo?videoId=372302106&amp;videoChannel=-13668: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2017/08/11/saudis-favor-new-york-for-aramco-ipo?videoId=372302106&amp;videoChannel=-13668


Processing URLs:   2%|▏         | 16/1000 [00:24<22:56,  1.40s/it]

URL filtered: https://twitter.com/planet4589/status/1389245047829417991


Processing URLs:   2%|▏         | 18/1000 [00:25<16:44,  1.02s/it]

Error extracting text from https://townhall.com/tipsheet/spencerbrown/2021/07/21/house-democrats-block-covid-origins-bill-n2592839: 403 Client Error: Forbidden for url: https://townhall.com/tipsheet/spencerbrown/2021/07/21/house-democrats-block-covid-origins-bill-n2592839


Processing URLs:   2%|▏         | 19/1000 [00:26<13:34,  1.20it/s]

Error extracting text from http://www.wsj.com/articles/putin-criticizes-u-s-policy-in-middle-east-1445533121: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/putin-criticizes-u-s-policy-in-middle-east-1445533121


Processing URLs:   2%|▏         | 23/1000 [00:29<11:27,  1.42it/s]

Error extracting text from http://www.barrons.com/articles/buy-venezuela-bonds-as-regime-change-sanctions-risk-rises-1500939175: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/buy-venezuela-bonds-as-regime-change-sanctions-risk-rises-1500939175


Processing URLs:   3%|▎         | 26/1000 [00:33<18:05,  1.11s/it]

Error extracting text from http://ca.reuters.com/article/topNews/idCAKCN1061T2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:   3%|▎         | 31/1000 [00:47<37:52,  2.34s/it]

Error extracting text from http://www.nytimes.com/2016/06/02/world/asia/treasury-imposes-sanctions-on-north-korea.html?emc=edit_th_20160602&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/02/world/asia/treasury-imposes-sanctions-on-north-korea.html?emc=edit_th_20160602&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:   4%|▎         | 37/1000 [00:56<23:29,  1.46s/it]

Error extracting text from http://thehill.com/homenews/administration/347732-russia-probe-unearths-email-about-attempted-meeting-between-trump: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/347732-russia-probe-unearths-email-about-attempted-meeting-between-trump/


Processing URLs:   4%|▍         | 38/1000 [00:58<23:34,  1.47s/it]

URL filtered: https://www.youtube.com/watch?v=WOvxmuJJYhI


Processing URLs:   4%|▍         | 40/1000 [01:00<22:04,  1.38s/it]

Error extracting text from http://sdc-usa.org/the-us-nuclear-triad/: 404 Client Error: Not Found for url: http://www.sdc-usa.org/the-us-nuclear-triad/


Processing URLs:   4%|▍         | 41/1000 [01:04<30:47,  1.93s/it]

Error extracting text from https://reut.rs/3aSuD6M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   4%|▍         | 44/1000 [01:19<1:09:02,  4.33s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-03/russia-pushing-ahead-with-turkey-gas-link-as-ties-improve


Processing URLs:   5%|▌         | 54/1000 [01:44<41:17,  2.62s/it]  

Error extracting text from https://www.reuters.com/article/us-safrica-politics/zuma-quits-ending-scandal-plagued-term-as-south-african-president-idUSKCN1FY0HM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/zuma-quits-ending-scandal-plagued-term-as-south-african-president-idUSKCN1FY0HM
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-army-idUSKCN0YP0U2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-army-idUSKCN0YP0U2


Processing URLs:   6%|▌         | 57/1000 [01:45<18:56,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-turkey-nato-exclusive-idUSKBN0U123520151218: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-turkey-nato-exclusive-idUSKBN0U123520151218


Processing URLs:   6%|▌         | 60/1000 [01:49<18:56,  1.21s/it]

Error extracting text from http://www.topsecretwriters.com/2016/06/was-nyt-reporter-researching-psychotronic-weapons-murdered/: 503 Server Error: Service Temporarily Unavailable for url: http://www.topsecretwriters.com/2016/06/was-nyt-reporter-researching-psychotronic-weapons-murdered/
URL filtered: https://www.youtube.com/watch?v=brdMWZA8ajs


Processing URLs:   6%|▌         | 62/1000 [01:51<15:03,  1.04it/s]

Error extracting text from https://ec.europa.eu/commission/sites/beta-political/files/draft_withdrawal_agreement_0.pdf: 404 Client Error: (Not Found) for url: https://ec.europa.eu/commission/sites/beta-political/files/draft_withdrawal_agreement_0.pdf


Processing URLs:   6%|▋         | 63/1000 [01:53<20:09,  1.29s/it]

Error extracting text from http://data.unhcr.org/mediterranean/regional.php: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/regional.php


Processing URLs:   6%|▋         | 65/1000 [01:57<24:38,  1.58s/it]

Error extracting text from http://mobile.nytimes.com/2016/04/17/world/middleeast/us-plans-to-step-upmilitary-campaign-against-isis.html?_r=0&amp;referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/04/17/world/middleeast/us-plans-to-step-upmilitary-campaign-against-isis.html?_r=0&amp;referer=https://www.google.com/


Processing URLs:   7%|▋         | 66/1000 [01:57<21:28,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=NKb9GVU8bHE


Processing URLs:   7%|▋         | 68/1000 [01:58<14:43,  1.05it/s]

Error extracting text from http://uk.reuters.com/article/2015/10/20/uk-venezuela-election-brazil-idUKKCN0SE2R220151020: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   7%|▋         | 69/1000 [02:00<16:33,  1.07s/it]

Error extracting text from https://taarifa.rw/rwanda-soldiers-kill-30-insurgents-in-cabo-delgabo/: 404 Client Error: Not Found for url: https://taarifa.rw/rwanda-soldiers-kill-30-insurgents-in-cabo-delgabo/


Processing URLs:   7%|▋         | 72/1000 [02:02<12:37,  1.22it/s]

Error extracting text from https://www.nytimes.com/2017/06/26/business/google-eu-fine-antitrust.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/26/business/google-eu-fine-antitrust.html


Processing URLs:   8%|▊         | 77/1000 [02:12<26:44,  1.74s/it]

Error extracting text from https://reut.rs/3mr4CgN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   8%|▊         | 79/1000 [02:16<29:23,  1.91s/it]

Error extracting text from http://www.enca.com/south-africa/polls-south-africans-disappointed-by-political-leadership: 404 Client Error: Not Found for url: https://www.enca.com/south-africa/polls-south-africans-disappointed-by-political-leadership


Processing URLs:   8%|▊         | 84/1000 [02:28<32:10,  2.11s/it]

Error extracting text from http://www.reuters.com/article/eu-usa-trade-merkel-idUSB4N10001W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/eu-usa-trade-merkel-idUSB4N10001W


Processing URLs:   9%|▉         | 91/1000 [02:41<31:48,  2.10s/it]

Error extracting text from http://www.stateofthemedia.org/2013/news-magazines-embracing-their-digital-future/news-magazines-by-the-numbers/: 404 Client Error: Not Found for url: https://www.pewresearch.org/journalism/2016/06/15/state-of-the-news-media-2016/2013/news-magazines-embracing-their-digital-future/news-magazines-by-the-numbers/
Error extracting text from http://www.bigstory.ap.org/article/c8d531ec05e0403a90e9d3ec0b8f83c2/ap-investigation-us-power-grid-vulnerable-foreign-hacks: HTTPConnectionPool(host='www.bigstory.ap.org', port=80): Max retries exceeded with url: /article/c8d531ec05e0403a90e9d3ec0b8f83c2/ap-investigation-us-power-grid-vulnerable-foreign-hacks (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30748c9e0>: Failed to resolve 'www.bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 94/1000 [02:43<18:14,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-report-idUSKCN1AU2GH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-report-idUSKCN1AU2GH


Processing URLs:  10%|█         | 100/1000 [02:57<34:02,  2.27s/it]

Error extracting text from https://tass.com/defense/1241159: 502 Server Error: Bad Gateway for url: https://tass.com/defense/1241159


Processing URLs:  10%|█         | 101/1000 [02:59<32:21,  2.16s/it]

URL filtered: http://www.usatoday.com/story/tech/news/2017/03/06/facebook-begins-flagging-disputed-fake-news/98804948/
URL filtered: https://www.bloomberg.com/news/articles/2017-03-15/dollar-treads-water-as-traders-look-for-four-dots-from-fed-plot


Processing URLs:  11%|█         | 106/1000 [03:00<11:16,  1.32it/s]

Error extracting text from https://www.reuters.com/article/us-brazil-politics/bolsonaro-allies-win-control-of-brazilian-congress-idUSKBN2A12WY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics/bolsonaro-allies-win-control-of-brazilian-congress-idUSKBN2A12WY?il=0
Error extracting text from http://www.nytimes.com/2016/02/02/world/europe/us-fortifying-europes-east-to-deter-putin.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/02/world/europe/us-fortifying-europes-east-to-deter-putin.html


Processing URLs:  11%|█         | 111/1000 [03:06<18:45,  1.27s/it]

Error extracting text from http://www.ibtimes.com/us-drops-leaflets-mosul-promising-liberation-2335325: 403 Client Error: Forbidden for url: https://www.ibtimes.com/us-drops-leaflets-mosul-promising-liberation-2335325


Processing URLs:  11%|█▏        | 113/1000 [03:07<12:01,  1.23it/s]

Error extracting text from http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=121121: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=121121


Processing URLs:  12%|█▏        | 119/1000 [03:20<20:26,  1.39s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/pakistan-urges-global-community-look-into-meltdown-afghan-forces-2021-08-09/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/pakistan-urges-global-community-look-into-meltdown-afghan-forces-2021-08-09/
Error extracting text from http://news.yahoo.com/farc-rebels-shy-colombia-peace-deadline-174934824.html: 404 Client Error: Not Found for url: http://news.yahoo.com/farc-rebels-shy-colombia-peace-deadline-174934824.html


Processing URLs:  12%|█▏        | 121/1000 [03:24<26:11,  1.79s/it]

Error extracting text from http://www.css.ethz.ch/en/services/digital-library/articles/article.html/08f19ddd-d74c-4077-8872-f6c763ec5836: 404 Client Error: Not found UA for url: https://css.ethz.ch/en/services/digital-library/articles/article.html/08f19ddd-d74c-4077-8872-f6c763ec5836


Processing URLs:  13%|█▎        | 129/1000 [03:35<16:34,  1.14s/it]

Error extracting text from http://www.nytimes.com/2015/10/20/us/politics/gop-candidates-leading-charge-in-call-for-syrian-no-fly-zone.html?emc=edit_th_20151020&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/20/us/politics/gop-candidates-leading-charge-in-call-for-syrian-no-fly-zone.html?emc=edit_th_20151020&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  13%|█▎        | 132/1000 [03:41<23:04,  1.60s/it]

Error extracting text from https://cleantechnica.com/2016/10/27/tesla-record-vehicle-production-delivery-revenue-numbers-q3-2016/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/10/27/tesla-record-vehicle-production-delivery-revenue-numbers-q3-2016/
Error extracting text from https://www.reuters.com/world/us-russia-hold-arctic-talks-push-summit-2021-05-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us-russia-hold-arctic-talks-push-summit-2021-05-19/


Processing URLs:  13%|█▎        | 134/1000 [03:42<14:53,  1.03s/it]

URL filtered: https://www.youtube.com/watch?v=4lQ_MjU4QHw


Processing URLs:  14%|█▎        | 136/1000 [03:42<10:32,  1.36it/s]

Error extracting text from https://www.kogonuso.com/jacinda-ardern-prime-minister-of-new-zealand-has-revealed-the-covid-19-vaccination-strategy-for-group-4/: 404 Client Error: Not Found for url: https://www.kogonuso.com/jacinda-ardern-prime-minister-of-new-zealand-has-revealed-the-covid-19-vaccination-strategy-for-group-4/


Processing URLs:  15%|█▍        | 146/1000 [04:03<21:58,  1.54s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1318605/000119312511054847/dex211.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1318605/000119312511054847/dex211.htm


Processing URLs:  15%|█▍        | 148/1000 [04:06<20:34,  1.45s/it]

Error extracting text from http://www.reuters.com/article/us-maryland-serial-case-idUSKCN0VH1FE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-maryland-serial-case-idUSKCN0VH1FE


Processing URLs:  16%|█▌        | 155/1000 [04:18<22:53,  1.62s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-corruption-rousseff-idUSKCN0WC2RT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-rousseff-idUSKCN0WC2RT


Processing URLs:  16%|█▌        | 158/1000 [04:22<19:53,  1.42s/it]

Error extracting text from https://ics-cert.us-cert.gov/sites/default/files/Monitors/ICS-CERT%20Monitor_Nov-Dec2015_S508C.pdf: 403 Client Error: Forbidden for url: https://ics-cert.us-cert.gov/sites/default/files/Monitors/ICS-CERT%20Monitor_Nov-Dec2015_S508C.pdf


Processing URLs:  16%|█▌        | 159/1000 [04:22<16:48,  1.20s/it]

Error extracting text from http://www.laht.com/article.asp?CategoryId=10717&amp;ArticleId=450404: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?CategoryId=10717&amp;ArticleId=450404


Processing URLs:  16%|█▌        | 161/1000 [04:26<21:30,  1.54s/it]

Error extracting text from https://sscaitournament.com/index.php?action=scoresCompetitive: 523 Server Error:  for url: https://sscaitournament.com/index.php?action=scoresCompetitive
URL filtered: http://www.bloomberg.com/news/articles/2015-11-27/china-explosives-maker-prices-ipo-as-share-sales-resume


Processing URLs:  16%|█▋        | 165/1000 [04:29<11:45,  1.18it/s]

Error extracting text from http://mobile.nytimes.com/2015/10/16/world/ukraine-and-japan-among-latest-members-of-un-security-council.html?_r=0&amp;referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/10/16/world/ukraine-and-japan-among-latest-members-of-un-security-council.html?_r=0&amp;referer=https://www.google.com/
Error extracting text from https://www.cpomagazine.com/2017/11/15/hackers-help-make-voting-machines-safe-again/: 403 Client Error: Forbidden for url: https://www.cpomagazine.com/2017/11/15/hackers-help-make-voting-machines-safe-again/


Processing URLs:  17%|█▋        | 166/1000 [04:30<12:38,  1.10it/s]

Error extracting text from https://www.hkupop.hku.hk/english/report/pai2016_hk01/content/freq.html: HTTPSConnectionPool(host='www.hkupop.hku.hk', port=443): Max retries exceeded with url: /english/report/pai2016_hk01/content/freq.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2ff932b70>: Failed to resolve 'www.hkupop.hku.hk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  17%|█▋        | 169/1000 [04:33<16:27,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-un-election-bulgaria-idUSKCN11Y0OO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-election-bulgaria-idUSKCN11Y0OO


Processing URLs:  17%|█▋        | 172/1000 [04:35<11:17,  1.22it/s]

URL filtered: https://www.youtube.com/c/USGOWeb


Processing URLs:  18%|█▊        | 175/1000 [04:38<12:34,  1.09it/s]

URL filtered: https://twitter.com/HasanSari7/status/826408963730243585


Processing URLs:  18%|█▊        | 177/1000 [04:38<08:35,  1.60it/s]

Error extracting text from https://www.nytimes.com/2018/01/10/us/politics/canada-us-tariffs-wto.html?rref=collection%2Fnewseventcollection%2FThe%20Trump%20White%20House&amp;action=click&amp;contentCollection=Politics&amp;module=Collection&amp;region=Marginalia&amp;src=me&amp;version=newsevent&amp;pgtype=article: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/10/us/politics/canada-us-tariffs-wto.html?rref=collection%2Fnewseventcollection%2FThe%20Trump%20White%20House&amp;action=click&amp;contentCollection=Politics&amp;module=Collection&amp;region=Marginalia&amp;src=me&amp;version=newsevent&amp;pgtype=article


Processing URLs:  18%|█▊        | 178/1000 [04:40<11:31,  1.19it/s]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-idUKKBN13A2I9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  18%|█▊        | 182/1000 [04:46<16:59,  1.25s/it]

Error extracting text from http://saarc-sec.org/calendar: 404 Client Error: Not Found for url: https://saarc-sec.org/calendar


Processing URLs:  18%|█▊        | 184/1000 [04:55<36:06,  2.65s/it]

Error extracting text from http://www.leftlanenews.com/report-vw-top-brass-aware-of-emissions-investigation-in-2014-91001.html#ixzz40HYrwD5R: HTTPSConnectionPool(host='www.leftlanenews.com', port=443): Max retries exceeded with url: /report-vw-top-brass-aware-of-emissions-investigation-in-2014-91001.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  19%|█▉        | 188/1000 [05:02<30:07,  2.23s/it]

Error extracting text from https://cns-snc.ca/media/uploads/GF_CNS__Presentation_for_web_2013_06_06.pdf: 404 Client Error: Not Found for url: https://cns-snc.ca/media/uploads/GF_CNS__Presentation_for_web_2013_06_06.pdf


Processing URLs:  19%|█▉        | 193/1000 [05:09<15:59,  1.19s/it]

Error extracting text from http://www.wsj.com/articles/brazils-speaker-eduardo-cunha-fights-for-his-political-life-1444728611: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-speaker-eduardo-cunha-fights-for-his-political-life-1444728611
URL filtered: https://www.bloomberg.com/news/articles/2021-07-07/covid-origins-mirrors-sars-introduction-from-animals-study-says
Error extracting text from http://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YO0AD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-usa-idUSKCN0YO0AD


Processing URLs:  20%|█▉        | 197/1000 [05:49<2:18:04, 10.32s/it]

Error extracting text from http://m.weeklystandard.com/articles/handicapping-iowa_1059432.html: 522 Server Error:  for url: http://m.weeklystandard.com/articles/handicapping-iowa_1059432.html


Processing URLs:  20%|██        | 201/1000 [05:53<41:33,  3.12s/it]  

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdomarioflavio.com.br/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdomarioflavio.com.br/&amp;prev=search


Processing URLs:  20%|██        | 204/1000 [05:57<24:37,  1.86s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/01/14/New-aid-convoy-enters-hunger-stricken-Madaya-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/01/14/New-aid-convoy-enters-hunger-stricken-Madaya-.html


Processing URLs:  21%|██        | 206/1000 [05:58<16:41,  1.26s/it]

Error extracting text from http://www.reuters.com/article/us-apec-vietnam-idUSKCN18H02B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apec-vietnam-idUSKCN18H02B
URL filtered: https://www.politico.com/news/2021/04/06/state-department-boycott-2https://www.linkedin.com/in/cherifgamra/022-beijing-olympics-479337


Processing URLs:  21%|██        | 211/1000 [06:03<11:26,  1.15it/s]

Error extracting text from http://www.nytimes.com/2010/09/26/realestate/26scapes.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2010/09/26/realestate/26scapes.html?_r=0


Processing URLs:  21%|██▏       | 214/1000 [06:07<13:26,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-oil-insight-idUSKBN15O2BC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-insight-idUSKBN15O2BC


Processing URLs:  22%|██▏       | 216/1000 [06:09<11:26,  1.14it/s]

Error extracting text from http://www.nytimes.com/2016/01/10/opinion/sunday/how-crazy-are-the-north-koreans.html?ref=topics&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/10/opinion/sunday/how-crazy-are-the-north-koreans.html?ref=topics&amp;_r=0


Processing URLs:  22%|██▏       | 218/1000 [06:11<12:59,  1.00it/s]

Error extracting text from http://www.reuters.com/article/us-usa-congress-budget-idUSKBN19E263: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-budget-idUSKBN19E263


Processing URLs:  22%|██▏       | 224/1000 [06:16<08:26,  1.53it/s]

Error extracting text from http://www.autonews.com/article/20160218/COPY01/302189978/volvo-sees-record-sales-higher-earnings-in-2016-after-strong-2015: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20160218/COPY01/302189978/volvo-sees-record-sales-higher-earnings-in-2016-after-strong-2015
Error extracting text from http://www.vanguardngr.com/2016/06/blame-buhari-for-resurgence-of-militia-groups-in-n-delta-s-east-pdp-youths/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/06/blame-buhari-for-resurgence-of-militia-groups-in-n-delta-s-east-pdp-youths/


Processing URLs:  23%|██▎       | 226/1000 [06:18<11:05,  1.16it/s]

Error extracting text from http://www.nytimes.com/2016/01/17/world/middleeast/iran-releases-washington-post-reporter-jason-rezaian.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/17/world/middleeast/iran-releases-washington-post-reporter-jason-rezaian.html


Processing URLs:  23%|██▎       | 227/1000 [06:19<09:05,  1.42it/s]

Error extracting text from http://www.scotsman.com/news/politics/indyref2-poll-finds-support-for-independence-at-high-of-46-1-4392203: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/indyref2-poll-finds-support-for-independence-at-high-of-46-1-4392203


Processing URLs:  24%|██▎       | 235/1000 [06:30<17:51,  1.40s/it]

Error extracting text from http://www.ibtimes.co.uk/eu-referendum-read-david-camerons-letter-european-council-president-donald-tusk-full-1528057: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/eu-referendum-read-david-camerons-letter-european-council-president-donald-tusk-full-1528057


Processing URLs:  24%|██▍       | 245/1000 [06:56<23:59,  1.91s/it]

Error extracting text from http://www.newsweek.com/dr-congo-third-term-fears-mount-court-clears-kabila-remain-power-if-elections-459107: 403 Client Error: Forbidden for url: https://www.newsweek.com/dr-congo-third-term-fears-mount-court-clears-kabila-remain-power-if-elections-459107


Processing URLs:  25%|██▍       | 247/1000 [06:57<16:03,  1.28s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/myanmar-risks-coming-standstill-violence-worsens-un-envoy-2021-04-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/myanmar-risks-coming-standstill-violence-worsens-un-envoy-2021-04-30/


Processing URLs:  25%|██▌       | 252/1000 [07:04<16:43,  1.34s/it]

Error extracting text from http://www.imdb.com/title/tt4359416/?ref_=nv_sr_4: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt4359416/?ref_=nv_sr_4
URL filtered: http://www.bloomberg.com/news/articles/2015-10-07/yen-advances-as-bank-of-japan-refrains-from-adding-to-stimulus


Processing URLs:  26%|██▌       | 257/1000 [07:10<17:53,  1.45s/it]

URL filtered: http://www.bloombergview.com/articles/2016-01-28/shipping-news-says-world-economy-is-toast


Processing URLs:  26%|██▌       | 260/1000 [07:18<29:14,  2.37s/it]

Error extracting text from http://english.farsnews.com/newstext.aspx?nn=13940803001143: HTTPConnectionPool(host='english.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940803001143 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051da0c0>: Failed to resolve 'english.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.wsj.com/articles/theranos-has-struggled-with-blood-tests-1444881901: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/theranos-has-struggled-with-blood-tests-1444881901


Processing URLs:  26%|██▋       | 264/1000 [07:21<15:31,  1.27s/it]

Error extracting text from http://thehill.com/blogs/defcon-hill/policy-strategy/188911-russia-continues-weapons-shipments-to-syria: 403 Client Error: Forbidden for url: https://thehill.com/blogs/defcon-hill/policy-strategy/188911-russia-continues-weapons-shipments-to-syria/
Error extracting text from http://www.financialexpress.com/article/economy/rcep-talks-to-stretch-longer-on-differences-over-services/219805/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/economy/rcep-talks-to-stretch-longer-on-differences-over-services/219805/


Processing URLs:  27%|██▋       | 266/1000 [07:23<15:51,  1.30s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-02/rousseff-hands-allies-more-power-to-strengthen-brazil-presidency


Processing URLs:  27%|██▋       | 268/1000 [07:24<10:34,  1.15it/s]

Error extracting text from https://www.nytimes.com/2021/04/09/business/economy/amazon-labor-unions.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/09/business/economy/amazon-labor-unions.html


Processing URLs:  27%|██▋       | 271/1000 [07:28<14:21,  1.18s/it]

Error extracting text from http://nationalinterest.org/feature/facing-the-middle-easts-hardliner-problem-14899: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/facing-the-middle-easts-hardliner-problem-14899


Processing URLs:  27%|██▋       | 274/1000 [07:32<15:12,  1.26s/it]

Error extracting text from https://www.nytimes.com/2017/04/06/us/politics/jared-kushner-russians-security-clearance.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/06/us/politics/jared-kushner-russians-security-clearance.html


Processing URLs:  28%|██▊       | 275/1000 [07:34<17:00,  1.41s/it]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-congress/u-s-house-to-vote-on-non-nuclear-iran-sanctions-next-week-idUSKBN1CP2A7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-congress/u-s-house-to-vote-on-non-nuclear-iran-sanctions-next-week-idUSKBN1CP2A7


Processing URLs:  28%|██▊       | 278/1000 [07:37<13:41,  1.14s/it]

Error extracting text from http://env-health.org/news/latest-news/article/eu-takes-poland-to-court-over-air: 403 Client Error: Forbidden for url: http://env-health.org/news/latest-news/article/eu-takes-poland-to-court-over-air


Processing URLs:  28%|██▊       | 280/1000 [07:40<16:00,  1.33s/it]

Error extracting text from https://www.confidencial.com.ni/english/political-division-gives-ortega-an-additional-advantage-are-there-any-other-options/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/english/political-division-gives-ortega-an-additional-advantage-are-there-any-other-options/


Processing URLs:  28%|██▊       | 283/1000 [07:46<19:12,  1.61s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-27/diabetes-breakthrough-nears-with-medtronic-s-artificial-pancreas


Processing URLs:  28%|██▊       | 285/1000 [07:47<12:24,  1.04s/it]

Error extracting text from https://wikileaks.org/podesta-emails/emailid/36703: 403 Client Error: Forbidden for url: https://wikileaks.org/podesta-emails/emailid/36703


Processing URLs:  29%|██▊       | 287/1000 [07:48<10:30,  1.13it/s]

Error extracting text from http://www.nytimes.com/2015/12/20/us/politics/donald-trump-campaign-lags-in-mobilizing-iowa-caucus-voters.html?emc=edit_th_20151220&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/20/us/politics/donald-trump-campaign-lags-in-mobilizing-iowa-caucus-voters.html?emc=edit_th_20151220&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  29%|██▉       | 288/1000 [07:51<15:31,  1.31s/it]

Error extracting text from http://hir.harvard.edu/brexit-will-good-european-integration/: 404 Client Error: Not Found for url: https://hir.harvard.edu/brexit-will-good-european-integration/


Processing URLs:  29%|██▉       | 289/1000 [07:51<11:57,  1.01s/it]

Error extracting text from https://www.nytimes.com/2021/10/05/us/politics/debt-ceiling-filibuster-biden.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/10/05/us/politics/debt-ceiling-filibuster-biden.html


Processing URLs:  29%|██▉       | 293/1000 [07:56<18:06,  1.54s/it]

URL filtered: https://www.nytimes.com/2018/11/14/technology/facebook-data-russia-election-racism.html?module=inline


Processing URLs:  30%|██▉       | 295/1000 [07:57<12:34,  1.07s/it]

Error extracting text from https://reut.rs/3jpnvB2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-boe/bank-of-england-says-banks-need-6-months-for-any-sub-zero-rates-idUSKBN2A41P8


Processing URLs:  30%|██▉       | 297/1000 [08:00<14:18,  1.22s/it]

Error extracting text from http://abcnews.go.com/Politics/clinton-vaults-double-digit-lead-boosted-broad-disapproval/story: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/clinton-vaults-double-digit-lead-boosted-broad-disapproval/story


Processing URLs:  30%|███       | 302/1000 [08:16<23:09,  1.99s/it]

URL filtered: https://www.youtube.com/watch?v=QffT-_i5erk


Processing URLs:  31%|███       | 306/1000 [08:19<11:40,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-california-shooting-poll-idUSKBN0TU2AW20151211: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-california-shooting-poll-idUSKBN0TU2AW20151211


Processing URLs:  31%|███       | 308/1000 [08:20<10:47,  1.07it/s]

Error extracting text from https://www.reuters.com/article/us-gilead-sciences-fda/u-s-fda-approves-gilead-cancer-gene-therapy-price-set-at-373000-idUSKBN1CN35H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-gilead-sciences-fda/u-s-fda-approves-gilead-cancer-gene-therapy-price-set-at-373000-idUSKBN1CN35H


Processing URLs:  31%|███       | 310/1000 [08:22<09:19,  1.23it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN10210Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN10210Z


Processing URLs:  31%|███▏      | 314/1000 [08:24<06:54,  1.66it/s]

Error extracting text from http://www.nytimes.com/reuters/2016/12/13/world/asia/13reuters-southchinasea-usa.html?ref=world: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/12/13/world/asia/13reuters-southchinasea-usa.html?ref=world


Processing URLs:  32%|███▏      | 320/1000 [08:29<07:04,  1.60it/s]

Error extracting text from http://www.reuters.com/article/us-burundi-rwanda-congodemocratic-idUSKCN0VD2H2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-rwanda-congodemocratic-idUSKCN0VD2H2
Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15M2DU?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-idUSKBN15M2DU?il=0


Processing URLs:  32%|███▏      | 322/1000 [08:31<09:21,  1.21it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-19/fischer-says-fed-has-tried-not-to-surprise-ems-with-rate-hike


Processing URLs:  32%|███▎      | 325/1000 [08:33<09:15,  1.22it/s]

Error extracting text from http://www.theworldbeast.com/dutch-jets-to-bomb-is-in-syria.html: 406 Client Error: Not Acceptable for url: http://www.theworldbeast.com/dutch-jets-to-bomb-is-in-syria.html


Processing URLs:  33%|███▎      | 326/1000 [08:43<33:46,  3.01s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-04-18/oil-steady-as-industry-data-shows-pace-of-u-s-supply-drop-slows


Processing URLs:  33%|███▎      | 331/1000 [08:50<19:45,  1.77s/it]

Error extracting text from https://www.macrotrends.net/2016/10-year-treasury-bond-rate-yield-chart: 403 Client Error: Forbidden for url: https://www.macrotrends.net/2016/10-year-treasury-bond-rate-yield-chart
Error extracting text from https://www.reuters.com/article/us-italy-fivestar-euro/italys-5-star-says-euro-referendum-is-last-resort-idUSKCN1BE0HY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-fivestar-euro/italys-5-star-says-euro-referendum-is-last-resort-idUSKCN1BE0HY


Processing URLs:  33%|███▎      | 334/1000 [08:53<15:45,  1.42s/it]

Error extracting text from http://www.sfgate.com/business/article/Waymo-self-driving-minivan-has-different-type-of-12188714.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/business/article/Waymo-self-driving-minivan-has-different-type-of-12188714.php
URL filtered: https://www.linkedin.com/in/kizzy-bailey-01b60b4


Processing URLs:  35%|███▍      | 348/1000 [09:14<11:43,  1.08s/it]

Error extracting text from https://www.wsj.com/articles/trump-threatened-to-kill-the-at-t-time-warner-deal-but-its-very-much-alive-1502998077: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-threatened-to-kill-the-at-t-time-warner-deal-but-its-very-much-alive-1502998077


Processing URLs:  35%|███▌      | 354/1000 [09:21<09:29,  1.13it/s]

Error extracting text from http://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN17D1XL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN17D1XL


Processing URLs:  36%|███▌      | 355/1000 [09:21<08:29,  1.27it/s]

Error extracting text from http://thehill.com/policy/transportation/259246-house-approves-325b-highway-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/transportation/259246-house-approves-325b-highway-bill/


Processing URLs:  36%|███▌      | 356/1000 [09:22<07:07,  1.51it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-pdvsa-idUSKCN12J22H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-pdvsa-idUSKCN12J22H


Processing URLs:  36%|███▌      | 357/1000 [09:23<09:07,  1.18it/s]

Error extracting text from http://www.theinsider.ug/nkurunziza-men-shot-dead-as-un-weighs-3000-police/: 406 Client Error: Not Acceptable for url: http://www.theinsider.ug/nkurunziza-men-shot-dead-as-un-weighs-3000-police/


Processing URLs:  36%|███▌      | 359/1000 [09:24<06:02,  1.77it/s]

Error extracting text from https://www.opendemocracy.net/valerie-hopkins/montenegro-mafia-state-in-eu-neighbourhood: 403 Client Error: Forbidden for url: https://www.opendemocracy.net/valerie-hopkins/montenegro-mafia-state-in-eu-neighbourhood


Processing URLs:  36%|███▌      | 362/1000 [09:32<26:30,  2.49s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-04-21/china-s-electric-car-subsidy-fraud-casts-doubt-on-surging-demand


Processing URLs:  37%|███▋      | 366/1000 [09:43<29:12,  2.76s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-18/maduro-fails-to-reassure-investors-venezuela-can-avoid-default


Processing URLs:  37%|███▋      | 371/1000 [09:48<14:21,  1.37s/it]

Error extracting text from http://news.yahoo.com/iran-ready-prisoner-swap-us-rouhani-195606437.html: 404 Client Error: Not Found for url: http://news.yahoo.com/iran-ready-prisoner-swap-us-rouhani-195606437.html


Processing URLs:  37%|███▋      | 372/1000 [09:49<13:52,  1.33s/it]

Error extracting text from http://www.bradford.ac.uk/research-old/ijas/ijasno2/Georgis.html: 404 Client Error: Not Found for url: http://www.bradford.ac.uk/research-old/ijas/ijasno2/Georgis.html


Processing URLs:  37%|███▋      | 374/1000 [09:52<13:26,  1.29s/it]

Error extracting text from http://www.nytimes.com/2016/04/25/world/middleeast/yemeni-troops-backed-by-united-arab-emirates-take-city-from-al-qaeda.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/25/world/middleeast/yemeni-troops-backed-by-united-arab-emirates-take-city-from-al-qaeda.html


Processing URLs:  38%|███▊      | 380/1000 [09:56<04:55,  2.10it/s]

Error extracting text from https://www.nytimes.com/live/2021/01/29/us/biden-trump-impeachment/biden-wants-republicans-to-back-a-1-9-trillion-relief-package-but-some-democrats-are-happy-to-go-it-alone: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/01/29/us/biden-trump-impeachment/biden-wants-republicans-to-back-a-1-9-trillion-relief-package-but-some-democrats-are-happy-to-go-it-alone
URL filtered: https://twitter.com/adamparsons/status/1342223035558985731?s=21
Error extracting text from https://www.reuters.com/article/uk-britain-scotland-poll/scottish-nationalists-set-for-record-majority-boosting-independence-push-idUSKBN29J13Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-scotland-poll/scottish-nationalists-set-for-record-majority-boosting-independence-push-idUSKBN29J13Y


Processing URLs:  38%|███▊      | 381/1000 [09:56<04:48,  2.14it/s]

Error extracting text from https://seekingalpha.com/article/4463214-amazon-time-to-spin-off-aws: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4463214-amazon-time-to-spin-off-aws
Error extracting text from http://www.visualcapitalist.com/the-lithium-ion-megafactories-are-coming-chart/: 403 Client Error: Forbidden for url: http://www.visualcapitalist.com/the-lithium-ion-megafactories-are-coming-chart/


Processing URLs:  39%|███▊      | 386/1000 [10:01<08:19,  1.23it/s]

Error extracting text from http://www.wsj.com/articles/wsj-survey-economists-see-2015-gdp-growth-at-3-1421345963: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/wsj-survey-economists-see-2015-gdp-growth-at-3-1421345963


Processing URLs:  39%|███▉      | 388/1000 [10:06<17:40,  1.73s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/641968/military-strikes-target-isil-terrorists-in-syria-iraq: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/641968/military-strikes-target-isil-terrorists-in-syria-iraq


Processing URLs:  39%|███▉      | 392/1000 [10:10<11:41,  1.15s/it]

Error extracting text from https://www.middleeastmonitor.com/articles/middle-east/20697-sisis-attempts-to-rehabilitate-assad-a-different-tune-to-the-gulf: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/articles/middle-east/20697-sisis-attempts-to-rehabilitate-assad-a-different-tune-to-the-gulf


Processing URLs:  39%|███▉      | 393/1000 [10:11<11:09,  1.10s/it]

Error extracting text from http://www.foxnews.com/world/2015/10/11/iran-says-verdict-reached-in-case-washington-post-reporter-jason-rezaian/: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/10/11/iran-says-verdict-reached-in-case-washington-post-reporter-jason-rezaian/


Processing URLs:  40%|███▉      | 398/1000 [10:21<13:24,  1.34s/it]

Error extracting text from http://leadercall.com/2016/02/counter-islamic-state-nations-approve-new-military-plan-and/: 404 Client Error: Not Found for url: http://leadercall.com/2016/02/counter-islamic-state-nations-approve-new-military-plan-and/


Processing URLs:  40%|████      | 402/1000 [10:32<19:23,  1.95s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/iraq-heat-wave-sends-temperatures-53-degrees-celsius-40729019: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/iraq-heat-wave-sends-temperatures-53-degrees-celsius-40729019


Processing URLs:  40%|████      | 404/1000 [10:34<14:39,  1.48s/it]

Error extracting text from https://www.congress.gov/bill/117th-congress/house-bill/1319: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/117th-congress/house-bill/1319


Processing URLs:  40%|████      | 405/1000 [10:35<12:03,  1.22s/it]

Error extracting text from http://thehill.com/blogs/floor-action/senate/263709-senate-sends-18-trillion-deal-to-obama: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/263709-senate-sends-18-trillion-deal-to-obama/
URL filtered: https://www.youtube.com/watch?v=1PGm8LslEb4


Processing URLs:  41%|████      | 407/1000 [10:36<09:10,  1.08it/s]

Error extracting text from http://legalinsurrection.com/2016/03/us-eases-and-imposes-iran-sanctions-on-same-day/: 403 Client Error: Forbidden for url: http://legalinsurrection.com/2016/03/us-eases-and-imposes-iran-sanctions-on-same-day/


Processing URLs:  41%|████      | 408/1000 [10:36<07:26,  1.33it/s]

Error extracting text from http://www.wsj.com/articles/venezuelas-maduro-wont-give-up-power-1447626102: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelas-maduro-wont-give-up-power-1447626102


Processing URLs:  41%|████      | 409/1000 [10:38<10:28,  1.06s/it]

Error extracting text from http://www.ibtimes.com/bank-japan-introduces-negative-interest-rates-boost-growth-2285400: 403 Client Error: Forbidden for url: https://www.ibtimes.com/bank-japan-introduces-negative-interest-rates-boost-growth-2285400


Processing URLs:  41%|████▏     | 413/1000 [10:46<20:28,  2.09s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-28/fed-pivots-toward-december-rate-rise-amid-moderate-u-s-growth


Processing URLs:  42%|████▏     | 415/1000 [10:48<17:08,  1.76s/it]

Error extracting text from http://www.9news.com/story/news/local/politics/2015/10/13/democratic-presidential-debate-poll-bernie-sanders-hillary-clinton/73893908/: 503 Server Error: Service Unavailable for url: https://www.9news.com/story/news/local/politics/2015/10/13/democratic-presidential-debate-poll-bernie-sanders-hillary-clinton/73893908/


Processing URLs:  42%|████▏     | 417/1000 [10:49<11:26,  1.18s/it]

Error extracting text from http://www.nytimes.com/2016/04/01/world/africa/retired-rwanda-politician-dies-while-jailed-in-burundi-as-a-spy.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/01/world/africa/retired-rwanda-politician-dies-while-jailed-in-burundi-as-a-spy.html?_r=0


Processing URLs:  42%|████▏     | 421/1000 [10:54<12:20,  1.28s/it]

Error extracting text from https://www.eumonitor.eu/9353000/1/j9vvik7m1c3gyxp/viy85f9hbsyp).: 404 Client Error: File Not Found for url: https://www.eumonitor.eu/9353000/1/j9vvik7m1c3gyxp/viy85f9hbsyp).


Processing URLs:  42%|████▏     | 422/1000 [10:57<15:07,  1.57s/it]

Error extracting text from http://www.thejc.com/news/uk-news/145823/george-osborne-under-fire-over-plan-lead-trade-delegation-iran: 404 Client Error: Not Found for url: https://www.thejc.com/404


Processing URLs:  42%|████▎     | 425/1000 [11:02<17:46,  1.85s/it]

Error extracting text from https://www.multistate.com/state-resources/legislative-session-deadlines: 404 Client Error: Not Found for url: https://www.multistate.us/state-resources/legislative-session-deadlines


Processing URLs:  43%|████▎     | 430/1000 [12:08<3:01:55, 19.15s/it]

Error extracting text from http://www.miamiherald.com/opinion/op-ed/article58250848.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  43%|████▎     | 432/1000 [12:10<1:33:16,  9.85s/it]

Error extracting text from http://www.reuters.com/article/2015/09/22/us-mideast-crisis-gulf-russia-idUSKCN0RM1JX20150922: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/22/us-mideast-crisis-gulf-russia-idUSKCN0RM1JX20150922


Processing URLs:  43%|████▎     | 434/1000 [12:12<51:34,  5.47s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-06/venezuelans-to-vote-in-polls-seen-handing-congress-to-opposition


Processing URLs:  44%|████▎     | 437/1000 [12:15<26:08,  2.79s/it]

Error extracting text from https://www.dhs.gov/ntas/advisory/national-terrorism-advisory-system-bulletin-august-13-2021: 403 Client Error: Forbidden for url: https://www.dhs.gov/ntas/advisory/national-terrorism-advisory-system-bulletin-august-13-2021


Processing URLs:  44%|████▍     | 438/1000 [12:15<19:51,  2.12s/it]

Error extracting text from https://www.nytimes.com/2017/02/10/opinion/a-game-plan-for-senate-democrats.html?emc=edit_th_20170210&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/10/opinion/a-game-plan-for-senate-democrats.html?emc=edit_th_20170210&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  44%|████▍     | 442/1000 [12:23<17:35,  1.89s/it]

Error extracting text from https://www.onpe.gob.pe/: 403 Client Error: Forbidden for url: https://www.onpe.gob.pe/


Processing URLs:  44%|████▍     | 444/1000 [12:26<12:42,  1.37s/it]

Error extracting text from http://www.nytimes.com/2016/10/08/us/politics/us-formally-accuses-russia-of-stealing-dnc-emails.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/08/us/politics/us-formally-accuses-russia-of-stealing-dnc-emails.html?_r=0


Processing URLs:  45%|████▍     | 446/1000 [12:26<07:50,  1.18it/s]

Error extracting text from https://www.timesofisrael.com/liveblog-february-7-2018/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/liveblog-february-7-2018/


Processing URLs:  45%|████▍     | 447/1000 [12:28<10:42,  1.16s/it]

Error extracting text from http://allthingsnuclear.org/lgrego/update-on-possible-iranian-simorgh-rocket-launch: 403 Client Error: Forbidden for url: https://blog.ucsusa.org/lgrego/update-on-possible-iranian-simorgh-rocket-launch


Processing URLs:  45%|████▍     | 449/1000 [12:30<10:10,  1.11s/it]

Error extracting text from http://thehill.com/homenews/senate/346312-senate-chairman-hopes-to-wrap-up-russia-investigation-this-year: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/346312-senate-chairman-hopes-to-wrap-up-russia-investigation-this-year/


Processing URLs:  45%|████▌     | 450/1000 [12:31<07:41,  1.19it/s]

Error extracting text from http://www.wsj.com/articles/venezuelans-mobilize-for-vote-to-recall-president-nicolas-maduro-1472504412: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelans-mobilize-for-vote-to-recall-president-nicolas-maduro-1472504412


Processing URLs:  45%|████▌     | 454/1000 [12:38<13:40,  1.50s/it]

Error extracting text from https://bit.ly/3b4f9vP: 403 Client Error: Forbidden for url: https://capx.co/why-the-salmond-saga-threatens-sturgeon-and-the-separatist-cause/?omhide=true&utm_source=newsletter&utm_medium=email&utm_campaign=01/03/21


Processing URLs:  46%|████▌     | 456/1000 [12:39<09:00,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16N1IK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16N1IK
URL filtered: https://twitter.com/PARLYapp/status/1343623320357498888?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1343892291279314949%7Ctwgr%5E%7Ctwcon%5Es3_&amp;ref_url=https%3A%2F%2Fwww.bbc.co.uk%2Fnews%2Fuk-politics-55474148


Processing URLs:  46%|████▌     | 459/1000 [12:41<07:27,  1.21it/s]

Error extracting text from https://www.carbonbrief.org/cop25-key-outcomes-agreed-at-the-un-climate-talks-in-madrid): 403 Client Error: Forbidden for url: https://www.carbonbrief.org/cop25-key-outcomes-agreed-at-the-un-climate-talks-in-madrid)


Processing URLs:  46%|████▌     | 462/1000 [12:43<06:37,  1.35it/s]

Error extracting text from https://www.wsj.com/articles/canada-names-proud-boys-a-terrorist-group-11612376710: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/canada-names-proud-boys-a-terrorist-group-11612376710


Processing URLs:  47%|████▋     | 473/1000 [12:59<10:35,  1.21s/it]

Error extracting text from http://www.reuters.com/article/2015/09/28/us-usa-fed-dudley-idUSKCN0RS1PA20150928: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/28/us-usa-fed-dudley-idUSKCN0RS1PA20150928
URL filtered: http://www.bloomberg.com/news/articles/2016-03-24/u-s-charges-iranian-hackers-in-wall-street-cyberattacks-im6b43tt


Processing URLs:  48%|████▊     | 478/1000 [13:04<10:58,  1.26s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/epa-chief-carbon-dioxide-not-primary-cause-of-warming/articleshow/57561378.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/epa-chief-carbon-dioxide-not-primary-cause-of-warming/articleshow/57561378.cms


Processing URLs:  48%|████▊     | 480/1000 [13:06<09:25,  1.09s/it]

Error extracting text from http://www.businessinsider.com/r-vw-investigation-shows-many-bosses-knew-of-manipulation-newspaper-2016-1: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-vw-investigation-shows-many-bosses-knew-of-manipulation-newspaper-2016-1


Processing URLs:  48%|████▊     | 481/1000 [13:06<07:14,  1.19it/s]

Error extracting text from https://go.bsimm.com/hubfs/BSIMM/BSIMM6.pdf: HTTPSConnectionPool(host='go.bsimm.com', port=443): Max retries exceeded with url: /hubfs/BSIMM/BSIMM6.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x301ab5820>: Failed to resolve 'go.bsimm.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 483/1000 [13:08<08:02,  1.07it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/01/03/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/01/03/national/politics-diplomacy/abe-may-visit-u-s-canada-europe-ahead-mays-g-7-summit-mie-prefecture/
URL filtered: http://www.bloomberg.com/news/articles/2016-03-19/assad-foes-seek-purge-of-security-officials-in-proposal-to-un


Processing URLs:  49%|████▊     | 487/1000 [13:31<37:35,  4.40s/it]

Error extracting text from http://www.nytimes.com/2016/05/13/world/americas/michel-temer-brazils-interim-president-may-herald-shift-to-the-right.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/13/world/americas/michel-temer-brazils-interim-president-may-herald-shift-to-the-right.html
URL filtered: https://www.youtube.com/watch?v=ynOXkKKI6jM


Processing URLs:  50%|████▉     | 497/1000 [13:50<08:45,  1.04s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-01-22/china-slams-western-democracy-as-flawed-as-trump-takes-office
Error extracting text from http://www.reuters.com/article/us-asean-summit-idUSKBN17S0D3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-summit-idUSKBN17S0D3


Processing URLs:  50%|█████     | 501/1000 [13:54<07:26,  1.12it/s]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0003491615003243: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0003491615003243


Processing URLs:  50%|█████     | 502/1000 [13:56<09:03,  1.09s/it]

Error extracting text from http://yangon.coconuts.co/2016/02/18/if-suu-kyis-sons-take-myanmar-citizenship-their-mother-could-be-president-outgoing: 403 Client Error: Forbidden for url: https://coconuts.co/yangon/2016/02/18/if-suu-kyis-sons-take-myanmar-citizenship-their-mother-could-be-president-outgoing


Processing URLs:  50%|█████     | 503/1000 [13:57<08:12,  1.01it/s]

Error extracting text from https://coronavirus.upenn.edu/content/dashboard: 404 Client Error: Not Found for url: https://wellness.upenn.edu/content/dashboard
URL filtered: http://www.bloomberg.com/news/articles/2015-12-10/1mdb-fix-leaves-oil-as-malaysia-debt-risk-with-najib-digging-in


Processing URLs:  51%|█████     | 508/1000 [14:03<08:49,  1.08s/it]

Error extracting text from http://zeenews.india.com/india/indus-water-treaty-next-round-of-indo-pak-talks-at-world-bank-to-begin-today-2042175.html: 403 Client Error: Forbidden for url: https://zeenews.india.com/india/indus-water-treaty-next-round-of-indo-pak-talks-at-world-bank-to-begin-today-2042175.html


Processing URLs:  51%|█████     | 510/1000 [14:04<05:22,  1.52it/s]

Error extracting text from https://www.cnbc.com/2019/01/18/white-house-next-trump-summit-with-kim-jong-un-will-take-place-near-the-end-of-february.html: 403 Client Error: Forbidden for url: https://www.cnbc.com/2019/01/18/white-house-next-trump-summit-with-kim-jong-un-will-take-place-near-the-end-of-february.html
Error extracting text from http://www.reuters.com/article/us-eu-usa-trade-idUSKCN0VB2AK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-trade-idUSKCN0VB2AK


Processing URLs:  51%|█████     | 511/1000 [14:05<07:27,  1.09it/s]

Error extracting text from http://cleantechnica.com/2016/03/19/chinas-electric-cars-sales-boom-to-continue-sales-to-double-in-2016-china-minister-states/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2016/03/19/chinas-electric-cars-sales-boom-to-continue-sales-to-double-in-2016-china-minister-states/


Processing URLs:  52%|█████▏    | 516/1000 [14:15<15:58,  1.98s/it]

URL filtered: https://twitter.com/Gab_H_R/status/1472620240232226823


Processing URLs:  52%|█████▏    | 519/1000 [14:18<12:32,  1.56s/it]

URL filtered: http://www.businessinsider.com/twitter-is-banning-all-ads-from-russian-news-agencies-rt-and-sputnik-2017-10


Processing URLs:  52%|█████▏    | 523/1000 [14:21<08:21,  1.05s/it]

Error extracting text from http://voiceherald.com/2015/11/29/kosovo-opposition-holds-rally-to-protest-against.html: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=voiceherald.com


Processing URLs:  52%|█████▎    | 525/1000 [14:23<07:00,  1.13it/s]

Error extracting text from https://www.nytimes.com/2021/03/25/world/europe/boris-johnson-vaccine-brexit.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/25/world/europe/boris-johnson-vaccine-brexit.html


Processing URLs:  53%|█████▎    | 530/1000 [14:37<16:30,  2.11s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-idUSKCN0WM0XK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-idUSKCN0WM0XK
URL filtered: http://www.bloomberg.com/news/articles/2015-09-21/putin-and-netanyahu-divided-over-threat-from-crisis-in-syria


Processing URLs:  53%|█████▎    | 533/1000 [14:40<11:52,  1.53s/it]

Error extracting text from https://www.uscis.gov/ilink/docView/SLB/HTML/SLB/0-0-0-1/0-0-0-29/0-0-0-2006.html: 403 Client Error: Forbidden for url: https://www.uscis.gov/ilink/docView/SLB/HTML/SLB/0-0-0-1/0-0-0-29/0-0-0-2006.html
URL filtered: https://www.bloomberg.com/news/articles/2017-08-22/zuma-tightens-grip-as-south-africa-ruling-party-censures-rebels
Error extracting text from http://blog.dilbert.com/post/138125409321/trump-fox-news-and-megyn-kelly-explained-master: HTTPConnectionPool(host='blog.dilbert.com', port=80): Max retries exceeded with url: /post/138125409321/trump-fox-news-and-megyn-kelly-explained-master (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fce1e0>: Failed to resolve 'blog.dilbert.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: http://www.bloomberg.com/news/articles/2016-01-28/malaysia-trims-growth-expectations-as-oil-drop-weighs-on-economy


Processing URLs:  54%|█████▍    | 540/1000 [14:45<07:10,  1.07it/s]

Error extracting text from http://www.wsj.com/articles/fed-minutes-december-could-well-be-time-for-rate-hike-1447873643: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-minutes-december-could-well-be-time-for-rate-hike-1447873643


Processing URLs:  54%|█████▍    | 542/1000 [14:48<09:30,  1.25s/it]

Error extracting text from https://www.reuters.com/article/us-alphabet-russia/russia-to-act-against-google-if-sputnik-rt-get-lower-search-rankings-official-idUSKBN1DL2MA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-alphabet-russia/russia-to-act-against-google-if-sputnik-rt-get-lower-search-rankings-official-idUSKBN1DL2MA


Processing URLs:  55%|█████▍    | 546/1000 [14:52<07:03,  1.07it/s]

Error extracting text from http://www.straitstimes.com/asia/east-asia/us-provocations-inflame-south-china-sea-tensions-china-daily: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  55%|█████▍    | 547/1000 [15:01<23:08,  3.07s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-27/pdvsa-s-sweetened-swap-bolsters-bonds-as-default-seen-deferred


Processing URLs:  55%|█████▌    | 550/1000 [15:03<12:47,  1.70s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Brazil-s-Rousseff-cancels-Japan-trip-as-budget-crisis-worsens: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Brazil-s-Rousseff-cancels-Japan-trip-as-budget-crisis-worsens


Processing URLs:  56%|█████▌    | 555/1000 [15:34<48:55,  6.60s/it]

Error extracting text from https://www.almasdarnews.com/article/pictures-syrias-assad-makes-daring-visit-ghouta-frontline/: 522 Server Error:  for url: https://www.almasdarnews.com/article/pictures-syrias-assad-makes-daring-visit-ghouta-frontline/


Processing URLs:  56%|█████▌    | 558/1000 [15:53<43:23,  5.89s/it]

Error extracting text from https://www.latimes.com/world-nation/story/2020-11-15/biden-administration-latin-america-foreign-policy: 403 Client Error: Forbidden for url: https://www.latimes.com/world-nation/story/2020-11-15/biden-administration-latin-america-foreign-policy


Processing URLs:  56%|█████▋    | 564/1000 [15:59<12:20,  1.70s/it]

Error extracting text from http://www.nytimes.com/2015/11/15/world/europe/attacks-in-paris-add-urgency-to-talks-on-ending-syria-war.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/15/world/europe/attacks-in-paris-add-urgency-to-talks-on-ending-syria-war.html


Processing URLs:  57%|█████▋    | 567/1000 [16:04<12:31,  1.74s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-01-10/why-china-is-falling-out-with-australia-and-allies-quicktake


Processing URLs:  57%|█████▋    | 570/1000 [16:05<07:08,  1.00it/s]

Error extracting text from https://aibirds.org/: HTTPSConnectionPool(host='aibirds.org', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  57%|█████▋    | 571/1000 [16:15<22:43,  3.18s/it]

Error extracting text from https://www.washingtonpost.com/politics/courts_law/release-set-for-next-week-for-convicted-spy-jonathan-pollard/2015/11/14/1a43b194-8aa9-11e5-bd91-d385b244482f_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/courts_law/release-set-for-next-week-for-convicted-spy-jonathan-pollard/2015/11/14/1a43b194-8aa9-11e5-bd91-d385b244482f_story.html


Processing URLs:  57%|█████▋    | 574/1000 [16:18<12:24,  1.75s/it]

Error extracting text from https://www.reuters.tv/v/$ta/2017/08/06/venezuela-crushes-military-rebellion: HTTPSConnectionPool(host='www.reuters.tv', port=443): Max retries exceeded with url: /v/$ta/2017/08/06/venezuela-crushes-military-rebellion (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x306323830>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 576/1000 [16:21<11:23,  1.61s/it]

Error extracting text from http://uk.reuters.com/article/iran-oil-production-idUKL8N1IO20M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  58%|█████▊    | 579/1000 [16:37<35:54,  5.12s/it]

Error extracting text from http://www.trackpersia.com/unprecedented-death-toll-iranians-syria/: HTTPConnectionPool(host='www.trackpersia.com', port=80): Max retries exceeded with url: /unprecedented-death-toll-iranians-syria/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306188980>: Failed to resolve 'www.trackpersia.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  58%|█████▊    | 580/1000 [16:40<31:03,  4.44s/it]

URL filtered: https://www.bloomberg.com/news/articles/2022-03-10/goldman-sees-euro-area-economy-shrinking-inflation-close-to-8#:~:text=Goldman%20Sachs%20predicts%20euro%2Darea,to%20increase%202.2%25%20in%202023.)


Processing URLs:  58%|█████▊    | 582/1000 [16:43<21:14,  3.05s/it]

URL filtered: https://twitter.com/SamirBennis/status/921477268534329345


Processing URLs:  59%|█████▊    | 587/1000 [16:49<11:50,  1.72s/it]

Error extracting text from http://www.peruviantimes.com/28/vargas-llosa-calls-keiko-fujimoris-lead-for-president-troubling/26280/: 406 Client Error: Not Acceptable for url: http://www.peruviantimes.com/28/vargas-llosa-calls-keiko-fujimoris-lead-for-president-troubling/26280/


Processing URLs:  59%|█████▉    | 588/1000 [16:49<09:09,  1.33s/it]

Error extracting text from http://www.wsj.com/articles/u-k-lawmaker-injured-after-incident-in-parliamentary-constituency-1466085309: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-k-lawmaker-injured-after-incident-in-parliamentary-constituency-1466085309


Processing URLs:  59%|█████▉    | 589/1000 [16:51<09:21,  1.37s/it]

Error extracting text from http://www.msf.org/en/article/20160720-nigeria-health-disaster-borno-state: 404 Client Error: Not Found for url: https://www.msf.org/en/article/20160720-nigeria-health-disaster-borno-state


Processing URLs:  59%|█████▉    | 593/1000 [16:57<12:03,  1.78s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-28/froman-says-eu-nations-need-to-persuade-public-of-trade-benefits


Processing URLs:  60%|█████▉    | 599/1000 [17:22<18:14,  2.73s/it]

Error extracting text from http://www.reuters.com/article/us-nigeria-famine-aid-idUSKBN18J2HN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-famine-aid-idUSKBN18J2HN


Processing URLs:  60%|██████    | 603/1000 [17:37<23:25,  3.54s/it]

Error extracting text from https://blogs.intralinks.com/2017/03/ma-volatility-ahead-north-america/#: HTTPSConnectionPool(host='blogs.intralinks.com', port=443): Max retries exceeded with url: /2017/03/ma-volatility-ahead-north-america/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'blogs.intralinks.com'. (_ssl.c:1000)")))


Processing URLs:  60%|██████    | 605/1000 [17:42<18:57,  2.88s/it]

Error extracting text from http://cleantechnica.com/2015/10/31/samsung-opens-lithium-ion-battery-factory-in-china/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2015/10/31/samsung-opens-lithium-ion-battery-factory-in-china/


Processing URLs:  61%|██████    | 609/1000 [17:47<08:43,  1.34s/it]

Error extracting text from https://www.yahoo.com/news/m/ce114731-9654-3554-82e1-7c98a98c7ff6/ss_analyst%3A-preparations-needed.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/m/ce114731-9654-3554-82e1-7c98a98c7ff6/ss_analyst%3A-preparations-needed.html


Processing URLs:  61%|██████▏   | 613/1000 [17:55<10:10,  1.58s/it]

Error extracting text from http://tass.ru/en/world/876182: 404 Client Error: Not Found for url: https://tass.ru/en/world/876182
Error extracting text from https://www.reuters.com/business/us-housing-starts-drop-sharply-april-2021-05-18/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/us-housing-starts-drop-sharply-april-2021-05-18/


Processing URLs:  62%|██████▏   | 616/1000 [17:57<05:45,  1.11it/s]

Error extracting text from http://thehill.com/homenews/campaign/361555-rnc-nrsc-have-no-plans-to-back-moore-after-trump-threw-support-behind-him: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361555-rnc-nrsc-have-no-plans-to-back-moore-after-trump-threw-support-behind-him/
Error extracting text from https://www.unicef.org/emergencies/index_95476.html: 403 Client Error: Forbidden for url: https://www.unicef.org/emergencies/index_95476.html


Processing URLs:  62%|██████▏   | 619/1000 [18:11<21:27,  3.38s/it]

URL filtered: http://www.reuters.com/article/2015/09/21/us-iran-nuclear-iaea-idUSKCN0RL0Z020150921?feedType=RSS&amp;feedName=topNews&amp;utm_source=twitter


Processing URLs:  62%|██████▏   | 623/1000 [18:14<09:53,  1.58s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=55285: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=55285


Processing URLs:  62%|██████▎   | 625/1000 [18:16<07:57,  1.27s/it]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/1675108879x0x874449/945B9CF5-86DA-4C35-B03C-4892824F058D/Q4_15_Tesla_Update_Letter.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/1675108879x0x874449/945B9CF5-86DA-4C35-B03C-4892824F058D/Q4_15_Tesla_Update_Letter.pdf


Processing URLs:  63%|██████▎   | 627/1000 [18:19<08:36,  1.38s/it]

Error extracting text from http://www.yardeni.com/pub/sp500corrbear.pdf: 403 Client Error: Forbidden for url: https://yardeni.com/our-charts/


Processing URLs:  63%|██████▎   | 630/1000 [18:22<05:58,  1.03it/s]

Error extracting text from http://www.japantimes.co.jp/news/2017/02/01/asia-pacific/politics-diplomacy-asia-pacific/ex-u-n-chief-ban-ki-moon-abandons-bid-south-korean-presidency/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2017/02/01/asia-pacific/politics-diplomacy-asia-pacific/ex-u-n-chief-ban-ki-moon-abandons-bid-south-korean-presidency/


Processing URLs:  63%|██████▎   | 632/1000 [18:28<10:47,  1.76s/it]

Error extracting text from http://www.reuters.com/article/us-spain-politics-idUSKCN0ZP0ID: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-idUSKCN0ZP0ID


Processing URLs:  63%|██████▎   | 634/1000 [18:29<06:21,  1.04s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/opinions_123947.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/opinions_123947.htm?selectedLocale=en
Error extracting text from http://www.reuters.com/article/us-iran-usa-rouhani-idUSKCN0T118K20151112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-rouhani-idUSKCN0T118K20151112


Processing URLs:  64%|██████▎   | 636/1000 [18:31<06:18,  1.04s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/08/killer-facts-the-scale-of-the-global-arms-trade/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/08/killer-facts-the-scale-of-the-global-arms-trade/


Processing URLs:  64%|██████▎   | 637/1000 [18:31<04:53,  1.24it/s]

Error extracting text from https://www.nytimes.com/2017/05/13/world/asia/north-korea-missile-test-kim-jong-un-moon-jae-in.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/13/world/asia/north-korea-missile-test-kim-jong-un-moon-jae-in.html


Processing URLs:  64%|██████▍   | 640/1000 [18:34<05:58,  1.00it/s]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://mapa.vemprarua.net/&amp;usg=ALkJrhhipXjZ-_c6CPCt0dmLb_Nmgc3bAQ: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://mapa.vemprarua.net/&amp;usg=ALkJrhhipXjZ-_c6CPCt0dmLb_Nmgc3bAQ


Processing URLs:  64%|██████▍   | 645/1000 [18:46<10:00,  1.69s/it]

Error extracting text from https://www.thenation.com/article/trump-is-playing-a-fascinating-game-with-nafta-negotiations/: 404 Client Error: Not Found for url: https://www.thenation.com/article/trump-is-playing-a-fascinating-game-with-nafta-negotiations/


Processing URLs:  65%|██████▍   | 646/1000 [18:52<17:07,  2.90s/it]

Error extracting text from http://www.frankolsonproject.org/Articles/Mulholland.html: 404 Client Error: Not Found for url: https://frankolsonproject.com/Articles/Mulholland.html


Processing URLs:  65%|██████▍   | 648/1000 [18:54<11:19,  1.93s/it]

URL filtered: https://www.youtube.com/watch?v=IJRaUY3XU88


Processing URLs:  65%|██████▌   | 651/1000 [18:56<07:05,  1.22s/it]

Error extracting text from https://bit.ly/3sE6Ah0: 403 Client Error: Forbidden for url: https://www.edinburghnews.scotsman.com/news/opinion/columnists/scottish-elections-2021-could-snp-and-greens-form-a-coalition-at-holyrood-after-may-6-ian-swanson-3206157


Processing URLs:  65%|██████▌   | 652/1000 [18:56<05:36,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-security-eu-idUSKCN0ZV2NE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-eu-idUSKCN0ZV2NE


Processing URLs:  66%|██████▌   | 656/1000 [19:01<07:05,  1.24s/it]

Error extracting text from http://www.reuters.com/article/us-iran-election-idUSKCN0VR0QA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-idUSKCN0VR0QA


Processing URLs:  66%|██████▌   | 662/1000 [19:06<05:23,  1.04it/s]

Error extracting text from https://finance.yahoo.com/news/weekly-jobless-claims-week-ended-october-16-2021-193313259.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/weekly-jobless-claims-week-ended-october-16-2021-193313259.html


Processing URLs:  66%|██████▋   | 663/1000 [19:07<05:29,  1.02it/s]

Error extracting text from http://de.news-front.info/2016/01/11/putin-wird-an-munchner-sicherheitskonferenz-nicht-teilnehmen/: HTTPConnectionPool(host='de.news-front.info', port=80): Max retries exceeded with url: /2016/01/11/putin-wird-an-munchner-sicherheitskonferenz-nicht-teilnehmen/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fece07d0>: Failed to resolve 'de.news-front.info' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  66%|██████▋   | 665/1000 [19:08<04:09,  1.34it/s]

Error extracting text from http://www.reuters.com/article/us-illinois-budget-idUSKBN19Q2P1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-illinois-budget-idUSKBN19Q2P1


Processing URLs:  67%|██████▋   | 672/1000 [19:17<06:50,  1.25s/it]

Error extracting text from http://www.veteranstoday.com/2014/11/13/aegis-fail-in-black-sea-ruskies-burn-down-uss-donald-duck/: 404 Client Error: Not Found for url: https://veteranstoday.com/2014/11/13/aegis-fail-in-black-sea-ruskies-burn-down-uss-donald-duck/
Error extracting text from http://www.nytimes.com/2015/09/25/business/dealbook/thepotential-criminal-consequences-for-volkswagen.html?ref=dealbook&amp;_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/25/business/dealbook/thepotential-criminal-consequences-for-volkswagen.html?ref=dealbook&amp;_r=1


Processing URLs:  67%|██████▋   | 673/1000 [19:17<05:19,  1.02it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/oct/21/surge-illegal-children-families-accelerates/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/oct/21/surge-illegal-children-families-accelerates/


Processing URLs:  68%|██████▊   | 676/1000 [19:21<05:54,  1.09s/it]

Error extracting text from http://carnegieeurope.eu/strategiceurope/?fa=63458: 403 Client Error: Forbidden for url: http://carnegieeurope.eu/strategiceurope/?fa=63458


Processing URLs:  68%|██████▊   | 677/1000 [19:21<04:30,  1.19it/s]

Error extracting text from http://adage.com/article/media/time-ceo-mergers-year/297116/: 403 Client Error: Forbidden for url: https://adage.com/article/media/time-ceo-mergers-year/297116/


Processing URLs:  68%|██████▊   | 678/1000 [19:23<06:04,  1.13s/it]

Error extracting text from https://www.justsecurity.org/39409/money-russia-cyprus-trump-teams-odd-business-dealings/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/39409/money-russia-cyprus-trump-teams-odd-business-dealings/


Processing URLs:  68%|██████▊   | 681/1000 [19:26<05:18,  1.00it/s]

Error extracting text from http://www.get-top-news.com/news-11747412.html: 404 Client Error: Not Found for url: https://www.get-top-news.com/news-11747412.html
URL filtered: https://www.bloomberg.com/news/articles/2021-04-22/first-mideast-bitcoin-etf-aims-to-raise-more-than-200-million


Processing URLs:  68%|██████▊   | 683/1000 [19:27<04:55,  1.07it/s]

Error extracting text from http://en.trend.az/iran/politics/2431225.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2431225.html
URL filtered: http://www.bloomberg.com/news/articles/2016-05-24/poland-says-it-s-near-deal-with-eu-to-solve-rule-of-law-dispute


Processing URLs:  68%|██████▊   | 685/1000 [19:30<05:11,  1.01it/s]

Error extracting text from http://media.spglobal.com/documents/SPGlobal_Ratings_Article_13+April+2017_Annual+Corporate+Default+Study+and+Rating+Transitions.pdf: 404 Client Error: Not Found for url: https://www.spglobal.com/_redirects/media.spglobal.com/documents/SPGlobal_Ratings_Article_13+April+2017_Annual+Corporate+Default+Study+and+Rating+Transitions.pdf


Processing URLs:  69%|██████▉   | 689/1000 [19:37<06:26,  1.24s/it]

Error extracting text from https://www.wsj.com/articles/u-k-inflation-exceeds-central-bank-target-1490091370: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-k-inflation-exceeds-central-bank-target-1490091370
Error extracting text from http://www.panarmenian.net/rus/news/229952/: 403 Client Error: Forbidden for url: http://www.panarmenian.net/rus/news/229952/


Processing URLs:  69%|██████▉   | 690/1000 [19:37<04:58,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-g20-turkey-russia-japan-idUSKCN0T518F20151116: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-g20-turkey-russia-japan-idUSKCN0T518F20151116


Processing URLs:  69%|██████▉   | 691/1000 [19:40<07:23,  1.43s/it]

Error extracting text from http://cookpolitical.com/senate/charts/race-ratings: 404 Client Error: Not Found for url: https://www.cookpolitical.com/senate/charts/race-ratings


Processing URLs:  69%|██████▉   | 693/1000 [19:42<05:35,  1.09s/it]

Error extracting text from http://observers.france24.com/en/20160429-burundi-crisis-victims-deaths-protests-enfants: 403 Client Error: Forbidden for url: http://observers.france24.com/en/20160429-burundi-crisis-victims-deaths-protests-enfants


Processing URLs:  69%|██████▉   | 694/1000 [19:43<05:34,  1.09s/it]

Error extracting text from http://www.businessinsider.com/r-britains-out-campaign-leads-by-2-percent-points-ahead-of-eu-referendum-icm-poll-2016-3?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-britains-out-campaign-leads-by-2-percent-points-ahead-of-eu-referendum-icm-poll-2016-3?IR=T


Processing URLs:  70%|██████▉   | 695/1000 [19:46<08:05,  1.59s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2016/Jul-19/362767-iran-hard-liners-gain-power-in-backlash-against-rouhani.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Jul-19/362767-iran-hard-liners-gain-power-in-backlash-against-rouhani.ashx


Processing URLs:  70%|███████   | 703/1000 [19:53<04:35,  1.08it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-03/tesla-misses-deliveries-forecast-due-to-steep-production-ramp


Processing URLs:  70%|███████   | 705/1000 [19:53<02:47,  1.76it/s]

Error extracting text from https://www.wsj.com/articles/tillerson-calls-for-tighter-sanctions-against-north-korea-1489741089: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/tillerson-calls-for-tighter-sanctions-against-north-korea-1489741089


Processing URLs:  71%|███████   | 709/1000 [20:00<05:01,  1.04s/it]

Error extracting text from https://www.theguardian.com/world/2021/mar/12/israel-bombed-a-dozen-ships-carrying-iranian-oil-or-weapons-report).: 404 Client Error: Not Found for url: https://www.theguardian.com/world/2021/mar/12/israel-bombed-a-dozen-ships-carrying-iranian-oil-or-weapons-report).


Processing URLs:  71%|███████   | 711/1000 [20:02<03:56,  1.22it/s]

Error extracting text from https://medium.com/@evopsychgoogle/a-critique-of-rushton-and-templers-2012-paper-b334ed8db5ae: 403 Client Error: Forbidden for url: https://medium.com/@evopsychgoogle/a-critique-of-rushton-and-templers-2012-paper-b334ed8db5ae


Processing URLs:  72%|███████▏  | 716/1000 [20:17<08:42,  1.84s/it]

Error extracting text from https://www.newsweek.com/biden-wont-hold-back-jen-psaki-shuts-down-putin-spox-who-says-navalny-wont-come-summit-1600105: 403 Client Error: Forbidden for url: https://www.newsweek.com/biden-wont-hold-back-jen-psaki-shuts-down-putin-spox-who-says-navalny-wont-come-summit-1600105


Processing URLs:  72%|███████▏  | 721/1000 [20:24<05:57,  1.28s/it]

Error extracting text from http://www.ibtimes.com/apple-inc-aapl-q1-2016-earnings-preview-iphone-sales-may-post-first-year-over-year-2278629: 403 Client Error: Forbidden for url: https://www.ibtimes.com/apple-inc-aapl-q1-2016-earnings-preview-iphone-sales-may-post-first-year-over-year-2278629
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-talks-idUSKBN16V1QN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-talks-idUSKBN16V1QN


Processing URLs:  72%|███████▏  | 722/1000 [20:25<05:47,  1.25s/it]

Error extracting text from http://www.latimes.com/nation/politics/trailguide/la-na-trailguide-10212015-htmlstory.html?update=84773963: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/politics/trailguide/la-na-trailguide-10212015-htmlstory.html?update=84773963


Processing URLs:  72%|███████▏  | 724/1000 [20:30<08:15,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-usa-court-garland-idUSKCN0WM0LV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-garland-idUSKCN0WM0LV
URL filtered: https://www.youtube.com/watch?v=lmSC52Npuq0


Processing URLs:  73%|███████▎  | 727/1000 [20:31<04:12,  1.08it/s]

URL filtered: https://www.facebook.com/GeezMedia/posts/834083590524583


Processing URLs:  73%|███████▎  | 730/1000 [20:34<04:48,  1.07s/it]

Error extracting text from http://www.economist.com/blogs/graphicdetail/2016/05/britain-s-eu-referendum: 404 Client Error: Not Found for url: https://www.economist.com/blogs/graphicdetail/2016/05/britain-s-eu-referendum


Processing URLs:  73%|███████▎  | 732/1000 [20:36<04:20,  1.03it/s]

Error extracting text from http://www.berkeley.edu/: 403 Client Error: Forbidden for url: http://www.berkeley.edu/


Processing URLs:  73%|███████▎  | 733/1000 [20:39<06:10,  1.39s/it]

Error extracting text from http://www.stripes.com/news/us/under-trump-gop-to-give-space-weapons-close-look-1.440657: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/us/under-trump-gop-to-give-space-weapons-close-look-1.440657


Processing URLs:  74%|███████▎  | 737/1000 [20:47<07:32,  1.72s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-20/france-says-we-want-our-money-back-as-brexit-talks-crawl-on


Processing URLs:  74%|███████▍  | 739/1000 [20:48<04:40,  1.07s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/lawmaker-news/341826-free-kevin-hassett-trumps-pick-a-victim-of-democratic: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/lawmaker-news/341826-free-kevin-hassett-trumps-pick-a-victim-of-democratic/


Processing URLs:  74%|███████▍  | 743/1000 [20:54<04:34,  1.07s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=sr&amp;u=http://www.b92.net/mobilni/info/1168037&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=sr&amp;u=http://www.b92.net/mobilni/info/1168037&amp;prev=search


Processing URLs:  74%|███████▍  | 744/1000 [20:54<03:31,  1.21it/s]

Error extracting text from http://www.wsj.com/articles/saudi-arabia-iran-talks-needed-for-syria-breakthrough-says-united-nations-envoy-1441900196: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-arabia-iran-talks-needed-for-syria-breakthrough-says-united-nations-envoy-1441900196


Processing URLs:  74%|███████▍  | 745/1000 [20:55<03:32,  1.20it/s]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Abe-Park-briefly-discuss-planned-3-way-summit-with-Chinese-premier: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Abe-Park-briefly-discuss-planned-3-way-summit-with-Chinese-premier


Processing URLs:  75%|███████▍  | 746/1000 [21:12<23:15,  5.49s/it]

Error extracting text from https://www.investopedia.com/news/mcafee-tracker-predicts-1-bitcoin1m-2020/: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/news/mcafee-tracker-predicts-1-bitcoin1m-2020/


Processing URLs:  75%|███████▌  | 751/1000 [21:21<09:54,  2.39s/it]

Error extracting text from https://www.stripes.com/news/middle-east/marines-rekindling-old-afghan-relations-in-helmand-province-1.475412#.WV-2uDMfnq1: 404 Client Error: Not Found for url: https://www.stripes.com/theaters/middle_east/marines-rekindling-old-afghan-relations-in-helmand-province-1.475412#.WV-2uDMfnq1


Processing URLs:  76%|███████▌  | 756/1000 [21:28<05:24,  1.33s/it]

Error extracting text from http://www.chinapost.com.tw/china/china-business/2016/05/24/467070/Incentives-main.htm: HTTPConnectionPool(host='www.chinapost.com.tw', port=80): Max retries exceeded with url: /china/china-business/2016/05/24/467070/Incentives-main.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fdf5d670>: Failed to resolve 'www.chinapost.com.tw' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  76%|███████▋  | 764/1000 [21:36<03:37,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-france-submarines-india-australia-idUSKCN10Z04G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-france-submarines-india-australia-idUSKCN10Z04G


Processing URLs:  77%|███████▋  | 772/1000 [21:48<05:59,  1.57s/it]

Error extracting text from http://www.reuters.com/article/2015/10/31/us-mideast-crisis-syria-idUSKCN0SP0EA20151031: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/31/us-mideast-crisis-syria-idUSKCN0SP0EA20151031


Processing URLs:  78%|███████▊  | 776/1000 [21:53<04:31,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-iran-usa-military-idUSKBN15J0BM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-military-idUSKBN15J0BM


Processing URLs:  78%|███████▊  | 783/1000 [22:03<05:03,  1.40s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-16/brazil-police-tapped-rousseff-discussing-lula-nomination-globo


Processing URLs:  79%|███████▊  | 787/1000 [22:04<02:23,  1.49it/s]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2019/04/15/eu-set-to-enhance-cross-border-access-to-online-content/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2019/04/15/eu-set-to-enhance-cross-border-access-to-online-content/


Processing URLs:  79%|███████▉  | 788/1000 [22:07<04:25,  1.25s/it]

Error extracting text from http://www.kdmid.ru/docs.aspx: 404 Client Error: Not Found for url: https://www.kdmid.ru/docs.aspx


Processing URLs:  79%|███████▉  | 791/1000 [22:17<08:27,  2.43s/it]

Error extracting text from http://aan.af/2hKNy62: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/jihadi-commuters-how-the-taleban-cross-the-durand-line/


Processing URLs:  80%|███████▉  | 796/1000 [22:40<08:42,  2.56s/it]

Error extracting text from https://www.nytimes.com/2021/01/07/world/americas/what-is-a-coup-attempt.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/07/world/americas/what-is-a-coup-attempt.html


Processing URLs:  80%|████████  | 800/1000 [22:49<07:13,  2.17s/it]

Error extracting text from http://www.nytimes.com/2015/12/19/world/middleeast/syria-talks-isis.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/19/world/middleeast/syria-talks-isis.html?_r=0


Processing URLs:  80%|████████  | 801/1000 [22:50<06:14,  1.88s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1018724/000101872421000004/amzn-20201231.htm#i75de98b9097f40f3b5884e541f532421_163: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1018724/000101872421000004/amzn-20201231.htm#i75de98b9097f40f3b5884e541f532421_163


Processing URLs:  80%|████████  | 802/1000 [22:50<04:34,  1.38s/it]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2021/06/21/myanmar-burma-third-round-of-eu-sanctions-over-the-military-coup-and-subsequent-repression/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2021/06/21/myanmar-burma-third-round-of-eu-sanctions-over-the-military-coup-and-subsequent-repression/


Processing URLs:  81%|████████  | 810/1000 [23:57<10:05,  3.19s/it]

Error extracting text from https://www.reuters.com/world/us/us-senate-democrats-plan-debt-limit-vote-after-biden-hints-filibuster-could-go-2021-10-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/us-senate-democrats-plan-debt-limit-vote-after-biden-hints-filibuster-could-go-2021-10-06/


Processing URLs:  81%|████████▏ | 813/1000 [23:59<05:09,  1.66s/it]

Error extracting text from http://asia.nikkei.com/Features/China-up-close/Xi-warns-Obama-over-THAAD-missile-system?page=2: 404 Client Error: Not Found for url: https://asia.nikkei.com/Features/China-up-close/Xi-warns-Obama-over-THAAD-missile-system?page=2


Processing URLs:  82%|████████▏ | 822/1000 [24:11<02:40,  1.11it/s]

Error extracting text from http://www.wsj.com/articles/imf-cuts-2016-u-s-economic-growth-forecast-to-2-2-1466602332: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/imf-cuts-2016-u-s-economic-growth-forecast-to-2-2-1466602332
Error extracting text from https://www.washingtontimes.com/news/2022/feb/26/world-leaders-ban-russia-swift-cutting-country-fin/: 403 Client Error: Forbidden for url: https://www.washingtontimes.com/news/2022/feb/26/world-leaders-ban-russia-swift-cutting-country-fin/


Processing URLs:  82%|████████▎ | 825/1000 [24:18<05:54,  2.03s/it]

URL filtered: https://www.buzzfeed.com/kevincollier/twitter-was-warned-repeatedly-about-this-fake-account-run?utm_term=.tj956gXoZ#.qakKpYPbg


Processing URLs:  83%|████████▎ | 830/1000 [24:23<03:38,  1.28s/it]

Error extracting text from https://www.niaid.nih.gov/research/antivirals: 403 Client Error: Forbidden for url: https://www.niaid.nih.gov/research/antivirals
URL filtered: https://twitter.com/AP/status/846508300023857153


Processing URLs:  83%|████████▎ | 833/1000 [25:29<45:06, 16.20s/it]

Error extracting text from http://renegociacionsoberana.mppef.gob.ve/assets/files/ingles.pdf: HTTPConnectionPool(host='renegociacionsoberana.mppef.gob.ve', port=80): Max retries exceeded with url: /assets/files/ingles.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fbded1f0>, 'Connection to renegociacionsoberana.mppef.gob.ve timed out. (connect timeout=60)'))


Processing URLs:  84%|████████▎ | 835/1000 [25:30<25:13,  9.18s/it]

Error extracting text from http://www.reuters.com/article/pdvsa-debt-idUSL1N1DM1OK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/pdvsa-debt-idUSL1N1DM1OK


Processing URLs:  84%|████████▍ | 839/1000 [25:37<10:00,  3.73s/it]

Error extracting text from http://data.ers.usda.gov/reports.aspx?ID=10633#P17eff3964e9c48fd80e15cec752eb385_10_586iT21R0x0: 500 Server Error: Internal Server Error for url: https://data.ers.usda.gov/reports.aspx?ID=10633#P17eff3964e9c48fd80e15cec752eb385_10_586iT21R0x0


Processing URLs:  84%|████████▍ | 840/1000 [25:37<07:10,  2.69s/it]

Error extracting text from http://www.nytimes.com/2015/12/05/business/energy-environment/opec-meeting-oil-production-price.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/05/business/energy-environment/opec-meeting-oil-production-price.html?_r=0


Processing URLs:  84%|████████▍ | 844/1000 [25:43<03:42,  1.43s/it]

Error extracting text from https://www.nytimes.com/2017/03/11/us/politics/republican-health-law-repeal-strategy.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/11/us/politics/republican-health-law-repeal-strategy.html?_r=0
URL filtered: https://mobile.twitter.com/realDonaldTrump/status/1343068648118874113


Processing URLs:  85%|████████▍ | 847/1000 [25:45<02:18,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-eu-idUSKCN0W61V1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-eu-idUSKCN0W61V1


Processing URLs:  85%|████████▍ | 848/1000 [25:47<03:10,  1.25s/it]

Error extracting text from http://eurodialogue.eu/Putin-Erdogan-Alliance%20-can-Turn-into-Political-Reality: HTTPConnectionPool(host='eurodialogue.eu', port=80): Max retries exceeded with url: /Putin-Erdogan-Alliance%20-can-Turn-into-Political-Reality (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe657080>: Failed to resolve 'eurodialogue.eu' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  86%|████████▌ | 857/1000 [25:56<02:33,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-russia-diplomat-gulen-idUSKBN1482A0?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-russia-diplomat-gulen-idUSKBN1482A0?il=0
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O6K2696JTSEO01-4GPS1JQ37FMNJ44C4464JI6OA0


Processing URLs:  86%|████████▌ | 861/1000 [26:02<03:18,  1.43s/it]

Error extracting text from http://news.xinhuanet.com/english/china/2013-11/23/c_132912145.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/china/2013-11/23/c_132912145.htm


Processing URLs:  86%|████████▋ | 864/1000 [26:07<03:55,  1.73s/it]

Error extracting text from http://indy100.independent.co.uk/article/how-many-afghan-civilians-have-died-in-13-years-of-war--lkcwu0y6Le: 503 Server Error: Service Unavailable for url: http://indy100.independent.co.uk/article/how-many-afghan-civilians-have-died-in-13-years-of-war--lkcwu0y6Le


Processing URLs:  86%|████████▋ | 865/1000 [26:09<04:04,  1.81s/it]

Error extracting text from http://www.ibtimes.com/volkswagen-diesel-emissions-scandal-update-2015-spain-opens-criminal-probe-against-2160156: 403 Client Error: Forbidden for url: https://www.ibtimes.com/volkswagen-diesel-emissions-scandal-update-2015-spain-opens-criminal-probe-against-2160156


Processing URLs:  87%|████████▋ | 866/1000 [26:10<03:41,  1.65s/it]

Error extracting text from http://www.wsj.com/articles/clinton-foundation-provides-details-on-canadian-donation-1430092540: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/clinton-foundation-provides-details-on-canadian-donation-1430092540


Processing URLs:  87%|████████▋ | 868/1000 [26:11<02:01,  1.09it/s]

Error extracting text from http://www.nrttv.com/EN/birura-details.aspx?Jimare=3598: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/birura-details.aspx?Jimare=3598


Processing URLs:  87%|████████▋ | 871/1000 [26:14<02:08,  1.00it/s]

URL filtered: http://www.bloomberg.com/politics/videos/2016-04-27/president-barack-obama-charlie-rose
URL filtered: https://twitter.com/michaelroston/status/1473415065978257410?t=1UmH46zC5sfuPONwr_HBdg&amp;s=19


Processing URLs:  87%|████████▋ | 874/1000 [26:15<01:30,  1.39it/s]

Error extracting text from https://www.scotsman.com/news/politics/scottish-election-2021-hopes-for-snp-majority-continue-to-fade-as-more-support-slips-away-shows-poll-3209589: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-2021-hopes-for-snp-majority-continue-to-fade-as-more-support-slips-away-shows-poll-3209589


Processing URLs:  88%|████████▊ | 879/1000 [26:20<01:46,  1.13it/s]

Error extracting text from http://www.reuters.com/article/us-iran-arms-russia-idUSKCN0VJ0MN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-arms-russia-idUSKCN0VJ0MN


Processing URLs:  88%|████████▊ | 880/1000 [27:21<34:32, 17.27s/it]

Error extracting text from http://www.charlotteobserver.com/news/politics-government/election/article111163942.html: HTTPConnectionPool(host='www.charlotteobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  88%|████████▊ | 881/1000 [27:21<24:42, 12.46s/it]

Error extracting text from http://www.reuters.com/article/2015/09/14/us-usa-montenegro-idUSKCN0RE23O20150914: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/14/us-usa-montenegro-idUSKCN0RE23O20150914


Processing URLs:  89%|████████▊ | 886/1000 [27:29<06:09,  3.24s/it]

Error extracting text from http://www.reuters.com/article/us-china-vietnam-idUSKCN18B0HK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-vietnam-idUSKCN18B0HK


Processing URLs:  89%|████████▊ | 887/1000 [27:32<05:35,  2.97s/it]

Error extracting text from http://www.rusemb.org.uk/fnapr/5795: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
URL filtered: https://www.youtube.com/watch?v=tkplPbd2f60


Processing URLs:  89%|████████▉ | 893/1000 [27:47<04:18,  2.42s/it]

Error extracting text from http://www.brown.senate.gov/newsroom/press/release/brown-urges-treasury-department-to-protect-the-earned-benefits-of-thousands-of-ohioans-with-central-states-pension-plans: 403 Client Error: Forbidden for url: http://www.brown.senate.gov/newsroom/press/release/brown-urges-treasury-department-to-protect-the-earned-benefits-of-thousands-of-ohioans-with-central-states-pension-plans


Processing URLs:  90%|████████▉ | 898/1000 [27:56<03:40,  2.16s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0SE2SC20151020: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0SE2SC20151020


Processing URLs:  90%|█████████ | 903/1000 [28:01<01:52,  1.16s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-navy-idUSKCN0ZP036: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-navy-idUSKCN0ZP036


Processing URLs:  90%|█████████ | 905/1000 [28:05<02:05,  1.32s/it]

Error extracting text from http://www.ethnologue.com/statistics/size: 404 Client Error: Not Found for url: https://www.ethnologue.com/statistics/size
Error extracting text from https://www.reuters.com/article/us-russia-election-navalny/putin-nemesis-navalny-barred-from-election-tries-political-siege-idUSKCN1G50RM?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-election-navalny/putin-nemesis-navalny-barred-from-election-tries-political-siege-idUSKCN1G50RM?il=0


Processing URLs:  91%|█████████ | 906/1000 [28:05<01:51,  1.19s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56356#.WMt3bTvyuUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56356#.WMt3bTvyuUk


Processing URLs:  91%|█████████ | 909/1000 [28:08<01:06,  1.37it/s]

Error extracting text from http://www.reuters.com/article/2015/11/16/us-japan-economy-gdp-idUSKCN0T41CC20151116#JG1LwyiqmjL0DsOI.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/16/us-japan-economy-gdp-idUSKCN0T41CC20151116#JG1LwyiqmjL0DsOI.97
Error extracting text from http://www.expreso.com.pe/politica/keiko-fujimori-con-un-pie-fuera-de-la-contienda/: 403 Client Error: Forbidden for url: http://www.expreso.com.pe/politica/keiko-fujimori-con-un-pie-fuera-de-la-contienda/


Processing URLs:  92%|█████████▏| 916/1000 [28:18<01:51,  1.33s/it]

Error extracting text from https://www.yahoo.com/news/erdogan-eyes-expanding-turkey-influence-latin-america-tour-110243189.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/erdogan-eyes-expanding-turkey-influence-latin-america-tour-110243189.html


Processing URLs:  92%|█████████▏| 920/1000 [28:20<00:43,  1.85it/s]

URL filtered: https://twitter.com/Raiklin/status/1336319381547089922
Error extracting text from https://www.barrons.com/articles/goldman-sachs-bitcoin-cryptocurrency-51617217920?siteid=yhoof2: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/goldman-sachs-bitcoin-cryptocurrency-51617217920?siteid=yhoof2


Processing URLs:  92%|█████████▏| 921/1000 [28:21<00:49,  1.59it/s]

URL filtered: https://www.mediamatters.org/false-flag-conspiracy-theory/facebook-2018-rep-marjorie-taylor-greene-endorsed-conspiracy-theories
URL filtered: https://twitter.com/markknoller/status/656865284377178112


Processing URLs:  93%|█████████▎| 926/1000 [29:22<16:45, 13.59s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-12-03/embattled-alabama-republican-senate-candidate-ahead-in-cbs-poll: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 929/1000 [29:27<07:40,  6.49s/it]

URL filtered: https://www.youtube.com/watch?v=wsIbfYEizLk
Error extracting text from http://www.nato.int/cps/en/natohq/opinions_125570.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/opinions_125570.htm?selectedLocale=en


Processing URLs:  93%|█████████▎| 933/1000 [29:30<03:18,  2.97s/it]

Error extracting text from https://www.accuweather.com/en/weather-forecasts/accuweathers-2020-2021-south-america-summer-forecast/843182: 403 Client Error: Forbidden for url: https://www.accuweather.com/en/weather-forecasts/accuweathers-2020-2021-south-america-summer-forecast/843182


Processing URLs:  94%|█████████▎| 937/1000 [29:33<01:40,  1.60s/it]

Error extracting text from http://www.reuters.com/article/2015/09/28/us-markets-stocks-idUSKCN0RS19U20150928: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/28/us-markets-stocks-idUSKCN0RS19U20150928


Processing URLs:  94%|█████████▍| 938/1000 [29:34<01:36,  1.56s/it]

Error extracting text from http://www.wsj.com/articles/california-set-to-extend-curbs-on-water-use-1454460453: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/california-set-to-extend-curbs-on-water-use-1454460453


Processing URLs:  94%|█████████▍| 941/1000 [29:39<01:24,  1.43s/it]

Error extracting text from http://www.reuters.com/article/us-usa-mattis-iran-idUSKBN17L230: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-mattis-iran-idUSKBN17L230


Processing URLs:  94%|█████████▍| 942/1000 [29:39<01:06,  1.16s/it]

Error extracting text from https://www.ccjdigital.com/business/article/15114958/ata-truck-tonnage-up-for-first-time-since-march: 403 Client Error: Forbidden for url: https://www.ccjdigital.com/business/article/15114958/ata-truck-tonnage-up-for-first-time-since-march
URL filtered: https://twitter.com/LeaderMcConnell/status/1426302359811072005?s=20


Processing URLs:  95%|█████████▍| 948/1000 [30:01<01:50,  2.13s/it]

Error extracting text from http://www.nytimes.com/2016/02/07/magazine/roger-goodells-unstoppable-football-machine.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/magazine/roger-goodells-unstoppable-football-machine.html?_r=0


Processing URLs:  95%|█████████▌| 953/1000 [30:10<01:57,  2.49s/it]

Error extracting text from http://cajnewsafrica.com/2017/05/16/land-clashes-linger-in-liberated-nigeria-zones/: 404 Client Error: Not Found for url: https://www.cajnewsafrica.com/2017/05/16/land-clashes-linger-in-liberated-nigeria-zones/
URL filtered: https://www.bloomberg.com/news/articles/2017-10-30/pdvsa-says-it-s-all-paid-up-but-bond-market-thinks-otherwise


Processing URLs:  96%|█████████▌| 956/1000 [30:11<00:56,  1.29s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/11/04/Syrian-army-regains-control-of-Aleppo-supply-route-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/11/04/Syrian-army-regains-control-of-Aleppo-supply-route-.html


Processing URLs:  96%|█████████▌| 958/1000 [30:14<00:58,  1.40s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NZGG4Y6KLVR501-4G1CN4ATQKE46FNFAHF7O3JDCO


Processing URLs:  97%|█████████▋| 968/1000 [30:38<01:08,  2.13s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-idUSKBN1630LR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-idUSKBN1630LR


Processing URLs:  97%|█████████▋| 969/1000 [30:39<00:51,  1.66s/it]

Error extracting text from https://www.theafricareport.com/65873/ethiopia-will-tigray-reach-a-conclusion-in-2021/: 403 Client Error: Forbidden for url: https://www.theafricareport.com/65873/ethiopia-will-tigray-reach-a-conclusion-in-2021/


Processing URLs:  97%|█████████▋| 973/1000 [30:42<00:24,  1.09it/s]

Error extracting text from http://www.abc.net.au/religion/articles/2016/06/14/4481526.htm: 404 Client Error: Not Found for url: https://www.abc.net.au/religion/articles/2016/06/14/4481526.htm


Processing URLs:  98%|█████████▊| 975/1000 [30:44<00:23,  1.07it/s]

Error extracting text from https://tass.com/world/1279383: 502 Server Error: Bad Gateway for url: https://tass.com/world/1279383
Error extracting text from http://www.nytimes.com/2016/04/26/world/africa/ethiopia-south-sudan-nuer-highlander.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/26/world/africa/ethiopia-south-sudan-nuer-highlander.html


Processing URLs:  98%|█████████▊| 976/1000 [30:46<00:27,  1.15s/it]

Error extracting text from http://dailytimes.com.pk/islamabad/30-Jun-16/maryam-nawaz-busy-in-preparing-historical-welcome-for-pm: 404 Client Error: Not Found for url: https://dailytimes.com.pk/islamabad/30-Jun-16/maryam-nawaz-busy-in-preparing-historical-welcome-for-pm


Processing URLs:  98%|█████████▊| 980/1000 [31:02<01:03,  3.17s/it]

Error extracting text from https://www.yahoo.com/news/official-rights-court-could-start-getting-cases-poland-163408977.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/official-rights-court-could-start-getting-cases-poland-163408977.html


Processing URLs:  98%|█████████▊| 981/1000 [31:03<00:47,  2.48s/it]

Error extracting text from https://global.handelsblatt.com/edition/452/ressort/politics/article/fighting-over-the-last-word-on-ttip: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/edition/452/ressort/politics/article/fighting-over-the-last-word-on-ttip


Processing URLs:  98%|█████████▊| 983/1000 [31:04<00:25,  1.47s/it]

Error extracting text from https://www.yahoo.com/news/latest-schumer-says-budget-talks-must-two-way-121730402--politics.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/latest-schumer-says-budget-talks-must-two-way-121730402--politics.html


Processing URLs:  99%|█████████▊| 986/1000 [31:38<01:18,  5.63s/it]

Error extracting text from http://www.reuters.com/article/2015/10/23/us-mideast-crisis-syria-putin-opposition-idUSKCN0SG2A620151023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/us-mideast-crisis-syria-putin-opposition-idUSKCN0SG2A620151023


Processing URLs:  99%|█████████▊| 987/1000 [31:39<00:55,  4.28s/it]

Error extracting text from https://www.rt.com/news/334880-iran-ballistic-missile-test/: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  99%|█████████▉| 994/1000 [31:59<00:13,  2.26s/it]

Error extracting text from http://world.kbs.co.kr/english/news/news_In_detail.htm?lang=e&amp;id=In&amp;No=119073&amp;current_page=: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_In_detail.htm?lang=e&amp;id=In&amp;No=119073&amp;current_page=


Processing URLs: 100%|█████████▉| 995/1000 [31:59<00:08,  1.75s/it]

Error extracting text from http://cleantechnica.com/tag/us-electricity-capacity-reports/: 403 Client Error: Forbidden for url: http://cleantechnica.com/tag/us-electricity-capacity-reports/


Processing URLs: 100%|█████████▉| 996/1000 [32:01<00:06,  1.64s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-05/tesla-falls-after-lowering-sales-goal-to-as-few-as-50-000-autos


Processing URLs: 100%|█████████▉| 999/1000 [32:04<00:01,  1.24s/it]

Error extracting text from http://www.cnbc.com/2017/09/19/reuters-america-update-1-u-s-senators-seek-review-of-potential-russian-control-of-citgo.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/09/19/reuters-america-update-1-u-s-senators-seek-review-of-potential-russian-control-of-citgo.html


Processing URLs: 100%|██████████| 1000/1000 [32:04<00:00,  1.92s/it]
Processing URLs:   0%|          | 4/1000 [00:02<11:57,  1.39it/s]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-iraq-mosul-20161002-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-iraq-mosul-20161002-snap-story.html


Processing URLs:   1%|          | 6/1000 [00:05<15:51,  1.04it/s]

Error extracting text from http://www.aaai.org/Conferences/AAAI/2015/aaai15speakers.php: 403 Client Error: Forbidden for url: http://aaai.org/Conferences/AAAI/2015/aaai15speakers.php


Processing URLs:   1%|          | 7/1000 [00:07<19:57,  1.21s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&amp;s=wcrfpus2&amp;f=4): 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:   1%|          | 8/1000 [00:08<22:41,  1.37s/it]

URL filtered: http://www.reuters.com/article/us-southchinasea-china-philippines-exclu-idUSKBN17B124?utm_source=Twitter&amp;utm_medium=Social


Processing URLs:   1%|          | 10/1000 [00:10<18:15,  1.11s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-02-01/merkel-s-challenger-leads-social-democrats-to-german-poll-boost


Processing URLs:   1%|▏         | 14/1000 [00:18<25:51,  1.57s/it]

Error extracting text from http://www.channelnewsasia.com/news/singapore/south-china-sea-tensions/2399082.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/singapore/south-china-sea-tensions/2399082.html


Processing URLs:   2%|▏         | 15/1000 [00:19<24:34,  1.50s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/18/gitrep-17mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/18/gitrep-17mar16pm/


Processing URLs:   2%|▏         | 18/1000 [00:23<21:22,  1.31s/it]

Error extracting text from https://i.guim.co.uk/img/static/sys-images/Guardian/Pix/pictures/2014/3/13/1394740925879/Survivors-of-the-plane-cr-011.jpg?w=620&amp;q=85&amp;auto=format&amp;sharp=10&amp;s=a61f1ef17cf4a8c3c0b36b4a1f9dda0b: 401 Client Error: Unauthorized - missing signature for url: https://i.guim.co.uk/img/static/sys-images/Guardian/Pix/pictures/2014/3/13/1394740925879/Survivors-of-the-plane-cr-011.jpg?w=620&amp;q=85&amp;auto=format&amp;sharp=10&amp;s=a61f1ef17cf4a8c3c0b36b4a1f9dda0b


Processing URLs:   2%|▏         | 22/1000 [00:42<1:15:32,  4.63s/it]

Error extracting text from http://www.oann.com/approval-of-venezuelan-leader-drops-as-crisis-bites-poll/: 404 Client Error: Not Found for url: https://www.oann.com/approval-of-venezuelan-leader-drops-as-crisis-bites-poll/


Processing URLs:   2%|▏         | 24/1000 [00:44<45:47,  2.81s/it]

Error extracting text from http://elainelchao.com/biography/: 404 Client Error: Not Found for url: http://elainelchao.com/biography/


Processing URLs:   3%|▎         | 26/1000 [00:45<26:15,  1.62s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN15N061: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN15N061


Processing URLs:   3%|▎         | 34/1000 [01:04<28:04,  1.74s/it]

URL filtered: https://twitter.com/MeGovernment/status/862589342925107200


Processing URLs:   5%|▍         | 49/1000 [01:37<22:59,  1.45s/it]

Error extracting text from http://www.tradingeconomics.com/egypt/unemployment-rate: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/egypt/unemployment-rate


Processing URLs:   5%|▌         | 53/1000 [01:41<17:01,  1.08s/it]

Error extracting text from http://www.straitstimes.com/asia/se-asia/rcep-may-be-a-better-option-than-tpp-philippine-daily-inquirer: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:   5%|▌         | 54/1000 [01:44<22:17,  1.41s/it]

Error extracting text from http://www.alternet.org/election-2016/could-trump-save-hillary-new-hampshire-electoral-primary-scenario-could-defy-polls: 404 Client Error: Not Found for url: https://www.alternet.org/election-2016/could-trump-save-hillary-new-hampshire-electoral-primary-scenario-could-defy-polls


Processing URLs:   6%|▌         | 59/1000 [01:49<13:59,  1.12it/s]

URL filtered: https://www.youtube.com/watch?v=1xpKeabZlEs


Processing URLs:   6%|▌         | 61/1000 [01:51<13:44,  1.14it/s]

Error extracting text from https://abcnews.go.com/US/wireStory/china-warns-retaliation-nyses-delisting-companies-75011768: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/china-warns-retaliation-nyses-delisting-companies-75011768


Processing URLs:   6%|▋         | 63/1000 [01:52<10:20,  1.51it/s]

Error extracting text from http://www.nytimes.com/2016/04/02/business/international/tesla-model-3.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/02/business/international/tesla-model-3.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=second-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:   7%|▋         | 66/1000 [01:54<10:09,  1.53it/s]

Error extracting text from http://www.nytimes.com/2006/04/30/nyregion/30hillary.html?pagewanted=print&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2006/04/30/nyregion/30hillary.html?pagewanted=print&amp;_r=0


Processing URLs:   7%|▋         | 67/1000 [02:01<39:08,  2.52s/it]

Error extracting text from https://apollo.auto/minibus/index.html: 404 Client Error: Not Found for url: https://www.apollo.auto/minibus/index.html


Processing URLs:   7%|▋         | 73/1000 [02:10<21:04,  1.36s/it]

Error extracting text from http://www.tradingeconomics.com/united-states/unemployment-rate: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-states/unemployment-rate


Processing URLs:   8%|▊         | 77/1000 [02:24<38:16,  2.49s/it]

Error extracting text from http://www.telegraaf.nl/binnenland/27434978/__Rutte___VVD_niet_met_PVV___.html: 403 Client Error: Forbidden for url: https://www.telegraaf.nl/binnenland/27434978/__Rutte___VVD_niet_met_PVV___.html


Processing URLs:   8%|▊         | 79/1000 [02:35<54:16,  3.54s/it]

Error extracting text from http://www.coinweek.com/press-releases/burundi-issues-new-banknote-series/: 403 Client Error: Forbidden for url: http://coinweek.com/press-releases/burundi-issues-new-banknote-series/


Processing URLs:   8%|▊         | 81/1000 [02:38<39:22,  2.57s/it]

Error extracting text from https://warisboring.com/after-the-syrian-regime-recaptures-a-neighborhood-the-reconciliation-begins-820a7bf9146e?mc_cid=530d9108bd&amp;mc_eid=0467f21653#.wugobz2k7: 403 Client Error: Forbidden for url: https://warisboring.com/after-the-syrian-regime-recaptures-a-neighborhood-the-reconciliation-begins-820a7bf9146e?mc_cid=530d9108bd&amp;mc_eid=0467f21653#.wugobz2k7


Processing URLs:   8%|▊         | 83/1000 [02:40<28:36,  1.87s/it]

Error extracting text from https://www.c-span.org/video/?c4664379/clinton-watts-senate-intelligence-committee-hearing: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?c4664379/clinton-watts-senate-intelligence-committee-hearing


Processing URLs:   8%|▊         | 84/1000 [02:57<1:37:08,  6.36s/it]

Error extracting text from https://www.almasdarnews.com/article/russian-strategic-bombers-propel-syrian-army-western-palmyra/: 522 Server Error:  for url: https://www.almasdarnews.com/article/russian-strategic-bombers-propel-syrian-army-western-palmyra/


Processing URLs:   8%|▊         | 85/1000 [02:59<1:15:56,  4.98s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/no-grand-coalition-opponents-of-merkel-alliance-hit-the-road-idUSKBN1F20HX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/no-grand-coalition-opponents-of-merkel-alliance-hit-the-road-idUSKBN1F20HX


Processing URLs:   9%|▉         | 91/1000 [03:04<23:40,  1.56s/it]

Error extracting text from http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1122/DOC-347927A1.pdf: 403 Client Error: Forbidden for url: http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1122/DOC-347927A1.pdf
Error extracting text from http://www.nytimes.com/2016/10/09/us/politics/what-options-does-the-us-have-after-accusing-russia-of-hacks.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/09/us/politics/what-options-does-the-us-have-after-accusing-russia-of-hacks.html


Processing URLs:  10%|▉         | 95/1000 [03:10<22:24,  1.49s/it]

Error extracting text from http://digitalarchive.wilsoncenter.org/document/122544: 403 Client Error: Forbidden for url: http://digitalarchive.wilsoncenter.org/document/122544


Processing URLs:  10%|▉         | 99/1000 [03:17<20:40,  1.38s/it]

Error extracting text from https://tradingeconomics.com/euro-area/inflation-rate-mom: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/euro-area/inflation-rate-mom
Error extracting text from http://www.reuters.com/article/us-health-zika-idUSKCN0VQ27A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-zika-idUSKCN0VQ27A


Processing URLs:  10%|█         | 100/1000 [03:18<20:38,  1.38s/it]

Error extracting text from http://mobile.reuters.com/article/idUSB4N19F00J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSB4N19F00J


Processing URLs:  10%|█         | 101/1000 [03:18<15:41,  1.05s/it]

Error extracting text from https://cafepalermo.files.wordpress.com/2012/03/democrats-spot-a-backbone.jpg?w=625: 404 Client Error: Not Found for url: https://cafepalermo.files.wordpress.com/2012/03/democrats-spot-a-backbone.jpg?w=625


Processing URLs:  10%|█         | 102/1000 [03:19<13:21,  1.12it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-23/burundi-to-release-1-200-detainees-following-un-ban-s-visit-ikz7q3ti
URL filtered: https://www.vanityfair.com/news/2017/09/jared-kushner-data-operation-russia-facebook


Processing URLs:  11%|█         | 106/1000 [03:23<15:38,  1.05s/it]

Error extracting text from https://www.e-education.psu.edu/geog885/l2_p4.html: 404 Client Error: Not Found for url: https://www.e-education.psu.edu/geog885/l2_p4.html


Processing URLs:  11%|█         | 108/1000 [03:25<15:08,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/bank-of-japans-kuroda-shrugs-off-gdp-shrinkage-1447921466: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bank-of-japans-kuroda-shrugs-off-gdp-shrinkage-1447921466


Processing URLs:  11%|█         | 109/1000 [03:25<12:32,  1.18it/s]

Error extracting text from http://marginalrevolution.com/marginalrevolution/2016/10/announce-secret-unprecedented-cyberstrike-russia.html: 403 Client Error: Forbidden for url: http://marginalrevolution.com/marginalrevolution/2016/10/announce-secret-unprecedented-cyberstrike-russia.html


Processing URLs:  11%|█         | 110/1000 [03:32<35:28,  2.39s/it]

Error extracting text from http://vestnikkavkaza.net/articles/Why-do-Montenegrins-not-want-to-be-in-NATO.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/articles/Why-do-Montenegrins-not-want-to-be-in-NATO.html


Processing URLs:  11%|█         | 112/1000 [03:37<37:55,  2.56s/it]

Error extracting text from https://fcw.com/articles/2016/04/04/nss-cyber-chowdhry.aspx: 404 Client Error: NOT FOUND for url: https://www.nextgov.com/articles/2016/04/04/nss-cyber-chowdhry.aspx/
Error extracting text from https://www.reuters.com/article/us-tesla-quality-insight/build-fast-fix-later-speed-hurts-quality-at-tesla-some-workers-say-idUSKBN1DT0N3?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-quality-insight/build-fast-fix-later-speed-hurts-quality-at-tesla-some-workers-say-idUSKBN1DT0N3?il=0
URL filtered: http://www.bloomberg.com/gadfly/articles/2016-02-17/opec-has-a-trust-problem


Processing URLs:  12%|█▏        | 122/1000 [03:52<16:46,  1.15s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-chemicalweapons-idUSKCN1152M8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-chemicalweapons-idUSKCN1152M8
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKCN0ZQ0OF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKCN0ZQ0OF


Processing URLs:  12%|█▏        | 123/1000 [03:55<22:59,  1.57s/it]

URL filtered: https://www.usatoday.com/story/tech/2021/01/20/biden-trump-censorship-section-230-google-facebook-scrutiny/4238357001/


Processing URLs:  13%|█▎        | 127/1000 [04:00<22:32,  1.55s/it]

Error extracting text from https://reason.com/archives/2015/10/01/grexit-on-ice&gt: 404 Client Error: Not Found for url: https://reason.com/archives/2015/10/01/grexit-on-ice&gt/


Processing URLs:  13%|█▎        | 130/1000 [04:04<19:28,  1.34s/it]

Error extracting text from https://larswericson.wordpress.com/2016/04/22/gitrep-21apr16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/22/gitrep-21apr16pm/


Processing URLs:  13%|█▎        | 132/1000 [04:06<15:28,  1.07s/it]

Error extracting text from http://thehill.com/homenews/senate/361330-alabama-columnist-roy-moore-first-took-interest-in-his-wife-when-she-was: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/361330-alabama-columnist-roy-moore-first-took-interest-in-his-wife-when-she-was/


Processing URLs:  13%|█▎        | 134/1000 [04:12<26:22,  1.83s/it]

Error extracting text from http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/s_res_2254.pdf: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/s_res_2254.pdf


Processing URLs:  14%|█▎        | 135/1000 [04:12<21:20,  1.48s/it]

Error extracting text from http://nationalinterest.org/feature/chinas-sea-phantom-fleet-prowls-the-open-waters-15105: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/chinas-sea-phantom-fleet-prowls-the-open-waters-15105


Processing URLs:  14%|█▎        | 136/1000 [04:14<20:19,  1.41s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/poland-eu-officials-improvement-ties-52932695: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/poland-eu-officials-improvement-ties-52932695


Processing URLs:  14%|█▎        | 137/1000 [04:15<20:58,  1.46s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-04-03/venezuela-credit-dashboard-default-risk-spikes-as-payment-looms


Processing URLs:  14%|█▍        | 141/1000 [04:20<15:42,  1.10s/it]

Error extracting text from https://www.wsj.com/articles/crazy-bets-on-200-oil-invade-the-options-market-11634463002: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/crazy-bets-on-200-oil-invade-the-options-market-11634463002


Processing URLs:  14%|█▍        | 144/1000 [04:22<11:59,  1.19it/s]

Error extracting text from https://www.amazon.com/gp/product/B015AR5O9E/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1: 500 Server Error: Internal Server Error for url: https://www.amazon.com/gp/product/B015AR5O9E/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1


Processing URLs:  14%|█▍        | 145/1000 [04:24<17:27,  1.23s/it]

Error extracting text from https://bit.ly/3rDX8dy: 404 Client Error: Not Found for url: https://theweek.com/articles/964336/scottish-fishermen-say-industry-crisis-after-brexit


Processing URLs:  15%|█▍        | 146/1000 [04:25<14:26,  1.01s/it]

Error extracting text from http://www.cdc.gov/media/releases/2014/p0731-ebola.html: 404 Client Error: Not Found for url: https://www.cdc.gov/media/releases/2014/p0731-ebola.html


Processing URLs:  15%|█▌        | 150/1000 [04:33<30:35,  2.16s/it]

Error extracting text from http://www.cnbrlyc.com/2016/08/27/maduro-says-venezuela-iran-hold-talks-on-oil-price-stability.html: HTTPConnectionPool(host='www.cnbrlyc.com', port=80): Max retries exceeded with url: /2016/08/27/maduro-says-venezuela-iran-hold-talks-on-oil-price-stability.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301ab5d00>: Failed to resolve 'www.cnbrlyc.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 152/1000 [04:33<17:17,  1.22s/it]

Error extracting text from http://www.wsj.com/articles/democratic-national-committee-computers-breached-by-hackers-linked-to-russian-government-1465920304: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/democratic-national-committee-computers-breached-by-hackers-linked-to-russian-government-1465920304


Processing URLs:  15%|█▌        | 154/1000 [04:36<18:39,  1.32s/it]

Error extracting text from https://www.pcori.org/evidence-synthesis/horizon-scanning/covid-19-scan-june-10-23-2021: 403 Client Error: Forbidden for url: https://www.pcori.org/evidence-synthesis/horizon-scanning/covid-19-scan-june-10-23-2021


Processing URLs:  16%|█▌        | 157/1000 [04:42<23:22,  1.66s/it]

Error extracting text from http://www.ibtimes.com/iran-test-fires-2-more-ballistic-missiles-despite-threat-new-sanctions-us-2333013: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iran-test-fires-2-more-ballistic-missiles-despite-threat-new-sanctions-us-2333013


Processing URLs:  16%|█▌        | 159/1000 [04:43<15:48,  1.13s/it]

Error extracting text from https://www.nytimes.com/2017/07/25/world/europe/china-russia-baltic-navy-exercises.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/25/world/europe/china-russia-baltic-navy-exercises.html


Processing URLs:  16%|█▌        | 161/1000 [04:45<16:27,  1.18s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-04/modi-bucks-global-bond-rout-as-cash-recall-cuts-india-inc-costs


Processing URLs:  17%|█▋        | 166/1000 [04:52<19:36,  1.41s/it]

Error extracting text from http://abcas3.auditedmedia.com/ecirc/magtitlesearch.asp: HTTPSConnectionPool(host='abcas3.auditedmedia.com', port=443): Max retries exceeded with url: /ecirc/magtitlesearch.asp (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  17%|█▋        | 168/1000 [04:54<15:21,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-usa-climatechange-paris-idUSKBN1740NP?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-climatechange-paris-idUSKBN1740NP?il=0


Processing URLs:  17%|█▋        | 171/1000 [05:56<3:00:47, 13.09s/it]

Error extracting text from http://www.einnews.com/pr_news/334341255/mediterranean-migrant-arrivals-in-2016-230-885-deaths-2-920: HTTPConnectionPool(host='www.einnews.com', port=80): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/us-southkorea-china-fishermen-idUSKCN1200DQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-china-fishermen-idUSKCN1200DQ


Processing URLs:  17%|█▋        | 172/1000 [05:56<2:07:51,  9.26s/it]

Error extracting text from http://www.wsj.com/articles/syrias-alawites-the-people-behind-assad-1435166941: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syrias-alawites-the-people-behind-assad-1435166941


Processing URLs:  17%|█▋        | 174/1000 [05:57<1:05:27,  4.75s/it]

Error extracting text from http://www.wsj.com/articles/putins-record-augurs-tough-response-if-terror-scenario-hardens-on-plane-crash-1446751143: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/putins-record-augurs-tough-response-if-terror-scenario-hardens-on-plane-crash-1446751143


Processing URLs:  18%|█▊        | 176/1000 [06:59<4:43:41, 20.66s/it]

Error extracting text from http://www.eetimes.com/author.asp?doc_id=1324428: HTTPConnectionPool(host='www.eetimes.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  18%|█▊        | 179/1000 [07:00<1:42:01,  7.46s/it]

Error extracting text from http://www.reuters.com/article/2015/11/03/us-mideast-crisis-syria-russia-idUSKCN0SS0TY20151103#gkvYbcCijbJtI6aP.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/03/us-mideast-crisis-syria-russia-idUSKCN0SS0TY20151103#gkvYbcCijbJtI6aP.97
URL filtered: https://twitter.com/TennisChannel/status/1524862662718505011


Processing URLs:  18%|█▊        | 183/1000 [07:04<37:11,  2.73s/it]

Error extracting text from http://www.realcleardefense.com/articles/2016/03/11/iraqs_military_falls_short_on_logistics_in_fight_against_isis_109129.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/03/11/iraqs_military_falls_short_on_logistics_in_fight_against_isis_109129.html


Processing URLs:  19%|█▉        | 191/1000 [07:21<29:30,  2.19s/it]

Error extracting text from http://www.eluniversal.com/noticias/daily-news/pdvsa-negotiates-debt-swap-2017-with-credit-suisse_433037: 404 Client Error: Not Found for url: https://www.eluniversal.com/noticias/daily-news/pdvsa-negotiates-debt-swap-2017-with-credit-suisse_433037


Processing URLs:  19%|█▉        | 193/1000 [07:24<25:03,  1.86s/it]

Error extracting text from http://www.who.int/bulletin/volumes/85/07-100107/en/: 404 Client Error: Not Found for url: https://www.who.int/bulletin/volumes/85/07-100107/en/


Processing URLs:  20%|██        | 201/1000 [07:53<53:44,  4.04s/it]

URL filtered: http://www.forbes.com/sites/federicoguerrini/2017/01/16/facebook-will-flag-and-filter-fake-news-in-germany/#48b687b060e3


Processing URLs:  20%|██        | 203/1000 [07:54<31:02,  2.34s/it]

URL filtered: https://www.bloomberg.com/news/videos/2017-01-06/at-t-doj-requested-in-depth-time-warner-review


Processing URLs:  21%|██        | 206/1000 [07:56<20:07,  1.52s/it]

Error extracting text from https://www.hindustantimes.com/india-news/natco-pharma-seeks-approval-for-molnupiravir-for-use-in-covid-19-treatment-101619412404574.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/india-news/natco-pharma-seeks-approval-for-molnupiravir-for-use-in-covid-19-treatment-101619412404574.html


Processing URLs:  21%|██        | 209/1000 [08:02<21:46,  1.65s/it]

Error extracting text from http://thehill.com/policy/finance/255815-week-ahead-ex-im-supporters-look-to-force-vote: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/255815-week-ahead-ex-im-supporters-look-to-force-vote/
URL filtered: https://www.youtube.com/watch?v=ULL7apmAJTE


Processing URLs:  21%|██▏       | 214/1000 [08:08<17:41,  1.35s/it]

Error extracting text from https://gcaptain.com/houthis-seize-uae-flagged-cargo-ship-off-yemen/?subscriber=true&amp;goal=0_f50174ef03-2bb5bd07da-170102337&amp;mc_cid=2bb5bd07da&amp;mc_eid=c74873c672: 403 Client Error: Forbidden for url: https://gcaptain.com/houthis-seize-uae-flagged-cargo-ship-off-yemen/?subscriber=true&amp;goal=0_f50174ef03-2bb5bd07da-170102337&amp;mc_cid=2bb5bd07da&amp;mc_eid=c74873c672


Processing URLs:  22%|██▏       | 215/1000 [08:09<17:06,  1.31s/it]

Error extracting text from http://www.nytimes.com/1981/09/16/us/panel-approves-judge-o-connor.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1981/09/16/us/panel-approves-judge-o-connor.html


Processing URLs:  22%|██▏       | 219/1000 [08:14<14:11,  1.09s/it]

Error extracting text from https://www.wsj.com/articles/iran-test-fires-medium-range-ballistic-missile-1489082919?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-test-fires-medium-range-ballistic-missile-1489082919?mod=e2fb


Processing URLs:  22%|██▏       | 221/1000 [08:15<10:15,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-islamicstate-idUSKBN13X28N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-islamicstate-idUSKBN13X28N


Processing URLs:  22%|██▏       | 224/1000 [09:19<4:05:22, 18.97s/it]

Error extracting text from https://www.tradingfloor.com/posts/low-oil-price-no-barrier-to-mega-aramco-ipo-8478188: HTTPSConnectionPool(host='www.tradingfloor.com', port=443): Max retries exceeded with url: /posts/low-oil-price-no-barrier-to-mega-aramco-ipo-8478188 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x306e0df70>, 'Connection to www.tradingfloor.com timed out. (connect timeout=60)'))


Processing URLs:  23%|██▎       | 226/1000 [09:21<2:07:39,  9.90s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Abe-eyes-Russia-visit-in-hopes-of-breakthrough: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Abe-eyes-Russia-visit-in-hopes-of-breakthrough
URL filtered: http://www.bloomberg.com/news/articles/2016-03-14/burundi-direct-financial-support-suspended-by-eu-amid-crisis


Processing URLs:  23%|██▎       | 232/1000 [09:30<32:52,  2.57s/it]

Error extracting text from https://thenationalpulse.com/news/biden-revokes-trump-energy-eo/: 403 Client Error: Forbidden for url: https://thenationalpulse.com/news/biden-revokes-trump-energy-eo/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-britain-iraq-idUSKBN15Q0RO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-britain-iraq-idUSKBN15Q0RO


Processing URLs:  23%|██▎       | 233/1000 [09:31<26:25,  2.07s/it]

URL filtered: https://twitter.com/search?q=yellow+skies+london&amp;ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Esearch


Processing URLs:  24%|██▎       | 235/1000 [09:31<15:09,  1.19s/it]

Error extracting text from http://www.wsj.com/articles/assads-future-may-be-stumbling-block-in-plan-to-fight-isis-1447791663: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/assads-future-may-be-stumbling-block-in-plan-to-fight-isis-1447791663


Processing URLs:  24%|██▍       | 243/1000 [09:47<20:16,  1.61s/it]

Error extracting text from http://www.thenational.ae/world/central-asia/afghan-parliament-elections-create-new-political-rifts: 404 Client Error: Not Found for url: https://www.thenationalnews.com/world/central-asia/afghan-parliament-elections-create-new-political-rifts/


Processing URLs:  25%|██▌       | 252/1000 [10:06<18:15,  1.46s/it]

Error extracting text from https://www.yahoo.com/news/m/a64db76f-1405-31e8-8e3c-0170e4725f08/ss_scotus-pick%3F-fox-news&#39;-judge.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/m/a64db76f-1405-31e8-8e3c-0170e4725f08/ss_scotus-pick%3F-fox-news&#39;-judge.html
Error extracting text from http://www.wsj.com/articles/obama-administration-preparing-fresh-iran-sanctions-1451507921: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/obama-administration-preparing-fresh-iran-sanctions-1451507921


Processing URLs:  26%|██▌       | 260/1000 [10:15<09:45,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-usa-cybersecurity-infrastructure-idUSKCN0UR2CX20160113: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cybersecurity-infrastructure-idUSKCN0UR2CX20160113


Processing URLs:  26%|██▋       | 263/1000 [10:19<14:15,  1.16s/it]

Error extracting text from https://www.businessinsider.com.au/the-risk-of-a-us-recession-according-to-deutsche-bank-2016-9?r=US&amp;IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com.au/the-risk-of-a-us-recession-according-to-deutsche-bank-2016-9?r=US&amp;IR=T


Processing URLs:  27%|██▋       | 266/1000 [10:23<14:57,  1.22s/it]

Error extracting text from http://www.rollingstone.com/politics/news/how-republicans-rig-the-game: 404 Client Error: Not Found for url: https://www.rollingstone.com/politics/news/how-republicans-rig-the-game


Processing URLs:  27%|██▋       | 269/1000 [10:27<14:12,  1.17s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/commentary/ct-federal-budget-congress-perspec-1029-20151029-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/commentary/ct-federal-budget-congress-perspec-1029-20151029-story.html


Processing URLs:  27%|██▋       | 270/1000 [10:29<15:52,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1AI1WT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1AI1WT


Processing URLs:  28%|██▊       | 280/1000 [10:47<25:37,  2.14s/it]

Error extracting text from https://www.nytimes.com/2017/09/18/world/europe/germany-election-martin-schulz.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/18/world/europe/germany-election-martin-schulz.html


Processing URLs:  28%|██▊       | 282/1000 [10:49<16:23,  1.37s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7995093/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7995093/
Error extracting text from http://www.reuters.com/article/us-usa-sec-conflictminerals-idUSKBN1792WX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-sec-conflictminerals-idUSKBN1792WX


Processing URLs:  28%|██▊       | 283/1000 [10:49<12:08,  1.02s/it]

Error extracting text from https://www.wsj.com/articles/republican-leaders-push-spending-patch-to-avoid-shutdown-1512341550: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/republican-leaders-push-spending-patch-to-avoid-shutdown-1512341550


Processing URLs:  29%|██▉       | 294/1000 [11:04<09:50,  1.19it/s]

Error extracting text from http://www.nasdaq.com/article/after-budget-deal-top-lawmakers-brace-for-spending-showdown-20151103-01702: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/after-budget-deal-top-lawmakers-brace-for-spending-showdown-20151103-01702


Processing URLs:  30%|██▉       | 298/1000 [11:09<09:13,  1.27it/s]

Error extracting text from http://www.reuters.com/article/us-afghanistan-election-idUSKCN0UW0Q6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-election-idUSKCN0UW0Q6


Processing URLs:  30%|███       | 303/1000 [11:17<18:09,  1.56s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN0U02VE20151217: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN0U02VE20151217


Processing URLs:  31%|███       | 306/1000 [11:23<21:57,  1.90s/it]

URL filtered: https://twitter.com/IntelCrab/status/787827564928602112


Processing URLs:  31%|███       | 311/1000 [11:27<14:18,  1.25s/it]

Error extracting text from http://www.thenational.ae/opinion/comment/would-a-military-committee-ease-syria-transition: 404 Client Error: Not Found for url: https://www.thenationalnews.com/opinion/comment/would-a-military-committee-ease-syria-transition/


Processing URLs:  31%|███       | 312/1000 [11:29<14:41,  1.28s/it]

Error extracting text from https://www.scotsman.com/news/politics/nicola-sturgeon-to-fall-short-of-majority-as-alex-salmonds-alba-party-deprives-snp-of-key-list-seats-poll-shows-3192641: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/nicola-sturgeon-to-fall-short-of-majority-as-alex-salmonds-alba-party-deprives-snp-of-key-list-seats-poll-shows-3192641


Processing URLs:  32%|███▏      | 315/1000 [11:34<15:47,  1.38s/it]

URL filtered: https://www.youtube.com/watch?v=xdtssXITXuE


Processing URLs:  32%|███▏      | 318/1000 [11:39<18:01,  1.59s/it]

Error extracting text from http://www.theepochtimes.com/n3/2007259-communist-party-theoretician-knows-who-his-godfather-is/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2007259-communist-party-theoretician-knows-who-his-godfather-is/


Processing URLs:  32%|███▏      | 319/1000 [11:39<13:56,  1.23s/it]

Error extracting text from http://www.wsj.com/articles/brazil-to-allow-aircraft-to-spray-for-mosquitoes-to-combat-zika-1467382925: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-to-allow-aircraft-to-spray-for-mosquitoes-to-combat-zika-1467382925
Error extracting text from https://www.reuters.com/world/americas/brazil-reports-over-150000-covid-19-cases-one-day-amid-rio-backlog-2021-09-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazil-reports-over-150000-covid-19-cases-one-day-amid-rio-backlog-2021-09-19/


Processing URLs:  32%|███▏      | 321/1000 [11:39<09:03,  1.25it/s]

URL filtered: https://twitter.com/BarzanSadiq/status/811597568652279808


Processing URLs:  32%|███▏      | 324/1000 [11:42<09:09,  1.23it/s]

Error extracting text from http://www2.politicalbetting.com/index.php/archives/2016/03/30/trying-to-work-out-who-will-turn-out-in-the-referendum-of-june-23rd/: 404 Client Error: Not Found for url: http://www2.politicalbetting.com/index.php/archives/2016/03/30/trying-to-work-out-who-will-turn-out-in-the-referendum-of-june-23rd/


Processing URLs:  33%|███▎      | 332/1000 [11:55<19:02,  1.71s/it]

Error extracting text from http://www.al-monitor.com/pulse//sites/almonitor/contents/afp/2016/03/iran-politics-parliament-rouhani.html#ixzz429ZTuLMO: 404 Client Error: Not Found for url: https://www.al-monitor.com/sites/almonitor/contents/afp/2016/03/iran-politics-parliament-rouhani.html#ixzz429ZTuLMO


Processing URLs:  33%|███▎      | 333/1000 [11:56<15:39,  1.41s/it]

URL filtered: https://www.youtube.com/watch?v=oeSeOKFvNgY


Processing URLs:  34%|███▎      | 335/1000 [11:56<09:39,  1.15it/s]

Error extracting text from http://warisboring.com/the-u-s-military-bushwacked-iranian-troops-in-syria/: 403 Client Error: Forbidden for url: http://warisboring.com/the-u-s-military-bushwacked-iranian-troops-in-syria/


Processing URLs:  34%|███▎      | 337/1000 [12:00<12:38,  1.14s/it]

Error extracting text from https://www.reuters.com/article/us-iran-oil-idUSKBN18L2N9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-idUSKBN18L2N9


Processing URLs:  34%|███▍      | 339/1000 [12:04<16:20,  1.48s/it]

Error extracting text from https://www.thelifeyoucansave.org/Take-the-Pledge: 403 Client Error: Forbidden for url: https://www.thelifeyoucansave.org/Take-the-Pledge


Processing URLs:  34%|███▍      | 340/1000 [12:05<16:22,  1.49s/it]

Error extracting text from https://www.mediapost.com/publications/article/307676/fcc-urged-to-put-bakes-on-net-neutrality-repeal.html: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  34%|███▍      | 341/1000 [12:06<14:18,  1.30s/it]

Error extracting text from http://thehill.com/policy/energy-environment/326561-trump-to-decide-whether-to-stay-paris-climate-pact-by-late-may: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/326561-trump-to-decide-whether-to-stay-paris-climate-pact-by-late-may/


Processing URLs:  34%|███▍      | 344/1000 [12:09<10:23,  1.05it/s]

Error extracting text from http://nyti.ms/1LDbdfk: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/22/world/middleeast/assad-putin-syria-russia.html?smid=pl-share


Processing URLs:  36%|███▌      | 356/1000 [12:43<20:51,  1.94s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-exclusive-idUSKBN13R1KJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-exclusive-idUSKBN13R1KJ
Error extracting text from http://www.reuters.com/article/us-venezuela-pdvsa-debt-idUSKCN12D1VK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-pdvsa-debt-idUSKCN12D1VK


Processing URLs:  36%|███▌      | 360/1000 [12:56<24:10,  2.27s/it]

Error extracting text from https://www.wsj.com/articles/donald-trump-warns-china-on-north-korea-we-will-no-longer-allow-this-to-continue-1501374727: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/donald-trump-warns-china-on-north-korea-we-will-no-longer-allow-this-to-continue-1501374727


Processing URLs:  36%|███▌      | 362/1000 [12:57<14:32,  1.37s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/10/06/0200000000AEN20151006000200315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
Error extracting text from http://www.reuters.com/article/2015/10/21/us-iran-nuclear-khamenei-idUSKCN0SF18720151021: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/21/us-iran-nuclear-khamenei-idUSKCN0SF18720151021


Processing URLs:  36%|███▋      | 364/1000 [13:00<17:38,  1.66s/it]

Error extracting text from http://www.gupc.com.pa/en/press/press-releases/252-finalizacion-de-obra: 404 Client Error: Not Found for url: https://www.gupc.com.pa/en/press/press-releases/252-finalizacion-de-obra


Processing URLs:  37%|███▋      | 371/1000 [13:10<13:11,  1.26s/it]

Error extracting text from http://www.baltimoresun.com/news/maryland/crime/blog/bs-md-syed-hearing-date-changed-20151228-story.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/news/maryland/crime/blog/bs-md-syed-hearing-date-changed-20151228-story.html


Processing URLs:  37%|███▋      | 372/1000 [13:11<12:08,  1.16s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN10A0F0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN10A0F0


Processing URLs:  37%|███▋      | 374/1000 [13:13<11:51,  1.14s/it]

Error extracting text from http://www.oddsportal.com/cricket/world/icc-world-twenty20/outrights/: 404 Client Error: No such file for url: https://www.oddsportal.com/cricket/world/icc-world-twenty20/outrights/
Error extracting text from http://blogs.wsj.com/speakeasy/2016/01/02/george-r-r-martin-says-the-winds-of-winter-isnt-finished/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/speakeasy/2016/01/02/george-r-r-martin-says-the-winds-of-winter-isnt-finished/


Processing URLs:  38%|███▊      | 376/1000 [13:15<09:23,  1.11it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/dec/10/bernie-sanders-has-10-point-lead-over-hillary-clin/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/dec/10/bernie-sanders-has-10-point-lead-over-hillary-clin/


Processing URLs:  38%|███▊      | 382/1000 [13:24<12:07,  1.18s/it]

Error extracting text from https://www.jstor.org/stable/758651: 420 Client Error: Enhance Your Calm for url: https://www.jstor.org/stable/758651


Processing URLs:  39%|███▊      | 387/1000 [13:29<10:04,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-thailand-king-idUSKBN12H08D?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-king-idUSKBN12H08D?il=0


Processing URLs:  39%|███▉      | 388/1000 [13:30<10:46,  1.06s/it]

Error extracting text from http://www.globalresearch.ca/interview-with-brazils...dilma-rousseff/5526410: 404 Client Error: Not Found for url: https://www.globalresearch.ca/interview-with-brazils...dilma-rousseff/5526410


Processing URLs:  39%|███▉      | 389/1000 [13:32<12:41,  1.25s/it]

URL filtered: http://www.bloomberg.com/news/articles/2014-08-08/why-chlorine-chicken-from-america-inspires-dread-in-europe


Processing URLs:  39%|███▉      | 391/1000 [13:32<07:54,  1.28it/s]

Error extracting text from https://bit.ly/3ACCsXL: 403 Client Error: Forbidden for url: https://www.france24.com/en/americas/20210812-haiti-postpones-election-date-to-replace-assassinated-president


Processing URLs:  39%|███▉      | 393/1000 [13:35<09:16,  1.09it/s]

Error extracting text from https://nyti.ms/3p8mkXQ: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/09/world/europe/italy-renzi-interview.html


Processing URLs:  39%|███▉      | 394/1000 [13:36<11:42,  1.16s/it]

Error extracting text from http://therealnews.com/t2/index.php?option=com_content&amp;task=view&amp;id=31&amp;Itemid=74&amp;jumival=16730: 404 Client Error: Not Found for url: https://therealnews.com/t2/?option=com_content&amp;task=view&amp;id=31&amp;Itemid=74&amp;jumival=16730


Processing URLs:  40%|███▉      | 396/1000 [13:39<12:17,  1.22s/it]

Error extracting text from https://volafy.net/equity/FB: HTTPSConnectionPool(host='volafy.net', port=443): Max retries exceeded with url: /equity/FB (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fedb57c0>: Failed to resolve 'volafy.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|████      | 400/1000 [13:45<12:30,  1.25s/it]

Error extracting text from https://www.nytimes.com/2017/01/28/us/politics/trump-putin-russia-sanctions.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/28/us/politics/trump-putin-russia-sanctions.html?_r=0


Processing URLs:  41%|████      | 407/1000 [14:08<19:59,  2.02s/it]

Error extracting text from https://www.nytimes.com/2021/03/30/world/europe/ukraine-russia-fighting.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/30/world/europe/ukraine-russia-fighting.html


Processing URLs:  41%|████      | 409/1000 [14:10<15:09,  1.54s/it]

Error extracting text from https://www.nytimes.com/2017/03/20/us/politics/judge-gorsuch-supreme-court-confirmation-hearings.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/20/us/politics/judge-gorsuch-supreme-court-confirmation-hearings.html?_r=0


Processing URLs:  41%|████      | 411/1000 [14:22<41:17,  4.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-14/brazil-government-said-to-unveil-14-6-billion-austerity-plan


Processing URLs:  41%|████▏     | 414/1000 [14:24<20:45,  2.13s/it]

Error extracting text from http://breakingdefense.com/2017/05/air-force-drops-first-gps-bomb-from-reaper-gbu-38-jdam/\: 404 Client Error: Not Found for url: https://breakingdefense.com/2017/05/air-force-drops-first-gps-bomb-from-reaper-gbu-38-jdam/%5C


Processing URLs:  42%|████▏     | 415/1000 [14:25<18:38,  1.91s/it]

Error extracting text from https://www.businessinsider.com/15-reasons-why-ceos-leave-their-office-2012-7?r=US&IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/(null)/15-reasons-why-ceos-leave-their-office-2012-7?IR=T


Processing URLs:  42%|████▏     | 416/1000 [14:28<19:33,  2.01s/it]

Error extracting text from https://www1.nyc.gov/site/doh/covid/covid-19-goals.page.: 404 Client Error: Not Found for url: https://www.nyc.gov:443/site/doh/covid/covid-19-goals.page.


Processing URLs:  42%|████▏     | 419/1000 [14:32<13:57,  1.44s/it]

Error extracting text from https://globalguessing.com/metaculus-mondays-vol12/#will-the-us-rejoin-the-iran-nuclear-deal: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /metaculus-mondays-vol12/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  42%|████▏     | 421/1000 [14:49<47:42,  4.94s/it]

Error extracting text from http://news.trust.org/item/20160608160437-f8nla/: 404 Client Error:  for url: https://news.trust.org:443/item/20160608160437-f8nla/


Processing URLs:  42%|████▏     | 422/1000 [14:50<37:12,  3.86s/it]

Error extracting text from http://uk.reuters.com/article/2015/10/17/uk-montenegro-protests-idUKKCN0SB0XV20151017: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  43%|████▎     | 427/1000 [15:01<24:16,  2.54s/it]

Error extracting text from http://www.fda.gov/AboutFDA/CentersOffices/OfficeofMedicalProductsandTobacco/CDER/: 404 Client Error: Not Found for url: https://www.fda.gov/about-fda/about-center-drug-evaluation-and-research/center-drug-evaluation-and-research


Processing URLs:  43%|████▎     | 433/1000 [15:14<15:37,  1.65s/it]

Error extracting text from http://www.nytimes.com/2016/05/28/world/asia/pakistan-nawaz-sharif-open-heart-surgery.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/28/world/asia/pakistan-nawaz-sharif-open-heart-surgery.html


Processing URLs:  44%|████▎     | 436/1000 [15:18<14:48,  1.57s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-02-05/uber-s-courtroom-detour-into-the-secrets-of-the-mind-quicktake


Processing URLs:  44%|████▍     | 442/1000 [15:20<05:22,  1.73it/s]

Error extracting text from http://www.paddypower.com/bet/politics/other-politics/us-politics?ev_oc_grp_ids=2092272: HTTPConnectionPool(host='www.paddypower.com', port=80): Max retries exceeded with url: /bet/politics/other-politics/us-politics?ev_oc_grp_ids=2092272 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ff1cdf10>: Failed to establish a new connection: [Errno 61] Connection refused'))
URL filtered: http://www.theatlantic.com/technology/archive/2016/06/did-facebook-spike-uk-voter-registration/485843/
Error extracting text from https://www.timesofisrael.com/netanyahu-trial-may-face-further-delay-as-judges-hint-indictment-must-be-revised/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/netanyahu-trial-may-face-further-delay-as-judges-hint-indictment-must-be-revised/


Processing URLs:  44%|████▍     | 445/1000 [15:24<07:42,  1.20it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/330612-carl-bernstein-flynn-is-central-to-what-fbi-believes-is-cover: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/330612-carl-bernstein-flynn-is-central-to-what-fbi-believes-is-cover/


Processing URLs:  45%|████▍     | 449/1000 [15:29<11:03,  1.20s/it]

Error extracting text from https://www.reuters.com/article/us-african-union-summit-southsudan/african-union-joins-growing-chorus-demanding-sanctions-on-south-sudan-war-idUSKBN1FI2IO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-african-union-summit-southsudan/african-union-joins-growing-chorus-demanding-sanctions-on-south-sudan-war-idUSKBN1FI2IO?il=0


Processing URLs:  45%|████▌     | 453/1000 [15:37<18:21,  2.01s/it]

Error extracting text from http://www.rollcall.com/news/politics/could-supreme-court-stance-doom-ron-johnson: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/could-supreme-court-stance-doom-ron-johnson


Processing URLs:  45%|████▌     | 454/1000 [15:39<17:48,  1.96s/it]

Error extracting text from https://www.kdrv.com/content/news/FBI-agent-says-domestic-terrorist-cases-are-rising-in-Oregon--574076881.html: 404 Client Error: Not Found for url: https://www.kdrv.com/content/news/fbi-agent-says-domestic-terrorist-cases-are-rising-in-oregon--574076881.html


Processing URLs:  46%|████▌     | 455/1000 [16:39<2:55:44, 19.35s/it]

Error extracting text from http://kremlin.ru/acts/constitution/item#chapter3: HTTPConnectionPool(host='kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  46%|████▌     | 458/1000 [16:43<1:08:17,  7.56s/it]

Error extracting text from http://concorde.ua/en/research/daily/eu-to-release-ukraine-loan-after-imf-tranche-poroshenko-says-15556/#sthash.Y6yFOBRw.dpuf&amp;quot: 404 Client Error: Not Found for url: https://concorde.ua/en/research/daily/eu-to-release-ukraine-loan-after-imf-tranche-poroshenko-says-15556/#sthash.Y6yFOBRw.dpuf&amp;quot


Processing URLs:  46%|████▌     | 460/1000 [16:44<34:45,  3.86s/it]

Error extracting text from http://www.usni.org/magazines/proceedings/2016-05/chinese-navy-trains-and-takes-risks: 403 Client Error: Forbidden for url: http://www.usni.org/magazines/proceedings/2016-05/chinese-navy-trains-and-takes-risks


Processing URLs:  46%|████▌     | 461/1000 [16:45<27:01,  3.01s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/latest-nato-leaders-wary-russian-intentions-40455392: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/latest-nato-leaders-wary-russian-intentions-40455392


Processing URLs:  46%|████▌     | 462/1000 [16:48<26:52,  3.00s/it]

URL filtered: https://techcrunch.com/2016/12/01/facebook-fake-news-flags-reliable-news-source/


Processing URLs:  46%|████▋     | 464/1000 [16:50<18:30,  2.07s/it]

Error extracting text from http://www.stripes.com/news/middle-east/in-a-first-leading-democrats-and-republicans-join-forces-on-iran-sanctions-1.398700: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/middle_east/in-a-first-leading-democrats-and-republicans-join-forces-on-iran-sanctions-1.398700


Processing URLs:  47%|████▋     | 470/1000 [16:55<07:04,  1.25it/s]

Error extracting text from https://www.nytimes.com/2017/10/22/world/asia/cia-expanding-taliban-fight-afghanistan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/22/world/asia/cia-expanding-taliban-fight-afghanistan.html


Processing URLs:  47%|████▋     | 474/1000 [17:10<20:49,  2.38s/it]

Error extracting text from https://thehill.com/policy/international/587295-thousands-of-russian-troops-withdrawing-from-ukraine-border-report?amp: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/587295-thousands-of-russian-troops-withdrawing-from-ukraine-border-report/?amp


Processing URLs:  48%|████▊     | 475/1000 [17:12<19:13,  2.20s/it]

Error extracting text from http://www.ibtimes.com/nigeria-replaces-saudi-arabia-top-crude-oil-supplier-india-1983397: 403 Client Error: Forbidden for url: https://www.ibtimes.com/nigeria-replaces-saudi-arabia-top-crude-oil-supplier-india-1983397


Processing URLs:  48%|████▊     | 482/1000 [17:18<08:03,  1.07it/s]

Error extracting text from http://belarusdigest.com/story/belarus-and-russian-food-embargo-success-story-23073: 404 Client Error: Not Found for url: http://belarusdigest.com/story/belarus-and-russian-food-embargo-success-story-23073
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-moscovici-idUSKBN1761IU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-moscovici-idUSKBN1761IU


Processing URLs:  48%|████▊     | 483/1000 [17:22<15:13,  1.77s/it]

Error extracting text from https://bit.ly/3cfDqyx: 403 Client Error: Forbidden for url: https://conservativehome.com/thetorydiary/2021/03/our-post-budget-cabinet-league-table-sunak-is-still-second-though-his-score-is-his-lowest-as-chancellor-since-covid.html


Processing URLs:  49%|████▊     | 486/1000 [17:34<22:55,  2.68s/it]

Error extracting text from https://www.legbranch.org/2015-10-19-when-does-congress-repeal-legislation-a-new-dataset-of-major-repeals-from-1877-2012-provides-answers/: 403 Client Error: Forbidden for url: https://www.legbranch.org/2015-10-19-when-does-congress-repeal-legislation-a-new-dataset-of-major-repeals-from-1877-2012-provides-answers/


Processing URLs:  49%|████▊     | 487/1000 [17:35<19:13,  2.25s/it]

Error extracting text from http://www.reuters.com/article/us-vietnam-china-conflict-insight-idUSKBN0U000320151218: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-vietnam-china-conflict-insight-idUSKBN0U000320151218


Processing URLs:  49%|████▉     | 489/1000 [17:37<12:48,  1.50s/it]

Error extracting text from http://mobil.express.de/news/politik-und-wirtschaft/frontex-chef-warnt-vor-neuen-routen-aegypten-wird-fluechtlings-hotspot-24306196?originalReferrer=https://www.google.ch/: 403 Client Error: Forbidden for url: http://mobil.express.de/news/politik-und-wirtschaft/frontex-chef-warnt-vor-neuen-routen-aegypten-wird-fluechtlings-hotspot-24306196?originalReferrer=https://www.google.ch/


Processing URLs:  49%|████▉     | 491/1000 [17:38<09:42,  1.14s/it]

Error extracting text from http://www.reuters.com/article/2015/11/25/usa-fedfunds-idUSL1N13K0SU20151125: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/25/usa-fedfunds-idUSL1N13K0SU20151125


Processing URLs:  49%|████▉     | 492/1000 [17:39<07:35,  1.12it/s]

Error extracting text from https://www.nytimes.com/reuters/2017/06/30/world/africa/30reuters-safrica-politics-parliament.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/reuters/2017/06/30/world/africa/30reuters-safrica-politics-parliament.html


Processing URLs:  50%|████▉     | 495/1000 [17:43<10:05,  1.20s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Japan-to-help-improve-Myanmar-s-infrastructure-standards: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Japan-to-help-improve-Myanmar-s-infrastructure-standards
URL filtered: https://www.voanews.com/a/us-senator-twitter-should-offer-analysis-of-russian-activity-/4019157.html


Processing URLs:  50%|████▉     | 497/1000 [17:44<08:00,  1.05it/s]

Error extracting text from http://sana.sy/en/?p=76435: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  50%|████▉     | 498/1000 [17:45<08:12,  1.02it/s]

Error extracting text from http://auvac.org/uploads/configuration_spec_sheets/MBARI%20Seafloor%20mapping%20AUV.pdf: 404 Client Error: Not Found for url: https://auvac.org/uploads/configuration_spec_sheets/MBARI%20Seafloor%20mapping%20AUV.pdf


Processing URLs:  50%|█████     | 502/1000 [17:51<09:06,  1.10s/it]

Error extracting text from http://uk.reuters.com/article/uk-brazil-rousseff-cunha-idUKKBN0UC16Q20151229: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  50%|█████     | 504/1000 [17:54<11:07,  1.34s/it]

Error extracting text from http://atimes.com/2016/01/indias-satellite-station-in-vietnam-to-stir-up-trouble-in-south-china-sea-china/: 404 Client Error: Not Found for url: https://atimes.com/2016/01/indias-satellite-station-in-vietnam-to-stir-up-trouble-in-south-china-sea-china/


Processing URLs:  51%|█████     | 506/1000 [17:56<08:12,  1.00it/s]

Error extracting text from http://www.reuters.com/article/2015/11/15/us-doubleline-gundlach-idUSKCN0T417Z20151115#2Md3deyjvjT5HeI3.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/15/us-doubleline-gundlach-idUSKCN0T417Z20151115#2Md3deyjvjT5HeI3.99


Processing URLs:  51%|█████     | 507/1000 [17:57<08:30,  1.04s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XN2OS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XN2OS


Processing URLs:  51%|█████     | 510/1000 [18:00<07:07,  1.15it/s]

Error extracting text from http://www.wsj.com/articles/amid-uproar-over-comey-firing-sean-spicer-fulfills-military-duties-1494540119: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/amid-uproar-over-comey-firing-sean-spicer-fulfills-military-duties-1494540119


Processing URLs:  51%|█████     | 512/1000 [19:00<2:30:59, 18.56s/it]

Error extracting text from https://www.betfair.com/exchange/politics/event?id=27538435: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/politics/event?id=27538435 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3013040b0>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))
URL filtered: https://www.bloomberg.com/news/features/2017-06-14/the-machine-of-tomorrow-today-quantum-computing-on-the-verge
URL filtered: https://twitter.com/SuspendThePres/status/1347702888970084354


Processing URLs:  52%|█████▏    | 516/1000 [19:01<52:56,  6.56s/it]  

Error extracting text from http://postimg.org/image/g9ss13rv3/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/g9ss13rv3/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3013049e0>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  52%|█████▏    | 517/1000 [19:02<41:33,  5.16s/it]

Error extracting text from https://jpt.spe.org/russian-lng-aims-high-leveraging-big-reserves-and-logistical-advantages: 403 Client Error: Forbidden for url: https://jpt.spe.org/russian-lng-aims-high-leveraging-big-reserves-and-logistical-advantages


Processing URLs:  52%|█████▏    | 521/1000 [19:08<19:48,  2.48s/it]

Error extracting text from https://www.reuters.com/article/us-usa-biden-china/biden-says-there-will-be-repercussions-for-china-over-human-rights-idUSKBN2AH0AC?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-biden-china/biden-says-there-will-be-repercussions-for-china-over-human-rights-idUSKBN2AH0AC?il=0


Processing URLs:  52%|█████▏    | 523/1000 [19:17<26:34,  3.34s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/european-rights-body-checks-on-polands-rule-of-law-again/2016/09/12/d9d4b456-78db-11e6-8064-c1ddc8a724bb_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/european-rights-body-checks-on-polands-rule-of-law-again/2016/09/12/d9d4b456-78db-11e6-8064-c1ddc8a724bb_story.html


Processing URLs:  53%|█████▎    | 534/1000 [20:15<45:02,  5.80s/it]  

Error extracting text from http://www.cfr.org/china/china-vietnam-military-clash/p37029?cid=nlc-publications-publications_quarterly-fall_2015-link18-20151109&amp;sp_mid=50037808&amp;sp_rid=amVyZW15QGxpY2h0bWFuLmNhS0: 404 Client Error: Not Found for url: https://www.cfr.org/china/china-vietnam-military-clash/p37029?cid=nlc-publications-publications_quarterly-fall_2015-link18-20151109&amp;sp_mid=50037808&amp;sp_rid=amVyZW15QGxpY2h0bWFuLmNhS0
Error extracting text from http://www.nytimes.com/2016/06/30/business/dealbook/a-bayer-deal-for-monsanto-would-get-eu-regulatory-scrutiny.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/30/business/dealbook/a-bayer-deal-for-monsanto-would-get-eu-regulatory-scrutiny.html


Processing URLs:  54%|█████▎    | 536/1000 [20:17<25:17,  3.27s/it]

Error extracting text from https://theconversation.com/the-real-challenge-to-covid-19-vaccination-rates-isnt-hesitancy-its-equal-access-for-maori-and-pacific-people-161676: 403 Client Error: Forbidden for url: https://theconversation.com/the-real-challenge-to-covid-19-vaccination-rates-isnt-hesitancy-its-equal-access-for-maori-and-pacific-people-161676
URL filtered: https://www.youtube.com/watch?v=HlYeq5f9lqM


Processing URLs:  54%|█████▍    | 540/1000 [20:28<18:20,  2.39s/it]

Error extracting text from https://www.wionews.com/world/talks-on-grand-ethiopian-renaissance-dam-fail-again-356124: 403 Client Error: Forbidden for url: https://www.wionews.com/world/talks-on-grand-ethiopian-renaissance-dam-fail-again-356124


Processing URLs:  54%|█████▍    | 542/1000 [20:28<10:35,  1.39s/it]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/2507793330x0xS1193125%2D15%2D364285/1318605/filing.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/2507793330x0xS1193125-15-364285/1318605/filing.pdf
Error extracting text from https://www.wsj.com/articles/china-india-move-tens-of-thousands-of-troops-to-the-border-in-largest-buildup-in-decades-11625218201: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-india-move-tens-of-thousands-of-troops-to-the-border-in-largest-buildup-in-decades-11625218201


Processing URLs:  55%|█████▍    | 545/1000 [20:35<13:52,  1.83s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-22/one-yuan-buys-you-one-hong-kong-dollar-in-7-eleven-at-least


Processing URLs:  55%|█████▍    | 547/1000 [20:36<10:18,  1.37s/it]

URL filtered: https://twitter.com/PressSec/status/1357398773782237186


Processing URLs:  55%|█████▌    | 552/1000 [20:45<14:03,  1.88s/it]

Error extracting text from http://tass.com/world/899797: 502 Server Error: Bad Gateway for url: https://tass.com/world/899797


Processing URLs:  55%|█████▌    | 554/1000 [20:48<12:32,  1.69s/it]

Error extracting text from https://www.cdc.gov/niosh/docs/96-103/pdfs/96-103.pdf: 404 Client Error: Not Found for url: https://www.cdc.gov/niosh/docs/96-103/pdfs/96-103.pdf


Processing URLs:  56%|█████▌    | 556/1000 [20:50<10:33,  1.43s/it]

Error extracting text from http://1tvnews.af/en/news/afghanistan/23582-afghan-government-allots-10-million-to-prepare-for-elections: 406 Client Error: Not Acceptable for url: http://1tvnews.af/en/news/afghanistan/23582-afghan-government-allots-10-million-to-prepare-for-elections


Processing URLs:  56%|█████▌    | 557/1000 [20:51<08:23,  1.14s/it]
error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.foliomag.com/2015/time-buys-jane-pratts-xojane-xovain/: Document is empty


Processing URLs:  56%|█████▋    | 564/1000 [21:27<48:17,  6.65s/it]  

URL filtered: https://www.reuters.com/article/us-facebook-venezuela-exclusive-idUSKBN2BJ03Z


Processing URLs:  57%|█████▋    | 567/1000 [21:30<24:21,  3.37s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-01-18/your-guide-to-dutch-elections-a-bellwether-to-european-populism


Processing URLs:  57%|█████▋    | 570/1000 [21:35<17:36,  2.46s/it]

Error extracting text from http://uk.reuters.com/article/uk-trade-europe-usa-idUKKCN0XG2F5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKCN12Q1FP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKCN12Q1FP


Processing URLs:  57%|█████▋    | 572/1000 [21:37<12:52,  1.80s/it]

Error extracting text from http://www.channelnewsasia.com/news/business/imf-chief-urges-eurozone-to-back-greece-debt-relief-8841610: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/business/imf-chief-urges-eurozone-to-back-greece-debt-relief-8841610


Processing URLs:  57%|█████▊    | 575/1000 [21:40<09:08,  1.29s/it]

Error extracting text from https://www.ancestry.com/corporate/about-ancestry/company-facts: 403 Client Error: Forbidden for url: https://www.ancestry.com/corporate/about-ancestry/company-facts


Processing URLs:  58%|█████▊    | 579/1000 [21:44<06:57,  1.01it/s]

Error extracting text from http://news.yahoo.com/no-french-ground-troops-syria-hollande-180238523.html: 404 Client Error: Not Found for url: http://news.yahoo.com/no-french-ground-troops-syria-hollande-180238523.html


Processing URLs:  58%|█████▊    | 582/1000 [21:55<18:13,  2.62s/it]

Error extracting text from http://www.12newsnow.com/story/31136513/peace-deal-in-reach-obama-says-us-to-help-colombia-rebuild: 503 Server Error: Service Unavailable for url: https://www.12newsnow.com/story/31136513/peace-deal-in-reach-obama-says-us-to-help-colombia-rebuild


Processing URLs:  58%|█████▊    | 584/1000 [21:59<16:13,  2.34s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-06-09/top-russian-general-says-syria-civil-war-practically-stopped


Processing URLs:  59%|█████▊    | 586/1000 [22:01<11:29,  1.67s/it]

Error extracting text from https://www.usnwc.edu/getattachment/eef71cb7-abe7-4410-adaf-d78d085d933e/Phase-Zero--How-China-Exploits-It,-Why-the-United-: 404 Client Error: Not Found for url: https://www.usnwc.edu/getattachment/eef71cb7-abe7-4410-adaf-d78d085d933e/Phase-Zero--How-China-Exploits-It,-Why-the-United-


Processing URLs:  59%|█████▉    | 588/1000 [22:04<11:11,  1.63s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/dem-primaries/255855-biden-to-announce-2016-decision-within-days-report: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/dem-primaries/255855-biden-to-announce-2016-decision-within-days-report/


Processing URLs:  59%|█████▉    | 592/1000 [22:07<06:30,  1.05it/s]

Error extracting text from http://www.latimes.com/nation/la-na-pol-russia-consulates-20170831-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-na-pol-russia-consulates-20170831-story.html


Processing URLs:  59%|█████▉    | 593/1000 [22:08<05:00,  1.35it/s]

Error extracting text from https://www.wsj.com/articles/feds-bullard-pencils-in-rate-increase-in-2022-questions-mortgage-bond-buying-11624022857: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-bullard-pencils-in-rate-increase-in-2022-questions-mortgage-bond-buying-11624022857


Processing URLs:  60%|█████▉    | 595/1000 [22:10<07:00,  1.04s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-01-16/portugal-bonds-falter-as-ecb-running-out-of-eligible-debt-to-buy


Processing URLs:  60%|██████    | 605/1000 [22:20<06:43,  1.02s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-budget-ryan-idUSKCN1B424F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-budget-ryan-idUSKCN1B424F


Processing URLs:  61%|██████    | 607/1000 [22:22<04:58,  1.32it/s]

Error extracting text from https://www.nytimes.com/2022/01/14/us/politics/russia-ukraine-biden-military.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/14/us/politics/russia-ukraine-biden-military.html


Processing URLs:  61%|██████    | 611/1000 [22:29<11:10,  1.72s/it]

Error extracting text from https://amti.csis.org/beijing-bolsters-the-role-of-the-china-coast-guard/: 403 Client Error: Forbidden for url: https://amti.csis.org/beijing-bolsters-the-role-of-the-china-coast-guard/


Processing URLs:  61%|██████▏   | 613/1000 [22:33<11:13,  1.74s/it]

Error extracting text from http://www.afro.who.int/en/rdo/articles/4733-dr-matshidiso-moeti-op-ed-africas-great-polio-legacy-.html: 404 Client Error: Not Found for url: https://www.afro.who.int/en/rdo/articles/4733-dr-matshidiso-moeti-op-ed-africas-great-polio-legacy-.html


Processing URLs:  61%|██████▏   | 614/1000 [22:35<12:13,  1.90s/it]

URL filtered: https://twitter.com/danigirl329/with_replies


Processing URLs:  62%|██████▏   | 619/1000 [22:38<05:36,  1.13it/s]

Error extracting text from http://www.crainsnewyork.com/article/20150925/POLITICS/150929898/post-boehner-risk-of-december-government-shutdown-and-export-import: 403 Client Error: Forbidden for url: https://www.crainsnewyork.com/article/20150925/POLITICS/150929898/post-boehner-risk-of-december-government-shutdown-and-export-import
URL filtered: https://twitter.com/NathanpmYoung


Processing URLs:  62%|██████▏   | 621/1000 [22:39<04:28,  1.41it/s]

URL filtered: https://www.youtube.com/watch?v=FMkNsMMvrqk&amp;list=RD8r-e2NDSTuE&amp;index=24#t=174.03375
URL filtered: https://www.youtube.com/embed/imfY2YaE3uE&quot


Processing URLs:  62%|██████▎   | 625/1000 [22:41<03:43,  1.68it/s]

Error extracting text from http://seekingalpha.com/article/3549316-why-i-dont-care-about-no-stinkin-oil-rig-counts: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3549316-why-i-dont-care-about-no-stinkin-oil-rig-counts


Processing URLs:  63%|██████▎   | 628/1000 [22:44<05:11,  1.19it/s]

Error extracting text from https://trade.ec.europa.eu/doclib/docs/2021/march/tradoc_159458.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2021/march/tradoc_159458.pdf


Processing URLs:  63%|██████▎   | 629/1000 [22:45<04:39,  1.33it/s]

Error extracting text from http://thehill.com/homenews/administration/343063-trump-officials-russia-meddled-in-the-election: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/343063-trump-officials-russia-meddled-in-the-election/


Processing URLs:  63%|██████▎   | 631/1000 [22:47<05:16,  1.16it/s]

Error extracting text from http://www.channelnewsasia.com/news/world/burundian-regional-parlia/2954520.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/burundian-regional-parlia/2954520.html


Processing URLs:  64%|██████▎   | 637/1000 [22:58<06:49,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-referendum-europe-idUSKBN16H2C6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-europe-idUSKBN16H2C6


Processing URLs:  64%|██████▍   | 638/1000 [22:59<07:13,  1.20s/it]

Error extracting text from https://www.cbsnews.com/news/gas-prices-high-expensive-come-down-cbs-news-explains/),: 404 Client Error: Not Found for url: https://www.cbsnews.com/news/gas-prices-high-expensive-come-down-cbs-news-explains/),


Processing URLs:  64%|██████▍   | 639/1000 [22:59<05:29,  1.10it/s]

Error extracting text from http://www.wsj.com/articles/agreement-on-september-rate-increase-eludes-fed-1441825296: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/agreement-on-september-rate-increase-eludes-fed-1441825296
URL filtered: https://www.bloomberg.com/news/articles/2017-12-24/south-africa-s-ruling-anc-to-ask-zuma-to-step-down-city-press


Processing URLs:  64%|██████▍   | 641/1000 [23:01<04:50,  1.23it/s]

Error extracting text from https://globalriskinsights.com/2021/02/nicaragua-risks-to-economic-recovery-in-2021/: 403 Client Error: Forbidden for url: https://globalriskinsights.com/2021/02/nicaragua-risks-to-economic-recovery-in-2021/


Processing URLs:  64%|██████▍   | 642/1000 [23:02<05:23,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-britain-politics-idUSKBN19B35O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-politics-idUSKBN19B35O
URL filtered: http://uk.businessinsider.com/facebooks-growth-problems-will-get-worse-2018-7?r=US&IR=T


Processing URLs:  65%|██████▍   | 647/1000 [23:06<05:46,  1.02it/s]

Error extracting text from https://www.reuters.com/world/ukraine-russia-talks-have-restarted-ukrainian-negotiator-says-2022-03-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/ukraine-russia-talks-have-restarted-ukrainian-negotiator-says-2022-03-15/


Processing URLs:  65%|██████▍   | 649/1000 [23:06<03:52,  1.51it/s]

Error extracting text from https://www.nytimes.com/2021/03/15/opinion/politics/beijing-olympics-mitt-romney.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/15/opinion/politics/beijing-olympics-mitt-romney.html


Processing URLs:  65%|██████▌   | 654/1000 [23:17<06:44,  1.17s/it]

Error extracting text from http://thehill.com/regulation/court-battles/288220-doj-asks-supreme-court-to-rehear-immigration-case: 403 Client Error: Forbidden for url: https://thehill.com/regulation/court-battles/288220-doj-asks-supreme-court-to-rehear-immigration-case/


Processing URLs:  66%|██████▌   | 655/1000 [23:21<11:46,  2.05s/it]

URL filtered: https://twitter.com/jennycohn1/status/9404556692237148163


Processing URLs:  66%|██████▌   | 660/1000 [24:27<51:48,  9.14s/it]  

Error extracting text from https://www.consilium.europa.eu/en/policies/enlargement/republic-north-macedonia/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/policies/enlargement/republic-north-macedonia/


Processing URLs:  66%|██████▋   | 664/1000 [24:32<17:13,  3.08s/it]

Error extracting text from http://www.kurdistan24.net/en/news/01be23a1-5e47-481b-a12d-df7265cdfad8/%E2%80%98Peshmerga-have-five-gates-to-free-Mosul--Iraqi-army-one%E2%80%99: 403 Client Error: Forbidden for url: https://www.kurdistan24.net/en/news/01be23a1-5e47-481b-a12d-df7265cdfad8/%E2%80%98Peshmerga-have-five-gates-to-free-Mosul--Iraqi-army-one%E2%80%99


Processing URLs:  67%|██████▋   | 667/1000 [24:46<22:26,  4.04s/it]

Error extracting text from http://www.wsj.com/articles/fed-beige-book-reports-modest-growth-1449083069: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-beige-book-reports-modest-growth-1449083069


Processing URLs:  67%|██████▋   | 670/1000 [24:49<10:25,  1.90s/it]

Error extracting text from http://blogs.wsj.com/frontiers/2015/09/02/venezuela-says-china-to-provide-5-billion-oil-loan/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/frontiers/2015/09/02/venezuela-says-china-to-provide-5-billion-oil-loan/
Error extracting text from https://www.reuters.com/world/africa/real-threat-chads-military-rulers-unemployed-youth-2021-04-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/real-threat-chads-military-rulers-unemployed-youth-2021-04-30/


Processing URLs:  67%|██████▋   | 674/1000 [24:52<05:36,  1.03s/it]

Error extracting text from https://www.rs.nato.int/rs-commands/task-force-southwest.aspx: HTTPSConnectionPool(host='www.rs.nato.int', port=443): Max retries exceeded with url: /rs-commands/task-force-southwest.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe0b4170>: Failed to resolve 'www.rs.nato.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 676/1000 [24:57<08:39,  1.60s/it]

Error extracting text from http://www.nationalreview.com/corner/437909/never-trump-dies-whimper: 404 Client Error: Not Found for url: https://www.nationalreview.com/corner/437909/never-trump-dies-whimper/


Processing URLs:  68%|██████▊   | 681/1000 [25:09<12:51,  2.42s/it]

Error extracting text from https://www.reuters.com/article/us-afghanistan-election/afghanistan-to-hold-elections-in-july-next-year-idUSKBN19D258: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-election/afghanistan-to-hold-elections-in-july-next-year-idUSKBN19D258


Processing URLs:  69%|██████▉   | 693/1000 [25:37<08:31,  1.67s/it]

Error extracting text from http://www.nytimes.com/2015/10/24/world/middleeast/us-and-russia-find-common-goals-on-syria-if-not-on-assad.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/24/world/middleeast/us-and-russia-find-common-goals-on-syria-if-not-on-assad.html


Processing URLs:  69%|██████▉   | 694/1000 [25:38<07:30,  1.47s/it]

Error extracting text from https://www.who.int/csr/don/06-november-2020-mink-associated-sars-cov2-denmark/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/06-november-2020-mink-associated-sars-cov2-denmark/en/


Processing URLs:  70%|██████▉   | 695/1000 [25:40<07:42,  1.52s/it]

Error extracting text from https://in.reuters.com/article/israel-palestinians-security/palestinians-resume-security-ties-with-israel-eye-gaza-enforcement-idINKBN1D81EF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  70%|██████▉   | 696/1000 [25:40<05:59,  1.18s/it]

Error extracting text from http://blogs.reuters.com/breakingviews/2015/12/03/impeaching-rousseff-as-likely-to-falter-as-brazil/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /breakingviews/2015/12/03/impeaching-rousseff-as-likely-to-falter-as-brazil/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306321a30>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  70%|██████▉   | 697/1000 [25:42<07:36,  1.51s/it]

URL filtered: https://www.youtube.com/watch?v=XwmMPytjrK4


Processing URLs:  70%|███████   | 702/1000 [25:54<13:45,  2.77s/it]

Error extracting text from https://www.mier.org.my/presentations/archives/pdf-restore/presentations/archives/pdf/DrMenon.pdf: 404 Client Error: Not Found for url: https://mier.org.my/presentations/archives/pdf-restore/presentations/archives/pdf/DrMenon.pdf


Processing URLs:  70%|███████   | 705/1000 [25:58<08:15,  1.68s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN16X00O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN16X00O?il=0
Error extracting text from https://www.reuters.com/article/us-russia-politics-navalny-tenders-idUSKBN2BP0NE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-politics-navalny-tenders-idUSKBN2BP0NE


Processing URLs:  71%|███████   | 707/1000 [26:14<21:57,  4.50s/it]

Error extracting text from https://www.almasdarnews.com/article/massive-syrian-army-convoy-arrives-southern-aleppo/: 522 Server Error:  for url: https://www.almasdarnews.com/article/massive-syrian-army-convoy-arrives-southern-aleppo/


Processing URLs:  72%|███████▏  | 724/1000 [26:44<04:31,  1.02it/s]

Error extracting text from http://thehill.com/regulation/court-battles/323417-american-bar-association-gives-gorsuch-best-rating: 403 Client Error: Forbidden for url: https://thehill.com/regulation/court-battles/323417-american-bar-association-gives-gorsuch-best-rating/


Processing URLs:  73%|███████▎  | 726/1000 [26:47<05:10,  1.13s/it]

Error extracting text from http://winteriscoming.net/2015/11/24/a-martinology-update-when-might-the-winds-of-winter-hit-store-shelves/: 410 Client Error: Gone for url: https://winteriscoming.net/2015/11/24/a-martinology-update-when-might-the-winds-of-winter-hit-store-shelves/


Processing URLs:  73%|███████▎  | 732/1000 [26:53<04:12,  1.06it/s]

URL filtered: https://www.bloomberg.com/news/articles/2016-03-21/this-is-what-s-going-on-beneath-the-subprime-auto-loan-turmoil
URL filtered: https://www.youtube.com/watch?v=ngm_5PNSP00
URL filtered: https://www.bloomberg.com/news/articles/2017-06-18/oil-trades-below-45-as-u-s-drillers-extend-record-rig-streak


Processing URLs:  74%|███████▍  | 738/1000 [26:55<01:52,  2.33it/s]

Error extracting text from http://www.reuters.com/article/us-southkorea-cybersecurity-nuclear-idUSKBN0K603320141228: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-cybersecurity-nuclear-idUSKBN0K603320141228
Error extracting text from http://www.nytimes.com/2016/04/15/nyregion/ramapo-town-supervisor-arrested-in-federal-fraud-case.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/15/nyregion/ramapo-town-supervisor-arrested-in-federal-fraud-case.html


Processing URLs:  74%|███████▍  | 740/1000 [26:56<01:56,  2.23it/s]

Error extracting text from https://www.arabnews.com/node/1827346/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1827346/middle-east


Processing URLs:  74%|███████▍  | 745/1000 [27:02<04:22,  1.03s/it]

Error extracting text from http://www.trtworld.com/mea/us-anti-daesh-envoy-visits-northern-syria-38778: 404 Client Error: Not Found for url: https://www.trtworld.com:443/mea/us-anti-daesh-envoy-visits-northern-syria-38778
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN16P0UZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN16P0UZ


Processing URLs:  75%|███████▍  | 749/1000 [27:05<02:45,  1.51it/s]

Error extracting text from http://www.worldbulletin.net/haber/169550/australia-taken-in-26-syrians-months-after-promising-12000: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/haber/169550/australia-taken-in-26-syrians-months-after-promising-12000


Processing URLs:  75%|███████▌  | 752/1000 [27:08<03:17,  1.25it/s]

Error extracting text from http://www.france24.com/en/20160225-france-secret-war-libya-islamic-state-group: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160225-france-secret-war-libya-islamic-state-group


Processing URLs:  75%|███████▌  | 754/1000 [27:10<03:43,  1.10it/s]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20161103/0905200308.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20161103/0905200308.html


Processing URLs:  76%|███████▌  | 758/1000 [27:15<04:07,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/opec-meeting-ends-with-no-production-cuts-1449248892: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-meeting-ends-with-no-production-cuts-1449248892


Processing URLs:  76%|███████▌  | 761/1000 [27:21<06:11,  1.55s/it]

Error extracting text from http://www.france24.com/en/20161006-iran-death-penalty-bill-reduce-executions-rouhani: 403 Client Error: Forbidden for url: http://www.france24.com/en/20161006-iran-death-penalty-bill-reduce-executions-rouhani


Processing URLs:  76%|███████▋  | 763/1000 [27:22<04:18,  1.09s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-25/russia-more-prey-than-predator-to-cyber-firm-wary-of-china


Processing URLs:  76%|███████▋  | 765/1000 [27:24<03:45,  1.04it/s]

Error extracting text from http://www.worldlifeexpectancy.com/turkey-life-expectancy: 403 Client Error: Forbidden for url: https://www.worldlifeexpectancy.com/403.shtml


Processing URLs:  77%|███████▋  | 772/1000 [27:32<02:59,  1.27it/s]

URL filtered: https://twitter.com/search?q=jerry+moran+and+mike+lee&amp;ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Esearch
Error extracting text from http://www.nytimes.com/2002/07/22/world/iran-blew-up-jewish-center-in-argentina-defector-says.html?pagewanted=all: 403 Client Error: Forbidden for url: http://www.nytimes.com/2002/07/22/world/iran-blew-up-jewish-center-in-argentina-defector-says.html?pagewanted=all
URL filtered: https://twitter.com/elonmusk/status/1336809767574982658


Processing URLs:  78%|███████▊  | 776/1000 [27:34<02:32,  1.47it/s]

URL filtered: https://www.youtube.com/watch?v=xaSRQHYxfsg


Processing URLs:  78%|███████▊  | 779/1000 [27:35<02:02,  1.81it/s]

Error extracting text from http://www.nytimes.com/2016/04/22/world/asia/china-xi-jinping-military-commander.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/22/world/asia/china-xi-jinping-military-commander.html?_r=0


Processing URLs:  78%|███████▊  | 781/1000 [27:37<02:35,  1.41it/s]

Error extracting text from http://caracaschronicles.com/2015/11/10/app-update-agonizingly-close-to-the-supermajority-we-need/: 403 Client Error: Forbidden for url: http://caracaschronicles.com/2015/11/10/app-update-agonizingly-close-to-the-supermajority-we-need/


Processing URLs:  78%|███████▊  | 782/1000 [27:39<03:34,  1.02it/s]

Error extracting text from https://www.reuters.com/world/asia-pacific/militia-commanders-rush-aid-afghan-forces-against-taliban-2021-07-09/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/militia-commanders-rush-aid-afghan-forces-against-taliban-2021-07-09/


Processing URLs:  78%|███████▊  | 785/1000 [27:58<16:56,  4.73s/it]

Error extracting text from https://governor.nebraska.gov/press/gov-ricketts-comments-keystone-xl-presidential-permit: HTTPSConnectionPool(host='governor.nebraska.gov', port=443): Max retries exceeded with url: /press/gov-ricketts-comments-keystone-xl-presidential-permit (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x300a5ea50>: Failed to resolve 'governor.nebraska.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▊  | 787/1000 [27:59<10:01,  2.82s/it]

Error extracting text from http://thehill.com/homenews/campaign/369555-poll-biden-holds-double-digit-lead-over-field-of-2020-dem-presidential: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/369555-poll-biden-holds-double-digit-lead-over-field-of-2020-dem-presidential/


Processing URLs:  79%|███████▉  | 788/1000 [28:01<09:50,  2.78s/it]

Error extracting text from http://www.cfr.org/world/armed-clash-south-china-sea/p27883: 404 Client Error: Not Found for url: https://www.cfr.org/world/armed-clash-south-china-sea/p27883


Processing URLs:  79%|███████▉  | 790/1000 [28:03<06:06,  1.75s/it]

Error extracting text from http://www.torontosun.com/2016/01/04/schina-sea-tensions-surge-as-china-lands-plane-on-artificial-island: 403 Client Error: Forbidden for url: https://torontosun.com/2016/01/04/schina-sea-tensions-surge-as-china-lands-plane-on-artificial-island
URL filtered: https://www.youtube.com/watch?v=XWN65nAkk20


Processing URLs:  79%|███████▉  | 794/1000 [28:06<03:35,  1.04s/it]

Error extracting text from http://www.nytimes.com/2015/11/22/us/stingy-water-users-in-fined-in-drought-while-the-rich-soak.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/22/us/stingy-water-users-in-fined-in-drought-while-the-rich-soak.html


Processing URLs:  80%|███████▉  | 795/1000 [28:08<04:12,  1.23s/it]

Error extracting text from http://thephilippinestar.ph/articles/2016-04-12/news/g7-foreign-ministers-oppose-sea-row-provocation/147407: HTTPConnectionPool(host='thephilippinestar.ph', port=80): Max retries exceeded with url: /articles/2016-04-12/news/g7-foreign-ministers-oppose-sea-row-provocation/147407 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5fa40>: Failed to resolve 'thephilippinestar.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  80%|███████▉  | 798/1000 [28:12<04:31,  1.34s/it]

Error extracting text from https://www.nytimes.com/2021/01/15/world/asia/north-korea-missile-biden.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/15/world/asia/north-korea-missile-biden.html


Processing URLs:  80%|████████  | 800/1000 [29:14<1:02:46, 18.83s/it]

Error extracting text from http://www.irantracker.org/iran-news-round-march-04-2016: HTTPConnectionPool(host='www.irantracker.org', port=80): Max retries exceeded with url: /iran-news-round-march-04-2016 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x300a5c320>, 'Connection to www.irantracker.org timed out. (connect timeout=60)'))


Processing URLs:  80%|████████  | 801/1000 [29:15<44:51, 13.52s/it]

Error extracting text from http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA590294: HTTPSConnectionPool(host='www.dtic.mil', port=443): Max retries exceeded with url: /cgi-bin/GetTRDoc?AD=ADA590294 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  80%|████████  | 802/1000 [29:17<32:55,  9.98s/it]



Processing URLs:  81%|████████  | 807/1000 [29:24<08:47,  2.73s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/technology/281475-cyber-warfare-more-dire-and-likely-than-nuclear: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/technology/281475-cyber-warfare-more-dire-and-likely-than-nuclear/
URL filtered: https://www.bloomberg.com/news/articles/2017-08-10/it-s-merkel-s-election-to-lose-as-she-returns-to-campaign-trail
Error extracting text from http://www.hindustantimes.com/world-news/north-korea-mocks-un-chief-ban-ki-moon-for-hollow-silly-presidential-dreams/story-BharFMlVCVWXDIXctaxuSL.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/world-news/north-korea-mocks-un-chief-ban-ki-moon-for-hollow-silly-presidential-dreams/story-BharFMlVCVWXDIXctaxuSL.html
URL filtered: https://www.washingtonpost.com/news/the-intersect/wp/2016/12/15/what-facebook-hasnt-said-about-its-plan-to-fight-fake-news/?tid=sm_tw&amp;utm_term=.14f14e02a850


Processing URLs:  81%|████████▏ | 814/1000 [29:30<04:53,  1.58s/it]

URL filtered: https://www.youtube.com/watch?v=3QzevECMKCU
URL filtered: https://twitter.com/IMdatafizz/status/745182375479369729


Processing URLs:  82%|████████▏ | 820/1000 [29:37<04:28,  1.49s/it]

Error extracting text from https://offshoreleaks.icij.org/search?c=NGA&amp;cat=3&amp;e=&amp;j=&amp;q=&amp;utf8=%E2%9C%93: 403 Client Error: Forbidden for url: https://offshoreleaks.icij.org/search?c=NGA&amp;cat=3&amp;e=&amp;j=&amp;q=&amp;utf8=%E2%9C%93


Processing URLs:  82%|████████▏ | 822/1000 [29:40<04:25,  1.49s/it]

Error extracting text from http://www.stltoday.com/lifestyles/health-med-fit/gene-therapy-helps-babies-fight-type-of-leukemia/article_2b692406-71c2-5dd1-b802-cccc0438befb.html: 404 Client Error: Not Found for url: https://www.stltoday.com/life-entertainment/nation-world/wellness/gene-therapy-helps-babies-fight-type-of-leukemia/article_2b692406-71c2-5dd1-b802-cccc0438befb.html


Processing URLs:  82%|████████▎ | 825/1000 [29:45<04:26,  1.52s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13941026001314: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13941026001314 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5f620>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  83%|████████▎ | 827/1000 [29:46<02:58,  1.03s/it]

Error extracting text from https://www.isro.gov.in/gslv-f10-chandrayaan-2-mission: 404 Client Error: Not Found for url: https://www.isro.gov.in/gslv-f10-chandrayaan-2-mission


Processing URLs:  83%|████████▎ | 828/1000 [29:47<03:28,  1.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-24/south-africa-says-budget-credible-enough-to-avoid-downgrade


Processing URLs:  84%|████████▎ | 835/1000 [29:57<03:25,  1.25s/it]

Error extracting text from http://editorial.rottentomatoes.com/article/dc-insiders-call-veep-the-most-realistic-show-about-politics/: 403 Client Error: Forbidden for url: http://editorial.rottentomatoes.com/article/dc-insiders-call-veep-the-most-realistic-show-about-politics/


Processing URLs:  84%|████████▍ | 838/1000 [30:01<03:05,  1.14s/it]

Error extracting text from http://www.nytimes.com/2015/12/11/business/international/vw-emissions-scandal.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/11/business/international/vw-emissions-scandal.html


Processing URLs:  85%|████████▍ | 849/1000 [30:26<02:53,  1.15s/it]

Error extracting text from http://president.ps/: 403 Client Error: Forbidden for url: https://president.ps/
Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/01/24/Baghdad-seeks-police-training-support.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/01/24/Baghdad-seeks-police-training-support.html


Processing URLs:  85%|████████▌ | 854/1000 [31:32<17:09,  7.05s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-mexico-ford-motor-idUSKBN14Q2AY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-mexico-ford-motor-idUSKBN14Q2AY?il=0
Error extracting text from https://www.wsj.com/articles/north-korea-launches-possible-ballistic-missile-south-korean-news-agency-reports-1494713029: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-launches-possible-ballistic-missile-south-korean-news-agency-reports-1494713029


Processing URLs:  86%|████████▌ | 855/1000 [31:32<12:20,  5.11s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/359939-former-moore-colleague-common-knowledge-that-he-dated-teen: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/359939-former-moore-colleague-common-knowledge-that-he-dated-teen/


Processing URLs:  86%|████████▌ | 858/1000 [31:37<06:40,  2.82s/it]

Error extracting text from http://intpolicydigest.org/2016/06/07/the-dilemma-of-new-start-and-the-inf/: 403 Client Error: Forbidden for url: https://intpolicydigest.org/the-dilemma-of-new-start-and-the-inf


Processing URLs:  86%|████████▌ | 859/1000 [36:42<3:39:48, 93.53s/it]

Error extracting text from http://www.japantoday.com/category/uncategorized/view/abe-aims-to-keep-ruling-blocs-upper-house-majority: 404 Client Error: Not Found for url: https://japantoday.com/category/uncategorized/abe-aims-to-keep-ruling-blocs-upper-house-majority


Processing URLs:  86%|████████▌ | 862/1000 [37:11<1:27:43, 38.14s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN0TV0MX20151212#UOVgJYxr7csYKyPJ.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN0TV0MX20151212#UOVgJYxr7csYKyPJ.97
URL filtered: https://www.youtube.com/watch?v=hfjHJneVonE&t=1m43s


Processing URLs:  86%|████████▋ | 864/1000 [37:12<47:10, 20.81s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-spd-under-renewed-fire-over-german-coalition-deal-idUSKBN1FW1BZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-spd-under-renewed-fire-over-german-coalition-deal-idUSKBN1FW1BZ?il=0


Processing URLs:  87%|████████▋ | 867/1000 [37:15<22:49, 10.30s/it]

Error extracting text from http://tinyurl.com/lr9nvwn: 400 Client Error: Bad Request for url: https://weknowmemes.com/wp-content/uploads/2012/11/bear-shitting-in-the-woods.jpg


Processing URLs:  87%|████████▋ | 871/1000 [37:18<07:58,  3.71s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu/no-brexit-trade-deal-yet-as-serious-issues-remain-british-minister-says-idUSKBN28X0TW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/no-brexit-trade-deal-yet-as-serious-issues-remain-british-minister-says-idUSKBN28X0TW?il=0
Error extracting text from http://www.autoindustriya.com/auto-industry-news/toyota-to-fully-resume-production-in-japanese-plants-on-friday.html: 403 Client Error: Forbidden for url: http://www.autoindustriya.com/auto-industry-news/toyota-to-fully-resume-production-in-japanese-plants-on-friday.html


Processing URLs:  87%|████████▋ | 872/1000 [37:19<06:32,  3.07s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57335#.WZanctGxWf1: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57335#.WZanctGxWf1


Processing URLs:  88%|████████▊ | 875/1000 [37:24<04:17,  2.06s/it]

Error extracting text from http://www.reuters.com/article/us-afghanistan-minister-idUSKCN0VF0E1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-minister-idUSKCN0VF0E1


Processing URLs:  88%|████████▊ | 877/1000 [37:25<02:35,  1.26s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-corruption-rousseff-idUSKCN0W62LA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-rousseff-idUSKCN0W62LA


Processing URLs:  88%|████████▊ | 880/1000 [37:29<02:40,  1.34s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/joe-biden-favorable-rating: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/joe-biden-favorable-rating
URL filtered: https://www.facebook.com/DonaldTrump/posts/10156500910690725:0?_fb_noscript=1


Processing URLs:  88%|████████▊ | 882/1000 [37:30<01:32,  1.28it/s]

Error extracting text from https://www.nytimes.com/2017/06/27/technology/ransomware-hackers.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/27/technology/ransomware-hackers.html


Processing URLs:  88%|████████▊ | 883/1000 [37:31<01:59,  1.02s/it]

Error extracting text from https://www.securitycouncilreport.org/whatsinblue/2021/04/private-meeting-on-myanmar-via-vtc.php: 403 Client Error: Forbidden for url: https://www.securitycouncilreport.org/whatsinblue/2021/04/private-meeting-on-myanmar-via-vtc.php


Processing URLs:  89%|████████▉ | 889/1000 [37:36<01:19,  1.40it/s]

Error extracting text from http://www.nytimes.com/2000/11/06/world/2-religions-clashing-in-ivory-coast.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2000/11/06/world/2-religions-clashing-in-ivory-coast.html


Processing URLs:  89%|████████▉ | 893/1000 [37:43<01:55,  1.08s/it]

Error extracting text from http://www.opec.org/opec_web/en/about_us/163.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/about_us/163.htm
Error extracting text from http://www.balkaninsight.com/en/article/bomb-blast-kills-two-in-montenegrin-resort-kotor-09-06-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/bomb-blast-kills-two-in-montenegrin-resort-kotor-09-06-2016


Processing URLs:  89%|████████▉ | 894/1000 [37:43<01:24,  1.25it/s]

Error extracting text from http://www.reuters.com/article/2015/11/02/us-iran-rights-arrest-idUSKCN0SR18620151102: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/02/us-iran-rights-arrest-idUSKCN0SR18620151102


Processing URLs:  90%|████████▉ | 896/1000 [37:44<01:04,  1.61it/s]

Error extracting text from https://www.eaglobal.org/events/london2020/: 404 Client Error: Not Found for url: https://www.effectivealtruism.org/ea-global/events/london2020
Error extracting text from http://www.globalconstructionreview.com/news/italian-soldiers-and-engine7ers-arri7ve-s7ave/: 403 Client Error: Forbidden for url: http://www.globalconstructionreview.com/news/italian-soldiers-and-engine7ers-arri7ve-s7ave/


Processing URLs:  90%|████████▉ | 898/1000 [37:44<00:37,  2.75it/s]

Error extracting text from http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html


Processing URLs:  90%|█████████ | 900/1000 [37:46<00:56,  1.77it/s]

Error extracting text from http://www.hindustantimes.com/world-newspaper/suicides-of-chinese-officials-up-amid-anti-graft-drive/story-B2Q8sNoorkWDOKJkRjdHzH.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/world-newspaper/suicides-of-chinese-officials-up-amid-anti-graft-drive/story-B2Q8sNoorkWDOKJkRjdHzH.html


Processing URLs:  90%|█████████ | 904/1000 [37:47<00:34,  2.80it/s]

Error extracting text from http://www.reuters.com/article/us-sanfrancisco-cyber-idUSKCN0VZ2RL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-sanfrancisco-cyber-idUSKCN0VZ2RL
Error extracting text from http://www.iol.co.za/news/world/bus-protest-in-brazil-turns-violent-1.1968582: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/world/bus-protest-in-brazil-turns-violent-1.1968582


Processing URLs:  91%|█████████ | 910/1000 [37:59<02:16,  1.51s/it]

Error extracting text from https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?ie=UTF8&amp;qid=1495325995&amp;sr=8-1&amp;keywords=dictators+handbook: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?ie=UTF8&amp;qid=1495325995&amp;sr=8-1&amp;keywords=dictators+handbook


Processing URLs:  91%|█████████▏| 913/1000 [38:10<04:04,  2.81s/it]

Error extracting text from http://www.wsj.com/articles/hong-kong-leader-c-y-leung-wont-seek-re-election-1481272374: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/hong-kong-leader-c-y-leung-wont-seek-re-election-1481272374


Processing URLs:  92%|█████████▏| 919/1000 [38:19<02:18,  1.71s/it]

URL filtered: https://news.bloomberglaw.com/bankruptcy-law/u-s-leveraged-loan-default-rate-expected-to-fall-further


Processing URLs:  92%|█████████▎| 925/1000 [38:26<01:24,  1.12s/it]

Error extracting text from http://gawker.com/turns-out-joe-biden-leaked-that-heartwarming-story-abou-1734924960: 404 Client Error: Not Found for url: https://gawker.com/turns-out-joe-biden-leaked-that-heartwarming-story-abou-1734924960


Processing URLs:  93%|█████████▎| 926/1000 [38:27<01:08,  1.09it/s]

Error extracting text from https://bit.ly/3AyawFk0: 404 Client Error: Not Found for url: https://bit.ly/3AyawFk0


Processing URLs:  93%|█████████▎| 928/1000 [38:30<01:33,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-idUSKBN17N1ZH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-idUSKBN17N1ZH


Processing URLs:  93%|█████████▎| 929/1000 [38:33<01:53,  1.60s/it]

Error extracting text from http://www.foxnews.com/world/2015/11/05/russia-sends-anti-aircraft-missiles-to-syria-to-protect-jets-during-airstrikes/: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/11/05/russia-sends-anti-aircraft-missiles-to-syria-to-protect-jets-during-airstrikes/


Processing URLs:  93%|█████████▎| 932/1000 [38:38<01:50,  1.63s/it]

Error extracting text from https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/May/REINZ%20Residential%20Press%20Release%20-%20May%202021.pdf: 404 Client Error: Not Found for url: https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/May/REINZ%20Residential%20Press%20Release%20-%20May%202021.pdf


Processing URLs:  94%|█████████▎| 935/1000 [39:38<11:12, 10.35s/it]

Error extracting text from http://blogs.rollcall.com/218/ryan-leaves-door-open-policy-riders-spending-bill/: HTTPConnectionPool(host='blogs.rollcall.com', port=80): Max retries exceeded with url: /218/ryan-leaves-door-open-policy-riders-spending-bill/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ff2e6ed0>, 'Connection to blogs.rollcall.com timed out. (connect timeout=60)'))
Error extracting text from http://www.arcap.ca/wp-content/uploads/2016/09/CapeShiller.pdf: HTTPConnectionPool(host='www.arcap.ca', port=80): Max retries exceeded with url: /wp-content/uploads/2016/09/CapeShiller.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303418ad0>: Failed to resolve 'www.arcap.ca' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.fox5ny.com/news/36071935-story: 403 Client Error: Forbidden for url: http://www.fox5ny.com/news/36071935-story


Processing URLs:  94%|█████████▎| 936/1000 [39:38<08:26,  7.91s/it]

Error extracting text from https://lobelog.com/iran-gearing-up-to-join-the-wto-again/: 403 Client Error: Forbidden for url: https://lobelog.com/iran-gearing-up-to-join-the-wto-again/


Processing URLs:  94%|█████████▍| 941/1000 [39:46<02:39,  2.71s/it]

Error extracting text from http://www.ibtimes.com/hillary-clinton-donald-trump-latest-polls-clinton-leading-florida-pennsylvania-nevada-2425166: 403 Client Error: Forbidden for url: https://www.ibtimes.com/hillary-clinton-donald-trump-latest-polls-clinton-leading-florida-pennsylvania-nevada-2425166


Processing URLs:  95%|█████████▍| 947/1000 [39:55<01:09,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13B2CP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13B2CP


Processing URLs:  95%|█████████▌| 951/1000 [40:00<01:02,  1.28s/it]

Error extracting text from https://af.reuters.com/article/topNews/idAFKBN25G0LQ-OZATP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af
Error extracting text from http://www.hindustantimes.com/world-news/politicians-nervous-as-imran-khan-pushes-for-army-takeover-in-pakistan/story-WtIxYXJ8LE5WXJb9OIaB0J.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/world-news/politicians-nervous-as-imran-khan-pushes-for-army-takeover-in-pakistan/story-WtIxYXJ8LE5WXJb9OIaB0J.html


Processing URLs:  95%|█████████▌| 953/1000 [41:00<11:24, 14.57s/it]

Error extracting text from https://www.tradingfloor.com/posts/opec-knows-its-every-man-for-himself-hansen-6599799: HTTPSConnectionPool(host='www.tradingfloor.com', port=443): Max retries exceeded with url: /posts/opec-knows-its-every-man-for-himself-hansen-6599799 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fe74ed50>, 'Connection to www.tradingfloor.com timed out. (connect timeout=60)'))
URL filtered: https://about.fb.com/wp-content/uploads/2021/06/Facebook-Responses-to-Oversight-Board-Recommendations-in-Trump-Case.pdf


Processing URLs:  96%|█████████▌| 961/1000 [41:12<01:48,  2.79s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2017/01/25/0301000000AEN20170125010900315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  96%|█████████▋| 963/1000 [41:13<01:00,  1.65s/it]

Error extracting text from https://www.nytimes.com/2017/01/24/us/politics/budget-deficit-trump.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/24/us/politics/budget-deficit-trump.html?_r=0


Processing URLs:  96%|█████████▋| 964/1000 [41:15<01:03,  1.77s/it]

Error extracting text from http://www.38north.org/2017/06/wgraham060217/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  97%|█████████▋| 968/1000 [41:28<01:18,  2.45s/it]

URL filtered: https://www.youtube.com/watch?v=kQtXuwzn0K0
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16122B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-idUSKBN16122B


Processing URLs:  97%|█████████▋| 969/1000 [41:28<00:58,  1.88s/it]

Error extracting text from https://www.reuters.com/article/us-russia-sanctions-defence-bank/russia-launches-new-defense-bank-to-shield-lenders-from-sanctions-idUSKBN1F71TJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-sanctions-defence-bank/russia-launches-new-defense-bank-to-shield-lenders-from-sanctions-idUSKBN1F71TJ


Processing URLs:  97%|█████████▋| 974/1000 [41:31<00:25,  1.02it/s]

Error extracting text from http://mashable.com/2015/01/27/apple-q1-earnings-2015/#ySJXVyGEmSqj: 404 Client Error: Not Found for url: https://mashable.com/2015/01/27/apple-q1-earnings-2015/#ySJXVyGEmSqj


Processing URLs:  98%|█████████▊| 976/1000 [42:33<05:00, 12.53s/it]

Error extracting text from http://psm.du.edu/media/documents/reports_and_stats/us_data/dod_quarterly_census/dod_quarterly_census_2014_jan.pdf: HTTPConnectionPool(host='psm.du.edu', port=80): Max retries exceeded with url: /media/documents/reports_and_stats/us_data/dod_quarterly_census/dod_quarterly_census_2014_jan.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fccf1970>, 'Connection to psm.du.edu timed out. (connect timeout=60)'))
Error extracting text from http://www.reuters.com/article/us-oil-meeting-iran-idUSKCN0XD0BF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-meeting-iran-idUSKCN0XD0BF


Processing URLs:  98%|█████████▊| 977/1000 [42:40<04:12, 10.97s/it]

Error extracting text from http://www.brasilpost.com.br/2016/01/28/cunha-adia-impeachment_n_8982964.html: HTTPConnectionPool(host='www.brasilpost.com.br', port=80): Max retries exceeded with url: /2016/01/28/cunha-adia-impeachment_n_8982964.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5f530>: Failed to resolve 'www.brasilpost.com.br' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  98%|█████████▊| 978/1000 [42:40<02:55,  7.96s/it]

Error extracting text from http://www.tradingeconomics.com/germany/car-registrations: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/germany/car-registrations


Processing URLs:  98%|█████████▊| 979/1000 [42:44<02:19,  6.63s/it]

URL filtered: https://twitter.com/raveenaujmaya


Processing URLs:  98%|█████████▊| 984/1000 [42:59<00:55,  3.45s/it]

Error extracting text from http://uk.reuters.com/article/uk-global-oil-idUKKBN1AI02U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  99%|█████████▉| 988/1000 [43:03<00:18,  1.53s/it]

Error extracting text from http://uk.reuters.com/article/turkey-energy-russia-iran-idUKL8N1L13IZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  99%|█████████▉| 991/1000 [43:13<00:23,  2.62s/it]

URL filtered: http://www.youtube.com/watch?v=TRCUO7-lbUE&amp;sns=em


Processing URLs:  99%|█████████▉| 994/1000 [43:15<00:08,  1.41s/it]

Error extracting text from http://finviz.com/quote.ashx?t=tsla: 403 Client Error: Forbidden for url: https://finviz.com/quote.ashx?t=tsla


Processing URLs: 100%|█████████▉| 997/1000 [43:20<00:04,  1.59s/it]

Error extracting text from https://global.handelsblatt.com/opinion/the-russian-agenda-793034: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/opinion/the-russian-agenda-793034


Processing URLs: 100%|█████████▉| 999/1000 [43:24<00:01,  1.67s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/suspected-rebels-kill-ranger-congo-park-38077317: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/suspected-rebels-kill-ranger-congo-park-38077317


Processing URLs: 100%|██████████| 1000/1000 [43:25<00:00,  2.61s/it]


Error extracting text from http://thehill.com/policy/cybersecurity/354734-election-hacking-report-finds-us-has-a-lot-to-do-in-a-short-period-of: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/354734-election-hacking-report-finds-us-has-a-lot-to-do-in-a-short-period-of/


Processing URLs:   0%|          | 3/1000 [00:02<12:59,  1.28it/s]

Error extracting text from http://www.nytimes.com/2016/12/03/world/asia/philippines-rodrigo-duterte-donald-trump.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/03/world/asia/philippines-rodrigo-duterte-donald-trump.html


Processing URLs:   0%|          | 4/1000 [00:03<13:49,  1.20it/s]

URL filtered: https://www.youtube.com/watch?v=3WNvZPDqSzQ


Processing URLs:   1%|          | 9/1000 [00:10<18:53,  1.14s/it]

Error extracting text from https://www.nytimes.com/2018/01/15/world/asia/afghanistan-atta-muhammad-noor-president.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/15/world/asia/afghanistan-atta-muhammad-noor-president.html


Processing URLs:   1%|          | 10/1000 [00:11<19:19,  1.17s/it]

Error extracting text from https://publishingperspectives.com/2021/08/npd-sees-the-us-print-market-up-in-july-momentum-slowing/: 403 Client Error: Forbidden for url: https://publishingperspectives.com/2021/08/npd-sees-the-us-print-market-up-in-july-momentum-slowing/


Processing URLs:   1%|          | 12/1000 [00:22<49:41,  3.02s/it]

Error extracting text from https://www.yahoo.com/news/panama-aims-end-june-much-delayed-canal-expansion-001713762.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/panama-aims-end-june-much-delayed-canal-expansion-001713762.html


Processing URLs:   2%|▏         | 15/1000 [00:28<35:09,  2.14s/it]

Error extracting text from https://www.lowyinstitute.org/the-interpreter/chinas-first-overseas-military-base-djibouti-likely-be-taste-things-come: 404 Client Error: Not Found for url: https://www.lowyinstitute.org/the-interpreter/chinas-first-overseas-military-base-djibouti-likely-be-taste-things-come


Processing URLs:   2%|▏         | 17/1000 [00:31<26:42,  1.63s/it]

Error extracting text from http://www.imdb.com/title/tt0078655/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0078655/


Processing URLs:   2%|▏         | 18/1000 [00:32<23:59,  1.47s/it]

Error extracting text from http://www.jsonline.com/news/statepolitics/ron-johnson-introduces-bills-to-tackle-government-waste-b99733187z1-381019641.html: 404 Client Error: OK for url: https://www.jsonline.com/news/statepolitics/ron-johnson-introduces-bills-to-tackle-government-waste-b99733187z1-381019641.html/


Processing URLs:   2%|▏         | 20/1000 [00:36<30:42,  1.88s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-16/u-k-s-may-rejects-calls-for-scottish-independence-referendum


Processing URLs:   2%|▏         | 23/1000 [00:40<24:02,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-usa-iran-navy-idUSKCN11C1TL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-navy-idUSKCN11C1TL


Processing URLs:   2%|▎         | 25/1000 [00:41<14:45,  1.10it/s]

Error extracting text from http://www.reuters.com/article/2015/11/06/us-usa-economy-idUSKCN0SV0HQ20151106#gkfavBjUohk644x9.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-usa-economy-idUSKCN0SV0HQ20151106#gkfavBjUohk644x9.97


Processing URLs:   3%|▎         | 28/1000 [00:42<08:21,  1.94it/s]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-resumes-political-crisis-talks-02-04-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-resumes-political-crisis-talks-02-04-2016
Error extracting text from https://www.irrawaddy.com/in-person/interview/international-pressure-essential-in-war-against-junta-myanmars-un-ambassador.html: 403 Client Error: Forbidden for url: https://www.irrawaddy.com/in-person/interview/international-pressure-essential-in-war-against-junta-myanmars-un-ambassador.html


Processing URLs:   3%|▎         | 32/1000 [00:52<28:27,  1.76s/it]

Error extracting text from http://cleantechnica.com/2015/04/13/solar-wind-power-prices-often-lower-fossil-fuel-power-prices/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2015/04/13/solar-wind-power-prices-often-lower-fossil-fuel-power-prices/


Processing URLs:   4%|▎         | 36/1000 [00:58<23:18,  1.45s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=57675#.WcpdszMfnq0: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=57675#.WcpdszMfnq0


Processing URLs:   4%|▎         | 37/1000 [01:00<25:28,  1.59s/it]

URL filtered: https://www.youtube.com/watch?v=WFNNQk6cxf4


Processing URLs:   4%|▍         | 43/1000 [01:07<14:16,  1.12it/s]

Error extracting text from http://www.reuters.com/article/us-health-gene-editing-idUSKBN1AC0CM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-gene-editing-idUSKBN1AC0CM


Processing URLs:   4%|▍         | 45/1000 [01:09<16:52,  1.06s/it]

Error extracting text from http://www.shanghaidaily.com/article/article_xinhua.aspx?id=321571: 404 Client Error: Not Found for url: http://www.shanghaidaily.com/article/article_xinhua.aspx?id=321571


Processing URLs:   5%|▍         | 48/1000 [01:13<15:09,  1.05it/s]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-volkswagen-emissions-lawsuits-idUSKBN0TJ2MF20151130: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-volkswagen-emissions-lawsuits-idUSKBN0TJ2MF20151130


Processing URLs:   5%|▌         | 54/1000 [01:20<19:33,  1.24s/it]

Error extracting text from http://www.newsweek.com/isis-using-drones-rigged-munitions-attack-advancing-forces-raqqa-628955: 403 Client Error: Forbidden for url: https://www.newsweek.com/isis-using-drones-rigged-munitions-attack-advancing-forces-raqqa-628955


Processing URLs:   6%|▌         | 56/1000 [01:23<19:53,  1.26s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X30GY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X30GY


Processing URLs:   6%|▌         | 57/1000 [01:25<25:27,  1.62s/it]

Error extracting text from http://stardrive.org/stardrive/index.php/all-blog-articles/5628-darpa-wikileaks-national-security: 404 Client Error: Not Found for url: https://stardrive.org/stardrive/index.php/all-blog-articles/5628-darpa-wikileaks-national-security


Processing URLs:   6%|▌         | 59/1000 [01:31<35:41,  2.28s/it]

URL filtered: https://www.youtube.com/watch?v=J7-8sCLWwLk


Processing URLs:   6%|▋         | 63/1000 [01:37<24:40,  1.58s/it]

Error extracting text from https://orientalreview.org/2020/12/15/aman-2021-may-strengthen-cooperation-among-friendly-nations/: HTTPSConnectionPool(host='orientalreview.org', port=443): Max retries exceeded with url: /2020/12/15/aman-2021-may-strengthen-cooperation-among-friendly-nations/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x304dd4b00>: Failed to resolve 'orientalreview.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.stratfor.com/analysis/advance-toward-mosul-begins?utm_source=LinkedIn&amp;utm_medium=social&amp;utm_campaign=article


Processing URLs:   7%|▋         | 68/1000 [01:38<10:02,  1.55it/s]

Error extracting text from http://www.malaysia-chronicle.com/index.php?option=com_k2&amp;view=item&amp;id=606063:najib-reiterates-no-ringgit-peg-or-capital-controls&amp;Itemid=3#axzz3qx2bC4k7: HTTPConnectionPool(host='www.malaysia-chronicle.com', port=80): Max retries exceeded with url: /index.php?option=com_k2&amp;view=item&amp;id=606063:najib-reiterates-no-ringgit-peg-or-capital-controls&amp;Itemid=3 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x304dd7a40>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0Z406V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0Z406V
Error extracting text from https://www.rfi.fr/en/business/20210520-european-parliament-votes-to-freeze-eu-china-investment-deal-cai-human-rights-sanctions-xinjiang-hong-kong: 403 Client Error: Forbidden for url

Processing URLs:   7%|▋         | 71/1000 [01:40<09:12,  1.68it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trade-nafta-idUSKBN17S2DG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trade-nafta-idUSKBN17S2DG?il=0


Processing URLs:   7%|▋         | 73/1000 [01:41<09:35,  1.61it/s]

Error extracting text from https://www.crows.org/: 403 Client Error: Forbidden for url: https://www.crows.org/


Processing URLs:   7%|▋         | 74/1000 [01:42<09:26,  1.63it/s]

Error extracting text from http://thehill.com/opinion/cybersecurity/362193-cia-and-nsa-codes-are-on-the-web-and-the-leakers-could-be-in-the: 403 Client Error: Forbidden for url: https://thehill.com/opinion/cybersecurity/362193-cia-and-nsa-codes-are-on-the-web-and-the-leakers-could-be-in-the/


Processing URLs:   8%|▊         | 76/1000 [01:45<16:55,  1.10s/it]

Error extracting text from https://www.latimes.com/world-nation/story/2020-06-04/after-historic-casino-closure-gambling-returns-to-las-vegas: 403 Client Error: Forbidden for url: https://www.latimes.com/world-nation/story/2020-06-04/after-historic-casino-closure-gambling-returns-to-las-vegas


Processing URLs:   8%|▊         | 78/1000 [01:47<14:32,  1.06it/s]

Error extracting text from http://blogs.wsj.com/washwire/2015/12/07/anti-trump-effort-launches-super-pac/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/12/07/anti-trump-effort-launches-super-pac/
URL filtered: https://www.youtube.com/watch?v=rD8SmacBUcU


Processing URLs:   8%|▊         | 81/1000 [01:48<10:09,  1.51it/s]

URL filtered: https://www.youtube.com/watch?v=mWBvcJAXwu4


Processing URLs:   8%|▊         | 85/1000 [02:03<37:41,  2.47s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/686365/iraqi-forces-preparing-for-mosul-assault-oir-spokesman-says: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/686365/iraqi-forces-preparing-for-mosul-assault-oir-spokesman-says


Processing URLs:   9%|▊         | 86/1000 [02:05<37:08,  2.44s/it]

Error extracting text from https://slatestarcodex.com/2014/05/23/ssc-gives-a-graduation-speech/: 403 Client Error: Forbidden for url: https://slatestarcodex.com/2014/05/23/ssc-gives-a-graduation-speech/


Processing URLs:   9%|▊         | 87/1000 [02:09<43:49,  2.88s/it]

Error extracting text from https://www.news24.com/SouthAfrica/News/has-zuma-been-reduced-to-a-puppet-20180112: 403 Client Error: Forbidden for url: https://thewitness.co.za


Processing URLs:   9%|▉         | 90/1000 [02:12<23:41,  1.56s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/01/03/0200000000AEN20160103000700315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:   9%|▉         | 91/1000 [02:12<18:15,  1.21s/it]

Error extracting text from http://www.bbĉ.com: HTTPConnectionPool(host='www.xn--bb-2la.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304dd5190>: Failed to resolve 'www.xn--bb-2la.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  10%|▉         | 95/1000 [02:16<15:44,  1.04s/it]

Error extracting text from https://www.wsj.com/articles/theresa-may-is-no-maggie-thatcher-1493924928: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/theresa-may-is-no-maggie-thatcher-1493924928


Processing URLs:  10%|█         | 104/1000 [02:31<22:52,  1.53s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/05/my-super-picker/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/05/my-super-picker/


Processing URLs:  11%|█         | 108/1000 [04:34<6:26:12, 25.98s/it]

Error extracting text from https://trudeaumetre.polimeter.org/: HTTPSConnectionPool(host='trudeaumetre.polimeter.org', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ffc6e660>, 'Connection to trudeaumetre.polimeter.org timed out. (connect timeout=60)'))
Error extracting text from http://www.nytimes.com/2016/09/05/world/asia/south-china-sea-philippines-scarborough-shoal.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/05/world/asia/south-china-sea-philippines-scarborough-shoal.html?_r=0


Processing URLs:  11%|█         | 109/1000 [04:35<4:31:58, 18.31s/it]

Error extracting text from http://en.vietnamplus.vn/thailand-backs-faster-conclusion-of-rcep-negotiations/106508.vnp: HTTPSConnectionPool(host='en.vietnamplus.vn', port=443): Max retries exceeded with url: /thailand-backs-faster-conclusion-of-rcep-negotiations/106508.vnp (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  11%|█         | 110/1000 [04:37<3:20:16, 13.50s/it]

Error extracting text from http://www.foxnews.com/politics/2015/09/27/hassan-rouhani-iran-could-release-us-prisoners/: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/09/27/hassan-rouhani-iran-could-release-us-prisoners/
URL filtered: https://www.bloomberg.com/news/articles/2016-12-19/scotland-threatens-to-leave-u-k-if-forced-out-of-single-market


Processing URLs:  12%|█▏        | 120/1000 [05:15<1:34:34,  6.45s/it]

Error extracting text from https://www.washingtonpost.com/politics/a-look-at-the-known-ties-between-trump-associates-and-russia/2017/03/03/a151728a-ffeb-11e6-9b78-824ccab94435_story.html?utm_term=.3c31b10317b8: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/a-look-at-the-known-ties-between-trump-associates-and-russia/2017/03/03/a151728a-ffeb-11e6-9b78-824ccab94435_story.html?utm_term=.3c31b10317b8


Processing URLs:  12%|█▏        | 122/1000 [05:16<50:18,  3.44s/it]  

Error extracting text from http://www.nytimes.com/2016/06/17/world/europe/nato-russia-cyberwarfare.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/17/world/europe/nato-russia-cyberwarfare.html?_r=0


Processing URLs:  12%|█▏        | 123/1000 [05:17<38:49,  2.66s/it]

Error extracting text from http://www.bostonherald.com/news/local_coverage/2016/02/beacon_hill_lawmakers_head_north_for_new_hampshire_primary: 404 Client Error: Not Found for url: https://www.bostonherald.com/news/local_coverage/2016/02/beacon_hill_lawmakers_head_north_for_new_hampshire_primary


Processing URLs:  12%|█▏        | 124/1000 [05:17<29:18,  2.01s/it]

Error extracting text from http://www.latimes.com/opinion/op-ed/la-oe-boot-obama-reliance-on-special-forces-20160112-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/op-ed/la-oe-boot-obama-reliance-on-special-forces-20160112-story.html


Processing URLs:  12%|█▎        | 125/1000 [05:19<26:18,  1.80s/it]

Error extracting text from http://www.gov.me/en/News/152657/Report-on-the-implementation-of-the-Fifth-Annual-National-Programme-of-Montenegro-within-the-Membership-Action-Plan-MAP.html: 404 Client Error: not found for url: https://www.gov.me/en/News/152657/Report-on-the-implementation-of-the-Fifth-Annual-National-Programme-of-Montenegro-within-the-Membership-Action-Plan-MAP.html


Processing URLs:  13%|█▎        | 128/1000 [05:25<28:13,  1.94s/it]

Error extracting text from http://www.nytimes.com/2015/09/17/us/politics/obama-hints-at-sanctions-against-china-over-cyberattacks.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/17/us/politics/obama-hints-at-sanctions-against-china-over-cyberattacks.html?_r=0


Processing URLs:  13%|█▎        | 131/1000 [05:27<15:49,  1.09s/it]

Error extracting text from http://blog.teleosleaders.com/2013/07/19/emotional-empathy-and-cognitive-empathy/: 404 Client Error: Not Found for url: http://blog.teleosleaders.com/2013/07/19/emotional-empathy-and-cognitive-empathy/


Processing URLs:  13%|█▎        | 134/1000 [05:35<37:49,  2.62s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-11/stocks-tumble-in-worst-week-since-august-as-fed-anxiety-spreads


Processing URLs:  15%|█▍        | 146/1000 [05:50<12:58,  1.10it/s]

Error extracting text from http://www.timesofisrael.com/with-coup-defeated-erdogan-seeks-revamped-powers/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/with-coup-defeated-erdogan-seeks-revamped-powers/


Processing URLs:  15%|█▍        | 147/1000 [05:52<15:03,  1.06s/it]

Error extracting text from http://www.stardailystandard.com/uk/turkish-pm-says-no-cause-to-halt-akkuyu-plant-with-russia/89271/: 406 Client Error: Not Acceptable for url: http://www.stardailystandard.com/uk/turkish-pm-says-no-cause-to-halt-akkuyu-plant-with-russia/89271/


Processing URLs:  15%|█▌        | 150/1000 [05:54<13:00,  1.09it/s]

Error extracting text from https://coronavirus.upenn.edu/announcement/planning-penn%E2%80%99s-spring-2021-semester: 404 Client Error: Not Found for url: https://wellness.upenn.edu/announcement/planning-penn%E2%80%99s-spring-2021-semester


Processing URLs:  15%|█▌        | 154/1000 [06:59<4:26:51, 18.93s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-03-09/poll-majority-oppose-gops-plan-to-block-obamas-supreme-court-nominee: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  16%|█▌        | 156/1000 [07:02<2:20:52, 10.01s/it]

URL filtered: http://www.breitbart.com/national-security/2016/02/09/brazil-rio-de-janeiro-carnival-will-feature-refugee-children-including-syrians-nypost-com20160209kenya-threatens-to-pull-out-of-olympics-over-zikautm_campaignsocialflowutm_sourcenyptwitter/
URL filtered: http://www.bloomberg.com/news/articles/2016-05-10/online-polls-overstated-support-for-ukip-analyst-singh-says


Processing URLs:  16%|█▌        | 162/1000 [07:16<54:01,  3.87s/it]  

URL filtered: https://twitter.com/CNBCnow/status/882956674490925056


Processing URLs:  17%|█▋        | 167/1000 [07:23<32:28,  2.34s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-28/what-the-superforecasters-say-about-when-the-fed-will-lift-rates


Processing URLs:  17%|█▋        | 173/1000 [07:27<12:43,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-peru-election-guzman-idUSKCN0VD2SR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-guzman-idUSKCN0VD2SR


Processing URLs:  18%|█▊        | 177/1000 [07:39<40:33,  2.96s/it]

URL filtered: https://www.youtube.com/watch?v=czJTxZqXLnw


Processing URLs:  18%|█▊        | 182/1000 [07:46<21:57,  1.61s/it]

Error extracting text from http://www.reuters.com/article/2015/10/18/us-iran-nuclear-eu-idUSKCN0SC10620151018: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/18/us-iran-nuclear-eu-idUSKCN0SC10620151018


Processing URLs:  19%|█▊        | 187/1000 [07:56<22:58,  1.70s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-23/saudi-arabia-won-t-cut-crude-output-oil-minister-al-naimi-says?cmpid=wsdemand


Processing URLs:  20%|█▉        | 196/1000 [08:09<21:40,  1.62s/it]

Error extracting text from https://www.sec.gov/divisions/investment/guidance/13fpt2.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/divisions/investment/guidance/13fpt2.htm


Processing URLs:  20%|█▉        | 197/1000 [08:10<17:28,  1.31s/it]

Error extracting text from http://www.hybridcars.com/california-governments-add-toyota-mirai-fuel-cell-cars-to-their-fleets/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/california-governments-add-toyota-mirai-fuel-cell-cars-to-their-fleets/


Processing URLs:  20%|█▉        | 199/1000 [08:11<13:53,  1.04s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/transmission/variant-cases.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/transmission/variant-cases.html
Error extracting text from http://www.opec.org/opec_web/en/press_room/3046.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/3046.htm


Processing URLs:  20%|██        | 200/1000 [08:12<12:17,  1.08it/s]

Error extracting text from http://www.pravdareport.com/russia/politics/04-08-2016/135221-west_russia_anaconda-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/russia/politics/04-08-2016/135221-west_russia_anaconda-0/


Processing URLs:  20%|██        | 203/1000 [08:16<13:07,  1.01it/s]

Error extracting text from http://www.parliament.uk/about/how/elections-and-voting/general/: 403 Client Error: Forbidden for url: http://www.parliament.uk/about/how/elections-and-voting/general/


Processing URLs:  21%|██        | 207/1000 [08:22<14:37,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-britain-election-nireland-idUSKBN17T2CS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-nireland-idUSKBN17T2CS


Processing URLs:  21%|██        | 212/1000 [09:35<4:31:41, 20.69s/it]

Error extracting text from http://aa.com.tr/en/economy/turkish-capital-controls-out-of-the-question/611713: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  21%|██▏       | 214/1000 [09:39<2:25:22, 11.10s/it]

Error extracting text from http://www.publicpolicypolling.com/pdf/2015/PPP_Release_NH_10616.pdf-: 404 Client Error: Not Found for url: https://www.publicpolicypolling.com/wp-content/uploads/2017/09/PPP_Release_NH_10616.pdf-


Processing URLs:  22%|██▏       | 217/1000 [10:40<3:37:25, 16.66s/it]

Error extracting text from https://www.mesadeconversaciones.com.co/documentos-y-comunicados: HTTPSConnectionPool(host='www.mesadeconversaciones.com.co', port=443): Max retries exceeded with url: /documentos-y-comunicados (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fccf12e0>, 'Connection to www.mesadeconversaciones.com.co timed out. (connect timeout=60)'))
Error extracting text from https://www.reuters.com/world/americas/brazils-bolsonaro-turns-center-right-senator-political-survival-2021-07-27/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazils-bolsonaro-turns-center-right-senator-political-survival-2021-07-27/


Processing URLs:  22%|██▏       | 222/1000 [10:45<43:48,  3.38s/it]  

Error extracting text from https://www.nytimes.com/2019/08/12/world/europe/russia-nuclear-accident-putin.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2019/08/12/world/europe/russia-nuclear-accident-putin.html


Processing URLs:  22%|██▎       | 225/1000 [10:48<22:26,  1.74s/it]

Error extracting text from http://www.enoughproject.org/conflicts/eastern_congo/armed-groups: 403 Client Error: Forbidden for url: http://www.enoughproject.org/conflicts/eastern_congo/armed-groups


Processing URLs:  23%|██▎       | 226/1000 [10:49<19:47,  1.53s/it]

Error extracting text from http://iowastartingline.com/2015/10/30/does-any-republican-want-to-win-iowa-or-what/: 403 Client Error: Forbidden for url: http://iowastartingline.com/2015/10/30/does-any-republican-want-to-win-iowa-or-what/
URL filtered: https://twitter.com/RALee85/status/1497940431040098309


Processing URLs:  23%|██▎       | 228/1000 [10:49<11:55,  1.08it/s]

Error extracting text from http://www.iranwatch.org/our-publications/nuclear-iran-weekly/major-iranian-nuclear-entities-receive-early-sanctions-relief: 403 Client Error: Forbidden for url: https://www.iranwatch.org/our-publications/nuclear-iran-weekly/major-iranian-nuclear-entities-receive-early-sanctions-relief


Processing URLs:  23%|██▎       | 231/1000 [10:55<15:16,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-southkorea-politics-idUSKBN13X2JS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-idUSKBN13X2JS
URL filtered: https://www.youtube.com/playlist?list=PLnH5kZ13Bb5Z3trx2pY4eWiZGH2gnlRn0


Processing URLs:  24%|██▎       | 235/1000 [10:59<13:22,  1.05s/it]

Error extracting text from http://in.reuters.com/article/us-turkey-russia-commentary-idINKBN15329M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  24%|██▍       | 238/1000 [11:03<16:07,  1.27s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/north-koreas-nuclear-weapons-strategy-21179: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/north-koreas-nuclear-weapons-strategy-21179


Processing URLs:  24%|██▍       | 241/1000 [11:05<09:07,  1.39it/s]

Error extracting text from https://www.khaama.com/afghanistans-160m-new-defense-ministry-mini-pentagon-inaugurated-1885: 403 Client Error: Forbidden for url: https://www.khaama.com/afghanistans-160m-new-defense-ministry-mini-pentagon-inaugurated-1885


Processing URLs:  25%|██▍       | 246/1000 [11:14<16:53,  1.34s/it]

Error extracting text from http://www.balkaninsight.com/en/article/russia-does-not-stir-up-montenegro-protests-opposition-says-10-21-2015-1: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/russia-does-not-stir-up-montenegro-protests-opposition-says-10-21-2015-1


Processing URLs:  25%|██▍       | 248/1000 [11:15<10:02,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN18Q025: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN18Q025


Processing URLs:  25%|██▌       | 252/1000 [11:19<11:38,  1.07it/s]

Error extracting text from http://www.nytimes.com/2016/02/12/world/africa/rivals-disrupt-jacob-zumas-speech-on-south-african-economy.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/12/world/africa/rivals-disrupt-jacob-zumas-speech-on-south-african-economy.html


Processing URLs:  25%|██▌       | 253/1000 [11:20<11:21,  1.10it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-04-20/saudi-arabia-says-some-oil-producers-reach-deal-to-extend-cuts


Processing URLs:  26%|██▌       | 257/1000 [11:30<27:35,  2.23s/it]

Error extracting text from http://www.rtoinsider.com/entergy-nypsc-indian-point-21254/: 404 Client Error: Not Found for url: https://www.rtoinsider.com/entergy-nypsc-indian-point-21254/


Processing URLs:  26%|██▌       | 258/1000 [11:35<36:41,  2.97s/it]

Error extracting text from http://communitynewspapers.com/palmetto-bay/villages-celebration-picnic-attracts-4500-attendees/: 403 Client Error: Forbidden for url: https://communitynewspapers.com/palmetto-bay-featured/villages-celebration-picnic-attracts-4500-attendees/


Processing URLs:  26%|██▌       | 259/1000 [11:37<33:04,  2.68s/it]

Error extracting text from https://www.afghanistan-analysts.org/afghanistans-national-unity-government-rift-2-the-problems-that-will-not-go-away/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/afghanistans-national-unity-government-rift-2-the-problems-that-will-not-go-away/


Processing URLs:  26%|██▌       | 260/1000 [11:38<26:13,  2.13s/it]

Error extracting text from http://heavy.com/news/2016/12/new-isis-islamic-state-amaq-news-tamim-neighborhood-east-mosul-iraqi-army-istishhadi-istishhad-suicide-car-bomb-attack-photo-report/2/: 404 Client Error: Not Found for url: https://heavy.com/news/2016/12/new-isis-islamic-state-amaq-news-tamim-neighborhood-east-mosul-iraqi-army-istishhadi-istishhad-suicide-car-bomb-attack-photo-report/2/


Processing URLs:  26%|██▌       | 262/1000 [11:41<21:49,  1.77s/it]

Error extracting text from https://gaceta.es/gaceta-tv/estamos-llegando-al-limite-de-lo-que-podemos-hacer-para-enfrentar-a-ortega-20210622-0700/: 403 Client Error: Forbidden for url: https://gaceta.es/gaceta-tv/estamos-llegando-al-limite-de-lo-que-podemos-hacer-para-enfrentar-a-ortega-20210622-0700/


Processing URLs:  27%|██▋       | 266/1000 [14:59<8:22:02, 41.04s/it] 

Error extracting text from https://www.ouest-france.fr/sante/virus/coronavirus/covid-19-en-ethiopie-le-non-respect-du-port-du-masque-peut-etre-puni-de-deux-ans-de-prison-7029866: 403 Client Error: Forbidden for url: https://www.ouest-france.fr/sante/virus/coronavirus/covid-19-en-ethiopie-le-non-respect-du-port-du-masque-peut-etre-puni-de-deux-ans-de-prison-7029866


Processing URLs:  27%|██▋       | 269/1000 [15:05<3:24:56, 16.82s/it]

URL filtered: https://www.opensecrets.org/orgs/facebook-inc/totals?id=D000033563


Processing URLs:  27%|██▋       | 272/1000 [15:08<1:42:17,  8.43s/it]



Processing URLs:  27%|██▋       | 273/1000 [15:12<1:30:29,  7.47s/it]

URL filtered: https://www.youtube.com/watch?v=1-uNMj57Y4c


Processing URLs:  28%|██▊       | 281/1000 [15:19<15:42,  1.31s/it]  

Error extracting text from http://www.nytimes.com/1992/02/05/world/venezuela-crushes-army-coup-attempt.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1992/02/05/world/venezuela-crushes-army-coup-attempt.html
Error extracting text from http://www.nytimes.com/2015/10/13/us/politics/filing-deadlines-add-to-pressure-on-joe-biden.html?ref=politics: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/13/us/politics/filing-deadlines-add-to-pressure-on-joe-biden.html?ref=politics


Processing URLs:  28%|██▊       | 282/1000 [15:20<15:41,  1.31s/it]

Error extracting text from http://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&amp;s=RBRTE&amp;f=D: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  29%|██▊       | 287/1000 [15:35<29:48,  2.51s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-09-03/trudeau-tumbles-further-behind-conservative-freight-train


Processing URLs:  29%|██▉       | 292/1000 [15:51<35:29,  3.01s/it]

Error extracting text from http://www.newsweek.com/nigeria-scores-killed-suspected-attacks-fulani-herdsmen-enugu-452524: 403 Client Error: Forbidden for url: https://www.newsweek.com/nigeria-scores-killed-suspected-attacks-fulani-herdsmen-enugu-452524


Processing URLs:  30%|██▉       | 297/1000 [15:59<18:21,  1.57s/it]

Error extracting text from http://www.who.int/influenza/resources/documents/pandemic_guidance_04_2009/en/: 404 Client Error: Not Found for url: https://www.who.int/influenza/resources/documents/pandemic_guidance_04_2009/en/


Processing URLs:  30%|███       | 303/1000 [16:10<19:40,  1.69s/it]

Error extracting text from https://reut.rs/3vyNw6G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-asia-blinken-japan/blinken-warns-china-against-coercion-and-aggression-on-first-asia-trip-idUSKBN2B71C9


Processing URLs:  31%|███       | 312/1000 [16:28<31:22,  2.74s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-22/chinese-investors-hunting-for-yield-are-running-out-of-options


Processing URLs:  32%|███▏      | 316/1000 [16:39<37:59,  3.33s/it]

Error extracting text from https://firstdraftnews.com/crosscheck-launches/: Exceeded 30 redirects.


Processing URLs:  32%|███▏      | 319/1000 [16:57<1:05:40,  5.79s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-01/african-union-plans-further-talks-with-burundi-over-peacekeepers
URL filtered: https://www.youtube.com/watch?v=iUrzicaiRLU


Processing URLs:  32%|███▏      | 324/1000 [17:00<22:57,  2.04s/it]  

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://oglobo.globo.com/brasil/cunha-arquiva-pedido-de-impeachment-de-dilma-feito-por-presidiario-18609523&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://oglobo.globo.com/brasil/cunha-arquiva-pedido-de-impeachment-de-dilma-feito-por-presidiario-18609523&amp;prev=search


Processing URLs:  33%|███▎      | 329/1000 [17:10<18:51,  1.69s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-06/full-text-of-elon-musk-s-aug-29-e-mail-to-tesla-employees


Processing URLs:  34%|███▎      | 335/1000 [17:17<16:53,  1.52s/it]

Error extracting text from http://www.ibtimes.com/nato-expanding-despite-putins-threats-montenegro-could-become-newest-member-2144372: 403 Client Error: Forbidden for url: https://www.ibtimes.com/nato-expanding-despite-putins-threats-montenegro-could-become-newest-member-2144372


Processing URLs:  34%|███▍      | 338/1000 [17:22<16:17,  1.48s/it]

Error extracting text from http://data.unhcr.org/syrianrefugees/country.php?id=224: 404 Client Error: Not Found for url: https://data.unhcr.org:443/syrianrefugees/country.php?id=224


Processing URLs:  34%|███▍      | 341/1000 [17:28<21:23,  1.95s/it]

Error extracting text from http://www.worldbulletin.net/middle-east/169513/iran-allows-6229-candidates-to-stand-for-parliament: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/middle-east/169513/iran-allows-6229-candidates-to-stand-for-parliament


Processing URLs:  34%|███▍      | 342/1000 [17:29<15:51,  1.45s/it]

Error extracting text from http://www.wsj.com/articles/five-ways-to-see-the-financial-future-1443219804: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/five-ways-to-see-the-financial-future-1443219804


Processing URLs:  35%|███▍      | 348/1000 [17:36<14:13,  1.31s/it]

Error extracting text from http://www.wsj.com/articles/hillary-clinton-vs-foia-1443136818: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/hillary-clinton-vs-foia-1443136818


Processing URLs:  35%|███▍      | 349/1000 [17:36<10:42,  1.01it/s]

Error extracting text from http://www.wsj.com/articles/china-corrals-white-house-reporters-spawns-some-tension-as-obama-arrives-in-hangzhou-1472919704: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-corrals-white-house-reporters-spawns-some-tension-as-obama-arrives-in-hangzhou-1472919704


Processing URLs:  36%|███▌      | 355/1000 [17:40<07:15,  1.48it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/03/11/asia-pacific/china-set-to-begin-operating-civilian-flights-to-and-from-disputed-south-china-sea-next-year/#.VuMBBMdeCuU: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/03/11/asia-pacific/china-set-to-begin-operating-civilian-flights-to-and-from-disputed-south-china-sea-next-year/#.VuMBBMdeCuU


Processing URLs:  36%|███▌      | 357/1000 [17:44<13:48,  1.29s/it]

Error extracting text from http://www.barrons.com/articles/venezuela-oil-is-russia-riding-to-pdvsas-rescue-1501858066: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-oil-is-russia-riding-to-pdvsas-rescue-1501858066


Processing URLs:  36%|███▌      | 361/1000 [17:47<08:24,  1.27it/s]

URL filtered: https://twitter.com/markknoller/status/656862930298892288
Error extracting text from http://www.washingtontimes.com/news/2016/jan/7/hillary-clinton-lead-bernie-sanders-3-points-new-h/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/7/hillary-clinton-lead-bernie-sanders-3-points-new-h/


Processing URLs:  36%|███▌      | 362/1000 [17:49<10:46,  1.01s/it]

Error extracting text from http://www.stripes.com/news/spain-belgium-take-over-nato-baltic-air-policing-1.387338: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/spain-belgium-take-over-nato-baltic-air-policing-1.387338


Processing URLs:  36%|███▋      | 364/1000 [17:55<18:03,  1.70s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1411FE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1411FE
URL filtered: http://www.bloomberg.com/news/articles/2016-03-29/byd-sees-electric-car-sales-tripling-in-market-coveted-by-tesla


Processing URLs:  37%|███▋      | 366/1000 [17:55<11:59,  1.13s/it]

Error extracting text from http://scout.com/military/warrior/Article/The-Russian-Militarys-Greatest-Enemy-And-Its-Not-America-107135992: 403 Client Error: Forbidden for url: https://247sports.com/


Processing URLs:  37%|███▋      | 367/1000 [17:56<10:06,  1.04it/s]

Error extracting text from http://m.news24.com.ng/Nigeria/National/News/herdsmen-kill-six-in-kaduna-village-20160816: HTTPConnectionPool(host='m.news24.com.ng', port=80): Max retries exceeded with url: /Nigeria/National/News/herdsmen-kill-six-in-kaduna-village-20160816 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30713cb30>: Failed to resolve 'm.news24.com.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 376/1000 [18:13<20:38,  1.98s/it]

Error extracting text from http://www.newsweek.com/russia-us-broke-nato-peace-treaty-possible-military-response-683526: 403 Client Error: Forbidden for url: https://www.newsweek.com/russia-us-broke-nato-peace-treaty-possible-military-response-683526


Processing URLs:  38%|███▊      | 377/1000 [18:14<17:57,  1.73s/it]

Error extracting text from http://iran.usembassy.gov/index.html: HTTPConnectionPool(host='iran.usembassy.gov', port=80): Max retries exceeded with url: /index.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5d400>: Failed to resolve 'iran.usembassy.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  38%|███▊      | 382/1000 [18:40<58:48,  5.71s/it]

Error extracting text from http://www.portdevelopmentmiddleeast.com/: HTTPConnectionPool(host='www.portdevelopmentmiddleeast.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5cc20>: Failed to resolve 'www.portdevelopmentmiddleeast.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/haynesdeborah/status/1497271046294192132


Processing URLs:  38%|███▊      | 385/1000 [18:40<27:36,  2.69s/it]

Error extracting text from http://www.advisorperspectives.com/articles/2015/03/17/mlps-will-weather-the-storm: 403 Client Error: Forbidden for url: https://www.advisorperspectives.com/articles/2015/03/17/mlps-will-weather-the-storm


Processing URLs:  39%|███▉      | 390/1000 [18:44<14:03,  1.38s/it]

Error extracting text from http://www.crisisgroup.org/en/regions/europe/turkey-cyprus/turkey.aspx: 404 Client Error: Not Found for url: https://www.crisisgroup.org/en/regions/europe/turkey-cyprus/turkey.aspx


Processing URLs:  39%|███▉      | 391/1000 [18:45<12:12,  1.20s/it]

Error extracting text from https://www.cia.gov/index.html: 403 Client Error: Forbidden for url: https://www.cia.gov/index.html


Processing URLs:  39%|███▉      | 393/1000 [18:49<15:27,  1.53s/it]

Error extracting text from https://www.nytimes.com/2017/04/01/us/politics/trump-border-tax-import-koch.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/01/us/politics/trump-border-tax-import-koch.html


Processing URLs:  40%|████      | 401/1000 [19:01<13:09,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-libya-security-france-idUSKCN0VX1C3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-libya-security-france-idUSKCN0VX1C3


Processing URLs:  40%|████      | 405/1000 [19:03<07:42,  1.29it/s]

Error extracting text from https://www.ice.gov/daca: 403 Client Error: Forbidden for url: https://www.ice.gov/daca


Processing URLs:  41%|████      | 409/1000 [19:21<28:17,  2.87s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/26005-meshrano-jirga-approves-presidents-decree-on-electoral-reforms: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/26005-meshrano-jirga-approves-presidents-decree-on-electoral-reforms


Processing URLs:  42%|████▏     | 415/1000 [19:28<11:21,  1.16s/it]

Error extracting text from http://www.npr.org/player/v2/mediaPlayer.html?action=1&amp;t=1&amp;islist=false&amp;id=453217045&amp;m=453217046&amp;live=1: 404 Client Error: Not Found for url: https://www.npr.org/player/v2/mediaPlayer.html?action=1&amp;t=1&amp;islist=false&amp;id=453217045&amp;m=453217046&amp;live=1?action=1&amp;t=1&amp;islist=false&amp;id=453217045&amp;m=453217046&amp;live=1


Processing URLs:  42%|████▏     | 416/1000 [19:29<11:34,  1.19s/it]

Error extracting text from http://www.novelrank.com/asin/1594206279: HTTPSConnectionPool(host='www.novelrank.com', port=443): Max retries exceeded with url: /asin/1594206279 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))
URL filtered: http://www.theverge.com/2016/12/15/13960062/facebook-fact-check-partnerships-fake-news


Processing URLs:  42%|████▏     | 420/1000 [19:31<06:50,  1.41it/s]

Error extracting text from http://www.nytimes.com/2016/01/28/world/asia/us-china-north-korea.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/28/world/asia/us-china-north-korea.html?_r=0


Processing URLs:  42%|████▏     | 423/1000 [19:35<11:14,  1.17s/it]

Error extracting text from http://m.wsbradio.com/news/ap/top-news/greece-resumes-migrant-deportations-from-islands/nq2bj/: 404 Client Error: Not Found for url: https://www.wsbradio.com/news/ap/top-news/greece-resumes-migrant-deportations-from-islands/nq2bj/


Processing URLs:  43%|████▎     | 429/1000 [20:00<36:34,  3.84s/it]

URL filtered: https://www.kcrg.com/2021/03/01/biden-to-meet-with-mexican-president-amid-migration-issues/?utm_source=facebook&amp;utm_medium=social&amp;utm_campaign=snd&amp;utm_content=kcrg&amp;fbclid=IwAR2WL_tXskhj4uTHWRec2q02pXQb7NKT1o9iaHe6mM-hqHKISLtrFXD3IAQ


Processing URLs:  43%|████▎     | 432/1000 [20:01<16:42,  1.76s/it]

Error extracting text from https://www.nytimes.com/live/2021/05/20/world/israel-palestine-gaza#cease-fire-israel-hamas-biden: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/05/20/world/israel-palestine-gaza#cease-fire-israel-hamas-biden
URL filtered: https://www.financialexpress.com/industry/technology/russia-could-ban-facebook-twitter-youtube-for-censoring-content/2157891/


Processing URLs:  43%|████▎     | 434/1000 [20:02<11:00,  1.17s/it]

Error extracting text from http://globalriskinsights.com/2016/01/russias-long-game-in-antarctica-runs-political-risk/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2016/01/russias-long-game-in-antarctica-runs-political-risk/


Processing URLs:  44%|████▎     | 436/1000 [20:04<11:13,  1.19s/it]

Error extracting text from https://advance.lexis.com/document/?pdmfid=1000516&amp;crid=29aeafd9-7e54-44c3-8547-c88e69f53e30&amp;pddocfullpath=%2Fshared%2Fdocument%2Fnews%2Furn%3AcontentItem%3A5KRT-MFK1-JDRJ-W2PS-00000-00&amp;pddocid=urn%3AcontentItem%3A5KRT-MFK: 403 Client Error: Forbidden for url: https://advance.lexis.com/document/?pdmfid=1000516&amp;crid=29aeafd9-7e54-44c3-8547-c88e69f53e30&amp;pddocfullpath=%2Fshared%2Fdocument%2Fnews%2Furn%3AcontentItem%3A5KRT-MFK1-JDRJ-W2PS-00000-00&amp;pddocid=urn%3AcontentItem%3A5KRT-MFK


Processing URLs:  44%|████▍     | 444/1000 [20:16<09:19,  1.01s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-oil-russia-citgo/venezuela-confirms-discussing-citgo-collateral-swap-with-rosneft-idUSKBN1C9242: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-russia-citgo/venezuela-confirms-discussing-citgo-collateral-swap-with-rosneft-idUSKBN1C9242


Processing URLs:  46%|████▌     | 455/1000 [20:41<14:04,  1.55s/it]

Error extracting text from http://www.wsj.com/articles/mccarthy-wont-reauthorize-export-import-bank-charter-1403450288: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/mccarthy-wont-reauthorize-export-import-bank-charter-1403450288


Processing URLs:  46%|████▌     | 457/1000 [20:41<07:59,  1.13it/s]

Error extracting text from https://www.nytimes.com/2017/10/31/world/middleeast/israel-rivlin-netanyahu-democracy.html?mtrref=www.gjopen.com&amp;auth=login-email: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/31/world/middleeast/israel-rivlin-netanyahu-democracy.html?mtrref=www.gjopen.com&amp;auth=login-email
Error extracting text from http://www.reuters.com/article/us-northkorea-satelitte-missiledefense-idUSKCN0VJ2PD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-satelitte-missiledefense-idUSKCN0VJ2PD


Processing URLs:  46%|████▌     | 458/1000 [20:41<06:24,  1.41it/s]

Error extracting text from https://www.dhs.gov/ntas/advisory/national-terrorism-advisory-system-bulletin-january-27-2021: 403 Client Error: Forbidden for url: https://www.dhs.gov/ntas/advisory/national-terrorism-advisory-system-bulletin-january-27-2021


Processing URLs:  46%|████▌     | 461/1000 [20:46<10:55,  1.22s/it]

Error extracting text from http://economistsview.typepad.com/timduy/2015/11/onto-the-next-question.html: 403 Client Error: Forbidden for url: https://economistsview.typepad.com/timduy/2015/11/onto-the-next-question.html


Processing URLs:  46%|████▋     | 463/1000 [20:48<08:53,  1.01it/s]

Error extracting text from https://www.reuters.com/business/energy/gazprom-fails-book-more-gas-transit-europe-despite-kremlin-reassurance-2021-11-02/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/gazprom-fails-book-more-gas-transit-europe-despite-kremlin-reassurance-2021-11-02/


Processing URLs:  46%|████▋     | 464/1000 [20:49<06:50,  1.31it/s]

Error extracting text from https://www.nytimes.com/2018/01/17/world/asia/north-south-korea-olympics.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/17/world/asia/north-south-korea-olympics.html


Processing URLs:  46%|████▋     | 465/1000 [20:50<07:39,  1.16it/s]

Error extracting text from http://commonslibraryblog.com/2015/10/21/a-brief-guide-to-the-period-before-the-eu-referendum/: 406 Client Error: Not Acceptable for url: http://commonslibraryblog.com/2015/10/21/a-brief-guide-to-the-period-before-the-eu-referendum/
URL filtered: https://www.youtube.com/watch?v=b240PGCMwV0


Processing URLs:  47%|████▋     | 467/1000 [20:50<05:54,  1.50it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-09-22/pdvsa-swap-likely-distressed-positive-for-venezuela-moody-s


Processing URLs:  47%|████▋     | 469/1000 [20:51<04:00,  2.20it/s]

Error extracting text from https://www.reuters.com/business/energy/german-regulators-nord-stream-2-move-may-delay-commissioning-march-sources-2021-11-17/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/german-regulators-nord-stream-2-move-may-delay-commissioning-march-sources-2021-11-17/


Processing URLs:  47%|████▋     | 472/1000 [20:52<04:36,  1.91it/s]

Error extracting text from http://finance.yahoo.com/news/eu-fails-set-target-date-125945976.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/eu-fails-set-target-date-125945976.html


Processing URLs:  47%|████▋     | 474/1000 [21:58<2:19:55, 15.96s/it]

Error extracting text from http://en.kremlin.ru/events/president/news/50533: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)
URL filtered: http://www.mercurynews.com/2017/09/09/michael-mcfaul-russia-donald-trump-twitter/


Processing URLs:  48%|████▊     | 477/1000 [22:02<1:10:21,  8.07s/it]

Error extracting text from http://mobile.nytimes.com/comments/2016/03/06/opinion/sunday/tricked-into-cheating-and-sentenced-to-death.html: 404 Client Error: Not Found for url: https://archive.nytimes.com/www.nytimes.com/comments/2016/03/06/opinion/sunday/tricked-into-cheating-and-sentenced-to-death.html


Processing URLs:  48%|████▊     | 479/1000 [22:09<50:46,  5.85s/it]  

Error extracting text from https://www.nytimes.com/2021/04/06/world/europe/iran-nuclear-deal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/06/world/europe/iran-nuclear-deal.html


Processing URLs:  48%|████▊     | 483/1000 [22:11<16:22,  1.90s/it]

Error extracting text from https://www.reuters.com/article/us-somalia-security/suicide-bomber-kills-at-least-18-at-police-academy-in-somalias-capital-idUSKBN1E80GF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-security/suicide-bomber-kills-at-least-18-at-police-academy-in-somalias-capital-idUSKBN1E80GF?il=0


Processing URLs:  48%|████▊     | 484/1000 [22:12<13:44,  1.60s/it]

Error extracting text from http://www.ebay.com/itm/25k-Iraqi-Dinar-Notes-1-x-25-000-Uncirculated-IQD-A2-/301821759002?hash=item4645fa8e1a:g:4G0AAOSwLVZVzIVi: 404 Client Error: Not Found for url: https://www.ebay.com/itm/25k-Iraqi-Dinar-Notes-1-x-25-000-Uncirculated-IQD-A2-/301821759002?hash=item4645fa8e1a:g:4G0AAOSwLVZVzIVi


Processing URLs:  49%|████▊     | 487/1000 [22:15<09:43,  1.14s/it]

Error extracting text from https://www.nytimes.com/2017/01/10/opinion/seven-questions-about-health-reform.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/10/opinion/seven-questions-about-health-reform.html?_r=0


Processing URLs:  49%|████▉     | 489/1000 [22:16<07:02,  1.21it/s]

Error extracting text from http://www.theblaze.com/stories/2015/09/28/vw-almost-certain-to-face-significant-legal-backlash-over-vehicle-emissions-scandal/: 404 Client Error: Not Found for url: https://www.theblaze.com/stories/2015/09/28/vw-almost-certain-to-face-significant-legal-backlash-over-vehicle-emissions-scandal/
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-un-idUSKBN18T2X3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-un-idUSKBN18T2X3


Processing URLs:  49%|████▉     | 490/1000 [22:17<07:40,  1.11it/s]

Error extracting text from http://trade.ec.europa.eu/doclib/docs/2015/july/tradoc_153635.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2015/july/tradoc_153635.pdf


Processing URLs:  49%|████▉     | 492/1000 [22:23<15:36,  1.84s/it]

Error extracting text from http://www.ktbs.com/story/30366615/apple-beats-earnings-estimates-issues-healthy-forecast: 404 Client Error: Not Found for url: https://www.ktbs.com/story/30366615/apple-beats-earnings-estimates-issues-healthy-forecast/


Processing URLs:  50%|████▉     | 496/1000 [22:31<14:41,  1.75s/it]

Error extracting text from http://trade.ec.europa.eu/doclib/docs/2015/november/tradoc_153935.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2015/november/tradoc_153935.pdf
URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/rousseff-to-face-impeachment-in-brazil-congress-speaker-says


Processing URLs:  50%|████▉     | 499/1000 [23:36<2:16:56, 16.40s/it]

Error extracting text from https://ics-radar.shodan.io/: HTTPSConnectionPool(host='ics-radar.shodan.io', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303b878f0>, 'Connection to ics-radar.shodan.io timed out. (connect timeout=60)'))


Processing URLs:  50%|█████     | 505/1000 [23:50<34:53,  4.23s/it]  

Error extracting text from https://www.thelifeyoucansave.org/About-Us: 403 Client Error: Forbidden for url: https://www.thelifeyoucansave.org/About-Us


Processing URLs:  51%|█████     | 508/1000 [23:51<14:54,  1.82s/it]

Error extracting text from http://abcnews.go.com/Business/wireStory/eu-appeals-greece-meet-bailout-conditions-monday-42555368: 404 Client Error: Not Found for url: https://abcnews.go.com/Business/wireStory/eu-appeals-greece-meet-bailout-conditions-monday-42555368
Error extracting text from http://www.reuters.com/article/us-iran-economy-growth-idUSKCN0W70BT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-economy-growth-idUSKCN0W70BT


Processing URLs:  51%|█████     | 509/1000 [23:52<11:28,  1.40s/it]

Error extracting text from http://seekingalpha.com/article/3702056-could-the-fed-hike-the-discount-rate-on-monday: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3702056-could-the-fed-hike-the-discount-rate-on-monday


Processing URLs:  51%|█████     | 512/1000 [23:57<14:24,  1.77s/it]

Error extracting text from https://www.reuters.com/article/us-health-birdflu-france-idUSKBN19L2PF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-birdflu-france-idUSKBN19L2PF


Processing URLs:  52%|█████▏    | 515/1000 [23:59<09:21,  1.16s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-04/16/c_135284782.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-04/16/c_135284782.htm


Processing URLs:  52%|█████▏    | 517/1000 [25:00<2:18:25, 17.20s/it]

Error extracting text from https://www.nazret.com/2016/12/04/discontent-grows-louder-in-ethiopia-as-regime-fights-for-survival/: HTTPSConnectionPool(host='www.nazret.com', port=443): Max retries exceeded with url: /2016/12/04/discontent-grows-louder-in-ethiopia-as-regime-fights-for-survival/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x306bffec0>, 'Connection to www.nazret.com timed out. (connect timeout=60)'))
URL filtered: https://in.reuters.com/article/usa-trump-russia-kremlin/kremlin-hits-at-twitters-prejudiced-move-against-two-russian-media-outlets-idINKBN1CW15G


Processing URLs:  52%|█████▏    | 519/1000 [25:03<1:22:11, 10.25s/it]

Error extracting text from http://www.shoutoutuk.org/2015/11/13/colectiv-anger-sets-romania-ablaze/: 404 Client Error: Not Found for url: https://www.shoutoutuk.org/2015/11/13/colectiv-anger-sets-romania-ablaze/


Processing URLs:  52%|█████▏    | 520/1000 [25:03<1:03:13,  7.90s/it]

Error extracting text from https://www.nytimes.com/2017/07/18/world/middleeast/trump-iran-sanctions-nuclear.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/18/world/middleeast/trump-iran-sanctions-nuclear.html
URL filtered: https://www.bloomberg.com/opinion/articles/2020-12-27/larry-summers-trump-pelosi-2-000-stimulus-checks-are-a-mistake?sref=htOHjx5Y


Processing URLs:  53%|█████▎    | 526/1000 [25:18<29:26,  3.73s/it]  

Error extracting text from http://journal-neo.org/2016/08/10/assessment-of-erdogan-putin-meeting-now-were-talking/: HTTPConnectionPool(host='journal-neo.org', port=80): Max retries exceeded with url: /2016/08/10/assessment-of-erdogan-putin-meeting-now-were-talking/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023dfec0>: Failed to resolve 'journal-neo.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  53%|█████▎    | 527/1000 [25:19<23:32,  2.99s/it]

Error extracting text from http://www.dtic.mil/docs/citations/ADA549541: HTTPSConnectionPool(host='www.dtic.mil', port=443): Max retries exceeded with url: /docs/citations/ADA549541 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  53%|█████▎    | 531/1000 [25:23<11:26,  1.46s/it]

Error extracting text from https://www.nytimes.com/2021/01/22/world/europe/us-russia-biden-nuclear-treaty.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/22/world/europe/us-russia-biden-nuclear-treaty.html


Processing URLs:  54%|█████▍    | 538/1000 [25:32<07:24,  1.04it/s]

Error extracting text from http://jewishjournal.com/rosnersdomain/230810/is-bibi-in-trouble/: 403 Client Error: Forbidden for url: http://jewishjournal.com/rosnersdomain/230810/is-bibi-in-trouble/


Processing URLs:  54%|█████▍    | 540/1000 [25:34<07:10,  1.07it/s]

Error extracting text from http://globalnation.inquirer.net/146529/duterte-says-ph-cant-win-in-scarborough-shoal: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/146529/duterte-says-ph-cant-win-in-scarborough-shoal


Processing URLs:  55%|█████▍    | 545/1000 [25:42<10:12,  1.35s/it]

Error extracting text from http://www.newsweek.com/can-ethiopias-new-cabinet-address-oromo-protests-518375: 403 Client Error: Forbidden for url: https://www.newsweek.com/can-ethiopias-new-cabinet-address-oromo-protests-518375


Processing URLs:  55%|█████▍    | 547/1000 [25:45<11:49,  1.57s/it]
FloatObject (b'0.00-6051436') invalid; use 0.0 instead
FloatObject (b'0.00-8503398') invalid; use 0.0 instead
Processing URLs:  55%|█████▌    | 550/1000 [26:20<1:18:55, 10.52s/it]

Error extracting text from http://www.todayszaman.com/latest-news_us-backed-fighters-capture-isil-held-town-in-northeast-syria_412824.html: 522 Server Error:  for url: http://www.todayszaman.com/latest-news_us-backed-fighters-capture-isil-held-town-in-northeast-syria_412824.html


Processing URLs:  55%|█████▌    | 551/1000 [26:21<55:44,  7.45s/it]  

Error extracting text from http://www.wsj.com/articles/russia-iran-seen-coordinating-on-defense-of-assad-regime-in-syria-1442856556: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-iran-seen-coordinating-on-defense-of-assad-regime-in-syria-1442856556


Processing URLs:  55%|█████▌    | 554/1000 [26:26<26:53,  3.62s/it]

Error extracting text from https://www.army.mil/article/155880/Welcome_to_the_Jungle___25th_ID_trains_jungle_experts: 403 Client Error: Forbidden for url: https://www.army.mil/article/155880/Welcome_to_the_Jungle___25th_ID_trains_jungle_experts
URL filtered: https://twitter.com/hashtag/spiritcooking?src=hash


Processing URLs:  56%|█████▌    | 556/1000 [26:26<14:55,  2.02s/it]

Error extracting text from http://www.ohchr.org/en/NewsEvents/Pages/DisplayNews.aspx?NewsID=19835&amp;LangID=E: 403 Client Error: Forbidden for url: https://www.ohchr.org/en/NewsEvents/Pages/DisplayNews.aspx?NewsID=19835&amp;LangID=E


Processing URLs:  56%|█████▌    | 562/1000 [26:48<25:31,  3.50s/it]

Error extracting text from https://www.zdf.de/nachrichten/politik/corona-indische-mutante-evp-100.html: 404 Client Error: Not Found for url: https://www.zdf.de/nachrichten/politik/corona-indische-mutante-evp-100.html


Processing URLs:  56%|█████▋    | 565/1000 [26:51<12:42,  1.75s/it]

Error extracting text from http://blogs.wsj.com/brussels/2015/10/09/georgia-postpones-bid-to-advance-nato-membership/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/brussels/2015/10/09/georgia-postpones-bid-to-advance-nato-membership/


Processing URLs:  57%|█████▋    | 568/1000 [27:15<30:50,  4.28s/it]

Error extracting text from http://timesofindia.indiatimes.com/world/china/China-starts-to-build-its-first-floating-nuclear-power-reactor-for-deployment-off-coast/articleshow/55298242.cms: 410 Client Error: Gone for url: https://timesofindia.indiatimes.com/world/china/China-starts-to-build-its-first-floating-nuclear-power-reactor-for-deployment-off-coast/articleshow/55298242.cms
Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/venezuela-to-meet-creditors-in-bid-to-dodge-default-idUSKBN1DD0IG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/venezuela-to-meet-creditors-in-bid-to-dodge-default-idUSKBN1DD0IG


Processing URLs:  57%|█████▋    | 573/1000 [27:21<15:08,  2.13s/it]

Error extracting text from https://echo.msk.ru/news/2151536-echo.html: HTTPSConnectionPool(host='echo.msk.ru', port=443): Max retries exceeded with url: /news/2151536-echo.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:  58%|█████▊    | 577/1000 [27:27<10:43,  1.52s/it]

Error extracting text from https://www.predictit.org/Contract/6252/Will-the-estate-tax-be-repealed-in-2017#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/6252/Will-the-estate-tax-be-repealed-in-2017#data
URL filtered: https://www.bloomberg.com/opinion/articles/2022-01-02/niall-ferguson-biden-eu-nato-won-t-stop-putin-s-ukraine-invasion


Processing URLs:  59%|█████▉    | 590/1000 [27:54<06:56,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/former-clinton-aide-to-invoke-fifth-amendment-in-email-case-1464820706: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/former-clinton-aide-to-invoke-fifth-amendment-in-email-case-1464820706


Processing URLs:  59%|█████▉    | 591/1000 [27:54<05:18,  1.28it/s]

Error extracting text from http://www.wsj.com/articles/feds-yellen-december-is-live-possibility-for-first-rate-increase-1446654282: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/feds-yellen-december-is-live-possibility-for-first-rate-increase-1446654282


Processing URLs:  60%|█████▉    | 595/1000 [27:58<06:43,  1.00it/s]

Error extracting text from http://www.businessinsider.com/r-us-governors-hackers-academics-team-up-to-secure-elections-2017-10: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-us-governors-hackers-academics-team-up-to-secure-elections-2017-10


Processing URLs:  60%|█████▉    | 596/1000 [28:58<2:06:01, 18.72s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-02-04/heres-what-would-happen-if-saudi-arabia-deployed-troops-to-syria: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  60%|█████▉    | 597/1000 [29:00<1:30:49, 13.52s/it]

Error extracting text from http://www.newsweek.com/taiwan-f-16-fighter-jets-rockets-tanks-and-attack-helicopters-stages-live-fire-795871: 403 Client Error: Forbidden for url: https://www.newsweek.com/taiwan-f-16-fighter-jets-rockets-tanks-and-attack-helicopters-stages-live-fire-795871


Processing URLs:  60%|██████    | 601/1000 [29:02<24:01,  3.61s/it]  

Error extracting text from http://www.wsj.com/articles/brazil-congress-upholds-most-of-rousseffs-vetoes-on-extra-spending-1443006711: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-congress-upholds-most-of-rousseffs-vetoes-on-extra-spending-1443006711
Error extracting text from https://www.wsj.com/articles/venezuelans-vote-on-new-assembly-amid-protests-1501426663: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelans-vote-on-new-assembly-amid-protests-1501426663


Processing URLs:  61%|██████    | 606/1000 [29:12<16:04,  2.45s/it]

Error extracting text from https://bit.ly/2SfAg7Y: 403 Client Error: Forbidden for url: https://conservativehome.com/thetorydiary/2021/06/reports-of-johnsons-political-demise-are-greatly-exaggerated.html?utm_source=newsletter&utm_medium=email&utm_campaign=18/06/2021+(Copy)


Processing URLs:  61%|██████    | 608/1000 [29:13<09:28,  1.45s/it]

Error extracting text from https://www.reuters.com/article/us-usa-court-mobilephones/u-s-supreme-court-weighs-major-digital-privacy-case-idUSKBN1DT0LQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-mobilephones/u-s-supreme-court-weighs-major-digital-privacy-case-idUSKBN1DT0LQ


Processing URLs:  61%|██████    | 611/1000 [29:17<07:43,  1.19s/it]

Error extracting text from https://www.nytimes.com/2017/07/31/world/europe/russia-military-exercise-zapad-west.html?mcubz=3: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/31/world/europe/russia-military-exercise-zapad-west.html?mcubz=3


Processing URLs:  61%|██████▏   | 613/1000 [29:18<04:41,  1.37it/s]

Error extracting text from http://www.nytimes.com/2015/08/30/world/asia/malaysia-protests-najib-razak-1mdb.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/30/world/asia/malaysia-protests-najib-razak-1mdb.html


Processing URLs:  62%|██████▏   | 618/1000 [29:25<06:29,  1.02s/it]

Error extracting text from https://www.reuters.com/business/energy/nord-stream-2-sanctions-waivers-could-help-normalise-russia-us-ties-moscow-says-2021-05-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/nord-stream-2-sanctions-waivers-could-help-normalise-russia-us-ties-moscow-says-2021-05-19/


Processing URLs:  62%|██████▏   | 621/1000 [29:35<14:44,  2.33s/it]

Error extracting text from http://csbcorrespondent.com/market-update-november-13-2015: 403 Client Error: Forbidden for url: https://www.southstatecorrespondent.com


Processing URLs:  62%|██████▏   | 623/1000 [29:42<16:47,  2.67s/it]

Error extracting text from https://www.stlucianewsonline.com/venezuela-ratifies-plan-to-refinance-and-restructure-debt/: 404 Client Error: Not Found for url: https://www.stlucianewsonline.com/venezuela-ratifies-plan-to-refinance-and-restructure-debt/


Processing URLs:  63%|██████▎   | 630/1000 [29:49<04:59,  1.23it/s]

Error extracting text from http://english.alarabiya.net/en/News/gulf/2016/11/15/Despite-JASTA-Saudi-US-investments-enjoy-immunity-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/gulf/2016/11/15/Despite-JASTA-Saudi-US-investments-enjoy-immunity-.html
Error extracting text from https://static.googleusercontent.com/media/www.g.u.00rz.com/en/us/selfdrivingcar/files/reports/report-0916.pdf: 404 Client Error: Not Found for url: https://static.googleusercontent.com/media/www.g.u.00rz.com/en/us/selfdrivingcar/files/reports/report-0916.pdf


Processing URLs:  63%|██████▎   | 633/1000 [29:54<08:08,  1.33s/it]

Error extracting text from http://www.newsweek.com/flat-earth-conspiracy-america-726761: 403 Client Error: Forbidden for url: https://www.newsweek.com/flat-earth-conspiracy-america-726761


Processing URLs:  64%|██████▎   | 636/1000 [29:57<07:35,  1.25s/it]

Error extracting text from https://www.ipcc.ch/publications_and_data/ar4/wg1/en/spmsspm-projections-of.html: 404 Client Error: Not Found for url: https://www.ipcc.ch/publications_and_data/ar4/wg1/en/spmsspm-projections-of.html


Processing URLs:  64%|██████▍   | 640/1000 [30:01<05:05,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-eu-turkey-idUSKCN10W0P4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-eu-turkey-idUSKCN10W0P4
Error extracting text from http://www.reuters.com/article/us-usa-trump-epa-budget-idUSKBN1692XA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-epa-budget-idUSKBN1692XA


Processing URLs:  64%|██████▍   | 641/1000 [30:02<05:53,  1.02it/s]

Error extracting text from https://www.sec.gov/news/press-release/2017-131: 403 Client Error: Forbidden for url: https://www.sec.gov/news/press-release/2017-131
URL filtered: http://www.bloomberg.com/quicktake/chinas-debt-bomb


Processing URLs:  64%|██████▍   | 644/1000 [30:05<05:43,  1.04it/s]

Error extracting text from http://en.censor.net.ua/news/394186/memorandum_with_imf_agreed_upon_finance_minister_danyliuk: 403 Client Error: Forbidden for url: https://censor.net/en/news/394186/memorandum_with_imf_agreed_upon_finance_minister_danyliuk


Processing URLs:  65%|██████▌   | 652/1000 [30:30<14:19,  2.47s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-30/from-guerrilla-to-impeachment-the-dilma-rousseff-story


Processing URLs:  65%|██████▌   | 654/1000 [30:37<16:15,  2.82s/it]

Error extracting text from http://theiranproject.com/blog/2016/10/17/iran-boost-oil-output-4-million-barrels-opec-plans-cut/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=iran-boost-oil-output-4-million-barrels-opec-plans-cut


Processing URLs:  66%|██████▌   | 655/1000 [30:37<12:39,  2.20s/it]

Error extracting text from https://www.nytimes.com/2020/03/15/us/politics/joe-biden-female-vice-president.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/03/15/us/politics/joe-biden-female-vice-president.html


Processing URLs:  66%|██████▌   | 659/1000 [30:45<13:47,  2.43s/it]

Error extracting text from http://ec.europa.eu/dgs/home-affairs/what-we-do/policies/european-agenda-migration/background-information/docs/eam_state_of_play_and_future_actions_20160113_en.pdf: 404 Client Error: Not Found for url: https://home-affairs.ec.europa.eu/sites/default/files/what-we-do/policies/european-agenda-migration/background-information/docs/eam_state_of_play_and_future_actions_20160113_en.pdf
URL filtered: http://www.usatoday.com/story/tech/2016/11/19/how-facebook-plans-crack-down-fake-news/94123842/


Processing URLs:  66%|██████▌   | 662/1000 [30:48<08:38,  1.53s/it]

Error extracting text from http://finance.yahoo.com/news/china-selling-u-bonds-because-182529754.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/china-selling-u-bonds-because-182529754.html


Processing URLs:  66%|██████▋   | 664/1000 [30:51<08:03,  1.44s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/polls/morning-consult-campaign-for-sustainable-rx-pricing-23184: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/polls/morning-consult-campaign-for-sustainable-rx-pricing-23184


Processing URLs:  66%|██████▋   | 665/1000 [30:52<07:56,  1.42s/it]

Error extracting text from http://www.interactions.org/cms/?pid=1002289: 403 Client Error: Forbidden for url: http://www.interactions.org/cms/?pid=1002289


Processing URLs:  67%|██████▋   | 669/1000 [30:55<05:08,  1.07it/s]

Error extracting text from http://www.nytimes.com/2016/05/24/world/americas/brazil-dilma-rousseff-impeachment-petrobras.html?ref=world&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/24/world/americas/brazil-dilma-rousseff-impeachment-petrobras.html?ref=world&amp;_r=0


Processing URLs:  68%|██████▊   | 675/1000 [32:05<1:45:39, 19.51s/it]

Error extracting text from http://english.irib.ir/news/iran1/item/220828-iran-to-speed-up-development-of-missile-program-despite-us-threats-commander: HTTPConnectionPool(host='english.irib.ir', port=80): Max retries exceeded with url: /news/iran1/item/220828-iran-to-speed-up-development-of-missile-program-despite-us-threats-commander (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fedb5e20>, 'Connection to english.irib.ir timed out. (connect timeout=60)'))


Processing URLs:  68%|██████▊   | 680/1000 [32:30<32:07,  6.02s/it]

Error extracting text from https://eurovision.tv/: 403 Client Error: Forbidden for url: https://eurovision.tv/


Processing URLs:  68%|██████▊   | 681/1000 [32:31<24:17,  4.57s/it]

Error extracting text from https://www.ibtimes.com/china-fires-its-jl-3-submarine-launched-ballistic-missile-new-south-china-sea-test-2975240: 403 Client Error: Forbidden for url: https://www.ibtimes.com/china-fires-its-jl-3-submarine-launched-ballistic-missile-new-south-china-sea-test-2975240


Processing URLs:  69%|██████▊   | 686/1000 [32:36<07:32,  1.44s/it]

Error extracting text from http://www.arabnews.com/node/1000546/saudi-arabia: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1000546/saudi-arabia


Processing URLs:  69%|██████▉   | 688/1000 [32:41<09:20,  1.79s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-11-29/most-asian-index-futures-climb-while-crude-selloff-stalks-market


Processing URLs:  69%|██████▉   | 690/1000 [32:41<05:18,  1.03s/it]

Error extracting text from https://news.abs-cbn.com/news/03/24/21/think-tank-urges-china-to-withdraw-militia-fleet-from-ph-waters-warns-of-global-security-risks: 403 Client Error: Forbidden for url: https://news.abs-cbn.com/news/03/24/21/think-tank-urges-china-to-withdraw-militia-fleet-from-ph-waters-warns-of-global-security-risks


Processing URLs:  70%|██████▉   | 697/1000 [32:55<07:00,  1.39s/it]

Error extracting text from http://www.nytimes.com/2016/02/07/world/asia/north-korea-moves-up-rocket-launching-plan.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/world/asia/north-korea-moves-up-rocket-launching-plan.html?_r=0


Processing URLs:  70%|███████   | 701/1000 [33:13<20:50,  4.18s/it]

Error extracting text from http://www.business-standard.com/article/news-ians/seoul-to-complete-ballistic-missile-development-by-2017-115100100502_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/seoul-to-complete-ballistic-missile-development-by-2017-115100100502_1.html


Processing URLs:  71%|███████▏  | 714/1000 [33:32<08:29,  1.78s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0YD0D8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0YD0D8


Processing URLs:  72%|███████▏  | 721/1000 [33:45<06:07,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-opec-talks-iran-idUSKBN13D140?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-talks-iran-idUSKBN13D140?il=0


Processing URLs:  72%|███████▏  | 723/1000 [33:48<06:20,  1.37s/it]

Error extracting text from http://www.kxnet.com/story/30190659/cramer-tips-tpp-vote-to-presidents-oil-export-stance: 404 Client Error: Not Found for url: https://www.kxnet.com/story/30190659/cramer-tips-tpp-vote-to-presidents-oil-export-stance


Processing URLs:  72%|███████▏  | 724/1000 [33:50<06:53,  1.50s/it]

Error extracting text from https://uk.reuters.com/article/us-cohen-russia-commentary-idUKKBN1AO1RP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  73%|███████▎  | 726/1000 [33:54<07:52,  1.73s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/cleanup-begins-dakota-access-pipeline-protest-encampment-45168829: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/cleanup-begins-dakota-access-pipeline-protest-encampment-45168829


Processing URLs:  73%|███████▎  | 731/1000 [34:00<05:36,  1.25s/it]

Error extracting text from https://news.yahoo.com/us-economy-not-yet-ready-162820054.html: 404 Client Error: Not Found for url: https://news.yahoo.com/us-economy-not-yet-ready-162820054.html


Processing URLs:  73%|███████▎  | 734/1000 [34:03<04:34,  1.03s/it]

URL filtered: https://twitter.com/NASAWebb/status/1474719348686217225


Processing URLs:  74%|███████▎  | 736/1000 [34:03<02:39,  1.65it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/03/13/ISIS-pulls-out-of-town-in-Iraq-s-Anbar-province.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/03/13/ISIS-pulls-out-of-town-in-Iraq-s-Anbar-province.html


Processing URLs:  74%|███████▍  | 738/1000 [34:06<03:46,  1.16it/s]

Error extracting text from http://aranews.net/2016/02/dozens-of-isis-fighters-surrender-to-the-kurdish-peshmerga-north-iraq/: 404 Client Error: Not Found for url: http://aranews.net/2016/02/dozens-of-isis-fighters-surrender-to-the-kurdish-peshmerga-north-iraq/


Processing URLs:  74%|███████▍  | 739/1000 [34:11<08:50,  2.03s/it]

Error extracting text from https://www.reuters.com/article/us-nato-lithuania/u-s-send-extra-fighters-to-police-baltic-skies-during-russian-exercise-idUSKCN1BA1TE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-lithuania/u-s-send-extra-fighters-to-police-baltic-skies-during-russian-exercise-idUSKCN1BA1TE


Processing URLs:  74%|███████▍  | 743/1000 [34:22<10:44,  2.51s/it]

Error extracting text from http://www.ibtimes.com/north-korea-preparing-war-new-artificial-islands-could-be-nuclear-missile-launch-2534629: 403 Client Error: Forbidden for url: https://www.ibtimes.com/north-korea-preparing-war-new-artificial-islands-could-be-nuclear-missile-launch-2534629


Processing URLs:  74%|███████▍  | 744/1000 [34:26<12:24,  2.91s/it]

Error extracting text from http://energyinfrapost.com/global-energy-companies-incl-gazprom-vedanta-petrofac-hardy-oil-keen-expand-operations-indias-oil-gas-sector/: 403 Client Error: Forbidden for url: http://energyinfrapost.com/global-energy-companies-incl-gazprom-vedanta-petrofac-hardy-oil-keen-expand-operations-indias-oil-gas-sector/


Processing URLs:  74%|███████▍  | 745/1000 [34:27<10:12,  2.40s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/16/narrative-structure-of-pakistani-literary-fiction/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/16/narrative-structure-of-pakistani-literary-fiction/


Processing URLs:  75%|███████▍  | 747/1000 [34:29<07:00,  1.66s/it]

Error extracting text from https://www.washingtontimes.com/news/2017/aug/2/ismail-ghaani-iranian-general-boasts-of-quds-force/: 403 Client Error: Forbidden for url: https://www.washingtontimes.com/news/2017/aug/2/ismail-ghaani-iranian-general-boasts-of-quds-force/


Processing URLs:  75%|███████▌  | 751/1000 [34:38<09:34,  2.31s/it]

Error extracting text from https://www.nytimes.com/2022/01/16/world/europe/russia-ukraine-invasion.html?nl=todaysheadlines&amp;emc=edit_th_20220116: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/01/16/world/europe/russia-ukraine-invasion.html?nl=todaysheadlines&amp;emc=edit_th_20220116


Processing URLs:  76%|███████▌  | 756/1000 [34:43<04:17,  1.05s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-chemicalweapons/syria-toxic-gas-inquiry-to-end-after-russia-again-blocks-u-n-renewal-idUSKBN1DH27L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-chemicalweapons/syria-toxic-gas-inquiry-to-end-after-russia-again-blocks-u-n-renewal-idUSKBN1DH27L
URL filtered: http://www.bloomberg.com/news/articles/2016-02-10/hsbc-board-said-to-meet-sunday-to-mull-headquarters-decision
Error extracting text from http://www.reuters.com/article/us-italy-renzi-idUSKCN0VV1DD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-renzi-idUSKCN0VV1DD


Processing URLs:  76%|███████▌  | 758/1000 [34:46<05:04,  1.26s/it]

Error extracting text from http://www.nigeriatoday.ng/2016/08/bloodbath-26-killed-kaduna-adamawa-suspected-fulani-herdsmen/: HTTPConnectionPool(host='www.nigeriatoday.ng', port=80): Max retries exceeded with url: /2016/08/bloodbath-26-killed-kaduna-adamawa-suspected-fulani-herdsmen/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff1cf4d0>: Failed to resolve 'www.nigeriatoday.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  76%|███████▌  | 760/1000 [34:47<03:31,  1.13it/s]

Error extracting text from http://thehill.com/opinion/national-security/353418-the-2018-midterms-are-coming-and-russia-is-ready: 403 Client Error: Forbidden for url: https://thehill.com/opinion/national-security/353418-the-2018-midterms-are-coming-and-russia-is-ready/


Processing URLs:  76%|███████▌  | 761/1000 [34:49<04:41,  1.18s/it]

Error extracting text from https://www.ifsmsl.com/2020/12/04/us-truck-driver-shortfall-steeper-than-expected/: 403 Client Error: Forbidden for url: https://www.ifsmsl.com/2020/12/04/us-truck-driver-shortfall-steeper-than-expected/


Processing URLs:  76%|███████▋  | 763/1000 [34:50<03:32,  1.12it/s]

Error extracting text from http://www.nytimes.com/2014/04/01/books/flash-boys-by-michael-lewis-a-tale-of-high-speed-trading.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2014/04/01/books/flash-boys-by-michael-lewis-a-tale-of-high-speed-trading.html


Processing URLs:  76%|███████▋  | 764/1000 [34:52<04:15,  1.08s/it]

Error extracting text from http://www.nationalreview.com/article/429355/supporting-assad-best-option: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/429355/supporting-assad-best-option/


Processing URLs:  77%|███████▋  | 768/1000 [34:56<03:51,  1.00it/s]

Error extracting text from http://www.stateofthemedia.org/2013/news-magazines-embracing-their-digital-future/: 404 Client Error: Not Found for url: https://www.pewresearch.org/journalism/2016/06/15/state-of-the-news-media-2016//2013/news-magazines-embracing-their-digital-future/
Error extracting text from http://www.nytimes.com/2015/10/12/world/middleeast/jason-rezaian-verdict-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/12/world/middleeast/jason-rezaian-verdict-iran.html


Processing URLs:  77%|███████▋  | 772/1000 [34:59<02:46,  1.37it/s]

Error extracting text from https://www.reuters.com/article/us-russia-politics-navalny-police-idUSKBN2AA128: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-politics-navalny-police-idUSKBN2AA128


Processing URLs:  77%|███████▋  | 774/1000 [35:03<04:33,  1.21s/it]

Error extracting text from https://af.reuters.com/article/topNews/idAFKCN25F1L7-OZATP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  78%|███████▊  | 777/1000 [35:09<05:50,  1.57s/it]

Error extracting text from http://www.wsj.com/articles/former-israeli-spy-chief-meir-dagan-dies-1458207769: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/former-israeli-spy-chief-meir-dagan-dies-1458207769


Processing URLs:  78%|███████▊  | 779/1000 [35:10<04:00,  1.09s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/economy-budget/339451-funeral-for-the-filibuster-gop-may-lay-senate-tool-to-rest: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/economy-budget/339451-funeral-for-the-filibuster-gop-may-lay-senate-tool-to-rest/
URL filtered: http://www.bloomberg.com/news/articles/2016-07-15/london-luxury-home-sales-slump-43-as-brexit-vote-deters-buyers


Processing URLs:  78%|███████▊  | 781/1000 [35:11<02:32,  1.44it/s]

Error extracting text from http://seekingalpha.com/article/3982671-tesla-demand-hits-wall: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3982671-tesla-demand-hits-wall


Processing URLs:  78%|███████▊  | 783/1000 [35:12<02:06,  1.72it/s]

Error extracting text from http://slatestarcodex.com/: 403 Client Error: Forbidden for url: http://slatestarcodex.com/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-government-idUSKCN0WN1W1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-government-idUSKCN0WN1W1


Processing URLs:  78%|███████▊  | 784/1000 [35:14<03:12,  1.12it/s]

Error extracting text from http://economictimes.indiatimes.com/articleshow/52285884.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/52285884.cms


Processing URLs:  79%|███████▊  | 786/1000 [35:15<02:43,  1.31it/s]

Error extracting text from http://cqrollcall.com/chill-with-russia-brings-nuclear-insecurity/: 403 Client Error: Forbidden for url: http://cqrollcall.com/chill-with-russia-brings-nuclear-insecurity/


Processing URLs:  79%|███████▊  | 787/1000 [35:15<02:09,  1.65it/s]

Error extracting text from http://www.wsj.com/articles/oil-prices-edge-higher-ahead-of-data-1441876543: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-prices-edge-higher-ahead-of-data-1441876543


Processing URLs:  80%|███████▉  | 798/1000 [35:28<03:31,  1.05s/it]

Error extracting text from http://www.independent.co.uk/news/world/asia/north-korea-preparing-to-launch-longrange-missile-10494244.html: 404 Client Error: Not Found for url: https://www.independent.co.uk/news/world/asia/north-korea-preparing-to-launch-longrange-missile-10494244.html


Processing URLs:  80%|████████  | 800/1000 [35:31<04:15,  1.28s/it]

Error extracting text from http://www.newindianexpress.com/nation/All-You-Need-to-Know-About-International-Fleets-Participating-in-IFR-2016/2016/02/01/article3256146.ece: 404 Client Error: Not Found for url: https://www.newindianexpress.com/nation/All-You-Need-to-Know-About-International-Fleets-Participating-in-IFR-2016/2016/02/01/article3256146.ece
URL filtered: https://twitter.com/ericgeller/status/788506808918810624
URL filtered: http://www.bloomberg.com/news/articles/2011-06-08/opec-members-are-unable-to-reach-consensus-on-output-quotas-el-badri-says


Processing URLs:  81%|████████  | 807/1000 [35:42<05:13,  1.62s/it]

Error extracting text from http://greece.greekreporter.com/2016/10/17/eurostat-one-in-three-greeks-live-in-conditions-of-poverty/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/10/17/eurostat-one-in-three-greeks-live-in-conditions-of-poverty/


Processing URLs:  81%|████████  | 809/1000 [35:44<04:26,  1.39s/it]

Error extracting text from http://fusion.net/story/108985/nicaraguas-interest-in-russian-fighter-jets-could-trigger-the-stupidest-arms-race-ever/?utm_source=emailshare&amp;utm_medium=email&amp;utm_campaign=socialshare&amp;utm_content=sticky+nav: HTTPConnectionPool(host='fusion.net', port=80): Max retries exceeded with url: /story/108985/nicaraguas-interest-in-russian-fighter-jets-could-trigger-the-stupidest-arms-race-ever/?utm_source=emailshare&amp;utm_medium=email&amp;utm_campaign=socialshare&amp;utm_content=sticky+nav (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303fdf1a0>: Failed to resolve 'fusion.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  82%|████████▏ | 817/1000 [36:00<05:58,  1.96s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-26/eu-repeats-demand-u-k-come-up-with-brexit-bill-methodology


Processing URLs:  82%|████████▏ | 822/1000 [36:04<03:20,  1.13s/it]

Error extracting text from http://www.nytimes.com/2016/07/26/world/middleeast/isis-iraq-insurgency.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/26/world/middleeast/isis-iraq-insurgency.html


Processing URLs:  82%|████████▏ | 823/1000 [36:05<03:13,  1.09s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/putin-lays-number-us-cut-755-moscow-diplomats-48943705: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/putin-lays-number-us-cut-755-moscow-diplomats-48943705


Processing URLs:  83%|████████▎ | 826/1000 [36:25<12:49,  4.42s/it]

Error extracting text from http://www.dailysabah.com/money/2016/03/07/france-to-continue-cooperation-with-turkey-on-nuclear-projects: 404 Client Error: Not Found for url: https://www.dailysabah.com/money/2016/03/07/france-to-continue-cooperation-with-turkey-on-nuclear-projects


Processing URLs:  83%|████████▎ | 829/1000 [36:30<07:15,  2.55s/it]

Error extracting text from https://www.nytimes.com/2017/10/05/world/middleeast/trump-iran-nuclear-deal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/05/world/middleeast/trump-iran-nuclear-deal.html


Processing URLs:  83%|████████▎ | 833/1000 [36:35<03:49,  1.37s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/commentary/ct-perspec-intel-911-russia-trump-0905-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/commentary/ct-perspec-intel-911-russia-trump-0905-story.html


Processing URLs:  84%|████████▎ | 836/1000 [36:40<03:45,  1.37s/it]

Error extracting text from http://www.washingtontimes.com/news/2017/apr/23/what-turkeys-referendum-reveals/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/apr/23/what-turkeys-referendum-reveals/


Processing URLs:  84%|████████▍ | 840/1000 [36:42<01:45,  1.52it/s]

Error extracting text from https://news.usni.org/2016/07/18/analysis-can-china-enforce-south-china-sea-air-defense-identification-zone: 403 Client Error: Forbidden for url: https://news.usni.org/2016/07/18/analysis-can-china-enforce-south-china-sea-air-defense-identification-zone


Processing URLs:  84%|████████▍ | 841/1000 [36:43<01:31,  1.73it/s]

Error extracting text from https://www.telegraph.co.uk/world-news/2021/08/28/myanmars-un-ambassador-escaped-plot-kill-now-fighting-keep-job/amp/: 404 Client Error: Not Found for url: https://www.telegraph.co.uk/world-news/2021/08/28/myanmars-un-ambassador-escaped-plot-kill-now-fighting-keep-job/amp/


Processing URLs:  84%|████████▍ | 842/1000 [36:45<02:31,  1.04it/s]

Error extracting text from http://en.news-front.info/2016/07/21/montenegro-s-nato-membership-referendum-indisputably-important/: HTTPConnectionPool(host='en.news-front.info', port=80): Max retries exceeded with url: /2016/07/21/montenegro-s-nato-membership-referendum-indisputably-important/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3061882c0>: Failed to resolve 'en.news-front.info' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  85%|████████▍ | 846/1000 [36:51<03:28,  1.35s/it]

Error extracting text from http://www.al-monitor.com/pulse/originals/2015/12/iraq-kurdistan-sinjar-liberated-isis-hegemony.htm: 404 Client Error: Not Found for url: https://www.al-monitor.com/originals/2015/12/iraq-kurdistan-sinjar-liberated-isis-hegemony.htm


Processing URLs:  85%|████████▌ | 851/1000 [37:07<05:22,  2.17s/it]

URL filtered: https://www.youtube.com/watch?v=q5q77MQzU2Q


Processing URLs:  85%|████████▌ | 853/1000 [37:07<03:04,  1.25s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/04/republic-of-congo-air-strikes-hit-residential-areas-including-schools/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/04/republic-of-congo-air-strikes-hit-residential-areas-including-schools/


Processing URLs:  85%|████████▌ | 854/1000 [37:08<03:08,  1.29s/it]

Error extracting text from http://www.globalcapital.com/article/yvy4zb8lbb17/ntpc-readies-landmark-green-masala: 404 Client Error: Not Found for url: https://www.globalcapital.com/article/yvy4zb8lbb17/ntpc-readies-landmark-green-masala


Processing URLs:  86%|████████▌ | 860/1000 [37:26<05:29,  2.35s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html#polls


Processing URLs:  87%|████████▋ | 869/1000 [37:50<06:53,  3.15s/it]

Error extracting text from http://www.reuters.com/article/us-iran-missiles-deal-idUSKCN0WC0HF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-deal-idUSKCN0WC0HF


Processing URLs:  87%|████████▋ | 870/1000 [37:50<05:13,  2.41s/it]

Error extracting text from http://blogs.wsj.com/chinarealtime/2015/10/14/the-next-hot-chinese-listing-a-sex-toy-maker/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/chinarealtime/2015/10/14/the-next-hot-chinese-listing-a-sex-toy-maker/


Processing URLs:  87%|████████▋ | 872/1000 [37:54<04:29,  2.10s/it]

Error extracting text from http://sites.duke.edu/niou/files/2011/06/goldstone-bates-etal.pdf: 404 Client Error: Not Found for url: https://sites.duke.edu/niou/files/2011/06/goldstone-bates-etal.pdf


Processing URLs:  87%|████████▋ | 874/1000 [37:55<02:48,  1.34s/it]

Error extracting text from http://opinion.inquirer.net/93058/the-moral-factor-in-political-transitions: 403 Client Error: Forbidden for url: https://opinion.inquirer.net/93058/the-moral-factor-in-political-transitions


Processing URLs:  88%|████████▊ | 877/1000 [37:58<02:10,  1.06s/it]

Error extracting text from http://www.todayonline.com/world/comparing-rcep-tpp?page=1: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/comparing-rcep-tpp?page=1


Processing URLs:  88%|████████▊ | 879/1000 [38:01<02:27,  1.22s/it]

Error extracting text from http://english.alarabiya.net/en/News/gulf/2017/05/08/Arab-coalition-launches-air-strikes-on-Houthi-sites-in-Yemen-s-Taiz.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/gulf/2017/05/08/Arab-coalition-launches-air-strikes-on-Houthi-sites-in-Yemen-s-Taiz.html


Processing URLs:  88%|████████▊ | 880/1000 [38:04<03:21,  1.68s/it]

Error extracting text from http://www.rutlandherald.com/article/20151228/NEWS03/151229549/0/1011: 404 Client Error: Not Found for url: https://www.rutlandherald.com/article/20151228/news03/151229549/0/1011/


Processing URLs:  88%|████████▊ | 881/1000 [38:04<02:42,  1.36s/it]

Error extracting text from http://www.sciencealert.com/nuclear-power-plants-are-still-using-pagers-to-communicate-and-that-could-be-a-big-problem: 403 Client Error: Forbidden for url: https://www.sciencealert.com/nuclear-power-plants-are-still-using-pagers-to-communicate-and-that-could-be-a-big-problem


Processing URLs:  89%|████████▊ | 886/1000 [38:13<03:09,  1.66s/it]

Error extracting text from http://www.wsj.com/articles/brexit-referendum-will-also-test-u-k-pollsters-1464687182: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brexit-referendum-will-also-test-u-k-pollsters-1464687182


Processing URLs:  89%|████████▉ | 888/1000 [38:14<02:10,  1.17s/it]

Error extracting text from https://de.scribd.com/document/324333858/PDVSA-2017-into-2020-Exchange-Offer-OC-16-August-2016: 410 Client Error: Gone for url: https://de.scribd.com/document/324333858/PDVSA-2017-into-2020-Exchange-Offer-OC-16-August-2016


Processing URLs:  89%|████████▉ | 890/1000 [38:15<01:23,  1.32it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-referendum-europe-idUSKBN16T13E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-europe-idUSKBN16T13E


Processing URLs:  89%|████████▉ | 894/1000 [38:17<00:47,  2.24it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/09/21/Iraqi-forces-ready-by-early-October-for-Mosul-assault-top-US-officer-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/09/21/Iraqi-forces-ready-by-early-October-for-Mosul-assault-top-US-officer-.html
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN18H12O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN18H12O
Error extracting text from http://www.reuters.com/article/us-northkorea-rights-un-idUSKBN0TO02020151205#QpBBwIAaeQXbX4Oy.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-rights-un-idUSKBN0TO02020151205#QpBBwIAaeQXbX4Oy.99


Processing URLs:  90%|████████▉ | 896/1000 [38:19<01:07,  1.55it/s]

Error extracting text from https://www.nytimes.com/2017/08/21/us/navy-collisions-history-mccain-fitzgerald.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/21/us/navy-collisions-history-mccain-fitzgerald.html?_r=0


Processing URLs:  90%|████████▉ | 897/1000 [38:20<01:31,  1.12it/s]

Error extracting text from http://thehill.com/blogs/floor-action/senate/326625-grassley-gorsuch-will-fall-short-of-60-votes: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/326625-grassley-gorsuch-will-fall-short-of-60-votes/


Processing URLs:  90%|████████▉ | 899/1000 [38:30<03:59,  2.37s/it]

Error extracting text from https://www.nytimes.com/2017/08/20/world/asia/uss-john-mccain-collision-merchant-ship.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/20/world/asia/uss-john-mccain-collision-merchant-ship.html


Processing URLs:  90%|█████████ | 901/1000 [38:32<03:02,  1.84s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-26/tsipras-faces-new-imf-curve-ball-to-unlock-greek-aid-scenarios


Processing URLs:  91%|█████████ | 909/1000 [38:49<03:34,  2.35s/it]

Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:46417f21862e443fb9d7e99db4c838e9: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:46417f21862e443fb9d7e99db4c838e9 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30787ddf0>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  91%|█████████ | 911/1000 [38:50<02:22,  1.60s/it]

URL filtered: https://mobile.twitter.com/akarlin0/status/1499680986728218625


Processing URLs:  91%|█████████▏| 913/1000 [38:51<01:33,  1.08s/it]

Error extracting text from http://thehill.com/policy/transportation/248230-mccarthy-to-senate-keep-ex-im-out-of-highway-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/transportation/248230-mccarthy-to-senate-keep-ex-im-out-of-highway-bill/


Processing URLs:  92%|█████████▏| 916/1000 [38:53<01:07,  1.25it/s]

Error extracting text from http://caracaschronicles.com/2015/12/01/the-simple-majority-con/: 403 Client Error: Forbidden for url: http://caracaschronicles.com/2015/12/01/the-simple-majority-con/


Processing URLs:  92%|█████████▏| 917/1000 [38:54<01:27,  1.05s/it]

Error extracting text from http://www.ibtimes.com/europe-refugee-crisis-facts-wealthy-educated-syrians-risking-lives-leave-war-2089018: 403 Client Error: Forbidden for url: https://www.ibtimes.com/europe-refugee-crisis-facts-wealthy-educated-syrians-risking-lives-leave-war-2089018


Processing URLs:  92%|█████████▏| 922/1000 [39:03<02:06,  1.62s/it]

Error extracting text from http://www.ibtimes.com/what-dark-matter-made-scientists-estimate-mass-prime-candidate-axions-2440949: 403 Client Error: Forbidden for url: https://www.ibtimes.com/what-dark-matter-made-scientists-estimate-mass-prime-candidate-axions-2440949


Processing URLs:  92%|█████████▎| 925/1000 [39:08<01:36,  1.29s/it]

Error extracting text from http://abcnews.go.com/Technology/wireStory/hit-list-exposes-russian-hacking-us-elections-50878281: 404 Client Error: Not Found for url: https://abcnews.go.com/Technology/wireStory/hit-list-exposes-russian-hacking-us-elections-50878281
Error extracting text from http://www.nytimes.com/2016/07/12/business/us-chamber-of-commerce-donald-trump.html?emc=edit_th_20160712&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/12/business/us-chamber-of-commerce-donald-trump.html?emc=edit_th_20160712&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=1


Processing URLs:  93%|█████████▎| 927/1000 [39:10<01:23,  1.14s/it]

Error extracting text from https://www.realcleardefense.com/articles/2020/11/30/beijings_line_on_the_south_china_sea_nothing_to_see_here_651344.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2020/11/30/beijings_line_on_the_south_china_sea_nothing_to_see_here_651344.html
URL filtered: https://www.bloomberg.com/news/articles/2017-03-06/u-s-oil-industry-becomes-refiner-to-the-world-as-exports-boom


Processing URLs:  93%|█████████▎| 929/1000 [39:10<00:48,  1.47it/s]

Error extracting text from http://www.ingoswann.com/super-powers-of-the-human-biomind.html: 404 Client Error: Not Found for url: https://ingoswann.com/super-powers-of-the-human-biomind.html


Processing URLs:  93%|█████████▎| 932/1000 [39:14<01:11,  1.06s/it]

Error extracting text from http://abcnews.com.co/donald-trump-protester-speaks-out-i-was-paid-to-protest/: 403 Client Error: Forbidden for url: https://hollywoodgazette.com/donald-trump-protester-speaks-out-i-was-paid-to-protest/


Processing URLs:  94%|█████████▎| 935/1000 [39:22<02:05,  1.93s/it]

Error extracting text from https://www.operationblockbuster.com/: HTTPSConnectionPool(host='www.operationblockbuster.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe842f60>: Failed to resolve 'www.operationblockbuster.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  94%|█████████▍| 939/1000 [39:26<01:06,  1.09s/it]

Error extracting text from http://english.aawsat.com/2016/05/article55351442/jaish-al-islam-al-rahman-legion-agree-end-fight-eastern-ghouta: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/05/article55351442/jaish-al-islam-al-rahman-legion-agree-end-fight-eastern-ghouta


Processing URLs:  95%|█████████▍| 946/1000 [39:43<02:05,  2.32s/it]

Error extracting text from http://www.joc.com/maritime-news/container-lines/hapag-lloyd/hapag-won’t-upsize-panama-canal-tonnage-end-year-ceo-says_20160114.html: 404 Client Error: Not Found for url: https://www.joc.com/article/hapag-wont-upsize-panama-canal-tonnage-end-year-ceo-says_20160114.html


Processing URLs:  95%|█████████▌| 950/1000 [39:48<01:08,  1.37s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/recent-developments-surrounding-south-china-sea-44395305: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/recent-developments-surrounding-south-china-sea-44395305


Processing URLs:  96%|█████████▌| 956/1000 [40:06<01:50,  2.51s/it]

Error extracting text from http://www.baltimoresun.com/news/maryland/education/k-12/bs-md-nsa-challenge-20171106-story.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/news/maryland/education/k-12/bs-md-nsa-challenge-20171106-story.html


Processing URLs:  96%|█████████▌| 959/1000 [40:08<00:50,  1.23s/it]

Error extracting text from http://www.komodoexercise.org/#!komodo-exercise-2014/c21wy: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303203e90>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nato.int/cps/en/natohq/news_128096.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_128096.htm
URL filtered: https://twitter.com/drtoddwo/status/789247038764425217


Processing URLs:  97%|█████████▋| 966/1000 [40:14<00:38,  1.15s/it]

Error extracting text from http://www.scout.com/military/warrior/story/1694626-china-s-success-with-east-china-sea-strategy: 403 Client Error: Forbidden for url: https://247sports.com/


Processing URLs:  97%|█████████▋| 967/1000 [40:16<00:42,  1.28s/it]

Error extracting text from http://www.financialexpress.com/article/world-news/syrias-bashar-al-assad-makes-rare-appearance-outside-capital-for-eid/308670/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/world-news/syrias-bashar-al-assad-makes-rare-appearance-outside-capital-for-eid/308670/


Processing URLs:  97%|█████████▋| 972/1000 [40:33<01:10,  2.51s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/26216-senators-select-members-to-decide-electoral-reforms: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/26216-senators-select-members-to-decide-electoral-reforms


Processing URLs:  97%|█████████▋| 974/1000 [40:39<01:11,  2.75s/it]

Error extracting text from http://www.shalegas.international/2016/01/19/will-europe-become-the-preferred-2016-destination-for-u-s-lng/: 404 Client Error: Not Found for url: https://shalegas.international/2016/01/19/will-europe-become-the-preferred-2016-destination-for-u-s-lng/


Processing URLs:  98%|█████████▊| 978/1000 [40:53<01:33,  4.23s/it]

Error extracting text from http://www.australianetworknews.com/australia-vs-isis-aussie-troops-plan-attack-on-mosul-train-iraqi-soldiers/: HTTPConnectionPool(host='www.australianetworknews.com', port=80): Max retries exceeded with url: /australia-vs-isis-aussie-troops-plan-attack-on-mosul-train-iraqi-soldiers/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30618bf50>: Failed to resolve 'www.australianetworknews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  98%|█████████▊| 984/1000 [41:03<00:33,  2.11s/it]

URL filtered: http://www.bloomberg.com/news/articles/2014-02-19/tesla-quarterly-results-beat-analyst-estimates-on-model-s-growth


Processing URLs:  99%|█████████▉| 991/1000 [41:15<00:15,  1.76s/it]

Error extracting text from http://www.counterpunch.org/2015/12/29/fukushima-today/: 403 Client Error: Forbidden for url: http://www.counterpunch.org/2015/12/29/fukushima-today/


Processing URLs:  99%|█████████▉| 993/1000 [41:18<00:10,  1.46s/it]

Error extracting text from http://latincorrespondent.com/2015/12/leaking-locks-disrupt-panama-canal-expansion/: 403 Client Error: Forbidden for url: https://latincorrespondent.com/2015/12/leaking-locks-disrupt-panama-canal-expansion/


Processing URLs: 100%|█████████▉| 995/1000 [42:19<01:33, 18.78s/it]

Error extracting text from http://www.irantracker.org/iran-news-round-may-19-2016: HTTPConnectionPool(host='www.irantracker.org', port=80): Max retries exceeded with url: /iran-news-round-may-19-2016 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30618b3b0>, 'Connection to www.irantracker.org timed out. (connect timeout=60)'))


Processing URLs: 100%|█████████▉| 996/1000 [42:20<00:54, 13.67s/it]

URL filtered: https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/?utm_content=bufferce144&amp;utm_medium=social&amp;utm_source=linkedin.com&amp;utm_campaign=buffer


Processing URLs: 100%|██████████| 1000/1000 [42:26<00:00,  2.55s/it]


Error extracting text from http://www.reuters.com/article/us-usa-fiscal-senate-idUSKBN0TT2PD20151210: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fiscal-senate-idUSKBN0TT2PD20151210


Processing URLs:   1%|          | 10/1000 [00:22<19:41,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKCN1AV04M?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKCN1AV04M?il=0


Processing URLs:   2%|▏         | 17/1000 [00:45<47:10,  2.88s/it]

Error extracting text from https://paxos.com/usdp/.: 403 Client Error: Forbidden for url: https://paxos.com/usdp/


Processing URLs:   2%|▏         | 18/1000 [00:45<34:17,  2.10s/it]

Error extracting text from http://news.yahoo.com/weak-turnout-brazil-impeachment-protests-180205914.html: 404 Client Error: Not Found for url: http://news.yahoo.com/weak-turnout-brazil-impeachment-protests-180205914.html


Processing URLs:   2%|▏         | 22/1000 [00:51<24:00,  1.47s/it]

Error extracting text from http://www.reuters.com/article/2015/09/28/us-mideast-crisis-putin-usa-idUSKCN0RR0H820150928: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/28/us-mideast-crisis-putin-usa-idUSKCN0RR0H820150928
URL filtered: http://www.bloomberg.com/news/articles/2015-12-11/morgan-stanley-sees-triple-tantrum-if-central-banks-find-success


Processing URLs:   2%|▎         | 25/1000 [00:53<13:44,  1.18it/s]

Error extracting text from https://www.congress.gov/bill/114th-congress/house-bill/757/text?q=%7B%22search%22%3A%5B%22hr+757%22%5D%7D: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/house-bill/757/text?q=%7B%22search%22%3A%5B%22hr+757%22%5D%7D


Processing URLs:   3%|▎         | 29/1000 [01:00<25:22,  1.57s/it]

Error extracting text from http://mobile.nytimes.com/2016/04/14/world/middleeast/senate-votes-to-ban-imports-of-syrian-art-and-antiquities.html?_r=0&amp;referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/04/14/world/middleeast/senate-votes-to-ban-imports-of-syrian-art-and-antiquities.html?_r=0&amp;referer=https://www.google.com/


Processing URLs:   3%|▎         | 33/1000 [01:11<37:50,  2.35s/it]

Error extracting text from https://www.justsecurity.org/72318/chads-counterterrorism-support-abroad-drives-repression-and-discontent-at-home/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/72318/chads-counterterrorism-support-abroad-drives-repression-and-discontent-at-home/
URL filtered: https://markets.businessinsider.com/currencies/news/facebook-stock-billionaire-mike-novogratz-ahead-novi-digital-wallet-launch-2021-4-1030281618


Processing URLs:   5%|▍         | 46/1000 [01:29<25:43,  1.62s/it]

Error extracting text from https://www.reuters.com/news/world: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/news/world


Processing URLs:   5%|▌         | 53/1000 [01:38<21:48,  1.38s/it]

Error extracting text from http://thehill.com/homenews/senate/361839-graham-trump-throwing-a-life-line-to-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/361839-graham-trump-throwing-a-life-line-to-moore/


Processing URLs:   6%|▌         | 59/1000 [01:44<13:18,  1.18it/s]

Error extracting text from http://www.nato.int/docu/update/2003/03-march/e0326b.htm: 403 Client Error: Forbidden for url: http://www.nato.int/docu/update/2003/03-march/e0326b.htm
Error extracting text from http://www.reuters.com/article/us-burundi-unrest-amnesty-idUSKCN0V62TV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-unrest-amnesty-idUSKCN0V62TV
URL filtered: https://www.thedailybeast.com/facebook-earned-dollar220k-from-trump-pac-ads-after-banning-trump-himself


Processing URLs:   6%|▋         | 64/1000 [01:50<16:27,  1.05s/it]

Error extracting text from https://thehill.com/homenews/campaign/513456-bush-endorsing-biden-dont-hold-your-breath: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/513456-bush-endorsing-biden-dont-hold-your-breath/


Processing URLs:   7%|▋         | 67/1000 [01:53<15:26,  1.01it/s]

Error extracting text from http://www.nytimes.com/2016/12/02/world/asia/afghanistan-security-terrorism-taliban.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/02/world/asia/afghanistan-security-terrorism-taliban.html


Processing URLs:   7%|▋         | 70/1000 [01:56<11:35,  1.34it/s]

Error extracting text from http://www.tradingeconomics.com/japan/gdp-growth: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/japan/gdp-growth
Error extracting text from http://www.reuters.com/article/us-usa-trump-idUSKBN14V18L?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-idUSKBN14V18L?il=0


Processing URLs:   7%|▋         | 73/1000 [02:01<20:14,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN1762XX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN1762XX


Processing URLs:   8%|▊         | 76/1000 [02:03<12:52,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-spain-politics-election-idUSKCN10D14S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-election-idUSKCN10D14S


Processing URLs:   8%|▊         | 85/1000 [02:13<15:58,  1.05s/it]

Error extracting text from https://www.amazon.com/Localism-philosophy-government-Mark-Moore-ebook/dp/B00B0GACAQ: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Localism-philosophy-government-Mark-Moore-ebook/dp/B00B0GACAQ


Processing URLs:   9%|▊         | 87/1000 [02:24<41:30,  2.73s/it]

Error extracting text from http://www.wsj.com/articles/house-passes-highway-bill-with-export-import-bank-renewal-1446740235: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/house-passes-highway-bill-with-export-import-bank-renewal-1446740235
Error extracting text from https://www.reuters.com/business/energy/oil-prices-climb-second-day-after-us-stockpiles-fall-2021-06-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-prices-climb-second-day-after-us-stockpiles-fall-2021-06-30/


Processing URLs:   9%|▉         | 90/1000 [02:28<28:14,  1.86s/it]

Error extracting text from https://www.nytimes.com/2017/10/12/business/economy/what-would-happen-if-the-us-withdrew-from-nafta.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/12/business/economy/what-would-happen-if-the-us-withdrew-from-nafta.html


Processing URLs:   9%|▉         | 91/1000 [02:29<25:55,  1.71s/it]

Error extracting text from http://www.europarl.europa.eu/external/html/budgetataglance/default_en.html#hungary: 404 Client Error: Not Found for url: https://www.europarl.europa.eu/external/html/budgetataglance/default_en.html#hungary
Error extracting text from http://www.opec.org/opec_web/en/publications/3407.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/publications/3407.htm


Processing URLs:   9%|▉         | 93/1000 [02:31<20:05,  1.33s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-usa-dialogue-idUSKBN1AH51K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-usa-dialogue-idUSKBN1AH51K


Processing URLs:  10%|▉         | 98/1000 [02:35<14:48,  1.01it/s]

Error extracting text from http://www.kurdistan24.net/en/news/779ba322-3e64-40be-8b6e-afa88485d865/Senior-IS-leader-killed-near-Mosul: 403 Client Error: Forbidden for url: https://www.kurdistan24.net/en/news/779ba322-3e64-40be-8b6e-afa88485d865/Senior-IS-leader-killed-near-Mosul


Processing URLs:  10%|█         | 105/1000 [02:45<17:16,  1.16s/it]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-frontex-idUSKCN0XF1PB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-frontex-idUSKCN0XF1PB


Processing URLs:  11%|█         | 109/1000 [02:52<25:20,  1.71s/it]

Error extracting text from http://www.realcleardefense.com/articles/2016/11/02/the_russian_nuclear_weapons_buildup_110294.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/11/02/the_russian_nuclear_weapons_buildup_110294.html
URL filtered: https://mobile.twitter.com/CNBCnow?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor


Processing URLs:  11%|█         | 112/1000 [02:54<14:21,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-deal-idUSKBN14H12V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-deal-idUSKBN14H12V


Processing URLs:  12%|█▏        | 116/1000 [02:59<14:34,  1.01it/s]

Error extracting text from http://www.nytimes.com/2016/12/06/world/asia/saudi-arabia-afghanistan.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/06/world/asia/saudi-arabia-afghanistan.html?_r=0
Error extracting text from https://www.congress.gov/search?q={%22source%22:%22legislation%22,%22congress%22:117}&searchResultViewType=expanded: 403 Client Error: Forbidden for url: https://www.congress.gov/search?q=%7B%22source%22:%22legislation%22,%22congress%22:117%7D&searchResultViewType=expanded


Processing URLs:  12%|█▏        | 121/1000 [03:07<20:48,  1.42s/it]

Error extracting text from http://www.nytimes.com/2015/09/22/world/middleeast/russia-deploys-ground-attack-aircraft-to-syrian-base.html?_r=2&amp;utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2AMideast%20Brief: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/22/world/middleeast/russia-deploys-ground-attack-aircraft-to-syrian-base.html?_r=2&amp;utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=New%20Campaign&amp;utm_term=%2AMideast%20Brief
URL filtered: https://www.bloomberglaw.com/product/blaw/document/X1Q6NUBE2K82?bc=W1siQmxvb21iZXJnIExhdyIsIi9wcm9kdWN0L2JsYXcvc2VhcmNoL2h1Yi9ncm91cC81YzljODZjNjExNGRkN2JkODQxOTg


Processing URLs:  13%|█▎        | 126/1000 [03:20<40:51,  2.80s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-04-26/almost-half-of-u-k-office-workers-are-back-at-their-desks?cmpid=BBD042621_CEU&amp;utm_medium=email&amp;utm_source=newsletter&amp;utm_term=210426&amp;utm_campaign=closeeurope&amp;sref=1mjpCW3y


Processing URLs:  13%|█▎        | 128/1000 [03:21<23:59,  1.65s/it]

Error extracting text from https://www.sciencemag.org/news/2020/07/misconduct-allegations-push-psychology-hero-his-pedestal: 403 Client Error: Forbidden for url: https://www.science.org/news/2020/07/misconduct-allegations-push-psychology-hero-his-pedestal


Processing URLs:  13%|█▎        | 131/1000 [03:23<18:44,  1.29s/it]

Error extracting text from http://www.ibtimes.co.uk/nicola-sturgeon-publishes-draft-bill-second-scottish-independence-referendum-1587290: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/nicola-sturgeon-publishes-draft-bill-second-scottish-independence-referendum-1587290


Processing URLs:  13%|█▎        | 134/1000 [04:27<4:23:13, 18.24s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/article175520831.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  14%|█▎        | 136/1000 [04:29<2:22:03,  9.86s/it]

URL filtered: https://www.youtube.com/watch?v=sTN7jD9ovUI


Processing URLs:  14%|█▍        | 139/1000 [04:32<1:05:43,  4.58s/it]

Error extracting text from http://www.komodoexercise.org/#!Multilateral-Naval-Exercise-Komodo-2016-Socialization/c1flr/1: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301992810>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  14%|█▍        | 143/1000 [04:41<52:02,  3.64s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-15/turkey-lira-etf-drop-as-prime-minister-says-military-revolting


Processing URLs:  15%|█▌        | 150/1000 [04:50<20:58,  1.48s/it]

Error extracting text from https://www.sanantoniomag.com/what-you-need-to-know-about-san-antonios-school-mask-mandate/: 403 Client Error: Forbidden for url: https://www.sanantoniomag.com/what-you-need-to-know-about-san-antonios-school-mask-mandate/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13T0OG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13T0OG


Processing URLs:  15%|█▌        | 151/1000 [04:50<16:01,  1.13s/it]

Error extracting text from http://www.reuters.com/article/2015/11/25/us-usa-economy-idUSKBN0TE1RY20151125: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/25/us-usa-economy-idUSKBN0TE1RY20151125


Processing URLs:  15%|█▌        | 153/1000 [05:07<1:13:23,  5.20s/it]

Error extracting text from https://www.wired.com/2017/05/fcc-votes-begin-dismantling-net-neutrality/: 504 Server Error: Gateway Time-out for url: https://www.wired.com/2017/05/fcc-votes-begin-dismantling-net-neutrality/


Processing URLs:  15%|█▌        | 154/1000 [05:07<53:53,  3.82s/it]

Error extracting text from https://qlur.com/news/722793/uefas-due-date-for-euro-2020-hosts-as-well-as-wembleys-fan-strategies#: 404 Client Error: Not Found for url: https://www.qlur.com/news/722793/uefas-due-date-for-euro-2020-hosts-as-well-as-wembleys-fan-strategies


Processing URLs:  16%|█▌        | 155/1000 [05:09<45:10,  3.21s/it]

Error extracting text from https://www.defense.gov/Portals/1/Documents/pubs/2017_China_Military_Power_Report.PDF?ver=2017-06-06-141328-770: 404 Client Error: Not Found for url: https://www.defense.gov/Portals/1/Documents/pubs/2017_China_Military_Power_Report.PDF?ver=2017-06-06-141328-770


Processing URLs:  16%|█▌        | 162/1000 [05:19<19:44,  1.41s/it]

Error extracting text from http://news.nationalpost.com/news/canada/canadian-politics/top-soldier-defends-iraq-mission-as-non-combat-says-hes-expert-on-what-is-combat: 403 Client Error: Forbidden for url: https://nationalpost.com/category/news//


Processing URLs:  16%|█▋        | 163/1000 [05:21<20:33,  1.47s/it]

Error extracting text from https://www.reuters.com/article/britain-scotland-poll/scottish-nationalists-set-for-record-majority-boosting-independence-push-idUSL8N2JP224: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-scotland-poll/scottish-nationalists-set-for-record-majority-boosting-independence-push-idUSL8N2JP224


Processing URLs:  16%|█▋        | 165/1000 [05:22<14:22,  1.03s/it]

URL filtered: https://www.youtube.com/watch?v=qnH5BGfrV7o


Processing URLs:  17%|█▋        | 168/1000 [05:24<12:06,  1.15it/s]

Error extracting text from http://inside-poland.com/t/european-parliament-adopts-resolution-on-threat-to-democracy-amid-polands-constitutional-crisis/: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=inside-poland.com


Processing URLs:  17%|█▋        | 169/1000 [05:25<13:26,  1.03it/s]

Error extracting text from http://uk.reuters.com/article/uk-germany-deutsche-bank-moody-s-idUKKCN1230DD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: https://www.youtube.com/watch?v=lTi9qDJziAM


Processing URLs:  17%|█▋        | 171/1000 [05:27<13:33,  1.02it/s]

URL filtered: https://www.facebook.com/dangerburundi/posts/1535769710056282


Processing URLs:  18%|█▊        | 175/1000 [05:29<09:46,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-eu-google-antitrust-idUSKBN16E1VE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-google-antitrust-idUSKBN16E1VE
URL filtered: http://www.bloomberg.com/news/articles/2016-01-13/venezuelan-congress-backs-down-in-conflict-with-supreme-court


Processing URLs:  18%|█▊        | 181/1000 [05:33<07:44,  1.76it/s]

Error extracting text from http://www.reuters.com/article/2015/04/17/us-opec-oil-idUSKBN0N821M20150417: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/04/17/us-opec-oil-idUSKBN0N821M20150417
Error extracting text from http://blogs.wsj.com/japanrealtime/2014/12/15/toyotas-fuel-cell-powered-mirai-hits-showrooms/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/japanrealtime/2014/12/15/toyotas-fuel-cell-powered-mirai-hits-showrooms/


Processing URLs:  18%|█▊        | 183/1000 [05:37<13:53,  1.02s/it]

Error extracting text from http://www.timeinc.com/brands/: 403 Client Error: Forbidden for url: http://www.timeinc.com/brands/


Processing URLs:  19%|█▉        | 192/1000 [05:54<22:48,  1.69s/it]

Error extracting text from http://latincorrespondent.com/2016/01/venezuela-tops-corruption-charts/: 403 Client Error: Forbidden for url: https://latincorrespondent.com/2016/01/venezuela-tops-corruption-charts/


Processing URLs:  19%|█▉        | 193/1000 [05:55<20:52,  1.55s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-09-24/boehner-said-to-propose-government-shutdown-avoiding-strategy


Processing URLs:  20%|█▉        | 197/1000 [06:01<21:05,  1.58s/it]

Error extracting text from https://cdanews.com/2016/01/microsoft-corporation-to-warn-users-of-nation-state-hacks/: 404 Client Error: Not Found for url: https://cdanews.com/2016/01/microsoft-corporation-to-warn-users-of-nation-state-hacks/


Processing URLs:  20%|█▉        | 198/1000 [07:01<3:53:19, 17.46s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-01-29/syrian-opposition-not-in-geneva-on-the-day-of-peace-talks: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  20%|██        | 205/1000 [07:18<39:29,  2.98s/it]

Error extracting text from http://blogs.wsj.com/moneybeat/2015/09/11/making-the-case-for-raising-rates-and-raising-them-next-week/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2015/09/11/making-the-case-for-raising-rates-and-raising-them-next-week/


Processing URLs:  21%|██        | 209/1000 [07:20<18:39,  1.42s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6178078/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6178078/


Processing URLs:  21%|██        | 212/1000 [07:26<18:34,  1.41s/it]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-proposal-idUSKBN1632E4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-proposal-idUSKBN1632E4


Processing URLs:  22%|██▏       | 215/1000 [07:28<11:55,  1.10it/s]

Error extracting text from http://www.nytimes.com/2016/08/25/world/middleeast/turkey-syria-isis.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/25/world/middleeast/turkey-syria-isis.html


Processing URLs:  22%|██▏       | 218/1000 [07:30<10:01,  1.30it/s]

Error extracting text from http://www.forbes.com/sites/markadomanis/2015/12/14/russias-recession-hasnt-yet-had-much-of-a-demographic-impact/: 410 Client Error: Gone for url: https://www.forbes.com/sites/markadomanis/2015/12/14/russias-recession-hasnt-yet-had-much-of-a-demographic-impact/
Error extracting text from https://www.congress.gov/bill/117th-congress/house-bill/5376/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/117th-congress/house-bill/5376/text


Processing URLs:  22%|██▏       | 219/1000 [07:33<18:31,  1.42s/it]

Error extracting text from http://www.oxitec.com/first-phase-of-oxitecs-brazil-trial-successfully-completed/: 404 Client Error: Not Found for url: https://www.oxitec.com/first-phase-of-oxitecs-brazil-trial-successfully-completed/


Processing URLs:  22%|██▏       | 222/1000 [07:36<14:49,  1.14s/it]

Error extracting text from http://www.staugustine.net/blogs/rectify-names-a-blog-on-publishing/e2809cthey-had-learned-nothing-and-forgotten-nothinge2809d-march-11-2013/: 404 Client Error: Not Found for url: http://www.staugustine.net/blogs/rectify-names-a-blog-on-publishing/e2809cthey-had-learned-nothing-and-forgotten-nothinge2809d-march-11-2013/


Processing URLs:  22%|██▏       | 223/1000 [07:39<20:47,  1.60s/it]

Error extracting text from http://ballotpedia.org/wiki/index.php/State-by-state: 404 Client Error: Not Found for url: https://ballotpedia.org:443/wiki/index.php/State-by-state


Processing URLs:  22%|██▏       | 224/1000 [07:41<21:26,  1.66s/it]

Error extracting text from https://www.reuters.com/world/middle-east/uae-eases-covid-19-face-mask-rules-expo-nears-2021-09-22/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/uae-eases-covid-19-face-mask-rules-expo-nears-2021-09-22/


Processing URLs:  23%|██▎       | 234/1000 [07:58<28:02,  2.20s/it]

Error extracting text from http://www.gupc.com.pa/en/press/press-releases/232-el-panel-de-resolucion-de-disputas-dab-da-la-razon-a-gupc-en-principales-reclamos-en-el-proyecto-de-ampliacion-del-canal: 404 Client Error: Not Found for url: https://www.gupc.com.pa/en/press/press-releases/232-el-panel-de-resolucion-de-disputas-dab-da-la-razon-a-gupc-en-principales-reclamos-en-el-proyecto-de-ampliacion-del-canal


Processing URLs:  24%|██▎       | 237/1000 [08:01<16:40,  1.31s/it]

Error extracting text from https://www.researchgate.net/publication/3230647_The_Design_of_Low-Frequency_Underwater_Acoustic_Projectors_Present_Status_and_Future_Trends: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/3230647_The_Design_of_Low-Frequency_Underwater_Acoustic_Projectors_Present_Status_and_Future_Trends


Processing URLs:  25%|██▍       | 247/1000 [08:17<22:45,  1.81s/it]

Error extracting text from http://ohiopolitics.blog.daytondailynews.com/2015/10/13/whos-winning-the-democratic-debate/: 404 Client Error: Not Found for url: https://www.daytondailynews.com/blog/ohio-politics/2015/10/13/whos-winning-the-democratic-debate/


Processing URLs:  25%|██▍       | 248/1000 [08:17<16:53,  1.35s/it]

Error extracting text from http://www.wsj.com/articles/brazils-government-unveils-measures-to-alleviate-recession-1454017037: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-government-unveils-measures-to-alleviate-recession-1454017037


Processing URLs:  25%|██▌       | 250/1000 [08:19<14:01,  1.12s/it]

Error extracting text from http://webcache.googleusercontent.com/search?q=cache:Gp9HP1pbS-sJ:www.ft.com/fastft/2015/12/17/boj-to-get-through-2015-without-touching-policy/+&amp;cd=7&amp;hl=en&amp;ct=clnk&amp;gl=us: 404 Client Error: Not Found for url: http://webcache.googleusercontent.com/search?q=cache:Gp9HP1pbS-sJ:www.ft.com/fastft/2015/12/17/boj-to-get-through-2015-without-touching-policy/+&amp;cd=7&amp;hl=en&amp;ct=clnk&amp;gl=us


Processing URLs:  25%|██▌       | 254/1000 [08:21<06:47,  1.83it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-15/london-house-prices-fall-at-fastest-pace-since-financial-crisis
Error extracting text from http://www.reuters.com/article/us-britain-eu-ireland-idUSKBN1AD0KG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-ireland-idUSKBN1AD0KG


Processing URLs:  26%|██▌       | 257/1000 [08:23<07:40,  1.61it/s]

Error extracting text from http://www.thelocal.it/20160404/italy-pm-matteo-renzi-to-visit-iran: 403 Client Error: Forbidden for url: https://www.thelocal.it/20160404/italy-pm-matteo-renzi-to-visit-iran


Processing URLs:  26%|██▌       | 258/1000 [08:23<06:26,  1.92it/s]

Error extracting text from http://news.yahoo.com/court-rules-brazil-presidents-budget-accounting-illegal-232625747.html: 404 Client Error: Not Found for url: http://news.yahoo.com/court-rules-brazil-presidents-budget-accounting-illegal-232625747.html


Processing URLs:  26%|██▌       | 259/1000 [08:24<06:15,  1.97it/s]

Error extracting text from https://seekingalpha.com/article/3959933-predicting-stock-market-returns-lose-normal-switch-laplace: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3959933-predicting-stock-market-returns-lose-normal-switch-laplace


Processing URLs:  26%|██▌       | 260/1000 [08:24<05:36,  2.20it/s]

Error extracting text from https://www.realclearpolitics.com/video/2021/10/06/nih_director_collins_resignation_has_nothing_to_do_with_wuhan_lab_leak_theory.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/video/2021/10/06/nih_director_collins_resignation_has_nothing_to_do_with_wuhan_lab_leak_theory.html


Processing URLs:  26%|██▋       | 263/1000 [08:28<11:59,  1.02it/s]

Error extracting text from https://www.theglobeandmail.com/news/world/israeli-protesters-urge-netanyahu-to-step-down-over-bribery-allegations/article38004720/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/israeli-protesters-urge-netanyahu-to-step-down-over-bribery-allegations/article38004720/
URL filtered: https://m.youtube.com/watch?feature=youtu.be&amp;v=wbkS26PX4rc


Processing URLs:  27%|██▋       | 270/1000 [08:34<08:39,  1.41it/s]

Error extracting text from https://www.nytimes.com/2017/06/24/world/asia/korea-joint-winter-olympics.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/24/world/asia/korea-joint-winter-olympics.html
Error extracting text from http://www.wsj.com/articles/saudi-russia-qatar-venezuela-agree-to-freeze-oil-output-at-januarys-levels-1455615900: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-russia-qatar-venezuela-agree-to-freeze-oil-output-at-januarys-levels-1455615900


Processing URLs:  27%|██▋       | 272/1000 [08:39<19:12,  1.58s/it]

Error extracting text from http://www.opec.org/opec_web/en/311.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/311.htm


Processing URLs:  28%|██▊       | 280/1000 [09:51<3:46:18, 18.86s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/article42134757.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  28%|██▊       | 281/1000 [09:51<2:42:33, 13.57s/it]

Error extracting text from http://www.mystatesman.com/news/news/state-regional-govt-politics/ted-cruz-draws-crowds-on-his-amazing-ride-through-/npyjP/: 404 Client Error: OK for url: https://www.statesman.com/news/news/state-regional-govt-politics/ted-cruz-draws-crowds-on-his-amazing-ride-through-/npyjP/


Processing URLs:  28%|██▊       | 283/1000 [09:53<1:24:17,  7.05s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/05/manual-category-assignment-work-in-progress/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/05/manual-category-assignment-work-in-progress/


Processing URLs:  29%|██▉       | 288/1000 [10:00<27:13,  2.29s/it]  

Error extracting text from https://www.sfgate.com/sports/article/North-Korea-raises-eyebrows-with-choice-for-12704669.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/sports/article/North-Korea-raises-eyebrows-with-choice-for-12704669.php


Processing URLs:  29%|██▉       | 289/1000 [10:01<22:10,  1.87s/it]

Error extracting text from https://global.handelsblatt.com/politics/business-resistance-to-a-new-grand-coalition-is-building-867056: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/politics/business-resistance-to-a-new-grand-coalition-is-building-867056


Processing URLs:  30%|██▉       | 298/1000 [10:16<17:15,  1.48s/it]

URL filtered: https://twitter.com/CNBCnow/status/675444989653045248


Processing URLs:  30%|███       | 301/1000 [10:18<11:18,  1.03it/s]

Error extracting text from https://www.reuters.com/business/exclusive-us-has-reached-out-china-about-cutting-oil-imports-iran-officials-say-2021-09-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/exclusive-us-has-reached-out-china-about-cutting-oil-imports-iran-officials-say-2021-09-28/


Processing URLs:  30%|███       | 303/1000 [10:20<11:29,  1.01it/s]

Error extracting text from https://www.alleastafrica.com/2017/10/30/un-security-council-worried-s-sudan-amid-formations-53-new-rebel-groups/: 503 Server Error: Service Unavailable for url: https://www.alleastafrica.com/2017/10/30/un-security-council-worried-s-sudan-amid-formations-53-new-rebel-groups/
URL filtered: http://www.bloomberg.com/news/articles/2016-08-25/chinese-takeovers-trigger-global-backlash-ahead-of-g-20-summit


Processing URLs:  31%|███       | 307/1000 [10:22<07:39,  1.51it/s]

Error extracting text from http://www.wsj.com/articles/opec-moves-to-rein-in-costs-1446561549: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-moves-to-rein-in-costs-1446561549


Processing URLs:  31%|███       | 309/1000 [10:25<12:42,  1.10s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014862/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014862/


Processing URLs:  31%|███       | 311/1000 [10:28<13:28,  1.17s/it]

URL filtered: https://twitter.com/disclosetv/status/1497276667991138311


Processing URLs:  31%|███▏      | 314/1000 [10:30<10:49,  1.06it/s]

Error extracting text from https://www.fbi.gov/services/cjis/ucr/publications#Crime-in%20the%20U.S.: 403 Client Error: Forbidden for url: https://www.fbi.gov/services/cjis/ucr/publications#Crime-in%20the%20U.S.


Processing URLs:  32%|███▏      | 315/1000 [10:31<08:42,  1.31it/s]

Error extracting text from https://www.nytimes.com/2018/02/23/opinion/benjamin-netanyahu-israel.html?rref=collection%2Fsectioncollection%2Fopinion: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/23/opinion/benjamin-netanyahu-israel.html?rref=collection%2Fsectioncollection%2Fopinion


Processing URLs:  32%|███▏      | 317/1000 [10:33<11:39,  1.02s/it]

Error extracting text from http://af.reuters.com/article/commoditiesNews/idAFL8N1652VA?pageNumber=2&amp;virtualBrandChannel=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  32%|███▏      | 320/1000 [10:34<06:05,  1.86it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-defense-idUSKBN13V13Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-defense-idUSKBN13V13Q
Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-idUSKCN1B918E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-idUSKCN1B918E


Processing URLs:  32%|███▎      | 325/1000 [10:42<10:16,  1.09it/s]

Error extracting text from https://www.wsj.com/articles/cia-creates-new-mission-center-to-turn-up-heat-on-iran-1496426232?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/cia-creates-new-mission-center-to-turn-up-heat-on-iran-1496426232?mod=e2fb


Processing URLs:  33%|███▎      | 330/1000 [11:48<3:36:30, 19.39s/it]

Error extracting text from http://www.einstein.yu.edu/faculty/484/nir-barzilai/: HTTPConnectionPool(host='www.einstein.yu.edu', port=80): Max retries exceeded with url: /faculty/484/nir-barzilai/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303fdc170>, 'Connection to www.einstein.yu.edu timed out. (connect timeout=60)'))


Processing URLs:  33%|███▎      | 331/1000 [11:50<2:37:11, 14.10s/it]

Error extracting text from http://www.presstv.com/Detail/2016/07/12/474793/US-letter-Obama-Iran-engagement: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2016/07/12/474793/US-letter-Obama-Iran-engagement


Processing URLs:  33%|███▎      | 334/1000 [11:53<58:42,  5.29s/it]  

Error extracting text from https://finance.yahoo.com/news/2-german-bond-yields-slip-152712811.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/2-german-bond-yields-slip-152712811.html
Error extracting text from http://www.nytimes.com/2016/07/01/us/politics/chris-christie-donald-trump.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/01/us/politics/chris-christie-donald-trump.html?_r=0


Processing URLs:  34%|███▎      | 335/1000 [11:54<46:15,  4.17s/it]

error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.foliomag.com/time-inc-quad-graphics-expand-printing-partnership/: Document is empty


Processing URLs:  34%|███▍      | 338/1000 [11:59<25:02,  2.27s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2016/01/06/33/0401000000AEN20160106004100315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  34%|███▍      | 339/1000 [11:59<18:13,  1.65s/it]

Error extracting text from https://www.nytimes.com/2017/12/08/health/next-flu-pandemic.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/08/health/next-flu-pandemic.html


Processing URLs:  34%|███▍      | 341/1000 [12:02<17:17,  1.57s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/crowds-march-against-coup-targeting-brazils-president-dilma-rousseff/articleshow/50214206.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/crowds-march-against-coup-targeting-brazils-president-dilma-rousseff/articleshow/50214206.cms


Processing URLs:  34%|███▍      | 343/1000 [12:04<16:16,  1.49s/it]

Error extracting text from http://en.trend.az/iran/politics/2499270.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2499270.html


Processing URLs:  35%|███▍      | 347/1000 [12:46<2:02:53, 11.29s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18849-rumour-mill-grinds-over-tatmadaw-nld-power-sharing-negotiations.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18849-rumour-mill-grinds-over-tatmadaw-nld-power-sharing-negotiations.html


Processing URLs:  35%|███▍      | 349/1000 [12:47<1:02:03,  5.72s/it]

Error extracting text from http://www.reuters.com/article/us-vietnam-china-protests-idUSKBN0UG0FA20160103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-vietnam-china-protests-idUSKBN0UG0FA20160103


Processing URLs:  35%|███▌      | 350/1000 [12:58<1:18:28,  7.24s/it]

URL filtered: http://www.bloomberg.com/view/articles/2016-06-02/greece-don-t-just-sit-there-undo-something


Processing URLs:  36%|███▋      | 363/1000 [13:32<29:31,  2.78s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-01-05/nyse-says-it-no-longer-plans-to-delist-china-telecom-companies?sref=BwmcOiPO


Processing URLs:  36%|███▋      | 365/1000 [13:33<18:41,  1.77s/it]

Error extracting text from https://cleantechnica.com/2016/11/30/lucid-motors-atieva-factory-400-mile-atvus-arizona/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/11/30/lucid-motors-atieva-factory-400-mile-atvus-arizona/


Processing URLs:  37%|███▋      | 366/1000 [13:33<14:44,  1.39s/it]

Error extracting text from https://www.nytimes.com/live/2022/03/30/world/ukraine-russia-war-news/russia-is-moving-some-troops-away-from-kyiv-a-ukrainian-official-says: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/03/30/world/ukraine-russia-war-news/russia-is-moving-some-troops-away-from-kyiv-a-ukrainian-official-says


Processing URLs:  38%|███▊      | 378/1000 [13:49<11:06,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13D1G4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13D1G4


Processing URLs:  38%|███▊      | 382/1000 [13:57<19:42,  1.91s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-16/kenya-s-elections-who-s-who-guide-to-the-main-political-players


Processing URLs:  38%|███▊      | 384/1000 [14:00<16:58,  1.65s/it]

Error extracting text from http://www.isn.ethz.ch/Digital-Library/Articles/Detail/?id=194072: 404 Client Error: Not found UA for url: https://css.ethz.ch/en/services.html


Processing URLs:  39%|███▊      | 386/1000 [14:02<14:45,  1.44s/it]

Error extracting text from https://www.nytimes.com/2021/01/14/world/middleeast/yemen-famine-houthis.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/14/world/middleeast/yemen-famine-houthis.html


Processing URLs:  39%|███▊      | 387/1000 [14:06<19:53,  1.95s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-01-03/the-year-that-makes-or-breaks-brexit


Processing URLs:  39%|███▉      | 390/1000 [14:08<12:18,  1.21s/it]

Error extracting text from https://www.asminternational.org/web/cmdnetwork/home/-/journal_content/56/10180/25655039/NEWS: 404 Client Error: Not Found for url: https://www.asminternational.org/cmdnetwork/home/-/journal_content/56/10180/25655039/NEWS
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0VV1FO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0VV1FO
URL filtered: https://www.youtube.com/watch?v=e1JyYSUD6Ac


Processing URLs:  39%|███▉      | 394/1000 [14:13<14:28,  1.43s/it]

Error extracting text from http://inserbia.info/today/2015/10/bulatovic-say-no-to-nato-they-owe-us-blood/: 404 Client Error: Not Found for url: https://inserbia.info/today/2015/10/bulatovic-say-no-to-nato-they-owe-us-blood/


Processing URLs:  40%|███▉      | 395/1000 [14:13<11:46,  1.17s/it]

Error extracting text from http://www.cdm.me/english/trans-adriatic-pipeline-socar-preparing-a-feasibility-studies-for-montenegro: 403 Client Error: Forbidden for url: https://www.cdm.me/english/trans-adriatic-pipeline-socar-preparing-a-feasibility-studies-for-montenegro


Processing URLs:  40%|████      | 401/1000 [14:21<12:22,  1.24s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-tutsi-army-officers-missions-abroad-choose-defect-instead-returning-home-1576703: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-tutsi-army-officers-missions-abroad-choose-defect-instead-returning-home-1576703


Processing URLs:  40%|████      | 403/1000 [14:24<13:57,  1.40s/it]

Error extracting text from http://www.cfr.org/economics/venezuelas-economic-fractures/p32853: 404 Client Error: Not Found for url: https://www.cfr.org/economics/venezuelas-economic-fractures/p32853


Processing URLs:  41%|████      | 406/1000 [14:27<09:56,  1.00s/it]

Error extracting text from http://www.nytimes.com/2016/06/29/us/politics/benghazi-report-hillary-clinton-house-committee.html?emc=edit_th_20160629&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/29/us/politics/benghazi-report-hillary-clinton-house-committee.html?emc=edit_th_20160629&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  41%|████      | 407/1000 [14:28<07:35,  1.30it/s]

Error extracting text from https://www.nytimes.com/2018/01/16/world/asia/taliban-red-unit-afghanistan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/16/world/asia/taliban-red-unit-afghanistan.html


Processing URLs:  41%|████      | 411/1000 [14:31<07:37,  1.29it/s]

Error extracting text from http://www.publicnow.com/view/75A45699F2A3F44C116C3859DA7AAED2F1639FBA?3810xxx1454662916: 403 Client Error: Forbidden for url: https://www.publicnow.com/view/75A45699F2A3F44C116C3859DA7AAED2F1639FBA?3810xxx1454662916


Processing URLs:  41%|████▏     | 413/1000 [14:33<07:15,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/rivalries-stall-push-to-retake-iraqi-city-1457311742: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/rivalries-stall-push-to-retake-iraqi-city-1457311742


Processing URLs:  41%|████▏     | 414/1000 [14:34<08:51,  1.10it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/iran-attend-qatar-oil-production-meeting-38459346: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/iran-attend-qatar-oil-production-meeting-38459346


Processing URLs:  42%|████▏     | 415/1000 [14:36<09:58,  1.02s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-election-scotland-idUKKBN17P08J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  42%|████▏     | 417/1000 [14:38<09:50,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-cyber-summit-northkorea-banks/banks-fearing-north-korea-hacking-prepare-defenses-cyber-experts-idUSKBN1D0320: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-summit-northkorea-banks/banks-fearing-north-korea-hacking-prepare-defenses-cyber-experts-idUSKBN1D0320
URL filtered: http://www.bloomberg.com/news/articles/2016-08-08/china-backlash-over-u-s-missile-shield-puts-north-asia-on-edge


Processing URLs:  42%|████▏     | 419/1000 [14:39<06:24,  1.51it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/09/22/venezuela-is-debt-swap-event-default-citgo-knows/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/09/22/venezuela-is-debt-swap-event-default-citgo-knows/


Processing URLs:  42%|████▏     | 421/1000 [14:40<06:50,  1.41it/s]

Error extracting text from http://iranfrontpage.com/views/2015/09/settlement-of-parchin-question-can-be-a-milestone-in-iran-iaea-ties/: 404 Client Error: Not Found for url: https://iranfrontpage.com/views/2015/09/settlement-of-parchin-question-can-be-a-milestone-in-iran-iaea-ties/


Processing URLs:  42%|████▎     | 425/1000 [14:44<08:32,  1.12it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.libertar.in/2016/03/crime-confirmado-dilma-monitorava-juiz.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.libertar.in/2016/03/crime-confirmado-dilma-monitorava-juiz.html&amp;prev=search


Processing URLs:  43%|████▎     | 432/1000 [14:52<08:55,  1.06it/s]

Error extracting text from http://mizzima.com/latest-news-politics-news-domestic/outgoing-myanmar-president-cancels-us-visit-monitor-political: 403 Client Error: Forbidden for url: http://mizzima.com/latest-news-politics-news-domestic/outgoing-myanmar-president-cancels-us-visit-monitor-political


Processing URLs:  44%|████▎     | 436/1000 [15:05<22:58,  2.44s/it]

Error extracting text from https://tass.com/russia/698665: 502 Server Error: Bad Gateway for url: https://tass.com/russia/698665


Processing URLs:  44%|████▎     | 437/1000 [15:09<26:39,  2.84s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-france-vaccines-idUSKBN29Y2OL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-france-vaccines-idUSKBN29Y2OL


Processing URLs:  44%|████▍     | 439/1000 [15:09<15:01,  1.61s/it]

Error extracting text from http://www.balkaninsight.com/en/article/media-bias-distorts-podgorica-protest-reports--10-15-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/media-bias-distorts-podgorica-protest-reports--10-15-2015


Processing URLs:  44%|████▍     | 442/1000 [15:11<09:24,  1.01s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/361519-top-bush-aide-gop-is-over-if-they-let-moore-serve-in-senate: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/361519-top-bush-aide-gop-is-over-if-they-let-moore-serve-in-senate/


Processing URLs:  45%|████▍     | 447/1000 [15:16<08:41,  1.06it/s]

Error extracting text from https://antonygreen.com.au/when-can-the-next-federal-election-be-held/: 403 Client Error: Forbidden for url: https://antonygreen.com.au/when-can-the-next-federal-election-be-held/


Processing URLs:  45%|████▌     | 454/1000 [15:23<07:46,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-cyberattack-sanctions-idUSKBN0KB16U20150102: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-cyberattack-sanctions-idUSKBN0KB16U20150102
URL filtered: https://www.bloomberg.com/news/videos/2018-01-18/trump-says-nafta-is-a-bad-joke-video


Processing URLs:  46%|████▌     | 457/1000 [16:23<2:13:24, 14.74s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2021-02-20/iran-spokesman-says-tehran-confident-about-lifting-of-us-sanctions-despite-wrangling: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  46%|████▌     | 460/1000 [16:29<1:00:25,  6.71s/it]

Error extracting text from http://www.reuters.com/article/2015/11/06/us-usa-election-carson-idUSKCN0SV2CT20151106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-usa-election-carson-idUSKCN0SV2CT20151106


Processing URLs:  46%|████▌     | 462/1000 [16:34<40:01,  4.46s/it]  

Error extracting text from http://www.reuters.com/article/us-eu-usa-ttip-idUSKCN12J10C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-ttip-idUSKCN12J10C


Processing URLs:  46%|████▋     | 463/1000 [16:35<32:14,  3.60s/it]

Error extracting text from http://www.cfr.org/greece/timeline-greeces-debt-crisis/p36451: 404 Client Error: Not Found for url: https://www.cfr.org/greece/timeline-greeces-debt-crisis/p36451


Processing URLs:  46%|████▋     | 465/1000 [16:37<19:37,  2.20s/it]

Error extracting text from http://www.scotland.com/currency/: 403 Client Error: Forbidden for url: https://www.scotland.com/currency/


Processing URLs:  47%|████▋     | 466/1000 [16:38<15:29,  1.74s/it]

Error extracting text from http://www.readingeagle.com/ap/article/china-takes-island-building-skills-to-dutertes-backyard: 404 Client Error: Not Found for url: https://www.readingeagle.com/ap/article/china-takes-island-building-skills-to-dutertes-backyard


Processing URLs:  47%|████▋     | 468/1000 [16:38<08:50,  1.00it/s]

Error extracting text from https://www.nytimes.com/2017/10/09/nyregion/amtrak-penn-station-derailments.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=photo-spot-region&amp;reg: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/09/nyregion/amtrak-penn-station-derailments.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=photo-spot-region&amp;reg


Processing URLs:  48%|████▊     | 475/1000 [16:53<11:12,  1.28s/it]

Error extracting text from http://www.arabnews.com/node/1206931/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1206931/middle-east


Processing URLs:  48%|████▊     | 479/1000 [16:56<06:54,  1.26it/s]

URL filtered: https://www.youtube.com/watch?v=xBhUiYSnwsE
Error extracting text from https://www.timesofisrael.com/poll-shows-center-left-could-block-netanyahu-coalition-in-next-election/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/poll-shows-center-left-could-block-netanyahu-coalition-in-next-election/


Processing URLs:  48%|████▊     | 480/1000 [16:58<07:37,  1.14it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-22/venezuela-sees-crude-in-mid-20s-if-opec-doesn-t-take-action


Processing URLs:  48%|████▊     | 483/1000 [17:04<15:36,  1.81s/it]

Error extracting text from https://www.globalhungerindex.org/designations.html: 403 Client Error: Forbidden for url: https://www.globalhungerindex.org/designations.html


Processing URLs:  48%|████▊     | 485/1000 [17:06<12:43,  1.48s/it]

Error extracting text from https://www.faa.gov/uas/model_aircraft/: 404 Client Error: Not Found for url: https://www.faa.gov/uas/model_aircraft/
Error extracting text from http://www.france24.com/en/20151013-task-force-lafayette-french-veterans-volunteer-fight-isis-islamic-state-group: 403 Client Error: Forbidden for url: http://www.france24.com/en/20151013-task-force-lafayette-french-veterans-volunteer-fight-isis-islamic-state-group
URL filtered: http://www.youtube.com/watch?v=wTXXcr2GEtc&amp;sns=em
URL filtered: https://www.bloomberg.com/news/articles/2018-02-21/netanyahu-legal-woes-worsen-as-ex-top-aide-agrees-to-testify


Processing URLs:  49%|████▉     | 490/1000 [17:10<09:30,  1.12s/it]

Error extracting text from http://www.business-standard.com/article/news-ani/isis-fighting-hard-to-keep-control-of-stronghold-mosul-116111900853_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ani/isis-fighting-hard-to-keep-control-of-stronghold-mosul-116111900853_1.html


Processing URLs:  50%|████▉     | 498/1000 [17:22<13:37,  1.63s/it]

Error extracting text from https://www.wsj.com/articles/amazon-other-tech-giants-could-be-forced-to-shed-assets-under-house-bill-11623423248: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/amazon-other-tech-giants-could-be-forced-to-shed-assets-under-house-bill-11623423248


Processing URLs:  50%|████▉     | 499/1000 [17:23<11:47,  1.41s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-idUKKBN17U0QR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  50%|█████     | 501/1000 [17:24<08:42,  1.05s/it]

Error extracting text from http://www.nytimes.com/2015/10/21/us/politics/lawmakers-wrangle-over-plans-to-avert-manage-or-embrace-default.html?emc=edit_th_20151021&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/21/us/politics/lawmakers-wrangle-over-plans-to-avert-manage-or-embrace-default.html?emc=edit_th_20151021&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  50%|█████     | 503/1000 [17:27<09:07,  1.10s/it]

Error extracting text from https://cmr.asm.org/content/29/3/695: 403 Client Error: Forbidden for url: https://cmr.asm.org/content/29/3/695


Processing URLs:  50%|█████     | 505/1000 [17:29<07:49,  1.05it/s]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN16G02Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN16G02Y


Processing URLs:  51%|█████     | 510/1000 [17:51<45:59,  5.63s/it]

Error extracting text from http://jer.pennpress.org/media/26167/sampleArt22.pdf: 522 Server Error:  for url: https://jer.pennpress.org/media/26167/sampleArt22.pdf


Processing URLs:  52%|█████▏    | 519/1000 [18:19<19:52,  2.48s/it]

Error extracting text from http://www.baltimoresun.com/news/maryland/baltimore-city/bs-md-ci-syed-cell-phone-motion-20150824-story.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/news/maryland/baltimore-city/bs-md-ci-syed-cell-phone-motion-20150824-story.html


Processing URLs:  52%|█████▏    | 524/1000 [18:32<15:34,  1.96s/it]

Error extracting text from http://fuelfix.com/blog/2015/10/01/senate-panel-advances-oil-exports-in-party-line-vote-signalling-trouble-ahead-for-bill/#35264101=0: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2015/10/01/senate-panel-advances-oil-exports-in-party-line-vote-signalling-trouble-ahead-for-bill/#35264101=0


Processing URLs:  52%|█████▎    | 525/1000 [18:33<12:26,  1.57s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-21/ukraine-sees-imf-aid-at-less-than-third-its-projection-for-2016


Processing URLs:  53%|█████▎    | 527/1000 [18:33<07:02,  1.12it/s]

Error extracting text from https://www.teslamotors.com/blog/secret-tesla-motors-master-plan-just-between-you-and-me: 403 Client Error: Forbidden for url: https://www.teslamotors.com/blog/secret-tesla-motors-master-plan-just-between-you-and-me


Processing URLs:  53%|█████▎    | 532/1000 [18:41<09:09,  1.17s/it]

Error extracting text from http://drones.fsd.ch/wp-content/uploads/2016/09/Drones-in-Humanitarian-Acion-Survey-Analysis-FINAL21.pdf: HTTPConnectionPool(host='drones.fsd.ch', port=80): Max retries exceeded with url: /wp-content/uploads/2016/09/Drones-in-Humanitarian-Acion-Survey-Analysis-FINAL21.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d8ec0>: Failed to resolve 'drones.fsd.ch' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  54%|█████▎    | 535/1000 [19:13<1:03:45,  8.23s/it]

Error extracting text from http://gas2.org/2016/07/11/breaking-news-another-tesla-model-x-crashes-autopilot/: 522 Server Error:  for url: https://gas2.org/2016/07/11/breaking-news-another-tesla-model-x-crashes-autopilot/


Processing URLs:  54%|█████▎    | 537/1000 [19:14<35:59,  4.66s/it]  

Error extracting text from http://www.nationmultimedia.com/breakingnews/Myanmar-presidential-nomination-date-moved-up-30280508.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/breakingnews/Myanmar-presidential-nomination-date-moved-up-30280508.html


Processing URLs:  54%|█████▍    | 538/1000 [19:18<35:19,  4.59s/it]

Error extracting text from http://www.asean.org/storage/2015/12/RCEP-Leaders-Joint-Statement_22-Nov-2015_FINAL.pdf: 404 Client Error: Not Found for url: https://asean.org/storage/2015/12/RCEP-Leaders-Joint-Statement_22-Nov-2015_FINAL.pdf


Processing URLs:  54%|█████▍    | 539/1000 [19:20<29:43,  3.87s/it]

error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.mexicostar.com/index.php/sid/247059303: Document is empty
URL filtered: https://www.youtube.com/watch?v=dsx2vdn7gpY
Error extracting text from http://www.nytimes.com/2015/10/01/us/politics/government-shutdown-congress.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/01/us/politics/government-shutdown-congress.html


Processing URLs:  54%|█████▍    | 543/1000 [19:21<09:48,  1.29s/it]

Error extracting text from http://www.wsj.com/articles/british-prime-minister-david-cameron-rules-out-another-vote-on-whether-to-leave-eu-1463492823: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/british-prime-minister-david-cameron-rules-out-another-vote-on-whether-to-leave-eu-1463492823


Processing URLs:  54%|█████▍    | 544/1000 [19:21<07:40,  1.01s/it]

Error extracting text from https://www.nytimes.com/2017/07/05/business/media/jeffrey-zucker-cnn-trump.html?emc=edit_mbe_20170706&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/05/business/media/jeffrey-zucker-cnn-trump.html?emc=edit_mbe_20170706&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;te=1&amp;_r=0


Processing URLs:  55%|█████▍    | 545/1000 [19:21<06:01,  1.26it/s]

Error extracting text from https://www.nytimes.com/2021/03/25/world/boris-johnson-hopes-britains-vaccine-success-will-vindicate-his-brexit-project.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/25/world/boris-johnson-hopes-britains-vaccine-success-will-vindicate-his-brexit-project.html


Processing URLs:  55%|█████▍    | 547/1000 [19:23<06:39,  1.13it/s]

Error extracting text from https://www.uber.com/about: 406 Client Error: Not Acceptable for url: https://www.uber.com/about


Processing URLs:  55%|█████▍    | 548/1000 [20:24<2:15:35, 18.00s/it]

Error extracting text from http://ourinsight.opinium.co.uk/survey-results/britons-and-europe: HTTPConnectionPool(host='ourinsight.opinium.co.uk', port=80): Max retries exceeded with url: /survey-results/britons-and-europe (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3051db890>, 'Connection to ourinsight.opinium.co.uk timed out. (connect timeout=60)'))


Processing URLs:  55%|█████▍    | 549/1000 [20:24<1:37:43, 13.00s/it]

Error extracting text from http://www.migrationpolicy.org/news/paradox-eu-turkey-refugee-deal: 403 Client Error: Forbidden for url: https://www.migrationpolicy.org/news/paradox-eu-turkey-refugee-deal


Processing URLs:  55%|█████▌    | 550/1000 [20:27<1:13:48,  9.84s/it]

Error extracting text from http://www.cedem.me/images/jDownloads_new/Program%20Empirijska%20istazivanja/Politicko%20javno%20mnjenje/CEDEM_decembar_2016_istrazivanje.pdf: 404 Client Error: Not Found for url: http://www.cedem.me/images/jDownloads_new/Program%20Empirijska%20istazivanja/Politicko%20javno%20mnjenje/CEDEM_decembar_2016_istrazivanje.pdf


Processing URLs:  56%|█████▌    | 560/1000 [20:51<25:36,  3.49s/it]  

Error extracting text from http://www.lfpress.com/2016/06/15/reports-of-eus-death-being-greatly-exaggerated: 403 Client Error: Forbidden for url: https://lfpress.com/


Processing URLs:  56%|█████▋    | 563/1000 [20:54<12:27,  1.71s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/314141-nd-governor-says-dakota-access-pipeline-will-likely-be-built: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/314141-nd-governor-says-dakota-access-pipeline-will-likely-be-built/


Processing URLs:  57%|█████▋    | 566/1000 [20:56<07:02,  1.03it/s]

Error extracting text from https://www.timesofisrael.com/israel-said-to-relax-some-gaza-restrictions-as-ceasefire-diplomacy-continues/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/israel-said-to-relax-some-gaza-restrictions-as-ceasefire-diplomacy-continues/


Processing URLs:  57%|█████▋    | 568/1000 [20:58<06:08,  1.17it/s]

Error extracting text from https://www.wsj.com/articles/oil-hits-11-week-high-on-u-s-stock-draws-1502358575: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-hits-11-week-high-on-u-s-stock-draws-1502358575


Processing URLs:  57%|█████▋    | 572/1000 [21:04<08:45,  1.23s/it]

Error extracting text from https://www.debka.com/us-strike-russians-laid-euphrates-bridge-among-targets/: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /us-strike-russians-laid-euphrates-bridge-among-targets/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  57%|█████▋    | 573/1000 [21:05<08:40,  1.22s/it]

Error extracting text from http://uk.reuters.com/article/uk-afghanistan-taliban-idUKKCN0X12CW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  57%|█████▊    | 575/1000 [21:08<10:13,  1.44s/it]

URL filtered: https://www.youtube.com/watch?v=jndwB7kq0qM


Processing URLs:  58%|█████▊    | 578/1000 [21:11<09:08,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-energy-nationalparks-idUSKBN14V1EP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-energy-nationalparks-idUSKBN14V1EP


Processing URLs:  58%|█████▊    | 581/1000 [21:13<06:25,  1.09it/s]

URL filtered: https://www.youtube.com/watch?v=rVfIhrHTs5M


Processing URLs:  58%|█████▊    | 583/1000 [22:13<1:26:19, 12.42s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/haiti/article56092280.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  58%|█████▊    | 584/1000 [23:13<2:38:47, 22.90s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2016-02-24/kerry-advises-against-hitting-iran-with-more-sanctions-now: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  58%|█████▊    | 585/1000 [23:15<2:03:08, 17.80s/it]

Error extracting text from https://in.reuters.com/article/kaspersky-cyber/kaspersky-ceo-says-he-would-leave-if-russia-asked-him-to-spy-idINKBN1DT0I0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
Error extracting text from http://www.reuters.com/article/us-poland-constitution-idUSKCN0X11EO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-constitution-idUSKCN0X11EO


Processing URLs:  59%|█████▊    | 587/1000 [23:15<1:13:18, 10.65s/it]

Error extracting text from https://www.fbi.gov/investigate/terrorism: 403 Client Error: Forbidden for url: https://www.fbi.gov/investigate/terrorism


Processing URLs:  59%|█████▉    | 590/1000 [23:16<33:04,  4.84s/it]  

Error extracting text from http://allafrica.com/stories/201604240179.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201604240179.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303776660>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  59%|█████▉    | 591/1000 [23:17<24:48,  3.64s/it]

Error extracting text from https://seekingalpha.com/news/3271056-time-warner-chief-confident-approval-and-t-buyout: 403 Client Error: Forbidden for url: https://seekingalpha.com/news/3271056-time-warner-chief-confident-approval-and-t-buyout


Processing URLs:  59%|█████▉    | 594/1000 [23:28<21:00,  3.10s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-tsipras-imf-idUSKBN0TQ2Q520151207#I8OwIR2mrletHgRa.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-tsipras-imf-idUSKBN0TQ2Q520151207#I8OwIR2mrletHgRa.97


Processing URLs:  60%|█████▉    | 599/1000 [23:35<08:42,  1.30s/it]

Error extracting text from http://belfercenter.ksg.harvard.edu/publication/26252/former_pentagon_official_michael_sulmeyer_joins_harvard_kennedy_schools_belfer_center_as_director_of_cyber_security_project.html: HTTPConnectionPool(host='belfercenter.ksg.harvard.edu', port=80): Max retries exceeded with url: /publication/26252/former_pentagon_official_michael_sulmeyer_joins_harvard_kennedy_schools_belfer_center_as_director_of_cyber_security_project.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304921400>: Failed to resolve 'belfercenter.ksg.harvard.edu' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  60%|██████    | 601/1000 [23:35<05:38,  1.18it/s]

Error extracting text from https://www.reuters.com/article/us-russia-politics/putin-approves-changes-allowing-him-to-stay-in-power-until-2036-idUSKBN20X1FD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-politics/putin-approves-changes-allowing-him-to-stay-in-power-until-2036-idUSKBN20X1FD


Processing URLs:  61%|██████    | 606/1000 [23:37<02:46,  2.36it/s]

Error extracting text from https://www.reuters.com/world/us/us-republican-report-says-coronavirus-leaked-chinese-lab-scientists-still-2021-08-02/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/us-republican-report-says-coronavirus-leaked-chinese-lab-scientists-still-2021-08-02/
Error extracting text from http://news.yahoo.com/surge-anti-american-hostility-iran-setback-jailed-us-125731631.html: 404 Client Error: Not Found for url: http://news.yahoo.com/surge-anti-american-hostility-iran-setback-jailed-us-125731631.html


Processing URLs:  61%|██████    | 607/1000 [23:39<05:02,  1.30it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-04/volkswagen-sued-by-u-s-for-cheating-on-emissions-standards


Processing URLs:  61%|██████▏   | 614/1000 [23:54<09:55,  1.54s/it]

Error extracting text from http://www.wsj.com/articles/brazilian-government-repays-14-billion-to-state-banks-1451513127: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazilian-government-repays-14-billion-to-state-banks-1451513127


Processing URLs:  62%|██████▏   | 620/1000 [24:06<09:57,  1.57s/it]

Error extracting text from https://www.wsj.com/livecoverage/trump-impeachment-house-biden/card/VYM8jPsFeIdBN13YQvNM: 403 Client Error: Forbidden for url: https://www.wsj.com/livecoverage/trump-impeachment-house-biden/card/VYM8jPsFeIdBN13YQvNM


Processing URLs:  62%|██████▎   | 625/1000 [24:11<05:21,  1.17it/s]

Error extracting text from http://www.gopusa.com/news/2015/10/10/donald-trumps-campaign-structure-is-for-real/: 403 Client Error: Forbidden for url: http://www.gopusa.com/news/2015/10/10/donald-trumps-campaign-structure-is-for-real/


Processing URLs:  63%|██████▎   | 628/1000 [24:17<07:54,  1.27s/it]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-syria-raqqa-20160604-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-syria-raqqa-20160604-snap-story.html


Processing URLs:  63%|██████▎   | 632/1000 [24:20<05:01,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-twc-cyberattack-idUSKBN0UL01P20160107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-twc-cyberattack-idUSKBN0UL01P20160107


Processing URLs:  64%|██████▎   | 636/1000 [24:25<05:52,  1.03it/s]

Error extracting text from http://www.washingtontimes.com/news/2017/apr/16/vladimir-putin-should-be-courted-by-trump-for-part/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/apr/16/vladimir-putin-should-be-courted-by-trump-for-part/


Processing URLs:  64%|██████▎   | 637/1000 [24:25<05:01,  1.20it/s]

Error extracting text from http://training.goodjudgment.com/keepingscore/index.html.&quot: HTTPConnectionPool(host='training.goodjudgment.com', port=80): Max retries exceeded with url: /keepingscore/index.html.&quot (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb56d0>: Failed to resolve 'training.goodjudgment.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  64%|██████▍   | 640/1000 [24:28<04:43,  1.27it/s]

Error extracting text from http://www.cnbc.com/2017/03/09/reuters-america-paths-open-to-new-pacific-trade-pact-post-tpp-chile-trade-head.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/03/09/reuters-america-paths-open-to-new-pacific-trade-pact-post-tpp-chile-trade-head.html


Processing URLs:  64%|██████▍   | 643/1000 [24:30<04:13,  1.41it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-bill-idUSKBN17Z0GZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-bill-idUSKBN17Z0GZ?il=0


Processing URLs:  64%|██████▍   | 644/1000 [24:31<04:34,  1.30it/s]

URL filtered: https://www.youtube.com/watch?v=0XyDbQ9rRB8


Processing URLs:  65%|██████▌   | 652/1000 [24:35<02:53,  2.01it/s]

Error extracting text from https://www.nytimes.com/2017/09/12/technology/apple-iphone-event.html?emc=edit_th_20170913&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/12/technology/apple-iphone-event.html?emc=edit_th_20170913&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  65%|██████▌   | 654/1000 [24:37<04:35,  1.26it/s]

Error extracting text from http://nationalinterest.org/blog/the-buzz/look-out-north-korean-nuclear-test-isolation-grows-14647: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/look-out-north-korean-nuclear-test-isolation-grows-14647


Processing URLs:  66%|██████▌   | 655/1000 [24:38<05:20,  1.08it/s]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-mosul-shiite-analysis-3abd2a3a-adbd-11e6-a31b-4b6397e625d0-20161118-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-mosul-shiite-analysis-3abd2a3a-adbd-11e6-a31b-4b6397e625d0-20161118-story.html


Processing URLs:  66%|██████▌   | 656/1000 [24:43<11:19,  1.98s/it]

URL filtered: https://twitter.com/brianlilley/status/1431761510062252033


Processing URLs:  66%|██████▌   | 658/1000 [24:44<07:50,  1.38s/it]

Error extracting text from https://missilethreat.csis.org/north-korea-tests-icbm-2/: 403 Client Error: Forbidden for url: https://missilethreat.csis.org/north-korea-tests-icbm-2/


Processing URLs:  66%|██████▋   | 664/1000 [24:54<08:15,  1.47s/it]

Error extracting text from https://warisboring.com/the-allied-force-fighting-to-liberate-mosul-is-tearing-itself-apart-6aaad154f707?mc_cid=fa8468ed77&amp;mc_eid=0467f21653#.1ozfpjr05: 403 Client Error: Forbidden for url: https://warisboring.com/the-allied-force-fighting-to-liberate-mosul-is-tearing-itself-apart-6aaad154f707?mc_cid=fa8468ed77&amp;mc_eid=0467f21653#.1ozfpjr05


Processing URLs:  67%|██████▋   | 669/1000 [24:59<04:58,  1.11it/s]

URL filtered: https://www.bloomberg.com/politics/videos/2017-12-12/eu-warns-u-k-about-reneging-on-brexit-agreements-video
Error extracting text from http://www.sandiegouniontribune.com/news/2016/feb/14/the-latest-iran-offers-to-help-syria-with-air/: 403 Client Error: Forbidden for url: https://www.sandiegouniontribune.com/news/2016/feb/14/the-latest-iran-offers-to-help-syria-with-air/


Processing URLs:  67%|██████▋   | 670/1000 [25:19<30:58,  5.63s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/once-adulated-greek-pm-tsipras-among-least-popular-chiefs/2016/11/30/8012b4b4-b737-11e6-939c-91749443c5e5_story.html?utm_term=.f934b2ec24c6: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/once-adulated-greek-pm-tsipras-among-least-popular-chiefs/2016/11/30/8012b4b4-b737-11e6-939c-91749443c5e5_story.html?utm_term=.f934b2ec24c6


Processing URLs:  67%|██████▋   | 671/1000 [25:22<26:54,  4.91s/it]

Error extracting text from http://www.praguemonitor.com/2016/04/19/czech-government-agrees-admitting-montenegro-nato: 404 Client Error: Not Found for url: https://praguemonitor.com/2016/04/19/czech-government-agrees-admitting-montenegro-nato


Processing URLs:  67%|██████▋   | 673/1000 [25:25<18:22,  3.37s/it]

Error extracting text from http://startbuyingstocks.com/buy-airbnb-stock-when-it-has-an-ipo/: 404 Client Error: Not Found for url: http://ww1.startbuyingstocks.com


Processing URLs:  68%|██████▊   | 675/1000 [25:29<13:50,  2.55s/it]

Error extracting text from http://www.newsweek.com/south-china-sea-dispute-everything-we-know-about-beijings-latest-569685?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/south-china-sea-dispute-everything-we-know-about-beijings-latest-569685?rx=us


Processing URLs:  68%|██████▊   | 676/1000 [25:30<11:02,  2.04s/it]

Error extracting text from http://uk.reuters.com/article/2015/09/19/uk-iran-nuclear-france-idUKKCN0RJ08N20150919: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  68%|██████▊   | 683/1000 [25:39<07:37,  1.44s/it]

Error extracting text from http://www.ibtimes.com/king-bhumibol-funeral-update-thai-crown-prince-maha-vajiralongkorn-appoints-princess-2433257: 403 Client Error: Forbidden for url: https://www.ibtimes.com/king-bhumibol-funeral-update-thai-crown-prince-maha-vajiralongkorn-appoints-princess-2433257


Processing URLs:  68%|██████▊   | 684/1000 [25:39<05:41,  1.08s/it]

Error extracting text from https://www.khaama.com/parliament-delays-voting-for-presidential-decree-regarding-electoral-reforms-01226: 403 Client Error: Forbidden for url: https://www.khaama.com/parliament-delays-voting-for-presidential-decree-regarding-electoral-reforms-01226


Processing URLs:  68%|██████▊   | 685/1000 [25:40<04:46,  1.10it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/256167-report-biden-plans-family-conversation-on-2016: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/256167-report-biden-plans-family-conversation-on-2016/


Processing URLs:  69%|██████▊   | 687/1000 [25:42<05:04,  1.03it/s]

Error extracting text from https://www.nytimes.com/2021/10/22/world/australia/new-zealand-vaccination-gangs.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/10/22/world/australia/new-zealand-vaccination-gangs.html


Processing URLs:  69%|██████▉   | 690/1000 [25:48<06:23,  1.24s/it]

Error extracting text from https://www.sandiegouniontribune.com/news/nation-world/story/2021-06-11/explainer-what-will-change-under-israels-new-government: 403 Client Error: Forbidden for url: https://www.sandiegouniontribune.com/news/nation-world/story/2021-06-11/explainer-what-will-change-under-israels-new-government


Processing URLs:  70%|██████▉   | 695/1000 [25:56<07:07,  1.40s/it]

Error extracting text from http://www.nytimes.com/2015/10/26/business/international/volkswagen-investigation-focus-to-include-managers-who-turned-a-blind-eye.html?emc=edit_th_20151026&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/26/business/international/volkswagen-investigation-focus-to-include-managers-who-turned-a-blind-eye.html?emc=edit_th_20151026&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  70%|██████▉   | 698/1000 [26:03<10:20,  2.06s/it]

Error extracting text from http://thebulletinpanama.com/2015/11/cracks-under-repair-in-canal-expansion/: 404 Client Error: Not Found for url: https://thebulletinpanama.com/2015/11/cracks-under-repair-in-canal-expansion/


Processing URLs:  70%|██████▉   | 699/1000 [26:03<08:26,  1.68s/it]

Error extracting text from https://global.handelsblatt.com/opinion/lies-damned-lies-and-political-epidemics-569208: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/opinion/lies-damned-lies-and-political-epidemics-569208


Processing URLs:  71%|███████   | 708/1000 [26:17<06:19,  1.30s/it]

Error extracting text from http://news.yahoo.com/chinas-pacific-actions-galvanize-neighbors-against-pentagon-chief-200337819.html: 404 Client Error: Not Found for url: http://news.yahoo.com/chinas-pacific-actions-galvanize-neighbors-against-pentagon-chief-200337819.html


Processing URLs:  71%|███████   | 710/1000 [26:19<04:26,  1.09it/s]

Error extracting text from http://www.nytimes.com/2015/09/18/world/finger-pointing-but-few-answers-after-a-syria-solution-fails.html?_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/18/world/finger-pointing-but-few-answers-after-a-syria-solution-fails.html?_r=1


Processing URLs:  71%|███████   | 711/1000 [26:20<05:11,  1.08s/it]

Error extracting text from http://armscontrolcenter.org/the-real-facts-on-the-iran-nuclear-negotiations/: 403 Client Error: Forbidden for url: http://armscontrolcenter.org/the-real-facts-on-the-iran-nuclear-negotiations/


Processing URLs:  72%|███████▏  | 719/1000 [26:31<04:15,  1.10it/s]

Error extracting text from https://thehill.com/policy/national-security/553958-fbi-reclassifies-2017-baseball-field-shooting-as-domestic-terror: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/553958-fbi-reclassifies-2017-baseball-field-shooting-as-domestic-terror/


Processing URLs:  72%|███████▏  | 722/1000 [26:37<06:23,  1.38s/it]

Error extracting text from http://www.jstor.org/stable/2110937: 420 Client Error: Enhance Your Calm for url: http://www.jstor.org/stable/2110937


Processing URLs:  72%|███████▏  | 723/1000 [26:38<05:14,  1.14s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/iran-undertook-nuclear-bomb-studies-before-2009-iaea-finds


Processing URLs:  73%|███████▎  | 727/1000 [26:52<13:28,  2.96s/it]

Error extracting text from http://www.cfr.org/china/china-north-korea-relationship/p11097: 404 Client Error: Not Found for url: https://www.cfr.org/china/china-north-korea-relationship/p11097


Processing URLs:  73%|███████▎  | 728/1000 [26:52<10:48,  2.38s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-north-carolina-senate-burr-vs-ross: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-north-carolina-senate-burr-vs-ross


Processing URLs:  73%|███████▎  | 730/1000 [26:54<07:12,  1.60s/it]

Error extracting text from http://www.puntapacificarealty.com/panama-news/panama-canal-2014-expansion/: 403 Client Error: Forbidden for url: http://www.puntapacificarealty.com/panama-news/panama-canal-2014-expansion/
URL filtered: https://twitter.com/BraddJaffy/status/854527303635927040


Processing URLs:  74%|███████▎  | 737/1000 [27:06<05:25,  1.24s/it]

Error extracting text from http://www.wsj.com/articles/saudi-arabia-cuts-oil-prices-amid-opec-price-war-1443967741: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/saudi-arabia-cuts-oil-prices-amid-opec-price-war-1443967741


Processing URLs:  75%|███████▍  | 748/1000 [27:36<07:44,  1.84s/it]

Error extracting text from http://the-japan-news.com/news/article/0002881346: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002881346


Processing URLs:  75%|███████▌  | 754/1000 [27:55<12:13,  2.98s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN1A104O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN1A104O


Processing URLs:  76%|███████▌  | 755/1000 [27:56<09:50,  2.41s/it]

Error extracting text from https://texasnewstoday.com/oil-companies-cut-production-in-the-gulf-of-mexico-by-91-ahead-of-hurricane-ida/434835/: 404 Client Error: Not Found for url: https://texasnewstoday.com/oil-companies-cut-production-in-the-gulf-of-mexico-by-91-ahead-of-hurricane-ida/434835/


Processing URLs:  76%|███████▌  | 756/1000 [27:56<07:23,  1.82s/it]

Error extracting text from http://www.iranwatch.org/sanctions: 403 Client Error: Forbidden for url: https://www.iranwatch.org/sanctions


Processing URLs:  76%|███████▌  | 758/1000 [27:59<05:48,  1.44s/it]

Error extracting text from http://www.nytimes.com/2016/02/25/world/europe/refugees-migrants-austria-greece.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/25/world/europe/refugees-migrants-austria-greece.html?_r=0


Processing URLs:  76%|███████▌  | 759/1000 [27:59<05:04,  1.26s/it]

Error extracting text from https://insideevs.com/global-insights-electric-car-race-automakers-companies/: 410 Client Error: Gone for url: https://insideevs.com/news/336313/global-insights-the-electric-car-race-automakers-and-countries/


Processing URLs:  76%|███████▌  | 760/1000 [28:03<07:44,  1.94s/it]

Error extracting text from https://sscaitournament.com/: 523 Server Error:  for url: https://sscaitournament.com/


Processing URLs:  76%|███████▌  | 761/1000 [28:04<07:05,  1.78s/it]

Error extracting text from http://www.news24.com.ng/National/News/war-waged-against-fulani-herdsmen-20160705: HTTPConnectionPool(host='www.news24.com.ng', port=80): Max retries exceeded with url: /National/News/war-waged-against-fulani-herdsmen-20160705 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3056860f0>: Failed to resolve 'www.news24.com.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  76%|███████▋  | 764/1000 [28:16<09:58,  2.54s/it]

Error extracting text from http://www.oxforddictionaries.com/us/definition/american_english/will: 403 Client Error: Forbidden for url: https://languages.oup.com/


Processing URLs:  77%|███████▋  | 767/1000 [28:19<05:47,  1.49s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/09/04/0401000000AEN20150904000300315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  77%|███████▋  | 770/1000 [28:43<24:18,  6.34s/it]

Error extracting text from https://www.washingtonpost.com/politics/national-democratic-groups-bailing-on-ohio-senate-race/2016/08/30/68d985b8-6ec4-11e6-993f-73c693a89820_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/national-democratic-groups-bailing-on-ohio-senate-race/2016/08/30/68d985b8-6ec4-11e6-993f-73c693a89820_story.html


Processing URLs:  77%|███████▋  | 772/1000 [28:43<12:14,  3.22s/it]

Error extracting text from http://www.nytimes.com/2015/11/24/health/a-move-closer-to-total-disappearance-of-polio.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/24/health/a-move-closer-to-total-disappearance-of-polio.html?_r=0


Processing URLs:  77%|███████▋  | 774/1000 [28:45<07:55,  2.10s/it]

Error extracting text from http://www.gametheory.net/dictionary/WeaklyDominantStrategy.html: 406 Client Error: Not Acceptable for url: http://www.gametheory.net/dictionary/WeaklyDominantStrategy.html


Processing URLs:  78%|███████▊  | 777/1000 [28:50<05:28,  1.47s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/official_texts_17120.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/official_texts_17120.htm


Processing URLs:  78%|███████▊  | 780/1000 [28:55<06:11,  1.69s/it]

Error extracting text from https://www.confidencial.com.ni/politica/gabriel-alvarez-recomienda-a-la-oposicion-proponer-cambios-de-magistrados-en-el-cse/: 403 Client Error: Forbidden for url: https://www.confidencial.digital/politica/gabriel-alvarez-recomienda-a-la-oposicion-proponer-cambios-de-magistrados-en-el-cse/


Processing URLs:  78%|███████▊  | 782/1000 [28:58<06:34,  1.81s/it]

Error extracting text from http://politicalticker.blogs.cnn.com/2007/09/25/clinton-widens-lead-over-obama-in-new-hampshire/: 410 Client Error: Gone for url: http://politicalticker.blogs.cnn.com/2007/09/25/clinton-widens-lead-over-obama-in-new-hampshire/


Processing URLs:  79%|███████▊  | 787/1000 [29:09<07:48,  2.20s/it]

Error extracting text from https://www.reuters.com/article/us-china-congress-wang/chinas-xi-looks-set-to-keep-right-hand-man-on-despite-age-idUSKBN1CG0JI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-congress-wang/chinas-xi-looks-set-to-keep-right-hand-man-on-despite-age-idUSKBN1CG0JI
URL filtered: https://www.youtube.com/watch?v=2MhJxx6A7dE


Processing URLs:  79%|███████▉  | 789/1000 [29:10<05:27,  1.55s/it]

Error extracting text from https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Risikogebiete_neu.html/: 404 Client Error: Not Found for url: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Risikogebiete_neu.html/


Processing URLs:  79%|███████▉  | 790/1000 [29:11<04:29,  1.28s/it]

Error extracting text from http://seekingalpha.com/article/4012053-nextev-atieva-faraday-future-finalize-development-tesla-competitors: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4012053-nextev-atieva-faraday-future-finalize-development-tesla-competitors


Processing URLs:  79%|███████▉  | 792/1000 [29:12<03:02,  1.14it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/reinaldo/geral/o-nome-da-crise-e-dilma-o-sobrenome-e-pt/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/reinaldo/geral/o-nome-da-crise-e-dilma-o-sobrenome-e-pt/&amp;prev=search
Error extracting text from http://ajw.asahi.com/article/behind_news/politics/AJ201601230027: HTTPConnectionPool(host='ajw.asahi.com', port=80): Max retries exceeded with url: /article/behind_news/politics/AJ201601230027 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3073ce690>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▉  | 794/1000 [29:12<01:52,  1.84it/s]

Error extracting text from http://www.reuters.com/article/us-colombia-rebels-idUSKCN0VS01Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-colombia-rebels-idUSKCN0VS01Z


Processing URLs:  80%|███████▉  | 797/1000 [29:18<04:42,  1.39s/it]

Error extracting text from https://www.foreignaffairs.com/articles/syria/2016-01-19/assad-has-it-his-way: 500 Server Error: Internal Server Error for url: https://www.foreignaffairs.com/articles/syria/2016-01-19/assad-has-it-his-way


Processing URLs:  80%|████████  | 800/1000 [29:21<03:40,  1.10s/it]

Error extracting text from https://www.france24.com/en/live-news/20210702-israeli-settlers-leave-west-bank-outpost-after-govt-deal: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210702-israeli-settlers-leave-west-bank-outpost-after-govt-deal


Processing URLs:  80%|████████  | 802/1000 [29:24<03:59,  1.21s/it]

Error extracting text from http://www.europarl.europa.eu/sides/getDoc.do?type=AMD&amp;reference=B8-2016-0309&amp;format=PDF&amp;language=EN&amp;secondRef=002-002: 404 Client Error:  for url: https://www.europarl.europa.eu/sides/getDoc.do?type=AMD&amp;reference=B8-2016-0309&amp;format=PDF&amp;language=EN&amp;secondRef=002-002


Processing URLs:  81%|████████  | 806/1000 [29:29<03:21,  1.04s/it]

Error extracting text from https://www.wsj.com/articles/goldmans-venezuela-bond-trade-wasnt-reviewed-by-top-executives-1496411586: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/goldmans-venezuela-bond-trade-wasnt-reviewed-by-top-executives-1496411586


Processing URLs:  81%|████████  | 808/1000 [29:30<02:36,  1.23it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN16405J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN16405J


Processing URLs:  81%|████████  | 811/1000 [29:31<01:25,  2.21it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-07/pdvsa-extends-debt-swap-deadline-on-low-bondholder-participation
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-davos-exclusive-idUSKCN0V10BO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-davos-exclusive-idUSKCN0V10BO
Error extracting text from http://www.nigeriatoday.ng/2016/06/again-suspected-fulani-herders-kill-farmer-in-delta/: HTTPConnectionPool(host='www.nigeriatoday.ng', port=80): Max retries exceeded with url: /2016/06/again-suspected-fulani-herders-kill-farmer-in-delta/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d94f0>: Failed to resolve 'www.nigeriatoday.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  81%|████████▏ | 814/1000 [29:32<01:01,  3.03it/s]

Error extracting text from https://www.khaama.com/afghanistan-likely-to-get-more-attack-helicopters-and-artillery-from-india-01859: 403 Client Error: Forbidden for url: https://www.khaama.com/afghanistan-likely-to-get-more-attack-helicopters-and-artillery-from-india-01859


Processing URLs:  82%|████████▏ | 815/1000 [29:32<01:29,  2.07it/s]

Error extracting text from http://www.econtalk.org/archives/2016/04/richard_jones_o.html: 403 Client Error: Forbidden for url: http://www.econtalk.org/archives/2016/04/richard_jones_o.html


Processing URLs:  82%|████████▏ | 816/1000 [29:33<01:37,  1.89it/s]

Error extracting text from http://csbcorrespondent.com/market-update-november-18-2015: 403 Client Error: Forbidden for url: https://www.southstatecorrespondent.com


Processing URLs:  82%|████████▏ | 818/1000 [29:39<04:13,  1.39s/it]

Error extracting text from http://www.wsj.com/articles/iran-oil-minister-refuses-saudi-demand-to-curb-crude-output-1459696604: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iran-oil-minister-refuses-saudi-demand-to-curb-crude-output-1459696604


Processing URLs:  82%|████████▏ | 820/1000 [29:40<02:56,  1.02it/s]

Error extracting text from http://www.econtalk.org/archives/2007/07/bueno_de_mesqui.html: 403 Client Error: Forbidden for url: http://www.econtalk.org/archives/2007/07/bueno_de_mesqui.html


Processing URLs:  82%|████████▏ | 821/1000 [29:44<06:02,  2.02s/it]

Error extracting text from http://www.reuters.com/article/2015/09/17/us-iran-nuclear-congress-idUSKCN0RF2VX20150917: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/17/us-iran-nuclear-congress-idUSKCN0RF2VX20150917


Processing URLs:  82%|████████▎ | 825/1000 [29:49<04:23,  1.51s/it]

Error extracting text from https://www.usni.org/selected-writings-admiral-james-g-stavridis-us-navy: 403 Client Error: Forbidden for url: https://www.usni.org/selected-writings-admiral-james-g-stavridis-us-navy


Processing URLs:  83%|████████▎ | 830/1000 [29:55<02:56,  1.04s/it]

Error extracting text from https://cruxnow.com/global-church/2017/08/20/20-million-people-conflict-added-drought-means-no-food-eat/: 404 Client Error: Not Found for url: https://cruxnow.com/global-church/2017/08/20/20-million-people-conflict-added-drought-means-no-food-eat


Processing URLs:  83%|████████▎ | 833/1000 [29:59<03:25,  1.23s/it]

Error extracting text from http://m.therepublic.com/view/story/608f853b2110494db5acc0061c40528a/ML--Iran: 404 Client Error: Not Found for url: http://m.therepublic.com/view/story/608f853b2110494db5acc0061c40528a/ML--Iran


Processing URLs:  84%|████████▎ | 836/1000 [30:05<04:14,  1.55s/it]

Error extracting text from http://www.wsj.com/articles/san-francisco-feds-williams-sees-rate-increase-this-year-if-risks-dissipate-1441658566: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/san-francisco-feds-williams-sees-rate-increase-this-year-if-risks-dissipate-1441658566


Processing URLs:  84%|████████▍ | 838/1000 [30:07<03:25,  1.27s/it]

Error extracting text from http://www.c-span.org/video/?404436-1/james-clapper-testimony-global-threats: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?404436-1/james-clapper-testimony-global-threats


Processing URLs:  84%|████████▍ | 841/1000 [30:10<02:59,  1.13s/it]

URL filtered: https://www.youtube.com/watch?v=ZKbPPuaHnxw


Processing URLs:  85%|████████▍ | 847/1000 [30:21<04:27,  1.75s/it]

Error extracting text from http://zeenews.india.com/sports/cricket/icc-world-twenty20-2016/twenty20-world-cup-india-favorites-to-win-ms-dhoni-best-at-handling-pressure-says-kumar-sangakkara_1850187.html: 403 Client Error: Forbidden for url: https://zeenews.india.com/sports/cricket/icc-world-twenty20-2016/twenty20-world-cup-india-favorites-to-win-ms-dhoni-best-at-handling-pressure-says-kumar-sangakkara_1850187.html


Processing URLs:  85%|████████▌ | 850/1000 [31:25<47:10, 18.87s/it]

Error extracting text from https://www.aa.com.tr/en/economy/german-retailer-aldi-nord-to-raise-prices-by-20-50-on-monday/2554218: HTTPSConnectionPool(host='www.aa.com.tr', port=443): Read timed out. (read timeout=60)


Processing URLs:  85%|████████▌ | 851/1000 [31:27<34:44, 13.99s/it]

Error extracting text from http://mathbits.com/MathBits/TISection/Statistics2/correlation.htm: 404 Client Error: Not Found for url: https://www.mathbits.com/MathBits/TISection/Statistics2/correlation.htm


Processing URLs:  86%|████████▌ | 856/1000 [31:39<09:37,  4.01s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/19/saturday-afternoon-update/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/19/saturday-afternoon-update/


Processing URLs:  86%|████████▌ | 858/1000 [31:40<05:33,  2.35s/it]

Error extracting text from http://www.world-nuclear.org/info/Nuclear-Fuel-Cycle/Transport/Transport-of-Radioactive-Materials/: 404 Client Error: Not Found for url: https://www.world-nuclear.org/info/Nuclear-Fuel-Cycle/Transport/Transport-of-Radioactive-Materials/
Error extracting text from http://www.scientificamerican.com/article/horn-of-africa-grows-hotter-and-drier/: 403 Client Error: Forbidden for url: http://www.scientificamerican.com/article/horn-of-africa-grows-hotter-and-drier/


Processing URLs:  86%|████████▌ | 860/1000 [31:41<03:14,  1.39s/it]

Error extracting text from https://www.nytimes.com/2017/02/03/us/politics/iran-sanctions-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/03/us/politics/iran-sanctions-trump.html


Processing URLs:  86%|████████▌ | 861/1000 [31:42<02:24,  1.04s/it]

Error extracting text from http://news.yahoo.com/venezuelan-opposition-candidate-shot-dead-during-campaign-event-042049148.html: 404 Client Error: Not Found for url: http://news.yahoo.com/venezuelan-opposition-candidate-shot-dead-during-campaign-event-042049148.html


Processing URLs:  86%|████████▋ | 864/1000 [31:45<02:10,  1.04it/s]

Error extracting text from http://www.reuters.com/article/2015/10/31/greece-nbg-results-idUSL8N12V0Y920151031: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/31/greece-nbg-results-idUSL8N12V0Y920151031


Processing URLs:  86%|████████▋ | 865/1000 [31:46<02:04,  1.09it/s]

Error extracting text from http://in.reuters.com/article/india-australia-t20-sydney-raina-yuvraj-idINKCN0V90JE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  87%|████████▋ | 869/1000 [31:50<02:04,  1.06it/s]

Error extracting text from http://www.latimes.com/business/autos/la-fi-autos-tesla-earnings-20160803-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/autos/la-fi-autos-tesla-earnings-20160803-snap-story.html


Processing URLs:  88%|████████▊ | 881/1000 [32:07<02:52,  1.45s/it]

Error extracting text from http://www.theage.com.au/business/markets/opec-close-to-crushing-us-oil-boom-20151021-gkf7v8.html#ixzz3pGIDAfws: 404 Client Error: Not Found for url: https://www.theage.com.au/business/markets/opec-close-to-crushing-us-oil-boom-20151021-gkf7v8.html#ixzz3pGIDAfws
URL filtered: https://www.bloomberg.com/news/articles/2018-02-23/merkel-upsets-just-about-everyone-with-latest-eu-refugee-plan


Processing URLs:  88%|████████▊ | 884/1000 [32:08<01:35,  1.21it/s]

Error extracting text from http://www.saarc-sec.org/SAARC-Summit/7/: 404 Client Error: Not Found for url: https://www.saarc-sec.org/SAARC-Summit/7/


Processing URLs:  89%|████████▊ | 886/1000 [32:12<02:27,  1.29s/it]

Error extracting text from http://www.thejewishweek.com/editorial-opinion/editorial/iran-testing-us#eFVI9rTRd55FAsUV.99: 404 Client Error: Not Found for url: http://www.thejewishweek.com/editorial-opinion/editorial/iran-testing-us#eFVI9rTRd55FAsUV.99


Processing URLs:  89%|████████▉ | 890/1000 [32:15<01:45,  1.04it/s]

Error extracting text from http://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y31G20151209: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y31G20151209


Processing URLs:  90%|████████▉ | 897/1000 [32:23<02:02,  1.19s/it]

URL filtered: https://www.recode.net/2017/10/21/16512414/apple-amazon-facebook-google-tech-congress-lobbying-2017-russia-sex-trafficking-daca


Processing URLs:  90%|████████▉ | 899/1000 [32:24<01:27,  1.16it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56552#.WPxUS4jyuUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56552#.WPxUS4jyuUk


Processing URLs:  90%|█████████ | 901/1000 [32:26<01:18,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN16K1FH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN16K1FH?il=0


Processing URLs:  90%|█████████ | 904/1000 [32:33<02:36,  1.63s/it]

Error extracting text from http://uk.reuters.com/article/uk-colombia-peace-idUKKBN13A1Y3?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  91%|█████████ | 907/1000 [32:35<01:29,  1.04it/s]

Error extracting text from https://www.middleeastmonitor.com/news/americas/22884-un-chief-syria-peace-must-not-be-dependent-on-assad: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/news/americas/22884-un-chief-syria-peace-must-not-be-dependent-on-assad


Processing URLs:  91%|█████████ | 912/1000 [32:45<02:26,  1.66s/it]

Error extracting text from http://mobile.nytimes.com/2015/10/22/us/politics/joe-biden-will-not-run-for-president.html?referer=http://www.google.com.au/search?q=biden&amp;oq=bid&amp;gs_l=mobile-heirloom-serp.1.2.0i131l3j0l2.1812.2770.0.4810.4.3.0.1.1.0.358.1011.3-3.3.0....0...1c.1.34.mobile-heirloom-serp..0.4.1056.wpZE4BZjnWM: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/10/22/us/politics/joe-biden-will-not-run-for-president.html?referer=http://www.google.com.au/search%3fq=biden&amp;oq=bid&amp;gs_l=mobile-heirloom-serp.1.2.0i131l3j0l2.1812.2770.0.4810.4.3.0.1.1.0.358.1011.3-3.3.0....0...1c.1.34.mobile-heirloom-serp..0.4.1056.wpZE4BZjnWM


Processing URLs:  91%|█████████▏| 913/1000 [32:46<02:22,  1.64s/it]

Error extracting text from http://www.newsweek.com/dr-congo-how-do-you-solve-problem-kabila-428152: 403 Client Error: Forbidden for url: https://www.newsweek.com/dr-congo-how-do-you-solve-problem-kabila-428152


Processing URLs:  92%|█████████▏| 922/1000 [33:33<08:53,  6.84s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=similar&amp;id=marvel2016.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=similar&amp;id=marvel2016.htm


Processing URLs:  92%|█████████▏| 924/1000 [33:42<06:40,  5.27s/it]

Error extracting text from https://www.washingtonpost.com/politics/portmans-ohio-campaign-a-sharp-contrast-to-trumps/2016/07/16/35b442c4-4b4f-11e6-8dac-0c6e4accc5b1_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/portmans-ohio-campaign-a-sharp-contrast-to-trumps/2016/07/16/35b442c4-4b4f-11e6-8dac-0c6e4accc5b1_story.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-talks-idUSKCN0VJ0WZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-talks-idUSKCN0VJ0WZ


Processing URLs:  92%|█████████▎| 925/1000 [33:42<04:43,  3.79s/it]

Error extracting text from https://www.wsj.com/articles/u-s-and-china-avert-potential-clash-in-u-n-over-myanmar-representation-11632234248: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-and-china-avert-potential-clash-in-u-n-over-myanmar-representation-11632234248


Processing URLs:  93%|█████████▎| 926/1000 [33:43<03:32,  2.87s/it]

Error extracting text from http://www.bostonherald.com/news/international/2017/03/putin_hails_russia_turkey_ties_as_he_hosts_syria_talks: 404 Client Error: Not Found for url: https://www.bostonherald.com/news/international/2017/03/putin_hails_russia_turkey_ties_as_he_hosts_syria_talks


Processing URLs:  93%|█████████▎| 928/1000 [33:45<02:14,  1.87s/it]

Error extracting text from http://tass.com/pressreview/921098: 502 Server Error: Bad Gateway for url: https://tass.com/pressreview/921098
Error extracting text from http://www.reuters.com/article/us-eurozone-greece-idUSKCN18C1XG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-idUSKCN18C1XG


Processing URLs:  94%|█████████▍| 938/1000 [34:00<00:49,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-afghanistan-nominees-idUSKCN0VX29L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-nominees-idUSKCN0VX29L
URL filtered: http://www.bloomberg.com/news/articles/2016-04-10/peru-votes-in-presidential-election-as-fujimori-leads-polls


Processing URLs:  94%|█████████▍| 941/1000 [34:01<00:25,  2.29it/s]

Error extracting text from https://ics-cert.us-cert.gov/alerts/ICS-ALERT-14-281-01B: 403 Client Error: Forbidden for url: https://ics-cert.us-cert.gov/alerts/ICS-ALERT-14-281-01B


Processing URLs:  94%|█████████▍| 942/1000 [34:01<00:22,  2.60it/s]

Error extracting text from https://www.nytimes.com/2021/06/23/science/coronavirus-sequences.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/23/science/coronavirus-sequences.html


Processing URLs:  94%|█████████▍| 945/1000 [34:04<00:42,  1.30it/s]

Error extracting text from http://www.pcacases.com/web/view/7: 406 Client Error: Not Acceptable for url: http://www.pcacases.com/web/view/7


Processing URLs:  95%|█████████▍| 946/1000 [34:09<01:35,  1.78s/it]

Error extracting text from http://38north.org/2016/07/rcarlin071216/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  95%|█████████▍| 948/1000 [34:10<00:59,  1.15s/it]

Error extracting text from http://www.smh.com.au/business/cheap-oil-to-stay-as-opec-confronts-oversupply-20151125-gl7fho.html#ixzz3slslq1x7: 404 Client Error: Not Found for url: https://www.smh.com.au/business/cheap-oil-to-stay-as-opec-confronts-oversupply-20151125-gl7fho.html#ixzz3slslq1x7


Processing URLs:  95%|█████████▍| 949/1000 [34:12<01:14,  1.45s/it]

Error extracting text from http://images.akamai.steamusercontent.com/ugc/577947268901721458/B9CAD85EEFB9DC90256CB237BFEF7CC5E75A0B60/: 404 Client Error: Not Found for url: http://images.akamai.steamusercontent.com/ugc/577947268901721458/B9CAD85EEFB9DC90256CB237BFEF7CC5E75A0B60/


Processing URLs:  95%|█████████▌| 952/1000 [34:16<01:09,  1.45s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Putin-s-draw-not-so-even?page=1: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Putin-s-draw-not-so-even?page=1


Processing URLs:  95%|█████████▌| 953/1000 [34:17<01:02,  1.33s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/brazil-court-rule-trial-speaker-lower-house-37374556: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/brazil-court-rule-trial-speaker-lower-house-37374556


Processing URLs:  96%|█████████▌| 955/1000 [34:19<00:47,  1.05s/it]

Error extracting text from https://news.google.com/newspapers?nid=1755&amp;dat=19800208&amp;id=Zk80AAAAIBAJ&amp;sjid=zWcEAAAAIBAJ&amp;pg=6122,3088711: 404 Client Error: Not Found for url: https://news.google.com/newspapers?nid=1755&amp;dat=19800208&amp;id=Zk80AAAAIBAJ&amp;sjid=zWcEAAAAIBAJ&amp;pg=6122,3088711


Processing URLs:  96%|█████████▌| 956/1000 [34:20<00:44,  1.01s/it]

Error extracting text from http://www.debka.com/article/24909/A-Chinese-aircraft-carrier-docks-at-Tartus-to-support-Russian-Iranian-military-buildup-: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/24909/A-Chinese-aircraft-carrier-docks-at-Tartus-to-support-Russian-Iranian-military-buildup- (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  96%|█████████▌| 962/1000 [34:32<01:02,  1.64s/it]

Error extracting text from http://www.khaama.com/taliban-leader-qari-ghafoor-killed-with-his-14-fighters-in-kunduz-airstrike-01359: 403 Client Error: Forbidden for url: http://www.khaama.com/taliban-leader-qari-ghafoor-killed-with-his-14-fighters-in-kunduz-airstrike-01359
URL filtered: https://www.youtube.com/watch?v=9jK-NcRmVcw
Error extracting text from https://english.alarabiya.net/en/News/middle-east/2021/01/20/Israeli-Syrian-officials-discussed-Iran-and-its-militias-presence-in-Syria-Report: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2021/01/20/Israeli-Syrian-officials-discussed-Iran-and-its-militias-presence-in-Syria-Report
Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN1640G7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN1640G7


Processing URLs:  97%|█████████▋| 967/1000 [34:38<00:44,  1.33s/it]

Error extracting text from http://www.ibtimes.com/amid-russian-tensions-nato-scrambled-jets-160-times-2015-lithuania-says-2251705: 403 Client Error: Forbidden for url: https://www.ibtimes.com/amid-russian-tensions-nato-scrambled-jets-160-times-2015-lithuania-says-2251705


Processing URLs:  97%|█████████▋| 969/1000 [34:45<01:11,  2.29s/it]

Error extracting text from http://www.us-sabc.org/i4a/pages/Index.cfm?pageID=3690: 404 Client Error: Not Found for url: http://ussaudi.org/i4a/pages/Index.cfm?pageID=3690


Processing URLs:  97%|█████████▋| 970/1000 [34:46<00:59,  1.98s/it]

Error extracting text from https://www.reuters.com/article/us-olympics-2018-northkorea/north-korean-closing-olympics-delegation-includes-man-blamed-for-deadly-sinking-idUSKCN1G60KN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-olympics-2018-northkorea/north-korean-closing-olympics-delegation-includes-man-blamed-for-deadly-sinking-idUSKCN1G60KN


Processing URLs:  98%|█████████▊| 975/1000 [34:52<00:35,  1.43s/it]

Error extracting text from http://www.newsweek.com/eu-commissioner-warns-isis-influx-continent-iraq-steps-mosul-offensive-511053: 403 Client Error: Forbidden for url: https://www.newsweek.com/eu-commissioner-warns-isis-influx-continent-iraq-steps-mosul-offensive-511053


Processing URLs:  98%|█████████▊| 976/1000 [34:54<00:38,  1.59s/it]

Error extracting text from http://www.westernjournalism.com/ex-obama-admin-official-drops-bombshell-about-biden-that-may-derail-his-campaign-before-it-starts/: 404 Client Error: Not Found for url: https://www.westernjournal.com/ex-obama-admin-official-drops-bombshell-about-biden-that-may-derail-his-campaign-before-it-starts/


Processing URLs:  99%|█████████▊| 986/1000 [35:18<00:12,  1.09it/s]

Error extracting text from https://direct.mit.edu/isec/article/33/1/82/11939/Closing-Time-Assessing-the-Iranian-Threat-to-the: 403 Client Error: Forbidden for url: https://direct.mit.edu/isec/article/33/1/82/11939/Closing-Time-Assessing-the-Iranian-Threat-to-the
Error extracting text from http://www.reuters.com/article/us-oil-iran-exclusive-idUSKCN0VE21S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-iran-exclusive-idUSKCN0VE21S


Processing URLs:  99%|█████████▉| 988/1000 [35:20<00:13,  1.11s/it]

Error extracting text from http://www.nytimes.com/2017/01/08/us/politics/russia-turkey-syria-airstrikes-isis.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2017/01/08/us/politics/russia-turkey-syria-airstrikes-isis.html?_r=0


Processing URLs:  99%|█████████▉| 993/1000 [35:30<00:09,  1.38s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-tillerson-idUSKBN17L08I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-tillerson-idUSKBN17L08I


Processing URLs: 100%|█████████▉| 995/1000 [35:30<00:04,  1.15it/s]

Error extracting text from http://www.nytimes.com/2016/03/24/business/at-new-york-auto-show-a-parade-of-new-models.html?emc=edit_th_20160324&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/24/business/at-new-york-auto-show-a-parade-of-new-models.html?emc=edit_th_20160324&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs: 100%|█████████▉| 996/1000 [35:36<00:08,  2.15s/it]

Error extracting text from https://www.reuters.com/world/uk/britain-approves-pfizer-covid-19-pill-2021-12-31/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/britain-approves-pfizer-covid-19-pill-2021-12-31/


Processing URLs: 100%|██████████| 1000/1000 [35:39<00:00,  2.14s/it]
Processing URLs:   0%|          | 4/1000 [00:05<27:42,  1.67s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-08-03/sec-s-gensler-signals-pathway-for-a-bitcoin-etf-with-tough-rules


Processing URLs:   1%|          | 7/1000 [00:06<12:45,  1.30it/s]

Error extracting text from http://www.rand.org/content/dam/rand/pubs/perspectives/PE100/PE115/RAND_PE115.pdf: 403 Client Error: Forbidden for url: https://www.rand.org/content/dam/rand/pubs/perspectives/PE100/PE115/RAND_PE115.pdf


Processing URLs:   1%|          | 9/1000 [00:10<21:32,  1.30s/it]

Error extracting text from http://www.census.gov/prod/cen2010/briefs/c2010br-04.pdf: 404 Client Error: Not Found for url: https://www.census.gov/prod/cen2010/briefs/c2010br-04.pdf


Processing URLs:   1%|          | 11/1000 [00:12<21:33,  1.31s/it]

Error extracting text from http://www.andreamoro.net/perm/papers/why_do_incumbent_senators_win.pdf: 404 Client Error: Not Found for url: https://andreamoro.net/perm/papers/why_do_incumbent_senators_win.pdf


Processing URLs:   1%|▏         | 14/1000 [00:17<23:56,  1.46s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/beijing-likely-to-launch-military-flights-in-disputed-south-china-sea-experts: 403 Client Error: Forbidden for url: https://www.straitstimes.com/asia/east-asia/beijing-likely-to-launch-military-flights-in-disputed-south-china-sea-experts


Processing URLs:   2%|▏         | 18/1000 [00:25<28:10,  1.72s/it]

Error extracting text from http://www.advisor.ca/news/economic/world-economy-to-remain-fragile-in-2016-vanguard-196367: 404 Client Error: Not Found for url: https://www.advisor.ca/news/economic/world-economy-to-remain-fragile-in-2016-vanguard-196367


Processing URLs:   2%|▏         | 20/1000 [00:27<20:31,  1.26s/it]

Error extracting text from http://politicscounter.com/?p=56: 403 Client Error: Forbidden for url: https://politicscounter.com/?p=56


Processing URLs:   2%|▏         | 21/1000 [02:27<10:03:58, 37.02s/it]

Error extracting text from http://connection.ebscohost.com/c/articles/111340969/why-genocide-for-oil-not-boko-haram-buharis-priority: HTTPConnectionPool(host='connection.ebscohost.com', port=80): Max retries exceeded with url: /c/articles/111340969/why-genocide-for-oil-not-boko-haram-buharis-priority (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3056843e0>, 'Connection to connection.ebscohost.com timed out. (connect timeout=60)'))


Processing URLs:   2%|▏         | 24/1000 [02:33<3:41:48, 13.64s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKCN0ZA0S2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKCN0ZA0S2


Processing URLs:   3%|▎         | 27/1000 [02:35<1:23:07,  5.13s/it]

Error extracting text from https://english.alarabiya.net/en/perspective/features/2016/10/11/Sheikh-Zakzaky-the-face-of-a-dangerous-Iranian-design-in-Africa.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/perspective/features/2016/10/11/Sheikh-Zakzaky-the-face-of-a-dangerous-Iranian-design-in-Africa.html


Processing URLs:   3%|▎         | 30/1000 [02:38<39:07,  2.42s/it]

Error extracting text from https://www.realcleardefense.com/articles/2017/11/10/cyber_crime_north_koreas_billion-dollar_soft_spot_112614.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2017/11/10/cyber_crime_north_koreas_billion-dollar_soft_spot_112614.html


Processing URLs:   3%|▎         | 31/1000 [02:39<31:36,  1.96s/it]

Error extracting text from https://www.humboldtforum.org/de/besuch-vor-ort/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/de/besuch-vor-ort/


Processing URLs:   4%|▎         | 37/1000 [02:44<12:42,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-trade-idUSKBN15P1OU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-trade-idUSKBN15P1OU
Error extracting text from http://www.nato.int/cps/en/natohq/topics_49736.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/topics_49736.htm


Processing URLs:   4%|▍         | 39/1000 [02:45<10:18,  1.55it/s]

Error extracting text from http://www.nytimes.com/2016/09/15/world/asia/china-npc-election-fraud-liaoning.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/15/world/asia/china-npc-election-fraud-liaoning.html?_r=0


Processing URLs:   4%|▍         | 40/1000 [02:47<15:02,  1.06it/s]

Error extracting text from http://syriadirect.org/news/regime-opens-new-rail-line-in-coastal-heartland/: 404 Client Error: Not Found for url: http://syriadirect.org/news/regime-opens-new-rail-line-in-coastal-heartland/


Processing URLs:   5%|▍         | 47/1000 [03:01<18:21,  1.16s/it]

URL filtered: https://www.youtube.com/watch?v=6ZugVil2v4w
Error extracting text from http://www.nytimes.com/2015/06/06/business/international/opec-oil-prices.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/06/06/business/international/opec-oil-prices.html?_r=0


Processing URLs:   5%|▌         | 51/1000 [03:05<13:54,  1.14it/s]

Error extracting text from https://www.barchart.com/futures/quotes/CBZ21/options/dec-21: 403 Client Error: Forbidden for url: https://www.barchart.com/futures/quotes/CBZ21/options/dec-21


Processing URLs:   5%|▌         | 53/1000 [03:06<11:47,  1.34it/s]

Error extracting text from https://www.nytimes.com/2014/08/24/world/asia/in-afghan-election-signs-of-systemic-fraud-cast-doubt-on-many-votes.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2014/08/24/world/asia/in-afghan-election-signs-of-systemic-fraud-cast-doubt-on-many-votes.html?_r=0


Processing URLs:   5%|▌         | 54/1000 [03:08<17:53,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-m-a-idUSKBN14J008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-m-a-idUSKBN14J008


Processing URLs:   6%|▋         | 63/1000 [03:18<17:32,  1.12s/it]

Error extracting text from http://www.yorkshirepost.co.uk/news/exclusive-labour-expects-eu-referendum-this-june-1-7676374: 403 Client Error: Forbidden for url: https://www.yorkshirepost.co.uk/news/exclusive-labour-expects-eu-referendum-this-june-1-7676374


Processing URLs:   6%|▋         | 64/1000 [03:20<19:08,  1.23s/it]

URL filtered: https://twitter.com/samwangphd/status/725444379926757377?refsrc=email&amp;s=11


Processing URLs:   7%|▋         | 69/1000 [03:24<12:38,  1.23it/s]

Error extracting text from http://news.sky.com/story/1682349/brexit-campaign-chief-in-bruising-encounter: 404 Client Error: Not Found for url: https://news.sky.com/story/1682349/brexit-campaign-chief-in-bruising-encounter


Processing URLs:   7%|▋         | 70/1000 [03:24<11:50,  1.31it/s]

Error extracting text from http://jamanetwork.com/journals/jama/article-abstract/1104740: 403 Client Error: Forbidden for url: http://jamanetwork.com/journals/jama/article-abstract/1104740


Processing URLs:   7%|▋         | 72/1000 [03:25<10:51,  1.43it/s]

Error extracting text from https://www.afghanistan-analysts.org/struggling-to-get-a-quorum-fiddling-the-figures-and-suspending-mps/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/struggling-to-get-a-quorum-fiddling-the-figures-and-suspending-mps/


Processing URLs:   7%|▋         | 73/1000 [03:27<13:07,  1.18it/s]

Error extracting text from http://www.vanguardngr.com/2016/06/breaking-jonathan-arrives-nigeria/: 403 Client Error: Forbidden for url: https://www.vanguardngr.com/2016/06/breaking-jonathan-arrives-nigeria/


Processing URLs:   8%|▊         | 77/1000 [04:32<4:52:23, 19.01s/it]

Error extracting text from http://www.newsobserver.com/news/business/article141212643.html: HTTPConnectionPool(host='www.newsobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:   8%|▊         | 78/1000 [05:33<8:02:51, 31.42s/it]

Error extracting text from http://www.arkleg.state.ar.us/assembly/2017/2017R/Pages/BillInformation.aspx?measureno=hb1162: HTTPConnectionPool(host='www.arkleg.state.ar.us', port=80): Max retries exceeded with url: /assembly/2017/2017R/Pages/BillInformation.aspx?measureno=hb1162 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x307f0e8d0>, 'Connection to www.arkleg.state.ar.us timed out. (connect timeout=60)'))


Processing URLs:   8%|▊         | 82/1000 [05:39<2:11:39,  8.60s/it]

Error extracting text from http://www.wsj.com/articles/time-inc-enters-video-streaming-fray-with-debut-of-people-entertainment-network-1473586200: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-enters-video-streaming-fray-with-debut-of-people-entertainment-network-1473586200


Processing URLs:   9%|▉         | 89/1000 [05:49<24:54,  1.64s/it]

Error extracting text from http://www.cctv-america.com/2016/01/11/s-korea-to-hold-six-party-talks-on-dprk-nuke-program-with-china-us-japan: 403 Client Error: Forbidden for url: http://america.cgtn.com/2016/01/11/s-korea-to-hold-six-party-talks-on-dprk-nuke-program-with-china-us-japan
Error extracting text from https://www.reuters.com/article/us-britain-eu/eu-tells-uks-johnson-to-decide-as-time-runs-out-for-brexit-deal-idUSKBN28H0LW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/eu-tells-uks-johnson-to-decide-as-time-runs-out-for-brexit-deal-idUSKBN28H0LW


Processing URLs:   9%|▉         | 90/1000 [05:49<19:17,  1.27s/it]

Error extracting text from http://news.softpedia.com/news/us-sabotages-snowden-movie-release-with-release-of-damning-report-508389.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/us-sabotages-snowden-movie-release-with-release-of-damning-report-508389.shtml


Processing URLs:   9%|▉         | 91/1000 [05:51<23:16,  1.54s/it]

URL filtered: https://twitter.com/Arianespace/status/1473760271084703760


Processing URLs:  10%|▉         | 97/1000 [07:08<4:40:36, 18.65s/it]

Error extracting text from http://www.recode.net/2016/11/30/13779302/elaine-chao-donald-trump-dot-dol-secretary: Exceeded 30 redirects.


Processing URLs:  10%|█         | 100/1000 [07:09<1:52:13,  7.48s/it]

Error extracting text from https://wikileaks.org/podesta-emails/emailid/44016: 403 Client Error: Forbidden for url: https://wikileaks.org/podesta-emails/emailid/44016
Error extracting text from https://www.wsj.com/articles/chinese-plans-for-disputed-south-china-sea-shoal-in-question-1489765193: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chinese-plans-for-disputed-south-china-sea-shoal-in-question-1489765193


Processing URLs:  10%|█         | 103/1000 [07:14<58:46,  3.93s/it]

Error extracting text from http://www.reuters.com/article/us-baltics-usa-mccain-tillerson-idUSKBN14I10Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-baltics-usa-mccain-tillerson-idUSKBN14I10Y


Processing URLs:  11%|█         | 106/1000 [07:20<40:15,  2.70s/it]

Error extracting text from http://www.huffingtonpost.ca/2017/04/06/bill-blair-canada-day-pot-legalization_n_15847936.html: 410 Client Error: Gone for url: https://www.huffpost.com/archive/ca/entry/2017/04/06/bill-blair-canada-day-pot-legalization_n_15847936.html


Processing URLs:  11%|█         | 111/1000 [07:36<44:20,  2.99s/it]

Error extracting text from http://www.newsweek.com/mosul-offensive-against-isis-begin-soon-says-french-defense-minister-504679: 403 Client Error: Forbidden for url: https://www.newsweek.com/mosul-offensive-against-isis-begin-soon-says-french-defense-minister-504679


Processing URLs:  11%|█▏        | 113/1000 [07:40<33:26,  2.26s/it]

URL filtered: https://www.facebook.com/24HoursTelevision/posts/112291510991819


Processing URLs:  12%|█▏        | 116/1000 [07:41<17:52,  1.21s/it]

Error extracting text from http://www.nejm.org/doi/full/10.1056/NEJMp1514467: 403 Client Error: Forbidden for url: http://www.nejm.org/doi/full/10.1056/NEJMp1514467


Processing URLs:  12%|█▏        | 124/1000 [08:16<1:44:29,  7.16s/it]

Error extracting text from https://www.thoughtco.com/britains-disastrous-retreat-from-kabul-1773762: 406 Client Error: Not Acceptable for url: https://www.thoughtco.com/britains-disastrous-retreat-from-kabul-1773762
URL filtered: https://www.bloomberg.com/news/articles/2016-12-13/putin-ready-to-meet-trump-at-any-moment


Processing URLs:  13%|█▎        | 127/1000 [08:18<45:54,  3.16s/it]  

Error extracting text from https://play.google.com/store/apps/details?id=com.girish.venezuelaecon: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=com.girish.venezuelaecon
URL filtered: http://uk.reuters.com/article/uk-facebook-drone-idUKKCN0Q42HU20150730


Processing URLs:  13%|█▎        | 129/1000 [08:18<27:30,  1.90s/it]

Error extracting text from https://www.wsj.com/articles/in-german-election-campaign-third-place-is-the-real-winner-1502622031: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-german-election-campaign-third-place-is-the-real-winner-1502622031


Processing URLs:  13%|█▎        | 130/1000 [08:18<22:50,  1.57s/it]

Error extracting text from http://caracaschronicles.com/2016/01/12/50565/: 403 Client Error: Forbidden for url: http://caracaschronicles.com/2016/01/12/50565/


Processing URLs:  13%|█▎        | 132/1000 [08:30<45:41,  3.16s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/ambassador-us-continue-south-china-sea-flights-sail-36681851: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/ambassador-us-continue-south-china-sea-flights-sail-36681851


Processing URLs:  14%|█▎        | 137/1000 [08:34<19:58,  1.39s/it]

Error extracting text from http://www.globaltimes.cn/content/1044849.shtml: 404 Client Error: Not Found for url: https://www.globaltimes.cn/content/1044849.shtml


Processing URLs:  14%|█▍        | 139/1000 [08:40<28:02,  1.95s/it]

Error extracting text from http://www.technewstoday.com/26721-apple-inc-iphone-6s-sales-below-expectations-pacific-crest-securities/: 404 Client Error: Not Found for url: https://www.technewstoday.com/26721-apple-inc-iphone-6s-sales-below-expectations-pacific-crest-securities/


Processing URLs:  14%|█▍        | 144/1000 [08:56<47:13,  3.31s/it]  

Error extracting text from http://thehill.com/blogs/in-the-know/in-the-know/264216-poll-forty-four-percent-of-dems-would-take-refugees-from: 403 Client Error: Forbidden for url: https://thehill.com/blogs/in-the-know/in-the-know/264216-poll-forty-four-percent-of-dems-would-take-refugees-from/


error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserErro

Error extracting text from http://reiffcenterblog.cnu.edu/2015/10/kickstarting-the-six-party-talks-how-to-re-engage-with-north-korea/: Document is empty


Processing URLs:  15%|█▍        | 147/1000 [09:28<1:51:57,  7.88s/it]

Error extracting text from http://www.wantchinatimes.com/news/content?id=20150628000003&amp;cid=1206: 522 Server Error:  for url: http://www.wantchinatimes.com/news/content?id=20150628000003&amp;cid=1206
Error extracting text from http://www.reuters.com/article/us-afghanistan-blast-idUSKCN123187: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-blast-idUSKCN123187


Processing URLs:  15%|█▌        | 150/1000 [09:34<54:59,  3.88s/it]  

Error extracting text from https://www.nord-stream2.com/media-info/news-events/application-for-precautionary-certification-as-independent-transmission-system-operator-submitted-150/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /media-info/news-events/application-for-precautionary-certification-as-independent-transmission-system-operator-submitted-150/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fdf5c110>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  15%|█▌        | 152/1000 [09:35<33:03,  2.34s/it]

URL filtered: https://www.politico.com/story/2017/11/01/google-facebook-twitter-russia-meddling-244412


Processing URLs:  16%|█▌        | 158/1000 [09:39<14:29,  1.03s/it]

Error extracting text from https://www.france24.com/en/live-news/20210820-quake-hit-haiti-s-pm-vows-elections-soon-as-possible: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210820-quake-hit-haiti-s-pm-vows-elections-soon-as-possible


Processing URLs:  16%|█▌        | 162/1000 [09:43<11:17,  1.24it/s]

Error extracting text from http://www.reuters.com/article/2015/10/21/us-iran-nuclear-khamenei-iaea-idUSKCN0SF1BK20151021?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/21/us-iran-nuclear-khamenei-iaea-idUSKCN0SF1BK20151021?mod=related&amp;channelName=worldNews
Error extracting text from http://www.reuters.com/article/us-iran-nuclear-iaea-idUSKBN0TZ2W120151216: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-iaea-idUSKBN0TZ2W120151216


Processing URLs:  17%|█▋        | 166/1000 [09:49<15:23,  1.11s/it]

Error extracting text from http://cleantechnica.com/2016/03/07/ev-sales-22-of-car-sales-in-netherlands-in-december-2015/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2016/03/07/ev-sales-22-of-car-sales-in-netherlands-in-december-2015/


Processing URLs:  17%|█▋        | 168/1000 [09:53<21:07,  1.52s/it]

Error extracting text from https://www.kickstarter.com/projects/projectblue/project-blue-a-space-telescope-to-photograph-anoth: 403 Client Error: Forbidden for url: https://www.kickstarter.com/projects/projectblue/project-blue-a-space-telescope-to-photograph-anoth


Processing URLs:  17%|█▋        | 171/1000 [09:56<13:19,  1.04it/s]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/10/28/eu-canada-trade-agreement/


Processing URLs:  18%|█▊        | 177/1000 [10:08<22:39,  1.65s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-27/south-korea-s-park-eyes-china-japan-summit-after-kim-showdown


Processing URLs:  18%|█▊        | 182/1000 [10:12<14:18,  1.05s/it]

Error extracting text from http://web.stanford.edu/group/ethnic/workingpapers/dur3.pdf: 404 Client Error: Not Found for url: http://web.stanford.edu/group/ethnic/workingpapers/dur3.pdf


Processing URLs:  18%|█▊        | 184/1000 [10:15<16:59,  1.25s/it]

Error extracting text from http://blogs.wsj.com/moneybeat/2016/06/22/is-a-recession-on-the-horizon-yellen-says-no-others-not-so-sure-newsletter-draft/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2016/06/22/is-a-recession-on-the-horizon-yellen-says-no-others-not-so-sure-newsletter-draft/


Processing URLs:  19%|█▊        | 187/1000 [10:18<17:08,  1.26s/it]

Error extracting text from https://www.justsecurity.org/44697/steele-dossier-knowing/#more-44697: 403 Client Error: Forbidden for url: https://www.justsecurity.org/44697/steele-dossier-knowing/#more-44697


Processing URLs:  19%|█▉        | 189/1000 [10:21<16:46,  1.24s/it]

Error extracting text from http://users.eniinternet.com/bradleym/Compare.html: 404 Client Error: Not Found for url: http://users.eniinternet.com/bradleym/Compare.html


Processing URLs:  19%|█▉        | 192/1000 [10:24<13:00,  1.04it/s]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2468.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2468.htm
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN18O0R3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN18O0R3


Processing URLs:  19%|█▉        | 194/1000 [10:26<12:25,  1.08it/s]

Error extracting text from http://thehill.com/homenews/news/348675-rather-trump-afraid-of-what-mueller-will-find-out: 403 Client Error: Forbidden for url: https://thehill.com/homenews/news/348675-rather-trump-afraid-of-what-mueller-will-find-out/


Processing URLs:  20%|██        | 202/1000 [10:51<28:15,  2.12s/it]

Error extracting text from https://www.stripes.com/news/2-fort-carson-brigades-tapped-for-afghanistan-deployment-1.503474: 404 Client Error: Not Found for url: https://www.stripes.com/news/2-fort-carson-brigades-tapped-for-afghanistan-deployment-1.503474


Processing URLs:  20%|██        | 205/1000 [10:54<15:14,  1.15s/it]

Error extracting text from https://www.oecd.org/newsroom/oecd-announces-candidates-for-next-secretary-general.htm: 403 Client Error: Forbidden for url: https://www.oecd.org/newsroom/oecd-announces-candidates-for-next-secretary-general.htm


Processing URLs:  21%|██        | 211/1000 [11:09<21:21,  1.62s/it]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/07/turkey-human-rights-in-grave-danger-following-coup-attempt-and-subsequent-crackdown/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/07/turkey-human-rights-in-grave-danger-following-coup-attempt-and-subsequent-crackdown/


Processing URLs:  22%|██▏       | 215/1000 [11:14<18:33,  1.42s/it]

Error extracting text from https://nbaa.org/news/business-aviation-insider/2021-march-april/new-supersonic-era-dawning/: 403 Client Error: Forbidden for url: https://nbaa.org/news/business-aviation-insider/2021-march-april/new-supersonic-era-dawning/
Error extracting text from http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016&gt: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016&gt


Processing URLs:  22%|██▏       | 217/1000 [11:16<15:42,  1.20s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2014-10-14/this-wasnt-your-typical-candidate-debate


Processing URLs:  22%|██▏       | 221/1000 [11:21<17:24,  1.34s/it]

Error extracting text from http://www.foxnews.com/politics/2015/09/17/house-panel-votes-to-lift-40-year-old-us-ban-on-oil-exports/: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/09/17/house-panel-votes-to-lift-40-year-old-us-ban-on-oil-exports/


Processing URLs:  22%|██▏       | 223/1000 [11:24<19:36,  1.51s/it]

Error extracting text from https://uk.reuters.com/article/opec-oil/table-opec-oil-output-falls-by-170000-bpd-in-august-reuters-survey-idUKL8N1LH3OE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  23%|██▎       | 228/1000 [11:32<19:42,  1.53s/it]

Error extracting text from http://www.ibtimes.com/france-convinced-greece-will-live-economic-reform-promises-2258420: 403 Client Error: Forbidden for url: https://www.ibtimes.com/france-convinced-greece-will-live-economic-reform-promises-2258420


Processing URLs:  23%|██▎       | 231/1000 [11:34<12:22,  1.04it/s]

Error extracting text from https://www.teslamotors.com/support/model-3-reservations-faq: 403 Client Error: Forbidden for url: https://www.teslamotors.com/support/model-3-reservations-faq


Processing URLs:  23%|██▎       | 232/1000 [11:35<13:37,  1.06s/it]



Processing URLs:  24%|██▎       | 235/1000 [11:37<09:23,  1.36it/s]

URL filtered: https://twitter.com/kwanwoo


Processing URLs:  24%|██▍       | 243/1000 [11:46<15:19,  1.21s/it]

Error extracting text from http://www.tradingeconomics.com/turkey/foreign-exchange-reserves: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/turkey/foreign-exchange-reserves


Processing URLs:  24%|██▍       | 245/1000 [11:51<24:15,  1.93s/it]

Error extracting text from http://www.theweek.co.uk/scottish-independence/55716/nicola-sturgeon-in-new-push-for-scottish-independence: 404 Client Error: Not Found for url: https://theweek.com/scottish-independence/55716/nicola-sturgeon-in-new-push-for-scottish-independence


Processing URLs:  25%|██▍       | 246/1000 [11:53<24:13,  1.93s/it]

URL filtered: https://www.youtube.com/watch?v=6MMmYyIZlC4


Processing URLs:  25%|██▍       | 249/1000 [11:56<17:25,  1.39s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-06/magnitude-5-1-quake-detected-near-north-korea-nuclear-test-site


Processing URLs:  25%|██▌       | 252/1000 [11:57<09:51,  1.26it/s]

Error extracting text from https://www.cnbc.com/2018/01/25/reuters-america-not-so-anti-establishment-now--italys-5-stars-evolving-face.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2018/01/25/reuters-america-not-so-anti-establishment-now--italys-5-stars-evolving-face.html
Error extracting text from http://www.reuters.com/article/2015/09/24/us-mideast-crisis-syria-idUSKCN0RO1XN20150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/24/us-mideast-crisis-syria-idUSKCN0RO1XN20150924


Processing URLs:  26%|██▌       | 257/1000 [12:04<16:10,  1.31s/it]

Error extracting text from http://advances.sciencemag.org/content/2/6/e1600377.full.pdf+html: 403 Client Error: Forbidden for url: https://www.science.org/doi/pdf/10.1126/sciadv.1600377


Processing URLs:  26%|██▌       | 260/1000 [12:06<10:15,  1.20it/s]

Error extracting text from http://allafrica.com/stories/201604250198.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201604250198.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x306e0d3d0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  27%|██▋       | 266/1000 [12:14<11:28,  1.07it/s]

Error extracting text from http://gcaptain.com/panama-canal-crack-fixed/: 403 Client Error: Forbidden for url: http://gcaptain.com/panama-canal-crack-fixed/
Error extracting text from http://www.reuters.com/article/us-britain-eu-johnson-iran/uk-certain-iran-nuclear-deal-to-be-preserved-u-s-says-remains-committed-idUSKBN1CS107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-johnson-iran/uk-certain-iran-nuclear-deal-to-be-preserved-u-s-says-remains-committed-idUSKBN1CS107
Error extracting text from http://www.reuters.com/article/us-brazil-politics-idUSKCN0YT1H6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-idUSKCN0YT1H6


Processing URLs:  27%|██▋       | 269/1000 [12:16<07:42,  1.58it/s]

Error extracting text from https://www.reuters.com/article/tv-cgtn-allemagne-idFRKBN2AY0D9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/tv-cgtn-allemagne-idFRKBN2AY0D9


Processing URLs:  27%|██▋       | 273/1000 [12:17<05:30,  2.20it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-nuclear-idUSKBN14B1ZZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-nuclear-idUSKBN14B1ZZ


Processing URLs:  28%|██▊       | 277/1000 [12:28<19:30,  1.62s/it]

Error extracting text from http://blogs.reuters.com/great-debate/2016/03/13/a-hail-mary-heave-to-block-trumps-nomination/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2016/03/13/a-hail-mary-heave-to-block-trumps-nomination/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3016622d0>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  28%|██▊       | 278/1000 [12:29<16:41,  1.39s/it]

Error extracting text from http://nyti.ms/1QldBKL: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/13/world/middleeast/syria-russia-airstrikes.html?smid=pl-share


Processing URLs:  28%|██▊       | 280/1000 [12:30<12:43,  1.06s/it]

Error extracting text from http://www.wsj.com/articles/boehner-departure-ramps-up-congressional-uncertainty-1443200953: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/boehner-departure-ramps-up-congressional-uncertainty-1443200953


Processing URLs:  29%|██▉       | 292/1000 [12:56<30:50,  2.61s/it]

Error extracting text from https://larswericson.wordpress.com/2015/12/31/what-are-the-odds-that-this-will-work/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/31/what-are-the-odds-that-this-will-work/


Processing URLs:  30%|██▉       | 296/1000 [13:03<20:44,  1.77s/it]

Error extracting text from http://www.nytimes.com/2016/03/22/us/politics/hillary-clinton-and-donald-trump-vow-to-protect-israel-but-differ-on-means.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/22/us/politics/hillary-clinton-and-donald-trump-vow-to-protect-israel-but-differ-on-means.html


Processing URLs:  30%|███       | 300/1000 [13:07<09:59,  1.17it/s]

Error extracting text from http://www.nytimes.com/2015/10/12/opinion/oil-exports-should-be-paired-with-clean-energy-tax-breaks.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/12/opinion/oil-exports-should-be-paired-with-clean-energy-tax-breaks.html?_r=0
Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN1A4042?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN1A4042?il=0


Processing URLs:  30%|███       | 302/1000 [13:07<05:34,  2.08it/s]

Error extracting text from http://www.france24.com/en/20160206-syria-rebels-face-rout-allies-saudi-turkey-may-send-troops: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160206-syria-rebels-face-rout-allies-saudi-turkey-may-send-troops
Error extracting text from http://www.latimes.com/politics/la-na-pol-mueller-white-house-20170920-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-mueller-white-house-20170920-story.html
Error extracting text from http://www.nyt: HTTPConnectionPool(host='www.nyt', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3056866c0>: Failed to resolve 'www.nyt' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|███       | 305/1000 [13:21<39:08,  3.38s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/kremlin-demands-for-assads-departure-thoughtless/2016/10/22/413676a8-9848-11e6-9cae-2a3574e296a6_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/kremlin-demands-for-assads-departure-thoughtless/2016/10/22/413676a8-9848-11e6-9cae-2a3574e296a6_story.html


Processing URLs:  31%|███       | 307/1000 [13:23<26:48,  2.32s/it]

Error extracting text from https://fnf-europe.org/2016/04/06/9904/: HTTPSConnectionPool(host='jandaslot88official.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x305684650>: Failed to resolve 'jandaslot88official.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  31%|███       | 311/1000 [13:26<12:35,  1.10s/it]

Error extracting text from https://www.congress.gov/bill/113th-congress/senate-bill/2277/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/113th-congress/senate-bill/2277/text


Processing URLs:  32%|███▏      | 316/1000 [13:38<27:27,  2.41s/it]

Error extracting text from https://rbth.com/news/2016/10/09/rosatom-ceo-and-turkish-energy-minister-discuss-akkuyu-npp-project_637175: 404 Client Error: Not Found for url: https://www.rbth.com/news/2016/10/09/rosatom-ceo-and-turkish-energy-minister-discuss-akkuyu-npp-project_637175


Processing URLs:  32%|███▏      | 317/1000 [14:38<3:43:49, 19.66s/it]

Error extracting text from http://www.miamiherald.com/news/politics-government/national-politics/article81338377.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  32%|███▏      | 320/1000 [14:41<1:23:24,  7.36s/it]

Error extracting text from http://www.thegatewaypundit.com/2016/10/wikileaks-uncover-murder-plot-podesta-documents-suggest-scalia-assassination/: 403 Client Error: Forbidden for url: https://www.thegatewaypundit.com/2016/10/wikileaks-uncover-murder-plot-podesta-documents-suggest-scalia-assassination/


Processing URLs:  32%|███▏      | 321/1000 [14:41<59:09,  5.23s/it]  

Error extracting text from https://panampost.com/sabrina-martin/2016/04/19/new-polls-show-fujimori-behind-in-peruvian-presidential-election/: 403 Client Error: Forbidden for url: https://panampost.com/sabrina-martin/2016/04/19/new-polls-show-fujimori-behind-in-peruvian-presidential-election/


Processing URLs:  32%|███▏      | 323/1000 [14:42<31:47,  2.82s/it]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN10Z0ED?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN10Z0ED?il=0
URL filtered: http://www.nytimes.com/2016/12/15/technology/facebook-fake-news.html


Processing URLs:  33%|███▎      | 328/1000 [14:52<25:44,  2.30s/it]

Error extracting text from https://www.majorityleader.gov/wp-content/uploads/2016/11/2017_ANNUAL_CALENDAR.pdf: 404 Client Error: Not Found for url: https://www.majorityleader.gov/wp-content/uploads/2016/11/2017_annual_calendar.pdf


Processing URLs:  33%|███▎      | 329/1000 [14:52<19:55,  1.78s/it]

Error extracting text from http://www.cdm.me/english/sheikh-mohammed-in-tivat: 403 Client Error: Forbidden for url: https://www.cdm.me/english/sheikh-mohammed-in-tivat


Processing URLs:  33%|███▎      | 330/1000 [14:54<19:33,  1.75s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/03/30/744010/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/03/30/744010/story.html


Processing URLs:  33%|███▎      | 332/1000 [14:55<12:19,  1.11s/it]

Error extracting text from http://www.scotsman.com/news/politics/scott-macnab-how-did-snp-lose-majority-despite-vote-share-rise-1-4121509: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scott-macnab-how-did-snp-lose-majority-despite-vote-share-rise-1-4121509


Processing URLs:  33%|███▎      | 333/1000 [14:56<11:11,  1.01s/it]

Error extracting text from http://www.wsj.com/articles/vietnam-says-chinas-flights-to-south-china-sea-a-threat-to-air-safety-1452334018: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/vietnam-says-chinas-flights-to-south-china-sea-a-threat-to-air-safety-1452334018


Processing URLs:  34%|███▎      | 336/1000 [14:58<08:35,  1.29it/s]

Error extracting text from http://www.un.org/en/ga/search/view_doc.asp?symbol=S/RES/2270(2016): 403 Client Error: Forbidden for url: https://www.un.org/en/ga/search/view_doc.asp?symbol=S/RES/2270(2016)
Error extracting text from http://www.nytimes.com/2016/03/10/world/asia/google-alphago-lee-se-dol.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/10/world/asia/google-alphago-lee-se-dol.html?_r=0


Processing URLs:  34%|███▎      | 337/1000 [14:58<06:51,  1.61it/s]

Error extracting text from https://www.nytimes.com/2017/08/01/world/americas/venezuela-opposition-nicolas-maduro.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/01/world/americas/venezuela-opposition-nicolas-maduro.html


Processing URLs:  34%|███▍      | 339/1000 [15:00<06:49,  1.61it/s]

Error extracting text from http://www.wsj.com/articles/oil-change-affluent-saudi-arabia-goes-to-work-1464716895: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-change-affluent-saudi-arabia-goes-to-work-1464716895


Processing URLs:  34%|███▍      | 341/1000 [15:05<17:10,  1.56s/it]

URL filtered: http://www.businessinsider.com/google-and-salesforce-interested-in-buying-twitter-report-2016-9


Processing URLs:  34%|███▍      | 344/1000 [15:06<10:16,  1.06it/s]

Error extracting text from http://www.nytimes.com/2016/03/05/sports/olympics/as-olympics-near-and-zika-spreads-no-talk-of-a-plan-b.html?emc=edit_th_20160305&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/05/sports/olympics/as-olympics-near-and-zika-spreads-no-talk-of-a-plan-b.html?emc=edit_th_20160305&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  34%|███▍      | 345/1000 [15:07<08:42,  1.25it/s]

Error extracting text from http://www.wsj.com/articles/SB115204584251197609: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/SB115204584251197609


Processing URLs:  35%|███▍      | 349/1000 [15:10<08:44,  1.24it/s]

Error extracting text from https://www.reuters.com/article/us-iran-usa-gulf/iran-cites-change-in-u-s-navy-behavior-in-gulf-idUSKBN1FI1UP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-gulf/iran-cites-change-in-u-s-navy-behavior-in-gulf-idUSKBN1FI1UP


Processing URLs:  35%|███▌      | 350/1000 [15:11<07:54,  1.37it/s]

Error extracting text from http://thehill.com/policy/finance/257691-export-import-brawl-enters-upper-chamber: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/257691-export-import-brawl-enters-upper-chamber/


Processing URLs:  36%|███▌      | 356/1000 [15:32<36:29,  3.40s/it]

Error extracting text from https://www.nytimes.com/2017/05/17/opinion/trump-russia-mueller-special-counsel.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/17/opinion/trump-russia-mueller-special-counsel.html?_r=0


Processing URLs:  36%|███▌      | 359/1000 [15:37<23:57,  2.24s/it]

Error extracting text from https://www.justsecurity.org/73798/biden-must-stick-to-his-pledge-to-end-us-support-for-the-yemen-war/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/73798/biden-must-stick-to-his-pledge-to-end-us-support-for-the-yemen-war/


Processing URLs:  36%|███▋      | 363/1000 [15:40<11:12,  1.06s/it]

Error extracting text from http://www.reuters.com/article/us-india-kashmir-pakistan-idUSKBN18904U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-kashmir-pakistan-idUSKBN18904U


Processing URLs:  36%|███▋      | 364/1000 [15:42<12:40,  1.20s/it]

Error extracting text from http://www.citizen.co.za/1244379/jacob-zuma-is-going-nowhere/: 404 Client Error: Not Found for url: https://www.citizen.co.za/jacob-zuma-is-going-nowhere/


Processing URLs:  37%|███▋      | 368/1000 [15:47<14:01,  1.33s/it]

Error extracting text from https://tradingeconomics.com/commodities: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodities


Processing URLs:  37%|███▋      | 372/1000 [15:51<08:23,  1.25it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-putin-idUSKBN15E0SF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-putin-idUSKBN15E0SF?il=0
Error extracting text from https://www.globalconstructionreview.com/news/biden-changes-tack-ethiopias-grand-renaissance-dam/: 403 Client Error: Forbidden for url: https://www.globalconstructionreview.com/news/biden-changes-tack-ethiopias-grand-renaissance-dam/


Processing URLs:  38%|███▊      | 376/1000 [15:55<09:04,  1.15it/s]

Error extracting text from https://www.russiamatters.org/news/russia-analytical-report/russia-analytical-report-jan-25-feb-1-2021: 403 Client Error: Forbidden for url: https://www.russiamatters.org/news/russia-analytical-report/russia-analytical-report-jan-25-feb-1-2021


Processing URLs:  38%|███▊      | 380/1000 [16:03<14:45,  1.43s/it]

Error extracting text from http://english.alarabiya.net/en/webtv/reports/2016/07/18/Watch-Raid-destroys-ISIS-convoy-near-Mosul-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/webtv/reports/2016/07/18/Watch-Raid-destroys-ISIS-convoy-near-Mosul-.html


Processing URLs:  38%|███▊      | 383/1000 [16:06<09:47,  1.05it/s]

Error extracting text from https://english.alarabiya.net/News/gulf/2021/11/09/Expo-2020-Dubai-Visitor-numbers-approach-three-million-: 403 Client Error: Forbidden for url: https://english.alarabiya.net/News/gulf/2021/11/09/Expo-2020-Dubai-Visitor-numbers-approach-three-million-


Processing URLs:  39%|███▉      | 389/1000 [16:23<26:58,  2.65s/it]

Error extracting text from http://www.reuters.com/article/2015/11/04/us-ukraine-russia-eu-sanctions-idUSKCN0ST1YR20151104#6PWZMhQ6hAi6QboC.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/04/us-ukraine-russia-eu-sanctions-idUSKCN0ST1YR20151104#6PWZMhQ6hAi6QboC.97
URL filtered: https://twitter.com/BreakingNews/status/683742832390176768


Processing URLs:  39%|███▉      | 393/1000 [16:27<15:41,  1.55s/it]

URL filtered: https://www.youtube.com/watch?v=gqsT4xnKZPg


Processing URLs:  40%|███▉      | 395/1000 [16:30<14:25,  1.43s/it]

Error extracting text from http://www.reuters.com/article/2015/11/06/us-palestinians-newspaper-idUSKCN0SV1EV20151106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-palestinians-newspaper-idUSKCN0SV1EV20151106


Processing URLs:  40%|███▉      | 398/1000 [16:33<11:58,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-olympics-rio-zika-idUSKCN0YT2BA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-olympics-rio-zika-idUSKCN0YT2BA


Processing URLs:  40%|████      | 400/1000 [16:33<08:25,  1.19it/s]

Error extracting text from https://www.whitehouse.gov/america-first-energy: 404 Client Error: Not Found for url: https://www.whitehouse.gov/america-first-energy


Processing URLs:  40%|████      | 402/1000 [16:34<07:31,  1.32it/s]

Error extracting text from http://culture.polishsite.us/articles/art212fr.htm: HTTPConnectionPool(host='culture.polishsite.us', port=80): Max retries exceeded with url: /articles/art212fr.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3029d61e0>: Failed to resolve 'culture.polishsite.us' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|████      | 403/1000 [16:35<06:26,  1.55it/s]

Error extracting text from http://www.michigan.gov/snyder/0,4668,7-277-57577-318728--,00.html: 403 Client Error: Forbidden for url: http://www.michigan.gov/snyder/0,4668,7-277-57577-318728--,00.html
URL filtered: https://www.youtube.com/watch?v=132VQIyn5LY
Error extracting text from https://www.reuters.com/world/ukraine-says-russian-aircraft-fired-belarus-ukrainian-air-space-2022-03-11/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/ukraine-says-russian-aircraft-fired-belarus-ukrainian-air-space-2022-03-11/


Processing URLs:  41%|████      | 407/1000 [16:38<07:51,  1.26it/s]

Error extracting text from http://www.ibtimes.com/apple-inc-aapl-q4-2015-earnings-preview-doubts-surround-iphone-6s-sales-momentum-2156564: 403 Client Error: Forbidden for url: https://www.ibtimes.com/apple-inc-aapl-q4-2015-earnings-preview-doubts-surround-iphone-6s-sales-momentum-2156564


Processing URLs:  41%|████      | 410/1000 [16:42<10:15,  1.04s/it]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-syria-usa-idUKKBN19I07K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  41%|████▏     | 414/1000 [16:46<08:22,  1.17it/s]

Error extracting text from https://www.yahoo.com/finance/video/leaked-clinton-emails-show-she-114400435.html: 400 Client Error: Invalid HTTP Request for url: https://finance.yahoo.com/video/leaked-clinton-emails-show-she-114400435.html
URL filtered: https://twitter.com/RusConsulGen/status/814814558678294528
Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-signals-readiness-for-new-election-after-coalition-talks-collapse-idUSKBN1DJ0I3?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-signals-readiness-for-new-election-after-coalition-talks-collapse-idUSKBN1DJ0I3?il=0


Processing URLs:  42%|████▏     | 417/1000 [16:58<23:16,  2.39s/it]

Error extracting text from http://www.nytimes.com/2016/05/12/business/energy-environment/canadas-oil-sands-industry-staggers-after-a-devastating-fire.html?emc=edit_th_20160512&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/12/business/energy-environment/canadas-oil-sands-industry-staggers-after-a-devastating-fire.html?emc=edit_th_20160512&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  42%|████▏     | 423/1000 [17:16<19:01,  1.98s/it]

Error extracting text from http://qvmgroup.com/invest/2013/03/19/how-often-does-sp-500-have-10-and-20-negative-price-moves/: 404 Client Error: Not Found for url: http://qvmgroup.com/invest/2013/03/19/how-often-does-sp-500-have-10-and-20-negative-price-moves/
URL filtered: http://www.bloomberg.com/politics/articles/2015-11-28/karl-rove-opens-his-rolodex-for-ben-carson


Processing URLs:  43%|████▎     | 430/1000 [17:29<19:56,  2.10s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/yemen-rebels-reject-envoy-mediator-conflict-47850994: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/yemen-rebels-reject-envoy-mediator-conflict-47850994


Processing URLs:  43%|████▎     | 432/1000 [17:32<15:55,  1.68s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O2QLA36S972801-1A7NPOH8A5P3B9JPIUBF23DDIS


Processing URLs:  43%|████▎     | 434/1000 [17:33<12:04,  1.28s/it]

Error extracting text from http://www.middle-east-online.com/english/?id=75612: 404 Client Error: Not Found for url: https://www.middle-east-online.com/english/?id=75612


Processing URLs:  44%|████▍     | 439/1000 [17:40<08:34,  1.09it/s]

URL filtered: https://www.youtube.com/watch?v=vFr3K2DORc8
Error extracting text from http://www.reuters.com/article/us-brazil-corruption-cunha-idUSKCN0Z02MY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-cunha-idUSKCN0Z02MY?il=0


Processing URLs:  44%|████▍     | 441/1000 [17:40<05:25,  1.72it/s]

Error extracting text from http://www.reuters.com/article/us-china-vietnam-idUSKBN14Y0JZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-vietnam-idUSKBN14Y0JZ
Error extracting text from https://www.chathamhouse.org/publication/cyber-security-civil-nuclear-facilities-understanding-risks: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/publication/cyber-security-civil-nuclear-facilities-understanding-risks


Processing URLs:  44%|████▍     | 443/1000 [17:43<07:56,  1.17it/s]

Error extracting text from http://www.wsj.com/articles/brexit-could-leave-economy-3-6-smaller-says-u-k-treasury-1463996928: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brexit-could-leave-economy-3-6-smaller-says-u-k-treasury-1463996928


Processing URLs:  44%|████▍     | 444/1000 [17:43<06:19,  1.46it/s]

Error extracting text from https://www.nytimes.com/2015/08/25/opinion/joe-nocera-the-man-who-got-china-right.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/08/25/opinion/joe-nocera-the-man-who-got-china-right.html


Processing URLs:  45%|████▍     | 448/1000 [17:50<15:28,  1.68s/it]

Error extracting text from https://ycharts.com/indicators/us_rotary_rigs: 403 Client Error: Forbidden for url: https://ycharts.com/indicators/us_rotary_rigs


Processing URLs:  45%|████▌     | 450/1000 [17:51<11:06,  1.21s/it]

Error extracting text from http://eleccions25n.cat/: HTTPConnectionPool(host='eleccions25n.cat', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe9104a0>: Failed to resolve 'eleccions25n.cat' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  45%|████▌     | 453/1000 [17:56<12:51,  1.41s/it]

URL filtered: https://www.youtube.com/watch?v=TRCUO7-lbUE&amp;sns=em


Processing URLs:  46%|████▌     | 459/1000 [18:00<06:21,  1.42it/s]

URL filtered: https://www.youtube.com/watch?v=kFRLeL25HyU&amp;t=1s
Error extracting text from https://www.taiwanembassy.org/ni_es/post/16995.html: 403 Client Error: Forbidden for url: https://www.taiwanembassy.org/ni_es/post/16995.html


Processing URLs:  46%|████▋     | 465/1000 [18:06<06:00,  1.48it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-01/merkel-s-nato-defense-pledge-doubted-by-german-foreign-minister
Error extracting text from https://www.ajot.com/blogs/full/panama-canal-administrator-says-inauguration-on-schedule-for-june: 403 Client Error: Forbidden for url: https://www.ajot.com/blogs/full/panama-canal-administrator-says-inauguration-on-schedule-for-june


Processing URLs:  47%|████▋     | 470/1000 [18:09<04:34,  1.93it/s]

Error extracting text from http://www.nytimes.com/2016/03/30/science/nuclear-fuels-are-vulnerable-despite-a-push.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/30/science/nuclear-fuels-are-vulnerable-despite-a-push.html?_r=0


Processing URLs:  48%|████▊     | 477/1000 [18:18<06:47,  1.28it/s]

Error extracting text from http://www.nytimes.com/2015/10/08/us/politics/donald-trumps-bombast-seems-to-be-wearing-out-its-welcome.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/08/us/politics/donald-trumps-bombast-seems-to-be-wearing-out-its-welcome.html


Processing URLs:  48%|████▊     | 478/1000 [18:23<18:38,  2.14s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-07-27/venezuela-bonds-rally-as-pdvsa-swap-bets-ease-default-concern


Processing URLs:  48%|████▊     | 483/1000 [18:26<08:15,  1.04it/s]

URL filtered: https://www.youtube.com/watch?v=PVtUlEcIoG4


Processing URLs:  49%|████▉     | 490/1000 [18:36<10:55,  1.29s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdomagno.com.br/ver_post.php%3Fid%3D143933&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.blogdomagno.com.br/ver_post.php%3Fid%3D143933&amp;prev=search


Processing URLs:  49%|████▉     | 492/1000 [18:40<12:21,  1.46s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/china-conducts-combat-pa/3020194.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/china-conducts-combat-pa/3020194.html


Processing URLs:  49%|████▉     | 493/1000 [18:43<15:19,  1.81s/it]

Error extracting text from http://www.ibtimes.com/cern-lhc-update-large-hadron-collider-starts-2017-operations-earnest-2542974: 403 Client Error: Forbidden for url: https://www.ibtimes.com/cern-lhc-update-large-hadron-collider-starts-2017-operations-earnest-2542974


Processing URLs:  50%|████▉     | 499/1000 [18:59<26:17,  3.15s/it]

Error extracting text from http://www.acleddata.com/data/realtime-data-2016/: 404 Client Error: Not Found for url: https://acleddata.com/data/realtime-data-2016/


Processing URLs:  50%|█████     | 502/1000 [19:04<16:49,  2.03s/it]

Error extracting text from https://gcaptain.com/panama-canal-expansion-to-open-end-june/: 403 Client Error: Forbidden for url: https://gcaptain.com/panama-canal-expansion-to-open-end-june/


Processing URLs:  50%|█████     | 503/1000 [27:04<20:04:36, 145.42s/it]

Error extracting text from https://www.thespainreport.com/articles/795-160725201523-not-even-a-begrudging-minority-deal-for-rajoy: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/795-160725201523-not-even-a-begrudging-minority-deal-for-rajoy (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303777050>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  51%|█████     | 507/1000 [27:10<5:22:55, 39.30s/it]

URL filtered: https://www.youtube.com/watch?v=rbsqaJwpu6A
Error extracting text from http://www.reuters.com/article/us-russia-finland-nato-putin-idUSKCN0ZH5IV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-finland-nato-putin-idUSKCN0ZH5IV


Processing URLs:  51%|█████     | 509/1000 [27:12<3:01:34, 22.19s/it]

Error extracting text from http://english.aawsat.com/2016/07/article55354736/mosul-12-women-protested-isis-executed: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/07/article55354736/mosul-12-women-protested-isis-executed
Error extracting text from http://www.khaama.com/afghanistans-160m-new-defense-ministry-mini-pentagon-inaugurated-1885: 403 Client Error: Forbidden for url: http://www.khaama.com/afghanistans-160m-new-defense-ministry-mini-pentagon-inaugurated-1885


Processing URLs:  51%|█████     | 512/1000 [27:12<1:19:38,  9.79s/it]

Error extracting text from http://www.nytimes.com/2016/01/08/world/middleeast/isis-ramadi-iraq-retaking.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/08/world/middleeast/isis-ramadi-iraq-retaking.html
Error extracting text from http://www.reuters.com/article/us-turkey-referendum-erdogan-conservativ-idUSKBN17C1TP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-erdogan-conservativ-idUSKBN17C1TP


Processing URLs:  51%|█████▏    | 513/1000 [27:13<1:00:37,  7.47s/it]

Error extracting text from http://predictit.org/: 403 Client Error: Forbidden for url: http://predictit.org/
URL filtered: http://www.bloomberg.com/politics/trackers/2015-12-16/congress-reaches-fiscal-deal-that-ends-u-s-crude-oil-export-ban


Processing URLs:  52%|█████▏    | 515/1000 [27:13<36:06,  4.47s/it]

Error extracting text from http://thehill.com/blogs/floor-action/senate/255425-senate-passes-funding-bill-to-avert-government-shutdown: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/255425-senate-passes-funding-bill-to-avert-government-shutdown/


Processing URLs:  52%|█████▏    | 520/1000 [27:17<11:15,  1.41s/it]

Error extracting text from http://www.nytimes.com/2015/12/19/world/middleeast/syria-talks-isis.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/19/world/middleeast/syria-talks-isis.html
Error extracting text from http://www.nytimes.com/2015/10/18/world/middleeast/iran-nuclear-deal-sanctions.html?emc=edit_th_20151018&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/18/world/middleeast/iran-nuclear-deal-sanctions.html?emc=edit_th_20151018&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  52%|█████▏    | 523/1000 [27:18<07:02,  1.13it/s]

Error extracting text from http://www.presstv.com/Detail/2015/12/07/440685/Iran-Russia-IAEA-JCPOA-P51: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2015/12/07/440685/Iran-Russia-IAEA-JCPOA-P51


Processing URLs:  53%|█████▎    | 527/1000 [27:30<17:29,  2.22s/it]

Error extracting text from http://bit.ly/2tDJIjP: 410 Client Error: Gone for url: https://www.tuko.co.ke/247827-raila-odinga-pulls-a-massive-crowd-dp-rutos-stronghold-photos.html?source=notification


Processing URLs:  53%|█████▎    | 531/1000 [27:34<09:54,  1.27s/it]

Error extracting text from http://www.cdm.me/english/stoltenberg-nato-will-know-how-to-appreciate-montenegros-progress-in-december: 403 Client Error: Forbidden for url: https://www.cdm.me/english/stoltenberg-nato-will-know-how-to-appreciate-montenegros-progress-in-december


Processing URLs:  54%|█████▍    | 538/1000 [27:39<05:36,  1.37it/s]

Error extracting text from https://www.reuters.com/article/us-venezuela-russia-visit/russias-putin-may-meet-venezuelas-maduro-kremlin-idUSKCN1C016U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-russia-visit/russias-putin-may-meet-venezuelas-maduro-kremlin-idUSKCN1C016U
URL filtered: https://www.bloomberg.com/news/articles/2017-03-16/behind-trump-s-russia-romance-there-s-a-tower-full-of-oligarchs


Processing URLs:  54%|█████▍    | 540/1000 [27:40<03:58,  1.93it/s]

Error extracting text from http://www.reuters.com/article/2015/11/27/china-bonds-idUSL3N13M2LN20151127#tG7F8LKMhKxSRhrD.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/27/china-bonds-idUSL3N13M2LN20151127#tG7F8LKMhKxSRhrD.97
Error extracting text from https://www.wsj.com/articles/treasury-yields-slip-after-inflation-data-matches-expectations-11639149666: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/treasury-yields-slip-after-inflation-data-matches-expectations-11639149666
URL filtered: https://twitter.com/NASAWebb/status/1473412410245632006


Processing URLs:  54%|█████▍    | 542/1000 [27:41<04:29,  1.70it/s]

Error extracting text from http://www.longfinance.net/images/GFCI18_23Sep2015.pdf: 404 Client Error: Not Found for url: https://www.longfinance.net/images/GFCI18_23Sep2015.pdf


Processing URLs:  55%|█████▍    | 545/1000 [27:47<08:30,  1.12s/it]

Error extracting text from http://www.reuters.com/article/us-usa-iran-missiles-idUSKBN0UF21G20160101: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-missiles-idUSKBN0UF21G20160101


Processing URLs:  55%|█████▌    | 550/1000 [28:01<17:51,  2.38s/it]

Error extracting text from http://www.lacarguy.com/green/article/mike-sullivan-talks-about-the-toyota-mirai-in-the-media: 404 Client Error: Not Found for url: https://www.lacarguy.com/green/article/mike-sullivan-talks-about-the-toyota-mirai-in-the-media


Processing URLs:  55%|█████▌    | 551/1000 [28:01<13:12,  1.77s/it]

Error extracting text from https://www.nytimes.com/2017/07/09/technology/att-time-warner-merger.html?emc=edit_th_20170710&amp;nl=todaysheadlines&amp;nlid=69656099&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/09/technology/att-time-warner-merger.html?emc=edit_th_20170710&amp;nl=todaysheadlines&amp;nlid=69656099&amp;_r=0


Processing URLs:  55%|█████▌    | 554/1000 [28:09<16:27,  2.21s/it]

Error extracting text from https://pythagorassite.files.wordpress.com/2016/04/normal_distribution_calculator_with_step_by_step_explanation.png?w=978: 404 Client Error: Not Found for url: https://pythagorassite.files.wordpress.com/2016/04/normal_distribution_calculator_with_step_by_step_explanation.png?w=978


Processing URLs:  56%|█████▌    | 560/1000 [28:15<06:14,  1.17it/s]

Error extracting text from http://thehill.com/homenews/administration/351696-clapper-conceivable-trump-was-picked-up-on-manafort-wiretap: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/351696-clapper-conceivable-trump-was-picked-up-on-manafort-wiretap/
Error extracting text from http://www.wsj.com/articles/international-meeting-on-syria-ceasefire-canceled-1455885745: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/international-meeting-on-syria-ceasefire-canceled-1455885745


Processing URLs:  56%|█████▌    | 562/1000 [28:18<09:24,  1.29s/it]

Error extracting text from http://www.realclearpolitics.com/articles/2015/09/01/feingold_strickland_fight_history_in_comeback_bids_127938.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2015/09/01/feingold_strickland_fight_history_in_comeback_bids_127938.html


Processing URLs:  56%|█████▋    | 564/1000 [28:38<36:02,  4.96s/it]

Error extracting text from http://postimg.org/image/s4om2dsvj/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/s4om2dsvj/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f29f0>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  57%|█████▋    | 566/1000 [29:40<2:29:54, 20.72s/it]

Error extracting text from http://www.cmegroup.com/trading/agricultural/grain-and-oilseed/corn.html: HTTPConnectionPool(host='www.cmegroup.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  57%|█████▋    | 570/1000 [30:43<2:09:06, 18.02s/it]

Error extracting text from http://www.islandpacket.com/news/business/article130301804.html#storylink=cpy: HTTPConnectionPool(host='www.islandpacket.com', port=80): Read timed out. (read timeout=60)
Error extracting text from https://www.reuters.com/article/us-germany-politics/german-spd-leader-dampens-hopes-for-quick-coalition-deal-source-idUSKBN1FI13O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/german-spd-leader-dampens-hopes-for-quick-coalition-deal-source-idUSKBN1FI13O


Processing URLs:  57%|█████▋    | 571/1000 [30:44<1:33:30, 13.08s/it]

Error extracting text from http://syriadirect.org/news/russian-knocks-aleppo-water-treatment-plant-offline-%E2%80%98the-sight-of-water-flowing-from-a-faucet-has-become-almost-like-a-dream%E2%80%99/: 404 Client Error: Not Found for url: http://syriadirect.org/news/russian-knocks-aleppo-water-treatment-plant-offline-%E2%80%98the-sight-of-water-flowing-from-a-faucet-has-become-almost-like-a-dream%E2%80%99/


Processing URLs:  57%|█████▋    | 572/1000 [31:45<3:14:52, 27.32s/it]

Error extracting text from https://www.ikn.army.mil/apps/IKNWMS/IKN_Websites/CDI/MIPB/files/MIPBJan_Mar16IKN.pdf: HTTPSConnectionPool(host='www.ikn.army.mil', port=443): Max retries exceeded with url: /apps/IKNWMS/IKN_Websites/CDI/MIPB/files/MIPBJan_Mar16IKN.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3011f1c40>, 'Connection to www.ikn.army.mil timed out. (connect timeout=60)'))


Processing URLs:  57%|█████▋    | 574/1000 [31:55<1:51:39, 15.73s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-27/these-30-cent-bonds-are-barclays-s-top-pick-in-venezuela-default


Processing URLs:  58%|█████▊    | 580/1000 [33:02<2:21:31, 20.22s/it]

Error extracting text from http://www.usnews.com/news/business/articles/2015/11/30/greece-urges-no-delay-to-debt-relief-talks-with-creditors: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  58%|█████▊    | 582/1000 [33:07<1:20:57, 11.62s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/2016-02/02/content_6885785.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/2016-02/02/content_6885785.htm


Processing URLs:  59%|█████▉    | 588/1000 [33:13<15:24,  2.24s/it]

Error extracting text from http://english.yonhapnews.co.kr/northkorea/2015/09/27/0401000000AEN20150927001700320.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  59%|█████▉    | 592/1000 [33:16<07:02,  1.04s/it]

Error extracting text from https://www.reuters.com/article/us-nireland-politics/british-irish-pms-to-visit-northern-ireland-urging-end-to-political-crisis-idUSKBN1FW00C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nireland-politics/british-irish-pms-to-visit-northern-ireland-urging-end-to-political-crisis-idUSKBN1FW00C


Processing URLs:  60%|█████▉    | 595/1000 [33:19<06:39,  1.01it/s]

Error extracting text from http://cysticfibrosisnewstoday.com/2015/10/27/vertex-crispr-use-gene-editing-search-new-cystic-fibrosis-treatments/: 403 Client Error: Forbidden for url: http://cysticfibrosisnewstoday.com/2015/10/27/vertex-crispr-use-gene-editing-search-new-cystic-fibrosis-treatments/
URL filtered: http://www.businessinsider.com/russian-linked-facebook-ads-sow-chaos-racial-religious-groups-2017-9


Processing URLs:  60%|█████▉    | 598/1000 [33:21<04:35,  1.46it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/02/08/asia-pacific/unsc-slams-pyongyang-rocket-launch-vows-hard-sanctions-despite-china-russia-reluctance/#.Vrq2d5MrJTY: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/02/08/asia-pacific/unsc-slams-pyongyang-rocket-launch-vows-hard-sanctions-despite-china-russia-reluctance/#.Vrq2d5MrJTY


Processing URLs:  60%|█████▉    | 599/1000 [33:22<06:21,  1.05it/s]

Error extracting text from http://www.nationalreview.com/article/434409/hillary-clinton-email-scandal-fbi-james-comey-justice-department-indict-2016: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/434409/hillary-clinton-email-scandal-fbi-james-comey-justice-department-indict-2016/


Processing URLs:  61%|██████    | 606/1000 [33:29<05:04,  1.30it/s]

Error extracting text from http://thehill.com/homenews/administration/257124-biden-would-run-optimistic-campaign: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/257124-biden-would-run-optimistic-campaign/


Processing URLs:  61%|██████    | 607/1000 [33:31<05:59,  1.09it/s]

Error extracting text from http://www.ibtimes.co.uk/: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/


Processing URLs:  61%|██████    | 610/1000 [33:33<04:40,  1.39it/s]

Error extracting text from http://www.reuters.com/article/us-iran-election-khamenei-idUSKCN0UN0GX20160109: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-khamenei-idUSKCN0UN0GX20160109


Processing URLs:  61%|██████    | 611/1000 [33:35<06:55,  1.07s/it]

Error extracting text from http://woodtv.com/2016/01/18/snyder-doesnt-have-to-release-flint-info-under-foia/: 404 Client Error: Not Found for url: https://www.woodtv.com/2016/01/18/snyder-doesnt-have-to-release-flint-info-under-foia/


Processing URLs:  61%|██████    | 612/1000 [33:35<05:20,  1.21it/s]

Error extracting text from http://post.understandingwar.org/report/competing-visions-syria-and-iraq-myth-anti-isis-grand-coalition%20: HTTPConnectionPool(host='post.understandingwar.org', port=80): Max retries exceeded with url: /report/competing-visions-syria-and-iraq-myth-anti-isis-grand-coalition%20 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303201970>: Failed to resolve 'post.understandingwar.org' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://cryptonews.com/news/facebook-to-ramp-up-payments-ahead-of-diem-launch-11097.htm


Processing URLs:  62%|██████▏   | 615/1000 [33:37<04:49,  1.33it/s]

Error extracting text from https://studentloanhero.com/student-loan-debt-statistics/: 403 Client Error: Forbidden for url: https://studentloanhero.com/student-loan-debt-statistics/


Processing URLs:  62%|██████▏   | 619/1000 [33:52<14:47,  2.33s/it]

Error extracting text from http://news.abs-cbn.com/video/news/10/27/16/pinoy-fishermen-say-they-can-fish-in-scarborough-shoal-again: 403 Client Error: Forbidden for url: http://news.abs-cbn.com/video/news/10/27/16/pinoy-fishermen-say-they-can-fish-in-scarborough-shoal-again


Processing URLs:  62%|██████▏   | 621/1000 [33:53<10:29,  1.66s/it]

Error extracting text from http://www.kyivpost.com/article/content/ukraine-politics/unian-imf-to-meet-on-ukraine-late-in-august-finance-ministry-419667.html: 403 Client Error: Forbidden for url: https://www.kyivpost.com/article/content/ukraine-politics/unian-imf-to-meet-on-ukraine-late-in-august-finance-ministry-419667.html
URL filtered: https://www.bloomberg.com/news/articles/2017-11-29/goldman-warns-highest-valuations-since-1900-mean-pain-is-coming


Processing URLs:  62%|██████▏   | 623/1000 [33:54<06:56,  1.10s/it]

Error extracting text from http://news.sky.com/story/1718716/syrian-government-forces-close-in-on-aleppo: 404 Client Error: Not Found for url: https://news.sky.com/story/1718716/syrian-government-forces-close-in-on-aleppo


Processing URLs:  62%|██████▏   | 624/1000 [33:56<08:34,  1.37s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1543151/000154315115000002/xslFormDX01/primary_doc.xml: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1543151/000154315115000002/xslFormDX01/primary_doc.xml


Processing URLs:  63%|██████▎   | 627/1000 [34:04<10:23,  1.67s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN147005?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN147005?il=0


Processing URLs:  63%|██████▎   | 628/1000 [34:07<13:02,  2.10s/it]

Error extracting text from https://reut.rs/3r60lCU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  63%|██████▎   | 629/1000 [34:08<11:13,  1.82s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-ruling-usa-idUSKCN0ZT2TY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-ruling-usa-idUSKCN0ZT2TY


Processing URLs:  63%|██████▎   | 632/1000 [34:11<07:19,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-nato-idUSKBN0TQ0HU20151207#iUz8w0lm0ZRqKLMf.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-nato-idUSKBN0TQ0HU20151207#iUz8w0lm0ZRqKLMf.97


Processing URLs:  64%|██████▎   | 635/1000 [34:16<09:14,  1.52s/it]

Error extracting text from http://www.cis.upenn.edu/~ungar/papers/forecast_AAAI_MAGG.pdf: 404 Client Error: Not Found for url: https://www.cis.upenn.edu/~ungar/papers/forecast_AAAI_MAGG.pdf


Processing URLs:  64%|██████▎   | 636/1000 [34:17<07:40,  1.26s/it]

Error extracting text from https://www.amazon.com/When-Money-Dies-Devaluation-Hyperinflation/dp/1586489941: 500 Server Error:  for url: https://www.amazon.com/When-Money-Dies-Devaluation-Hyperinflation/dp/1586489941


Processing URLs:  64%|██████▍   | 638/1000 [34:18<05:48,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-china-usa-defence-idUSKBN16U0SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-usa-defence-idUSKBN16U0SB


Processing URLs:  64%|██████▍   | 640/1000 [34:21<07:20,  1.22s/it]

Error extracting text from https://www.barrons.com/articles/why-some-experts-think-a-bitcoin-etf-will-be-delayed-51620853749: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/why-some-experts-think-a-bitcoin-etf-will-be-delayed-51620853749


Processing URLs:  64%|██████▍   | 642/1000 [34:32<19:12,  3.22s/it]

Error extracting text from https://usun.state.gov/remarks/7933: 404 Client Error: Not Found for url: https://usun.usmission.gov/remarks/7933/


Processing URLs:  65%|██████▌   | 651/1000 [34:55<10:55,  1.88s/it]

URL filtered: https://www.stratfor.com/analysis/gap-widens-between-europes-north-and-south?utm_source=LinkedIn&amp;utm_medium=social&amp;utm_campaign=article
Error extracting text from https://www.reuters.com/article/uk-britain-politics-scotland/scotlands-conservative-party-leader-calls-bid-for-independence-vote-reckless-idUSKBN2A10X8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-politics-scotland/scotlands-conservative-party-leader-calls-bid-for-independence-vote-reckless-idUSKBN2A10X8


Processing URLs:  66%|██████▌   | 656/1000 [35:00<06:51,  1.20s/it]

Error extracting text from https://www.porttechnology.org/news/panama_canal_chief_releases_expansion_statement: 403 Client Error: Forbidden for url: https://www.porttechnology.org/news/panama_canal_chief_releases_expansion_statement


Processing URLs:  66%|██████▋   | 664/1000 [35:57<19:10,  3.42s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKCN10S0XU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKCN10S0XU


Processing URLs:  67%|██████▋   | 668/1000 [36:10<17:56,  3.24s/it]

Error extracting text from http://www.kiro7.com/news/world/the-latest-iranian-moderates-win-majority-in-parliament/129617730: 404 Client Error: Not Found for url: https://www.kiro7.com/news/world/the-latest-iranian-moderates-win-majority-in-parliament/129617730/


Processing URLs:  67%|██████▋   | 671/1000 [36:15<11:30,  2.10s/it]

Error extracting text from http://www.thelocal.it/20150709/venice-mayor-bans-books-on-homosexuality: 403 Client Error: Forbidden for url: https://www.thelocal.it/20150709/venice-mayor-bans-books-on-homosexuality


Processing URLs:  67%|██████▋   | 672/1000 [36:15<08:35,  1.57s/it]

URL filtered: https://twitter.com/mj_lee/status/685924600224452608


Processing URLs:  68%|██████▊   | 679/1000 [36:24<06:32,  1.22s/it]

Error extracting text from http://www.nytimes.com/2016/04/29/world/middleeast/iraq-joe-biden-visit.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/29/world/middleeast/iraq-joe-biden-visit.html?_r=0


Processing URLs:  69%|██████▉   | 689/1000 [36:41<04:52,  1.06it/s]

Error extracting text from http://www.nytimes.com/2015/11/21/business/international/greece-presents-2016-budget-and-revamps-recession-prediction.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/21/business/international/greece-presents-2016-budget-and-revamps-recession-prediction.html?_r=0
Error extracting text from http://www.reuters.com/article/us-bmw-i-idUSKCN0ZQ0O7?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-bmw-i-idUSKCN0ZQ0O7?il=0


Processing URLs:  69%|██████▉   | 693/1000 [36:45<06:28,  1.26s/it]

Error extracting text from http://www.deepmind.com/alpha-go.html: 404 Client Error: Not Found for url: https://deepmind.google/alpha-go.html


Processing URLs:  70%|███████   | 702/1000 [37:01<07:00,  1.41s/it]

Error extracting text from https://www.reuters.com/article/us-hongkong-protests-china/chinese-official-says-hong-kong-facing-biggest-crisis-since-1997-idUSKCN1UX07K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hongkong-protests-china/chinese-official-says-hong-kong-facing-biggest-crisis-since-1997-idUSKCN1UX07K


Processing URLs:  70%|███████   | 703/1000 [37:01<05:31,  1.12s/it]

Error extracting text from https://thehill.com/homenews/senate/541826-senate-rejects-sanders-15-minimum-wage-hike: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/541826-senate-rejects-sanders-15-minimum-wage-hike/


Processing URLs:  71%|███████   | 706/1000 [37:06<05:39,  1.15s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/261397-poll-trump-drops-12-points-in-one-week: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/261397-poll-trump-drops-12-points-in-one-week/


Processing URLs:  71%|███████   | 708/1000 [37:12<10:34,  2.17s/it]

Error extracting text from http://www.france24.com/en/20160129-iran-coerces-afghans-fight-syria-shiite-militia: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160129-iran-coerces-afghans-fight-syria-shiite-militia


Processing URLs:  71%|███████▏  | 713/1000 [37:19<08:10,  1.71s/it]

Error extracting text from http://www.amazon.com/Relentless-Strike-History-Special-Operations/dp/1250014549: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Relentless-Strike-History-Special-Operations/dp/1250014549


Processing URLs:  71%|███████▏  | 714/1000 [37:20<06:55,  1.45s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-20/oil-speculators-most-bullish-in-two-months-as-opec-calls-for-80


Processing URLs:  72%|███████▏  | 716/1000 [37:21<04:21,  1.09it/s]

Error extracting text from https://www.wayneformayor.com/updates/im-running-for-mayor-of-berkeley: HTTPSConnectionPool(host='www.wayneformayor.com', port=443): Max retries exceeded with url: /updates/im-running-for-mayor-of-berkeley (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.wayneformayor.com'. (_ssl.c:1000)")))


Processing URLs:  72%|███████▏  | 721/1000 [37:28<06:27,  1.39s/it]

Error extracting text from http://www.reuters.com/article/us-eu-usa-ttip-merkel-idUSKCN11E332?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-usa-ttip-merkel-idUSKCN11E332?il=0


Processing URLs:  72%|███████▏  | 722/1000 [37:30<06:56,  1.50s/it]

Error extracting text from https://2015burundi.crowdmap.com/main: 404 Client Error: Not Found for url: https://2015burundi.crowdmap.com/main


Processing URLs:  72%|███████▏  | 723/1000 [37:30<05:10,  1.12s/it]

Error extracting text from https://au.news.yahoo.com/world/a/31038030/brazils-president-rousseff-fights-for-her-political-life/: 404 Client Error: Not Found for url: https://au.news.yahoo.com/brazils-president-rousseff-fights-for-her-political-life-31038030.html


Processing URLs:  73%|███████▎  | 726/1000 [37:33<04:32,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-cambodia-china-japan-idUSKCN0VQ152: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cambodia-china-japan-idUSKCN0VQ152


Processing URLs:  73%|███████▎  | 729/1000 [37:35<03:29,  1.30it/s]

Error extracting text from http://www.nytimes.com/2016/06/10/world/europe/brexit-britain-european-union-media.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/10/world/europe/brexit-britain-european-union-media.html


Processing URLs:  73%|███████▎  | 731/1000 [37:37<03:25,  1.31it/s]

Error extracting text from https://www.dhs.gov/sites/default/files/ntas/alerts/21_0127_ntas-bulletin.pdf: 403 Client Error: Forbidden for url: https://www.dhs.gov/sites/default/files/ntas/alerts/21_0127_ntas-bulletin.pdf
Error extracting text from http://www.reuters.com/article/2015/06/10/us-japan-economy-boj-sato-idUSKBN0OQ05V20150610: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/06/10/us-japan-economy-boj-sato-idUSKBN0OQ05V20150610
URL filtered: https://www.bloomberg.com/news/articles/2017-03-07/pioneer-s-sheffield-sees-40-oil-if-opec-doesn-t-extend-cuts


Processing URLs:  73%|███████▎  | 734/1000 [37:38<02:07,  2.09it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/18/oil-export-ban/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/18/oil-export-ban/


Processing URLs:  74%|███████▍  | 743/1000 [37:56<07:50,  1.83s/it]

URL filtered: http://globalnation.inquirer.net/140947/key-points-arbitral-tribunal-decision-verdict-award-philippines-china-maritime-dispute-unclos-arbitration-spratly-islands-scarborough?utm_content=buffer00064&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  74%|███████▍  | 745/1000 [38:00<07:56,  1.87s/it]

Error extracting text from http://www.emlakmetrekare.com/2016/04/11/akkuyu-nukleer-santrali-icin-acele-kamulastirma/: 404 Client Error: Not Found for url: http://www.emlakmetrekare.com/2016/04/11/akkuyu-nukleer-santrali-icin-acele-kamulastirma/


Processing URLs:  75%|███████▍  | 747/1000 [38:02<06:21,  1.51s/it]

Error extracting text from http://www.wsj.com/articles/iraq-commanders-weigh-tactical-shift-in-mosul-1480013684: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraq-commanders-weigh-tactical-shift-in-mosul-1480013684


Processing URLs:  75%|███████▍  | 748/1000 [38:04<05:49,  1.39s/it]

Error extracting text from http://www.wow.com/wiki/Battle_of_Grozny_(August_1996): 404 Client Error: Not Found for url: https://www.wow.com/wiki/Battle_of_Grozny_(August_1996)


Processing URLs:  75%|███████▌  | 750/1000 [38:05<03:51,  1.08it/s]

Error extracting text from https://www.newsweek.com/us-talks-tense-china-turns-russia-advance-cooperation-1577901: 403 Client Error: Forbidden for url: https://www.newsweek.com/us-talks-tense-china-turns-russia-advance-cooperation-1577901
Error extracting text from http://www.nytimes.com/2016/01/14/world/middleeast/defense-secretary-lauds-push-against-isis-in-speech-to-deploying-troops.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/world/middleeast/defense-secretary-lauds-push-against-isis-in-speech-to-deploying-troops.html


Processing URLs:  75%|███████▌  | 751/1000 [38:05<03:09,  1.31it/s]

Error extracting text from https://www.reuters.com/article/us-trade-nafta/u-s-rejects-proposals-to-unblock-nafta-will-seek-breakthroughs-idUSKBN1FI1N5?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-nafta/u-s-rejects-proposals-to-unblock-nafta-will-seek-breakthroughs-idUSKBN1FI1N5?il=0


Processing URLs:  75%|███████▌  | 754/1000 [38:09<03:46,  1.09it/s]

Error extracting text from http://www.wsj.com/articles/syria-shifts-stance-toward-kurds-1471944602: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syria-shifts-stance-toward-kurds-1471944602


Processing URLs:  76%|███████▌  | 756/1000 [38:19<13:21,  3.28s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-22/volkswagen-probed-by-u-s-multistate-group-on-pollution-cheating


Processing URLs:  77%|███████▋  | 766/1000 [38:34<06:17,  1.61s/it]

Error extracting text from https://gis.harvard.edu/usa-states-and-territories/new-hampshire: 404 Client Error: Not Found for url: https://gis.harvard.edu/usa-states-and-territories/new-hampshire


Processing URLs:  77%|███████▋  | 768/1000 [38:35<04:36,  1.19s/it]

Error extracting text from http://tricorder.xprize.org/about/overview: 404 Client Error: Not Found for url: http://tricorder.xprize.org/about/overview


Processing URLs:  77%|███████▋  | 769/1000 [38:36<03:28,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/01/25/us/politics/refugees-immigrants-wall-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=a-lede-package-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/25/us/politics/refugees-immigrants-wall-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=a-lede-package-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  77%|███████▋  | 771/1000 [39:38<1:12:35, 19.02s/it]

Error extracting text from https://www.olympic.org/news/joint-statement-from-the-international-olympic-committee-and-the-tokyo-2020-organising-committee: HTTPSConnectionPool(host='www.olympic.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  77%|███████▋  | 772/1000 [39:39<51:42, 13.61s/it]  

Error extracting text from http://www.balkaninsight.com/en/article/croatia-stalls-serbia-s-negotiation-chapters-opening-04-07-2016-1: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/croatia-stalls-serbia-s-negotiation-chapters-opening-04-07-2016-1


Processing URLs:  78%|███████▊  | 776/1000 [39:42<17:22,  4.66s/it]

Error extracting text from https://www.crunchbase.com/funding_round/unity-biotechnology-series-b--499993df#section-overview: 403 Client Error: Forbidden for url: https://www.crunchbase.com/funding_round/unity-biotechnology-series-b--499993df#section-overview


Processing URLs:  78%|███████▊  | 777/1000 [39:43<13:07,  3.53s/it]

Error extracting text from http://www.crainsnewyork.com/article/20151101/MEDIA_ENTERTAINMENT/311019991/joe-ripp-wants-everyone-to-believe-time-inc-is-a-media-company-for-the-21st-century-not-everyone-is-buying-it: 403 Client Error: Forbidden for url: https://www.crainsnewyork.com/article/20151101/MEDIA_ENTERTAINMENT/311019991/joe-ripp-wants-everyone-to-believe-time-inc-is-a-media-company-for-the-21st-century-not-everyone-is-buying-it


Processing URLs:  78%|███████▊  | 779/1000 [39:44<07:29,  2.04s/it]

Error extracting text from http://tuoitrenews.vn/politics/32545/rough-waters-ahead-the-east-vietnam-sea-in-2016: 500 Server Error: Internal Server Error for url: https://tuoitrenews.vn/news/20140628/35500-spell-scholarships-for-vietnam-students-in-past-decade/32545.html
Error extracting text from https://www.pakistantoday.com.pk/2017/05/29/indian-army-not-expecting-limited-war-with-pakistan/: 403 Client Error: Forbidden for url: https://www.pakistantoday.com.pk/2017/05/29/indian-army-not-expecting-limited-war-with-pakistan/


Processing URLs:  78%|███████▊  | 784/1000 [41:55<39:37, 11.01s/it]  

Error extracting text from http://www.europe.easybranches.com/bulgaria/Russia-will-Build-the-First-Nuclear-Power-Plant-in-Egypt-145002: HTTPConnectionPool(host='www.europe.easybranches.com', port=80): Max retries exceeded with url: /bulgaria/Russia-will-Build-the-First-Nuclear-Power-Plant-in-Egypt-145002 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff6005f0>: Failed to resolve 'www.europe.easybranches.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▊  | 787/1000 [41:57<17:46,  5.01s/it]

Error extracting text from https://www.neweurope.eu/article/montenegro-heads-polls-nato-membership-stake/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/montenegro-heads-polls-nato-membership-stake/


Processing URLs:  79%|███████▉  | 789/1000 [41:58<10:04,  2.86s/it]

Error extracting text from http://www.nytimes.com/2016/05/11/world/asia/bangladesh-executed-motiur-rahman-nizami.html?_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/11/world/asia/bangladesh-executed-motiur-rahman-nizami.html?_r=1
URL filtered: https://techcrunch.com/2017/10/27/russian-government-condemns-twitters-ad-ban-for-russia-today-and-sputnik/


Processing URLs:  79%|███████▉  | 792/1000 [42:04<08:00,  2.31s/it]

URL filtered: https://www.youtube.com/watch?v=yO2n7QoyieM


Processing URLs:  80%|███████▉  | 797/1000 [42:07<03:19,  1.02it/s]

Error extracting text from https://www.wsj.com/articles/for-aramco-insiders-princes-2-trillion-ipo-valuation-doesnt-add-up-1493064170?tesla=y&amp;mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-aramco-insiders-princes-2-trillion-ipo-valuation-doesnt-add-up-1493064170?tesla=y&amp;mod=e2fb


Processing URLs:  80%|███████▉  | 799/1000 [42:07<02:00,  1.67it/s]

Error extracting text from http://www.rand.org/content/dam/rand/pubs/perspectives/PE100/PE166/RAND_PE166.pdf: 403 Client Error: Forbidden for url: https://www.rand.org/content/dam/rand/pubs/perspectives/PE100/PE166/RAND_PE166.pdf
Error extracting text from http://www.nytimes.com/2006/08/07/business/worldbusiness/07cnd-soda.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2006/08/07/business/worldbusiness/07cnd-soda.html


Processing URLs:  80%|████████  | 800/1000 [42:08<02:20,  1.43it/s]

URL filtered: https://www.youtube.com/watch?v=3JvkaUvB-ec


Processing URLs:  80%|████████  | 802/1000 [42:21<10:49,  3.28s/it]

URL filtered: https://www.youtube.com/watch?v=bOGD9mvzIi4


Processing URLs:  80%|████████  | 805/1000 [42:25<07:18,  2.25s/it]

Error extracting text from https://www.reuters.com/article/us-usa-biden-brazil/brazils-bolsonaro-says-he-wants-free-trade-agreement-with-u-s-in-letter-to-biden-idUSKBN29P2RM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-biden-brazil/brazils-bolsonaro-says-he-wants-free-trade-agreement-with-u-s-in-letter-to-biden-idUSKBN29P2RM


Processing URLs:  81%|████████  | 809/1000 [42:28<03:47,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-impeachment-vote-idUSKCN10L0HO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-impeachment-vote-idUSKCN10L0HO


Processing URLs:  81%|████████  | 811/1000 [42:30<03:40,  1.17s/it]

Error extracting text from http://www.latimes.com/politics/la-na-pol-alabama-senate-black-voters-20171210-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-alabama-senate-black-voters-20171210-story.html


Processing URLs:  82%|████████▏ | 818/1000 [42:40<03:43,  1.23s/it]

Error extracting text from http://goo.gl/y6JZmi: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/08/world/europe/european-union-britain-pew-poll.html


Processing URLs:  82%|████████▏ | 820/1000 [42:45<05:29,  1.83s/it]

Error extracting text from http://38north.org/2017/03/gtoloraya030717/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  82%|████████▏ | 822/1000 [42:52<07:44,  2.61s/it]

Error extracting text from https://in.reuters.com/article/us-iran-cyber/once-kittens-in-cyber-spy-world-iran-gains-prowess-security-experts-idINKCN1BV1VA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  82%|████████▏ | 823/1000 [42:54<06:54,  2.34s/it]

Error extracting text from http://worldmaritimenews.com/archives/181481/cargo-ship-sinks-after-collision-in-panama-canal/: HTTPConnectionPool(host='worldmaritimenews.com', port=80): Max retries exceeded with url: /archives/181481/cargo-ship-sinks-after-collision-in-panama-canal/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fae62210>: Failed to resolve 'worldmaritimenews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  83%|████████▎ | 828/1000 [43:01<05:01,  1.75s/it]

Error extracting text from http://www.cnbc.com/2015/09/22/flash-china-caixin-pmi-falls-to-47-in-september-a-6-12-year-low.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/09/22/flash-china-caixin-pmi-falls-to-47-in-september-a-6-12-year-low.html


Processing URLs:  83%|████████▎ | 833/1000 [43:21<13:40,  4.91s/it]

URL filtered: https://arstechnica.com/tech-policy/2017/03/google-facebook-twitter-must-comply-with-eu-consumer-law/


Processing URLs:  84%|████████▎ | 837/1000 [43:24<05:40,  2.09s/it]

Error extracting text from https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/January/REINZ%20Monthly%20Property%20Report%20-%20January%202021.pdf: 404 Client Error: Not Found for url: https://www.reinz.co.nz/Media/Default/Statistic%20Documents/2021/Residential/January/REINZ%20Monthly%20Property%20Report%20-%20January%202021.pdf


Processing URLs:  84%|████████▍ | 839/1000 [43:25<03:14,  1.21s/it]

Error extracting text from http://www.wsj.com/articles/discord-between-chinas-top-two-leaders-spills-into-the-open-1469134110: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/discord-between-chinas-top-two-leaders-spills-into-the-open-1469134110
Error extracting text from http://www.nytimes.com/2016/09/15/upshot/as-clinton-trump-race-tightens-heres-how-forecast-models-differ.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/15/upshot/as-clinton-trump-race-tightens-heres-how-forecast-models-differ.html


Processing URLs:  84%|████████▍ | 841/1000 [43:27<03:03,  1.15s/it]

URL filtered: https://twitter.com/andjames


Processing URLs:  84%|████████▍ | 843/1000 [43:29<02:39,  1.02s/it]

Error extracting text from http://toyotanews.pressroom.toyota.com/releases/toyota+mirai+fcv+future+nov17.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/toyota+mirai+fcv+future+nov17/


Processing URLs:  85%|████████▌ | 851/1000 [43:42<03:58,  1.60s/it]

Error extracting text from http://www.tessgerritsen.com/how-many-copies-sold-is-a-bestseller/: 404 Client Error: Not Found for url: https://www.tessgerritsen.com/how-many-copies-sold-is-a-bestseller/


Processing URLs:  85%|████████▌ | 853/1000 [43:58<13:42,  5.60s/it]

Error extracting text from https://www.almasdarnews.com/article/strategic-town-al-eis-recaptured-night-raid-led-iranian-troops-map-update/: 522 Server Error:  for url: https://www.almasdarnews.com/article/strategic-town-al-eis-recaptured-night-raid-led-iranian-troops-map-update/


Processing URLs:  86%|████████▌ | 855/1000 [44:00<07:37,  3.15s/it]

Error extracting text from https://commonslibrary.parliament.uk/northern-ireland-protocol-article-16-and-eu-vaccine-export-controls/: 403 Client Error: Forbidden for url: https://commonslibrary.parliament.uk/northern-ireland-protocol-article-16-and-eu-vaccine-export-controls/


Processing URLs:  86%|████████▌ | 860/1000 [44:10<05:07,  2.20s/it]

Error extracting text from http://www.huffingtonpost.co.za/sarah-evans/mbete-disunity-will-cost-anc-in-2019_a_23064305/: 502 Server Error: Bad Gateway for url: https://www.huffingtonpost.co.za/sarah-evans/mbete-disunity-will-cost-anc-in-2019_a_23064305/


Processing URLs:  86%|████████▌ | 862/1000 [44:12<03:44,  1.62s/it]

Error extracting text from https://www.newsweek.com/poll-american-covid-lab-1597027: 403 Client Error: Forbidden for url: https://www.newsweek.com/poll-american-covid-lab-1597027


Processing URLs:  86%|████████▋ | 865/1000 [44:15<02:22,  1.05s/it]

Error extracting text from https://thehill.com/opinion/white-house/556133-democrats-use-of-congressional-review-act-puts-filibuster-debate-in-new?rl=1: 403 Client Error: Forbidden for url: https://thehill.com/opinion/white-house/556133-democrats-use-of-congressional-review-act-puts-filibuster-debate-in-new/?rl=1


Processing URLs:  87%|████████▋ | 868/1000 [44:18<02:07,  1.03it/s]

Error extracting text from https://www.wsj.com/articles/u-s-strategy-shift-in-syria-turns-focus-from-isis-to-iran-1516227677: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-strategy-shift-in-syria-turns-focus-from-isis-to-iran-1516227677


Processing URLs:  87%|████████▋ | 869/1000 [44:19<02:05,  1.05it/s]

Error extracting text from http://www.financialexpress.com/world-news/iraqi-forces-yet-to-seal-off-mosul-as-battle-enters-second-month/448676/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/world-news/iraqi-forces-yet-to-seal-off-mosul-as-battle-enters-second-month/448676/


Processing URLs:  87%|████████▋ | 872/1000 [44:23<02:05,  1.02it/s]

Error extracting text from http://www.reuters.com/article/2015/09/10/us-libya-security-talks-idUSKCN0RA2LK20150910: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/10/us-libya-security-talks-idUSKCN0RA2LK20150910
Error extracting text from https://www.barrons.com/articles/gm-gets-back-in-the-fast-lane-1517624680: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/gm-gets-back-in-the-fast-lane-1517624680


Processing URLs:  88%|████████▊ | 875/1000 [44:24<01:30,  1.38it/s]

Error extracting text from https://www.scientificamerican.com/article/the-u-k-coronavirus-mutation-is-worrying-but-not-terrifying/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/the-u-k-coronavirus-mutation-is-worrying-but-not-terrifying/


Processing URLs:  88%|████████▊ | 878/1000 [44:32<04:09,  2.05s/it]

URL filtered: https://www.theguardian.com/technology/2018/nov/24/mps-seize-cache-facebook-internal-papers


Processing URLs:  88%|████████▊ | 884/1000 [44:36<01:37,  1.20it/s]

URL filtered: https://www.youtube.com/watch?v=RwUGSYDKUxU
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O0TAWX6JIJUX01-0BPUM03CFMT0LAIGKO1SVVOM0G
Error extracting text from http://www.nytimes.com/2015/11/06/opinion/in-iran-a-deal-and-then-a-crackdown.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/06/opinion/in-iran-a-deal-and-then-a-crackdown.html?_r=0


Processing URLs:  89%|████████▉ | 890/1000 [44:44<01:49,  1.00it/s]

Error extracting text from http://trajectorymagazine.com/civil/item/1369-human-domain-analytics.html: 403 Client Error: Forbidden for url: http://trajectorymagazine.com/civil/item/1369-human-domain-analytics.html


Processing URLs:  89%|████████▉ | 894/1000 [44:52<03:01,  1.71s/it]

Error extracting text from http://laautoshow.com/floor-map-exhibitors-public-days/: 404 Client Error: Not Found for url: https://laautoshow.com/floor-map-exhibitors-public-days/


Processing URLs:  90%|████████▉ | 896/1000 [44:53<01:55,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-kremlin-idUSKBN15509L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-kremlin-idUSKBN15509L


Processing URLs:  90%|████████▉ | 897/1000 [44:54<01:54,  1.11s/it]

Error extracting text from https://www.humboldtforum.org/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/


Processing URLs:  90%|█████████ | 901/1000 [44:57<01:13,  1.34it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-us-russia-idUSKCN0UN0I320160109?feedType=RSS&amp;feedName=topNews&amp;google_editors_picks=true: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-us-russia-idUSKCN0UN0I320160109?feedType=RSS&amp;feedName=topNews&amp;google_editors_picks=true
Error extracting text from http://www.oddschecker.com/cricket/t20-world-cup: 403 Client Error: Forbidden for url: http://www.oddschecker.com/cricket/t20-world-cup


Processing URLs:  90%|█████████ | 903/1000 [45:03<03:18,  2.05s/it]

URL filtered: https://twitter.com/JamesKanag/status/1374314357606260743


Processing URLs:  91%|█████████ | 912/1000 [45:14<01:21,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/venezuelas-maduro-dismisses-default-possibility-on-eve-of-debt-talks-idUSKBN1DC117?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/venezuelas-maduro-dismisses-default-possibility-on-eve-of-debt-talks-idUSKBN1DC117?il=0
Error extracting text from https://townhall.com/columnists/jeffjacoby/2017/06/26/untitled-n2346780: 403 Client Error: Forbidden for url: https://townhall.com/columnists/jeffjacoby/2017/06/26/untitled-n2346780


Processing URLs:  91%|█████████▏| 914/1000 [53:16<3:25:08, 143.12s/it]

Error extracting text from https://www.thespainreport.com/articles/873-160827132841-rajoy-says-new-government-more-a-wish-than-a-fact: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/873-160827132841-rajoy-says-new-government-more-a-wish-than-a-fact (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3052c3a40>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  92%|█████████▏| 916/1000 [53:17<1:39:06, 70.79s/it] 

Error extracting text from http://www.nato.int/cps/en/natohq/news_132596.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_132596.htm


Processing URLs:  92%|█████████▏| 917/1000 [53:17<1:08:47, 49.73s/it]

Error extracting text from https://www.nytimes.com/2021/01/15/briefing/biden-stimulus-indonesia-earthquake-impeachment-vote.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/15/briefing/biden-stimulus-indonesia-earthquake-impeachment-vote.html


Processing URLs:  92%|█████████▏| 918/1000 [53:33<53:58, 39.50s/it]  

Error extracting text from https://www.wired.com/2016/08/new-form-hacking-breaks-ideas-computers-work/: 503 Server Error: Service Unavailable for url: https://www.wired.com/2016/08/new-form-hacking-breaks-ideas-computers-work/


Processing URLs:  92%|█████████▏| 919/1000 [53:34<38:06, 28.23s/it]

Error extracting text from http://www.fcctsv/document/restoring-internet-freedom-notice-proposed-rulemaking: HTTPConnectionPool(host='www.fcctsv', port=80): Max retries exceeded with url: /document/restoring-internet-freedom-notice-proposed-rulemaking (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3052c32f0>: Failed to resolve 'www.fcctsv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  92%|█████████▏| 921/1000 [53:37<20:43, 15.74s/it]

Error extracting text from http://www.huffingtonpost.in/2017/01/24/tamil-nadu-traders-will-not-sell-coke-and-pepsi-from-march-1-on/: 404 Client Error: Not Found for url: https://www.huffpost.com/archive/in/entry/2017/01/24/tamil-nadu-traders-will-not-sell-coke-and-pepsi-from-march-1-on/


Processing URLs:  93%|█████████▎| 926/1000 [53:49<06:26,  5.23s/it]

Error extracting text from http://www.malaya.com.ph/business-news/news/afp-treads-softly-scarborough: 404 Client Error: Not Found for url: https://malaya.com.ph/business-news/news/afp-treads-softly-scarborough


Processing URLs:  93%|█████████▎| 927/1000 [53:51<05:06,  4.20s/it]

Error extracting text from http://atimes.com/2016/04/pentagon-trying-to-stop-chinese-air-defense-zone-in-disputed-sea-gertz/: 404 Client Error: Not Found for url: https://atimes.com/2016/04/pentagon-trying-to-stop-chinese-air-defense-zone-in-disputed-sea-gertz/


Processing URLs:  93%|█████████▎| 930/1000 [53:55<02:27,  2.10s/it]

Error extracting text from http://www.econtalk.org/archives/2016/11/david_gelernter.html: 403 Client Error: Forbidden for url: http://www.econtalk.org/archives/2016/11/david_gelernter.html
Error extracting text from http://www.cdm.me/english/croatia-claims-montenegro-will-receive-nato-invitation-in-december: 403 Client Error: Forbidden for url: https://www.cdm.me/english/croatia-claims-montenegro-will-receive-nato-invitation-in-december


Processing URLs:  93%|█████████▎| 931/1000 [53:55<01:44,  1.51s/it]

Error extracting text from http://www.timesofisrael.com/kerry-advises-against-new-iran-sanctions-as-khamenei-backs-anti-us-candidates/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/kerry-advises-against-new-iran-sanctions-as-khamenei-backs-anti-us-candidates/


Processing URLs:  94%|█████████▎| 937/1000 [53:58<00:42,  1.49it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-pakistan-iran-idUSKBN0TT22S20151210: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-pakistan-iran-idUSKBN0TT22S20151210
Error extracting text from http://www.reuters.com/article/2015/12/01/usa-congress-transportation-idUSL1N13Q2VH20151201#XermMvUWQ8oAj95A.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/usa-congress-transportation-idUSL1N13Q2VH20151201#XermMvUWQ8oAj95A.97


Processing URLs:  94%|█████████▍| 938/1000 [53:59<00:31,  1.99it/s]

Error extracting text from https://www.reuters.com/article/uk-britain-scotland-devolution/johnson-raises-ire-of-scots-independence-seekers-with-devolution-disaster-comment-idUSKBN27X03D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-scotland-devolution/johnson-raises-ire-of-scots-independence-seekers-with-devolution-disaster-comment-idUSKBN27X03D


Processing URLs:  94%|█████████▍| 941/1000 [54:01<00:39,  1.48it/s]

Error extracting text from http://www.wsj.com/articles/brazils-president-dilma-rousseff-reiterates-that-she-wont-resign-1458677495: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-president-dilma-rousseff-reiterates-that-she-wont-resign-1458677495
URL filtered: http://www.bloomberg.com/news/articles/2016-03-22/panama-canal-expansion-seen-open-by-july-after-years-of-setbacks
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-mosul-rights-idUSKBN13500X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-mosul-rights-idUSKBN13500X


Processing URLs:  94%|█████████▍| 945/1000 [54:04<00:36,  1.51it/s]

URL filtered: https://twitter.com/hnajibullah?lang=en


Processing URLs:  95%|█████████▌| 954/1000 [54:45<04:21,  5.68s/it]

Error extracting text from http://www.nytimes.com/2016/09/24/us/politics/iran-embassy-state-department.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/24/us/politics/iran-embassy-state-department.html


Processing URLs:  96%|█████████▌| 956/1000 [54:47<02:22,  3.23s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-suppliers-idUSKCN0YB0CA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-suppliers-idUSKCN0YB0CA


Processing URLs:  96%|█████████▌| 958/1000 [54:50<01:43,  2.46s/it]

Error extracting text from https://www.uefa.com/uefachampionsleague/: 403 Client Error: Forbidden for url: https://www.uefa.com/uefachampionsleague/


Processing URLs:  96%|█████████▌| 960/1000 [54:51<00:55,  1.40s/it]

Error extracting text from https://www.yahoo.com/news/china-russia-stage-military-drills-china-sea-031204485.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/china-russia-stage-military-drills-china-sea-031204485.html


Processing URLs:  96%|█████████▌| 962/1000 [54:54<00:54,  1.44s/it]

URL filtered: https://twitter.com/HassanRouhani/status/691932041357697024


Processing URLs:  97%|█████████▋| 967/1000 [55:02<00:54,  1.66s/it]

Error extracting text from http://tass.ru/en/defense/886110: 404 Client Error: Not Found for url: https://tass.ru/en/defense/886110
Error extracting text from http://www.reuters.com/article/us-iran-election-ahmadinejad-idUSKBN17E23P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-ahmadinejad-idUSKBN17E23P


Processing URLs:  97%|█████████▋| 969/1000 [55:04<00:44,  1.45s/it]

Error extracting text from http://www.38north.org/2017/09/jchurch092117/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  97%|█████████▋| 970/1000 [55:05<00:41,  1.37s/it]

Error extracting text from http://www.who.int/bulletin/volumes/89/7/11-086173/en/: 404 Client Error: Not Found for url: https://www.who.int/bulletin/volumes/89/7/11-086173/en/


Processing URLs:  97%|█████████▋| 972/1000 [55:08<00:37,  1.35s/it]

Error extracting text from http://www.amazon.com/Our-Mathematical-Universe-Ultimate-Reality-ebook/dp/B00DXKJ2DA/ref=sr_1_1?s=books&ie=UTF8&qid=1449776360&sr=1-1&keywords=tegmark+my+mathematical: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Our-Mathematical-Universe-Ultimate-Reality-ebook/dp/B00DXKJ2DA/ref=sr_1_1?s=books&ie=UTF8&qid=1449776360&sr=1-1&keywords=tegmark+my+mathematical


Processing URLs:  97%|█████████▋| 973/1000 [55:08<00:30,  1.14s/it]

Error extracting text from http://generalfusion.com/downloads/ICC2008_MGL.pdf: 403 Client Error: Forbidden for url: http://generalfusion.com/downloads/ICC2008_MGL.pdf


Processing URLs:  98%|█████████▊| 976/1000 [55:12<00:27,  1.14s/it]

Error extracting text from https://fcw.com/articles/2016/01/21/ukraine-russia-cyber.aspx: 404 Client Error: NOT FOUND for url: https://www.nextgov.com/articles/2016/01/21/ukraine-russia-cyber.aspx/
Error extracting text from http://www.reuters.com/article/colombia-oil-idUSN1E7A505O20111124: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/colombia-oil-idUSN1E7A505O20111124


Processing URLs:  98%|█████████▊| 978/1000 [55:14<00:22,  1.04s/it]

Error extracting text from http://uk.reuters.com/article/uk-brazil-corruption-plea-idUKKCN0WH1S0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  98%|█████████▊| 979/1000 [55:15<00:21,  1.02s/it]

Error extracting text from http://in.reuters.com/article/venezuela-politics-idINL1N1KL06H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  98%|█████████▊| 984/1000 [55:20<00:13,  1.16it/s]

Error extracting text from http://www.presstv.com/Detail/2015/12/07/440738/EUs-Tusk-expects-UK-deal-in-Feb-to-avoid-Brexit--: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2015/12/07/440738/EUs-Tusk-expects-UK-deal-in-Feb-to-avoid-Brexit--
Error extracting text from http://www.straitstimes.com/asia/east-asia/ban-ki-moon-takes-aim-at-presidential-races-front-runner: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  99%|█████████▊| 987/1000 [55:34<00:39,  3.06s/it]

Error extracting text from http://www.wsj.com/articles/bank-of-england-warns-vote-to-leave-eu-would-damage-u-k-growth-1463052576: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bank-of-england-warns-vote-to-leave-eu-would-damage-u-k-growth-1463052576


Processing URLs:  99%|█████████▉| 988/1000 [55:35<00:29,  2.45s/it]

Error extracting text from https://cleantechnica.com/2017/01/25/china-electric-car-sales-demolish-us-european-sales/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2017/01/25/china-electric-car-sales-demolish-us-european-sales/
Error extracting text from http://english.farsnews.com/newstext.aspx?nn=13940715001132: HTTPConnectionPool(host='english.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940715001132 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3054a6e70>: Failed to resolve 'english.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  99%|█████████▉| 990/1000 [55:35<00:14,  1.41s/it]

URL filtered: https://www.youtube.com/watch?v=2k4V_LJTvqM


Processing URLs:  99%|█████████▉| 993/1000 [55:38<00:07,  1.13s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/asean-retracts-south/2871648.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/asean-retracts-south/2871648.html
URL filtered: http://www.thelocal.de/20160307/hard-right-afd-stuns-with-election-gains-in-frankfurt?utm_medium=twitter&amp;utm_source=twitterfeed


Processing URLs: 100%|██████████| 1000/1000 [55:44<00:00,  3.34s/it]
Processing URLs:   0%|          | 1/1000 [00:00<14:55,  1.12it/s]

Error extracting text from http://www.heritage.org/research/reports/2016/07/why-montenegro-should-be-welcomed-into-nato: 404 Client Error: Not Found for url: https://www.heritage.org/research/reports/2016/07/why-montenegro-should-be-welcomed-into-nato


Processing URLs:   1%|          | 6/1000 [00:06<16:21,  1.01it/s]

Error extracting text from http://www.france24.com/en/20160126-blair-calls-ground-troops-battle-islamic-state-group-syria-iraq: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160126-blair-calls-ground-troops-battle-islamic-state-group-syria-iraq


Processing URLs:   1%|          | 7/1000 [00:07<15:23,  1.08it/s]



Processing URLs:   1%|          | 8/1000 [00:08<14:43,  1.12it/s]

Error extracting text from http://www.precisionhawk.com/media/topic/faa-grants-precisionhawk-bvlos-waiver/: 404 Client Error: Not Found for url: https://www.precisionhawk.com/media/topic/faa-grants-precisionhawk-bvlos-waiver/


Processing URLs:   1%|          | 9/1000 [00:10<20:29,  1.24s/it]

Error extracting text from http://tass.ru/en/world/864368: 404 Client Error: Not Found for url: https://tass.ru/en/world/864368


Processing URLs:   1%|          | 10/1000 [00:10<16:32,  1.00s/it]

Error extracting text from http://www.caracaschronicles.com/2016/04/30/nicolas-maduro-political-genius/: 403 Client Error: Forbidden for url: http://www.caracaschronicles.com/2016/04/30/nicolas-maduro-political-genius/


Processing URLs:   1%|          | 11/1000 [00:12<22:20,  1.36s/it]

Error extracting text from https://gestion.pe/economia/planes-y-retos-venezuela-renegociar-su-deuda-externa-2203990: 404 Client Error: Not Found for url: https://gestion.pe/economia/planes-y-retos-venezuela-renegociar-su-deuda-externa-2203990/


Processing URLs:   1%|▏         | 13/1000 [00:15<22:01,  1.34s/it]

URL filtered: https://www.usnews.com/news/technology/articles/2017-10-04/russia-facebook-ads-targeted-more-than-two-states-senate-intelligence-chair


Processing URLs:   2%|▏         | 17/1000 [00:17<12:24,  1.32it/s]

Error extracting text from http://www.presstv.com/Detail/2016/02/22/451679/Iran-Italy-PM-Renzi-visit-Rouhani/: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2016/02/22/451679/Iran-Italy-PM-Renzi-visit-Rouhani/


Processing URLs:   3%|▎         | 27/1000 [00:40<22:07,  1.36s/it]

Error extracting text from http://www.wsj.com/articles/armed-with-statistics-cameron-struggles-to-win-hearts-in-brexit-debate-1465159864: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/armed-with-statistics-cameron-struggles-to-win-hearts-in-brexit-debate-1465159864


Processing URLs:   3%|▎         | 31/1000 [00:43<14:14,  1.13it/s]

Error extracting text from https://www.reuters.com/business/energy/russia-says-waits-germanys-nod-start-gas-sales-via-nord-stream-2-2021-09-09/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/russia-says-waits-germanys-nod-start-gas-sales-via-nord-stream-2-2021-09-09/


Processing URLs:   3%|▎         | 33/1000 [01:44<5:01:16, 18.69s/it]

Error extracting text from https://capital.com/south-korea-to-ratify-asia-pacific-free-trade-agreement: HTTPSConnectionPool(host='capital.com', port=443): Max retries exceeded with url: /south-korea-to-ratify-asia-pacific-free-trade-agreement (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30364f530>, 'Connection to capital.com timed out. (connect timeout=60)'))


Processing URLs:   4%|▎         | 37/1000 [02:06<2:07:02,  7.92s/it]

Error extracting text from https://www.globalhungerindex.org/ranking.html: 403 Client Error: Forbidden for url: https://www.globalhungerindex.org/ranking.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-mistura-idUSKBN15Y0EJ?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-mistura-idUSKBN15Y0EJ?mod=related&amp;channelName=worldNews
Error extracting text from https://www.france24.com/en/live-news/20210809-fighting-rages-in-afghan-south-after-taliban-s-weekend-blitz: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210809-fighting-rages-in-afghan-south-after-taliban-s-weekend-blitz


Processing URLs:   4%|▍         | 42/1000 [02:15<47:43,  2.99s/it]  

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/the-latest-some-conservatives-wont-oppose-health-bill/articleshow/57526521.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/the-latest-some-conservatives-wont-oppose-health-bill/articleshow/57526521.cms


Processing URLs:   5%|▍         | 49/1000 [02:26<24:16,  1.53s/it]

Error extracting text from http://thehill.com/policy/national-security/348281-trump-associate-boasts-russia-deal-will-get-donald-elected-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/348281-trump-associate-boasts-russia-deal-will-get-donald-elected-report/


Processing URLs:   5%|▌         | 51/1000 [02:28<16:37,  1.05s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-oil-naimi-idUSKBN12Z1C7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-naimi-idUSKBN12Z1C7


Processing URLs:   5%|▌         | 53/1000 [02:29<14:26,  1.09it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/328716-canada-announces-bill-to-fully-legalize-marijuana-report: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/328716-canada-announces-bill-to-fully-legalize-marijuana-report/


Processing URLs:   6%|▌         | 56/1000 [02:33<15:52,  1.01s/it]

Error extracting text from http://thehill.com/blogs/floor-action/senate/257385-senate-starts-fast-track-on-ex-im-reauthorization: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/257385-senate-starts-fast-track-on-ex-im-reauthorization/


Processing URLs:   6%|▌         | 60/1000 [02:37<13:53,  1.13it/s]

Error extracting text from http://www.wsj.com/articles/venezuelas-pdvsa-misses-404-million-payments-on-bonds-1479768989: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelas-pdvsa-misses-404-million-payments-on-bonds-1479768989


Processing URLs:   6%|▌         | 61/1000 [02:37<11:10,  1.40it/s]

Error extracting text from https://www.nytimes.com/2017/01/26/business/elon-musk-donald-trump-wall-street.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/26/business/elon-musk-donald-trump-wall-street.html?_r=0


Processing URLs:   6%|▋         | 64/1000 [02:40<14:06,  1.11it/s]

URL filtered: https://www.bloomberglaw.com/product/blaw/exp_blp/ewogICAgImN0eHQiOiAiRE9DIiwKICAgICJpZCI6ICJPV1VNNEE2S0xWUjU/cmVzb3VyY2VfaWQ9NzA2YWMxODM3NzYxZ


Processing URLs:   8%|▊         | 79/1000 [03:00<18:54,  1.23s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN199131: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN199131


Processing URLs:   8%|▊         | 80/1000 [03:02<23:24,  1.53s/it]

Error extracting text from http://english.aawsat.com/2014/02/article55329312/japan-seeks-paradigm-shift-in-relationship-with-saudi-arabia: 403 Client Error: Forbidden for url: http://english.aawsat.com/2014/02/article55329312/japan-seeks-paradigm-shift-in-relationship-with-saudi-arabia
URL filtered: https://www.bloomberg.co.jp/news/articles/2016-11-11/OGH8G96TTDS401


Processing URLs:   9%|▊         | 86/1000 [03:09<23:09,  1.52s/it]

Error extracting text from http://blog.heartland.org/2017/02/the-border-adjustment-tax-great-tax-reform-that-gets-us-great-trade-reform/: 403 Client Error: Forbidden for url: https://heartland.org/opinion/
Error extracting text from http://www.reuters.com/article/us-turkey-security-erdogan-idUSKCN11Z0XP?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-erdogan-idUSKCN11Z0XP?il=0


Processing URLs:   9%|▉         | 90/1000 [03:12<15:23,  1.02s/it]

Error extracting text from http://www.reuters.com/article/us-usa-healthcare/white-house-says-trump-opposes-senates-bipartisan-obamacare-deal-idUSKBN1CN18Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-healthcare/white-house-says-trump-opposes-senates-bipartisan-obamacare-deal-idUSKBN1CN18Z


Processing URLs:   9%|▉         | 93/1000 [03:28<53:17,  3.53s/it]

Error extracting text from http://www.wdef.com/2016/02/07/corker-condemns-north-korean-missile-launch/: 404 Client Error: Not Found for url: https://www.wdef.com/2016/02/07/corker-condemns-north-korean-missile-launch/


Processing URLs:  10%|▉         | 96/1000 [03:48<1:25:12,  5.66s/it]

Error extracting text from http://www.buenosairesherald.com/article/205214/cameron-points-to-‘brexit’-vote-next: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/205214/cameron-points-to-%E2%80%98brexit%E2%80%99-vote-next


Processing URLs:  10%|▉         | 98/1000 [03:51<50:39,  3.37s/it]  

Error extracting text from http://www.amazon.com/Meta-Math-Quest-Gregory-Chaitin-ebook/dp/B001M5JVM0/ref=sr_1_2?s=books&ie=UTF8&qid=1449776071&sr=1-2&keywords=gregory+chaitin: 500 Server Error:  for url: https://www.amazon.com/Meta-Math-Quest-Gregory-Chaitin-ebook/dp/B001M5JVM0/ref=sr_1_2?s=books&ie=UTF8&qid=1449776071&sr=1-2&keywords=gregory+chaitin


Processing URLs:  10%|█         | 105/1000 [04:15<34:02,  2.28s/it]  

Error extracting text from http://lta.reuters.com/article/businessNews/idLTAKBN1D700B-OUSLB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=lta


Processing URLs:  11%|█         | 106/1000 [04:17<30:55,  2.08s/it]

Error extracting text from http://mashable.com/2012/11/07/nate-silver-wins/: 404 Client Error: Not Found for url: https://mashable.com/2012/11/07/nate-silver-wins/


Processing URLs:  11%|█         | 107/1000 [05:17<4:50:55, 19.55s/it]

Error extracting text from http://www.newsroomamerica.com/story/521724.html: HTTPConnectionPool(host='www.newsroomamerica.com', port=80): Max retries exceeded with url: /story/521724.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x305df06b0>, 'Connection to www.newsroomamerica.com timed out. (connect timeout=60)'))


Processing URLs:  11%|█         | 108/1000 [05:18<3:25:26, 13.82s/it]

Error extracting text from http://news.softpedia.com/news/oilrig-cyber-espionage-campaign-targets-saudi-arabia-s-banks-and-defense-sector-504626.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/oilrig-cyber-espionage-campaign-targets-saudi-arabia-s-banks-and-defense-sector-504626.shtml


Processing URLs:  11%|█         | 110/1000 [05:19<1:43:41,  6.99s/it]

Error extracting text from http://news.yahoo.com/reports-hint-suu-kyi-could-become-myanmar-president-032901801.html: 404 Client Error: Not Found for url: http://news.yahoo.com/reports-hint-suu-kyi-could-become-myanmar-president-032901801.html


Processing URLs:  11%|█         | 111/1000 [05:19<1:13:30,  4.96s/it]

Error extracting text from https://www.nytimes.com/2018/01/27/world/europe/czech-election-milos-zeman.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/27/world/europe/czech-election-milos-zeman.html


Processing URLs:  11%|█▏        | 113/1000 [05:21<42:33,  2.88s/it]  

Error extracting text from https://global.handelsblatt.com/finance/e-u-backs-off-common-deposit-guarantee-622538: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/finance/e-u-backs-off-common-deposit-guarantee-622538


Processing URLs:  11%|█▏        | 114/1000 [05:36<1:38:32,  6.67s/it]

Error extracting text from https://www.wired.com/story/switzerlands-getting-a-delivery-network-for-blood-toting-drones/: 503 Server Error: Service Unavailable for url: https://www.wired.com/story/switzerlands-getting-a-delivery-network-for-blood-toting-drones/


Processing URLs:  12%|█▏        | 119/1000 [06:41<4:49:48, 19.74s/it]

Error extracting text from http://www.ftseglobalmarkets.com/news/european-parliament-study-outlines-the-%E2%82%AC170bn-cost-of-dismantling-the-schengen-system.html: HTTPConnectionPool(host='www.ftseglobalmarkets.com', port=80): Max retries exceeded with url: /news/european-parliament-study-outlines-the-%E2%82%AC170bn-cost-of-dismantling-the-schengen-system.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x305df2000>, 'Connection to www.ftseglobalmarkets.com timed out. (connect timeout=60)'))


Processing URLs:  12%|█▎        | 125/1000 [07:51<1:57:16,  8.04s/it]

URL filtered: https://www.facebook.com/pratap.raychaudhuri/posts/10155728182613601


Processing URLs:  13%|█▎        | 129/1000 [07:54<47:21,  3.26s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/yellen-consigning-stocks-fate-to-earnings-that-keep-disappearing


Processing URLs:  13%|█▎        | 134/1000 [08:00<23:29,  1.63s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/274932-us-uk-will-test-nuclear-plants-with-joint-cyberattacks: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/274932-us-uk-will-test-nuclear-plants-with-joint-cyberattacks/


Processing URLs:  14%|█▍        | 138/1000 [08:03<13:44,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/beef-up-sanctions-on-north-korea-1451928025: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/beef-up-sanctions-on-north-korea-1451928025


Processing URLs:  14%|█▍        | 143/1000 [08:16<28:34,  2.00s/it]

Error extracting text from http://www.c-span.org/video/?402918-1/state-department-briefing-secretary-state-john-kerry: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?402918-1/state-department-briefing-secretary-state-john-kerry
URL filtered: https://www.youtube.com/watch?v=VoegqRJKGE8


Processing URLs:  15%|█▌        | 154/1000 [08:29<16:01,  1.14s/it]

Error extracting text from https://apnews.com/article/927d8bf22e2a98d7b7835687c8cbaf3a: 404 Client Error: Not Found for url: https://apnews.com/article/927d8bf22e2a98d7b7835687c8cbaf3a


Processing URLs:  16%|█▌        | 160/1000 [08:40<23:03,  1.65s/it]

Error extracting text from https://jen.jiji.com/jc/i?g=eco&amp;k=2021051600412: HTTPSConnectionPool(host='jen.jiji.com', port=443): Max retries exceeded with url: /jc/i?g=eco&amp;k=2021051600412 (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:1000)')))


Processing URLs:  16%|█▋        | 163/1000 [08:45<21:13,  1.52s/it]

Error extracting text from http://www.latimes.com/nation/politics/trailguide/la-na-trailguide-12282015-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/politics/trailguide/la-na-trailguide-12282015-htmlstory.html


Processing URLs:  16%|█▋        | 164/1000 [08:48<26:40,  1.91s/it]

Error extracting text from http://www.sahistory.org.za/article/thabo-mbeki-resigns-south-africa%E2%80%99s-second-democratic-president: 404 Client Error: Not Found for url: https://www.sahistory.org.za/article/thabo-mbeki-resigns-south-africa%e2%80%99s-second-democratic-president


Processing URLs:  17%|█▋        | 167/1000 [08:52<17:16,  1.24s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16U0YZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16U0YZ


Processing URLs:  17%|█▋        | 168/1000 [08:54<23:45,  1.71s/it]

Error extracting text from http://38north.org/?p=7767: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  17%|█▋        | 173/1000 [09:04<20:36,  1.50s/it]

Error extracting text from http://www.wsj.com/articles/turkey-to-press-g-20-on-joint-responses-to-refugees-syrian-conflict-1447422004: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkey-to-press-g-20-on-joint-responses-to-refugees-syrian-conflict-1447422004


Processing URLs:  18%|█▊        | 176/1000 [09:11<25:20,  1.84s/it]

Error extracting text from https://www.pancanal.com/eng/pr/press-releases/2016/04/18/pr581.html: 403 Client Error: Forbidden for url: https://www.pancanal.com/eng/pr/press-releases/2016/04/18/pr581.html


Processing URLs:  18%|█▊        | 178/1000 [09:15<24:27,  1.78s/it]

Error extracting text from https://energycommerce.house.gov/news/press-release/next-week-subcommtech-hold-oversight-hearing-fcc-commissioners/: 404 Client Error: Not Found for url: https://energycommerce.house.gov/news/press-release/next-week-subcommtech-hold-oversight-hearing-fcc-commissioners


Processing URLs:  18%|█▊        | 179/1000 [09:16<24:09,  1.77s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-21/brazil-impeachment-upheaval-proves-to-be-a-boon-for-its-coffers


Processing URLs:  18%|█▊        | 182/1000 [09:18<15:19,  1.12s/it]

Error extracting text from https://homelandprepnews.com/countermeasures/18801-ratcliffe-introduces-bill-imposing-sanctions-iranian-cyber-attackers/: 502 Server Error: Bad Gateway for url: https://homelandprepnews.com/countermeasures/18801-ratcliffe-introduces-bill-imposing-sanctions-iranian-cyber-attackers/


Processing URLs:  18%|█▊        | 184/1000 [10:20<3:54:26, 17.24s/it]

Error extracting text from https://www.statslife.org.uk/social-sciences/1910-how-many-british-immigrants-are-there-in-other-people-s-countries: HTTPSConnectionPool(host='www.statslife.org.uk', port=443): Max retries exceeded with url: /social-sciences/1910-how-many-british-immigrants-are-there-in-other-people-s-countries (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3052c3590>, 'Connection to www.statslife.org.uk timed out. (connect timeout=60)'))


Processing URLs:  19%|█▊        | 187/1000 [10:22<1:30:00,  6.64s/it]

Error extracting text from https://www.nytimes.com/2022/03/30/world/europe/putin-advisers-ukraine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/03/30/world/europe/putin-advisers-ukraine.html
Error extracting text from http://www.reuters.com/article/us-britain-eu-negotiations-idUSKBN17M2HT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-negotiations-idUSKBN17M2HT


Processing URLs:  19%|█▉        | 188/1000 [10:22<1:04:10,  4.74s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.rdnews.com.br/blog-do-romilson/artigos/por-que-o-impeachment-nao-resolve/60537&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.rdnews.com.br/blog-do-romilson/artigos/por-que-o-impeachment-nao-resolve/60537&amp;prev=search


Processing URLs:  19%|█▉        | 192/1000 [10:25<20:27,  1.52s/it]  

Error extracting text from http://www.nytimes.com/2015/12/30/world/middleeast/haider-al-abadi-iraq-ramadi-isis.html?emc=edit_th_20151230&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=3: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/30/world/middleeast/haider-al-abadi-iraq-ramadi-isis.html?emc=edit_th_20151230&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=3
Error extracting text from http://www.reuters.com/article/us-china-defence-usa-exercises-idUSKCN0VY16P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-defence-usa-exercises-idUSKCN0VY16P


Processing URLs:  20%|█▉        | 195/1000 [10:29<18:02,  1.34s/it]

Error extracting text from https://slice.mit.edu/2017/11/06/drones-disappear-morse-corp/: HTTPSConnectionPool(host='slice.mit.edu', port=443): Max retries exceeded with url: /2017/11/06/drones-disappear-morse-corp/ (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)')))


Processing URLs:  20%|█▉        | 196/1000 [10:30<17:29,  1.31s/it]

URL filtered: https://twitter.com/YouGov/status/745232927500410880
Error extracting text from https://www.reuters.com/article/us-usa-nuclear-trump/trump-u-s-to-exit-nuclear-treaty-citing-russian-violations-idUSKCN1MU0Z8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nuclear-trump/trump-u-s-to-exit-nuclear-treaty-citing-russian-violations-idUSKCN1MU0Z8


Processing URLs:  20%|█▉        | 199/1000 [10:34<17:13,  1.29s/it]

Error extracting text from http://www.ipsos.pe/sites/default/files/opinion_data/Opinion%20Data%20Marzo%202016.pdf: 404 Client Error: Not Found for url: https://www.ipsos.com/es-pe/sites/default/files/opinion_data/Opinion%20Data%20Marzo%202016.pdf


Processing URLs:  20%|██        | 202/1000 [10:35<10:34,  1.26it/s]

Error extracting text from https://www.reuters.com/article/us-usa-cyber-botnet/russian-hacker-wanted-by-u-s-tells-court-he-worked-for-putins-party-idUSKCN1C32EP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-botnet/russian-hacker-wanted-by-u-s-tells-court-he-worked-for-putins-party-idUSKCN1C32EP
Error extracting text from https://www.france24.com/en/live-news/20210423-turkey-seeks-arrest-of-missing-crypto-boss-over-huge-fraud: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210423-turkey-seeks-arrest-of-missing-crypto-boss-over-huge-fraud


Processing URLs:  21%|██        | 207/1000 [10:43<19:16,  1.46s/it]

Error extracting text from http://www.news.com.au/finance/business/breaking-news/eyes-on-merkel-to-rebuild-eu-postbrexit/news-story/62367b4c9408bc351c707e9fe58e30f9: 404 Client Error: Not Found for url: https://www.news.com.au/404.php
URL filtered: http://www.youtube.com/watch?v=TRCUO7-lbUE&amp;sns=e


Processing URLs:  21%|██        | 210/1000 [10:45<13:31,  1.03s/it]

Error extracting text from https://www.cfr.org/global-conflict-tracker/conflict/conflict-ukraine).: 404 Client Error: Not Found for url: https://www.cfr.org/global-conflict-tracker/conflict/conflict-ukraine).


Processing URLs:  21%|██        | 212/1000 [10:47<11:30,  1.14it/s]

Error extracting text from https://www.middleeastmonitor.com/20160909-egyptian-inflation-rates-at-8-year-high/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20160909-egyptian-inflation-rates-at-8-year-high/


Processing URLs:  22%|██▏       | 215/1000 [10:50<13:27,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0X70GE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0X70GE


Processing URLs:  22%|██▏       | 216/1000 [10:51<11:54,  1.10it/s]

Error extracting text from http://seekingalpha.com/article/4013988-pdvsa-risks-bond-default-oil-markets-daily: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4013988-pdvsa-risks-bond-default-oil-markets-daily


Processing URLs:  23%|██▎       | 227/1000 [11:09<23:25,  1.82s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/china-raps-australia-fore/2519044.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/china-raps-australia-fore/2519044.html


Processing URLs:  23%|██▎       | 231/1000 [11:16<20:38,  1.61s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/278643: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/278643
URL filtered: https://twitter.com/RosieGray/status/693534925782327296


Processing URLs:  23%|██▎       | 234/1000 [11:20<16:17,  1.28s/it]

Error extracting text from https://www.wsj.com/: 403 Client Error: Forbidden for url: https://www.wsj.com/


Processing URLs:  24%|██▍       | 241/1000 [11:25<07:05,  1.78it/s]

Error extracting text from https://www.yahoo.com/news/military-political-obstacles-fight-iss-raqa-164941662.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/military-political-obstacles-fight-iss-raqa-164941662.html


Processing URLs:  24%|██▍       | 242/1000 [11:27<10:09,  1.24it/s]

Error extracting text from http://www.developereconomics.com/smartphone-market-share-usage-country-apr-may-2014/: 404 Client Error: Not Found for url: https://www.developernation.net/smartphone-market-share-usage-country-apr-may-2014/


Processing URLs:  24%|██▍       | 245/1000 [11:29<10:17,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-restrictions-idUSKBN0TY2L520151215: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-restrictions-idUSKBN0TY2L520151215


Processing URLs:  25%|██▍       | 249/1000 [11:33<08:30,  1.47it/s]

Error extracting text from https://www.nytimes.com/2017/05/24/us/politics/russia-trump-manafort-flynn.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/24/us/politics/russia-trump-manafort-flynn.html?_r=0
Error extracting text from http://www.reuters.com/article/us-russia-cenbank-cyberattack-idUSKBN13R1TO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-cenbank-cyberattack-idUSKBN13R1TO
URL filtered: https://twitter.com/SPMiles42/status/1330741626734604289


Processing URLs:  25%|██▌       | 254/1000 [11:38<10:21,  1.20it/s]

Error extracting text from http://www.greaterkashmir.com/news/world/story/250366.html: 403 Client Error: Forbidden for url: https://www.greaterkashmir.com/news/world/story/250366.html


Processing URLs:  26%|██▌       | 258/1000 [11:43<09:50,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-vote-idUSKCN0WH0H0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-vote-idUSKCN0WH0H0


Processing URLs:  26%|██▌       | 260/1000 [11:43<06:00,  2.05it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-talks-idUSKCN0V80Q2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-talks-idUSKCN0V80Q2


Processing URLs:  26%|██▌       | 262/1000 [11:46<14:20,  1.17s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2009_2014_06JAN.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Data&amp;Monitoring/Wild_poliovirus_list_2009_2014_06JAN.pdf


Processing URLs:  26%|██▋       | 263/1000 [11:47<13:57,  1.14s/it]

Error extracting text from http://www.toyotanewsroom.com/releases/tms-august-2016-sales-chart.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/tms-august-2016-sales-chart/


Processing URLs:  27%|██▋       | 268/1000 [12:01<30:42,  2.52s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/encuesta-ipsos-revisa-todos-cuadros-sondeo-nacional-2-noticia-1887817/14: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/encuesta-ipsos-revisa-todos-cuadros-sondeo-nacional-2-noticia-1887817/14/


Processing URLs:  27%|██▋       | 271/1000 [12:03<16:52,  1.39s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-02-08/covid-strain-that-swept-u-k-circulates-most-in-super-bowl-state
Error extracting text from http://www.wsj.com/articles/bayer-makes-takeover-approach-to-monsanto-1463622691: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bayer-makes-takeover-approach-to-monsanto-1463622691


Processing URLs:  27%|██▋       | 274/1000 [13:05<3:27:23, 17.14s/it]

Error extracting text from https://www.badgerloop.com/documents/TubeSpecs.pdf: HTTPSConnectionPool(host='www.badgerloop.com', port=443): Max retries exceeded with url: /documents/TubeSpecs.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fe8431a0>, 'Connection to www.badgerloop.com timed out. (connect timeout=60)'))


Processing URLs:  28%|██▊       | 279/1000 [13:09<43:31,  3.62s/it]  

Error extracting text from https://www.reuters.com/article/us-usa-trump-pence-idUSKBN1AK10Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-pence-idUSKBN1AK10Z


Processing URLs:  28%|██▊       | 282/1000 [13:16<34:38,  2.89s/it]

URL filtered: https://www.youtube.com/watch?v=Jn9UsZPXbbw


Processing URLs:  28%|██▊       | 284/1000 [13:17<19:22,  1.62s/it]

Error extracting text from https://www.fbi.gov/news/testimony/assessing-russian-activities-and-intentions-in-recent-elections: 403 Client Error: Forbidden for url: https://www.fbi.gov/news/testimony/assessing-russian-activities-and-intentions-in-recent-elections


Processing URLs:  28%|██▊       | 285/1000 [13:17<16:24,  1.38s/it]

Error extracting text from https://nationalinterest.org/blog/buzz/russia-set-built-two-new-nuclear-powered-submarines-179667&#39: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/buzz/russia-set-built-two-new-nuclear-powered-submarines-179667&#39
Error extracting text from http://www.reuters.com/article/us-oil-meeting-iran-idUSKCN0XK056: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-meeting-iran-idUSKCN0XK056


Processing URLs:  29%|██▉       | 289/1000 [13:21<13:35,  1.15s/it]

Error extracting text from http://www.aljazeera.com/indepth/interactive/2015/05/syria-country-divided-150529144229467.html: 404 Client Error: Not Found for url: https://www.aljazeera.com/indepth/interactive/2015/05/syria-country-divided-150529144229467.html


Processing URLs:  29%|██▉       | 291/1000 [13:22<09:26,  1.25it/s]

Error extracting text from https://amp.france24.com/en/live-news/20211222-dubai-expo-sushi-restaurant-closes-after-staff-catch-covid: 403 Client Error: Forbidden for url: https://amp.france24.com/en/live-news/20211222-dubai-expo-sushi-restaurant-closes-after-staff-catch-covid
Error extracting text from http://www.arabianbusiness.com/iran-says-opec-emergency-meeting-may-stop-oil-price-slide-603655.html: 403 Client Error: HTTP Forbidden for url: https://www.arabianbusiness.com/iran-says-opec-emergency-meeting-may-stop-oil-price-slide-603655.html


Processing URLs:  30%|███       | 302/1000 [13:37<17:04,  1.47s/it]

URL filtered: https://twitter.com/ianbateson/status/922428384222892032


Processing URLs:  30%|███       | 305/1000 [13:38<11:31,  1.00it/s]

Error extracting text from https://www.newsweek.com/ukraines-zelenskyy-visit-biden-august-weeks-after-nord-stream-2-deal-made-1611980: 403 Client Error: Forbidden for url: https://www.newsweek.com/ukraines-zelenskyy-visit-biden-august-weeks-after-nord-stream-2-deal-made-1611980
URL filtered: https://www.bloomberg.com/news/articles/2016-06-22/musk-says-solarcity-deal-about-synergy-but-it-may-be-about-debt


Processing URLs:  31%|███       | 309/1000 [13:46<15:49,  1.37s/it]

Error extracting text from http://www.reuters.com/article/us-poland-eu-parliament-idUSKCN0XA1BH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-eu-parliament-idUSKCN0XA1BH


Processing URLs:  31%|███       | 311/1000 [14:46<2:18:00, 12.02s/it]

Error extracting text from https://www.seattletimes.com/business/apnewsbreak-georgia-election-server-wiped-after-suit-filed/: HTTPSConnectionPool(host='www.seattletimes.com', port=443): Read timed out. (read timeout=60)
Error extracting text from http://www.latimes.com/business/autos/la-fi-hy-tesla-delivery-production-3q-musk-20161002-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/autos/la-fi-hy-tesla-delivery-production-3q-musk-20161002-snap-story.html


Processing URLs:  31%|███       | 312/1000 [14:47<1:40:15,  8.74s/it]

Error extracting text from https://www.timesofisrael.com/liveblog_entry/400-russian-mercenaries-sent-to-kyiv-to-assassinate-zelensky-report/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/liveblog_entry/400-russian-mercenaries-sent-to-kyiv-to-assassinate-zelensky-report/


Processing URLs:  31%|███▏      | 314/1000 [14:48<56:06,  4.91s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-01-03/saudis-cut-diplomatic-ties-with-iran-foreign-minister-says-iiyzvsw5


Processing URLs:  32%|███▏      | 316/1000 [14:49<31:24,  2.76s/it]

Error extracting text from https://www.yahoo.com/news/debt-sanctions-disrepair-venezuelas-oil-sector-agony-054624847.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/debt-sanctions-disrepair-venezuelas-oil-sector-agony-054624847.html
URL filtered: https://twitter.com/NateSilver538/status/689318773506084864


Processing URLs:  32%|███▏      | 319/1000 [14:52<23:38,  2.08s/it]

Error extracting text from http://www.theweek.co.uk/oil-price/60838/can-the-oil-price-recovery-be-maintained: 404 Client Error: Not Found for url: https://theweek.com/oil-price/60838/can-the-oil-price-recovery-be-maintained


Processing URLs:  32%|███▏      | 320/1000 [14:53<20:47,  1.83s/it]

URL filtered: http://pro.boxoffice.com/facebook/today/


Processing URLs:  33%|███▎      | 327/1000 [15:01<13:43,  1.22s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/photo-reports/2016-01/01/content_6840094.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/photo-reports/2016-01/01/content_6840094.htm


Processing URLs:  33%|███▎      | 328/1000 [15:02<13:32,  1.21s/it]

URL filtered: https://www.washingtonpost.com/news/politics/wp/2018/03/19/everything-you-need-to-know-about-the-cambridge-analytica-facebook-debacle/?utm_term=.96ad1d7b21f5


Processing URLs:  33%|███▎      | 332/1000 [15:16<32:46,  2.94s/it]

Error extracting text from https://www.nytimes.com/2017/10/15/world/africa/somalia-bombing-mogadishu.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/15/world/africa/somalia-bombing-mogadishu.html


Processing URLs:  34%|███▎      | 336/1000 [15:19<16:17,  1.47s/it]

Error extracting text from http://www.nytimes.com/2016/12/13/world/middleeast/syria-aleppo-civilians.html?ref=middleeast: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/13/world/middleeast/syria-aleppo-civilians.html?ref=middleeast


Processing URLs:  34%|███▍      | 340/1000 [15:26<16:23,  1.49s/it]

Error extracting text from http://uk.reuters.com/article/uk-afghanistan-ministers-idUKKCN0X607Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  35%|███▍      | 347/1000 [15:34<07:46,  1.40it/s]

Error extracting text from http://news.yahoo.com/irans-president-says-wants-resolve-saudi-tensions-135406359.html?soc_src=mediacontentsharebuttons&amp;soc_trk=tw: 404 Client Error: Not Found for url: http://news.yahoo.com/irans-president-says-wants-resolve-saudi-tensions-135406359.html?soc_src=mediacontentsharebuttons&amp;soc_trk=tw
Error extracting text from https://openai.com/the-international/: 404 Client Error: Not Found for url: https://openai.com/the-international/


Processing URLs:  35%|███▌      | 350/1000 [15:37<09:25,  1.15it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/13/saturday-night-is-all-right/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/13/saturday-night-is-all-right/


Processing URLs:  35%|███▌      | 352/1000 [15:39<08:23,  1.29it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-24/russian-news-agency-interfax-faces-unprecedented-hacker-attack


Processing URLs:  36%|███▌      | 356/1000 [15:43<10:46,  1.00s/it]

Error extracting text from http://www.nytimes.com/2016/07/02/us/politics/donald-trump-vice-president.html?smprod=nytcore-ipad&amp;smid=nytcore-ipad-share: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/02/us/politics/donald-trump-vice-president.html?smprod=nytcore-ipad&amp;smid=nytcore-ipad-share


Processing URLs:  36%|███▌      | 357/1000 [15:45<12:09,  1.13s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Choppy-waters-near-Subi-Reef?page=2: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Choppy-waters-near-Subi-Reef?page=2


Processing URLs:  36%|███▌      | 361/1000 [15:52<15:02,  1.41s/it]

Error extracting text from https://www.lloyds.com/news-and-insight/press-centre/speeches/2016/02/the-implications-of-brexit-for-the-london-insurance-market: HTTPSConnectionPool(host='www.lloyds.com', port=443): Max retries exceeded with url: /news-and-insight/press-centre/speeches/2016/02/the-implications-of-brexit-for-the-london-insurance-market (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  36%|███▋      | 364/1000 [15:56<13:28,  1.27s/it]

Error extracting text from http://www.dispatch.com/content/stories/local/2016/08/30/democratic-group-begins-canceling-ads-supporting-ted-strickland.html: 404 Client Error: OK for url: https://www.dispatch.com/content/stories/local/2016/08/30/democratic-group-begins-canceling-ads-supporting-ted-strickland.html
URL filtered: https://www.bloombergquint.com/politics/2017/04/18/squabble-over-u-k-s-brexit-bill-easing-ireland-s-noonan-says


Processing URLs:  37%|███▋      | 369/1000 [16:00<10:34,  1.00s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-china-idUSKBN17Y0VF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-china-idUSKBN17Y0VF
URL filtered: http://www.bloomberg.com/news/articles/2016-02-22/venezuela-s-descent-into-world-s-riskiest-sovereign-credit-q-a


Processing URLs:  37%|███▋      | 372/1000 [16:01<06:32,  1.60it/s]

Error extracting text from http://www.technologyreview.com/news/543181/crispr-gene-editing-to-be-tested-on-people-by-2017-says-editas/: 404 Client Error: Not Found for url: https://www.technologyreview.com/news/543181/crispr-gene-editing-to-be-tested-on-people-by-2017-says-editas/
Error extracting text from http://www.reuters.com/article/us-burundi-politics-idUSKCN0XB140: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-politics-idUSKCN0XB140


Processing URLs:  38%|███▊      | 376/1000 [16:04<07:49,  1.33it/s]

URL filtered: https://www.youtube.com/watch?v=aQjw2BVRVg8


Processing URLs:  38%|███▊      | 379/1000 [16:06<06:34,  1.58it/s]

Error extracting text from https://www.reuters.com/article/uk-britain-eu-progress/eu-could-give-the-nod-next-week-to-trade-talks-with-britain-idUSKBN1DV60P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-eu-progress/eu-could-give-the-nod-next-week-to-trade-talks-with-britain-idUSKBN1DV60P


Processing URLs:  38%|███▊      | 380/1000 [17:06<2:44:57, 15.96s/it]

Error extracting text from http://bit.ly/2pI7BEn: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://twitter.com/jairbolsonaro/status/1416044385972723714


Processing URLs:  39%|███▉      | 388/1000 [17:16<28:59,  2.84s/it]  

Error extracting text from https://www.oecd.org/els/family/CO_4_2_Participation_first_time_voters.pdf: 410 Client Error: Gone for url: https://www.oecd.org/els/family/CO_4_2_Participation_first_time_voters.pdf


Processing URLs:  39%|███▉      | 393/1000 [17:25<19:54,  1.97s/it]

Error extracting text from http://www.afp.com/en/news/us-philippine-war-games-begin-china-warns-outsiders: 404 Client Error: Not Found for url: https://www.afp.com/en/news/us-philippine-war-games-begin-china-warns-outsiders


Processing URLs:  39%|███▉      | 394/1000 [17:27<17:45,  1.76s/it]

Error extracting text from http://af.reuters.com/article/commoditiesNews/idAFL8N13M1XA20151127: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  40%|███▉      | 397/1000 [17:30<13:57,  1.39s/it]

Error extracting text from https://www.indepthnews.net/index.php/opinion/4099-asean-holds-the-reins-in-the-south-china-sea: 412 Client Error: Precondition Failed for url: https://www.indepthnews.net/index.php/opinion/4099-asean-holds-the-reins-in-the-south-china-sea


Processing URLs:  40%|███▉      | 398/1000 [17:31<10:32,  1.05s/it]

Error extracting text from http://www.cdm.me/english/djukanovic-port-of-bar-and-railroad-for-better-connection-of-the-region-and-evaluation-of-resources: 403 Client Error: Forbidden for url: https://www.cdm.me/english/djukanovic-port-of-bar-and-railroad-for-better-connection-of-the-region-and-evaluation-of-resources


Processing URLs:  40%|███▉      | 399/1000 [17:35<19:31,  1.95s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-06-01/china-is-making-cryptocurrency-to-challenge-bitcoin-and-dollar
URL filtered: https://www.bloomberg.com/news/articles/2017-07-19/the-manhattan-of-venezuela-parties-against-a-backdrop-of-crisis


Processing URLs:  40%|████      | 402/1000 [17:36<11:41,  1.17s/it]

Error extracting text from https://bit.ly/3uoYw4y: 406 Client Error: Not Acceptable for url: https://hungarianspectrum.org/2021/05/23/sebastian-kurzs-falling-star/


Processing URLs:  40%|████      | 404/1000 [17:42<18:05,  1.82s/it]

Error extracting text from http://www.valuewalk.com/2015/10/u-s-china-face-off-in-south-china-sea/: 404 Client Error: Not Found for url: https://www.valuewalk.com/u-s-china-face-off-in-south-china-sea


Processing URLs:  40%|████      | 405/1000 [17:43<17:51,  1.80s/it]

Error extracting text from http://www.crows.org/: 403 Client Error: Forbidden for url: http://www.crows.org/


Processing URLs:  41%|████      | 408/1000 [17:49<16:20,  1.66s/it]

Error extracting text from https://www.stripes.com/news/afghanistan-s-opium-production-rises-as-taliban-gain-ground-1.497937: 404 Client Error: Not Found for url: https://www.stripes.com/news/afghanistan-s-opium-production-rises-as-taliban-gain-ground-1.497937


Processing URLs:  41%|████      | 412/1000 [17:56<16:25,  1.68s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-03/honeywell-warns-of-increasing-attacks-by-state-sponsored-hackers


Processing URLs:  42%|████▏     | 415/1000 [18:00<15:15,  1.56s/it]

Error extracting text from https://www.pardonsnowden.org/: HTTPSConnectionPool(host='www.pardonsnowden.org', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30398b710>: Failed to resolve 'www.pardonsnowden.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 417/1000 [18:05<17:40,  1.82s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://folhanobre.com.br/2016/03/12/lava-jato-pf-recebe-manuscrito-que-liga-dilma-ao-doleiro-youssef/24369&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://folhanobre.com.br/2016/03/12/lava-jato-pf-recebe-manuscrito-que-liga-dilma-ao-doleiro-youssef/24369&amp;prev=search


Processing URLs:  42%|████▏     | 418/1000 [18:06<14:45,  1.52s/it]

Error extracting text from http://www.cnbc.com/2017/02/02/us-tax-plan-would-break-wto-rules-lawyers-say.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/02/02/us-tax-plan-would-break-wto-rules-lawyers-say.html


Processing URLs:  42%|████▏     | 420/1000 [18:07<11:11,  1.16s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/photo-reports/2015-11/25/content_6785068.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/photo-reports/2015-11/25/content_6785068.htm


Processing URLs:  42%|████▏     | 422/1000 [18:30<52:55,  5.49s/it]  

URL filtered: https://www.bloombergquint.com/onweb/reddit-fueled-traders-trigger-volatility-halts-across-the-market


Processing URLs:  43%|████▎     | 426/1000 [18:51<1:02:44,  6.56s/it]

Error extracting text from https://www.learnreligions.com/why-buddhist-monks-and-nuns-shave-their-heads-449598: 406 Client Error: Not Acceptable for url: https://www.learnreligions.com/why-buddhist-monks-and-nuns-shave-their-heads-449598


Processing URLs:  43%|████▎     | 428/1000 [18:53<37:22,  3.92s/it]  

Error extracting text from https://jsis.washington.edu/news/north-korea-cyber-attacks-new-asymmetrical-military-strategy/: HTTPSConnectionPool(host='jsis.washington.edu', port=443): Max retries exceeded with url: /news/north-korea-cyber-attacks-new-asymmetrical-military-strategy/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  43%|████▎     | 429/1000 [18:54<30:55,  3.25s/it]

Error extracting text from http://www.middle-east-online.com/english/?id=77046: 404 Client Error: Not Found for url: https://www.middle-east-online.com/english/?id=77046


Processing URLs:  44%|████▎     | 435/1000 [19:04<14:40,  1.56s/it]

Error extracting text from http://www.businessinsider.com/r-mercedes-benz-to-offer-electric-option-for-every-car-by-2022-2017-9?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-mercedes-benz-to-offer-electric-option-for-every-car-by-2022-2017-9?IR=T


Processing URLs:  45%|████▌     | 450/1000 [19:51<18:35,  2.03s/it]  

Error extracting text from http://www.reuters.com/article/2015/10/01/us-usa-oilexports-senate-idUSKCN0RV4ZQ20151001: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/01/us-usa-oilexports-senate-idUSKCN0RV4ZQ20151001
URL filtered: http://www.thedailybeast.com/russians-appear-to-use-facebook-to-push-pro-trump-flash-mobs-in-florida


Processing URLs:  46%|████▌     | 458/1000 [20:01<08:47,  1.03it/s]

URL filtered: https://www.youtube.com/watch?v=rzE9c8RW0Yk&amp;feature=youtu.be
URL filtered: http://www.middleeasteye.net/news/muslims-twitter-explain-why-theyre-too-busy-answer-call-rise-535380720
Error extracting text from http://www.nytimes.com/2015/10/29/world/middleeast/syria-talks-vienna-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/29/world/middleeast/syria-talks-vienna-iran.html
URL filtered: https://www.youtube.com/watch?v=rMz7JBRbmNo&amp;t=68s


Processing URLs:  46%|████▌     | 460/1000 [21:01<1:37:23, 10.82s/it]

Error extracting text from https://www.usnews.com/news/world-report/articles/2021-03-03/iraqi-airbase-housing-us-troops-attacked-by-rockets-iran-militia-believed-involved: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  46%|████▌     | 461/1000 [21:02<1:20:02,  8.91s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O2117L6JTSES01-42UTAI3S5R9E0TCCV44GMM7NF1


Processing URLs:  46%|████▋     | 463/1000 [21:03<52:03,  5.82s/it]  

Error extracting text from http://sandtonchronicle.co.za/164913/from-the-horses-mouth-2: 403 Client Error: Forbidden for url: https://sandtonchronicle.co.za/164913/from-the-horses-mouth-2


Processing URLs:  47%|████▋     | 467/1000 [21:13<30:11,  3.40s/it]

Error extracting text from http://www.defensenews.com/story/defense/policy-budget/warfare/2015/11/30/uae-says-ready-commit-troops-fight-syria-jihadists/76572630/: 404 Client Error: Not Found for url: https://www.defensenews.com/story/defense/policy-budget/warfare/2015/11/30/uae-says-ready-commit-troops-fight-syria-jihadists/76572630/


Processing URLs:  47%|████▋     | 469/1000 [21:33<1:05:26,  7.39s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=https://www.tribunadabahia.com.br/2016/02/23/oposicao-cria-comite-para-acelerar-impeachment-da-presidente-dilma&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=https://www.tribunadabahia.com.br/2016/02/23/oposicao-cria-comite-para-acelerar-impeachment-da-presidente-dilma&amp;prev=search


Processing URLs:  47%|████▋     | 472/1000 [21:35<29:17,  3.33s/it]  

Error extracting text from http://www.reuters.com/article/2015/10/07/us-brazil-rousseff-idUSKCN0S124S20151007: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/07/us-brazil-rousseff-idUSKCN0S124S20151007


Processing URLs:  48%|████▊     | 477/1000 [21:40<11:49,  1.36s/it]

Error extracting text from http://www.reuters.com/article/2015/11/10/us-eurozone-greece-economy-idUSKCN0SZ1DZ20151110#eJwAuW6qkV7B1UBS.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/10/us-eurozone-greece-economy-idUSKCN0SZ1DZ20151110#eJwAuW6qkV7B1UBS.97
Error extracting text from http://www.nytimes.com/2016/01/29/upshot/surge-for-sanders-or-trump-in-iowa-voter-registration-doesnt-suggest-it.html?nlid=42208600&amp;src=recpb&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/29/upshot/surge-for-sanders-or-trump-in-iowa-voter-registration-doesnt-suggest-it.html?nlid=42208600&amp;src=recpb&amp;_r=0


Processing URLs:  48%|████▊     | 481/1000 [21:47<12:03,  1.39s/it]

Error extracting text from http://www.mod.gov.me/en/news/153059/Pentagon-Full-support-of-USA-for-the-invitation-to-NATO-membership-on-December-s-meeting-of-the-NATO-ministers-of-foreign-affair.html: HTTPConnectionPool(host='www.mod.gov.me', port=80): Max retries exceeded with url: /en/news/153059/Pentagon-Full-support-of-USA-for-the-invitation-to-NATO-membership-on-December-s-meeting-of-the-NATO-ministers-of-foreign-affair.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff932a50>: Failed to resolve 'www.mod.gov.me' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 483/1000 [22:48<2:36:23, 18.15s/it]

Error extracting text from http://www.barnesandnoble.com/w/language-thought-and-reality-benjamin-l-whorf/1100525074?ean=9781614270720&amp;st=PLA&amp;sid=BNB_DRS_Core+Shopping+Books_00000000&amp;2sid=Google_&amp;sourceId=PLGoP648&amp;k_clickid=3x648: HTTPConnectionPool(host='www.barnesandnoble.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  48%|████▊     | 485/1000 [22:48<1:19:10,  9.22s/it]

Error extracting text from http://www.nytimes.com/2016/03/26/world/europe/belgium-fears-nuclear-plants-are-vulnerable.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/26/world/europe/belgium-fears-nuclear-plants-are-vulnerable.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0
URL filtered: https://www.bloomberg.com/news/articles/2021-01-28/ethiopia-moves-artillery-to-sudanese-border-after-deadly-clashes


Processing URLs:  49%|████▉     | 488/1000 [22:49<34:03,  3.99s/it]  

Error extracting text from https://www.nytimes.com/2017/11/17/opinion/sunday/escape-roy-moores-evangelicalism.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/17/opinion/sunday/escape-roy-moores-evangelicalism.html


Processing URLs:  49%|████▉     | 490/1000 [22:52<23:02,  2.71s/it]

Error extracting text from https://bit.ly/3eVaOfZ: 403 Client Error: Forbidden for url: https://clubofmozambique.com/news/mozambique-president-nyusi-says-army-gaining-ground-in-insurgency-hit-region-watch-197451/


Processing URLs:  49%|████▉     | 491/1000 [22:52<17:07,  2.02s/it]

Error extracting text from https://www.nytimes.com/2017/07/17/world/asia/north-korea-south-military-talks.html?emc=edit_th_20170718&amp;nl=todaysheadlines&amp;nlid=70183565: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/17/world/asia/north-korea-south-military-talks.html?emc=edit_th_20170718&amp;nl=todaysheadlines&amp;nlid=70183565


Processing URLs:  50%|████▉     | 497/1000 [23:05<13:50,  1.65s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/279013-cyprus-central-bank-hit-with-cyberattack-days-after-anonymous-pledge: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/279013-cyprus-central-bank-hit-with-cyberattack-days-after-anonymous-pledge/


Processing URLs:  50%|█████     | 501/1000 [23:09<08:36,  1.03s/it]

Error extracting text from http://atimes.com/2015/09/the-china-challenge-island-building-a-military-threat-in-the-south-china-sea/: 404 Client Error: Not Found for url: https://atimes.com/2015/09/the-china-challenge-island-building-a-military-threat-in-the-south-china-sea/
Error extracting text from https://www.reuters.com/article/us-ukraine-murder/wife-of-chechen-accused-of-putin-assassination-plot-shot-dead-near-kiev-idUSKBN1CZ2J4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-murder/wife-of-chechen-accused-of-putin-assassination-plot-shot-dead-near-kiev-idUSKBN1CZ2J4


Processing URLs:  50%|█████     | 503/1000 [23:10<07:55,  1.05it/s]

Error extracting text from http://www.sfgate.com/business/article/5-big-tech-stocks-build-market-euphoria-and-11203663.php: 403 Client Error: Forbidden for url: https://www.sfgate.com/business/article/5-big-tech-stocks-build-market-euphoria-and-11203663.php


Processing URLs:  51%|█████     | 508/1000 [23:15<07:14,  1.13it/s]

Error extracting text from http://thehill.com/homenews/355444-house-roger-stone-complied-with-house-investigators-on-identity-of-wikileaks-contact: 403 Client Error: Forbidden for url: https://thehill.com/homenews/355444-house-roger-stone-complied-with-house-investigators-on-identity-of-wikileaks-contact/


Processing URLs:  51%|█████     | 511/1000 [23:20<11:00,  1.35s/it]

Error extracting text from https://jonrappoport.wordpress.com/2016/08/15/orwellian-ca-bill-reporters-cant-post-undercover-videos/: 410 Client Error: Gone for url: https://jonrappoport.wordpress.com/2016/08/15/orwellian-ca-bill-reporters-cant-post-undercover-videos/


Processing URLs:  51%|█████     | 512/1000 [23:21<08:53,  1.09s/it]

Error extracting text from http://papalvisit.americamedia.org/2015/09/21/biden-interview/: 404 Client Error: Not Found for url: http://papalvisit.americamedia.org/2015/09/21/biden-interview/


Processing URLs:  51%|█████▏    | 513/1000 [23:21<07:33,  1.07it/s]

Error extracting text from http://thehill.com/policy/finance/263177-spending-negotiators-see-11t-deal-in-sight: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/263177-spending-negotiators-see-11t-deal-in-sight/


Processing URLs:  52%|█████▏    | 516/1000 [23:22<03:41,  2.19it/s]

Error extracting text from https://www.congress.gov/bill/114th-congress/senate-bill/2040/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/senate-bill/2040/text
Error extracting text from http://www.nytimes.com/2015/12/03/business/economy/janet-yellen-federal-reserve-interest-rates.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/business/economy/janet-yellen-federal-reserve-interest-rates.html


Processing URLs:  52%|█████▏    | 519/1000 [23:25<04:57,  1.61it/s]

Error extracting text from http://www.crisis.acleddata.com/ethiopia-march-2016-update/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /ethiopia-march-2016-update/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3037751f0>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.oecd.org/economy/oecd-forecasts-during-and-after-the-financial-crisis-a-post-mortem.htm: 403 Client Error: Forbidden for url: https://www.oecd.org/economy/oecd-forecasts-during-and-after-the-financial-crisis-a-post-mortem.htm


Processing URLs:  52%|█████▏    | 520/1000 [23:26<05:58,  1.34it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-28/charting-the-markets-china-falls-again-defying-gains-in-other-emerging-markets


Processing URLs:  52%|█████▎    | 525/1000 [23:28<03:57,  2.00it/s]

URL filtered: https://www.youtube.com/watch?v=6337m_1x8so
Error extracting text from http://www.reuters.com/article/us-tesla-seats/teslas-seat-strategy-goes-against-the-grain-for-now-idUSKBN1CV0DS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-seats/teslas-seat-strategy-goes-against-the-grain-for-now-idUSKBN1CV0DS


Processing URLs:  53%|█████▎    | 530/1000 [23:32<04:30,  1.74it/s]

Error extracting text from http://www.reuters.com/article/us-nigeria-festival-idUSKCN11R0ZO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-festival-idUSKCN11R0ZO?il=0
Error extracting text from https://www.khaama.com/ghani-assigns-stanikzai-as-acting-nds-chief-abdullah-khan-acting-defense-minister-0852: 403 Client Error: Forbidden for url: https://www.khaama.com/ghani-assigns-stanikzai-as-acting-nds-chief-abdullah-khan-acting-defense-minister-0852


Processing URLs:  53%|█████▎    | 533/1000 [23:35<05:22,  1.45it/s]

Error extracting text from http://www.nytimes.com/2016/12/02/us/politics/trump-speaks-with-taiwans-leader-a-possible-affront-to-china.html?emc=edit_na_20161202&amp;nlid=70183565&amp;ref=headline&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/02/us/politics/trump-speaks-with-taiwans-leader-a-possible-affront-to-china.html?emc=edit_na_20161202&amp;nlid=70183565&amp;ref=headline&amp;_r=0


Processing URLs:  54%|█████▎    | 535/1000 [23:36<04:52,  1.59it/s]

Error extracting text from https://ec.europa.eu/commission/sites/beta-political/files/1_en_act_communication.pdf: 404 Client Error: (Not Found) for url: https://ec.europa.eu/commission/sites/beta-political/files/1_en_act_communication.pdf
Error extracting text from http://www.reuters.com/article/us-swiss-eu-idUSKCN12Q1OH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-swiss-eu-idUSKCN12Q1OH
Error extracting text from https://www.reuters.com/article/us-alphabet-uber-lawsuit/exclusive-alphabets-waymo-demanded-1-billion-in-settlement-talks-with-uber-sources-idUSKBN1CH0QC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-alphabet-uber-lawsuit/exclusive-alphabets-waymo-demanded-1-billion-in-settlement-talks-with-uber-sources-idUSKBN1CH0QC


Processing URLs:  54%|█████▍    | 544/1000 [23:55<16:25,  2.16s/it]

URL filtered: http://www.bloomberg.com/politics/graphics/2016-delegate-tracker/


Processing URLs:  55%|█████▍    | 546/1000 [23:57<12:51,  1.70s/it]

Error extracting text from http://en.trend.az/iran/politics/2466025.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2466025.html


Processing URLs:  55%|█████▌    | 551/1000 [24:03<09:34,  1.28s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/china-names-more-firms-in-car-scandal: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  55%|█████▌    | 554/1000 [24:07<09:08,  1.23s/it]

Error extracting text from https://lawstreetmedia.com/blogs/cannabis-in-america/canada-recreational-marijuana/: 404 Client Error: Not Found for url: https://lawstreetmedia.com/blogs/cannabis-in-america/canada-recreational-marijuana/


Processing URLs:  56%|█████▌    | 556/1000 [24:11<10:59,  1.48s/it]

Error extracting text from http://www.malaya.com.ph/business-news/business/ph-push-rcep-deal-while-asean-chair: 404 Client Error: Not Found for url: https://malaya.com.ph/business-news/business/ph-push-rcep-deal-while-asean-chair


Processing URLs:  56%|█████▌    | 558/1000 [24:13<09:21,  1.27s/it]

Error extracting text from https://www.humboldtforum.org/en/programm/termin/opening/finally-open-24469/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/en/programm/termin/opening/finally-open-24469/


Processing URLs:  57%|█████▋    | 566/1000 [24:27<08:20,  1.15s/it]

Error extracting text from http://www.emergingmarkets.org/Article/3544434/China-faces-Catch-22-dilemma-over-Venezuela-debt-pile-as-default-looms.html: 404 Client Error: Not Found for url: http://www.emergingmarkets.org/Article/3544434/China-faces-Catch-22-dilemma-over-Venezuela-debt-pile-as-default-looms.html
Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN16G1O9?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN16G1O9?il=0


Processing URLs:  57%|█████▋    | 567/1000 [24:27<06:52,  1.05it/s]

Error extracting text from https://www.amnesty.org.nz/saving-young-lives-execution-iran: 404 Client Error: Not Found for url: https://amnesty.org.nz/saving-young-lives-execution-iran


Processing URLs:  57%|█████▋    | 568/1000 [24:28<06:18,  1.14it/s]

Error extracting text from https://www.fao.org/worldfoodsituation/foodpricesindex.: 404 Client Error: Not Found for url: https://www.fao.org/worldfoodsituation/foodpricesindex.
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-iran-idUSKBN15Q0CR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-iran-idUSKBN15Q0CR


Processing URLs:  57%|█████▋    | 570/1000 [24:33<12:15,  1.71s/it]

Error extracting text from https://www.reuters.com/article/us-northkorea-southkorea/north-korea-agrees-to-talks-after-u-s-south-korea-postpone-military-drills-idUSKBN1EU06O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-southkorea/north-korea-agrees-to-talks-after-u-s-south-korea-postpone-military-drills-idUSKBN1EU06O


Processing URLs:  57%|█████▋    | 573/1000 [24:36<09:32,  1.34s/it]

Error extracting text from https://www.yahoo.com/finance/news/china-urges-calm-north-korea-061054029.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/china-urges-calm-north-korea-061054029.html


Processing URLs:  57%|█████▋    | 574/1000 [24:37<09:29,  1.34s/it]

Error extracting text from http://www.cdc.gov/zika/public-health-partners/risk-based-prep.html: 404 Client Error: Not Found for url: https://www.cdc.gov/zika/public-health-partners/risk-based-prep.html


Processing URLs:  58%|█████▊    | 577/1000 [24:43<09:54,  1.40s/it]

Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-idUSL1N1CY1H3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-idUSL1N1CY1H3


Processing URLs:  58%|█████▊    | 579/1000 [25:00<30:30,  4.35s/it]

Error extracting text from https://www.investopedia.com/articles/pf/09/avoid-five-recession-risks.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/pf/09/avoid-five-recession-risks.asp


Processing URLs:  58%|█████▊    | 580/1000 [25:01<26:04,  3.72s/it]

Error extracting text from http://www.accuweather.com/en/weather-news/new-hampshire-primary-weather-forecast-cold-snow/55210739: 403 Client Error: Forbidden for url: http://www.accuweather.com/en/weather-news/new-hampshire-primary-weather-forecast-cold-snow/55210739


Processing URLs:  58%|█████▊    | 582/1000 [25:02<16:34,  2.38s/it]

Error extracting text from http://pk.shafaqna.com/EN/32369: 404 Client Error: Not Found for url: http://pk.shafaqna.com/EN/32369


Processing URLs:  58%|█████▊    | 584/1000 [25:04<12:35,  1.82s/it]

Error extracting text from http://macedoniaonline.eu/content/view/28518/53/: 404 Client Error: Not Found for url: https://macedoniaonline.eu/content/view/28518/53


Processing URLs:  59%|█████▉    | 591/1000 [25:15<09:15,  1.36s/it]

Error extracting text from http://www.environmental-expert.com/companies/the-shibatani-group-inc-41206: HTTPSConnectionPool(host='www.environmental-expert.com', port=80): Max retries exceeded with url: /companies/the-shibatani-group-inc-41206 (Caused by SSLError(SSLError(1, '[SSL] record layer failure (_ssl.c:1000)')))


Processing URLs:  59%|█████▉    | 592/1000 [25:16<09:37,  1.42s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/chance-of-woman-becoming-un-chief-still-elusive/articleshow/53928039.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/chance-of-woman-becoming-un-chief-still-elusive/articleshow/53928039.cms


Processing URLs:  59%|█████▉    | 594/1000 [25:26<18:19,  2.71s/it]

Error extracting text from https://constitutioncenter.org/blog/case-preview-the-wedding-cake-decision: 403 Client Error: Forbidden for url: https://constitutioncenter.org/blog/case-preview-the-wedding-cake-decision


Processing URLs:  60%|█████▉    | 596/1000 [25:28<11:35,  1.72s/it]

Error extracting text from https://www.middleeastmonitor.com/20171108-saudi-arabia-summoned-abbas-in-response-to-hamas-trip-to-iran/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20171108-saudi-arabia-summoned-abbas-in-response-to-hamas-trip-to-iran/


Processing URLs:  60%|█████▉    | 598/1000 [28:32<4:24:07, 39.42s/it]

Error extracting text from http://www.mgm.gov.tr/en-us/marine-daily-report.aspx: HTTPSConnectionPool(host='mgm.gov.tr', port=443): Max retries exceeded with url: /en-us/marine-daily-report.aspx (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x306bfc500>, 'Connection to mgm.gov.tr timed out. (connect timeout=60)'))
Error extracting text from http://www.reuters.com/article/us-usa-sanctions-jacklew-idUSKCN0WW1VM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-sanctions-jacklew-idUSKCN0WW1VM


Processing URLs:  60%|█████▉    | 599/1000 [28:32<3:04:54, 27.67s/it]

Error extracting text from http://www.wsj.com/articles/dollar-rises-on-expectations-fed-will-raise-rates-next-month-1448299702: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/dollar-rises-on-expectations-fed-will-raise-rates-next-month-1448299702


Processing URLs:  60%|██████    | 602/1000 [28:35<1:06:49, 10.07s/it]

Error extracting text from https://www.yahoo.com/news/many-months-start-battle-mosul-coalition-190540445.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/many-months-start-battle-mosul-coalition-190540445.html


Processing URLs:  61%|██████    | 606/1000 [28:38<18:50,  2.87s/it]

Error extracting text from http://www.wsj.com/articles/oil-pulls-back-on-weak-demand-growth-1473761240: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-pulls-back-on-weak-demand-growth-1473761240


Processing URLs:  61%|██████    | 611/1000 [28:43<08:48,  1.36s/it]

Error extracting text from https://theconversation.com/venezuela-why-trumps-sanctions-wont-work-82970: 403 Client Error: Forbidden for url: https://theconversation.com/venezuela-why-trumps-sanctions-wont-work-82970


Processing URLs:  61%|██████▏   | 614/1000 [28:49<10:24,  1.62s/it]

Error extracting text from http://uk.reuters.com/article/2015/11/05/uk-mideast-crisis-syria-russia-meeting-idUKKCN0SU0Q420151105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  62%|██████▏   | 618/1000 [28:59<13:49,  2.17s/it]

Error extracting text from http://fr.allafrica.com/view/group/main/main/id/00039651.html: HTTPConnectionPool(host='fr.allafrica.com', port=80): Max retries exceeded with url: /view/group/main/main/id/00039651.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x306bfd4c0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  63%|██████▎   | 626/1000 [29:13<09:35,  1.54s/it]

Error extracting text from https://www.porttechnology.org/news/panama_canal_chief_releases_expansion_statement/: 403 Client Error: Forbidden for url: https://www.porttechnology.org/news/panama_canal_chief_releases_expansion_statement/


Processing URLs:  63%|██████▎   | 632/1000 [29:23<11:12,  1.83s/it]

Error extracting text from http://www.reuters.com/article/2015/09/18/us-economy-rates-idUSKCN0RI1Q320150918: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/18/us-economy-rates-idUSKCN0RI1Q320150918


Processing URLs:  64%|██████▎   | 636/1000 [29:29<09:40,  1.59s/it]

Error extracting text from https://publishingperspectives.com/2020/07/coronavirus-statistics-npd-sees-us-unit-sales-up-in-q2-covid19/: 403 Client Error: Forbidden for url: https://publishingperspectives.com/2020/07/coronavirus-statistics-npd-sees-us-unit-sales-up-in-q2-covid19/


Processing URLs:  64%|██████▎   | 637/1000 [29:31<09:11,  1.52s/it]

Error extracting text from http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=114611: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=114611


Processing URLs:  64%|██████▍   | 642/1000 [29:38<09:03,  1.52s/it]

Error extracting text from http://www.boxofficemojo.com/movies/?page=main&amp;id=jurassicpark4.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=main&amp;id=jurassicpark4.htm


Processing URLs:  65%|██████▍   | 648/1000 [29:51<14:54,  2.54s/it]

Error extracting text from http://www.rollcall.com/news/politics/health-care-determine-progress-agenda: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/health-care-determine-progress-agenda


Processing URLs:  65%|██████▌   | 653/1000 [29:59<09:19,  1.61s/it]

Error extracting text from http://www.realcleardefense.com/articles/2016/04/18/us_to_send_200_more_troops_apache_helicopters_to_iraq_109275.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2016/04/18/us_to_send_200_more_troops_apache_helicopters_to_iraq_109275.html


Processing URLs:  66%|██████▌   | 656/1000 [30:00<04:13,  1.36it/s]

Error extracting text from https://www.reuters.com/article/us-usa-court-kennedy/no-matter-how-you-slice-it-u-s-jurist-kennedy-key-vote-in-cake-case-idUSKCN1C21E9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-kennedy/no-matter-how-you-slice-it-u-s-jurist-kennedy-key-vote-in-cake-case-idUSKCN1C21E9


Processing URLs:  66%|██████▌   | 659/1000 [30:03<06:30,  1.15s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Document/Aboutus/Governance/IMB/13IMBMeeting/13IMB_Report_EN.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Document/Aboutus/Governance/IMB/13IMBMeeting/13IMB_Report_EN.pdf


Processing URLs:  66%|██████▌   | 661/1000 [30:05<05:44,  1.02s/it]

Error extracting text from http://www.nytimes.com/2015/09/09/science/china-flexes-tech-muscles-before-state-visit-with-meeting-of-industry-giants.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/09/science/china-flexes-tech-muscles-before-state-visit-with-meeting-of-industry-giants.html


Processing URLs:  66%|██████▋   | 663/1000 [30:09<07:21,  1.31s/it]

Error extracting text from http://thehill.com/homenews/campaign/362504-moore-says-lesbians-gays-socialists-behind-sexual-misconduct-allegations: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/362504-moore-says-lesbians-gays-socialists-behind-sexual-misconduct-allegations/


Processing URLs:  67%|██████▋   | 673/1000 [30:25<08:18,  1.53s/it]

Error extracting text from http://kurdishdailynews.org/2016/03/06/president-obamas-envoy-for-anti-isis-coalition-says-mosul-operation-has-already-started/?utm_campaign=shareaholic&amp;utm_medium=mail&amp;utm_source=email: 520 Server Error:  for url: https://diyhome.io/


Processing URLs:  67%|██████▋   | 674/1000 [30:27<09:18,  1.71s/it]

Error extracting text from http://www.stripes.com/news/violence-increases-pressure-on-afghan-president-1.378497: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/violence-increases-pressure-on-afghan-president-1.378497


Processing URLs:  68%|██████▊   | 678/1000 [30:43<15:38,  2.91s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-09/venezuela-seen-staving-off-default-again-even-as-crisis-worsens


Processing URLs:  68%|██████▊   | 683/1000 [30:54<14:09,  2.68s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/navy-officer-us-bullied-china-china-sea-53161089: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/navy-officer-us-bullied-china-china-sea-53161089


Processing URLs:  69%|██████▊   | 686/1000 [30:56<06:43,  1.28s/it]

Error extracting text from https://www.nytimes.com/live/2021/02/14/world/covid-19-coronavirus: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2021/02/14/world/covid-19-coronavirus


Processing URLs:  69%|██████▊   | 687/1000 [30:57<06:00,  1.15s/it]

Error extracting text from https://finance.yahoo.com/news/trump-antitrust-pick-saw-few-155319597.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/trump-antitrust-pick-saw-few-155319597.html


Processing URLs:  69%|██████▉   | 691/1000 [31:02<06:39,  1.29s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/more-senior-officials/2247204.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/more-senior-officials/2247204.html


Processing URLs:  70%|██████▉   | 695/1000 [31:04<03:13,  1.57it/s]

Error extracting text from http://www.washingtontimes.com/news/2016/jan/26/iran-hard-liners-reject-middle-path-disqualify-mod/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/26/iran-hard-liners-reject-middle-path-disqualify-mod/


Processing URLs:  70%|██████▉   | 697/1000 [31:07<06:18,  1.25s/it]

Error extracting text from http://www.elmundo.com.ve/noticias/economia/mercados/bonos-venezolanos-en-caida-arrastrados-por-petrole.aspx#ixzz3xgWTznO4: HTTPConnectionPool(host='www.elmundo.com.ve', port=80): Max retries exceeded with url: /noticias/economia/mercados/bonos-venezolanos-en-caida-arrastrados-por-petrole.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3065ce420>: Failed to resolve 'www.elmundo.com.ve' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  70%|██████▉   | 698/1000 [31:10<08:38,  1.72s/it]

Error extracting text from http://tass.ru/en/russia/758220: 404 Client Error: Not Found for url: https://tass.ru/en/russia/758220


Processing URLs:  70%|██████▉   | 699/1000 [31:11<08:00,  1.60s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-08-03/coal-miner-alpha-natural-resources-files-for-bankruptcy


Processing URLs:  70%|███████   | 702/1000 [31:13<04:51,  1.02it/s]

Error extracting text from http://news.xinhuanet.com/english/2015-11/09/c_134798597.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-11/09/c_134798597.htm


Processing URLs:  70%|███████   | 704/1000 [31:14<03:57,  1.25it/s]

Error extracting text from http://english.chinamil.com.cn/news-channels/2016-08/10/content_7200507.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/2016-08/10/content_7200507.htm


Processing URLs:  71%|███████   | 706/1000 [31:16<03:47,  1.29it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-25/italian-ceos-court-rouhani-over-dinner-for-billion-dollar-deals


Processing URLs:  71%|███████   | 710/1000 [31:18<02:40,  1.80it/s]

Error extracting text from https://www.al-monitor.com/pulse/contents/afp/2017/11/wsahara-conflict-un-morocco-algeria.html: 404 Client Error: Not Found for url: https://www.al-monitor.com/contents/afp/2017/11/wsahara-conflict-un-morocco-algeria.html
Error extracting text from https://open.spotify.com/episode/5FbcDHl003lbV9QBF2SQrd?si=h1QVFOFQTdm6MuUQwpFYCw: 403 Client Error: Forbidden for url: https://open.spotify.com/episode/5FbcDHl003lbV9QBF2SQrd?si=h1QVFOFQTdm6MuUQwpFYCw


Processing URLs:  71%|███████   | 712/1000 [31:19<03:09,  1.52it/s]

Error extracting text from http://www.duffelblog.com/2015/09/man-love-thursday/: 403 Client Error: Forbidden for url: http://www.duffelblog.com/2015/09/man-love-thursday/


Processing URLs:  72%|███████▏  | 716/1000 [31:25<05:48,  1.23s/it]

Error extracting text from https://carnegieendowment.org/2021/01/15/what-biden-should-know-about-north-korea-s-new-nuclear-plans-pub-83638: 403 Client Error: Forbidden for url: https://carnegieendowment.org/2021/01/15/what-biden-should-know-about-north-korea-s-new-nuclear-plans-pub-83638


Processing URLs:  72%|███████▏  | 718/1000 [31:30<08:50,  1.88s/it]

Error extracting text from http://soufangroup.com/wp-content/uploads/2015/12/TSG_ForeignFightersUpdate3.pdf: 404 Client Error: Not Found for url: https://www.soufangroup.com/wp-content/uploads/2015/12/TSG_ForeignFightersUpdate3.pdf


Processing URLs:  72%|███████▏  | 719/1000 [31:32<09:13,  1.97s/it]

Error extracting text from https://thesaxon.org/2020-a-bad-year-for-foreign-journalists-in-china/30279/: 404 Client Error: Not Found for url: https://thesaxon.org/2020-a-bad-year-for-foreign-journalists-in-china/30279/


Processing URLs:  72%|███████▏  | 720/1000 [31:33<07:53,  1.69s/it]

URL filtered: https://www.washingtonpost.com/world/europe/pro-putin-politics-bots-are-flooding-russian-twitter-oxford-based-studysays/2017/06/20/19c35d6e-5474-11e7-840b-512026319da7_story.html?utm_term=.88970a23c926


Processing URLs:  72%|███████▏  | 723/1000 [31:35<05:23,  1.17s/it]

Error extracting text from http://atimes.com/2016/07/china-has-right-to-declare-adiz-in-the-south-china-sea/: 404 Client Error: Not Found for url: https://atimes.com/2016/07/china-has-right-to-declare-adiz-in-the-south-china-sea/


Processing URLs:  72%|███████▏  | 724/1000 [31:36<05:11,  1.13s/it]

Error extracting text from http://www.caam.org.cn/guojijiaoliu/20150901/1505171077.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/guojijiaoliu/20150901/1505171077.html
Error extracting text from http://www.washingtontimes.com/news/2016/jul/24/obama-rushing-mosul-offensive-against-isis-to-infl/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jul/24/obama-rushing-mosul-offensive-against-isis-to-infl/
URL filtered: https://www.youtube.com/watch?v=TqtehtSB0LI


Processing URLs:  73%|███████▎  | 729/1000 [31:41<04:10,  1.08it/s]

Error extracting text from http://thenewdaily.com.au/news/world/2017/06/14/brexit-dup-northern-ireland-troubles/: 403 Client Error: Forbidden for url: http://thenewdaily.com.au/news/world/2017/06/14/brexit-dup-northern-ireland-troubles/


Processing URLs:  73%|███████▎  | 730/1000 [31:42<04:51,  1.08s/it]

URL filtered: http://www.nydailynews.com/news/politics/movement-forms-push-paul-ryan-run-gop-2016-nom-article-1.2555547?utm_content=buffer5dc08&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=NYDailyNewsTw


Processing URLs:  74%|███████▎  | 737/1000 [39:50<9:49:31, 134.49s/it]

Error extracting text from https://www.thespainreport.com/articles/770-160621194220-spain-general-election-brief-21-06-2016: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/770-160621194220-spain-general-election-brief-21-06-2016 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3023dcc80>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  74%|███████▍  | 740/1000 [39:51<3:31:00, 48.70s/it]

Error extracting text from http://www.reuters.com/article/us-oil-shale-idUSKCN0Z60CH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-shale-idUSKCN0Z60CH
Error extracting text from http://www.wjol.com/fitch-ratings-latest-not-downgrade-illinois-credit/: 403 Client Error: Forbidden for url: http://www.wjol.com/fitch-ratings-latest-not-downgrade-illinois-credit/


Processing URLs:  74%|███████▍  | 741/1000 [39:54<2:31:34, 35.11s/it]

Error extracting text from http://www.dailytimes.com.pk/business/31-Aug-2015/china-market-chaos-blamed-on-exodus-of-regulatory-turtles: 404 Client Error: Not Found for url: https://dailytimes.com.pk/business/31-Aug-2015/china-market-chaos-blamed-on-exodus-of-regulatory-turtles


Processing URLs:  74%|███████▍  | 744/1000 [39:57<54:26, 12.76s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-04-25/turkey-will-keep-pumping-money-into-infrastructure-premier-says


Processing URLs:  76%|███████▌  | 755/1000 [40:15<10:56,  2.68s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-11-24/harnessing-trump-and-sanders-korean-populist-rises-in-polls


Processing URLs:  76%|███████▌  | 757/1000 [40:15<06:43,  1.66s/it]

Error extracting text from http://alltheclaims.com/about/: 406 Client Error: Not Acceptable for url: http://alltheclaims.com/about/


Processing URLs:  76%|███████▌  | 758/1000 [40:18<07:52,  1.95s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-24/turkey-plans-multibillion-dollar-fund-to-keep-growth-on-track


Processing URLs:  77%|███████▋  | 771/1000 [40:32<02:37,  1.45it/s]

Error extracting text from https://www.nashville.gov/Portals/0/SiteContent/Police/docs/Media/Daily%20Arrest/December%202020/Report.pdf: 403 Client Error: Forbidden for url: https://www.nashville.gov/Portals/0/SiteContent/Police/docs/Media/Daily%20Arrest/December%202020/Report.pdf
Error extracting text from https://news.bitcoin.com/bitcoin-etf-approved-first-north-american-bitcoin-etf-toronto-stock-exchange/: 403 Client Error: Forbidden for url: https://news.bitcoin.com/bitcoin-etf-approved-first-north-american-bitcoin-etf-toronto-stock-exchange/


Processing URLs:  77%|███████▋  | 774/1000 [40:35<02:28,  1.52it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-security-germany-idUSKCN1021CL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-germany-idUSKCN1021CL


Processing URLs:  78%|███████▊  | 777/1000 [40:40<04:31,  1.22s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53091#.VrD2_pTfWrU: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53091#.VrD2_pTfWrU


Processing URLs:  78%|███████▊  | 779/1000 [40:42<03:31,  1.04it/s]

Error extracting text from https://translate.google.com/translate?sl=auto&amp;tl=en&amp;js=y&amp;prev=_t&amp;hl=en&amp;ie=UTF-8&amp;u=http%3A%2F%2Fwww.jornaldeluzilandia.com.br%2Ftxt.php%3Fid%3D41703&amp;edit-text=: 400 Client Error: Bad Request for url: https://translate.google.com/translate?sl=auto&amp;tl=en&amp;js=y&amp;prev=_t&amp;hl=en&amp;ie=UTF-8&amp;u=http%3A%2F%2Fwww.jornaldeluzilandia.com.br%2Ftxt.php%3Fid%3D41703&amp;edit-text=
URL filtered: https://twitter.com/R_H_Ebright/status/1407732404207370245


Processing URLs:  78%|███████▊  | 781/1000 [40:43<02:53,  1.26it/s]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-icm-idUKKCN0XF28Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  78%|███████▊  | 782/1000 [40:43<02:43,  1.34it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-11/zuma-premium-haunts-south-africa-as-cds-spread-widens-chart


Processing URLs:  78%|███████▊  | 784/1000 [40:44<01:57,  1.83it/s]

Error extracting text from https://www.un.org/press/en/2021/sgsm20680.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2021/sgsm20680.doc.htm


Processing URLs:  79%|███████▉  | 789/1000 [40:54<05:17,  1.50s/it]

Error extracting text from https://globalguessing.com/metaculus-mondays-vol2/: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /metaculus-mondays-vol2/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  79%|███████▉  | 791/1000 [40:56<03:56,  1.13s/it]

Error extracting text from https://lens.monash.edu/2018/05/17/1349695/helping-intelligence-analysis-get-it-right: 403 Client Error: Forbidden for url: https://lens.monash.edu/2018/05/17/1349695/helping-intelligence-analysis-get-it-right
Error extracting text from http://www.rfi.fr/europe/20151122-vladimir-poutine-visite-teheran-rohani-khamenei-syrie: 403 Client Error: Forbidden for url: http://www.rfi.fr/europe/20151122-vladimir-poutine-visite-teheran-rohani-khamenei-syrie


Processing URLs:  79%|███████▉  | 793/1000 [40:58<04:02,  1.17s/it]

Error extracting text from https://eand.co/how-britain-became-the-first-country-in-the-world-to-surrender-to-covid-e0d64c0a5d3f: 403 Client Error: Forbidden for url: https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Feand.co%2Fhow-britain-became-the-first-country-in-the-world-to-surrender-to-covid-e0d64c0a5d3f


Processing URLs:  80%|███████▉  | 798/1000 [41:05<04:06,  1.22s/it]

Error extracting text from http://www.southcarolinagasprices.com/GasPriceSearch.aspx: 403 Client Error: Forbidden for url: http://www.southcarolinagasprices.com/GasPriceSearch.aspx


Processing URLs:  80%|████████  | 800/1000 [41:12<08:22,  2.51s/it]

Error extracting text from https://www.reuters.com/article/us-alphabet-uber-ruling/uber-waymo-trial-delayed-as-u-s-judge-raises-prospect-of-cover-up-idUSKBN1DS26X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-alphabet-uber-ruling/uber-waymo-trial-delayed-as-u-s-judge-raises-prospect-of-cover-up-idUSKBN1DS26X


Processing URLs:  80%|████████  | 804/1000 [41:17<05:20,  1.63s/it]

Error extracting text from http://www.wsj.com/articles/china-to-lift-ban-on-ipos-1446804114: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-to-lift-ban-on-ipos-1446804114


Processing URLs:  81%|████████  | 806/1000 [41:20<04:13,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-israel-palestinians-un-idUSKBN16V2ME: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-palestinians-un-idUSKBN16V2ME


Processing URLs:  81%|████████  | 807/1000 [41:22<05:00,  1.55s/it]

Error extracting text from http://www.nytimes.com/2015/11/06/us/house-passes-highway-bill.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/06/us/house-passes-highway-bill.html


Processing URLs:  81%|████████  | 811/1000 [41:29<05:14,  1.66s/it]

Error extracting text from http://cherna.gora.me/news/gegaj-postponing-the-ratification-is-a-bad-message-by-kosovo/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/gegaj-postponing-the-ratification-is-a-bad-message-by-kosovo/


Processing URLs:  81%|████████  | 812/1000 [41:29<04:16,  1.37s/it]

Error extracting text from http://www.weeklystandard.com/will-the-wheels-of-justice-grind-hillary/article/2001981: 404 Client Error: Not Found for url: http://www.weeklystandard.com/will-the-wheels-of-justice-grind-hillary/article/2001981


Processing URLs:  82%|████████▏ | 817/1000 [41:35<02:47,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-vietnam-military-idUSKCN0Z012M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-vietnam-military-idUSKCN0Z012M


Processing URLs:  82%|████████▏ | 818/1000 [41:35<02:08,  1.42it/s]

Error extracting text from https://www.nytimes.com/2021/01/26/world/europe/italy-prime-minister-giuseppe-conte-resigns.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/26/world/europe/italy-prime-minister-giuseppe-conte-resigns.html


Processing URLs:  82%|████████▏ | 819/1000 [41:35<02:02,  1.47it/s]

Error extracting text from http://gcaptain.com/april-opening-of-panama-canal-expansion-appears-unlikely/#.Vnxyd3CkqrU: 403 Client Error: Forbidden for url: http://gcaptain.com/april-opening-of-panama-canal-expansion-appears-unlikely/#.Vnxyd3CkqrU


Processing URLs:  82%|████████▏ | 820/1000 [41:36<01:37,  1.84it/s]

Error extracting text from https://www.nytimes.com/2017/06/23/world/asia/destroyer-fitzgerald-collision.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/23/world/asia/destroyer-fitzgerald-collision.html


Processing URLs:  82%|████████▎ | 825/1000 [41:45<04:13,  1.45s/it]

Error extracting text from http://www.hydrogencarsnow.com/index.php/home-hydrogen-fueling-stations/: 406 Client Error: Not Acceptable for url: http://www.hydrogencarsnow.com/index.php/home-hydrogen-fueling-stations/


Processing URLs:  83%|████████▎ | 829/1000 [41:51<03:53,  1.37s/it]

Error extracting text from http://www.reuters.com/article/us-hkex-ceo-board/hkex-talks-to-lure-saudi-aramco-listing-will-never-stop-ceo-idUSKCN1BG08Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-hkex-ceo-board/hkex-talks-to-lure-saudi-aramco-listing-will-never-stop-ceo-idUSKCN1BG08Z
Error extracting text from http://www.reuters.com/article/us-philippines-usa-defence-idUSKBN17Q120: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-usa-defence-idUSKBN17Q120


Processing URLs:  84%|████████▎ | 836/1000 [42:07<06:09,  2.25s/it]

URL filtered: https://www.youtube.com/watch?v=cQ54GDm1eL0
Error extracting text from https://www.wsj.com/graphics/the-threat-from-north-koreas-missiles/: 403 Client Error: Forbidden for url: https://www.wsj.com/graphics/the-threat-from-north-koreas-missiles/


Processing URLs:  84%|████████▍ | 838/1000 [42:09<04:36,  1.71s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_130457.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_130457.htm


Processing URLs:  84%|████████▍ | 839/1000 [42:11<04:44,  1.77s/it]

Error extracting text from https://www.reuters.com/article/us-ukraine-crisis-army/russia-left-troops-in-belarus-after-wargames-ukraine-idUSKCN1C4234: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-crisis-army/russia-left-troops-in-belarus-after-wargames-ukraine-idUSKCN1C4234
URL filtered: http://bloomberg.econoday.com/byshoweventfull.asp?fid=467002&amp;cust=bloomberg-us&amp;year=2015&amp;lid=0&amp;prev=/byweek.asp#top


Processing URLs:  84%|████████▍ | 842/1000 [42:12<02:51,  1.09s/it]

Error extracting text from http://www.stateside.com/states/2016-legislative-session-info/: HTTPSConnectionPool(host='stateside.comstates', port=443): Max retries exceeded with url: /2016-legislative-session-info/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3051d8fe0>: Failed to resolve 'stateside.comstates' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  84%|████████▍ | 845/1000 [42:16<02:37,  1.02s/it]

Error extracting text from https://www.afghanistan-analysts.org/charismatic-absolutist-divisive-hekmatyar-and-the-impact-of-his-return/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/charismatic-absolutist-divisive-hekmatyar-and-the-impact-of-his-return/
Error extracting text from http://www.reuters.com/article/us-netherlands-election-idUSKBN15R0IJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-netherlands-election-idUSKBN15R0IJ


Processing URLs:  85%|████████▍ | 848/1000 [42:19<02:37,  1.04s/it]

Error extracting text from https://prod.hypermind.com/ngdp/fr/welcomeHA.html: HTTPSConnectionPool(host='prod.hypermind.com', port=443): Max retries exceeded with url: /ngdp/fr/welcomeHA.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  85%|████████▌ | 854/1000 [42:25<01:31,  1.59it/s]

Error extracting text from http://www.wsj.com/articles/taliban-islamic-state-forge-informal-alliance-in-eastern-afghanistan-1470611849: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/taliban-islamic-state-forge-informal-alliance-in-eastern-afghanistan-1470611849
Error extracting text from http://legacyforalifetime.net/seotdal-geumeum/: HTTPConnectionPool(host='legacyforalifetime.net', port=80): Max retries exceeded with url: /seotdal-geumeum/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304923380>: Failed to resolve 'legacyforalifetime.net' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-turkey-mattis-idUSKBN1871F0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-turkey-mattis-idUSKBN1871F0


Processing URLs:  86%|████████▌ | 860/1000 [42:34<03:18,  1.42s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/27339-ghani-ceo-reach-electoral-reforms-agreement: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/27339-ghani-ceo-reach-electoral-reforms-agreement
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://oreacionario.com/tag/jose-eduardo-cardoso/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://oreacionario.com/tag/jose-eduardo-cardoso/&amp;prev=search


Processing URLs:  86%|████████▌ | 862/1000 [42:34<02:10,  1.06it/s]

Error extracting text from https://www.si.com/soccer/liverpool/news/latest-uefa-champions-league-winner-2021-22-betting-odds: 403 Client Error: Forbidden for url: https://www.si.com/soccer/liverpool/news/latest-uefa-champions-league-winner-2021-22-betting-odds


Processing URLs:  86%|████████▋ | 864/1000 [42:41<04:50,  2.14s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-01-21/zuma-s-time-running-out-as-ramaphosa-wields-south-african-power


Processing URLs:  87%|████████▋ | 868/1000 [42:46<03:26,  1.56s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_37356.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_37356.htm


Processing URLs:  87%|████████▋ | 872/1000 [42:51<02:42,  1.27s/it]

Error extracting text from http://globalriskinsights.com/2015/10/us-oil-export-ban-will-it-hold-up/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2015/10/us-oil-export-ban-will-it-hold-up/


Processing URLs:  88%|████████▊ | 876/1000 [43:10<06:40,  3.23s/it]

Error extracting text from https://abcnews.go.com/Business/wireStory/imf-downgrades-outlook-global-economy-face-virus-71426050#:~:text=For%202021%2C%20the%20IMF%20envisions%20a%20rebound%20in,percentage%20point%20weaker%20than%20in%20its%20April%20forecast: 404 Client Error: Not Found for url: https://abcnews.go.com/Business/wireStory/imf-downgrades-outlook-global-economy-face-virus-71426050#:~:text=For%202021%2C%20the%20IMF%20envisions%20a%20rebound%20in,percentage%20point%20weaker%20than%20in%20its%20April%20forecast


Processing URLs:  88%|████████▊ | 878/1000 [43:11<04:06,  2.02s/it]

Error extracting text from https://bit.ly/39Kaheq: HTTPSConnectionPool(host='uknews.lenexweb.com', port=443): Max retries exceeded with url: /2021/03/31/polling-guru-john-curtice-warns-nicola-sturgeon-has-lost-gains-for-scottish-independence/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'uknews.lenexweb.com'. (_ssl.c:1000)")))
Error extracting text from http://rasqoh.com/a-presidency-under-siege-will-uhuru-kenyatta-win-reelection/: HTTPConnectionPool(host='rasqoh.com', port=80): Max retries exceeded with url: /a-presidency-under-siege-will-uhuru-kenyatta-win-reelection/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fcc8c0>: Failed to resolve 'rasqoh.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  89%|████████▉ | 891/1000 [44:44<04:08,  2.28s/it]

Error extracting text from http://www.ucsusa.org/sites/default/files/legacy/assets/documents/nwgs/Wright-Analysis-of-NK-launcher-3-18-09.pdf: 404 Client Error: Not Found for url: https://www.ucsusa.org/sites/default/files/legacy/assets/documents/nwgs/Wright-Analysis-of-NK-launcher-3-18-09.pdf
Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=9301: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=9301


Processing URLs:  89%|████████▉ | 892/1000 [44:44<02:57,  1.64s/it]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/2844074833x0x903616/220F0CCE-41A7-46B9-9740-8C2A53A86B6D/Q2_16_Update_Letter_-_final.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/2844074833x0x903616/220F0CCE-41A7-46B9-9740-8C2A53A86B6D/Q2_16_Update_Letter_-_final.pdf


Processing URLs:  90%|█████████ | 904/1000 [45:02<02:18,  1.45s/it]

Error extracting text from http://newsok.com/article/5512636: 404 Client Error: OK for url: https://www.oklahoman.com/article/5512636


Processing URLs:  91%|█████████ | 906/1000 [45:05<01:57,  1.25s/it]

Error extracting text from http://www.reuters.com/article/2015/10/26/us-southchinasea-usa-idUSKCN0SK2AC20151026: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/26/us-southchinasea-usa-idUSKCN0SK2AC20151026


Processing URLs:  91%|█████████ | 908/1000 [45:06<01:25,  1.08it/s]

Error extracting text from http://canadafreepress.com/article/75266: 403 Client Error: Forbidden for url: https://canadafreepress.com/article/75266


Processing URLs:  91%|█████████ | 911/1000 [45:08<01:14,  1.20it/s]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160514/0905191662.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160514/0905191662.html


Processing URLs:  91%|█████████▏| 914/1000 [46:11<21:01, 14.66s/it]

Error extracting text from http://www.charlotteobserver.com/opinion/opn-columns-blogs/taylor-batten/article96736902.html: HTTPConnectionPool(host='www.charlotteobserver.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  92%|█████████▏| 916/1000 [46:13<11:44,  8.38s/it]

Error extracting text from http://www.reuters.com/article/2015/11/14/us-mideast-crisis-syria-steinmeier-idUSKCN0T312S20151114#I5SjkQMse0qLeYdq.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/14/us-mideast-crisis-syria-steinmeier-idUSKCN0T312S20151114#I5SjkQMse0qLeYdq.99


Processing URLs:  92%|█████████▏| 918/1000 [46:14<06:22,  4.66s/it]

Error extracting text from https://www.wsj.com/articles/a-hong-kong-dissidents-daring-escape-11610494379?mod=opinion_lead_pos5: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-hong-kong-dissidents-daring-escape-11610494379?mod=opinion_lead_pos5
Error extracting text from http://www.reuters.com/article/us-iran-election-start-idUSKCN0XQ09H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-start-idUSKCN0XQ09H


Processing URLs:  92%|█████████▏| 921/1000 [46:17<03:30,  2.66s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/burundi-presidential-aide-wounded-assassination-bid-43838309: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/burundi-presidential-aide-wounded-assassination-bid-43838309


Processing URLs:  92%|█████████▏| 922/1000 [46:18<02:50,  2.19s/it]

Error extracting text from https://www.bundesarchiv.de/DE/Content/Dokumente-zur-Zeitgeschichte/19181109_ausrufung-der-republik.html: HTTPSConnectionPool(host='www.bundesarchiv.de', port=443): Max retries exceeded with url: /DE/Content/Dokumente-zur-Zeitgeschichte/19181109_ausrufung-der-republik.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  92%|█████████▏| 924/1000 [47:21<23:08, 18.27s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-03-02/following-reports-of-jeff-sessions-meetings-russia-dismisses-claim-ambassador-is-a-spy-as-fake-news: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 929/1000 [47:25<04:32,  3.83s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN1320H5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKBN1320H5


Processing URLs:  93%|█████████▎| 932/1000 [47:28<02:05,  1.84s/it]

Error extracting text from http://www.washingtontimes.com/news/2015/nov/19/bernie-sanders-edges-hillary-clinton-new-hampshire/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/nov/19/bernie-sanders-edges-hillary-clinton-new-hampshire/


Processing URLs:  93%|█████████▎| 934/1000 [55:33<2:40:12, 145.64s/it]

Error extracting text from https://www.thespainreport.com/articles/875-160828144537-rajoy-and-rivera-sign-stillborn-confidence-vote-deal: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/875-160828144537-rajoy-and-rivera-sign-stillborn-confidence-vote-deal (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fe273530>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  94%|█████████▍| 938/1000 [56:07<46:18, 44.81s/it]

Error extracting text from http://gas2.org/2016/05/23/land-sea-san-francisco-doubles-hydrogen-fuel-cell-evs/: 522 Server Error:  for url: https://gas2.org/2016/05/23/land-sea-san-francisco-doubles-hydrogen-fuel-cell-evs/


Processing URLs:  94%|█████████▍| 940/1000 [56:13<23:15, 23.26s/it]

Error extracting text from https://www.reuters.com/world/kremlin-says-it-is-too-early-talk-about-putin-biden-summit-tangible-terms-2021-04-14/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/kremlin-says-it-is-too-early-talk-about-putin-biden-summit-tangible-terms-2021-04-14/


Processing URLs:  94%|█████████▍| 941/1000 [56:14<16:19, 16.59s/it]

Error extracting text from https://www.sofx.com/wp-content/uploads/2017/02/170221.LTR_.SOFX_-DRONES-IN-HUMANITARIAN-AND-MEDICAL-ASSISTANCE-1.pdf: 403 Client Error: Forbidden for url: https://www.sofx.com/wp-content/uploads/2017/02/170221.LTR_.SOFX_-DRONES-IN-HUMANITARIAN-AND-MEDICAL-ASSISTANCE-1.pdf


Processing URLs:  94%|█████████▍| 944/1000 [56:15<05:58,  6.40s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-kerry-lavrov-idUSKCN11102B?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-kerry-lavrov-idUSKCN11102B?il=0
Error extracting text from http://www.reuters.com/statesofthenation/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/statesofthenation/


Processing URLs:  94%|█████████▍| 945/1000 [56:23<06:16,  6.85s/it]

URL filtered: https://twitter.com/TPM/status/686958308121538562


Processing URLs:  95%|█████████▌| 951/1000 [56:33<02:18,  2.83s/it]

Error extracting text from http://blogs.reuters.com/breakingviews/2016/07/05/tesla-pressure-clouds-elon-musks-solar-gambit/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /breakingviews/2016/07/05/tesla-pressure-clouds-elon-musks-solar-gambit/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300402600>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  95%|█████████▌| 952/1000 [56:33<01:42,  2.13s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-tax-idUSKBN1601E8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-tax-idUSKBN1601E8
Error extracting text from https://www.hindustantimes.com/world-news/polls-show-trudeau-won-t-form-majority-govt-if-canada-holds-snap-polls-this-year-101627193184150.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/polls-show-trudeau-won-t-form-majority-govt-if-canada-holds-snap-polls-this-year-101627193184150.html


Processing URLs:  95%|█████████▌| 954/1000 [56:34<01:02,  1.36s/it]

Error extracting text from https://www.cia.gov/cia-labs/: 403 Client Error: Forbidden for url: https://www.cia.gov/cia-labs/


Processing URLs:  96%|█████████▌| 961/1000 [56:48<00:53,  1.37s/it]

Error extracting text from http://www.scotsman.com/news/politics/indyref2-scottish-government-workers-ready-for-new-vote-1-4361896: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/indyref2-scottish-government-workers-ready-for-new-vote-1-4361896


Processing URLs:  96%|█████████▌| 962/1000 [56:48<00:39,  1.05s/it]

Error extracting text from http://english.alarabiya.net/en/business/economy/2015/12/19/This-map-will-change-how-you-see-Africa-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/economy/2015/12/19/This-map-will-change-how-you-see-Africa-.html
URL filtered: https://twitter.com/ThomasErdbrink/status/640886110181134339


Processing URLs:  96%|█████████▋| 965/1000 [57:00<01:30,  2.58s/it]

Error extracting text from http://www.wsj.com/articles/iraqi-factions-vie-to-take-part-in-mosul-offensive-1469139683: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-factions-vie-to-take-part-in-mosul-offensive-1469139683


Processing URLs:  97%|█████████▋| 966/1000 [57:01<01:12,  2.14s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-ohio-senate-portman-vs-strickland: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-ohio-senate-portman-vs-strickland


Processing URLs:  97%|█████████▋| 968/1000 [57:03<00:53,  1.67s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-04/sweetened-pdvsa-bond-swap-has-traders-ignoring-congress-s-threat


Processing URLs:  97%|█████████▋| 971/1000 [57:05<00:28,  1.03it/s]

Error extracting text from http://www.wsj.com/articles/bayers-all-cash-offer-values-monsanto-at-62-billion-1463981986: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/bayers-all-cash-offer-values-monsanto-at-62-billion-1463981986


Processing URLs:  98%|█████████▊| 976/1000 [57:09<00:24,  1.02s/it]

Error extracting text from https://gcaptain.com/shipping-delays-throw-book-industry-nail-biting-chaos/?subscriber=true&amp;goal=0_f50174ef03-67415303d8-170102337&amp;mc_cid=67415303d8&amp;mc_eid=c74873c672: 403 Client Error: Forbidden for url: https://gcaptain.com/shipping-delays-throw-book-industry-nail-biting-chaos/?subscriber=true&amp;goal=0_f50174ef03-67415303d8-170102337&amp;mc_cid=67415303d8&amp;mc_eid=c74873c672


Processing URLs:  98%|█████████▊| 978/1000 [57:12<00:26,  1.21s/it]



Processing URLs:  98%|█████████▊| 982/1000 [57:22<00:42,  2.37s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/delivery-by-drone-switzerland-tests-it-in-populated-areas/2017/09/28/27cc04c2-a48a-11e7-b573-8ec86cdfe1ed_story.html?utm_term=.04eaecf1e31e: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/delivery-by-drone-switzerland-tests-it-in-populated-areas/2017/09/28/27cc04c2-a48a-11e7-b573-8ec86cdfe1ed_story.html?utm_term=.04eaecf1e31e
Error extracting text from http://www.reuters.com/article/us-usa-stocks-idUSKBN1321FT?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks-idUSKBN1321FT?il=0


Processing URLs:  99%|█████████▊| 986/1000 [57:27<00:16,  1.20s/it]

Error extracting text from http://www.latimes.com/politics/washington/la-na-essential-washington-updates-iran-angered-by-report-that-trump-wants-1501170057-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/washington/la-na-essential-washington-updates-iran-angered-by-report-that-trump-wants-1501170057-htmlstory.html


Processing URLs:  99%|█████████▊| 987/1000 [57:28<00:14,  1.14s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/turkish-president-accuses-eu-breaking-pact-migrants-40879501: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/turkish-president-accuses-eu-breaking-pact-migrants-40879501


Processing URLs:  99%|█████████▉| 990/1000 [57:34<00:17,  1.79s/it]

Error extracting text from http://www.newschannel5.com/newsy/these-skeptics-are-asking-trump-to-pull-out-of-a-un-climate-agreement: 404 Client Error: Not Found for url: https://www.newschannel5.com/newsy/these-skeptics-are-asking-trump-to-pull-out-of-a-un-climate-agreement


Processing URLs:  99%|█████████▉| 991/1000 [57:35<00:13,  1.54s/it]

Error extracting text from http://heraldvoice.com/2015/11/10/oil-prices-rise-following-opecs-forecast-on-balanced-market/: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=heraldvoice.com


Processing URLs:  99%|█████████▉| 993/1000 [57:40<00:14,  2.06s/it]

Error extracting text from https://www.imf.org/en/Publications/SPROLLS/world-economic-outlookdatabases#sort=%40imfdate%20descending: 400 Client Error: Bad Request for url: https://www.imf.org/en/Publications/SPROLLS/world-economic-outlook%02databases#sort=%40imfdate%20descending


Processing URLs: 100%|█████████▉| 995/1000 [57:42<00:06,  1.39s/it]

Error extracting text from http://www.wsj.com/articles/donald-trump-adviser-signals-plan-to-change-veterans-health-care-1463064129: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/donald-trump-adviser-signals-plan-to-change-veterans-health-care-1463064129


Processing URLs: 100%|█████████▉| 998/1000 [57:45<00:02,  1.04s/it]

Error extracting text from https://www.middleeastmonitor.com/20171025-moroccos-new-satellite-to-spy-on-algeria-polisario-front/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20171025-moroccos-new-satellite-to-spy-on-algeria-polisario-front/


Processing URLs: 100%|██████████| 1000/1000 [57:50<00:00,  3.47s/it]
Processing URLs:   0%|          | 5/1000 [00:05<15:07,  1.10it/s]

Error extracting text from http://www.yenisafak.com/en/news/thirty-one-pkk-terrorists-neutralized-in-se-turkey-2797398: 422 Client Error:  for url: http://www.yenisafak.com/en/news/thirty-one-pkk-terrorists-neutralized-in-se-turkey-2797398


Processing URLs:   1%|          | 9/1000 [00:09<13:55,  1.19it/s]

Error extracting text from http://www.ibtimes.com/meet-cyberberkut-pro-russian-hackers-waging-anonymous-style-cyberwarfare-against-2228902: 403 Client Error: Forbidden for url: https://www.ibtimes.com/meet-cyberberkut-pro-russian-hackers-waging-anonymous-style-cyberwarfare-against-2228902
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN14P278: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN14P278


Processing URLs:   1%|          | 11/1000 [00:10<11:19,  1.45it/s]

Error extracting text from https://rebootillinois.com/2017/01/24/soda-tax-service-taxes-illinois-lawmakers-seek-budget/: HTTPSConnectionPool(host='rebootillinois.com', port=443): Max retries exceeded with url: /2017/01/24/soda-tax-service-taxes-illinois-lawmakers-seek-budget/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'rebootillinois.com'. (_ssl.c:1000)")))
Error extracting text from http://www.nytimes.com/2016/01/17/magazine/the-trials-of-alice-goffman.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/17/magazine/the-trials-of-alice-goffman.html


Processing URLs:   1%|          | 12/1000 [00:11<12:16,  1.34it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-12-21/china-arrives-in-syria-as-putin-fights-west-over-postwar-cash


Processing URLs:   2%|▏         | 18/1000 [00:21<31:33,  1.93s/it]

Error extracting text from http://www.spacex.com/news/2016/09/01/anomaly-updates: 404 Client Error: Not Found for url: https://www.spacex.com/news/2016/09/01/anomaly-updates
URL filtered: https://www.forbes.com/advisor/investing/update-facebook-antitrust-lawsuit/


Processing URLs:   2%|▏         | 22/1000 [00:23<14:11,  1.15it/s]

Error extracting text from http://www.reuters.com/article/us-usa-court-obama-idUSKCN0WD2LE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-court-obama-idUSKCN0WD2LE
Error extracting text from https://english.ahram.org.eg/NewsContent/2/10/408810/World/Africa/Sudan-rules-out-armed-action-over-Ethiopias-GERD.aspx: 403 Client Error: Forbidden for url: https://english.ahram.org.eg/NewsContent/2/10/408810/World/Africa/Sudan-rules-out-armed-action-over-Ethiopias-GERD.aspx


Processing URLs:   2%|▎         | 25/1000 [00:24<09:56,  1.64it/s]

Error extracting text from http://nationalinterest.org/feature/russia-vs-japan-asias-forgotten-island-fight-15942: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/russia-vs-japan-asias-forgotten-island-fight-15942
Error extracting text from https://dpo.tothestarsacademy.com/: HTTPSConnectionPool(host='dpo.tothestarsacademy.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30650d6d0>: Failed to resolve 'dpo.tothestarsacademy.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   3%|▎         | 26/1000 [00:55<2:30:54,  9.30s/it]

Error extracting text from http://www.todayszaman.com/national_turkish-police-raid-factories-making-unsafe-boats_411609.html: 522 Server Error:  for url: http://www.todayszaman.com/national_turkish-police-raid-factories-making-unsafe-boats_411609.html


Processing URLs:   3%|▎         | 28/1000 [00:58<1:25:30,  5.28s/it]

Error extracting text from http://thehill.com/opinion/cybersecurity/360042-russias-online-propaganda-is-just-the-latest-incarnation-of-its-old: 403 Client Error: Forbidden for url: https://thehill.com/opinion/cybersecurity/360042-russias-online-propaganda-is-just-the-latest-incarnation-of-its-old/


Processing URLs:   3%|▎         | 32/1000 [01:03<32:17,  2.00s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3686313/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3686313/
URL filtered: http://www.businessinsider.com/russias-rt-twitter-pushed-for-millions-in-ads-buys-before-election-2017-10
Error extracting text from http://www.reuters.com/article/us-europe-migrants-turkey-greece-idUSKCN0WA29J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-turkey-greece-idUSKCN0WA29J


Processing URLs:   3%|▎         | 34/1000 [01:06<30:37,  1.90s/it]

Error extracting text from http://www.reuters.com/video/2017/07/28/clashes-kill-five-as-venezuelans-protest: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2017/07/28/clashes-kill-five-as-venezuelans-protest


Processing URLs:   4%|▍         | 38/1000 [01:09<16:13,  1.01s/it]

Error extracting text from http://www.nytimes.com/2015/12/22/world/asia/north-korea-china-moranbong.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/22/world/asia/north-korea-china-moranbong.html


Processing URLs:   4%|▍         | 41/1000 [01:31<1:03:29,  3.97s/it]

Error extracting text from https://www.sadc.int/themes/politics-defence-security/regional-peacekeeping/standby-force/: 404 Client Error: Not Found for url: https://www.sadc.int/themes/politics-defence-security/regional-peacekeeping/standby-force/


Processing URLs:   4%|▍         | 44/1000 [01:34<30:43,  1.93s/it]

Error extracting text from http://www.ibtimes.co.uk/theresa-may-deals-blow-eurosceptics-after-saying-cameron-has-basis-deal-eu-1541577: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/theresa-may-deals-blow-eurosceptics-after-saying-cameron-has-basis-deal-eu-1541577
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13E0RT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13E0RT


Processing URLs:   5%|▍         | 46/1000 [01:46<1:06:43,  4.20s/it]

URL filtered: https://www.youtube.com/watch?v=_VrFV5r8cs0


Processing URLs:   5%|▍         | 49/1000 [01:48<33:55,  2.14s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/the-hard-truth-about-thaad-south-korea-china-15295: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/the-hard-truth-about-thaad-south-korea-china-15295


Processing URLs:   5%|▌         | 54/1000 [01:53<18:18,  1.16s/it]

Error extracting text from https://www.predictit.org/Browse/Category/6/US-Elections: 403 Client Error: Forbidden for url: https://www.predictit.org/Browse/Category/6/US-Elections


Processing URLs:   6%|▌         | 56/1000 [01:54<12:52,  1.22it/s]

Error extracting text from http://files.shareholder.com/downloads/ABEA-36XVJR/5830382640x0x964515/00E93506-3820-43C3-80F1-EAD08AD70AFA/CAR-USQ_Transcript_2017-11-15_4_.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-36XVJR/5830382640x0x964515/00E93506-3820-43C3-80F1-EAD08AD70AFA/CAR-USQ_Transcript_2017-11-15_4_.pdf


Processing URLs:   6%|▌         | 61/1000 [01:59<10:36,  1.47it/s]

Error extracting text from http://www.wsj.com/articles/mosul-offensive-to-begin-within-a-month-u-s-general-says-1473339951: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/mosul-offensive-to-begin-within-a-month-u-s-general-says-1473339951


Processing URLs:   6%|▋         | 63/1000 [02:00<10:38,  1.47it/s]

Error extracting text from http://www.bls.gov/news.release/realer.t01.htm: 403 Client Error: Forbidden for url: http://www.bls.gov/news.release/realer.t01.htm


Processing URLs:   6%|▋         | 65/1000 [02:02<11:02,  1.41it/s]

URL filtered: https://www.youtube.com/watch?v=HtSXyGM0n-U&amp;t=20s


Processing URLs:   7%|▋         | 68/1000 [02:04<09:13,  1.68it/s]

Error extracting text from http://www.nioc.ir/Portal/home/default.aspx?categoryid=95949051-0d6f-4ca9-be99-45b894630ca5&amp;tabno=5: 403 Client Error: Forbidden for url: http://www.nioc.ir/Portal/home/default.aspx?categoryid=95949051-0d6f-4ca9-be99-45b894630ca5&amp;tabno=5
Error extracting text from http://www.wsj.com/articles/david-cameron-is-largely-at-the-mercy-of-events-in-u-k-s-eu-referendum-1455821580: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/david-cameron-is-largely-at-the-mercy-of-events-in-u-k-s-eu-referendum-1455821580
Error extracting text from https://www.reuters.com/lifestyle/sports/taiwan-hong-kong-welcome-beijing-2022-winter-games-ioc-2021-08-01/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/taiwan-hong-kong-welcome-beijing-2022-winter-games-ioc-2021-08-01/


Processing URLs:   7%|▋         | 72/1000 [02:11<19:44,  1.28s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-eu-hahn-idUSKBN17Q27G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-hahn-idUSKBN17Q27G


Processing URLs:   7%|▋         | 74/1000 [02:12<15:23,  1.00it/s]

Error extracting text from http://m.yenisafak.com/en/economy/turkeys-economy-records-over-3-percent-growth-2528892: 422 Client Error:  for url: https://www.yenisafak.com/en/economy/turkeys-economy-records-over-3-percent-growth-2528892


Processing URLs:   8%|▊         | 78/1000 [02:16<15:33,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKBN0TQ0VZ20151207: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKBN0TQ0VZ20151207


Processing URLs:   8%|▊         | 83/1000 [02:22<14:59,  1.02it/s]

Error extracting text from https://www.yahoo.com/news/un-wants-send-experts-burundi-mass-graves-probe-223656530.html?ref=gs: 404 Client Error: Not Found for url: https://www.yahoo.com/news/un-wants-send-experts-burundi-mass-graves-probe-223656530.html?ref=gs
Error extracting text from https://www.reuters.com/article/us-usa-trump-nafta-exclusive/exclusive-trump-says-terminating-nafta-would-yield-the-best-deal-in-renegotiations-idUSKBN1F703Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-nafta-exclusive/exclusive-trump-says-terminating-nafta-would-yield-the-best-deal-in-renegotiations-idUSKBN1F703Y


Processing URLs:   9%|▊         | 87/1000 [02:27<21:28,  1.41s/it]

Error extracting text from http://www.bankofengland.co.uk/publications/Pages/news/2016/015.aspx: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/publications/Pages/news/2016/015.aspx


Processing URLs:   9%|▉         | 92/1000 [02:35<19:44,  1.30s/it]

Error extracting text from https://www.nytimes.com/2017/07/19/world/middleeast/cia-arming-syrian-rebels.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/19/world/middleeast/cia-arming-syrian-rebels.html


Processing URLs:  10%|▉         | 95/1000 [02:42<26:20,  1.75s/it]

Error extracting text from https://www.un.org/en/ga/76/meetings/index.shtml: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/76/meetings/index.shtml


Processing URLs:  10%|▉         | 99/1000 [02:48<22:22,  1.49s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-iowa-presidential-republican-primary#: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-iowa-presidential-republican-primary


Processing URLs:  10%|█         | 103/1000 [03:01<43:37,  2.92s/it]

Error extracting text from http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/nato-chief-urges-montenegro-to-fight-corruption-02-09-2016


Processing URLs:  10%|█         | 104/1000 [03:03<35:28,  2.38s/it]

Error extracting text from http://www.iran-daily.com/News/175552.html?catid=3&amp;title=China-extends--1-3b-for-renovating-Abadan-refinery: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  11%|█         | 111/1000 [03:13<20:20,  1.37s/it]

Error extracting text from http://www.nytimes.com/2016/05/20/world/europe/nato-montenegro.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/20/world/europe/nato-montenegro.html


Processing URLs:  11%|█▏        | 114/1000 [03:21<28:41,  1.94s/it]

Error extracting text from http://www.tolonews.com/en/afghanistan/25612-us-skeptical-about-peace-process-with-new-taliban-leader: 404 Client Error: Not Found for url: https://tolonews.com/en/afghanistan/25612-us-skeptical-about-peace-process-with-new-taliban-leader


Processing URLs:  12%|█▏        | 116/1000 [03:25<25:18,  1.72s/it]

Error extracting text from https://www.fxtimes.com/option-banque-technical-analysis-report-20-nov-2015: HTTPSConnectionPool(host='www.fxtimes.com', port=443): Max retries exceeded with url: /option-banque-technical-analysis-report-20-nov-2015 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  12%|█▏        | 120/1000 [03:30<19:29,  1.33s/it]

Error extracting text from http://www.un.org/en/peacekeeping/operations/newoperation.shtml: 403 Client Error: Forbidden for url: https://www.un.org/en/peacekeeping/operations/newoperation.shtml


Processing URLs:  12%|█▏        | 121/1000 [03:33<30:04,  2.05s/it]

Error extracting text from http://www.wcny.org/andrew-yang-for-nyc-mayor-tbd/: 404 Client Error: Not Found for url: http://www.wcny.org/andrew-yang-for-nyc-mayor-tbd/


Processing URLs:  12%|█▎        | 125/1000 [03:38<17:47,  1.22s/it]

Error extracting text from http://news.yahoo.com/venezuelan-opposition-asks-icc-probe-maduro-crimes-against-144550533.html: 404 Client Error: Not Found for url: http://news.yahoo.com/venezuelan-opposition-asks-icc-probe-maduro-crimes-against-144550533.html


Processing URLs:  13%|█▎        | 131/1000 [03:45<15:40,  1.08s/it]

Error extracting text from http://seekingalpha.com/article/4050132-goldman-sachs-gives-tesla?page=2: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4050132-goldman-sachs-gives-tesla?page=2


Processing URLs:  14%|█▎        | 136/1000 [03:57<26:18,  1.83s/it]

Error extracting text from http://www.caam.org.cn/hangye/20160919/1305198794.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/hangye/20160919/1305198794.html


Processing URLs:  14%|█▍        | 138/1000 [04:03<33:20,  2.32s/it]

Error extracting text from http://imes.com/2015/12/05/business/energy-environment/opec-meeting-oil-production-price.html?_r=0: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  14%|█▍        | 145/1000 [04:10<11:25,  1.25it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www1.folha.uol.com.br/poder/2016/02/1743480-pmdb-volta-a-considerar-impeachment-apos-prisao-de-santana.shtml&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www1.folha.uol.com.br/poder/2016/02/1743480-pmdb-volta-a-considerar-impeachment-apos-prisao-de-santana.shtml&amp;prev=search
Error extracting text from https://www.reuters.com/world/asia-pacific/new-zealand-gives-provisional-nod-astrazenecas-covid-19-vaccine-2021-07-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/new-zealand-gives-provisional-nod-astrazenecas-covid-19-vaccine-2021-07-28/


Processing URLs:  15%|█▍        | 146/1000 [04:12<14:55,  1.05s/it]

URL filtered: https://www.weforum.org/agenda/2016/06/will-these-countries-ever-repay-their-debts-fe9ea793-8cf0-4162-9704-08a363350142?utm_content=bufferbce30&amp;utm_medium=social&amp;utm_source=facebook.com&amp;utm_campaign=buffer


Processing URLs:  15%|█▌        | 150/1000 [04:36<1:17:58,  5.50s/it]

Error extracting text from https://www.wired.com/story/russia-fancy-bear-hackers-microsoft-office-flaw-and-nyc-terrorism-fears/: 504 Server Error: Gateway Time-out for url: https://www.wired.com/story/russia-fancy-bear-hackers-microsoft-office-flaw-and-nyc-terrorism-fears/
URL filtered: http://www.bloomberg.com/news/articles/2016-03-20/brazilians-brace-for-more-drama-at-top-court-congress


Processing URLs:  15%|█▌        | 152/1000 [04:36<45:14,  3.20s/it]  

Error extracting text from https://www.wsj.com/articles/venezuelan-bonds-up-after-new-u-s-sanctions-spare-debt-trading-1503680452: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelan-bonds-up-after-new-u-s-sanctions-spare-debt-trading-1503680452


Processing URLs:  15%|█▌        | 153/1000 [04:38<41:21,  2.93s/it]

Error extracting text from https://www.reuters.com/article/us-opec-meeting/opec-russia-agree-oil-cut-extension-to-end-of-2018-idUSKBN1DU0WWhave: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-meeting/opec-russia-agree-oil-cut-extension-to-end-of-2018-idUSKBN1DU0WWhave


Processing URLs:  16%|█▌        | 157/1000 [04:42<23:06,  1.65s/it]

Error extracting text from http://news.yahoo.com/russia-deployed-28-combat-aircraft-syria-us-officials-173419582.html: 404 Client Error: Not Found for url: http://news.yahoo.com/russia-deployed-28-combat-aircraft-syria-us-officials-173419582.html


Processing URLs:  16%|█▌        | 159/1000 [04:47<27:37,  1.97s/it]

Error extracting text from http://ec.europa.eu/taxation_customs/taxation/tax_fraud_evasion/a_huge_problem/index_en.htm: 404 Client Error: Not Found for url: https://taxation-customs.ec.europa.eu/taxation/tax_fraud_evasion/a_huge_problem/index_en.htm
Error extracting text from http://www.reuters.com/article/2015/11/23/us-usa-fed-yellen-idUSKBN0TC2EB20151123: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/23/us-usa-fed-yellen-idUSKBN0TC2EB20151123


Processing URLs:  16%|█▋        | 163/1000 [04:53<27:20,  1.96s/it]

Error extracting text from https://www.reuters.com/article/maersk-iran-oil-idUSL5N1KM3TM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/maersk-iran-oil-idUSL5N1KM3TM


Processing URLs:  17%|█▋        | 172/1000 [05:17<31:35,  2.29s/it]

Error extracting text from http://reuters.com/article/uk-opec-meeting-ceiling-idUKKBN0TN1KK20151204: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-opec-meeting-ceiling-idUKKBN0TN1KK20151204


Processing URLs:  17%|█▋        | 174/1000 [05:20<23:22,  1.70s/it]

Error extracting text from https://www.nilc.org/2016/07/18/nilc-applauds-justice-department-request-rehearing-u-s-v-texas/: 403 Client Error: Forbidden for url: https://www.nilc.org/2016/07/18/nilc-applauds-justice-department-request-rehearing-u-s-v-texas/
URL filtered: http://www.bloomberg.com/news/articles/2016-08-28/rajoy-seals-pact-with-ciudadanos-ahead-of-spain-confidence-vote


Processing URLs:  18%|█▊        | 176/1000 [05:21<15:37,  1.14s/it]

Error extracting text from http://www.radio.gov.pk/16-Mar-2017/pakistan-deplores-killing-of-innocent-kashmiris-by-indian-forces-in-occupied-kashmir: 404 Client Error: Not Found for url: https://www.radio.gov.pk/16-Mar-2017/pakistan-deplores-killing-of-innocent-kashmiris-by-indian-forces-in-occupied-kashmir


Processing URLs:  19%|█▊        | 186/1000 [05:31<11:02,  1.23it/s]

URL filtered: https://twitter.com/nadams/status/1420044630360535042
Error extracting text from http://www.financialexpress.com/world-news/us-house-conservatives-say-hurry-up-on-obamacare-repeal/535648/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/world-news/us-house-conservatives-say-hurry-up-on-obamacare-repeal/535648/
Error extracting text from http://www.nytimes.com/2015/11/17/business/dealbook/the-challenges-for-volkswagens-internal-investigation.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/17/business/dealbook/the-challenges-for-volkswagens-internal-investigation.html


Processing URLs:  19%|█▉        | 189/1000 [05:48<44:40,  3.31s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-11/junk-bond-fear-gauge-nears-3-year-high-after-third-avenue-freeze


Processing URLs:  19%|█▉        | 191/1000 [05:50<31:14,  2.32s/it]

Error extracting text from http://blogs.barrons.com/techtraderdaily/2016/01/20/apple-may-miss-fyq1-and-q2-estimates-on-older-iphone-popularity-says-ubs/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/techtraderdaily/2016/01/20/apple-may-miss-fyq1-and-q2-estimates-on-older-iphone-popularity-says-ubs/


Processing URLs:  20%|█▉        | 195/1000 [05:52<16:24,  1.22s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-usa-congress/u-s-congress-covid-19-deal-negotiations-could-drag-into-weekend-lawmaker-idUSKBN28R1EL?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-usa-congress/u-s-congress-covid-19-deal-negotiations-could-drag-into-weekend-lawmaker-idUSKBN28R1EL?il=0
Error extracting text from https://www.arabnews.com/node/1814131/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1814131/middle-east


Processing URLs:  20%|██        | 201/1000 [05:55<07:16,  1.83it/s]

Error extracting text from http://www.iol.co.za/news/politics/anc-meeting-extended-zumas-fate-uncertain-2093875: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/anc-meeting-extended-zumas-fate-uncertain-2093875
URL filtered: https://twitter.com/willsommer/status/1354176025274404864
Error extracting text from http://www.worldbulletin.net/diplomacy/184148/turkey-constitutional-change-bill-sent-to-presidency: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/diplomacy/184148/turkey-constitutional-change-bill-sent-to-presidency


Processing URLs:  20%|██        | 202/1000 [05:56<08:17,  1.60it/s]

Error extracting text from https://predict.hypermind.com/hypermind/app.html?fwd=#trading: HTTPSConnectionPool(host='predict.hypermind.com', port=443): Max retries exceeded with url: /hypermind/app.html?fwd= (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  20%|██        | 205/1000 [06:06<26:46,  2.02s/it]

Error extracting text from http://uk.reuters.com/article/uk-venezuela-politics-ortega-idUKKBN19V2KS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  21%|██        | 209/1000 [06:14<20:39,  1.57s/it]

Error extracting text from http://www2.politicalbetting.com/index.php/archives/2017/02/04/theresa-may-is-still-the-only-politician-with-a-net-favourable-rating-with-the-voters-yougov-finds/: 404 Client Error: Not Found for url: http://www2.politicalbetting.com/index.php/archives/2017/02/04/theresa-may-is-still-the-only-politician-with-a-net-favourable-rating-with-the-voters-yougov-finds/


Processing URLs:  21%|██        | 212/1000 [06:17<16:37,  1.27s/it]

Error extracting text from http://www.reuters.com/article/eurozone-greece-idUSL8N14X3X220160113: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/eurozone-greece-idUSL8N14X3X220160113


Processing URLs:  21%|██▏       | 213/1000 [06:17<13:06,  1.00it/s]

Error extracting text from http://www.realclearpolitics.com/articles/2015/10/17/why_biden_may_be_democrats_best_hope_in_16_128431.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2015/10/17/why_biden_may_be_democrats_best_hope_in_16_128431.html


Processing URLs:  21%|██▏       | 214/1000 [06:18<11:14,  1.17it/s]

Error extracting text from http://www.latimes.com/business/autos/la-fi-hy-musk-mercury-news-20160516-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/autos/la-fi-hy-musk-mercury-news-20160516-snap-story.html


Processing URLs:  22%|██▏       | 216/1000 [06:29<37:02,  2.83s/it]

Error extracting text from https://christopherashleyford.medium.com/the-lab-leak-inquiry-at-the-state-department-96973cff3a65: 403 Client Error: Forbidden for url: https://christopherashleyford.medium.com/the-lab-leak-inquiry-at-the-state-department-96973cff3a65


Processing URLs:  22%|██▏       | 217/1000 [06:30<27:17,  2.09s/it]

Error extracting text from http://www.wsj.com/articles/fed-faces-a-rates-puzzle-of-its-own-making-1445986207: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-faces-a-rates-puzzle-of-its-own-making-1445986207


Processing URLs:  22%|██▏       | 222/1000 [06:38<21:03,  1.62s/it]

Error extracting text from http://www.ibtimes.co.uk/malware-used-target-us-government-military-being-sold-dark-web-1580861: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/malware-used-target-us-government-military-being-sold-dark-web-1580861


Processing URLs:  22%|██▏       | 224/1000 [06:40<16:28,  1.27s/it]

Error extracting text from http://www.reuters.com/article/us-britain-nireland-talks/northern-irelands-dup-warns-of-speedy-return-to-rule-by-london-idUSKCN1BB2S4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-nireland-talks/northern-irelands-dup-warns-of-speedy-return-to-rule-by-london-idUSKCN1BB2S4


Processing URLs:  23%|██▎       | 227/1000 [06:44<13:50,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/07/13/technology/uber-chief-executive-officer-travis-kalanick.html?ref=business: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/13/technology/uber-chief-executive-officer-travis-kalanick.html?ref=business


Processing URLs:  23%|██▎       | 228/1000 [06:44<12:14,  1.05it/s]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook


Processing URLs:  23%|██▎       | 230/1000 [06:46<11:03,  1.16it/s]

Error extracting text from https://www.irinnews.org/analysis/2016/05/24/burundi’s-peace-talks-going-nowhere: 502 Server Error: Bad Gateway for url: https://www.irinnews.org/analysis/2016/05/24/burundi%E2%80%99s-peace-talks-going-nowhere


Processing URLs:  24%|██▍       | 238/1000 [07:06<17:19,  1.36s/it]

Error extracting text from https://www.nytimes.com/2017/12/11/world/europe/theresa-may-brexit-davis-gove.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/11/world/europe/theresa-may-brexit-davis-gove.html
Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-impeachment-idUSKCN0UW1ZU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-impeachment-idUSKCN0UW1ZU


Processing URLs:  24%|██▍       | 240/1000 [07:07<11:09,  1.13it/s]

Error extracting text from http://www.reuters.com/article/us-cyber-heist-swift-symantec-idUSKCN0YH29J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-heist-swift-symantec-idUSKCN0YH29J


Processing URLs:  24%|██▍       | 241/1000 [07:08<14:34,  1.15s/it]

Error extracting text from http://www.nytimes.com/2016/10/03/world/europe/brexit-talks-march-theresa-may-britain.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/03/world/europe/brexit-talks-march-theresa-may-britain.html


Processing URLs:  24%|██▍       | 244/1000 [07:09<07:49,  1.61it/s]

Error extracting text from http://news.archcoal.com/phoenix.zhtml?c=107109&amp;p=irol-newsArticle&amp;ID=2052855: 403 Client Error: Forbidden for url: http://news.archcoal.com/phoenix.zhtml?c=107109&amp;p=irol-newsArticle&amp;ID=2052855


Processing URLs:  25%|██▍       | 246/1000 [07:11<08:55,  1.41it/s]

Error extracting text from http://english.alarabiya.net/en/News/2016/08/10/45-000-ISIS-fighters-killed-in-past-two-years-US-general.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/2016/08/10/45-000-ISIS-fighters-killed-in-past-two-years-US-general.html
Error extracting text from https://www.reuters.com/article/us-health-coronavirus-newzealand/new-zealand-to-buy-enough-pfizer-covid-19-vaccines-for-entire-population-idUSKBN2B008J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-newzealand/new-zealand-to-buy-enough-pfizer-covid-19-vaccines-for-entire-population-idUSKBN2B008J


Processing URLs:  25%|██▍       | 248/1000 [07:12<06:45,  1.85it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/336220-under-trump-cia-creates-unit-to-focus-on-iran: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/336220-under-trump-cia-creates-unit-to-focus-on-iran/


Processing URLs:  25%|██▌       | 251/1000 [07:15<09:02,  1.38it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/285202-german-intelligence-blames-russia-china-for-cyber-attacks: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/285202-german-intelligence-blames-russia-china-for-cyber-attacks/


Processing URLs:  25%|██▌       | 253/1000 [07:17<11:01,  1.13it/s]

Error extracting text from http://aranews.net/2016/09/mosul-gunmen-assassinate-islamic-state-spokesman/: 404 Client Error: Not Found for url: http://aranews.net/2016/09/mosul-gunmen-assassinate-islamic-state-spokesman/


Processing URLs:  25%|██▌       | 254/1000 [07:35<1:13:52,  5.94s/it]

Error extracting text from http://www.washingtonpost.com/news/wonkblog/wp/2015/10/28/fed-less-worried-about-risks-from-chinas-slowdown/: 404 Client Error: Not Found for url: https://www.washingtonpost.com/news/wonkblog/wp/2015/10/28/fed-less-worried-about-risks-from-chinas-slowdown/


Processing URLs:  26%|██▌       | 255/1000 [07:36<53:10,  4.28s/it]  

Error extracting text from https://www.scotsman.com/news/opinion/columnists/snp-planning-shatter-scottish-press-thousand-little-local-pieces-john-mclellan-3133584: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/opinion/columnists/snp-planning-shatter-scottish-press-thousand-little-local-pieces-john-mclellan-3133584
URL filtered: http://www.bloomberg.com/news/articles/2015-10-28/quiet-air-zone-shows-china-s-struggle-to-control-contested-seas


Processing URLs:  26%|██▌       | 257/1000 [07:37<31:51,  2.57s/it]

Error extracting text from https://theicct.org/: 403 Client Error: Forbidden for url: https://theicct.org/


Processing URLs:  26%|██▌       | 261/1000 [07:41<16:57,  1.38s/it]

Error extracting text from https://pitchbook.com/media/press-releases/total-venture-capital-dollars-invested-in-2017-on-track-to-reach-decade-high: 403 Client Error: Forbidden for url: https://pitchbook.com/media/press-releases/total-venture-capital-dollars-invested-in-2017-on-track-to-reach-decade-high


Processing URLs:  26%|██▋       | 265/1000 [07:45<11:28,  1.07it/s]

Error extracting text from https://www.wsj.com/articles/russias-navalny-fights-to-stay-in-public-eye-in-putin-standoff-11621771200: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russias-navalny-fights-to-stay-in-public-eye-in-putin-standoff-11621771200
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN15Y0AX?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKBN15Y0AX?mod=related&amp;channelName=worldNews


Processing URLs:  27%|██▋       | 267/1000 [07:49<14:07,  1.16s/it]

Error extracting text from http://38north.org/2015/09/sohae092415/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml
Error extracting text from http://www.nytimes.com/2015/12/17/business/economy/fed-interest-rates.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/17/business/economy/fed-interest-rates.html


Processing URLs:  27%|██▋       | 270/1000 [07:54<17:55,  1.47s/it]

Error extracting text from https://sfamjournals.onlinelibrary.wiley.com/doi/10.1111/1751-7915.13889: 403 Client Error: Forbidden for url: https://sfamjournals.onlinelibrary.wiley.com/doi/10.1111/1751-7915.13889


Processing URLs:  27%|██▋       | 272/1000 [07:56<15:39,  1.29s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-31/barnier-says-brexit-talks-are-far-from-sufficient-progress


Processing URLs:  28%|██▊       | 277/1000 [08:01<12:16,  1.02s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/iran-fm-calls-assads-removal-prolong-syrian-war-33579549: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/iran-fm-calls-assads-removal-prolong-syrian-war-33579549


Processing URLs:  28%|██▊       | 279/1000 [08:10<30:23,  2.53s/it]

Error extracting text from http://www.amazon.com/The-Religion-War-Scott-Adams/dp/0740747886: 500 Server Error: Internal Server Error for url: https://www.amazon.com/The-Religion-War-Scott-Adams/dp/0740747886


Processing URLs:  28%|██▊       | 284/1000 [08:22<27:33,  2.31s/it]

Error extracting text from https://www.debka.com/article/25102/US-Iranian-Russian-Iraqi-offensive-launched-to-recover-Ramadi-from-ISIS&gt: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25102/US-Iranian-Russian-Iraqi-offensive-launched-to-recover-Ramadi-from-ISIS&gt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: http://www.bloomberg.com/politics/articles/2016-01-30/des-moines-register-bloomberg-politics-iowa-poll-republicans


Processing URLs:  29%|██▊       | 287/1000 [08:44<1:06:08,  5.57s/it]

Error extracting text from https://www.washingtonpost.com/amphtml/world/europe/georgieva-un-would-be-inclusive-with-eeurope-woman-chief/2016/10/03/80e154ea-891f-11e6-8cdc-4fbb1973b506_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/amphtml/world/europe/georgieva-un-would-be-inclusive-with-eeurope-woman-chief/2016/10/03/80e154ea-891f-11e6-8cdc-4fbb1973b506_story.html


Processing URLs:  29%|██▉       | 290/1000 [08:50<36:51,  3.12s/it]  

Error extracting text from http://www.realclearpolitics.com/epolls/2012/president/ia/iowa_republican_presidential_primary-1588.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2012/president/ia/iowa_republican_presidential_primary-1588.html


Processing URLs:  29%|██▉       | 292/1000 [08:52<24:05,  2.04s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/taliban-appoints-new-military-chief-as-new-leader-settles-in/articleshow/53927329.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/taliban-appoints-new-military-chief-as-new-leader-settles-in/articleshow/53927329.cms


Processing URLs:  29%|██▉       | 293/1000 [08:54<24:19,  2.06s/it]

URL filtered: https://www.youtube.com/watch?v=r9WYlEhW5T0


Processing URLs:  30%|██▉       | 295/1000 [08:57<22:12,  1.89s/it]

URL filtered: https://www.twitter.com/feinsand


Processing URLs:  30%|██▉       | 299/1000 [09:06<22:30,  1.93s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-barnier-idUSKBN17Z0YB?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-barnier-idUSKBN17Z0YB?il=0


Processing URLs:  30%|███       | 302/1000 [09:13<24:19,  2.09s/it]

Error extracting text from http://www.humanosphere.org/global-health/2016/04/the-end-of-polio-is-nearish/: 404 Client Error: Not Found for url: http://www.humanosphere.org/global-health/2016/04/the-end-of-polio-is-nearish/


Processing URLs:  30%|███       | 304/1000 [09:16<21:44,  1.87s/it]

Error extracting text from http://www.reuters.com/article/us-usa-russia-cyber-ryan-idUSKBN14I1WJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-cyber-ryan-idUSKBN14I1WJ


Processing URLs:  31%|███       | 306/1000 [09:17<14:15,  1.23s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/chinas-great-south-china-sea-challenge-what-next-how-respond-14317?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/chinas-great-south-china-sea-challenge-what-next-how-respond-14317?page=2


Processing URLs:  31%|███       | 307/1000 [09:21<21:00,  1.82s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-22/why-trump-s-immigration-crackdown-could-sink-u-s-home-prices


Processing URLs:  31%|███       | 310/1000 [09:22<12:16,  1.07s/it]

Error extracting text from https://www.gorillaradio.tv/store/: HTTPSConnectionPool(host='www.gorillaradio.tv', port=443): Max retries exceeded with url: /store/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3029d6f00>: Failed to resolve 'www.gorillaradio.tv' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: http://projects.fivethirtyeight.com/2016-election-forecast/senate/north-carolina/?ex_cid=story-twitter


Processing URLs:  31%|███▏      | 313/1000 [09:23<08:08,  1.41it/s]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-iraq-mosul-idUKKBN1451DN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  32%|███▏      | 319/1000 [09:32<11:41,  1.03s/it]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-venezuela-idUSKBN2AY0IC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-venezuela-idUSKBN2AY0IC


Processing URLs:  32%|███▏      | 321/1000 [09:34<11:07,  1.02it/s]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-03/volkswagen-emissions-woes-deepen-as-800-000-more-cars-affected


Processing URLs:  32%|███▏      | 323/1000 [09:34<07:34,  1.49it/s]

Error extracting text from http://www.goldblattlawfirm.com/: HTTPConnectionPool(host='www.goldblattlawfirm.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff1ce7e0>: Failed to resolve 'www.goldblattlawfirm.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  33%|███▎      | 327/1000 [10:16<49:06,  4.38s/it]  

Error extracting text from http://www.nytimes.com/2016/04/25/us/politics/us-directs-cyberweapons-at-isis-for-first-time.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/25/us/politics/us-directs-cyberweapons-at-isis-for-first-time.html?_r=0


Processing URLs:  33%|███▎      | 329/1000 [10:20<33:03,  2.96s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/02/23/asia-pacific/s-korea-u-s-delay-talks-on-thaad-missile-shield-amid-talks-with-china/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/02/23/asia-pacific/s-korea-u-s-delay-talks-on-thaad-missile-shield-amid-talks-with-china/


Processing URLs:  33%|███▎      | 330/1000 [10:22<30:37,  2.74s/it]

Error extracting text from http://theworldweekly.com/newswire/reader/russia-seizes-the-initiative-on-syria-expecting-peace-talks-in-october-as-world-leaders-gather-at-the-un-general-assembly/4993: HTTPConnectionPool(host='theworldweekly.com', port=80): Max retries exceeded with url: /newswire/reader/russia-seizes-the-initiative-on-syria-expecting-peace-talks-in-october-as-world-leaders-gather-at-the-un-general-assembly/4993 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f2c30>: Failed to resolve 'theworldweekly.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  33%|███▎      | 331/1000 [10:23<24:27,  2.19s/it]

Error extracting text from http://www.un.org/en/sc/presidency/: 403 Client Error: Forbidden for url: https://www.un.org/en/sc/presidency/


Processing URLs:  33%|███▎      | 332/1000 [10:24<20:46,  1.87s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/52598960.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/52598960.cms


Processing URLs:  33%|███▎      | 333/1000 [10:26<19:54,  1.79s/it]

URL filtered: https://twitter.com/Dest_Pyongyang/status/696148328027856896/photo/1


Processing URLs:  34%|███▍      | 340/1000 [10:35<16:46,  1.52s/it]

Error extracting text from http://www.caissa.com/support/chess-ratings.php: 404 Client Error: Not Found for url: https://caissa.com/support/chess-ratings.php


Processing URLs:  34%|███▍      | 342/1000 [10:38<17:00,  1.55s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-11/with-lula-under-threat-of-arrest-pressure-builds-on-rousseff


Processing URLs:  34%|███▍      | 345/1000 [10:40<10:18,  1.06it/s]

Error extracting text from https://www.reuters.com/world/europe/russian-military-convoy-north-kyiv-stretches-40-miles-maxar-2022-03-01/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/russian-military-convoy-north-kyiv-stretches-40-miles-maxar-2022-03-01/


Processing URLs:  35%|███▍      | 347/1000 [10:41<08:18,  1.31it/s]

Error extracting text from http://onlinelibrary.wiley.com/doi/10.1002/2017GL074934/abstract: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/10.1002/2017GL074934/abstract
Error extracting text from https://www.latimes.com/california/story/2020-11-10/coronavirus-san-francisco-shutdown-indoor-dining: 403 Client Error: Forbidden for url: https://www.latimes.com/california/story/2020-11-10/coronavirus-san-francisco-shutdown-indoor-dining


Processing URLs:  35%|███▍      | 349/1000 [10:42<05:49,  1.86it/s]

Error extracting text from https://www.wsj.com/amp/articles/china-buys-more-iranian-and-venezuelan-oil-in-a-test-for-biden-11616146203: 403 Client Error: Forbidden for url: https://www.wsj.com/amp/articles/china-buys-more-iranian-and-venezuelan-oil-in-a-test-for-biden-11616146203


Processing URLs:  35%|███▌      | 350/1000 [10:42<05:07,  2.12it/s]

Error extracting text from http://diginomica.com/2016/02/29/layering-on-the-piggy-lipstick-europe-and-the-us-apply-make-up-to-the-privacy-shield/: 403 Client Error: Forbidden for url: https://diginomica.com/2016/02/29/layering-on-the-piggy-lipstick-europe-and-the-us-apply-make-up-to-the-privacy-shield/


Processing URLs:  35%|███▌      | 352/1000 [10:44<08:38,  1.25it/s]

Error extracting text from https://www.omm.com/files/upload/AlmelingGonzagaLawReviewArticle.pdf: 404 Client Error: Not Found for url: https://www.omm.com/files/upload/AlmelingGonzagaLawReviewArticle.pdf


Processing URLs:  35%|███▌      | 353/1000 [10:46<11:31,  1.07s/it]

Error extracting text from http://www.chicagotribune.com/sns-wp-japan-cyberattack-49befc78-8fce-11e6-a6a3-d50061aa9fae-20161011-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/sns-wp-japan-cyberattack-49befc78-8fce-11e6-a6a3-d50061aa9fae-20161011-story.html
URL filtered: https://www.bloomberg.com/politics/articles/2017-03-01/republicans-hide-latest-obamacare-draft-under-shroud-of-secrecy


Processing URLs:  36%|███▌      | 356/1000 [10:46<05:54,  1.82it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://bncamazonas.com.br/2016/03/12/omar-e-braga-citados-pela-andrade-gutierrez-na-delacao-a-lava-jato/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://bncamazonas.com.br/2016/03/12/omar-e-braga-citados-pela-andrade-gutierrez-na-delacao-a-lava-jato/&amp;prev=search
Error extracting text from http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347868A1.pdf: 403 Client Error: Forbidden for url: http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347868A1.pdf


Processing URLs:  36%|███▌      | 359/1000 [10:50<09:45,  1.09it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/352732-sessions-media-portrays-russia-investigation-in-ways-not-justifiable: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/352732-sessions-media-portrays-russia-investigation-in-ways-not-justifiable/


Processing URLs:  36%|███▌      | 361/1000 [18:52<24:39:34, 138.93s/it]

Error extracting text from https://www.thespainreport.com/articles/852-160818122602-pablo-iglesias-reappears-to-suggest-left-wing-podemos-psoe-coalition-possible-if-rajoy-fails: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/852-160818122602-pablo-iglesias-reappears-to-suggest-left-wing-podemos-psoe-coalition-possible-if-rajoy-fails (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x301660ec0>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  36%|███▌      | 362/1000 [18:56<17:39:23, 99.63s/it] 

Error extracting text from http://micanaldepanama.com/expansion/: 403 Client Error: Forbidden for url: https://pancanal.com/expansion/


Processing URLs:  37%|███▋      | 367/1000 [19:13<3:41:33, 21.00s/it] 

Error extracting text from http://southasianvoices.org/irans-tryst-with-the-modified-code-3-1/: 403 Client Error: Forbidden for url: http://southasianvoices.org/irans-tryst-with-the-modified-code-3-1/


Processing URLs:  37%|███▋      | 370/1000 [19:24<1:47:05, 10.20s/it]

Error extracting text from http://www.iranwatch.org/our-publications/nuclear-iran-weekly/ceo-us-metallurgical-company-charged-illicit-export-metallic-powder-iran: 403 Client Error: Forbidden for url: https://www.iranwatch.org/our-publications/nuclear-iran-weekly/ceo-us-metallurgical-company-charged-illicit-export-metallic-powder-iran


Processing URLs:  38%|███▊      | 377/1000 [19:34<18:51,  1.82s/it]  

Error extracting text from https://www.uscis.gov/ilink/docView/SLB/HTML/SLB/0-0-0-1/0-0-0-29/0-0-0-4391.html: 403 Client Error: Forbidden for url: https://www.uscis.gov/ilink/docView/SLB/HTML/SLB/0-0-0-1/0-0-0-29/0-0-0-4391.html


Processing URLs:  38%|███▊      | 378/1000 [19:35<14:30,  1.40s/it]

Error extracting text from https://www.thelocal.it/20170925/new-five-star-movement-leader-says-the-party-wants-italy-to-stay-in-eu: 403 Client Error: Forbidden for url: https://www.thelocal.it/20170925/new-five-star-movement-leader-says-the-party-wants-italy-to-stay-in-eu


Processing URLs:  38%|███▊      | 382/1000 [19:40<11:47,  1.15s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN15M05E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN15M05E
Error extracting text from https://www.reuters.com/article/us-cyber-northkorea-exclusive/exclusive-north-koreas-unit-180-the-cyber-warfare-cell-that-worries-the-west-idUSKCN18H020: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-northkorea-exclusive/exclusive-north-koreas-unit-180-the-cyber-warfare-cell-that-worries-the-west-idUSKCN18H020


Processing URLs:  39%|███▉      | 390/1000 [19:51<13:45,  1.35s/it]

Error extracting text from http://seekingalpha.com/author/logical-thought/comments: 403 Client Error: Forbidden for url: https://seekingalpha.com/author/logical-thought/comments


Processing URLs:  39%|███▉      | 392/1000 [19:53<11:27,  1.13s/it]

Error extracting text from https://www.reuters.com/article/us-ireland-politics/ireland-on-the-verge-of-snap-election-as-crisis-deepens-idUSKBN1DR15P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ireland-politics/ireland-on-the-verge-of-snap-election-as-crisis-deepens-idUSKBN1DR15P
URL filtered: https://www.bloomberg.com/news/articles/2017-11-13/agreeing-on-china-s-favorite-trade-deal-set-to-drag-into-2018


Processing URLs:  40%|███▉      | 395/1000 [19:56<09:43,  1.04it/s]

Error extracting text from https://blogs.intralinks.com/2017/03/asia-pacific-early-stage-ma-booming/#: HTTPSConnectionPool(host='blogs.intralinks.com', port=443): Max retries exceeded with url: /2017/03/asia-pacific-early-stage-ma-booming/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'blogs.intralinks.com'. (_ssl.c:1000)")))


Processing URLs:  40%|███▉      | 399/1000 [20:03<16:15,  1.62s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-01/30/c_135058619.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-01/30/c_135058619.htm


Processing URLs:  40%|████      | 401/1000 [20:05<12:54,  1.29s/it]

Error extracting text from https://blogs.nasa.gov/commercialcrew/2016/12/12/nasas-commercial-crew-program-target-flight-dates/: 404 Client Error: Not Found for url: https://blogs.nasa.gov/commercialcrew/2016/12/12/nasas-commercial-crew-program-target-flight-dates/


Processing URLs:  40%|████      | 405/1000 [20:06<05:37,  1.76it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-pmdb-idUSKCN0WE0UC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-pmdb-idUSKCN0WE0UC
Error extracting text from https://trends.google.com/trends/explore?geo=GB&amp;q=vote%20of%20no%20confidence: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?geo=GB&amp;q=vote%20of%20no%20confidence


Processing URLs:  41%|████      | 406/1000 [20:09<11:42,  1.18s/it]

URL filtered: https://www.cbsnews.com/news/facebook-gives-mueller-detailed-records-about-russian-ad-buys-report/


Processing URLs:  41%|████      | 410/1000 [20:12<10:27,  1.06s/it]

URL filtered: https://twitter.com/RussianEmbassy/status/814564127230271489


Processing URLs:  41%|████▏     | 414/1000 [20:17<12:50,  1.32s/it]

Error extracting text from http://dels.nas.edu/Upcoming-Event/Potential-Risks-Benefits-Gain/AUTO-9-61-70-Q: 404 Client Error: Not Found for url: https://www.nationalacademies.org/Upcoming-Event/Potential-Risks-Benefits-Gain/AUTO-9-61-70-Q
URL filtered: https://www.instagram.com/p/BfdPC31nogE


Processing URLs:  42%|████▏     | 419/1000 [20:27<16:00,  1.65s/it]

Error extracting text from http://wyss.harvard.edu/staticfiles/newsroom/pressreleases/Gene%20drives%20FAQ%20FINAL.pdf: 404 Client Error: Not Found for url: https://wyss.harvard.edu/staticfiles/newsroom/pressreleases/Gene%20drives%20FAQ%20FINAL.pdf


Processing URLs:  42%|████▏     | 421/1000 [20:29<13:56,  1.44s/it]

Error extracting text from http://www.cnbc.com/2015/12/03/brazils-congress-opens-impeachment-proceedings-against-president.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/12/03/brazils-congress-opens-impeachment-proceedings-against-president.html


Processing URLs:  42%|████▏     | 422/1000 [20:29<10:52,  1.13s/it]

Error extracting text from http://www.nytimes.com/2015/11/04/world/asia/china-wants-no-mention-of-south-sea-in-statement.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/04/world/asia/china-wants-no-mention-of-south-sea-in-statement.html


Processing URLs:  42%|████▎     | 425/1000 [20:34<15:50,  1.65s/it]

Error extracting text from http://www.europeaninstitute.org/index.php/ei-blog/251-european-affairs/ea-april-2015/2023-nato-enlargement-the-case-of-montenegro: HTTPSConnectionPool(host='www.europeaninstitute.org', port=443): Max retries exceeded with url: /index.php/ei-blog/251-european-affairs/ea-april-2015/2023-nato-enlargement-the-case-of-montenegro (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  43%|████▎     | 427/1000 [20:36<11:32,  1.21s/it]

Error extracting text from http://news.yahoo.com/jungle-camps-colombia-rebels-peace-lessons-043124176.html: 404 Client Error: Not Found for url: http://news.yahoo.com/jungle-camps-colombia-rebels-peace-lessons-043124176.html


Processing URLs:  43%|████▎     | 429/1000 [20:38<09:57,  1.05s/it]

Error extracting text from http://sos.nh.gov/2016ElecInfo.aspx: 404 Client Error: Not Found for url: https://sos.nh.gov/2016ElecInfo.aspx
Error extracting text from http://www.nytimes.com/2016/03/07/us/politics/donald-trump.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/07/us/politics/donald-trump.html


Processing URLs:  43%|████▎     | 431/1000 [20:40<10:23,  1.10s/it]

Error extracting text from http://www.ibtimes.co.uk/game-thrones-season-6-premiere-delayed-hbo-may-have-postponed-release-date-1526589: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/game-thrones-season-6-premiere-delayed-hbo-may-have-postponed-release-date-1526589


Processing URLs:  43%|████▎     | 432/1000 [20:40<07:56,  1.19it/s]

Error extracting text from http://www.who.int/influenza/preparedness/pandemic/h5n1phase/en/: 404 Client Error: Not Found for url: https://www.who.int/influenza/preparedness/pandemic/h5n1phase/en/


Processing URLs:  44%|████▎     | 436/1000 [20:44<07:11,  1.31it/s]

Error extracting text from http://www.wsj.com/articles/brazil-vice-president-sends-letter-criticizing-president-dilma-rousseff-1449576925: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-vice-president-sends-letter-criticizing-president-dilma-rousseff-1449576925


Processing URLs:  44%|████▍     | 438/1000 [20:45<06:16,  1.49it/s]

Error extracting text from https://www.scientificamerican.com/article/wolf-populations-drop-as-more-states-allow-hunting/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/wolf-populations-drop-as-more-states-allow-hunting/


Processing URLs:  44%|████▍     | 442/1000 [20:48<04:28,  2.08it/s]

Error extracting text from http://www.reuters.com/article/us-taiwan-southchinasea-idUSKCN0V502V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-taiwan-southchinasea-idUSKCN0V502V
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://blogdojoelrei.blogspot.com/2016/03/discursos-pro-inpeachment-predominam-em.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://blogdojoelrei.blogspot.com/2016/03/discursos-pro-inpeachment-predominam-em.html&amp;prev=search


Processing URLs:  44%|████▍     | 443/1000 [20:48<03:30,  2.65it/s]

Error extracting text from http://www.straitstimes.com/asia/east-asia/hong-kongs-leung-chun-ying-gets-re-election-boost: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  45%|████▍     | 446/1000 [20:51<05:51,  1.58it/s]

Error extracting text from http://larepublica.pe/politica/766866-datum-keiko-fujimori-y-ppk-empatados-con-50: 403 Client Error: Forbidden for url: https://larepublica.pe/politica/766866-datum-keiko-fujimori-y-ppk-empatados-con-50
Error extracting text from http://www.reuters.com/article/us-spacex-blast-idUSKCN11J2OJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spacex-blast-idUSKCN11J2OJ


Processing URLs:  45%|████▍     | 447/1000 [20:52<05:46,  1.59it/s]

Error extracting text from http://nyti.ms/2ifwiqI: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/14/us/politics/tax-plan-senate-obamacare-individual-mandate-trump.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news


Processing URLs:  45%|████▌     | 450/1000 [20:54<07:21,  1.24it/s]

Error extracting text from https://www.transparency.org/country/#ALB: 404 Client Error: Not Found for url: https://www.transparency.org/en/country/#ALB


Processing URLs:  45%|████▌     | 451/1000 [20:57<10:45,  1.18s/it]

Error extracting text from https://www.reuters.com/article/us-southsudan-security-un-idUSKBN18K2RH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southsudan-security-un-idUSKBN18K2RH


Processing URLs:  45%|████▌     | 453/1000 [20:59<10:43,  1.18s/it]

Error extracting text from http://www.international.gc.ca/sanctions/countries-pays/index.aspx?lang=eng: 404 Client Error: Not Found for url: https://www.international.gc.ca/sanctions/countries-pays/index.aspx?lang=eng


Processing URLs:  46%|████▌     | 456/1000 [21:02<09:23,  1.04s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-16/u-k-prime-minister-may-rules-out-new-scottish-referendum
Error extracting text from https://www.reuters.com/world/market-chinas-wuhan-likely-origin-covid-19-outbreak-study-2021-11-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/market-chinas-wuhan-likely-origin-covid-19-outbreak-study-2021-11-19/


Processing URLs:  46%|████▌     | 461/1000 [21:27<37:49,  4.21s/it]

Error extracting text from https://www.nytimes.com/2017/02/02/us/politics/trump-tax-imports.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/02/us/politics/trump-tax-imports.html?_r=0


Processing URLs:  47%|████▋     | 468/1000 [21:44<22:27,  2.53s/it]

Error extracting text from http://www.nytimes.com/2015/12/07/business/sarcasm-and-doubt-precede-vws-update-on-cheating-inquiry.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/07/business/sarcasm-and-doubt-precede-vws-update-on-cheating-inquiry.html


Processing URLs:  47%|████▋     | 469/1000 [21:46<20:44,  2.34s/it]

Error extracting text from http://www.ibtimes.com/tesla-motors-tsla-1q-2016-sales-14820-model-s-model-x-cars-were-delivered-first-three-2348000: 403 Client Error: Forbidden for url: https://www.ibtimes.com/tesla-motors-tsla-1q-2016-sales-14820-model-s-model-x-cars-were-delivered-first-three-2348000


Processing URLs:  48%|████▊     | 479/1000 [22:09<11:11,  1.29s/it]

Error extracting text from http://www.sowetanlive.co.za/news/2017/06/19/will-anyone-be-prosecuted-for-state-capture-will-you-resign---mps-questions-to-president-zuma: 404 Client Error: Not Found for url: https://www.sowetanlive.co.za/news/2017/06/19/will-anyone-be-prosecuted-for-state-capture-will-you-resign---mps-questions-to-president-zuma
Error extracting text from http://www.nytimes.com/2016/04/23/world/asia/china-nuclear-power-south-china-sea.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/23/world/asia/china-nuclear-power-south-china-sea.html?_r=0


Processing URLs:  48%|████▊     | 481/1000 [22:12<11:00,  1.27s/it]

Error extracting text from https://www.amazon.com/Critical-Thinking-Strategic-Intelligence-Second/dp/1506316883/ref=pd_bxgy_14_img_3?_encoding=UTF8&amp;pd_rd_i=1506316883&amp;pd_rd_r=2HN236HQAW6YECSP3XC8&amp;pd_rd_w=xBboz&amp;pd_rd_wg=meJXH&amp;psc=1&amp;refRID=2HN236HQAW6YECSP3XC8: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Critical-Thinking-Strategic-Intelligence-Second/dp/1506316883/ref=pd_bxgy_14_img_3?_encoding=UTF8&amp;pd_rd_i=1506316883&amp;pd_rd_r=2HN236HQAW6YECSP3XC8&amp;pd_rd_w=xBboz&amp;pd_rd_wg=meJXH&amp;psc=1&amp;refRID=2HN236HQAW6YECSP3XC8


Processing URLs:  48%|████▊     | 482/1000 [22:13<10:36,  1.23s/it]

Error extracting text from http://www.gmfus.org/blog/2017/06/02/abandoning-liberal-international-order-spheres-influence: 404 Client Error: Not Found for url: https://www.gmfus.org/blog/2017/06/02/abandoning-liberal-international-order-spheres-influence


Processing URLs:  48%|████▊     | 484/1000 [22:15<08:37,  1.00s/it]

Error extracting text from https://www.scotsman.com/news/politics/scottish-elections-2021-snp-ahead-for-may-vote-but-new-poll-is-second-to-show-drop-in-support-for-independence-3146695: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-elections-2021-snp-ahead-for-may-vote-but-new-poll-is-second-to-show-drop-in-support-for-independence-3146695


Processing URLs:  49%|████▊     | 486/1000 [22:17<10:29,  1.22s/it]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crsisis-syria-opposition-idUKKCN0UQ1WX20160112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from https://www.reuters.com/article/us-pakistan-usa/u-s-defense-chief-urges-pakistan-to-redouble-efforts-against-militants-idUSKBN1DY0O7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-pakistan-usa/u-s-defense-chief-urges-pakistan-to-redouble-efforts-against-militants-idUSKBN1DY0O7


Processing URLs:  49%|████▉     | 492/1000 [22:30<21:25,  2.53s/it]

Error extracting text from http://theiranproject.com/blog/2015/02/28/iran-oil-sector-needs-30b-a-year/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=iran-oil-sector-needs-30b-a-year
URL filtered: https://www.google.com/amp/s/www.bloomberg.com/amp/politics/articles/2017-04-26/state-department-memo-boosts-case-to-stay-in-paris-climate-pact


Processing URLs:  50%|████▉     | 495/1000 [22:31<11:26,  1.36s/it]

Error extracting text from http://www.conflict-news.com/articles/the-us-slides-deeper-into-the-syrian-war: HTTPSConnectionPool(host='www.conflict-news.com', port=443): Max retries exceeded with url: /articles/the-us-slides-deeper-into-the-syrian-war (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.conflict-news.com'. (_ssl.c:1000)")))


Processing URLs:  50%|████▉     | 498/1000 [22:45<32:49,  3.92s/it]

Error extracting text from https://www.reuters.com/article/us-italy-government/italy-set-for-early-election-berlusconi-ahead-idUSL0554842920080205: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-government/italy-set-for-early-election-berlusconi-ahead-idUSL0554842920080205


Processing URLs:  50%|█████     | 500/1000 [22:46<21:12,  2.55s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-20/iran-s-rouhani-doesn-t-rule-out-prisoner-swap-for-u-s-reporter


Processing URLs:  50%|█████     | 502/1000 [23:47<1:49:29, 13.19s/it]

Error extracting text from http://www.usnews.com/news/business/articles/2015/11/30/congress-returns-to-looming-deadlines-on-budget-highways: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  50%|█████     | 505/1000 [23:48<51:22,  6.23s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-gharawi-special-report-idUSKCN0I30Z820141014: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-gharawi-special-report-idUSKCN0I30Z820141014


Processing URLs:  51%|█████     | 506/1000 [23:51<44:08,  5.36s/it]

Error extracting text from http://www.lmtonline.com/insider/article/Unemployment-rises-in-Laredo-16373858.php&amp;ved=2ahUKEwiDtL-8oqnyAhUuSPEDHfHfBqQQxfQBMAN6BAgJEAM&amp;usg=AOvVaw3-O-T7SOL7nGDFeE-mZu2n: 404 Client Error: Not Found for url: http://www.lmtonline.com/insider/article/Unemployment-rises-in-Laredo-16373858.php&amp;ved=2ahUKEwiDtL-8oqnyAhUuSPEDHfHfBqQQxfQBMAN6BAgJEAM&amp;usg=AOvVaw3-O-T7SOL7nGDFeE-mZu2n/


Processing URLs:  51%|█████     | 509/1000 [23:54<20:10,  2.46s/it]

Error extracting text from https://www.nytimes.com/2017/09/02/world/americas/venezuela-nicholas-maduro-inflation-economic-collapse.html?mcubz=3: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/02/world/americas/venezuela-nicholas-maduro-inflation-economic-collapse.html?mcubz=3


Processing URLs:  51%|█████     | 510/1000 [24:12<57:39,  7.06s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKBN16V04A: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKBN16V04A


Processing URLs:  51%|█████▏    | 514/1000 [24:15<22:57,  2.84s/it]

Error extracting text from http://www.thepeninsulaqatar.com/news/middle-east/352563/assad-says-only-syrian-people-can-decide-if-he-quits: 404 Client Error: Not Found for url: https://thepeninsulaqatar.com/news/middle-east/352563/assad-says-only-syrian-people-can-decide-if-he-quits


Processing URLs:  52%|█████▏    | 516/1000 [24:18<16:11,  2.01s/it]

Error extracting text from http://www.nytimes.com/2016/05/15/world/americas/brazils-most-entertaining-show-may-be-congress.html?emc=edit_th_20160515&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/15/world/americas/brazils-most-entertaining-show-may-be-congress.html?emc=edit_th_20160515&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  52%|█████▏    | 519/1000 [24:19<08:01,  1.00s/it]

Error extracting text from http://thehill.com/business-a-lobbying/259209-after-bipartisan-budget-deal-congress-plunges-right-back-into-shutdown: 403 Client Error: Forbidden for url: https://thehill.com/business-a-lobbying/259209-after-bipartisan-budget-deal-congress-plunges-right-back-into-shutdown/
Error extracting text from http://www.reuters.com/article/2015/10/02/us-mideast-crisis-russia-syria-idUSKCN0RV41O20151002: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/02/us-mideast-crisis-russia-syria-idUSKCN0RV41O20151002


Processing URLs:  52%|█████▏    | 522/1000 [24:21<05:45,  1.38it/s]

Error extracting text from http://english.alarabiya.net/en/business/energy/2016/06/03/Saudi-Aramco-keen-to-invest-in-global-upstream-after-IPO.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/energy/2016/06/03/Saudi-Aramco-keen-to-invest-in-global-upstream-after-IPO.html


Processing URLs:  52%|█████▎    | 525/1000 [24:30<18:01,  2.28s/it]

Error extracting text from https://www.aa.com.tr/en/pg/photo-gallery/israeli-police-intervene-worshippers-at-al-aqsa-mosque/444: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  53%|█████▎    | 532/1000 [24:39<14:46,  1.90s/it]

Error extracting text from http://aaenergyterminal.com/news.php?newsid=7490946: 404 Client Error: Not Found for url: http://aaenergyterminal.com/news.php?newsid=7490946


Processing URLs:  53%|█████▎    | 533/1000 [24:40<11:55,  1.53s/it]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2400720&amp;CategoryId=10717: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2400720&amp;CategoryId=10717


Processing URLs:  53%|█████▎    | 534/1000 [24:41<10:42,  1.38s/it]

Error extracting text from https://www.bbc.com/news/science-environment-59388109): 404 Client Error: Not Found for url: https://www.bbc.com/news/science-environment-59388109)


Processing URLs:  54%|█████▎    | 536/1000 [24:44<11:28,  1.48s/it]

URL filtered: https://www.joe.co.uk/news/resignborisjohnson-trends-on-twitter-after-he-was-found-to-have-misled-parliament-265654


Processing URLs:  54%|█████▍    | 538/1000 [24:44<06:40,  1.15it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://g1.globo.com/politica/operacao-lava-jato/noticia/2016/03/oposicao-quer-incluir-delacao-de-delcidio-em-pedido-de-impeachment.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://g1.globo.com/politica/operacao-lava-jato/noticia/2016/03/oposicao-quer-incluir-delacao-de-delcidio-em-pedido-de-impeachment.html&amp;prev=search


Processing URLs:  54%|█████▍    | 542/1000 [24:49<08:07,  1.06s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-refugees-coalition-idUSKBN16Q0JW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-refugees-coalition-idUSKBN16Q0JW


Processing URLs:  54%|█████▍    | 543/1000 [24:50<06:40,  1.14it/s]

Error extracting text from https://www.jstor.org/stable/2151882?seq=1#page_scan_tab_contents: 420 Client Error: Enhance Your Calm for url: https://www.jstor.org/stable/2151882?seq=1#page_scan_tab_contents


Processing URLs:  54%|█████▍    | 544/1000 [24:53<12:19,  1.62s/it]

Error extracting text from http://www.ipsos.pe/sites/default/files/opinion_data/Opinion%20Data%20Mayo%20IV%202016.pdf: 404 Client Error: Not Found for url: https://www.ipsos.com/es-pe/sites/default/files/opinion_data/Opinion%20Data%20Mayo%20IV%202016.pdf


Processing URLs:  55%|█████▍    | 548/1000 [25:00<09:37,  1.28s/it]

Error extracting text from http://www.nytimes.com/2016/09/20/technology/self-driving-cars-guidelines.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/20/technology/self-driving-cars-guidelines.html?_r=0


Processing URLs:  55%|█████▌    | 554/1000 [25:10<10:15,  1.38s/it]

Error extracting text from http://www.nytimes.com/2016/07/11/world/asia/south-china-sea-philippines-hague.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/11/world/asia/south-china-sea-philippines-hague.html?_r=0


Processing URLs:  56%|█████▌    | 559/1000 [25:17<10:03,  1.37s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN11P29E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN11P29E


Processing URLs:  56%|█████▌    | 560/1000 [25:37<50:45,  6.92s/it]

URL filtered: https://variety.com/2022/digital/news/facebook-meta-q4-2021-earnings-1235170176/).
Error extracting text from http://www.handelsblatt.com/my/politik/deutschland/hohe-budgets-fuer-die-spd-so-teuer-musste-sich-merkel-ihre-kanzlerschaft-erkaufen/20941140.html: 403 Client Error: Forbidden for url: http://www.handelsblatt.com/my/politik/deutschland/hohe-budgets-fuer-die-spd-so-teuer-musste-sich-merkel-ihre-kanzlerschaft-erkaufen/20941140.html


Processing URLs:  56%|█████▋    | 563/1000 [25:37<22:32,  3.10s/it]

Error extracting text from https://www.nytimes.com/2019/03/26/world/africa/algeria-army-president-bouteflika.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2019/03/26/world/africa/algeria-army-president-bouteflika.html


Processing URLs:  56%|█████▋    | 564/1000 [25:37<18:17,  2.52s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-whitehouse-idUSKBN17L29X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-whitehouse-idUSKBN17L29X
URL filtered: http://www.bloomberg.com/news/articles/2016-09-21/pdvsa-bonds-gain-to-two-year-high-as-traders-see-desire-to-pay


Processing URLs:  57%|█████▋    | 567/1000 [25:38<09:59,  1.38s/it]

Error extracting text from http://newsinfo.inquirer.net/746934/kerry-to-head-to-moscow-for-talks-on-syrian-peace-plan: 403 Client Error: Forbidden for url: https://newsinfo.inquirer.net/746934/kerry-to-head-to-moscow-for-talks-on-syrian-peace-plan


Processing URLs:  57%|█████▋    | 569/1000 [25:40<07:57,  1.11s/it]

Error extracting text from https://blogs.scientificamerican.com/observations/five-sigmawhats-that/: 403 Client Error: Forbidden for url: https://blogs.scientificamerican.com/observations/five-sigmawhats-that/


Processing URLs:  57%|█████▋    | 571/1000 [25:44<09:35,  1.34s/it]

Error extracting text from http://www.reuters.com/article/us-thailand-king-constitution-idUSKBN15115D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-king-constitution-idUSKBN15115D


Processing URLs:  57%|█████▊    | 575/1000 [25:47<06:03,  1.17it/s]

Error extracting text from https://www.macrotrends.net/2534/wheat-prices-historical-chart-data: 403 Client Error: Forbidden for url: https://www.macrotrends.net/2534/wheat-prices-historical-chart-data
Error extracting text from http://www.reuters.com/article/venezuela-debtrenegotiation-china-idUSL2N15H0YZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-debtrenegotiation-china-idUSL2N15H0YZ


Processing URLs:  58%|█████▊    | 579/1000 [25:55<10:33,  1.51s/it]

Error extracting text from http://www.wsj.com/articles/nuclear-deal-fuels-irans-hard-liners-1452294637: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nuclear-deal-fuels-irans-hard-liners-1452294637


Processing URLs:  58%|█████▊    | 581/1000 [25:56<07:43,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-china-idUSKBN1AJ04X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-china-idUSKBN1AJ04X


Processing URLs:  58%|█████▊    | 584/1000 [26:01<11:04,  1.60s/it]

Error extracting text from http://www.iol.co.za/news/politics/futuregrowth-decision-a-vote-of-no-confidence-in-zuma-2063524: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/futuregrowth-decision-a-vote-of-no-confidence-in-zuma-2063524


Processing URLs:  59%|█████▉    | 588/1000 [26:03<05:48,  1.18it/s]

Error extracting text from http://www.nato.int/cps/en/natohq/news_125149.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_125149.htm
URL filtered: https://www.bloomberg.com/news/articles/2021-09-24/brazil-politics-bolsonaro-says-he-won-t-doubt-voting-system


Processing URLs:  59%|█████▉    | 592/1000 [26:11<10:32,  1.55s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-16/u-k-police-lawmaker-jo-cox-has-died-after-attack


Processing URLs:  60%|█████▉    | 596/1000 [26:16<08:55,  1.32s/it]

Error extracting text from https://www.fao.org/worldfoodsituation/foodpricesindex/en/).: 404 Client Error: Not Found for url: https://www.fao.org/worldfoodsituation/foodpricesindex/en/).
URL filtered: http://www.bloomberg.com/news/articles/2016-10-18/venezuela-extends-bond-swap-deadline-for-third-time-to-oct-21


Processing URLs:  60%|██████    | 600/1000 [26:30<14:30,  2.18s/it]

Error extracting text from http://www.scottaaronson.com/blog/?p=2725: 406 Client Error: Not Acceptable for url: http://www.scottaaronson.com/blog/?p=2725


Processing URLs:  60%|██████    | 605/1000 [26:34<08:21,  1.27s/it]

Error extracting text from http://thehill.com/blogs/congress-blog/politics/341555-venezuela-russia-deal-threatens-us-energy-security: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/politics/341555-venezuela-russia-deal-threatens-us-energy-security/


Processing URLs:  61%|██████    | 606/1000 [26:37<11:35,  1.77s/it]

Error extracting text from http://asmdc.org/members/a56/news-room/press-releases/assemblyman-garcia-s-local-bills-head-to-senate-floor: 404 Client Error: Not Found for url: https://asmdc.org:443/members/a56/news-room/press-releases/assemblyman-garcia-s-local-bills-head-to-senate-floor


Processing URLs:  61%|██████    | 607/1000 [26:38<08:36,  1.31s/it]

Error extracting text from https://www.timesofisrael.com/violating-the-nuke-deal-iran-shows-it-wants-to-talk-to-biden/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/violating-the-nuke-deal-iran-shows-it-wants-to-talk-to-biden/


Processing URLs:  61%|██████    | 612/1000 [26:44<06:11,  1.05it/s]

Error extracting text from https://www.tsa.gov/coronavirus/passenger-throughput: 403 Client Error: Forbidden for url: https://www.tsa.gov/coronavirus/passenger-throughput


Processing URLs:  61%|██████▏   | 614/1000 [26:46<06:03,  1.06it/s]

Error extracting text from https://leaderpost.com/pmn/business-pmn/gazprom-says-controversial-russian-pipe-can-ship-gas-in-2021/wcm/f5dd69d3-9072-4ac1-8713-d2cd4dc10ad0/amp/: 403 Client Error: Forbidden for url: https://leaderpost.com/pmn/business-pmn/gazprom-says-controversial-russian-pipe-can-ship-gas-in-2021/wcm/f5dd69d3-9072-4ac1-8713-d2cd4dc10ad0/amp/


Processing URLs:  62%|██████▏   | 615/1000 [26:47<04:42,  1.36it/s]

Error extracting text from https://www.reuters.com/world/americas/brazils-bolsonaro-says-everyone-should-buy-rifle-2021-08-27/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazils-bolsonaro-says-everyone-should-buy-rifle-2021-08-27/


Processing URLs:  62%|██████▏   | 617/1000 [26:50<06:43,  1.05s/it]

Error extracting text from http://www.hybridcars.com/tesla-executive-provides-more-details-on-model-x-and-model-3-sales-in-china/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/tesla-executive-provides-more-details-on-model-x-and-model-3-sales-in-china/


Processing URLs:  62%|██████▏   | 619/1000 [26:51<05:43,  1.11it/s]

Error extracting text from http://www.nytimes.com/2003/05/08/international/worldspecial/08CND-NATO.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2003/05/08/international/worldspecial/08CND-NATO.html


Processing URLs:  62%|██████▏   | 620/1000 [26:53<06:47,  1.07s/it]

Error extracting text from http://www.newsweek.com/syria-civil-war-russia-putin-assad-obama-ceasefire-428037: 403 Client Error: Forbidden for url: https://www.newsweek.com/syria-civil-war-russia-putin-assad-obama-ceasefire-428037


Processing URLs:  62%|██████▏   | 623/1000 [26:57<07:58,  1.27s/it]

Error extracting text from https://www.nytimes.com/2017/05/03/us/politics/trump-filibuster.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/03/us/politics/trump-filibuster.html?_r=0


Processing URLs:  62%|██████▎   | 625/1000 [27:02<11:40,  1.87s/it]

Error extracting text from http://indicators.likely: HTTPConnectionPool(host='indicators.likely', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fbdee930>: Failed to resolve 'indicators.likely' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  64%|██████▎   | 637/1000 [27:29<07:04,  1.17s/it]

Error extracting text from http://www.reuters.com/article/2015/09/21/us-iran-nuclear-parchin-idUSKCN0RL0MT20150921?feedType=RSS&amp;feedName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/21/us-iran-nuclear-parchin-idUSKCN0RL0MT20150921?feedType=RSS&amp;feedName=worldNews


Processing URLs:  64%|██████▍   | 641/1000 [27:34<05:50,  1.02it/s]

Error extracting text from https://www.axios.com/the-complete-list-of-persons-of-interest-in-the-russia-probe-2444476852.html: 403 Client Error: Forbidden for url: https://www.axios.com/the-complete-list-of-persons-of-interest-in-the-russia-probe-2444476852.html
Error extracting text from http://www.crisis.acleddata.com/nigeria-may-2016-update/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /nigeria-may-2016-update/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302d136b0>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  64%|██████▍   | 645/1000 [27:38<06:20,  1.07s/it]

Error extracting text from http://www.southsudannewsagency.com/index.php/2016/11/19/south-sudan-beefs-security-amid-mounting-tension/: 404 Client Error: Not Found for url: https://www.southsudannewsagency.com/2016/11/19/south-sudan-beefs-security-amid-mounting-tension/


Processing URLs:  65%|██████▍   | 649/1000 [27:42<07:14,  1.24s/it]

Error extracting text from https://www.senate.gov/legislative/2017_schedule.htm: 403 Client Error: Forbidden for url: https://www.senate.gov/legislative/2017_schedule.htm


Processing URLs:  65%|██████▌   | 653/1000 [27:46<04:23,  1.32it/s]

Error extracting text from http://www.reuters.com/article/us-iran-usa-idUSKBN0UH0JZ20160103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-usa-idUSKBN0UH0JZ20160103


Processing URLs:  66%|██████▌   | 656/1000 [27:55<13:15,  2.31s/it]

Error extracting text from http://syriadirect.org/news/new-regime-division-drawn-from-%E2%80%98farmers%E2%80%99-and-%E2%80%98laborers%E2%80%99/: 404 Client Error: Not Found for url: http://syriadirect.org/news/new-regime-division-drawn-from-%E2%80%98farmers%E2%80%99-and-%E2%80%98laborers%E2%80%99/


Processing URLs:  66%|██████▌   | 657/1000 [27:58<12:56,  2.26s/it]

Error extracting text from http://en.trend.az/world/other/2655615.html: 404 Client Error: Not Found for url: https://www.trend.az/world/other/2655615.html


Processing URLs:  66%|██████▋   | 663/1000 [28:02<04:51,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0XZ0FC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0XZ0FC


Processing URLs:  67%|██████▋   | 666/1000 [28:04<04:49,  1.15it/s]

Error extracting text from http://af.reuters.com/article/commoditiesNews/idAFL1N11H1T020150911?pageNumber=1&amp;virtualBrandChannel=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  67%|██████▋   | 667/1000 [28:05<03:58,  1.40it/s]

Error extracting text from https://www.byline.com/column/11/article/1046: 403 Client Error: Forbidden for url: https://www.byline.com/column/11/article/1046


Processing URLs:  67%|██████▋   | 670/1000 [28:09<06:43,  1.22s/it]

Error extracting text from http://www.38North.org: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  68%|██████▊   | 675/1000 [28:15<06:26,  1.19s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/china-sends-missiles-to/2521792.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/china-sends-missiles-to/2521792.html
URL filtered: http://www.bloomberg.com/news/articles/2016-01-09/china-south-korea-to-make-efforts-to-halt-north-korean-nukes


Processing URLs:  68%|██████▊   | 677/1000 [28:32<24:26,  4.54s/it]

Error extracting text from http://www.investopedia.com/articles/investing/071515/6-factors-point-global-recession-2016.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/investing/071515/6-factors-point-global-recession-2016.asp


Processing URLs:  68%|██████▊   | 681/1000 [28:36<10:55,  2.05s/it]

Error extracting text from http://www.chicagotribune.com/news/local/politics/ct-bruce-rauner-illinois-credit-rating-met-0713-20170712-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/local/politics/ct-bruce-rauner-illinois-credit-rating-met-0713-20170712-story.html
Error extracting text from http://www.eubarnet.eu/wp-content/uploads/2012/06/Agroterrorism-Biological-Crimes.pdf: HTTPConnectionPool(host='www.eubarnet.eu', port=80): Max retries exceeded with url: /wp-content/uploads/2012/06/Agroterrorism-Biological-Crimes.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d9eb0>: Failed to resolve 'www.eubarnet.eu' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 683/1000 [28:39<09:15,  1.75s/it]

Error extracting text from http://www.france24.com/en/20160818-syria-bombs-kurdish-stronghold-hasaka-staffan-de-mistura: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160818-syria-bombs-kurdish-stronghold-hasaka-staffan-de-mistura


Processing URLs:  69%|██████▊   | 686/1000 [28:41<06:06,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-tillerson-asia-china-idUSKBN16O2V9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tillerson-asia-china-idUSKBN16O2V9


Processing URLs:  69%|██████▊   | 687/1000 [28:43<06:30,  1.25s/it]

Error extracting text from https://www.reuters.com/business/energy/just-15-km-nord-stream-2-pipeline-go-says-putin-2021-08-20/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/just-15-km-nord-stream-2-pipeline-go-says-putin-2021-08-20/


Processing URLs:  69%|██████▉   | 691/1000 [28:46<04:24,  1.17it/s]

Error extracting text from https://www.nytimes.com/2017/12/06/business/house-senate-tax-bill-differences.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/06/business/house-senate-tax-bill-differences.html?_r=0


Processing URLs:  69%|██████▉   | 694/1000 [28:50<06:09,  1.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-04/opec-unity-shattered-as-saudi-led-policy-leads-to-no-limits-ihs9xu51


Processing URLs:  70%|██████▉   | 696/1000 [28:50<04:03,  1.25it/s]

Error extracting text from http://www.japantimes.co.jp/news/2015/10/10/national/politics-diplomacy/hopes-strengthening-ties-abe-may-travel-iran-later-year/#.VhibqLxB43Q: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/10/10/national/politics-diplomacy/hopes-strengthening-ties-abe-may-travel-iran-later-year/#.VhibqLxB43Q
URL filtered: https://twitter.com/asymco/status/793521394357243904/photo/1


Processing URLs:  70%|███████   | 702/1000 [29:01<08:48,  1.77s/it]

Error extracting text from http://www.lowyinterpreter.org/post/2016/07/20/More-heat-than-light-in-Australian-FONOPs-debate.aspx: 404 Client Error: Not Found for url: https://www.lowyinstitute.org/the-interpreter/post/2016/07/20/More-heat-than-light-in-Australian-FONOPs-debate.aspx


Processing URLs:  70%|███████   | 704/1000 [29:12<15:43,  3.19s/it]

Error extracting text from http://www.nationmultimedia.com/business/RCEP-members-agree-to-eliminate-tariffs-on-65-of-t-30271561.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/business/RCEP-members-agree-to-eliminate-tariffs-on-65-of-t-30271561.html


Processing URLs:  70%|███████   | 705/1000 [29:13<11:29,  2.34s/it]

Error extracting text from https://www.yahoo.com/news/turkey-admits-no-visa-free-eu-travel-deal-173556335.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/turkey-admits-no-visa-free-eu-travel-deal-173556335.html


Processing URLs:  71%|███████   | 707/1000 [29:15<08:22,  1.71s/it]

Error extracting text from https://www.nytimes.com/2017/03/06/technology/google-turkey-antitrust-android.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/06/technology/google-turkey-antitrust-android.html


Processing URLs:  71%|███████▏  | 713/1000 [29:23<05:27,  1.14s/it]

Error extracting text from http://www.nytimes.com/2016/01/30/business/media/gop-debate-without-trump-draws-12-5-million-viewers.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/30/business/media/gop-debate-without-trump-draws-12-5-million-viewers.html


Processing URLs:  72%|███████▏  | 715/1000 [29:26<05:56,  1.25s/it]

Error extracting text from https://www.agexinc.com/agex-therapeutics-obtains-10-million-in-capital-and-commences-operations/: 403 Client Error: Forbidden for url: https://www.agexinc.com/agex-therapeutics-obtains-10-million-in-capital-and-commences-operations/


Processing URLs:  72%|███████▏  | 716/1000 [29:29<09:28,  2.00s/it]

Error extracting text from http://theiranproject.com/blog/2016/03/23/irans-run-off-parliamentary-elections-set-april-29/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=irans-run-off-parliamentary-elections-set-april-29


Processing URLs:  72%|███████▏  | 718/1000 [29:31<07:04,  1.51s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S1877042814067056: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S1877042814067056


Processing URLs:  72%|███████▏  | 721/1000 [29:37<08:37,  1.85s/it]

Error extracting text from http://www.thenational.ae/world/middle-east/slow-mosul-offensive-threatens-this-years-largest-population-displacement#page1: 404 Client Error: Not Found for url: https://www.thenationalnews.com/mena/slow-mosul-offensive-threatens-this-years-largest-population-displacement/#page1


Processing URLs:  72%|███████▎  | 725/1000 [29:42<04:54,  1.07s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://zh.clicrbs.com.br/rs/ultimas-noticias/tag/dilma-rousseff/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://zh.clicrbs.com.br/rs/ultimas-noticias/tag/dilma-rousseff/&amp;prev=search


Processing URLs:  73%|███████▎  | 726/1000 [29:42<03:42,  1.23it/s]

Error extracting text from http://english.alarabiya.net/en/News/world/2016/05/25/Italy-5600-migrants-rescued-off-Libya-in-48-hours-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/world/2016/05/25/Italy-5600-migrants-rescued-off-Libya-in-48-hours-.html


Processing URLs:  73%|███████▎  | 727/1000 [29:42<03:09,  1.44it/s]

Error extracting text from https://www.nytimes.com/2021/01/23/world/europe/navalny-protests-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/23/world/europe/navalny-protests-russia.html


Processing URLs:  73%|███████▎  | 730/1000 [30:48<1:27:27, 19.44s/it]

Error extracting text from http://www.atmb.net.cn/select_List.aspx?search=ADIZ: HTTPConnectionPool(host='www.atmb.net.cn', port=80): Read timed out. (read timeout=60)


Processing URLs:  73%|███████▎  | 734/1000 [30:53<24:31,  5.53s/it]

Error extracting text from https://www.pakistantoday.com.pk/2017/12/16/taliban-haqqani-network-members-roam-free-in-pakistan-pentagon/: 403 Client Error: Forbidden for url: https://www.pakistantoday.com.pk/2017/12/16/taliban-haqqani-network-members-roam-free-in-pakistan-pentagon/


Processing URLs:  74%|███████▎  | 735/1000 [30:55<19:44,  4.47s/it]

URL filtered: https://www.youtube.com/watch?v=n6JIbGvwEoA


Processing URLs:  74%|███████▍  | 738/1000 [30:57<09:19,  2.14s/it]

Error extracting text from http://en.censor.net.ua/news/395376/technical_consultations_on_imf_memorandum_are_over_tranche_to_be_allocated_in_few_weeks_yatseniuk: 403 Client Error: Forbidden for url: https://censor.net/en/news/395376/technical_consultations_on_imf_memorandum_are_over_tranche_to_be_allocated_in_few_weeks_yatseniuk


Processing URLs:  74%|███████▍  | 739/1000 [30:58<08:13,  1.89s/it]

Error extracting text from http://globalnation.inquirer.net/130215/south-china-sea-arbitration-philippines-china-spratly-islands-west-philippine-sea#ixzz43o0ygP8n: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/130215/south-china-sea-arbitration-philippines-china-spratly-islands-west-philippine-sea#ixzz43o0ygP8n


Processing URLs:  74%|███████▍  | 742/1000 [31:01<05:37,  1.31s/it]

Error extracting text from http://1tvnews.af/en/news/afghanistan/23556-kabul-says-not-interested-in-reviving-taliban-talks: 406 Client Error: Not Acceptable for url: http://1tvnews.af/en/news/afghanistan/23556-kabul-says-not-interested-in-reviving-taliban-talks


Processing URLs:  74%|███████▍  | 744/1000 [31:05<06:09,  1.44s/it]

Error extracting text from http://news.yahoo.com/syria-rebels-face-rout-allies-saudi-turkey-may-071955719.html: 404 Client Error: Not Found for url: http://news.yahoo.com/syria-rebels-face-rout-allies-saudi-turkey-may-071955719.html


Processing URLs:  75%|███████▍  | 746/1000 [31:09<06:34,  1.55s/it]

URL filtered: https://www.youtube.com/watch?v=Xo8wjZdlcLQ


Processing URLs:  75%|███████▌  | 753/1000 [31:20<06:00,  1.46s/it]

Error extracting text from https://www.amazon.com/Command-Control-Damascus-Accident-Illusion/dp/0143125788: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Command-Control-Damascus-Accident-Illusion/dp/0143125788


Processing URLs:  75%|███████▌  | 754/1000 [32:20<1:15:30, 18.42s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2016-08-01/as-syria-transition-date-passes-us-makes-no-policy-change: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  76%|███████▌  | 757/1000 [32:21<27:04,  6.68s/it]

Error extracting text from http://www.cdm.me/english/stoltenberg-we-expect-natos-decision-on-montenegros-membership-in-july: 403 Client Error: Forbidden for url: https://www.cdm.me/english/stoltenberg-we-expect-natos-decision-on-montenegros-membership-in-july
Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2021/03/02/global-human-rights-sanctions-regime-eu-sanctions-four-people-responsible-for-serious-human-rights-violations-in-russia/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2021/03/02/global-human-rights-sanctions-regime-eu-sanctions-four-people-responsible-for-serious-human-rights-violations-in-russia/


Processing URLs:  76%|███████▌  | 760/1000 [32:23<11:05,  2.77s/it]

Error extracting text from http://www.bakermckenzie.com/sanctionsnews/blog.aspx: 403 Client Error: Forbidden for url: http://www.bakermckenzie.com/sanctionsnews/blog.aspx


Processing URLs:  76%|███████▋  | 765/1000 [32:29<04:29,  1.15s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/governor/nc/north_carolina_governor_mccrory_vs_cooper_vs_cecil-5841.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/governor/nc/north_carolina_governor_mccrory_vs_cooper_vs_cecil-5841.html


Processing URLs:  77%|███████▋  | 768/1000 [32:34<06:30,  1.68s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-11-11/iran-tells-opec-it-boosted-output-by-most-since-sanctions-ended-ivdp58a5
URL filtered: https://www.turkishminute.com/2017/10/02/68-pkk-militants-neutralized-in-one-week/?utm_content=buffer19a3b&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  77%|███████▋  | 773/1000 [35:07<1:58:25, 31.30s/it]

Error extracting text from https://www.yang2020.com/policies/: HTTPSConnectionPool(host='www.yang2020.com', port=443): Max retries exceeded with url: /policies/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fe270770>, 'Connection to www.yang2020.com timed out. (connect timeout=60)'))


Processing URLs:  78%|███████▊  | 776/1000 [35:09<49:16, 13.20s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-usa-ttip-idUSKCN0ZD2GW?mod=related&amp;channelName=gc03: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-usa-ttip-idUSKCN0ZD2GW?mod=related&amp;channelName=gc03
Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2017/02/07-prolongation-border-controls/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2017/02/07-prolongation-border-controls/
Error extracting text from https://www.reuters.com/world/asia-pacific/no-ones-safe-anymore-japans-osaka-city-crumples-under-covid-19-onslaught-2021-05-24/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/no-ones-safe-anymore-japans-osaka-city-crumples-under-covid-19-onslaught-2021-05-24/


Processing URLs:  78%|███████▊  | 783/1000 [35:35<19:04,  5.27s/it]

Error extracting text from http://gulftoday.ae/portal/827df8b0-a7eb-4e43-8852-2789a206c28c.aspx: 404 Client Error: Not Found for url: https://gulftoday.ae/portal/827df8b0-a7eb-4e43-8852-2789a206c28c.aspx


Processing URLs:  78%|███████▊  | 785/1000 [35:38<12:13,  3.41s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-13/the-u-s-yield-curve-is-flattening-and-here-s-why-it-matters


Processing URLs:  79%|███████▉  | 788/1000 [35:41<07:12,  2.04s/it]

Error extracting text from http://tass.ru/en/economy/842802: 404 Client Error: Not Found for url: https://tass.ru/en/economy/842802


Processing URLs:  79%|███████▉  | 790/1000 [35:46<08:01,  2.29s/it]

Error extracting text from http://www.ambafrance-me.org/: 429 Client Error: Too Many Requests for url: https://me.ambafrance.org/


Processing URLs:  79%|███████▉  | 794/1000 [35:49<03:34,  1.04s/it]

Error extracting text from https://olympics.com/tokyo-2020/en/schedule/: 403 Client Error: Forbidden for url: https://olympics.com/tokyo-2020/en/schedule/
Error extracting text from https://www.axios.com/ukraine-russian-invasion-talks-bedd8c3a-efc7-4549-92fe-b360e0655b5d.html: 403 Client Error: Forbidden for url: https://www.axios.com/ukraine-russian-invasion-talks-bedd8c3a-efc7-4549-92fe-b360e0655b5d.html


Processing URLs:  80%|███████▉  | 796/1000 [35:51<02:46,  1.23it/s]

Error extracting text from https://panampost.com/maria-teresa-romero/2016/02/11/economic-collapse-looms-venezuelas-opposition-has-no-time-to-lose/: 403 Client Error: Forbidden for url: https://panampost.com/maria-teresa-romero/2016/02/11/economic-collapse-looms-venezuelas-opposition-has-no-time-to-lose/


Processing URLs:  80%|███████▉  | 798/1000 [35:55<04:46,  1.42s/it]

Error extracting text from https://markets.businessinsider.com/currencies/news/bitcoin-investing-cryptocurrencies-wall-street-jpmorgan-morgan-stanley-blackrock-tesla-2021-2-1030083958: 404 Client Error: Not Found for url: https://markets.businessinsider.com/news/jpmorgan-and-morgan-stanley-are-eyeing-bitcoin-here-are-the-big-wall-street-names-warming-to-cryptocurrencies-1030083958?miRedirects=3


Processing URLs:  80%|███████▉  | 799/1000 [35:55<03:34,  1.07s/it]

Error extracting text from http://www.wsj.com/articles/irans-supreme-leader-vows-divine-revenge-for-saudi-execution-of-shiite-cleric-1451817615: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/irans-supreme-leader-vows-divine-revenge-for-saudi-execution-of-shiite-cleric-1451817615


Processing URLs:  80%|████████  | 802/1000 [35:56<01:54,  1.73it/s]

Error extracting text from https://www.thenation.com/article/trumps-handling-north-korea-going-lead-us-straight-nuclear-disaster/: 404 Client Error: Not Found for url: https://www.thenation.com/article/trumps-handling-north-korea-going-lead-us-straight-nuclear-disaster/
URL filtered: https://www.statista.com/statistics/1108307/covid-twitch-youtube-viewers/
Error extracting text from http://www.reuters.com/article/us-somalia-security-idUSKBN1AK19F?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-security-idUSKBN1AK19F?il=0


Processing URLs:  81%|████████  | 806/1000 [36:18<18:10,  5.62s/it]

Error extracting text from http://recode.net/2015/11/20/go-is-the-game-machines-cant-beat-googles-artificial-intelligence-whiz-hints-that-his-will/: Exceeded 30 redirects.


Processing URLs:  81%|████████  | 811/1000 [36:32<09:22,  2.97s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/brazil-judge-backs-rousse/2353370.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/brazil-judge-backs-rousse/2353370.html


Processing URLs:  82%|████████▏ | 815/1000 [36:35<03:34,  1.16s/it]

Error extracting text from http://www.ibtimes.co.uk/burundi-crisis-one-year-many-1500-dead-international-community-shuts-its-eyes-1556849: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/burundi-crisis-one-year-many-1500-dead-international-community-shuts-its-eyes-1556849
URL filtered: https://www.youtube.com/watch?v=u8-E2LswmGk
Error extracting text from http://www.balkaninsight.com/en/article/us-senate-panel-likely-to-back-montenegro-s-nato-bid-09-14-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/us-senate-panel-likely-to-back-montenegro-s-nato-bid-09-14-2016


Processing URLs:  82%|████████▏ | 817/1000 [36:35<02:17,  1.33it/s]

Error extracting text from http://www.reuters.com/article/2015/11/28/us-mideast-crisis-turkey-russia-erdogan-idUSKBN0TG18K20151128: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/28/us-mideast-crisis-turkey-russia-erdogan-idUSKBN0TG18K20151128


Processing URLs:  82%|████████▏ | 820/1000 [36:38<02:18,  1.30it/s]

Error extracting text from http://www.reuters.com/article/us-europe-trade-canada-germany-idUSKCN11P15O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-trade-canada-germany-idUSKCN11P15O?il=0


Processing URLs:  82%|████████▏ | 822/1000 [36:40<01:58,  1.50it/s]

Error extracting text from http://www.nti.org/learn/treaties-and-regimes/treaty-between-the-united-states-of-america-and-the-union-of-soviet-socialist-republics-on-the-elimination-of-their-intermediate-range-and-shorter-range-missiles/: 403 Client Error: Forbidden for url: https://www.nti.org/learn/treaties-and-regimes/treaty-between-the-united-states-of-america-and-the-union-of-soviet-socialist-republics-on-the-elimination-of-their-intermediate-range-and-shorter-range-missiles/


Processing URLs:  83%|████████▎ | 826/1000 [36:53<08:13,  2.84s/it]

Error extracting text from http://vestnikkavkaza.net/news/Russian-Transport-Ministry-too-early-to-restore-air-traffic-between-Russia-and-Egypt.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/news/Russian-Transport-Ministry-too-early-to-restore-air-traffic-between-Russia-and-Egypt.html


Processing URLs:  83%|████████▎ | 828/1000 [36:54<04:50,  1.69s/it]

Error extracting text from http://warontherocks.com/2016/03/there-is-no-russian-withdrawal-from-syria/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/03/there-is-no-russian-withdrawal-from-syria/


Processing URLs:  83%|████████▎ | 832/1000 [36:57<03:14,  1.16s/it]

Error extracting text from http://internationalbanker.com/brokerage/accounting-for-the-malaysian-ringgits-slump-in-2015/: 403 Client Error: Forbidden for url: http://internationalbanker.com/brokerage/accounting-for-the-malaysian-ringgits-slump-in-2015/


Processing URLs:  84%|████████▎ | 836/1000 [37:03<03:05,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-japan-china-southkorea-idUSKCN10S0DS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-japan-china-southkorea-idUSKCN10S0DS


Processing URLs:  84%|████████▍ | 839/1000 [37:08<04:02,  1.50s/it]

Error extracting text from http://www.wsj.com/articles/time-inc-names-jen-wong-to-new-coo-position-1473941680: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-inc-names-jen-wong-to-new-coo-position-1473941680


Processing URLs:  84%|████████▍ | 842/1000 [37:11<02:35,  1.01it/s]

Error extracting text from http://www.wsj.com/articles/after-bold-talk-cameron-makes-limited-gains-in-draft-deal-on-eu-changes-1454450262: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/after-bold-talk-cameron-makes-limited-gains-in-draft-deal-on-eu-changes-1454450262


Processing URLs:  84%|████████▍ | 845/1000 [37:16<03:38,  1.41s/it]

Error extracting text from https://www.faa.gov/uas/programs_partnerships/uas_data_exchange/airports_participating_in_laanc/: 404 Client Error: Not Found for url: https://www.faa.gov/uas/programs_partnerships/uas_data_exchange/airports_participating_in_laanc/


Processing URLs:  85%|████████▍ | 848/1000 [37:22<04:53,  1.93s/it]

Error extracting text from http://www.outerplaces.com/science-fiction/item/11858-early-box-office-predicts-captain-america-civil-war-will-murder-batman-v-superman: 404 Client Error: Not Found for url: https://www.outerplaces.com/science-fiction/item/11858-early-box-office-predicts-captain-america-civil-war-will-murder-batman-v-superman


Processing URLs:  85%|████████▌ | 851/1000 [37:27<03:23,  1.37s/it]

Error extracting text from http://www.bworldonline.com/content.php?section=Economy&amp;title=bataan-nuclear-plant-will-take-years-to-rehab-house-committee-told&amp;id=115276: 404 Client Error: Not Found for url: https://www.bworldonline.com/content.php?section=Economy&amp;title=bataan-nuclear-plant-will-take-years-to-rehab-house-committee-told&amp;id=115276


Processing URLs:  86%|████████▌ | 855/1000 [37:33<03:55,  1.62s/it]

Error extracting text from http://micanaldepanama.com/expansion/2016/02/testing-of-new-panama-canal-locks-carried-out-successfully/: 403 Client Error: Forbidden for url: https://pancanal.com/expansion/2016/02/testing-of-new-panama-canal-locks-carried-out-successfully/


Processing URLs:  86%|████████▌ | 860/1000 [37:37<01:46,  1.31it/s]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-anc-meets-zuma-as-pressure-mounts-for-him-to-quit-idUSKBN1FO0J8?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-anc-meets-zuma-as-pressure-mounts-for-him-to-quit-idUSKBN1FO0J8?il=0


Processing URLs:  86%|████████▋ | 864/1000 [37:43<02:56,  1.30s/it]

Error extracting text from http://in.reuters.com/article/india-petroleum-reserves-idINKCN0VD1WS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  87%|████████▋ | 867/1000 [37:49<03:42,  1.67s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Lavrov-to-visit-Japan-in-April-to-prepare-Abe-Putin-summit: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Lavrov-to-visit-Japan-in-April-to-prepare-Abe-Putin-summit
Error extracting text from https://www.reuters.com/article/us-britain-eu-money/cracking-deadlock-on-brexit-bill-may-require-eu-summit-talks-idUSKCN1BC595: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-money/cracking-deadlock-on-brexit-bill-may-require-eu-summit-talks-idUSKCN1BC595


Processing URLs:  88%|████████▊ | 875/1000 [37:55<01:55,  1.08it/s]

Error extracting text from https://www.noaa.gov/news-release/noaa-predicts-another-active-atlantic-hurricane-season: 403 Client Error: Forbidden for url: https://www.noaa.gov/news-release/noaa-predicts-another-active-atlantic-hurricane-season


Processing URLs:  88%|████████▊ | 880/1000 [38:04<02:36,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-safrica-zuma-insight-idUSKCN0XV1RB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-zuma-insight-idUSKCN0XV1RB


Processing URLs:  88%|████████▊ | 883/1000 [38:05<01:18,  1.49it/s]

Error extracting text from http://www.reuters.com/article/2015/09/12/us-mideast-crisis-syria-germany-idUSKCN0RC0LM20150912: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/12/us-mideast-crisis-syria-germany-idUSKCN0RC0LM20150912
Error extracting text from http://www.reuters.com/article/us-usa-trump-budget-idUSKCN1B41RT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-budget-idUSKCN1B41RT


Processing URLs:  89%|████████▉ | 888/1000 [38:24<07:18,  3.92s/it]

Error extracting text from https://www.washingtonpost.com/national/trump-sanders-look-to-emerge-from-new-hampshire-with-wins/2016/02/09/66ec79d4-cf8e-11e5-90d3-34c2c42653ac_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/national/trump-sanders-look-to-emerge-from-new-hampshire-with-wins/2016/02/09/66ec79d4-cf8e-11e5-90d3-34c2c42653ac_story.html


Processing URLs:  89%|████████▉ | 890/1000 [38:26<04:56,  2.70s/it]

Error extracting text from https://www.rit.edu/~w-cmmc/literature/Thomas_2004.pdf: 404 Client Error: Not Found for url: https://www.rit.edu/~w-cmmc/literature/Thomas_2004.pdf
URL filtered: http://www.reuters.com/article/2015/09/15/us-china-southchinasea-airstrips-idUSKCN0RE28220150915?feedType=RSS&amp;feedName=topNews&amp;utm_source=twitter


Processing URLs:  90%|████████▉ | 896/1000 [38:34<02:48,  1.62s/it]

Error extracting text from https://www.bls.gov/news.release/pdf/empsit.pdf: 403 Client Error: Forbidden for url: https://www.bls.gov/news.release/pdf/empsit.pdf


Processing URLs:  90%|█████████ | 902/1000 [38:41<01:58,  1.21s/it]

Error extracting text from https://www.npd.com/wps/portal/npd/us/news/press-releases/2021/adult-fiction-books-posted-highest-q1-sales-since-2013--the-npd-group-says/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /wps/portal/npd/us/news/press-releases/2021/adult-fiction-books-posted-highest-q1-sales-since-2013--the-npd-group-says/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))


Processing URLs:  91%|█████████ | 906/1000 [38:48<02:47,  1.78s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/nkorea-fired-unidentified-projectile-off-east-coast-skorea-military-2022-03-24/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/nkorea-fired-unidentified-projectile-off-east-coast-skorea-military-2022-03-24/


Processing URLs:  91%|█████████▏| 914/1000 [39:13<03:55,  2.74s/it]

Error extracting text from http://uk.reuters.com/article/2015/09/04/uk-russia-forum-putin-extremism-idUKKCN0R408Y20150904: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  92%|█████████▏| 915/1000 [39:14<03:12,  2.27s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-mosul-idUSKCN0VI25G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-mosul-idUSKCN0VI25G


Processing URLs:  92%|█████████▏| 919/1000 [39:20<02:28,  1.83s/it]

Error extracting text from http://www.arbeitgeber.de/www/arbeitgeber.nsf/res/GDP_CPI_Prod_Forecasts.pdf/$file/GDP_CPI_Prod_Forecasts.pdf: 404 Client Error: Not Found for url: https://arbeitgeber.de/www/arbeitgeber.nsf/res/GDP_CPI_Prod_Forecasts.pdf/$file/GDP_CPI_Prod_Forecasts.pdf


Processing URLs:  93%|█████████▎| 926/1000 [39:31<02:06,  1.71s/it]

Error extracting text from https://rns.online/military/Rossiya-vivedet-iz-Sirii-okolo-80-samoletov-ostanutsya-sili-PVO-i-korabli-s--Kalibrami-2016-03-14/: 404 Client Error: Not Found for url: https://rns.online/military/Rossiya-vivedet-iz-Sirii-okolo-80-samoletov-ostanutsya-sili-PVO-i-korabli-s--Kalibrami-2016-03-14/


Processing URLs:  93%|█████████▎| 929/1000 [39:34<01:24,  1.19s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/25/gitrep-24mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/25/gitrep-24mar16pm/


Processing URLs:  93%|█████████▎| 933/1000 [39:39<01:05,  1.02it/s]

Error extracting text from http://thehill.com/homenews/campaign/363101-alabama-businesses-worry-roy-moore-winning-could-be-bad-for-state-economy: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/363101-alabama-businesses-worry-roy-moore-winning-could-be-bad-for-state-economy/
Error extracting text from http://www.reuters.com/article/us-iran-europe-rouhani-idUSKCN0V31DJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-europe-rouhani-idUSKCN0V31DJ


Processing URLs:  94%|█████████▎| 936/1000 [39:45<01:35,  1.49s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-27/trump-spurned-by-alabama-voters-who-reject-his-senate-candidate


Processing URLs:  94%|█████████▍| 938/1000 [39:45<00:52,  1.17it/s]

Error extracting text from https://robertscribbler.com/2016/05/02/arctic-sea-ice-is-falling-off-a-cliff-and-it-may-not-survive-the-summer/: 404 Client Error: Not Found for url: https://robertscribbler.com/2016/05/02/arctic-sea-ice-is-falling-off-a-cliff-and-it-may-not-survive-the-summer/


Processing URLs:  94%|█████████▍| 944/1000 [40:06<02:18,  2.47s/it]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-hosts-nato-amid-protests-10-15-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-hosts-nato-amid-protests-10-15-2015


Processing URLs:  95%|█████████▍| 949/1000 [40:18<02:50,  3.35s/it]

Error extracting text from http://stripdistrictneighbors.com/: HTTPConnectionPool(host='stripdistrictneighbors.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301662270>: Failed to resolve 'stripdistrictneighbors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  95%|█████████▌| 951/1000 [40:22<02:06,  2.58s/it]

Error extracting text from http://www.auto-motor-und-sport.de/news/3717037/technik_ecotest-stickoxide-in-wltc-neu.pdf: 404 Client Error: Not Found for url: https://www.auto-motor-und-sport.de/news/3717037/technik_ecotest-stickoxide-in-wltc-neu.pdf


Processing URLs:  96%|█████████▌| 958/1000 [40:40<01:48,  2.59s/it]

Error extracting text from http://www.leader.ir/langs/en/index.php?p=contentShow&amp;id=13791: 404 Client Error: Not Found for url: https://www.leader.ir/error


Processing URLs:  96%|█████████▌| 960/1000 [40:59<04:11,  6.29s/it]

Error extracting text from http://www.people.com/article/joe-biden-decided-not-run-president-grief-son-beau-death: 406 Client Error: Not Acceptable for url: https://www.people.com/article/joe-biden-decided-not-run-president-grief-son-beau-death


Processing URLs:  96%|█████████▌| 962/1000 [41:03<02:36,  4.11s/it]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-s-parliament-to-dismisses-speaker-02-23-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-s-parliament-to-dismisses-speaker-02-23-2016


Processing URLs:  97%|█████████▋| 966/1000 [41:06<01:00,  1.79s/it]

Error extracting text from https://www.reuters.com/article/us-iran-nuclear-rouhani-idUSKCN1AV0LW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-rouhani-idUSKCN1AV0LW


Processing URLs:  97%|█████████▋| 968/1000 [41:07<00:42,  1.32s/it]

URL filtered: http://www.bloomberg.com/news/videos/2016-09-14/padma-warrior-on-the-future-of-autonomous-and-electric-vehicles


Processing URLs:  97%|█████████▋| 971/1000 [41:08<00:25,  1.15it/s]

Error extracting text from https://www.nytimes.com/2017/12/08/world/europe/brexit-uk-eu.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/08/world/europe/brexit-uk-eu.html


Processing URLs:  97%|█████████▋| 973/1000 [41:10<00:23,  1.16it/s]

Error extracting text from https://bylinetimes.com/2021/01/13/government-official-reveals-uk-still-pursuing-ridiculous-herd-immunity-strategy/: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  97%|█████████▋| 974/1000 [41:10<00:18,  1.42it/s]

Error extracting text from http://www.wsj.com/articles/report-finds-islamic-state-weapons-factories-hone-munitions-with-fearsome-efficiency-1481906511: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/report-finds-islamic-state-weapons-factories-hone-munitions-with-fearsome-efficiency-1481906511


Processing URLs:  98%|█████████▊| 976/1000 [41:13<00:22,  1.09it/s]

URL filtered: https://www.youtube.com/watch?v=hjMWa2rgGD0


Processing URLs:  98%|█████████▊| 984/1000 [41:25<00:29,  1.84s/it]

Error extracting text from https://usukraine.org/news/articles/stronger-together/NjA4MTM=/).: 404 Client Error: Not Found for url: https://usukraine.org/news/articles/stronger-together/NjA4MTM=/).


Processing URLs:  99%|█████████▊| 986/1000 [41:31<00:35,  2.54s/it]

Error extracting text from http://web.de/magazine/politik/spiegel-militaerischer-geheimdienst-russland-hackte-bundestag-31318402: 404 Client Error: 404 for url: https://web.de/magazine/politik/spiegel-militaerischer-geheimdienst-russland-hackte-bundestag-31318402


Processing URLs:  99%|█████████▉| 989/1000 [41:35<00:20,  1.86s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/China-nearly-completes-2-more-runways-in-S.-China-Sea-U.S.-think-tank: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/China-nearly-completes-2-more-runways-in-S.-China-Sea-U.S.-think-tank


Processing URLs:  99%|█████████▉| 990/1000 [41:36<00:14,  1.41s/it]

Error extracting text from https://www.nytimes.com/2017/01/27/us/politics/refugee-muslim-executive-order-trump.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/27/us/politics/refugee-muslim-executive-order-trump.html?_r=1


Processing URLs:  99%|█████████▉| 991/1000 [41:37<00:11,  1.30s/it]

Error extracting text from http://www.altaiconsulting.com/mixedmigrationlibya/Altai_Consulting-UNHCR-Mixed_Migration_Libya.pdf: 404 Client Error: Not Found for url: https://www.altaiconsulting.com/mixedmigrationlibya/Altai_Consulting-UNHCR-Mixed_Migration_Libya.pdf


Processing URLs: 100%|██████████| 1000/1000 [41:56<00:00,  2.52s/it]


Error extracting text from https://sophosnews.files.wordpress.com/2016/04/petya-1200.png?w=780&amp;h=408&amp;crop=1: 404 Client Error: Not Found for url: https://sophosnews.files.wordpress.com/2016/04/petya-1200.png?w=780&amp;h=408&amp;crop=1


Processing URLs:   0%|          | 4/1000 [00:06<25:13,  1.52s/it]

Error extracting text from https://www.reuters.com/article/us-southsudan-protests/hundreds-protest-against-u-s-arms-embargo-in-south-sudan-journalists-attacked-idUSKBN1FQ22E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southsudan-protests/hundreds-protest-against-u-s-arms-embargo-in-south-sudan-journalists-attacked-idUSKBN1FQ22E


Processing URLs:   1%|          | 9/1000 [00:11<18:15,  1.10s/it]

Error extracting text from https://tradingeconomics.com/commodity/uk-natural-gas: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/uk-natural-gas


Processing URLs:   1%|          | 10/1000 [00:11<15:46,  1.05it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/14/excluding-outliers-in-the-smart-crowd-forecast/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/14/excluding-outliers-in-the-smart-crowd-forecast/


Processing URLs:   1%|▏         | 13/1000 [00:17<26:02,  1.58s/it]

Error extracting text from https://pca-cpa.org/wp-content/uploads/sites/175/2016/07/PH-CN-20160712-Press-Release-No-11-English.pdf: 404 Client Error: Not Found for url: https://pca-cpa.org/wp-content/uploads/sites/175/2016/07/PH-CN-20160712-Press-Release-No-11-English.pdf
URL filtered: https://www.bloomberg.com/news/articles/2017-07-17/gramercy-says-emerging-market-risk-could-be-higher-than-in-2008


Processing URLs:   2%|▏         | 15/1000 [00:18<15:24,  1.07it/s]

Error extracting text from http://www.wsj.com/articles/brazils-rio-de-janeiro-state-misses-debt-payment-1464191036: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-rio-de-janeiro-state-misses-debt-payment-1464191036


Processing URLs:   2%|▏         | 17/1000 [00:21<20:07,  1.23s/it]

Error extracting text from https://www.weforum.org/agenda/2020/09/oecd-global-gdp-pre-pandemic-level-2021-sustainable-resilient/: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2020/09/oecd-global-gdp-pre-pandemic-level-2021-sustainable-resilient/


Processing URLs:   2%|▏         | 22/1000 [01:35<5:11:37, 19.12s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-06-16/if-donald-trump-fires-rod-rosenstein-rachel-brand-would-take-over: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   3%|▎         | 28/1000 [01:47<1:04:09,  3.96s/it]

URL filtered: https://twitter.com/ErikFritzsche
Error extracting text from http://www.reuters.com/article/2015/09/18/us-usa-oilexports-clinton-idUSKCN0RI20Y20150918: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/18/us-usa-oilexports-clinton-idUSKCN0RI20Y20150918


Processing URLs:   3%|▎         | 30/1000 [01:51<52:24,  3.24s/it]  

Error extracting text from http://38north.org/2017/05/jschilling051417/?utm_source=38+North+Bulletin+051417&amp;utm_campaign=38+North&amp;utm_medium=email: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:   3%|▎         | 32/1000 [01:54<37:52,  2.35s/it]

Error extracting text from http://www.un.org/press/en/2011/sc10471.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2011/sc10471.doc.htm


Processing URLs:   3%|▎         | 34/1000 [01:58<35:13,  2.19s/it]

URL filtered: http://www.bloomberg.com/quote/XOM:US
URL filtered: https://www.bloomberg.com/news/articles/2017-11-21/tesla-is-blowing-through-8-000-every-minute-amid-model-3-woes


Processing URLs:   4%|▍         | 38/1000 [02:03<27:41,  1.73s/it]

Error extracting text from http://www.predictwise.com/politics/2016demnomination: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016demnomination
URL filtered: https://www.bloomberg.com/news/articles/2018-02-11/trudeau-sees-clear-path-forward-on-nafta-despite-big-impasses


Processing URLs:   4%|▍         | 40/1000 [02:04<19:22,  1.21s/it]

Error extracting text from http://www.wsj.com/articles/in-brazil-senators-line-up-to-unseat-president-dilma-rousseff-1463014177: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-brazil-senators-line-up-to-unseat-president-dilma-rousseff-1463014177


Processing URLs:   5%|▌         | 54/1000 [02:38<32:51,  2.08s/it]  

URL filtered: https://www.youtube.com/watch?v=F65UUO6K5Lw


Processing URLs:   6%|▌         | 57/1000 [02:40<21:21,  1.36s/it]

Error extracting text from http://www.crisisgroup.org/en/regions/europe/turkey-cyprus/turkey/b077-a-sisyphean-task-resuming-turkey-pkk-peace-talks.aspx: 404 Client Error: Not Found for url: https://www.crisisgroup.org/en/regions/europe/turkey-cyprus/turkey/b077-a-sisyphean-task-resuming-turkey-pkk-peace-talks.aspx


Processing URLs:   6%|▌         | 59/1000 [02:48<38:00,  2.42s/it]

Error extracting text from https://www.reuters.com/article/germany-politics/german-coalition-talks-break-down-after-fdp-pulls-out-idUSB4N1LS01K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/germany-politics/german-coalition-talks-break-down-after-fdp-pulls-out-idUSB4N1LS01K


Processing URLs:   7%|▋         | 69/1000 [03:06<16:51,  1.09s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN1670N9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN1670N9


Processing URLs:   7%|▋         | 72/1000 [03:08<10:42,  1.45it/s]

Error extracting text from http://www.nytimes.com/2015/04/29/world/middleeast/an-eroding-syrian-army-points-to-strain.html?referrer=: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/04/29/world/middleeast/an-eroding-syrian-army-points-to-strain.html?referrer=


Processing URLs:   8%|▊         | 75/1000 [03:13<18:03,  1.17s/it]

Error extracting text from https://ssi.armywarcollege.edu/pubs/display.cfm?pubID=1358: 404 Client Error: Not Found for url: https://ssi.armywarcollege.edu/pubs/display.cfm?pubID=1358


Processing URLs:   8%|▊         | 77/1000 [03:18<27:40,  1.80s/it]

Error extracting text from http://www.lucsala.nl/utopia.htm: 403 Client Error: ModSecurity Action for url: http://www.lucsala.nl/utopia.htm


Processing URLs:   8%|▊         | 82/1000 [03:27<29:47,  1.95s/it]

URL filtered: https://twitter.com/HQNigerianArmy


Processing URLs:   8%|▊         | 85/1000 [03:28<16:32,  1.08s/it]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2021/02/26/eu-uk-trade-and-cooperation-agreement-council-requests-european-parliament-s-consent/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2021/02/26/eu-uk-trade-and-cooperation-agreement-council-requests-european-parliament-s-consent/


Processing URLs:   9%|▊         | 87/1000 [03:44<1:00:52,  4.00s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-18/u-k-inflation-rate-surges-to-highest-in-almost-two-years


Processing URLs:   9%|▉         | 91/1000 [03:46<24:28,  1.62s/it]  

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://agenciabrasil.ebc.com.br/politica/noticia/2016-02/oposicao-na-camara-apoia-manifesto-pro-impeachment-de-dilma&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://agenciabrasil.ebc.com.br/politica/noticia/2016-02/oposicao-na-camara-apoia-manifesto-pro-impeachment-de-dilma&amp;prev=search


Processing URLs:   9%|▉         | 94/1000 [03:53<27:07,  1.80s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-04/21/c_135298173.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-04/21/c_135298173.htm


Processing URLs:  10%|▉         | 95/1000 [03:53<20:48,  1.38s/it]

Error extracting text from http://m.focustaiwan.tw/news/aipl/201606060012.aspx: HTTPSConnectionPool(host='m.focustaiwan.tw', port=443): Max retries exceeded with url: /news/aipl/201606060012.aspx (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'm.focustaiwan.tw'. (_ssl.c:1000)")))


Processing URLs:  10%|▉         | 99/1000 [04:08<36:12,  2.41s/it]

Error extracting text from http://www.hydroworld.com/articles/2017/09/stalemate-between-india-pakistan-continues-after-most-recent-round-of-indus-waters-treaty-discussions.html: 403 Client Error: Forbidden for url: https://www.hydroreview.com/


Processing URLs:  10%|█         | 101/1000 [04:15<49:57,  3.33s/it]

Error extracting text from http://soufangroup.com/tsg-intelbrief-more-u-s-troops-in-iraq/: 404 Client Error: Not Found for url: https://www.soufangroup.com/tsg-intelbrief-more-u-s-troops-in-iraq/


Processing URLs:  11%|█         | 111/1000 [04:27<14:03,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/aramco-buyer-beware-the-risky-track-record-of-government-oil-1469700002: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/aramco-buyer-beware-the-risky-track-record-of-government-oil-1469700002


Processing URLs:  12%|█▏        | 116/1000 [04:34<18:02,  1.22s/it]

Error extracting text from https://undocs.org/en/S/2019/961: HTTPSConnectionPool(host='daccess-ods.un.org', port=443): Max retries exceeded with url: /access.nsf/Get?OpenAgent&DS=S/2019/961&Lang=E (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
Error extracting text from https://news.yahoo.com/russian-troops-seek-encircle-kyiv-012608847.html: 404 Client Error: Not Found for url: https://news.yahoo.com/russian-troops-seek-encircle-kyiv-012608847.html


Processing URLs:  12%|█▏        | 117/1000 [04:36<21:38,  1.47s/it]

Error extracting text from http://www.sabc.co.za/news/a/e11194804e0a219a8259b35173dc1eac/SACP,-ANC-at-loggerheads-over-Zuma-20160829: 404 Client Error: Not Found for url: https://www.sabc.co.za:443/news/a/e11194804e0a219a8259b35173dc1eac/SACP,-ANC-at-loggerheads-over-Zuma-20160829


Processing URLs:  12%|█▏        | 122/1000 [04:44<19:06,  1.31s/it]

Error extracting text from http://www.nytimes.com/2015/11/24/business/international/volkswagen-chief-says-emissions-inquiry-may-take-months.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/24/business/international/volkswagen-chief-says-emissions-inquiry-may-take-months.html
URL filtered: http://www.youtube.com/watch?v=15YgdrhrCM8


Processing URLs:  12%|█▎        | 125/1000 [04:46<13:55,  1.05it/s]
error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.timemediakit.com/audience/: Document is empty


Processing URLs:  13%|█▎        | 127/1000 [04:48<14:34,  1.00s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0Z8238: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0Z8238


Processing URLs:  13%|█▎        | 129/1000 [04:50<14:00,  1.04it/s]

Error extracting text from http://www.dispatch.com/opinion/20171119/column-russias-lies-aimed-at-destabilizing-west: 404 Client Error: OK for url: https://www.dispatch.com/opinion/20171119/column-russias-lies-aimed-at-destabilizing-west


Processing URLs:  13%|█▎        | 131/1000 [04:52<12:50,  1.13it/s]

Error extracting text from http://www.tradingeconomics.com/france/gdp-growth: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/france/gdp-growth


Processing URLs:  14%|█▍        | 138/1000 [05:11<18:52,  1.31s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-01/greece-said-to-approach-accord-with-creditors-on-bailout-actions
Error extracting text from http://www.reuters.com/article/us-germany-crime-munich-idUSKCN1021YZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-crime-munich-idUSKCN1021YZ
URL filtered: http://www.bloomberg.com/news/articles/2016-11-23/iraq-will-participate-in-opec-oil-production-cuts-al-abadi-says


Processing URLs:  14%|█▍        | 140/1000 [05:12<15:22,  1.07s/it]

Error extracting text from https://www.bankofengland.co.uk/monetary-policy-summary-and-minutes/2020/december-2020: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/monetary-policy-summary-and-minutes/2020/december-2020


Processing URLs:  15%|█▍        | 147/1000 [05:23<25:28,  1.79s/it]

URL filtered: https://twitter.com/davidmcallister/status/1340762389499826176?s=19


Processing URLs:  15%|█▌        | 152/1000 [05:27<15:06,  1.07s/it]

Error extracting text from https://www.nytimes.com/2016/12/02/us/politics/obamacare-repeal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/12/02/us/politics/obamacare-repeal.html


Processing URLs:  15%|█▌        | 153/1000 [05:27<11:54,  1.18it/s]

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3893937: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3893937


Processing URLs:  15%|█▌        | 154/1000 [05:29<13:47,  1.02it/s]

Error extracting text from http://www.newsweek.com/us-military-fire-iran-ship-machine-gun-persian-gulf-641684: 403 Client Error: Forbidden for url: https://www.newsweek.com/us-military-fire-iran-ship-machine-gun-persian-gulf-641684


Processing URLs:  16%|█▌        | 155/1000 [05:29<11:43,  1.20it/s]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0375960105002331: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0375960105002331


Processing URLs:  16%|█▌        | 158/1000 [05:33<13:29,  1.04it/s]

Error extracting text from https://www.timeoutabudhabi.com/news/alicia-keys-expo-2020-dubai: 403 Client Error: HTTP Forbidden for url: https://www.timeoutabudhabi.com/news/alicia-keys-expo-2020-dubai


Processing URLs:  16%|█▌        | 159/1000 [06:33<4:19:13, 18.49s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/national/article177744986.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  16%|█▋        | 165/1000 [06:42<44:15,  3.18s/it]  

Error extracting text from http://www.nytimes.com/2016/04/21/world/asia/kabul-explosion-afghanistan.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/21/world/asia/kabul-explosion-afghanistan.html


Processing URLs:  17%|█▋        | 166/1000 [06:44<36:24,  2.62s/it]

Error extracting text from http://www.newsweek.com/isis-mosul-iraq-islamic-state-iraqi-army-pentagon-baghdad-barack-obama-ash-479839?rx=us: 403 Client Error: Forbidden for url: https://www.newsweek.com/isis-mosul-iraq-islamic-state-iraqi-army-pentagon-baghdad-barack-obama-ash-479839?rx=us


Processing URLs:  17%|█▋        | 172/1000 [06:52<20:07,  1.46s/it]

Error extracting text from https://www.nytimes.com/2018/02/23/world/africa/eighteen-killed-in-somalia-attacks.html?rref=collection%2Fsectioncollection%2Fworld&amp;action=click&amp;contentCollection=world&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=3&amp;pgtype=sectionfront: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/23/world/africa/eighteen-killed-in-somalia-attacks.html?rref=collection%2Fsectioncollection%2Fworld&amp;action=click&amp;contentCollection=world&amp;region=stream&amp;module=stream_unit&amp;version=latest&amp;contentPlacement=3&amp;pgtype=sectionfront


Processing URLs:  17%|█▋        | 174/1000 [06:55<22:36,  1.64s/it]

URL filtered: http://www.ibloomberg.net/opec-likely-to-increase-production-ceiling-with-indonesia-addition-sources/


Processing URLs:  18%|█▊        | 176/1000 [06:57<17:52,  1.30s/it]

Error extracting text from http://www.ibtimes.com/thaad-korean-peninsula-south-korea-undoubtedly-has-intention-install-us-missile-2377967: 403 Client Error: Forbidden for url: https://www.ibtimes.com/thaad-korean-peninsula-south-korea-undoubtedly-has-intention-install-us-missile-2377967


Processing URLs:  18%|█▊        | 181/1000 [07:08<24:14,  1.78s/it]

Error extracting text from http://www.worldtribune.com/report-calls-iran-top-mideast-threat-to-u-s-interests-by-far/: 403 Client Error: Forbidden for url: http://www.worldtribune.com/report-calls-iran-top-mideast-threat-to-u-s-interests-by-far/
URL filtered: https://www.youtube.com/watch?v=Wp4fKplfGRc


Processing URLs:  19%|█▉        | 188/1000 [07:21<23:57,  1.77s/it]

Error extracting text from http://www.medialifemagazine.com/time-inc-s-long-rough-summer-continues/: HTTPConnectionPool(host='www.medialifemagazine.com', port=80): Max retries exceeded with url: /time-inc-s-long-rough-summer-continues/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300934500>: Failed to resolve 'www.medialifemagazine.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  19%|█▉        | 189/1000 [07:21<19:03,  1.41s/it]

Error extracting text from https://www.wsj.com/articles/why-chinese-men-are-dying-1487933230?=e2fb&amp;mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/why-chinese-men-are-dying-1487933230?=e2fb&amp;mod=e2fb


Processing URLs:  19%|█▉        | 190/1000 [07:22<14:34,  1.08s/it]

Error extracting text from http://www.france24.com/en/20151124-no-french-ground-troops-syria-hollande: 403 Client Error: Forbidden for url: http://www.france24.com/en/20151124-no-french-ground-troops-syria-hollande


Processing URLs:  19%|█▉        | 191/1000 [07:22<11:14,  1.20it/s]

Error extracting text from http://www.wsj.com/articles/syria-islamic-state-dominate-europe-talks-1455191130?tesla=y: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syria-islamic-state-dominate-europe-talks-1455191130?tesla=y


Processing URLs:  19%|█▉        | 193/1000 [07:22<06:38,  2.03it/s]

Error extracting text from http://www.foreign.senate.gov/hearings/review-of-the-fy-2017-state-department-budget-request_022316: 403 Client Error: Forbidden for url: http://www.foreign.senate.gov/hearings/review-of-the-fy-2017-state-department-budget-request_022316
Error extracting text from http://www.reuters.com/article/us-tesla-deliveries-idUSKCN1220UR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-deliveries-idUSKCN1220UR


Processing URLs:  20%|█▉        | 195/1000 [07:25<11:37,  1.15it/s]

Error extracting text from http://www.caam.org.cn/qiyexinwen/20170104/0905203522.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/qiyexinwen/20170104/0905203522.html


Processing URLs:  20%|█▉        | 198/1000 [07:34<35:36,  2.66s/it]

URL filtered: https://www.youtube.com/watch?v=b9pQ5wtjaFM


Processing URLs:  20%|██        | 200/1000 [07:34<20:23,  1.53s/it]

Error extracting text from http://seekingalpha.com/article/3716786-apple-foxconn-data-is-not-a-false-positive: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3716786-apple-foxconn-data-is-not-a-false-positive


Processing URLs:  20%|██        | 201/1000 [07:35<19:10,  1.44s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-27/venezuela-state-oil-company-says-it-made-critical-debt-payment


Processing URLs:  21%|██        | 208/1000 [07:45<22:30,  1.70s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-18/south-africa-s-yield-premium-above-peer-average-for-first-time


Processing URLs:  21%|██        | 211/1000 [07:50<22:42,  1.73s/it]

Error extracting text from http://www.vestifinance.ru/articles/83862: HTTPSConnectionPool(host='vestifinance.ru', port=443): Max retries exceeded with url: /articles/83862 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  21%|██▏       | 213/1000 [07:52<17:30,  1.33s/it]

Error extracting text from http://www.wsj.com/articles/malaysias-attorney-general-najib-razak-received-681-million-personal-donation-from-saudi-royals-1453780909: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/malaysias-attorney-general-najib-razak-received-681-million-personal-donation-from-saudi-royals-1453780909


Processing URLs:  22%|██▏       | 216/1000 [11:59<10:58:19, 50.38s/it]

Error extracting text from http://seekingalpha.com/article/3719136-prospective-opec-meeting-communique: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3719136-prospective-opec-meeting-communique


Processing URLs:  22%|██▏       | 217/1000 [12:02<7:53:53, 36.31s/it] 

Error extracting text from http://www.worldairops.com/ASI/docs/ASI_MAP_ATSRoutesUpper_atWorldAirOps.com.pdf: EOF marker not found


Processing URLs:  22%|██▏       | 219/1000 [13:05<7:54:37, 36.46s/it]

Error extracting text from http://www.racked.com/2016/2/19/11052468/subscriptions-teen-vogue-self-asos-glamour: HTTPConnectionPool(host='www.racked.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  22%|██▏       | 220/1000 [13:05<5:36:38, 25.90s/it]

Error extracting text from http://www.un.org/sg/statements/index.asp?nid=9747: 403 Client Error: Forbidden for url: https://www.un.org/sg/statements/index.asp?nid=9747


Processing URLs:  22%|██▏       | 222/1000 [13:07<2:49:42, 13.09s/it]

Error extracting text from http://www.tradingeconomics.com/south-africa/rating: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/south-africa/rating


Processing URLs:  23%|██▎       | 226/1000 [13:16<1:04:14,  4.98s/it]

Error extracting text from http://thehill.com/policy/finance/256224-gop-meeting-erupts-over-ex-im-power-play: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/256224-gop-meeting-erupts-over-ex-im-power-play/


Processing URLs:  23%|██▎       | 227/1000 [13:18<52:07,  4.05s/it]  

Error extracting text from http://iranfrontpage.com/headlines/id/4525/: 404 Client Error: Not Found for url: https://iranfrontpage.com/headlines/id/4525/


Processing URLs:  23%|██▎       | 232/1000 [13:25<23:52,  1.87s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-idUSKCN0WU1AC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-idUSKCN0WU1AC


Processing URLs:  23%|██▎       | 233/1000 [13:28<29:06,  2.28s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/tankan-s-tightest-job-market-since-1992-to-weigh-on-japan-bonds


Processing URLs:  24%|██▎       | 236/1000 [13:33<24:24,  1.92s/it]

URL filtered: https://twitter.com/navalny/status/966619284658155520


Processing URLs:  24%|██▍       | 238/1000 [13:34<16:20,  1.29s/it]

Error extracting text from https://www.newsweek.com/vladimir-putin-russia-popular-politician-1556686: 403 Client Error: Forbidden for url: https://www.newsweek.com/vladimir-putin-russia-popular-politician-1556686


Processing URLs:  24%|██▍       | 240/1000 [13:44<32:31,  2.57s/it]

Error extracting text from https://www.wsj.com/articles/oil-enters-bear-market-as-investors-lose-faith-in-opecs-cuts-1498993202: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-enters-bear-market-as-investors-lose-faith-in-opecs-cuts-1498993202


Processing URLs:  24%|██▍       | 242/1000 [13:48<30:14,  2.39s/it]

Error extracting text from http://www.gunpolicy.org/firearms/search/?l=Burundi: 503 Server Error: Service Unavailable for url: https://www.gunpolicy.org/firearms/search/?l=Burundi


Processing URLs:  24%|██▍       | 243/1000 [13:49<24:01,  1.90s/it]

Error extracting text from https://www.accountingtoday.com/opinion/token-taxonomy-act-20-cooking-low-and-cooking-slow: 403 Client Error: Forbidden for url: https://www.accountingtoday.com/opinion/token-taxonomy-act-20-cooking-low-and-cooking-slow


Processing URLs:  25%|██▌       | 250/1000 [14:17<40:43,  3.26s/it]  

Error extracting text from http://www.tribuneledgernews.com/extra/news/in-spite-of-tax-hike-ratings-agency-eyes-change-that/article_bf6fce83-a9d8-5e94-9079-00924711f2b9.html: 404 Client Error: Not Found for url: https://www.tribuneledgernews.com/extra/news/in-spite-of-tax-hike-ratings-agency-eyes-change-that/article_bf6fce83-a9d8-5e94-9079-00924711f2b9.html


Processing URLs:  25%|██▌       | 254/1000 [14:27<29:29,  2.37s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/other/ryan_favorableunfavorable-3468.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/other/ryan_favorableunfavorable-3468.html#polls


Processing URLs:  26%|██▌       | 256/1000 [14:29<21:36,  1.74s/it]

Error extracting text from http://www.cctv-america.com/2016/11/06/mosul-offensive-continues: 403 Client Error: Forbidden for url: http://america.cgtn.com/2016/11/06/mosul-offensive-continues


Processing URLs:  26%|██▌       | 258/1000 [14:31<15:07,  1.22s/it]

Error extracting text from http://mashable.com/2017/04/23/drones-social-good-humanitarian-aid/#I5g4ZXeVfiq7: 404 Client Error: Not Found for url: https://mashable.com/2017/04/23/drones-social-good-humanitarian-aid/#I5g4ZXeVfiq7


Processing URLs:  26%|██▌       | 259/1000 [14:33<19:44,  1.60s/it]

Error extracting text from http://af.reuters.com/article/topNews/idAFKCN0VG0BK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=af


Processing URLs:  26%|██▌       | 262/1000 [14:40<22:57,  1.87s/it]

Error extracting text from http://www.equalrightsamendment.org/states.htm: 404 Client Error: Not Found for url: https://www.equalrightsamendment.org/states.htm
Error extracting text from http://www.reuters.com/article/us-venezuela-india-oil-insight-idUSKBN16F0IR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-india-oil-insight-idUSKBN16F0IR


Processing URLs:  27%|██▋       | 268/1000 [14:55<23:21,  1.92s/it]

Error extracting text from http://www.elevenmyanmar.com/politics/myanmar-government-approves-political-dialogue-framework: 404 Client Error: Not Found for url: https://www.elevenmyanmar.com/politics/myanmar-government-approves-political-dialogue-framework


Processing URLs:  27%|██▋       | 269/1000 [14:56<22:21,  1.83s/it]

Error extracting text from https://www.justsecurity.org/42272/making-russian-spy-roadmap-fbi-resolve-russia-gate: 403 Client Error: Forbidden for url: https://www.justsecurity.org/42272/making-russian-spy-roadmap-fbi-resolve-russia-gate


Processing URLs:  27%|██▋       | 272/1000 [15:02<24:13,  2.00s/it]

Error extracting text from https://ec.europa.eu/eurostat/documents/2995521/14083883/2-07012022-AP-EN.pdf/49039c42-31ea-3513-8307-eece31d6b25a): 404 Client Error:  for url: https://ec.europa.eu/eurostat/documents/2995521/14083883/2-07012022-AP-EN.pdf/49039c42-31ea-3513-8307-eece31d6b25a)


Processing URLs:  27%|██▋       | 273/1000 [15:03<21:25,  1.77s/it]

Error extracting text from http://news.yahoo.com/oas-chief-says-conditions-dont-ensure-fair-venezuela-175154648.html: 404 Client Error: Not Found for url: http://news.yahoo.com/oas-chief-says-conditions-dont-ensure-fair-venezuela-175154648.html


Processing URLs:  28%|██▊       | 281/1000 [15:15<15:33,  1.30s/it]

Error extracting text from http://gcaptain.com/a-concrete-sample-was-pulled-from-the-new-panama-canal-locks-and-it-does-not-look-good/#.VkwlKHYrLcv: 403 Client Error: Forbidden for url: http://gcaptain.com/a-concrete-sample-was-pulled-from-the-new-panama-canal-locks-and-it-does-not-look-good/#.VkwlKHYrLcv


Processing URLs:  28%|██▊       | 283/1000 [15:17<11:49,  1.01it/s]

Error extracting text from https://www.niaid.nih.gov/research/antiviral-discovery: 403 Client Error: Forbidden for url: https://www.niaid.nih.gov/research/antiviral-discovery
Error extracting text from http://www.nytimes.com/2015/12/01/us/politics/iraqi-forces-prepare-next-us-backed-attack-on-isis-with-mosul-on-horizon.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/01/us/politics/iraqi-forces-prepare-next-us-backed-attack-on-isis-with-mosul-on-horizon.html


Processing URLs:  29%|██▉       | 291/1000 [15:30<13:58,  1.18s/it]

Error extracting text from https://www.reuters.com/world/americas/brazils-bolsonaro-says-aid-poor-rise-nearly-60-due-food-costs-2021-06-16/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazils-bolsonaro-says-aid-poor-rise-nearly-60-due-food-costs-2021-06-16/


Processing URLs:  29%|██▉       | 292/1000 [15:31<13:03,  1.11s/it]

Error extracting text from http://link.washingtonpost.com/view/546d0c123b35d071228b77cf4nahm.5dof/7264fe76: HTTPConnectionPool(host='link.washingtonpost.com', port=80): Max retries exceeded with url: /view/546d0c123b35d071228b77cf4nahm.5dof/7264fe76 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300402960>: Failed to resolve 'link.washingtonpost.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 298/1000 [15:38<11:37,  1.01it/s]

Error extracting text from https://www.eureporter.co/frontpage/2016/10/28/mfa-makes-statement-on-polish-governments-response-to-commission-recommadation/: 403 Client Error: Forbidden for url: https://www.eureporter.co/frontpage/2016/10/28/mfa-makes-statement-on-polish-governments-response-to-commission-recommadation/


Processing URLs:  30%|██▉       | 299/1000 [15:38<09:14,  1.26it/s]

Error extracting text from https://www.wsj.com/articles/as-alliances-shift-syrias-tangle-of-wars-grows-more-dangerous-1518690600?mod=searchresults&amp;page=1&amp;pos=1: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/as-alliances-shift-syrias-tangle-of-wars-grows-more-dangerous-1518690600?mod=searchresults&amp;page=1&amp;pos=1


Processing URLs:  30%|███       | 301/1000 [15:41<11:38,  1.00it/s]

Error extracting text from https://tradingeconomics.com/qatar/rating: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/qatar/rating


Processing URLs:  30%|███       | 302/1000 [15:41<09:16,  1.25it/s]

Error extracting text from https://www.scotsman.com/news/politics/alex-salmonds-alba-party-to-essentially-manipulate-list-system-to-secure-support-for-scottish-independence-3180597: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/alex-salmonds-alba-party-to-essentially-manipulate-list-system-to-secure-support-for-scottish-independence-3180597
Error extracting text from http://www.business-standard.com/article/pti-stories/china-set-to-declare-adiz-over-south-china-sea-report-116060101617_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/pti-stories/china-set-to-declare-adiz-over-south-china-sea-report-116060101617_1.html


Processing URLs:  30%|███       | 304/1000 [15:42<06:37,  1.75it/s]

Error extracting text from http://allafrica.com/stories/201605170512.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201605170512.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30748f740>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  30%|███       | 305/1000 [15:52<35:41,  3.08s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/merkels-party-wins-election-in-rivals-german-heartland/2017/05/14/a3de8edc-3913-11e7-a59b-26e0451a96fd_story.html?utm_term=.bfabf8c18d3a: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/merkels-party-wins-election-in-rivals-german-heartland/2017/05/14/a3de8edc-3913-11e7-a59b-26e0451a96fd_story.html?utm_term=.bfabf8c18d3a


Processing URLs:  31%|███       | 306/1000 [15:53<27:26,  2.37s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/01/21/Syria-talks-to-start-even-if-opposition-boycotts-Russian-diplomat.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/01/21/Syria-talks-to-start-even-if-opposition-boycotts-Russian-diplomat.html


Processing URLs:  31%|███       | 307/1000 [15:54<25:24,  2.20s/it]

Error extracting text from http://www.theglobeandmail.com/globe-investor/inside-the-market/market-updates/premarket-global-stocks-climb-as-brexit-grexit-risks-ease/article30134320/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/globe-investor/inside-the-market/market-updates/premarket-global-stocks-climb-as-brexit-grexit-risks-ease/article30134320/


Processing URLs:  31%|███       | 308/1000 [15:55<20:04,  1.74s/it]

Error extracting text from http://www.hybridcars.com/chevy-bolt-production-confirmed-for-2016/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/chevy-bolt-production-confirmed-for-2016/
URL filtered: http://www.bloomberg.com/news/articles/2016-02-16/saudi-arabia-and-russia-agree-oil-output-freeze-in-qatar-talks


Processing URLs:  31%|███       | 310/1000 [15:58<19:23,  1.69s/it]

Error extracting text from https://reut.rs/2Ww7qyx: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  31%|███       | 312/1000 [16:09<42:09,  3.68s/it]

URL filtered: https://www.youtube.com/watch?v=cpJ5Oz6blXs


Processing URLs:  32%|███▏      | 322/1000 [16:31<25:38,  2.27s/it]

Error extracting text from http://pressroom.toyota.com/releases/toyota-uber-ridesharing-collaboration-may-24.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/toyota-uber-ridesharing-collaboration-may-24/


Processing URLs:  32%|███▎      | 325/1000 [16:34<15:08,  1.35s/it]

Error extracting text from http://www.nytimes.com/2016/06/11/business/tesla-motors-model-s-suspension.html?action=click&amp;contentCollection=Business%20Day&amp;module=RelatedCoverage&amp;region=EndOfArticle&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/11/business/tesla-motors-model-s-suspension.html?action=click&amp;contentCollection=Business%20Day&amp;module=RelatedCoverage&amp;region=EndOfArticle&amp;pgtype=article


Processing URLs:  33%|███▎      | 331/1000 [16:48<19:36,  1.76s/it]

Error extracting text from https://www.tennisworldusa.org/tennis/news/Rafael_Nadal/111907/patrick-mouratoglou-explains-challenges-rafael-nadal-faces-in-french-open-preparation/: 403 Client Error: Forbidden for url: https://www.tennisworldusa.org/tennis/news/Rafael_Nadal/111907/patrick-mouratoglou-explains-challenges-rafael-nadal-faces-in-french-open-preparation/


Processing URLs:  33%|███▎      | 333/1000 [17:20<1:54:08, 10.27s/it]

Error extracting text from http://www.todayszaman.com/columnist/joost-lagendijk/a-blessing-in-disguise_406723.html: 522 Server Error:  for url: http://www.todayszaman.com/columnist/joost-lagendijk/a-blessing-in-disguise_406723.html


Processing URLs:  34%|███▎      | 336/1000 [17:24<48:23,  4.37s/it]  

Error extracting text from https://www.salon.com/2017/01/18/congresswoman-gabbard-makes-unannounced-trip-to-syria/: 404 Client Error: Not Found for url: https://www.salon.com/2017/01/18/congresswoman-gabbard-makes-unannounced-trip-to-syria/


Processing URLs:  34%|███▍      | 338/1000 [17:28<34:18,  3.11s/it]

URL filtered: https://www.youtube.com/watch?v=upvZdVK913I


Processing URLs:  34%|███▍      | 340/1000 [17:28<19:06,  1.74s/it]

Error extracting text from https://www.nytimes.com/2021/06/08/us/politics/china-bill-passes.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/08/us/politics/china-bill-passes.html


Processing URLs:  34%|███▍      | 341/1000 [18:29<2:59:18, 16.33s/it]

Error extracting text from http://www.marshallcenter.org/mcpublicweb/en/nav-itemid-fix-news-en/1877-art-news-1-29-jan-16-en.html: HTTPConnectionPool(host='www.marshallcenter.org', port=80): Max retries exceeded with url: /mcpublicweb/en/nav-itemid-fix-news-en/1877-art-news-1-29-jan-16-en.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3011f0350>, 'Connection to www.marshallcenter.org timed out. (connect timeout=60)'))
URL filtered: https://www.linkedin.com/pulse/new-thinking-proposed-understand-insider-spying-human-steve-hammons


Processing URLs:  34%|███▍      | 344/1000 [18:33<1:28:32,  8.10s/it]

Error extracting text from https://www.osw.waw.pl/en/publikacje/osw-commentary/2021-05-17/great-ambitions-russia-expands-lng-market: HTTPSConnectionPool(host='www.osw.waw.pl', port=443): Max retries exceeded with url: /en/publikacje/osw-commentary/2021-05-17/great-ambitions-russia-expands-lng-market (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  35%|███▍      | 346/1000 [18:36<54:39,  5.01s/it]  

Error extracting text from http://www.latimes.com/business/technology/la-fi-tn-apple-earnings-20151027-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/technology/la-fi-tn-apple-earnings-20151027-story.html


Processing URLs:  35%|███▍      | 348/1000 [18:38<33:17,  3.06s/it]

Error extracting text from https://au.news.yahoo.com/world/a/35472824/eus-barnier-says-wants-brexit-talks-without-aggressivity/#page1: 404 Client Error: Not Found for url: https://au.news.yahoo.com/eus-barnier-says-wants-brexit-talks-without-aggressivity-35472824.html#page1


Processing URLs:  35%|███▍      | 349/1000 [18:38<24:33,  2.26s/it]

Error extracting text from http://www.reuters.com/article/2015/10/08/us-mideast-crisis-syria-nato-idUSKCN0S20HJ20151008#JecTh13CPAoDCTX8.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/08/us-mideast-crisis-syria-nato-idUSKCN0S20HJ20151008#JecTh13CPAoDCTX8.97


Processing URLs:  35%|███▌      | 351/1000 [18:40<19:01,  1.76s/it]

Error extracting text from http://europe.newsweek.com/brexit-boris-johnson-donald-tusk-hitler-political-amnesia-460681: 403 Client Error: Forbidden for url: https://www.newsweek.com/brexit-boris-johnson-donald-tusk-hitler-460681


Processing URLs:  35%|███▌      | 353/1000 [18:42<13:03,  1.21s/it]

Error extracting text from http://www.hybridcars.com/october-2016-dashboard/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/october-2016-dashboard/


Processing URLs:  35%|███▌      | 354/1000 [18:43<13:19,  1.24s/it]

URL filtered: https://mobile.twitter.com/WFUNA/status/780425312538652672


Processing URLs:  36%|███▌      | 356/1000 [18:44<10:00,  1.07it/s]

Error extracting text from http://www.amazon.com/gp/new-releases/books/3/ref=zg_bsnr_unv_b_2_2689_2: 503 Server Error: Service Unavailable for url: https://www.amazon.com/gp/new-releases/books/3/ref=zg_bsnr_unv_b_2_2689_2


Processing URLs:  36%|███▌      | 358/1000 [18:46<10:04,  1.06it/s]

Error extracting text from http://www.reuters.com/article/us-turkey-russia-diplomat-idUSKBN1490MK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-russia-diplomat-idUSKBN1490MK
URL filtered: https://www.youtube.com/watch?v=wgTPH5y1-ZI


Processing URLs:  36%|███▌      | 360/1000 [18:47<06:17,  1.69it/s]

Error extracting text from https://www.wsj.com/articles/senate-gop-hits-resistance-on-estate-tax-repealfrom-republicans-1507220889?wpmm=1&amp;wpisrc=nl_daily202: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/senate-gop-hits-resistance-on-estate-tax-repealfrom-republicans-1507220889?wpmm=1&amp;wpisrc=nl_daily202


Processing URLs:  36%|███▌      | 361/1000 [18:48<06:58,  1.53it/s]

Error extracting text from https://1tvnews.af/18/03/2021/7838/: 406 Client Error: Not Acceptable for url: https://1tvnews.af/18/03/2021/7838/


Processing URLs:  36%|███▋      | 363/1000 [18:52<12:18,  1.16s/it]

Error extracting text from http://www.inhouselawyer.co.uk/index.php/fraud-and-corporate-crime/10073-consent-and-connivance-the-criminal-liability-of-directors-and-senior-officers: 404 Client Error: Not Found for url: https://www.inhouselawyer.co.uk/fraud-and-corporate-crime/10073-consent-and-connivance-the-criminal-liability-of-directors-and-senior-officers
Error extracting text from https://www.reuters.com/world/middle-east/ahead-talks-opec-forecasts-point-oil-supply-deficit-august-2021-06-28/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/ahead-talks-opec-forecasts-point-oil-supply-deficit-august-2021-06-28/


Processing URLs:  37%|███▋      | 366/1000 [18:54<10:16,  1.03it/s]

Error extracting text from http://www.cfr.org/global/global-conflict-tracker/p32137#!/p32137: 404 Client Error: Not Found for url: https://www.cfr.org/global/global-conflict-tracker/p32137#!/p32137


Processing URLs:  37%|███▋      | 370/1000 [19:00<13:02,  1.24s/it]

Error extracting text from http://espn.go.com/espn/otl/story/_/id/13533995/split-nfl-new-england-patriots-apart: 403 Client Error: Forbidden for url: http://espn.go.com/espn/otl/story/_/id/13533995/split-nfl-new-england-patriots-apart


Processing URLs:  38%|███▊      | 375/1000 [19:03<06:39,  1.56it/s]

Error extracting text from http://www.nytimes.com/2016/04/16/world/europe/russian-forces-remain-heavily-involved-in-syria-despite-appearances.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/16/world/europe/russian-forces-remain-heavily-involved-in-syria-despite-appearances.html?_r=0


Processing URLs:  38%|███▊      | 376/1000 [19:04<08:01,  1.30it/s]

Error extracting text from http://www.un.org/en/ga/search/view_doc.asp?symbol=A/S-30/PV.3: 403 Client Error: Forbidden for url: https://www.un.org/en/ga/search/view_doc.asp?symbol=A/S-30/PV.3


Processing URLs:  38%|███▊      | 380/1000 [19:11<19:33,  1.89s/it]

URL filtered: https://www.campaignsandelections.com/campaign-insider/reaching-cord-cutting-voters?utm_medium=social&amp;utm_source=facebook&amp;utm_campaign=content


Processing URLs:  38%|███▊      | 382/1000 [19:12<12:09,  1.18s/it]

Error extracting text from http://aranews.net/2016/04/pentagon-expect-critical-peshmerga-role-mosul-415-million-aid/: 404 Client Error: Not Found for url: http://aranews.net/2016/04/pentagon-expect-critical-peshmerga-role-mosul-415-million-aid/


Processing URLs:  38%|███▊      | 385/1000 [19:14<08:24,  1.22it/s]

Error extracting text from http://www.wsj.com/articles/the-feds-inflation-expectations-game-1448214981: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-feds-inflation-expectations-game-1448214981


Processing URLs:  39%|███▊      | 386/1000 [19:20<24:09,  2.36s/it]

Error extracting text from http://africa.tvcnews.tv/2016/06/30/anc-youths-call-fund-raisers-help-zuma-pay-nkandla-bill/: HTTPConnectionPool(host='africa.tvcnews.tv', port=80): Max retries exceeded with url: /2016/06/30/anc-youths-call-fund-raisers-help-zuma-pay-nkandla-bill/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fe654410>: Failed to establish a new connection: [Errno 51] Network is unreachable'))


Processing URLs:  39%|███▉      | 388/1000 [19:23<18:47,  1.84s/it]

Error extracting text from http://thehill.com/policy/energy-environment/255618-senate-panel-approves-oil-export-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/255618-senate-panel-approves-oil-export-bill/


Processing URLs:  39%|███▉      | 390/1000 [19:25<12:52,  1.27s/it]

Error extracting text from http://mobile.nytimes.com/2016/05/28/world/asia/pakistan-nawaz-sharif-open-heart-surgery.html?_r=0&amp;referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/05/28/world/asia/pakistan-nawaz-sharif-open-heart-surgery.html?_r=0&amp;referer=https://www.google.com/


Processing URLs:  40%|███▉      | 396/1000 [19:33<09:05,  1.11it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-albab-kremlin-idUSKBN15P0ZI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-albab-kremlin-idUSKBN15P0ZI


Processing URLs:  40%|███▉      | 397/1000 [19:33<07:15,  1.38it/s]

Error extracting text from http://www.wsj.com/articles/north-korea-may-have-taken-steps-for-new-nuclear-bomb-test-1445341415: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-may-have-taken-steps-for-new-nuclear-bomb-test-1445341415


Processing URLs:  40%|████      | 400/1000 [20:35<3:06:01, 18.60s/it]

Error extracting text from http://www.mcclatchydc.com/news/politics-government/article127231799.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  40%|████      | 402/1000 [20:37<1:35:59,  9.63s/it]

Error extracting text from http://uk.reuters.com/article/uk-usa-china-drone-idUKKBN1452HV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  40%|████      | 403/1000 [20:38<1:11:16,  7.16s/it]

Error extracting text from http://www-odi.nhtsa.dot.gov/acms/cs/jaxrs/download/doc/UCM533484/RCAK-16V436-7499.pdf: 404 Client Error: Not Found for url: https://www-odi.nhtsa.dot.gov/acms/cs/jaxrs/download/doc/UCM533484/RCAK-16V436-7499.pdf


Processing URLs:  40%|████      | 405/1000 [20:39<36:57,  3.73s/it]  

Error extracting text from http://www.nytimes.com/2015/11/29/opinion/sunday/voting-and-self-interest.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/29/opinion/sunday/voting-and-self-interest.html


Processing URLs:  41%|████      | 409/1000 [20:53<34:56,  3.55s/it]

Error extracting text from http://postimg.org/image/hczrddqej/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/hczrddqej/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f2cc0>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  41%|████      | 412/1000 [20:59<25:07,  2.56s/it]

Error extracting text from https://www.politics.co.uk/comment-analysis/2020/10/28/polling-deep-dive-shows-labour-taking-first-steps-towards-po: 403 Client Error: Forbidden for url: https://www.politics.co.uk/comment-analysis/2020/10/28/polling-deep-dive-shows-labour-taking-first-steps-towards-po


Processing URLs:  42%|████▏     | 418/1000 [21:09<14:42,  1.52s/it]

Error extracting text from https://www.wsj.com/articles/india-china-strike-out-in-talks-to-ease-border-dispute-11633962853: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/india-china-strike-out-in-talks-to-ease-border-dispute-11633962853


Processing URLs:  42%|████▏     | 422/1000 [21:16<15:58,  1.66s/it]

Error extracting text from http://www.eutimes.net/2016/01/beyond-top-secret-hillary-clinton-emails-used-in-russian-court-against-ukraine-pilot/: 406 Client Error: Not Acceptable for url: http://www.eutimes.net/2016/01/beyond-top-secret-hillary-clinton-emails-used-in-russian-court-against-ukraine-pilot/


Processing URLs:  43%|████▎     | 427/1000 [21:21<07:49,  1.22it/s]

Error extracting text from https://www.reuters.com/article/us-britain-eu-vote/half-of-britons-support-a-second-vote-on-brexit-poll-finds-idUSKBN1DX0P6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-vote/half-of-britons-support-a-second-vote-on-brexit-poll-finds-idUSKBN1DX0P6
Error extracting text from http://www.researchgate.net/publication/281765164: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/281765164


Processing URLs:  43%|████▎     | 428/1000 [21:22<08:43,  1.09it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VD07T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VD07T


Processing URLs:  43%|████▎     | 432/1000 [21:28<13:28,  1.42s/it]

Error extracting text from http://en.trend.az/iran/politics/2491893.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2491893.html


Processing URLs:  43%|████▎     | 434/1000 [21:30<10:22,  1.10s/it]

Error extracting text from http://www.reuters.com/article/us-usa-military-iraq-idUSKCN0WW2IF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-military-iraq-idUSKCN0WW2IF


Processing URLs:  44%|████▍     | 438/1000 [21:36<16:02,  1.71s/it]

URL filtered: https://www.youtube.com/watch?v=WVqXE9ZY5wk


Processing URLs:  44%|████▍     | 443/1000 [21:41<10:36,  1.14s/it]

URL filtered: http://www.rand.org/blog/2016/03/beijing-ups-the-ante-in-south-china-sea-dispute-with.html?utm_source=linkedin.com&amp;utm_medium=rand_social


Processing URLs:  45%|████▌     | 454/1000 [21:52<06:37,  1.37it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://opiniao.estadao.com.br/noticias/geral,a-verdade-do-impeachment,10000007338&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://opiniao.estadao.com.br/noticias/geral,a-verdade-do-impeachment,10000007338&amp;prev=search


Processing URLs:  46%|████▌     | 457/1000 [21:56<11:06,  1.23s/it]

Error extracting text from http://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y31G2015120: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUSL8N13Y31G2015120
URL filtered: http://money.cnn.com/2017/01/25/technology/facebook-trending-topics-fake-news-update/


Processing URLs:  46%|████▌     | 462/1000 [21:59<07:24,  1.21it/s]

Error extracting text from https://www.amazon.com/Russians-Hedrick-Smith/dp/0812905210/ref=sr_1_1?crid=25WQ76KDESZO9&amp;dchild=1&amp;keywords=the+russians+hedrick+smith&amp;qid=1616042267&amp;s=books&amp;sprefix=the+russians+%2Cstripbooks%2C214&amp;sr=1-1: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Russians-Hedrick-Smith/dp/0812905210/ref=sr_1_1?crid=25WQ76KDESZO9&amp;dchild=1&amp;keywords=the+russians+hedrick+smith&amp;qid=1616042267&amp;s=books&amp;sprefix=the+russians+%2Cstripbooks%2C214&amp;sr=1-1
URL filtered: http://www.bloomberg.com/news/articles/2015-09-24/brazil-impeachment-battle-heats-up-before-rousseff-flies-to-u-s-


Processing URLs:  47%|████▋     | 467/1000 [22:05<10:34,  1.19s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/12/gitrep-11mar16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/12/gitrep-11mar16pm/


Processing URLs:  47%|████▋     | 473/1000 [22:19<12:34,  1.43s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-04-06/leading-iranian-cleric-enters-election-in-threat-to-rouhani


Processing URLs:  48%|████▊     | 475/1000 [22:21<11:34,  1.32s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-20/greek-bailout-talks-end-without-deal-as-migrant-challenges-grow


Processing URLs:  48%|████▊     | 485/1000 [22:32<06:36,  1.30it/s]

Error extracting text from https://www.si.com/mlb/dodgers/news/mlb-news-scott-boras-leveraging-his-power-in-cba-negotiations-says-mlb-expert: 403 Client Error: Forbidden for url: https://www.si.com/mlb/dodgers/news/mlb-news-scott-boras-leveraging-his-power-in-cba-negotiations-says-mlb-expert


Processing URLs:  49%|████▊     | 487/1000 [22:34<08:50,  1.03s/it]

Error extracting text from https://reut.rs/3jyrf36: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-usa-carriers/two-u-s-carrier-groups-conduct-exercises-in-south-china-sea-idUSKBN2A90I5


Processing URLs:  49%|████▉     | 488/1000 [22:35<07:32,  1.13it/s]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/361384-poll-majority-believes-moore-should-be-expelled-if-elected: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/361384-poll-majority-believes-moore-should-be-expelled-if-elected/


Processing URLs:  49%|████▉     | 489/1000 [22:37<09:19,  1.10s/it]

Error extracting text from http://www.ibtimes.com/eu-refugee-crisis-2016-over-91k-asylum-seekers-arrived-germany-january-2293568: 403 Client Error: Forbidden for url: https://www.ibtimes.com/eu-refugee-crisis-2016-over-91k-asylum-seekers-arrived-germany-january-2293568


Processing URLs:  49%|████▉     | 490/1000 [22:37<08:35,  1.01s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-03/20/c_135204800.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-03/20/c_135204800.htm


Processing URLs:  49%|████▉     | 492/1000 [22:41<12:11,  1.44s/it]

Error extracting text from http://www.fews.net/critical-situation-nigeria: 404 Client Error: Not Found for url: https://fews.net:443/critical-situation-nigeria


Processing URLs:  50%|████▉     | 496/1000 [22:45<09:44,  1.16s/it]

Error extracting text from https://newlinesmag.com/argument/keir-starmer-won-the-war-within-labour-but-can-he-win-an-election/: 403 Client Error: Forbidden for url: https://newlinesmag.com/argument/keir-starmer-won-the-war-within-labour-but-can-he-win-an-election/


Processing URLs:  50%|████▉     | 498/1000 [22:46<06:56,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-china-corruption-idUSKBN1A203N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-corruption-idUSKBN1A203N


Processing URLs:  50%|█████     | 501/1000 [22:50<06:59,  1.19it/s]

Error extracting text from https://www.sciencedirect.com/science/article/pii/S0191886912000840: 403 Client Error: Forbidden for url: https://www.sciencedirect.com/science/article/pii/S0191886912000840
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-raqqa-idUSKBN16G1CB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-raqqa-idUSKBN16G1CB


Processing URLs:  51%|█████     | 506/1000 [23:06<25:12,  3.06s/it]

Error extracting text from https://icgstrategicteam.wikispaces.com/: HTTPSConnectionPool(host='icgstrategicteam.wikispaces.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x306e0da30>: Failed to resolve 'icgstrategicteam.wikispaces.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  51%|█████     | 510/1000 [23:35<1:06:12,  8.11s/it]

Error extracting text from https://www.investopedia.com/terms/t/truck-tonnage-index.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/t/truck-tonnage-index.asp


Processing URLs:  51%|█████     | 511/1000 [23:36<49:29,  6.07s/it]  

Error extracting text from https://www.hsgac.senate.gov/media/majority-media/lieberman-details-government-management-needs: 403 Client Error: Forbidden for url: https://www.hsgac.senate.gov/media/majority-media/lieberman-details-government-management-needs
URL filtered: http://www.bloomberg.com/news/articles/2016-09-25/citigroup-accelerates-saudi-return-plans-with-megabond-mandate


Processing URLs:  51%|█████▏    | 513/1000 [23:38<29:44,  3.66s/it]

Error extracting text from http://www.iiss.org/en/events/eu%20conference/sections/eu-conference-2015-6aba/special-session-1-a350/special-session-5-0352: 404 Client Error: Not Found for url: https://www.iiss.org/en/events/eu%20conference/sections/eu-conference-2015-6aba/special-session-1-a350/special-session-5-0352


Processing URLs:  51%|█████▏    | 514/1000 [23:39<23:56,  2.95s/it]

Error extracting text from http://www.reuters.com/article/us-iran-missiles-usa-idUSKCN0WQ1NE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-missiles-usa-idUSKCN0WQ1NE


Processing URLs:  52%|█████▏    | 516/1000 [23:41<17:19,  2.15s/it]

Error extracting text from http://www.nationalreview.com/article/425750/gop-establishment-thinks-trump-could-win?target=topic&amp;tid=1707: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/425750/gop-establishment-thinks-trump-could-win/


Processing URLs:  52%|█████▏    | 521/1000 [23:50<13:17,  1.66s/it]

Error extracting text from http://www.samaa.tv/pakistan/2016/10/gunmen-storm-police-academy-college-in-quetta/: 403 Client Error: Forbidden for url: https://www.samaa.tv/pakistan/2016/10/gunmen-storm-police-academy-college-in-quetta/
Error extracting text from https://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/national_transportation_statistics/html/table_02_01.html: HTTPSConnectionPool(host='www.rita.dot.gov', port=443): Max retries exceeded with url: /bts/sites/rita.dot.gov.bts/files/publications/national_transportation_statistics/html/table_02_01.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303320650>: Failed to resolve 'www.rita.dot.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  52%|█████▏    | 524/1000 [23:51<06:29,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-witne-idUSKBN17E09O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-witne-idUSKBN17E09O


Processing URLs:  53%|█████▎    | 527/1000 [23:51<03:19,  2.37it/s]

Error extracting text from http://www.wsj.com/articles/omans-oil-minister-says-over-production-irresponsible-1447074611: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/omans-oil-minister-says-over-production-irresponsible-1447074611
URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/misbehaving-markets-seen-no-barrier-this-time-to-unstoppable-fed
Error extracting text from http://www.nytimes.com/2016/03/17/business/dealbook/london-stock-exchange-deutsche-borse-merger.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/17/business/dealbook/london-stock-exchange-deutsche-borse-merger.html?_r=0


Processing URLs:  53%|█████▎    | 528/1000 [23:52<04:07,  1.91it/s]

Error extracting text from https://finance.yahoo.com/news/analysis-iran-oil-output-faces-040845890.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/analysis-iran-oil-output-faces-040845890.html


Processing URLs:  53%|█████▎    | 530/1000 [23:56<07:40,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN18H14J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN18H14J


Processing URLs:  53%|█████▎    | 531/1000 [23:57<07:15,  1.08it/s]

Error extracting text from https://larswericson.wordpress.com/2015/12/15/monday-night-picks/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2015/12/15/monday-night-picks/


Processing URLs:  53%|█████▎    | 534/1000 [24:04<15:12,  1.96s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2016/Jun-01/354752-syrian-opposition-proposes-nationwide-ramadan-truce.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Jun-01/354752-syrian-opposition-proposes-nationwide-ramadan-truce.ashx


Processing URLs:  54%|█████▍    | 539/1000 [24:11<09:34,  1.25s/it]

Error extracting text from http://www.timesofisrael.com/putin-told-assad-to-go-or-be-made-to-go-israeli-officials-say/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/putin-told-assad-to-go-or-be-made-to-go-israeli-officials-say/


Processing URLs:  55%|█████▌    | 553/1000 [24:28<08:32,  1.15s/it]

Error extracting text from https://www.recordedfuture.com/live/sc/16xVQXuDNh21: 404 Client Error: Not Found for url: https://www.recordedfuture.com/live/sc/16xVQXuDNh21


Processing URLs:  56%|█████▌    | 560/1000 [24:38<08:28,  1.15s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/05/06/national/politics-diplomacy/putin-invite-abe-september-vladivostok-economic-forum/#.Vyw5zdieqnM: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/05/06/national/politics-diplomacy/putin-invite-abe-september-vladivostok-economic-forum/#.Vyw5zdieqnM
URL filtered: https://www.youtube.com/watch?v=UR1OpEvpZ_o


Processing URLs:  56%|█████▌    | 562/1000 [25:38<1:45:46, 14.49s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-12-08/roy-moore-chastises-america-praises-putin-in-newly-resurfaced-interview: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  56%|█████▋    | 563/1000 [25:39<1:21:30, 11.19s/it]

Error extracting text from http://www.washingtoninstitute.org/policy-analysis/view/hezbollah-fatalities-in-the-syrian: 404 Client Error: Not Found for url: https://www.washingtoninstitute.org/policy-analysis/view/hezbollah-fatalities-in-the-syrian


Processing URLs:  56%|█████▋    | 565/1000 [25:43<49:44,  6.86s/it]  

Error extracting text from http://www.reuters.com/article/us-myanmar-politics-idUSKCN0VL16I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-myanmar-politics-idUSKCN0VL16I


Processing URLs:  57%|█████▋    | 567/1000 [25:45<28:36,  3.96s/it]

Error extracting text from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/sherman-kent-and-the-board-of-national-estimates-collected-essays/6words.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/sherman-kent-and-the-board-of-national-estimates-collected-essays/6words.html


Processing URLs:  57%|█████▋    | 569/1000 [25:48<20:20,  2.83s/it]

Error extracting text from http://www.reuters.com/article/us-usa-fed-minutes-idUSKBN1612GZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-fed-minutes-idUSKBN1612GZ


Processing URLs:  57%|█████▊    | 575/1000 [25:56<10:05,  1.43s/it]

Error extracting text from http://www.cnbc.com/2014/12/07/&gt: 404 Client Error: Not Found for url: https://www.cnbc.com/2014/12/07/&gt


Processing URLs:  58%|█████▊    | 578/1000 [26:02<12:44,  1.81s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-28/bond-traders-boost-bets-on-december-fed-interest-rate-increase


Processing URLs:  58%|█████▊    | 580/1000 [26:05<11:04,  1.58s/it]

Error extracting text from http://www.thenational.ae/business/energy/lenders-must-act-to-avoid-another-round-of-commodity-defaults: 404 Client Error: Not Found for url: https://www.thenationalnews.com/business/energy/lenders-must-act-to-avoid-another-round-of-commodity-defaults/


Processing URLs:  58%|█████▊    | 582/1000 [26:07<09:30,  1.36s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2017/01/27/0301000000AEN20170127003851320.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  58%|█████▊    | 584/1000 [26:10<09:38,  1.39s/it]

Error extracting text from https://the-japan-news.com/news/article/0007456518: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0007456518
Error extracting text from https://www.reuters.com/article/us-southsudan-diplomacy/south-sudan-ceasefire-body-says-leaders-breaking-peace-deal-could-face-sanctions-idUSKBN1FE1IG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southsudan-diplomacy/south-sudan-ceasefire-body-says-leaders-breaking-peace-deal-could-face-sanctions-idUSKBN1FE1IG?il=0


Processing URLs:  59%|█████▉    | 591/1000 [26:23<12:13,  1.79s/it]

Error extracting text from http://trailblazersblog.dallasnews.com/2016/01/ted-cruz-will-meet-privately-with-iowa-pastors-in-advance-of-february-caucuses.html/: 404 Client Error: Not Found for url: http://trailblazersblog.dallasnews.com/2016/01/ted-cruz-will-meet-privately-with-iowa-pastors-in-advance-of-february-caucuses.html/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-putin-tillerson-idUSKBN17C13K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-putin-tillerson-idUSKBN17C13K


Processing URLs:  59%|█████▉    | 592/1000 [26:25<11:38,  1.71s/it]



Processing URLs:  59%|█████▉    | 593/1000 [26:25<09:23,  1.39s/it]

URL filtered: https://www.youtube.com/watch?v=t_YXSHkAahE


Processing URLs:  60%|█████▉    | 596/1000 [26:30<09:37,  1.43s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&amp;s=mgfupus2&amp;f=m: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx
URL filtered: https://www.youtube.com/watch?v=oOUjqxec4bA
URL filtered: https://www.youtube.com/watch?v=pRD3jaX1LAI


Processing URLs:  60%|██████    | 604/1000 [26:38<08:58,  1.36s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-vote-idUKKBN0TR1P520151208: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  60%|██████    | 605/1000 [26:38<07:28,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-safrica-zuma-guptas-idUSKCN11817D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-zuma-guptas-idUSKCN11817D


Processing URLs:  61%|██████    | 612/1000 [26:45<06:11,  1.04it/s]

Error extracting text from http://archive.seti.org/epo/news/features/rio-scale.php: HTTPConnectionPool(host='archive.seti.org', port=80): Max retries exceeded with url: /epo/news/features/rio-scale.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302f18830>: Failed to resolve 'archive.seti.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  61%|██████▏   | 613/1000 [26:45<04:51,  1.33it/s]

Error extracting text from https://www.nytimes.com/2020/12/24/health/herd-immunity-covid-coronavirus.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/24/health/herd-immunity-covid-coronavirus.html


Processing URLs:  62%|██████▏   | 618/1000 [26:53<07:26,  1.17s/it]

Error extracting text from https://www.wsj.com/video/why-boeings-starliner-test-launch-is-mission-critical/BD8F48BF-8B3D-4DCA-BAE0-F3473C6D094D.html: 403 Client Error: Forbidden for url: https://www.wsj.com/video/why-boeings-starliner-test-launch-is-mission-critical/BD8F48BF-8B3D-4DCA-BAE0-F3473C6D094D.html


Processing URLs:  62%|██████▏   | 619/1000 [26:53<05:41,  1.12it/s]

Error extracting text from https://www.nytimes.com/2017/07/03/world/asia/trump-xi-jinping-china-north-korea.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/03/world/asia/trump-xi-jinping-china-north-korea.html


Processing URLs:  62%|██████▏   | 621/1000 [26:59<10:31,  1.67s/it]

Error extracting text from http://uk.reuters.com/article/2015/09/30/usa-crude-exports-senate-idUKL1N1203BX20150930: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  62%|██████▏   | 622/1000 [26:59<07:45,  1.23s/it]

Error extracting text from http://www.wsj.com/articles/brazilian-police-seek-to-question-former-president-in-petrobras-probe-1442008371: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazilian-police-seek-to-question-former-president-in-petrobras-probe-1442008371


Processing URLs:  63%|██████▎   | 627/1000 [27:03<05:07,  1.21it/s]

Error extracting text from http://ukraine.csis.org/#496: HTTPConnectionPool(host='ukraine.csis.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x302f1b620>: Failed to resolve 'ukraine.csis.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  63%|██████▎   | 629/1000 [27:04<04:43,  1.31it/s]

Error extracting text from https://www.ibtimes.com/scottish-seafood-sector-warns-brexit-risking-its-future-3123182: 403 Client Error: Forbidden for url: https://www.ibtimes.com/scottish-seafood-sector-warns-brexit-risking-its-future-3123182


Processing URLs:  63%|██████▎   | 631/1000 [27:05<03:27,  1.78it/s]

Error extracting text from http://www.reuters.com/article/2015/11/15/us-un-northkorea-idUSKCN0T419R20151115#GptUh3c7eq2JqWeb.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/15/us-un-northkorea-idUSKCN0T419R20151115#GptUh3c7eq2JqWeb.97


Processing URLs:  63%|██████▎   | 633/1000 [27:07<05:23,  1.13it/s]

Error extracting text from http://www.adweek.com/fishbowlny/another-editor-departs-instyle/382393: 403 Client Error: Forbidden for url: https://www.adweek.com/fishbowlny/another-editor-departs-instyle/382393


Processing URLs:  64%|██████▍   | 638/1000 [27:19<09:01,  1.50s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/commentary/ct-joe-biden-campaign-helps-hillary-clinton-20151014-column.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/commentary/ct-joe-biden-campaign-helps-hillary-clinton-20151014-column.html


Processing URLs:  64%|██████▍   | 640/1000 [28:21<1:54:59, 19.17s/it]

Error extracting text from http://world.einnews.com/article/289502045/yulSFag8ESA8UfbW: HTTPConnectionPool(host='world.einnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  64%|██████▍   | 641/1000 [28:22<1:21:30, 13.62s/it]

Error extracting text from http://thehill.com/blogs/in-the-know/in-the-know/352646-steven-seagal-says-anyone-who-believes-russia-fixed-election-is: 403 Client Error: Forbidden for url: https://thehill.com/blogs/in-the-know/in-the-know/352646-steven-seagal-says-anyone-who-believes-russia-fixed-election-is/
Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-22-may-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-22-may-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3054a5700>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  65%|██████▍   | 647/1000 [28:29<17:27,  2.97s/it]  

Error extracting text from https://www.nytimes.com/2021/02/12/world/asia/myanmar-military-protest.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/12/world/asia/myanmar-military-protest.html
URL filtered: http://www.telegraph.co.uk/finance/newsbysector/mediatechnologyandtelecoms/12153947/The-Independent-newspaper-confirms-an-end-to-print-production.html?utm_source=dlvr.it&amp;utm_medium=twitter


Processing URLs:  65%|██████▍   | 649/1000 [28:30<11:39,  1.99s/it]

Error extracting text from http://sana.sy/en/?p=62206: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  65%|██████▌   | 651/1000 [28:31<07:35,  1.31s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/wi/wisconsin_senate_johnson_vs_feingold-3740.html#: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/wi/wisconsin_senate_johnson_vs_feingold-3740.html


Processing URLs:  66%|██████▌   | 656/1000 [29:03<22:39,  3.95s/it]

Error extracting text from https://www.yahoo.com/news/brazil-impeachment-committee-vote-thursday-014822442--oly.html?ref=gs: 404 Client Error: Not Found for url: https://www.yahoo.com/news/brazil-impeachment-committee-vote-thursday-014822442--oly.html?ref=gs


Processing URLs:  66%|██████▌   | 657/1000 [29:05<20:01,  3.50s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X5237: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X5237
URL filtered: http://www.bloomberg.com/news/articles/2016-04-27/venezuela-faces-its-strangest-shortage-yet-as-inflation-explodes
Error extracting text from http://www.financialexpress.com/economy/cabinet-clears-expansion-of-india-chile-trade-agreement/240495/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/economy/cabinet-clears-expansion-of-india-chile-trade-agreement/240495/


Processing URLs:  66%|██████▌   | 662/1000 [29:07<06:14,  1.11s/it]

Error extracting text from https://www.iol.co.za/news/africa/eritrea-clashes-claim-more-lives-404258: 403 Client Error: Forbidden for url: https://www.iol.co.za/news/africa/eritrea-clashes-claim-more-lives-404258


Processing URLs:  66%|██████▋   | 663/1000 [29:10<08:52,  1.58s/it]

Error extracting text from https://www.france24.com/en/africa/20210709-security-council-backs-african-union-mediation-bid-over-disputed-nile-dam: 403 Client Error: Forbidden for url: https://www.france24.com/en/africa/20210709-security-council-backs-african-union-mediation-bid-over-disputed-nile-dam
URL filtered: https://www.youtube.com/watch?v=rOn2cP1Vk1Y


Processing URLs:  67%|██████▋   | 670/1000 [29:17<06:18,  1.15s/it]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/geos/ve.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/publications/the-world-factbook/geos/ve.html
Error extracting text from http://www.opec.org/opec_web/en/press_room/923.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/923.htm


Processing URLs:  67%|██████▋   | 672/1000 [29:23<11:16,  2.06s/it]

Error extracting text from http://www.parl.gc.ca/housechamberbusiness/chambercalendar.aspx?Language=E: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  68%|██████▊   | 675/1000 [29:28<08:59,  1.66s/it]

URL filtered: https://twitter.com/maximebernier/status/989970810679984128?lang=en


Processing URLs:  68%|██████▊   | 678/1000 [29:31<07:12,  1.34s/it]

Error extracting text from http://www.nytimes.com/2015/12/29/upshot/are-primary-polls-finally-predictive-no-but-this-is-when-the-fun-starts.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/29/upshot/are-primary-polls-finally-predictive-no-but-this-is-when-the-fun-starts.html?_r=0


Processing URLs:  68%|██████▊   | 680/1000 [29:35<08:19,  1.56s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-philippines-idUSKBN13S05C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-philippines-idUSKBN13S05C


Processing URLs:  68%|██████▊   | 684/1000 [29:37<04:31,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-opec-iran-idUSKCN18C1H6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-iran-idUSKCN18C1H6


Processing URLs:  69%|██████▊   | 686/1000 [29:39<04:20,  1.20it/s]

Error extracting text from http://ca.reuters.com/article/businessNews/idCAKBN0TQ05F20151207?pageNumber=2&amp;virtualBrandChannel=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=ca


Processing URLs:  69%|██████▉   | 689/1000 [30:44<50:02,  9.65s/it]  

Error extracting text from https://translate.google.com/translate?sl=auto&amp;tl=en&amp;u=https://popis2021.stat.gov.mk/default.aspx: 400 Client Error: Bad Request for url: https://translate.google.com/translate?sl=auto&amp;tl=en&amp;u=https://popis2021.stat.gov.mk/default.aspx
Error extracting text from http://www.reuters.com/article/us-usa-trump-fbi-kushner-exclusive-idUSKBN18N018: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-fbi-kushner-exclusive-idUSKBN18N018


Processing URLs:  69%|██████▉   | 693/1000 [30:51<18:50,  3.68s/it]

Error extracting text from http://earthjustice.org/sites/default/files/files/order-denying-PI.pdf: 404 Client Error: Not Found for url: https://earthjustice.org/wp-content/uploads/order-denying-PI.pdf


Processing URLs:  70%|██████▉   | 696/1000 [30:56<12:25,  2.45s/it]

Error extracting text from http://www.foxnews.com/politics/2015/10/21/biden-to-speak-in-rose-garden/?intcmp=hpbt1: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/10/21/biden-to-speak-in-rose-garden/?intcmp=hpbt1


Processing URLs:  70%|███████   | 703/1000 [31:03<04:40,  1.06it/s]

Error extracting text from http://www.cdm.me/english/brussels-montenegros-progress-is-an-encouraging-sign-ahead-of-nato-ministerial-meeting: 403 Client Error: Forbidden for url: https://www.cdm.me/english/brussels-montenegros-progress-is-an-encouraging-sign-ahead-of-nato-ministerial-meeting


Processing URLs:  71%|███████   | 710/1000 [31:16<08:06,  1.68s/it]

Error extracting text from https://cleantechnica.com/2016/05/15/ev-battery-prices-looking-back-years-forward-yet/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/05/15/ev-battery-prices-looking-back-years-forward-yet/


Processing URLs:  71%|███████▏  | 714/1000 [31:22<06:41,  1.40s/it]

Error extracting text from https://www.bls.gov/news.release/empsit.t01.htm: 403 Client Error: Forbidden for url: https://www.bls.gov/news.release/empsit.t01.htm


Processing URLs:  72%|███████▏  | 718/1000 [31:31<09:03,  1.93s/it]

Error extracting text from http://www.rand.org/pubs/research_reports/RR768.html: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/research_reports/RR768.html


Processing URLs:  72%|███████▏  | 721/1000 [31:37<08:10,  1.76s/it]

Error extracting text from http://greece.greekreporter.com/2016/07/20/greece-lost-e50-billion-from-brain-drain/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/07/20/greece-lost-e50-billion-from-brain-drain/


Processing URLs:  72%|███████▏  | 723/1000 [31:40<07:09,  1.55s/it]

Error extracting text from http://www.thenational.ae/world/middle-east/iraqs-shiite-militias-say-they-will-be-part-of-mosul-battle-despite-war-crimes-claims: 404 Client Error: Not Found for url: https://www.thenationalnews.com/mena/iraqs-shiite-militias-say-they-will-be-part-of-mosul-battle-despite-war-crimes-claims/
Error extracting text from http://www.realclearpolitics.com/epolls/latest_polls/: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/latest_polls/
URL filtered: https://amp.usatoday.com/amp/6778938002?__twitter_impression=true


Processing URLs:  73%|███████▎  | 728/1000 [31:44<05:08,  1.13s/it]

Error extracting text from https://cleantechnica.com/2017/02/28/goldman-sachs-analyst-downgrades-tesla-stock-sell-no-substantial-reason-cited/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2017/02/28/goldman-sachs-analyst-downgrades-tesla-stock-sell-no-substantial-reason-cited/


Processing URLs:  73%|███████▎  | 731/1000 [31:47<04:36,  1.03s/it]

URL filtered: https://www.youtube.com/watch?v=Y1q4Eb34mwM


Processing URLs:  74%|███████▎  | 735/1000 [31:51<04:51,  1.10s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-08-23/trudeau-off-to-rocky-start-in-canada-s-snap-election-campaign


Processing URLs:  74%|███████▍  | 739/1000 [31:52<02:21,  1.85it/s]

Error extracting text from https://www.reuters.com/article/us-ukraine-parliament-language-idUSKCN1S111N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-parliament-language-idUSKCN1S111N
Error extracting text from http://www.reuters.com/article/us-china-thailand-military-idUSKBN0TD0B120151124: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-thailand-military-idUSKBN0TD0B120151124


Processing URLs:  74%|███████▍  | 741/1000 [31:57<05:35,  1.29s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-06/ukraine-seen-resuming-imf-loan-in-summer-as-graft-ranking-stuck
URL filtered: https://www.youtube.com/watch?v=i-56cmhY_zA


Processing URLs:  74%|███████▍  | 745/1000 [31:58<02:37,  1.62it/s]

Error extracting text from https://financialpost.com/commodities/agriculture/worlds-top-lumber-firm-to-expand-u-s-mill-capacity-amid-boom: 403 Client Error: Forbidden for url: https://financialpost.com/commodities/agriculture/worlds-top-lumber-firm-to-expand-u-s-mill-capacity-amid-boom


Processing URLs:  75%|███████▍  | 747/1000 [32:00<03:43,  1.13it/s]

Error extracting text from http://www.motortrend.com/news/toyota-to-offer-electrified-versions-of-every-car-it-sells-by-2025/: 403 Client Error: Forbidden for url: http://www.motortrend.com/news/toyota-to-offer-electrified-versions-of-every-car-it-sells-by-2025/


Processing URLs:  75%|███████▌  | 751/1000 [32:07<06:58,  1.68s/it]

URL filtered: https://twitter.com/SteffenBilger/status/684070699175022592


Processing URLs:  76%|███████▌  | 757/1000 [32:17<06:16,  1.55s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-14/wall-street-tours-the-tesla-factory-and-loves-what-it-sees


Processing URLs:  76%|███████▌  | 760/1000 [32:19<04:15,  1.06s/it]

Error extracting text from http://www.reuters.com/article/2015/11/02/global-markets-idUSL3N12W0Y820151102#fAJYc33fqluFg4qZ.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/02/global-markets-idUSL3N12W0Y820151102#fAJYc33fqluFg4qZ.99


Processing URLs:  76%|███████▌  | 762/1000 [32:21<03:45,  1.05it/s]

Error extracting text from https://apps.fcc.gov/oetcf/eas/reports/GenericSearch.cfm: 403 Client Error: Forbidden for url: https://apps.fcc.gov/oetcf/eas/reports/GenericSearch.cfm


Processing URLs:  76%|███████▋  | 763/1000 [32:22<03:36,  1.09it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-07-20/venezuela-s-bonds-propelled-by-speculation-pdvsa-is-eyeing-swap


Processing URLs:  77%|███████▋  | 768/1000 [32:27<04:16,  1.11s/it]

Error extracting text from http://asmdc.org/members/a14/news-room/press-releases/bonilla-s-bill-supporting-the-innovation-of-autonomous-vehicles-in-california-passes-senate-committee: 404 Client Error: Not Found for url: https://asmdc.org:443/members/a14/news-room/press-releases/bonilla-s-bill-supporting-the-innovation-of-autonomous-vehicles-in-california-passes-senate-committee
Error extracting text from http://www.reuters.com/article/us-turkey-eu-visa-idUSKBN16I081?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-visa-idUSKBN16I081?il=0


Processing URLs:  77%|███████▋  | 769/1000 [32:28<03:54,  1.01s/it]

Error extracting text from http://thehill.com/homenews/house/254927-house-gop-stunned-by-boehner-resignation: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/254927-house-gop-stunned-by-boehner-resignation/


Processing URLs:  77%|███████▋  | 770/1000 [32:29<03:22,  1.14it/s]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-france-military-b9540252-c0ff-11e5-98c8-7fab78677d51-20160123-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-france-military-b9540252-c0ff-11e5-98c8-7fab78677d51-20160123-story.html


Processing URLs:  77%|███████▋  | 771/1000 [32:29<02:52,  1.33it/s]

Error extracting text from http://seekingalpha.com/article/4032851-tesla-q4-delivery-items-watch: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4032851-tesla-q4-delivery-items-watch


Processing URLs:  77%|███████▋  | 772/1000 [32:30<03:24,  1.12it/s]

Error extracting text from https://www.bundesnetzagentur.de/DE/Beschlusskammern/1_GZ/BK7-GZ/2021/BK7-21-0056/BK7-21-0056_Beiladungsbeschluss_DL_BF.pdf?__blob=publicationFile&amp;v=4: 404 Client Error: Not Found for url: https://www.bundesnetzagentur.de/DE/Beschlusskammern/1_GZ/BK7-GZ/2021/BK7-21-0056/BK7-21-0056_Beiladungsbeschluss_DL_BF.pdf?__blob=publicationFile&amp;v=4


Processing URLs:  78%|███████▊  | 777/1000 [32:36<03:09,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-apec-summit-philippines-russia-idUSKBN13F09V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apec-summit-philippines-russia-idUSKBN13F09V
URL filtered: https://twitter.com/nntaleb/status/1497332363386359809


Processing URLs:  78%|███████▊  | 779/1000 [32:37<02:29,  1.48it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN1062PH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN1062PH


Processing URLs:  78%|███████▊  | 781/1000 [32:39<03:04,  1.18it/s]

Error extracting text from https://www.neweurope.eu/article/putin-erdogan-jump-start-turkish-stream-akkuyu-npp/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/putin-erdogan-jump-start-turkish-stream-akkuyu-npp/


Processing URLs:  78%|███████▊  | 784/1000 [32:41<02:16,  1.59it/s]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN16G0DP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN16G0DP
Error extracting text from http://www.geekwire.com/2017/trumps-pick-transportation-secretary-plays-safe-seattle-transit-spending/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2017/trumps-pick-transportation-secretary-plays-safe-seattle-transit-spending/


Processing URLs:  79%|███████▊  | 786/1000 [32:53<08:51,  2.49s/it]

Error extracting text from http://m.news24.com.ng/Nigeria/National/News/armed-herdsmen-attack-village-leaving-many-dead-20160714: HTTPConnectionPool(host='m.news24.com.ng', port=80): Max retries exceeded with url: /Nigeria/National/News/armed-herdsmen-attack-village-leaving-many-dead-20160714 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebfda0>: Failed to resolve 'm.news24.com.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▊  | 787/1000 [32:54<07:13,  2.03s/it]

Error extracting text from https://ec.europa.eu/commission/sites/beta-political/files/eu-uk-art-50-terms-reference_agreed_amends_en.pdf: 404 Client Error: (Not Found) for url: https://ec.europa.eu/commission/sites/beta-political/files/eu-uk-art-50-terms-reference_agreed_amends_en.pdf


Processing URLs:  79%|███████▉  | 789/1000 [32:57<06:34,  1.87s/it]

Error extracting text from https://www.chathamhouse.org/sites/files/chathamhouse/field/field_document/20151005CyberSecurityNuclearBaylonBruntLivingstoneUpdate.pdf: 404 Client Error: Not Found for url: https://www.chathamhouse.org/sites/files/chathamhouse/field/field_document/20151005CyberSecurityNuclearBaylonBruntLivingstoneUpdate.pdf


Processing URLs:  79%|███████▉  | 791/1000 [33:00<06:17,  1.80s/it]

URL filtered: https://twitter.com/StefanMolyneux/status/1277720270304919552


Processing URLs:  79%|███████▉  | 793/1000 [34:01<50:40, 14.69s/it]

Error extracting text from https://dc.isda.org/documents/2017/11/pdvsa-dc-decision-nov-7.pdfAllianceBernstein: HTTPSConnectionPool(host='dc.isda.org', port=443): Max retries exceeded with url: /documents/2017/11/pdvsa-dc-decision-nov-7.pdfAllianceBernstein (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x304ebec90>, 'Connection to dc.isda.org timed out. (connect timeout=60)'))


Processing URLs:  80%|███████▉  | 795/1000 [34:03<29:42,  8.70s/it]

Error extracting text from https://www.nytimes.com/2015/09/21/world/asia/us-soldiers-told-to-ignore-afghan-allies-abuse-of-boys.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/09/21/world/asia/us-soldiers-told-to-ignore-afghan-allies-abuse-of-boys.html
URL filtered: http://www.bloomberg.com/news/articles/2016-04-01/tesla-s-model-3-lives-up-to-the-hype


Processing URLs:  80%|████████  | 801/1000 [34:11<09:07,  2.75s/it]

Error extracting text from http://inhomelandsecurity.com/why-americans-keep-falling-for-russian-propaganda/: 403 Client Error: Forbidden for url: https://amuedge.com/why-americans-keep-falling-for-russian-propaganda/


Processing URLs:  80%|████████  | 804/1000 [34:14<05:11,  1.59s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/donald-trump-leads-top-ahead-iowa-new-hampshire-poll: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/donald-trump-leads-top-ahead-iowa-new-hampshire-poll/


Processing URLs:  81%|████████  | 806/1000 [34:17<05:32,  1.71s/it]

Error extracting text from http://thebulletin.org/save-inf-treaty%E2%80%94-not-repeating-history11109: 404 Client Error: Not Found for url: https://thebulletin.org/save-inf-treaty%E2%80%94-not-repeating-history11109/


Processing URLs:  81%|████████  | 808/1000 [34:22<05:49,  1.82s/it]

Error extracting text from https://www.scpr.org/programs/airtalk/2017/11/29/60432/scotus-tackles-digital-privacy-in-landmark-case-in/: 403 Client Error: Forbidden for url: https://www.kpcc.org/programs/airtalk/2017/11/29/60432/scotus-tackles-digital-privacy-in-landmark-case-in/
Error extracting text from https://www.reuters.com/article/us-olympics-2018-northkorea-southkorea/south-koreas-moon-hosts-talks-with-north-korean-leaders-sister-idUSKBN1FU05F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-olympics-2018-northkorea-southkorea/south-koreas-moon-hosts-talks-with-north-korean-leaders-sister-idUSKBN1FU05F


Processing URLs:  81%|████████  | 812/1000 [34:27<04:09,  1.33s/it]

Error extracting text from http://training.goodjudgment.com/Ordered_Categorical_Scoring_Rule.pdf: HTTPConnectionPool(host='training.goodjudgment.com', port=80): Max retries exceeded with url: /Ordered_Categorical_Scoring_Rule.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306322240>: Failed to resolve 'training.goodjudgment.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2016/04/02/business/international/tesla-model-3.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/02/business/international/tesla-model-3.html?_r=0


Processing URLs:  81%|████████▏ | 814/1000 [34:28<02:30,  1.24it/s]

Error extracting text from https://www.nytimes.com/2020/09/02/business/media/trump-biden-debate-moderators.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/09/02/business/media/trump-biden-debate-moderators.html
Error extracting text from http://www.reuters.com/article/us-usa-congress-oilexports-idUSKBN0TU2SX20151212#EIFJhrhbq8rOeV1Q.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-oilexports-idUSKBN0TU2SX20151212#EIFJhrhbq8rOeV1Q.99
URL filtered: https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw


Processing URLs:  82%|████████▏ | 817/1000 [34:32<03:44,  1.23s/it]

Error extracting text from http://thinkprogress.org/justice/2016/05/10/3776868/senate-judiciary-chair-no-problem-trump-appointing-people-supreme-court/: 403 Client Error: Forbidden for url: https://thinkprogress.org/justice/2016/05/10/3776868/senate-judiciary-chair-no-problem-trump-appointing-people-supreme-court/


Processing URLs:  83%|████████▎ | 827/1000 [34:54<05:54,  2.05s/it]

Error extracting text from http://www.cnbc.com/2016/01/12/reuters-america-update-2-us-helping-ukraine-investigate-power-grid-hack.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/01/12/reuters-america-update-2-us-helping-ukraine-investigate-power-grid-hack.html


Processing URLs:  83%|████████▎ | 829/1000 [34:55<04:21,  1.53s/it]

Error extracting text from http://www.rigzone.com/news/oil_gas/a/142903/As_Oil_Suffers_Colombia_Eyes_Tough_Time_for_Bond_Offer: 403 Client Error: Forbidden for url: http://www.rigzone.com/news/oil_gas/a/142903/As_Oil_Suffers_Colombia_Eyes_Tough_Time_for_Bond_Offer


Processing URLs:  83%|████████▎ | 830/1000 [34:56<03:18,  1.17s/it]

Error extracting text from http://www.el-nacional.com/noticias/economia/ano-familias-pagaron-cuatro-veces-mas-por-cesta-alimentaria_188978: 403 Client Error: Forbidden for url: https://www.elnacional.com/noticias/economia/ano-familias-pagaron-cuatro-veces-mas-por-cesta-alimentaria_188978


Processing URLs:  83%|████████▎ | 832/1000 [34:58<03:12,  1.15s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/403549-trump-meets-with-promoter-of-qanon-conspiracy-theory-in-oval?rnd=1535151221: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/403549-trump-meets-with-promoter-of-qanon-conspiracy-theory-in-oval/?rnd=1535151221


Processing URLs:  83%|████████▎ | 834/1000 [34:59<01:56,  1.42it/s]

Error extracting text from https://www.sony.com/en/SonyInfo/IR/library/presen/er/pdf/20q3_supplement.pdf: PyCryptodome is required for AES algorithm
Error extracting text from http://www.nytimes.com/2015/10/20/world/middleeast/iranian-lawmaker-accuses-washington-post-reporter-jason-rezaian-of-sedition-plot.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/20/world/middleeast/iranian-lawmaker-accuses-washington-post-reporter-jason-rezaian-of-sedition-plot.html?_r=0


Processing URLs:  84%|████████▎ | 835/1000 [35:02<03:55,  1.43s/it]

Error extracting text from http://www.theweek.co.uk/eu-referendum/65461/eu-referendum-poll-support-for-brexit-highest-in-two-years: 404 Client Error: Not Found for url: https://theweek.com/eu-referendum/65461/eu-referendum-poll-support-for-brexit-highest-in-two-years
Error extracting text from http://www.malaysia-chronicle.com/index.php?option=com_k2&amp;view=item&amp;id=607334:opec-members-revolt-against-saudis-as-oil-prices-crash&amp;Itemid=3#ixzz3t1uFkAWX: HTTPConnectionPool(host='www.malaysia-chronicle.com', port=80): Max retries exceeded with url: /index.php?option=com_k2&amp;view=item&amp;id=607334:opec-members-revolt-against-saudis-as-oil-prices-crash&amp;Itemid=3 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x300a5f020>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  84%|████████▎ | 837/1000 [35:04<03:15,  1.20s/it]

Error extracting text from http://news.sky.com/story/1505997/taliban-launches-hotline-for-defectors: 404 Client Error: Not Found for url: https://news.sky.com/story/1505997/taliban-launches-hotline-for-defectors


Processing URLs:  84%|████████▍ | 839/1000 [35:09<05:01,  1.87s/it]

Error extracting text from http://www.nepia.com/maritime-alerts/: 404 Client Error: Not Found for url: https://www.nepia.com/maritime-alerts/


Processing URLs:  84%|████████▍ | 842/1000 [35:12<03:41,  1.40s/it]

Error extracting text from https://sites.hks.harvard.edu/fs/rzeckhau/Geopolitical%20Risks.pdf: 404 Client Error: Not Found for url: https://sites.hks.harvard.edu/fs/rzeckhau/Geopolitical%20Risks.pdf
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0YM0VG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-idUSKCN0YM0VG


Processing URLs:  85%|████████▍ | 848/1000 [35:18<02:21,  1.08it/s]

Error extracting text from http://www.bls.gov/news.release/pdf/empsit.pdf: 403 Client Error: Forbidden for url: http://www.bls.gov/news.release/pdf/empsit.pdf


Processing URLs:  85%|████████▌ | 852/1000 [35:25<03:16,  1.33s/it]

Error extracting text from https://fcw.com/articles/2017/11/09/voting-hacks-hype-johnson.aspx: 404 Client Error: NOT FOUND for url: https://www.nextgov.com/articles/2017/11/09/voting-hacks-hype-johnson.aspx/
Error extracting text from http://www.latimes.com/sports/sportsnow/la-sp-sn-nfl-owners-meeting-20151202-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/sports/sportsnow/la-sp-sn-nfl-owners-meeting-20151202-story.html


Processing URLs:  86%|████████▌ | 856/1000 [35:31<03:20,  1.39s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-israel-idUSKCN0XJ14O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-israel-idUSKCN0XJ14O?il=0


Processing URLs:  86%|████████▌ | 858/1000 [35:35<04:11,  1.77s/it]

Error extracting text from https://asean.org/?static_post=rcep-regional-comprehensive-economic-partnership: 403 Client Error: Forbidden for url: https://asean.org/?static_post=rcep-regional-comprehensive-economic-partnership


Processing URLs:  86%|████████▌ | 862/1000 [35:38<02:08,  1.08it/s]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2015/12/07-tusk-letter-to-28ms-on-uk/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2015/12/07-tusk-letter-to-28ms-on-uk/
Error extracting text from http://www.nytimes.com/2016/02/28/us/politics/donald-trump-republican-party.html?_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/28/us/politics/donald-trump-republican-party.html?_r=1


Processing URLs:  86%|████████▋ | 863/1000 [35:38<01:37,  1.40it/s]

Error extracting text from http://www.enewspf.com/latest-news/latest-national/military-casualties/65257-department-of-defense-press-briefing-by-general-breedlove-oct-30-2015.html: 403 Client Error: Forbidden for url: http://www.enewspf.com/latest-news/latest-national/military-casualties/65257-department-of-defense-press-briefing-by-general-breedlove-oct-30-2015.html


Processing URLs:  86%|████████▋ | 864/1000 [35:39<01:41,  1.34it/s]

Error extracting text from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/winter99-00/art6.html: 403 Client Error: Forbidden for url: https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/winter99-00/art6.html
URL filtered: https://www.bloomberg.com/news/articles/2017-05-08/russia-backs-saudi-proposal-to-extend-opec-oil-cuts-beyond-2017


Processing URLs:  87%|████████▋ | 868/1000 [35:43<01:51,  1.19it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-latam-idUSKCN0Y50ST: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-latam-idUSKCN0Y50ST


Processing URLs:  87%|████████▋ | 871/1000 [35:45<01:56,  1.11it/s]

Error extracting text from http://www.nigeriatoday.ng/2016/07/militia-fulani-herdsmen-kill-two-on-taraba-benue-border/: HTTPConnectionPool(host='www.nigeriatoday.ng', port=80): Max retries exceeded with url: /2016/07/militia-fulani-herdsmen-kill-two-on-taraba-benue-border/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301538050>: Failed to resolve 'www.nigeriatoday.ng' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: http://www.bloomberg.com/news/articles/2015-12-14/junk-bond-misery-backs-fed-case-for-gradual-rate-rise-no-delay
URL filtered: https://www.youtube.com/watch?v=VIYthyp2lto


Processing URLs:  88%|████████▊ | 875/1000 [35:46<00:59,  2.11it/s]

Error extracting text from http://www.gryphonscientific.com/gain-of-function/: 403 Client Error: Forbidden for url: http://www.gryphonscientific.com/gain-of-function/


Processing URLs:  88%|████████▊ | 876/1000 [35:47<00:54,  2.28it/s]

Error extracting text from https://www.nytimes.com/2017/10/03/us/politics/gerrymandering-supreme-court-wisconsin.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/03/us/politics/gerrymandering-supreme-court-wisconsin.html?_r=0


Processing URLs:  88%|████████▊ | 880/1000 [35:54<02:20,  1.17s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-02/19/c_135111288.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-02/19/c_135111288.htm


Processing URLs:  88%|████████▊ | 883/1000 [35:58<02:20,  1.20s/it]

Error extracting text from http://www.tandfonline.com/doi/abs/10.1080/17457289.2011.588439: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/abs/10.1080/17457289.2011.588439


Processing URLs:  89%|████████▊ | 886/1000 [36:01<02:10,  1.14s/it]

Error extracting text from http://www.clark.com/major-retailers-closing-2017: 403 Client Error: Forbidden for url: http://clark.com/major-retailers-closing-2017


Processing URLs:  89%|████████▊ | 887/1000 [36:02<01:42,  1.11it/s]

Error extracting text from http://www.fastcompany.com/3035564/after-30-years-macworld-is-no-longer-a-magazine: 403 Client Error: Forbidden for url: https://www.fastcompany.com/3035564/after-30-years-macworld-is-no-longer-a-magazine


Processing URLs:  89%|████████▉ | 894/1000 [36:12<01:52,  1.06s/it]

Error extracting text from http://www.wsj.com/articles/turkish-bank-yapi-kredi-cancels-bond-after-attempted-coup-1468935296: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkish-bank-yapi-kredi-cancels-bond-after-attempted-coup-1468935296


Processing URLs:  90%|█████████ | 900/1000 [36:24<03:09,  1.89s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/turkey-warns-strikes-syrian-kurds-retreat-41713126: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/turkey-warns-strikes-syrian-kurds-retreat-41713126


Processing URLs:  90%|█████████ | 901/1000 [36:25<02:36,  1.58s/it]

Error extracting text from http://www.pravdareport.com/russia/kremlin/12-01-2016/133034-putin_obama_assad-0/#sthash.Bu6JK5of.dpuf: 404 Client Error: Not Found for url: https://www.pravda.ru/russia/kremlin/12-01-2016/133034-putin_obama_assad-0/#sthash.Bu6JK5of.dpuf


Processing URLs:  90%|█████████ | 904/1000 [36:33<03:39,  2.28s/it]

Error extracting text from http://www.ibtimes.co.uk/isis-executioner-killed-by-assassins-occupied-iraqi-city-mosul-1551756: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/isis-executioner-killed-by-assassins-occupied-iraqi-city-mosul-1551756


Processing URLs:  90%|█████████ | 905/1000 [36:34<02:54,  1.83s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-north-insight-idUSKCN0Z8238: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-north-insight-idUSKCN0Z8238


Processing URLs:  91%|█████████ | 908/1000 [36:37<02:02,  1.33s/it]

Error extracting text from http://www.washingtontimes.com/news/2017/may/10/missile-threats-abound-from-rogue-states/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/may/10/missile-threats-abound-from-rogue-states/


Processing URLs:  91%|█████████ | 910/1000 [36:38<01:07,  1.34it/s]

Error extracting text from http://www.wsj.com/articles/oil-little-changed-as-investors-eye-opec-meeting-1449222485: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-little-changed-as-investors-eye-opec-meeting-1449222485
Error extracting text from http://www.reuters.com/article/us-opec-iraq-idUSKBN0U405720151221: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-opec-iraq-idUSKBN0U405720151221


Processing URLs:  91%|█████████ | 912/1000 [36:39<00:51,  1.71it/s]

Error extracting text from https://www.middleeastmonitor.com/20160122-russias-role-in-the-yemen-conflict/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20160122-russias-role-in-the-yemen-conflict/


Processing URLs:  92%|█████████▏| 916/1000 [36:45<01:37,  1.16s/it]

Error extracting text from https://www.23andme.com/publications/for-scientists/: 403 Client Error: Forbidden for url: https://research.23andme.com/publications
Error extracting text from http://www.debkafile.org: HTTPConnectionPool(host='www.debkafile.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3042e67e0>: Failed to resolve 'www.debkafile.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  92%|█████████▏| 917/1000 [36:47<01:43,  1.25s/it]

Error extracting text from http://ekurd.net/kurdish-fighters-mosul-2016-10-16: 403 Client Error: Forbidden for url: https://ekurd.net/kurdish-fighters-mosul-2016-10-16


Processing URLs:  92%|█████████▏| 918/1000 [36:47<01:21,  1.00it/s]

Error extracting text from https://www.c-span.org/video/?508652-1/student-athlete-compensation-consolidated-oral-argument: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?508652-1/student-athlete-compensation-consolidated-oral-argument


Processing URLs:  92%|█████████▏| 921/1000 [36:51<01:31,  1.16s/it]

Error extracting text from https://tradingeconomics.com/united-states/interest-rate: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/united-states/interest-rate


Processing URLs:  93%|█████████▎| 926/1000 [37:01<01:36,  1.31s/it]

Error extracting text from https://www.nytimes.com/2021/11/10/us/politics/russia-blinken-ukraine.html): 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/11/10/us/politics/russia-blinken-ukraine.html)


Processing URLs:  93%|█████████▎| 932/1000 [37:10<01:28,  1.30s/it]

Error extracting text from http://thehill.com/homenews/administration/350269-report-putin-proposed-warmer-relationship-with-trump: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/350269-report-putin-proposed-warmer-relationship-with-trump/


Processing URLs:  94%|█████████▍| 940/1000 [37:23<01:42,  1.71s/it]

Error extracting text from http://www.davidhumeinstitute.com/scotland-eu-options-law/: 404 Client Error: Not Found for url: https://davidhumeinstitute.org/scotland-eu-options-law/


Processing URLs:  94%|█████████▍| 944/1000 [37:31<01:51,  1.98s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-11/opinium-poll-shows-u-k-brexit-decision-still-too-close-to-call


Processing URLs:  95%|█████████▍| 946/1000 [37:33<01:17,  1.44s/it]

Error extracting text from http://www.aina.org/news/20160831132845.htm: 404 Client Error:  for url: http://www.aina.org/news/20160831132845.htm


Processing URLs:  95%|█████████▍| 947/1000 [37:33<01:02,  1.18s/it]

Error extracting text from https://www.nytimes.com/2017/02/02/world/middleeast/iran-missile-test-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=a-lede-package-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/02/world/middleeast/iran-missile-test-trump.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=a-lede-package-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  95%|█████████▍| 948/1000 [37:34<00:55,  1.07s/it]

Error extracting text from http://www.counterpunch.org/2016/06/02/the-battle-for-mosul-the-peshmerga-advances-on-isis-stronghold/: 403 Client Error: Forbidden for url: http://www.counterpunch.org/2016/06/02/the-battle-for-mosul-the-peshmerga-advances-on-isis-stronghold/


Processing URLs:  95%|█████████▌| 952/1000 [37:39<00:52,  1.09s/it]

Error extracting text from http://townhall.com/columnists/joyoverbeck/2016/01/29/will-the-world-survive-obamas-last-12-months-n2111637: 403 Client Error: Forbidden for url: https://townhall.com/columnists/joyoverbeck/2016/01/29/will-the-world-survive-obamas-last-12-months-n2111637


Processing URLs:  95%|█████████▌| 953/1000 [37:41<01:03,  1.36s/it]

Error extracting text from https://www.reuters.com/article/uk-britain-eu-hammond/britain-to-submit-brexit-bill-proposal-before-december-eu-meeting-idUSKBN1DJ0AJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-eu-hammond/britain-to-submit-brexit-bill-proposal-before-december-eu-meeting-idUSKBN1DJ0AJ


Processing URLs:  96%|█████████▌| 955/1000 [37:45<01:15,  1.67s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-12-05/mcconnell-says-moore-will-face-ethics-probe-if-he-wins-election


Processing URLs:  96%|█████████▌| 961/1000 [37:50<00:44,  1.15s/it]

URL filtered: https://twitter.com/sturdyAlex/status/1338052426516082688


Processing URLs:  96%|█████████▋| 965/1000 [37:53<00:28,  1.25it/s]

Error extracting text from https://www.reuters.com/article/us-health-coronavirus-vaccines-kids-excl/exclusive-u-s-decision-on-pfizer-covid-19-shot-for-kids-age-5-11-could-come-in-october-sources-idUKKBN2G620D?edition-redirect=uk: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-coronavirus-vaccines-kids-excl/exclusive-u-s-decision-on-pfizer-covid-19-shot-for-kids-age-5-11-could-come-in-october-sources-idUKKBN2G620D?edition-redirect=uk


Processing URLs:  97%|█████████▋| 968/1000 [37:56<00:23,  1.38it/s]

Error extracting text from http://www.wsj.com/articles/islamic-state-under-attack-in-two-strongholds-1464218889: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/islamic-state-under-attack-in-two-strongholds-1464218889
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://veja.abril.com.br/noticia/brasil/mais-um-listao&amp;usg=ALkJrhjl6sM_cleW6pQPRA_XM0XyBa43mw: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://veja.abril.com.br/noticia/brasil/mais-um-listao&amp;usg=ALkJrhjl6sM_cleW6pQPRA_XM0XyBa43mw


Processing URLs:  97%|█████████▋| 969/1000 [37:56<00:17,  1.77it/s]

Error extracting text from http://www.nytimes.com/2016/01/14/us/politics/donald-trumps-iowa-ground-game-seems-to-be-missing-a-coach.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/us/politics/donald-trumps-iowa-ground-game-seems-to-be-missing-a-coach.html


Processing URLs:  98%|█████████▊| 977/1000 [38:07<00:28,  1.25s/it]

Error extracting text from http://www.stripes.com/news/us-iraq-consider-more-troops-to-fight-for-mosul-1.405003: 404 Client Error: Not Found for url: https://www.stripes.com:443/theaters/us-iraq-consider-more-troops-to-fight-for-mosul-1.405003


Processing URLs:  98%|█████████▊| 978/1000 [38:07<00:22,  1.04s/it]

Error extracting text from http://www.japantimes.co.jp/news/2015/09/25/asia-pacific/launch-long-range-rocket-unlikely-north-korea-anniversary-nuke-test-site-activity-seen-institute/#.VgWAJ-k0r04: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/09/25/asia-pacific/launch-long-range-rocket-unlikely-north-korea-anniversary-nuke-test-site-activity-seen-institute/#.VgWAJ-k0r04


Processing URLs:  98%|█████████▊| 980/1000 [38:23<01:41,  5.07s/it]

Error extracting text from https://www.almasdarnews.com/article/russians-pound-isis-in-raqqa-as-government-forces-advance-east-towards-tabaqa/: 522 Server Error:  for url: https://www.almasdarnews.com/article/russians-pound-isis-in-raqqa-as-government-forces-advance-east-towards-tabaqa/


Processing URLs:  98%|█████████▊| 981/1000 [38:26<01:24,  4.45s/it]

Error extracting text from http://host.madison.com/business/investment/markets-and-stocks/spacex-delayed-again/article_bda6c161-9741-57d5-8234-179613263a89.html: 404 Client Error: Not Found for url: https://madison.com/business/investment/markets-and-stocks/spacex-delayed-again/article_bda6c161-9741-57d5-8234-179613263a89.html


Processing URLs:  99%|█████████▊| 987/1000 [38:37<00:24,  1.89s/it]

Error extracting text from https://gulfstreamnews.com/en/news/?id=e677e8fd-ea33-446e-8d51-15a0c4a65e9d: 403 Client Error: Forbidden for url: https://gulfstreamnews.com/en/news/?id=e677e8fd-ea33-446e-8d51-15a0c4a65e9d
URL filtered: https://www.bloomberg.com/news/articles/2017-04-13/now-the-real-work-begins-for-marijuana-lobbyists-in-canada


Processing URLs:  99%|█████████▉| 989/1000 [38:40<00:17,  1.63s/it]

Error extracting text from http://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/de240398-df23-47b6-8470-91977d38b749.pdf: 404 Client Error: Not Found for url: https://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/de240398-df23-47b6-8470-91977d38b749.pdf


Processing URLs:  99%|█████████▉| 994/1000 [38:47<00:07,  1.30s/it]

Error extracting text from http://www.nytimes.com/2016/01/09/business/vw-refuses-to-give-us-states-documents-in-emissions-inquiries.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/09/business/vw-refuses-to-give-us-states-documents-in-emissions-inquiries.html


Processing URLs: 100%|█████████▉| 995/1000 [38:48<00:05,  1.12s/it]

Error extracting text from https://larswericson.wordpress.com/2016/04/16/gitrep-15apr16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/16/gitrep-15apr16pm/
URL filtered: https://twitter.com/johncarlosbaez/status/1042609961216335874


Processing URLs: 100%|██████████| 1000/1000 [38:55<00:00,  2.34s/it]
Processing URLs:   0%|          | 2/1000 [00:03<29:29,  1.77s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NYHGV76TTDS201-6R8RVVRF5I06MM32E8AJCVREGC


Processing URLs:   0%|          | 4/1000 [00:04<14:00,  1.18it/s]

Error extracting text from http://www.pravdareport.com/news/world/americas/19-12-2016/136442-nato_russia-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/news/world/americas/19-12-2016/136442-nato_russia-0/


Processing URLs:   0%|          | 5/1000 [00:04<12:33,  1.32it/s]

Error extracting text from http://www.politics.co.uk//blogs/2016/06/13/buried-in-a-migration-watch-report-the-truth-about-immigrati?utm_source=Editorial+newsletter&amp;utm_campaign=7b0fb992c2-Pick_of_the_week_17_June6_17_2016&amp;utm_medium=email&amp;utm_term=0_cb6d3a8c9c-7b0fb992c2-180972697&amp;mc_cid=7b0fb992c2&amp;mc_eid=466ab05013: 403 Client Error: Forbidden for url: http://www.politics.co.uk//blogs/2016/06/13/buried-in-a-migration-watch-report-the-truth-about-immigrati?utm_source=Editorial+newsletter&amp;utm_campaign=7b0fb992c2-Pick_of_the_week_17_June6_17_2016&amp;utm_medium=email&amp;utm_term=0_cb6d3a8c9c-7b0fb992c2-180972697&amp;mc_cid=7b0fb992c2&amp;mc_eid=466ab05013


Processing URLs:   1%|          | 12/1000 [00:15<16:29,  1.00s/it]

Error extracting text from http://www.reuters.com/article/2015/10/23/usa-congress-eximbank-idUSL1N12N2BW20151023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/usa-congress-eximbank-idUSL1N12N2BW20151023


Processing URLs:   2%|▏         | 17/1000 [00:21<24:49,  1.52s/it]

Error extracting text from http://www.pcpsr.org/en/node/674: 404 Client Error: Not Found for url: http://www.pcpsr.org/en/node/674


Processing URLs:   2%|▏         | 18/1000 [00:23<24:27,  1.49s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X81YA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X81YA


Processing URLs:   2%|▏         | 22/1000 [00:27<17:14,  1.06s/it]

Error extracting text from http://news.yahoo.com/no-signs-north-korea-preparing-rocket-launch-seoul-065653222.html: 404 Client Error: Not Found for url: http://news.yahoo.com/no-signs-north-korea-preparing-rocket-launch-seoul-065653222.html


Processing URLs:   2%|▏         | 23/1000 [00:28<17:36,  1.08s/it]

Error extracting text from http://www.payvand.com/news/16/may/1040.html: 404 Client Error: Not Found for url: http://www.payvand.com/news/16/may/1040.html


Processing URLs:   2%|▎         | 25/1000 [00:40<50:22,  3.10s/it]  

Error extracting text from http://www.maritime-executive.com/article/criminal-complaint-filed-in-panama-cost-dispute: 404 Client Error: Not Found for url: https://www.maritime-executive.com/403.shtml


Processing URLs:   3%|▎         | 29/1000 [00:49<46:40,  2.88s/it]

Error extracting text from http://www.cityindex.co.uk/market-analysis/market-news/39082902015/special-report-us-interest-rates/: 404 Client Error: Not Found for url: https://www.cityindex.com/en-uk/news-and-analysis/market-news/39082902015/special-report-us-interest-rates/


Processing URLs:   3%|▎         | 31/1000 [00:51<31:48,  1.97s/it]

Error extracting text from http://www.alternet.org/tea-party-and-right/robert-reich-what-happened-my-tour-through-red-state-america: 404 Client Error: Not Found for url: https://www.alternet.org/tea-party-and-right/robert-reich-what-happened-my-tour-through-red-state-america


Processing URLs:   4%|▍         | 39/1000 [01:34<1:27:41,  5.48s/it]

Error extracting text from http://www.huffingtonpost.ca/2016/01/05/legalizing-pot-in-canada-will-run-afoul-of-global-treaties-trudeau-warned_n_8918384.html: 502 Server Error: Bad Gateway for url: https://www.huffingtonpost.ca/2016/01/05/legalizing-pot-in-canada-will-run-afoul-of-global-treaties-trudeau-warned_n_8918384.html


Processing URLs:   4%|▍         | 41/1000 [02:03<2:20:11,  8.77s/it]

Error extracting text from http://aliran.com/web-specials/2015-web-specials/dont-let-trans-pacific-partnership-pact-jeopardise-malaysias-future/: 403 Client Error: Forbidden for url: http://aliran.com/web-specials/2015-web-specials/dont-let-trans-pacific-partnership-pact-jeopardise-malaysias-future/


Processing URLs:   4%|▍         | 44/1000 [02:08<59:31,  3.74s/it]  

Error extracting text from http://www.reuters.com/article/us-iran-us-gulf-tension-idUSKBN1AE0DN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-us-gulf-tension-idUSKBN1AE0DN


Processing URLs:   6%|▌         | 55/1000 [02:28<24:41,  1.57s/it]

Error extracting text from http://www.wsj.com/articles/brazils-kingmaker-party-the-pmdb-takes-center-stage-1443487963: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazils-kingmaker-party-the-pmdb-takes-center-stage-1443487963


Processing URLs:   6%|▌         | 57/1000 [02:30<18:26,  1.17s/it]

Error extracting text from http://www.nationmultimedia.com/business/RCEP-talks-extended-to-next-year-amid-hiccups-30295062.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/business/RCEP-talks-extended-to-next-year-amid-hiccups-30295062.html


Processing URLs:   6%|▌         | 60/1000 [02:36<25:37,  1.64s/it]

Error extracting text from http://aaj.tv/2016/06/chinese-prime-minister-extends-best-wishes-for-speedy-recovery-of-nawaz/: 404 Client Error: Not Found for url: https://www.aaj.tv/2016/06/chinese-prime-minister-extends-best-wishes-for-speedy-recovery-of-nawaz/


Processing URLs:   7%|▋         | 66/1000 [02:44<19:40,  1.26s/it]

Error extracting text from http://aranews.net/2016/06/mosul-islamic-state-executes-four-militants-escaping-battlefront/: 404 Client Error: Not Found for url: http://aranews.net/2016/06/mosul-islamic-state-executes-four-militants-escaping-battlefront/


Processing URLs:   7%|▋         | 71/1000 [02:52<24:52,  1.61s/it]

Error extracting text from https://www.reuters.com/article/us-russia-election-navalny-kremlin/kremlin-eyeing-election-says-opposition-leader-navalny-not-a-threat-idUSKBN1FI103?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-election-navalny-kremlin/kremlin-eyeing-election-says-opposition-leader-navalny-not-a-threat-idUSKBN1FI103?il=0


Processing URLs:   8%|▊         | 77/1000 [03:02<30:24,  1.98s/it]

Error extracting text from http://www.smh.com.au/business/markets/odds-increase-on-extra-bank-of-japan-stimulus-next-month-20150910-gjk3g0.html: 404 Client Error: Not Found for url: https://www.smh.com.au/business/markets/odds-increase-on-extra-bank-of-japan-stimulus-next-month-20150910-gjk3g0.html


Processing URLs:   8%|▊         | 78/1000 [03:05<34:49,  2.27s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-imf-idUSKCN0W5146: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-imf-idUSKCN0W5146


Processing URLs:   8%|▊         | 80/1000 [03:06<23:02,  1.50s/it]

Error extracting text from http://www.southcarolinasc.com/2015/09/boehners-out-however-who-will-take-his-place/: 404 Client Error: Not Found for url: http://southcarolinasc.com/2015/09/boehners-out-however-who-will-take-his-place/


Processing URLs:   8%|▊         | 82/1000 [03:09<21:04,  1.38s/it]

Error extracting text from https://www.nytimes.com/1997/02/05/opinion/a-fateful-error.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/1997/02/05/opinion/a-fateful-error.html


Processing URLs:   8%|▊         | 83/1000 [03:11<22:30,  1.47s/it]

Error extracting text from http://wamc.org/post/500k-heroin-seized-27-arrested-massive-drug-bust-involving-albany: HTTPConnectionPool(host='wamc.org', port=80): Max retries exceeded with url: /post/500k-heroin-seized-27-arrested-massive-drug-bust-involving-albany (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe448e90>: Failed to resolve 'wamc.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   8%|▊         | 85/1000 [03:18<41:39,  2.73s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-germany-autos-idUSKBN1500VJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-germany-autos-idUSKBN1500VJ


Processing URLs:   9%|▊         | 87/1000 [03:19<25:52,  1.70s/it]

Error extracting text from https://seekingalpha.com/article/4076850-spark-therapeutics-upcoming-fda-catalyst: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4076850-spark-therapeutics-upcoming-fda-catalyst


Processing URLs:   9%|▉         | 90/1000 [03:25<27:13,  1.80s/it]

Error extracting text from http://www.wsj.com/articles/anti-donald-trump-forces-see-convention-coup-as-within-reach-1467839099: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/anti-donald-trump-forces-see-convention-coup-as-within-reach-1467839099


Processing URLs:   9%|▉         | 93/1000 [03:28<22:24,  1.48s/it]

Error extracting text from https://stacker.com/stories/3604/15-companies-us-government-tried-break-monopolies.: 404 Client Error: Not Found for url: https://stacker.com/stories/3604/15-companies-us-government-tried-break-monopolies.


Processing URLs:   9%|▉         | 94/1000 [03:30<21:55,  1.45s/it]

Error extracting text from http://www.orbspace.com/Background-Information/Suborbital-vs-Orbital.html: 404 Client Error: Not Found for url: https://www.orbspace.com/Background-Information/Suborbital-vs-Orbital.html


Processing URLs:  10%|▉         | 95/1000 [11:30<35:45:58, 142.27s/it]

Error extracting text from https://transcripts.factcheck.org/: HTTPSConnectionPool(host='transcripts.factcheck.org', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ff3da390>, 'Connection to transcripts.factcheck.org timed out. (connect timeout=60)'))


Processing URLs:  10%|▉         | 97/1000 [11:32<17:46:02, 70.83s/it] 

Error extracting text from http://www.nytimes.com/2016/03/22/world/asia/indonesia-south-china-sea-fishing-boat.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/22/world/asia/indonesia-south-china-sea-fishing-boat.html


Processing URLs:  10%|▉         | 99/1000 [11:38<9:02:55, 36.15s/it] 

Error extracting text from https://www.nytimes.com/2020/08/27/us/kyle-rittenhouse-kenosha-shooting-video.html?smid=tw-nytimes&smtyp=cur: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/08/27/us/kyle-rittenhouse-kenosha-shooting-video.html?smid=tw-nytimes&smtyp=cur


Processing URLs:  10%|█         | 102/1000 [11:42<3:19:12, 13.31s/it]

Error extracting text from http://www.reuters.com/article/us-spain-election-rajoy-idUSKCN0ZD0SY?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-election-rajoy-idUSKCN0ZD0SY?il=0


Processing URLs:  11%|█         | 108/1000 [11:47<32:25,  2.18s/it]  

Error extracting text from http://strategicstudiesinstitute.army.mil/pubs/display.cfm?pubID=1275: HTTPConnectionPool(host='strategicstudiesinstitute.army.mil', port=80): Max retries exceeded with url: /pubs/display.cfm?pubID=1275 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fae60f20>: Failed to resolve 'strategicstudiesinstitute.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  11%|█         | 109/1000 [11:48<28:35,  1.93s/it]

Error extracting text from http://www.cer.org.uk/sites/default/files/smc2016_20april2016.pdf: 404 Client Error: Not Found for url: https://www.cer.eu/sites/default/files/smc2016_20april2016.pdf


Processing URLs:  11%|█         | 110/1000 [11:51<30:58,  2.09s/it]

Error extracting text from https://registrar.berkeley.edu/sites/default/files/pdf/UCB_AcademicCalendar_2021-22_V2.pdf: 404 Client Error: Not Found for url: https://registrar.berkeley.edu/wp-content/uploads/2021/03/UCB_AcademicCalendar_2021-22_V2.pdf


Processing URLs:  11%|█▏        | 113/1000 [11:54<18:37,  1.26s/it]

Error extracting text from http://news.yahoo.com/china-imports-fall-sign-battered-domestic-demand-025200336.html: 404 Client Error: Not Found for url: http://news.yahoo.com/china-imports-fall-sign-battered-domestic-demand-025200336.html
URL filtered: http://www.bloomberg.com/graphics/2015-fed-rate-hike-predictions/


Processing URLs:  12%|█▏        | 117/1000 [11:57<14:20,  1.03it/s]

Error extracting text from https://evcentral.com.au/volkswagen-vows-to-overtake-tesla-on-tech-and-production-by-2023/#:~:text=Volkswagen%20vows%20to%20overtake%20Tesla%20on%20tech%20and%20production%20by%202023&text=Volkswagen%20says%20it%20will%20have,into%20the%20electric%2Dvehicle%20space.: 406 Client Error: Not Acceptable for url: https://evcentral.com.au/volkswagen-vows-to-overtake-tesla-on-tech-and-production-by-2023/#:~:text=Volkswagen%20vows%20to%20overtake%20Tesla%20on%20tech%20and%20production%20by%202023&text=Volkswagen%20says%20it%20will%20have,into%20the%20electric-vehicle%20space.


Processing URLs:  12%|█▏        | 119/1000 [11:58<11:42,  1.25it/s]

Error extracting text from http://www.france24.com/en/20161101-ban-ki-moon-united-nations-south-sudan-juba-peacekeeping-commander: 403 Client Error: Forbidden for url: http://www.france24.com/en/20161101-ban-ki-moon-united-nations-south-sudan-juba-peacekeeping-commander
Error extracting text from https://www.tesla.com/en_EU/blog/tesla-powerpack-enable-large-scale-sustainable-energy-south-australia?redirect=no: 403 Client Error: Forbidden for url: https://www.tesla.com/en_EU/blog/tesla-powerpack-enable-large-scale-sustainable-energy-south-australia?redirect=no


Processing URLs:  12%|█▏        | 120/1000 [11:58<09:24,  1.56it/s]

Error extracting text from https://www.thestreet.com/story/13633983/1/advocacy-group-criticism-shows-potential-material-impact-of-tesla-crash.html: 403 Client Error: Forbidden for url: https://www.thestreet.com/story/13633983/1/advocacy-group-criticism-shows-potential-material-impact-of-tesla-crash.html


Processing URLs:  12%|█▏        | 121/1000 [12:01<17:36,  1.20s/it]

Error extracting text from http://inserbia.info/today/2016/01/montenegro-to-begin-accession-negotiations-with-nato-in-mid-february/: 404 Client Error: Not Found for url: https://inserbia.info/today/2016/01/montenegro-to-begin-accession-negotiations-with-nato-in-mid-february/


Processing URLs:  12%|█▏        | 123/1000 [12:06<27:34,  1.89s/it]

URL filtered: https://www.youtube.com/watch?v=eXcUnhtRpdM


Processing URLs:  13%|█▎        | 131/1000 [12:22<31:08,  2.15s/it]

Error extracting text from https://www.google.ch/amp/www.govtech.com/fs/transportation/General-Motors-Anticipates-Completely-Driverless-Cars-on-the-Streets-in-2019.html%3fAMP: 403 Client Error: Forbidden for url: https://www.govtech.com/fs/transportation/General-Motors-Anticipates-Completely-Driverless-Cars-on-the-Streets-in-2019.html?AMP


Processing URLs:  14%|█▎        | 137/1000 [12:33<23:05,  1.61s/it]

Error extracting text from http://www.sciencemag.org/news/2017/01/diamond-vise-turns-hydrogen-metal-potentially-ending-80-year-quest: 403 Client Error: Forbidden for url: https://www.science.org/news/2017/01/diamond-vise-turns-hydrogen-metal-potentially-ending-80-year-quest


Processing URLs:  15%|█▍        | 146/1000 [12:56<30:42,  2.16s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/house-approves-bill-sanction-north-korea-nuke-test-36250242: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/house-approves-bill-sanction-north-korea-nuke-test-36250242


Processing URLs:  15%|█▍        | 149/1000 [13:03<31:16,  2.21s/it]

Error extracting text from http://www.state.gov/p/af/rls/fs/2014/230552.htm: 404 Client Error: Not Found for url: https://www.state.gov/p/af/rls/fs/2014/230552.htm


Processing URLs:  15%|█▌        | 150/1000 [13:03<25:19,  1.79s/it]

Error extracting text from https://www.newsweek.com/tech-ceo-brad-rukstales-who-stormed-capitol-calls-it-worst-decision-his-life-1560046: 403 Client Error: Forbidden for url: https://www.newsweek.com/tech-ceo-brad-rukstales-who-stormed-capitol-calls-it-worst-decision-his-life-1560046
URL filtered: https://www.bloomberg.com/news/articles/2017-11-21/trump-backs-roy-moore-for-senate-seat-despite-sex-allegations


Processing URLs:  15%|█▌        | 154/1000 [13:09<21:59,  1.56s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-15/the-bizarre-theory-that-says-fed-increases-will-spur-inflation


Processing URLs:  16%|█▌        | 157/1000 [13:11<14:24,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/10/16/us-china-asean-idUSKCN0SA05420151016: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/16/us-china-asean-idUSKCN0SA05420151016


Processing URLs:  17%|█▋        | 170/1000 [13:27<11:08,  1.24it/s]

Error extracting text from http://m.state.gov/md253115.htm: HTTPConnectionPool(host='m.state.gov', port=80): Max retries exceeded with url: /md253115.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fb776f00>: Failed to resolve 'm.state.gov' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2016/01/03/world/middleeast/saudi-arabia-executes-47-sheikh-nimr-shiite-cleric.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/03/world/middleeast/saudi-arabia-executes-47-sheikh-nimr-shiite-cleric.html?_r=0


Processing URLs:  18%|█▊        | 179/1000 [13:39<14:45,  1.08s/it]

Error extracting text from http://www.reuters.com/article/us-usa-china-military-idUSKBN18X2W8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-china-military-idUSKBN18X2W8
URL filtered: https://www.youtube.com/watch?v=axN-hs4slpY


Processing URLs:  18%|█▊        | 182/1000 [13:44<22:01,  1.62s/it]

Error extracting text from https://reut.rs/3gmxiWL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  19%|█▊        | 187/1000 [13:53<24:19,  1.80s/it]

Error extracting text from http://blog.dilbert.com/post/136042658956/is-iowa-a-caucus-or-a-mental-health-problem: HTTPConnectionPool(host='blog.dilbert.com', port=80): Max retries exceeded with url: /post/136042658956/is-iowa-a-caucus-or-a-mental-health-problem (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3073cebd0>: Failed to resolve 'blog.dilbert.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  19%|█▉        | 189/1000 [13:54<17:01,  1.26s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-turkey-idUSKCN11F0RV?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-turkey-idUSKCN11F0RV?il=0
URL filtered: https://www.metaculus.com/questions/558/vaquita-porpoise-declared-extinct-before-2020/mammal-down-30-individuals?utm_source=sciencemagazine&utm_medium=facebook-text&utm_campaign=30vaquitas-10887


Processing URLs:  19%|█▉        | 194/1000 [14:00<17:32,  1.31s/it]

Error extracting text from https://thehill.com/homenews/state-watch/566131-democrats-renew-calls-for-cuomo-to-resign-over-harassment-findings: 403 Client Error: Forbidden for url: https://thehill.com/homenews/state-watch/566131-democrats-renew-calls-for-cuomo-to-resign-over-harassment-findings/


Processing URLs:  20%|█▉        | 196/1000 [15:04<4:01:00, 17.99s/it]

Error extracting text from http://www.fda.gov/BiologicsBloodVaccines/GuidanceComplianceRegulatoryInformation/ProceduresSOPPs/ucm073074.htm#RevWrap: HTTPSConnectionPool(host='www.fda.gov', port=443): Read timed out. (read timeout=60)


Processing URLs:  20%|█▉        | 199/1000 [15:21<2:14:27, 10.07s/it]

Error extracting text from http://www.theaustralian.com.au/news/world/former-senior-fiji-military-commander-ratu-tevita-mara-to-campaign-against-fiji-in-australia/story-e6frg6so-1226072294977: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/news/world/former-senior-fiji-military-commander-ratu-tevita-mara-to-campaign-against-fiji-in-australia/story-e6frg6so-1226072294977?nk=b74932ad17ccda60a328c4e4cc62374d-1706875553


Processing URLs:  20%|██        | 200/1000 [15:21<1:37:04,  7.28s/it]

URL filtered: https://theconversation.com/fifty-nine-labs-around-world-handle-the-deadliest-pathogens-only-a-quarter-score-high-on-safety-161777?utm_source=linkedin&amp;utm_medium=bylinelinkedinbutton


Processing URLs:  21%|██        | 208/1000 [15:32<19:43,  1.49s/it]  

Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-putin-idUSKBN18V16E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-putin-idUSKBN18V16E
Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2021/02/25/iran-declaration-by-the-high-representative-on-behalf-of-the-eu-on-the-joint-comprehensive-plan-of-action-jcpoa/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2021/02/25/iran-declaration-by-the-high-representative-on-behalf-of-the-eu-on-the-joint-comprehensive-plan-of-action-jcpoa/


Processing URLs:  21%|██        | 209/1000 [15:34<18:30,  1.40s/it]

Error extracting text from https://www.reuters.com/article/us-usa-congress-moore/trump-says-u-s-senate-candidate-moore-should-leave-race-if-allegations-true-idUSKBN1DG35G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-moore/trump-says-u-s-senate-candidate-moore-should-leave-race-if-allegations-true-idUSKBN1DG35G


Processing URLs:  21%|██        | 211/1000 [15:36<16:34,  1.26s/it]

Error extracting text from http://www.nhtsa.gov/About+NHTSA/Press+Releases/dot-federal-policy-for-automated-vehicles-09202016: 404 Client Error: Not Found for url: https://www.nhtsa.gov/About+NHTSA/Press+Releases/dot-federal-policy-for-automated-vehicles-09202016


Processing URLs:  21%|██        | 212/1000 [15:37<16:28,  1.25s/it]

Error extracting text from https://www.reuters.com/article/us-somalia-landrights-diaspora/back-to-the-land-friction-as-somali-exiles-return-home-idUSKBN1CH1YK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-landrights-diaspora/back-to-the-land-friction-as-somali-exiles-return-home-idUSKBN1CH1YK


Processing URLs:  21%|██▏       | 214/1000 [15:38<12:40,  1.03it/s]

Error extracting text from https://www.gjopen.com/comments/1370725.: 404 Client Error: Not Found for url: https://www.gjopen.com/comments/1370725.


Processing URLs:  22%|██▏       | 215/1000 [15:39<13:15,  1.01s/it]

Error extracting text from https://whoswhos.org/178627-pavel-grudinin-mahnul-rukoy-na-vyiboryi-v-rf-i-vstretil-novyiy-god-v-germanii: HTTPSConnectionPool(host='whoswhos.org', port=443): Max retries exceeded with url: /178627-pavel-grudinin-mahnul-rukoy-na-vyiboryi-v-rf-i-vstretil-novyiy-god-v-germanii (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1000)')))


Processing URLs:  22%|██▏       | 216/1000 [15:40<11:47,  1.11it/s]

Error extracting text from http://thehill.com/homenews/senate/363535-sasse-rnc-help-for-roy-moore-doesnt-make-any-sense: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/363535-sasse-rnc-help-for-roy-moore-doesnt-make-any-sense/


Processing URLs:  22%|██▏       | 217/1000 [15:41<11:08,  1.17it/s]

Error extracting text from http://ndb.int/BRICS-Bank-to-finance-Indian-Chinese-infrastructure-projects.php: HTTPConnectionPool(host='ndb.int', port=80): Max retries exceeded with url: /BRICS-Bank-to-finance-Indian-Chinese-infrastructure-projects.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304bf07a0>: Failed to resolve 'ndb.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  22%|██▏       | 219/1000 [15:44<14:51,  1.14s/it]

Error extracting text from http://www.nasdaq.com/article/adp-says-private-payrolls-increased-more-than-expected-in-november-20151202-00489: 403 Client Error: Forbidden for url: http://www.nasdaq.com/article/adp-says-private-payrolls-increased-more-than-expected-in-november-20151202-00489


Processing URLs:  22%|██▏       | 221/1000 [15:46<11:50,  1.10it/s]

Error extracting text from https://www.pnas.org/content/116/9/3460: 403 Client Error: Forbidden for url: https://www.pnas.org/content/116/9/3460


Processing URLs:  23%|██▎       | 229/1000 [15:59<10:50,  1.19it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-abadi-idUSKCN0YV14Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-abadi-idUSKCN0YV14Z
Error extracting text from http://www.timesofisrael.com/corruption-watchdog-seeks-antitrust-probe-of-netanyahu/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/corruption-watchdog-seeks-antitrust-probe-of-netanyahu/


Processing URLs:  23%|██▎       | 231/1000 [16:00<08:27,  1.51it/s]

Error extracting text from https://www.devex.com/news/senate-hearing-urges-diplomatic-pressure-in-famine-response-90694: 403 Client Error: Forbidden for url: https://www.devex.com/news/senate-hearing-urges-diplomatic-pressure-in-famine-response-90694
Error extracting text from http://www.latimes.com/business/la-fi-export-import-bank-vote-20151009-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-export-import-bank-vote-20151009-story.html


Processing URLs:  23%|██▎       | 233/1000 [16:12<37:52,  2.96s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://tribunadonorte.com.br/noticia/pgr-avalia-se-abre-inqua-rito-sobre-dilma/340651&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://tribunadonorte.com.br/noticia/pgr-avalia-se-abre-inqua-rito-sobre-dilma/340651&amp;prev=search


Processing URLs:  23%|██▎       | 234/1000 [16:13<30:04,  2.36s/it]

Error extracting text from http://www.who.int/health_financing/GlobalPushforUHC_final_11Jul14-1.pdf: 404 Client Error: Not Found for url: https://www.who.int/health_financing/GlobalPushforUHC_final_11Jul14-1.pdf


Processing URLs:  24%|██▎       | 235/1000 [16:14<23:37,  1.85s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2015/10/02/0301000000AEN20151002003700315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  24%|██▍       | 240/1000 [16:20<16:07,  1.27s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.metrojornal.com.br/nacional/claudio-humberto/cunha-na-camara-blinda-dilma-de-impeachment-262587&amp;usg=ALkJrhi3ZKj6zguWmRLiL9nZHReHb04Zdw: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://www.metrojornal.com.br/nacional/claudio-humberto/cunha-na-camara-blinda-dilma-de-impeachment-262587&amp;usg=ALkJrhi3ZKj6zguWmRLiL9nZHReHb04Zdw


Processing URLs:  24%|██▍       | 243/1000 [16:22<08:59,  1.40it/s]

URL filtered: http://www.telegraph.co.uk/technology/2017/01/16/facebook-combating-fake-news-germany-ahead-election/


Processing URLs:  24%|██▍       | 244/1000 [16:26<21:31,  1.71s/it]

Error extracting text from http://ec.europa.eu/finance/general-policy/docs/banking-union/european-deposit-insurance-scheme/161011-edis-effect-analysis_en.pdf: 404 Client Error: Not Found for url: https://commission.europa.eu/business-economy-euro/banking-and-finance/financial-reforms-and-their-progress_en


Processing URLs:  25%|██▌       | 252/1000 [16:40<18:20,  1.47s/it]

Error extracting text from http://www.business-standard.com/article/current-affairs/afghanistan-president-ashraf-ghani-s-2-day-india-visit-starts-tomorrow-116091300292_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/current-affairs/afghanistan-president-ashraf-ghani-s-2-day-india-visit-starts-tomorrow-116091300292_1.html


Processing URLs:  25%|██▌       | 254/1000 [16:43<15:54,  1.28s/it]

URL filtered: https://www.bloomberg.com/view/articles/2018-01-16/putin-s-real-opposition-is-a-collective-shrug


Processing URLs:  26%|██▌       | 259/1000 [17:01<42:41,  3.46s/it]

URL filtered: https://twitter.com/nancyayoussef/status/711952411921928193


Processing URLs:  26%|██▌       | 261/1000 [17:01<24:35,  2.00s/it]

Error extracting text from https://www.nytimes.com/2017/03/17/world/asia/rex-tillerson-north-korea-nuclear.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/17/world/asia/rex-tillerson-north-korea-nuclear.html
URL filtered: https://twitter.com/ecoanalitica/status/926402792809467904


Processing URLs:  27%|██▋       | 266/1000 [17:07<17:30,  1.43s/it]

Error extracting text from http://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm: 403 Client Error: Forbidden for url: http://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm


Processing URLs:  27%|██▋       | 267/1000 [17:07<14:31,  1.19s/it]

Error extracting text from https://www.france24.com/en/live-news/20210808-tokyo-hands-olympic-baton-to-beijing-but-virus-boycott-calls-weigh: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210808-tokyo-hands-olympic-baton-to-beijing-but-virus-boycott-calls-weigh


Processing URLs:  27%|██▋       | 273/1000 [17:30<27:12,  2.25s/it]

Error extracting text from http://atimes.com/2016/06/iran-assumes-bigger-role-in-syria-by-deploying-its-regular-forces/: 404 Client Error: Not Found for url: https://atimes.com/2016/06/iran-assumes-bigger-role-in-syria-by-deploying-its-regular-forces/


Processing URLs:  28%|██▊       | 276/1000 [17:33<15:35,  1.29s/it]

URL filtered: https://twitter.com/QuicoToro/status/928096602094493696
Error extracting text from http://www.reuters.com/article/us-china-politics-idUSKCN10616O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-politics-idUSKCN10616O?il=0


Processing URLs:  28%|██▊       | 280/1000 [17:45<22:37,  1.89s/it]

Error extracting text from http://goo.gl/F7UyC0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/14/us-brazil-rousseff-idUSKCN0S71V520151014
URL filtered: https://twitter.com/IntelCrab/status/787824039838429184


Processing URLs:  28%|██▊       | 285/1000 [17:53<19:31,  1.64s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/defense/290267-congress-asleep-at-the-wheel-on-us-operations-in-libya: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/defense/290267-congress-asleep-at-the-wheel-on-us-operations-in-libya/


Processing URLs:  29%|██▉       | 288/1000 [17:58<18:45,  1.58s/it]

Error extracting text from https://www.top500.org/statistics/details/country/US: 404 Client Error: Not Found for url: https://www.top500.org/statistics/details/country/US/


Processing URLs:  29%|██▉       | 289/1000 [17:59<17:42,  1.49s/it]

URL filtered: https://www.theguardian.com/world/2018/feb/13/russian-watchdog-orders-youtube-to-remove-navalny-video


Processing URLs:  29%|██▉       | 291/1000 [18:00<10:50,  1.09it/s]

Error extracting text from https://thehill.com/homenews/administration/537012-biden-admin-deeply-concerned-by-russian-court-sentencing-of-navalny: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/537012-biden-admin-deeply-concerned-by-russian-court-sentencing-of-navalny/


Processing URLs:  29%|██▉       | 293/1000 [18:02<11:33,  1.02it/s]

URL filtered: https://www.youtube.com/watch?v=x0zlHyg3odA


Processing URLs:  30%|██▉       | 298/1000 [18:07<11:59,  1.02s/it]

URL filtered: http://www.newsweek.com/russia-putin-bots-linkedin-facebook-trump-clinton-kremlin-critics-poison-war-645696


Processing URLs:  30%|███       | 301/1000 [18:11<14:51,  1.28s/it]

Error extracting text from https://sinocism.com/about-2/: 404 Client Error: Not Found for url: https://sinocism.com/about-2


Processing URLs:  31%|███       | 312/1000 [18:32<27:27,  2.39s/it]

Error extracting text from http://www.europarl.europa.eu/legislative-train/theme-deeper-and-fairer-economic-and-monetary-union/file-european-deposit-insurance-scheme-(edis): 500 Server Error:  for url: https://www.europarl.europa.eu/legislative-train/theme-deeper-and-fairer-economic-and-monetary-union/file-european-deposit-insurance-scheme-(edis)


Processing URLs:  32%|███▏      | 315/1000 [18:36<19:29,  1.71s/it]

Error extracting text from https://www.reuters.com/article/us-lasvegas-shooting/las-vegas-gunman-emailed-about-bump-stocks-months-before-rampage-documents-idUSKBN1F205F: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-lasvegas-shooting/las-vegas-gunman-emailed-about-bump-stocks-months-before-rampage-documents-idUSKBN1F205F


Processing URLs:  32%|███▏      | 319/1000 [18:40<12:51,  1.13s/it]

Error extracting text from https://www.windhorse.aero/article.php/64/windhorse-events-2017-so-far: 404 Client Error: Not Found for url: https://www.aircraftalpha.com/article.php/64/windhorse-events-2017-so-far


Processing URLs:  32%|███▏      | 321/1000 [18:42<11:33,  1.02s/it]

Error extracting text from http://www.worldbulletin.net/news/170224/colombia-govt-confident-of-impending-peace-deal-with-farc: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/news/170224/colombia-govt-confident-of-impending-peace-deal-with-farc


Processing URLs:  32%|███▏      | 323/1000 [18:43<08:00,  1.41it/s]

Error extracting text from http://www.reuters.com/article/2015/10/09/us-eurozone-greece-growth-idUSKCN0S32DB20151009: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/09/us-eurozone-greece-growth-idUSKCN0S32DB20151009


Processing URLs:  33%|███▎      | 328/1000 [18:47<09:28,  1.18it/s]

Error extracting text from http://www.wsj.com/articles/u-k-officials-argue-brexit-effects-on-housing-prices-health-care-immigration-1463778000: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-k-officials-argue-brexit-effects-on-housing-prices-health-care-immigration-1463778000


Processing URLs:  33%|███▎      | 329/1000 [18:57<40:15,  3.60s/it]

Error extracting text from https://www.washingtonpost.com/business/venezuela-says-it-will-seek-to-restructure-foreign-debt/2017/11/02/dae2c756-c02f-11e7-9294-705f80164f6e_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/business/venezuela-says-it-will-seek-to-restructure-foreign-debt/2017/11/02/dae2c756-c02f-11e7-9294-705f80164f6e_story.html


Processing URLs:  33%|███▎      | 330/1000 [19:24<1:56:40, 10.45s/it]

Error extracting text from http://www.worldbulletin.net/news/180948/russia-not-demanding-turkey-quit-eu-for-shanghai-pact: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/news/180948/russia-not-demanding-turkey-quit-eu-for-shanghai-pact


Processing URLs:  33%|███▎      | 332/1000 [20:24<3:37:41, 19.55s/it]

Error extracting text from http://www.irantracker.org/basics/political-structures-iran: HTTPConnectionPool(host='www.irantracker.org', port=80): Max retries exceeded with url: /basics/political-structures-iran (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x302e084d0>, 'Connection to www.irantracker.org timed out. (connect timeout=60)'))


Processing URLs:  33%|███▎      | 333/1000 [20:26<2:48:50, 15.19s/it]

Error extracting text from http://www.ibtimes.com/political-capital/koch-brothers-want-new-constitution-theyre-closer-you-think-2552039: 403 Client Error: Forbidden for url: https://www.ibtimes.com/political-capital/koch-brothers-want-new-constitution-theyre-closer-you-think-2552039


Processing URLs:  33%|███▎      | 334/1000 [20:30<2:16:05, 12.26s/it]

Error extracting text from http://pulse.ng/world/in-south-africa-police-fire-teargas-at-student-protesters-in-johannesburg-id5515841.html: 404 Client Error: Not Found for url: https://www.pulse.ng/world/in-south-africa-police-fire-teargas-at-student-protesters-in-johannesburg-id5515841.html
Error extracting text from http://www.visualcapitalist.com/by-this-measure-the-u-s-has-the-2nd-highest-national-debt/: 403 Client Error: Forbidden for url: http://www.visualcapitalist.com/by-this-measure-the-u-s-has-the-2nd-highest-national-debt/


Processing URLs:  34%|███▎      | 336/1000 [20:34<1:28:22,  7.99s/it]

Error extracting text from http://www.parl.gc.ca/housechamberbusiness/ChamberCalendar.aspx?View=F&amp;Language=E&amp;Mode=1&amp;Parl=42&amp;Ses=1: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  34%|███▍      | 339/1000 [20:36<43:03,  3.91s/it]  

Error extracting text from http://www.topspeed.com/cars/car-news/michigan-legislation-to-allow-public-testing-of-autonomous-driving-vehicles-ar174398.html: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
URL filtered: https://twitter.com/UKActionteam/status/1472972539957764096
Error extracting text from http://www.latimes.com/politics/washington/la-na-essential-washington-updates-democrats-plan-filibuster-against-1490283517-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/washington/la-na-essential-washington-updates-democrats-plan-filibuster-against-1490283517-htmlstory.html


Processing URLs:  34%|███▍      | 340/1000 [21:36<3:00:22, 16.40s/it]

Error extracting text from https://dc.isda.org/documents/2017/11/pdvsa-dc-decision-nov-7.pdf: HTTPSConnectionPool(host='dc.isda.org', port=443): Max retries exceeded with url: /documents/2017/11/pdvsa-dc-decision-nov-7.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x307570ec0>, 'Connection to dc.isda.org timed out. (connect timeout=60)'))


Processing URLs:  34%|███▍      | 341/1000 [21:38<2:21:07, 12.85s/it]

Error extracting text from http://www.arirang.co.kr/News/News_View.asp?nseq=188578: 404 Client Error:  for url: http://www.arirang.co.kr/News/News_View.asp?nseq=188578


Processing URLs:  35%|███▌      | 352/1000 [21:58<24:38,  2.28s/it]  

Error extracting text from http://seekingalpha.com/article/3752396-apple-foxconn-data-points-to-moderating-iphone-sales-growth?auth_param=o80rv:1b6re1a:17b8fd519744c18fcf2d87d8e68ef63c&amp;uprof=4&amp;dr=1: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3752396-apple-foxconn-data-points-to-moderating-iphone-sales-growth?auth_param=o80rv:1b6re1a:17b8fd519744c18fcf2d87d8e68ef63c&amp;uprof=4&amp;dr=1


Processing URLs:  35%|███▌      | 354/1000 [22:03<25:42,  2.39s/it]

Error extracting text from https://www.econlib.org/my-complete-bet-wiki/: 403 Client Error: Forbidden for url: https://www.econlib.org/my-complete-bet-wiki/


Processing URLs:  36%|███▌      | 356/1000 [22:15<39:32,  3.68s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/israel-court-releases-netanyahu-aides-corruption-case-53355902: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/israel-court-releases-netanyahu-aides-corruption-case-53355902
URL filtered: https://twitter.com/BBCJLandale/status/610794634558709760


Processing URLs:  36%|███▌      | 360/1000 [22:50<1:44:35,  9.81s/it]

Error extracting text from https://aboutcroatia.net/news/croatia/govt-endorses-motion-ratify-protocol-montenegros-nato-entry-28056: 522 Server Error:  for url: https://aboutcroatia.net/news/croatia/govt-endorses-motion-ratify-protocol-montenegros-nato-entry-28056
URL filtered: https://www.bloomberg.com/politics/articles/2017-04-04/merkel-s-election-challenger-schulz-sinks-in-first-national-poll


Processing URLs:  36%|███▌      | 362/1000 [22:50<1:00:24,  5.68s/it]

Error extracting text from http://www.wsj.com/articles/opec-accepts-indonesias-return-to-oil-group-1441719614: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-accepts-indonesias-return-to-oil-group-1441719614


Processing URLs:  37%|███▋      | 367/1000 [22:59<26:27,  2.51s/it]  

Error extracting text from http://www.oddschecker.com/politics/british-politics/scottish-politics/date-of-next-scottish-referendum: 403 Client Error: Forbidden for url: http://www.oddschecker.com/politics/british-politics/scottish-politics/date-of-next-scottish-referendum
URL filtered: https://www.bloomberg.com/news/articles/2017-12-19/saudi-arabia-slows-pace-of-energy-subsidy-cuts-to-boost-economy


Processing URLs:  37%|███▋      | 371/1000 [23:04<14:52,  1.42s/it]

Error extracting text from https://www.nytimes.com/2017/06/27/technology/eu-google-fine.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/27/technology/eu-google-fine.html?_r=0


Processing URLs:  37%|███▋      | 373/1000 [23:07<17:17,  1.65s/it]

Error extracting text from http://www.polioeradication.org/mediaroom/newsstories/Stopping-vaccine-derived-polioviruses/tabid/526/news/1330/Default.aspx: 404 Client Error: Not Found for url: https://polioeradication.org/mediaroom/newsstories/Stopping-vaccine-derived-polioviruses/tabid/526/news/1330/Default.aspx
URL filtered: https://www.bloomberg.com/news/articles/2021-05-04/goldman-readies-its-u-s-workforce-for-return-to-offices-in-june?sref=i2Bc5OtW


Processing URLs:  38%|███▊      | 377/1000 [23:11<11:31,  1.11s/it]

Error extracting text from http://www.nytimes.com/2016/06/27/us/politics/donald-trump-and-rnc-see-common-foe-rogue-delegates.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/27/us/politics/donald-trump-and-rnc-see-common-foe-rogue-delegates.html?_r=0


Processing URLs:  38%|███▊      | 378/1000 [23:12<10:41,  1.03s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-02/09/c_135086905.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-02/09/c_135086905.htm


Processing URLs:  38%|███▊      | 383/1000 [23:24<27:30,  2.67s/it]

Error extracting text from http://www.tv360nigeria.com/un-averts-famine-in-northeast-nigeria/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  38%|███▊      | 385/1000 [23:25<15:39,  1.53s/it]

Error extracting text from http://phx.corporate-ir.net/phoenix.zhtml?c=79687&amp;p=irol-rigcountsoverview: 403 Client Error: Forbidden for url: https://bakerhughesrigcount.gcs-web.com//phoenix.zhtml?c=79687&amp;p=irol-rigcountsoverview
Error extracting text from https://www.oddschecker.com/politics/british-politics/scottish-politics/scottish-election-2021-snp-majority: 403 Client Error: Forbidden for url: https://www.oddschecker.com/politics/british-politics/scottish-politics/scottish-election-2021-snp-majority


Processing URLs:  39%|███▊      | 387/1000 [23:27<14:07,  1.38s/it]

Error extracting text from https://transportevolved.com/2016/01/14/toyota-limits-2016-mirai-hydrogen-fuel-cell-sedan-deliveries-due-to-slow-infrastructure-rollout/: 404 Client Error: Not Found for url: https://www.transportevolved.com/2016/01/14/toyota-limits-2016-mirai-hydrogen-fuel-cell-sedan-deliveries-due-to-slow-infrastructure-rollout/


Processing URLs:  39%|███▉      | 388/1000 [23:28<11:50,  1.16s/it]

Error extracting text from https://www.cicnews.com/2021/09/election-2021-polls-say-conservatives-are-leading-0919081.html#gs.a9296b: 403 Client Error: Forbidden for url: https://www.cicnews.com/2021/09/election-2021-polls-say-conservatives-are-leading-0919081.html#gs.a9296b


Processing URLs:  39%|███▉      | 389/1000 [23:28<09:58,  1.02it/s]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/oh/ohio_senate_portman_vs_strickland-5386.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/oh/ohio_senate_portman_vs_strickland-5386.html


Processing URLs:  39%|███▉      | 390/1000 [23:29<08:51,  1.15it/s]

Error extracting text from http://thehill.com/blogs/ballot-box/256586-poll-clinton-drops-10-points-in-5-days: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/256586-poll-clinton-drops-10-points-in-5-days/


Processing URLs:  39%|███▉      | 392/1000 [23:32<11:57,  1.18s/it]

Error extracting text from http://atimes.com/2016/09/xi-set-to-consolidate-power-by-curbing-communist-youth-league/: 404 Client Error: Not Found for url: https://atimes.com/2016/09/xi-set-to-consolidate-power-by-curbing-communist-youth-league/


Processing URLs:  40%|███▉      | 396/1000 [23:41<19:39,  1.95s/it]

Error extracting text from https://www.nejm.org/doi/full/10.1056/NEJMc2100362: 403 Client Error: Forbidden for url: https://www.nejm.org/doi/full/10.1056/NEJMc2100362


Processing URLs:  40%|███▉      | 398/1000 [23:45<18:26,  1.84s/it]

Error extracting text from http://micanaldepanama.com/expansion/2016/03/panama-canal-inaugurates-scale-model-training-facility-announces-expansion-inauguration-date/: 403 Client Error: Forbidden for url: https://pancanal.com/expansion/2016/03/panama-canal-inaugurates-scale-model-training-facility-announces-expansion-inauguration-date/


Processing URLs:  40%|████      | 400/1000 [23:49<18:47,  1.88s/it]

Error extracting text from http://europe.newsweek.com/no-future-assad-syria-saudi-foreign-minister-333865: 403 Client Error: Forbidden for url: https://www.newsweek.com/no-future-assad-syria-saudi-foreign-minister-333865


Processing URLs:  40%|████      | 404/1000 [23:56<14:05,  1.42s/it]

Error extracting text from https://www.nytimes.com/2021/10/05/us/politics/debt-ceiling-filibuster-biden.html?referringSource=articleShare: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/10/05/us/politics/debt-ceiling-filibuster-biden.html?referringSource=articleShare


Processing URLs:  41%|████      | 406/1000 [23:57<10:14,  1.03s/it]

Error extracting text from https://www.lesswrong.com/posts/QEYWkRoCn4fZxXQAY/prizes-for-elk-proposals: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/QEYWkRoCn4fZxXQAY/prizes-for-elk-proposals


Processing URLs:  41%|████      | 408/1000 [23:59<10:33,  1.07s/it]

URL filtered: https://www.bloomberg.com/billionaires/


Processing URLs:  41%|████▏     | 413/1000 [24:09<17:58,  1.84s/it]

Error extracting text from http://www.dhakatribune.com/op-ed/2016/jan/06/russia-clearing-decks-assad: 403 Client Error: Forbidden for url: https://www.dhakatribune.com/op-ed/2016/jan/06/russia-clearing-decks-assad
URL filtered: https://twitter.com/i/web/status/918901399080935424


Processing URLs:  42%|████▏     | 417/1000 [24:14<13:35,  1.40s/it]

Error extracting text from https://www.reuters.com/article/us-russia-egypt-military-airspace-planes/russian-military-working-on-deal-to-use-egyptian-air-bases-document-idUSKBN1DU11D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-egypt-military-airspace-planes/russian-military-working-on-deal-to-use-egyptian-air-bases-document-idUSKBN1DU11D


Processing URLs:  42%|████▏     | 423/1000 [24:20<07:23,  1.30it/s]

URL filtered: https://www.youtube.com/watch?v=75SEy1qu71I
Error extracting text from http://www.nytimes.com/2016/06/02/world/asia/pakistan-nawaz-sharif-heart-surgery.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/02/world/asia/pakistan-nawaz-sharif-heart-surgery.html?_r=0


Processing URLs:  42%|████▏     | 424/1000 [24:20<06:06,  1.57it/s]

Error extracting text from http://www.wsj.com/articles/for-a-46-return-bond-investors-go-to-venezuelaif-they-dare-1477220404: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-a-46-return-bond-investors-go-to-venezuelaif-they-dare-1477220404


Processing URLs:  43%|████▎     | 427/1000 [24:27<13:46,  1.44s/it]

Error extracting text from http://www.latimes.com/opinion/op-ed/la-oe-litman-mueller-firing-20170921-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/op-ed/la-oe-litman-mueller-firing-20170921-story.html


Processing URLs:  43%|████▎     | 430/1000 [24:29<08:41,  1.09it/s]

Error extracting text from https://thehill.com/policy/energy-environment/546741-putin-plans-to-go-to-biden-climate-summit-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/546741-putin-plans-to-go-to-biden-climate-summit-report/


Processing URLs:  44%|████▎     | 437/1000 [24:41<11:10,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN18X1SO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-debt-idUSKBN18X1SO


Processing URLs:  44%|████▍     | 445/1000 [24:59<13:51,  1.50s/it]

Error extracting text from http://www.reuters.com/article/us-asia-security-idUSKCN0YR054?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asia-security-idUSKCN0YR054?il=0


Processing URLs:  45%|████▍     | 447/1000 [25:02<12:18,  1.34s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKCN0XO01U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKCN0XO01U


Processing URLs:  45%|████▌     | 453/1000 [25:16<23:21,  2.56s/it]

Error extracting text from http://tass.ru/en/politics/862267: 404 Client Error: Not Found for url: https://tass.ru/en/politics/862267


Processing URLs:  45%|████▌     | 454/1000 [25:20<27:36,  3.03s/it]

URL filtered: http://www.bloombergview.com/articles/2015-08-28/clinton-s-superdelegate-tipping-point


Processing URLs:  46%|████▌     | 457/1000 [25:22<16:09,  1.79s/it]

Error extracting text from https://www.thecipherbrief.com/article/comey-fallout-1093: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/comey-fallout-1093


Processing URLs:  46%|████▌     | 458/1000 [25:23<12:54,  1.43s/it]

Error extracting text from https://www.predictit.org/Contract/5357/Will-Scottish-Parliament-call-for-an-independence-referendum-in-2017#rules: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/5357/Will-Scottish-Parliament-call-for-an-independence-referendum-in-2017#rules


Processing URLs:  46%|████▋     | 465/1000 [25:32<08:34,  1.04it/s]

Error extracting text from http://www.nytimes.com/2016/02/07/world/middleeast/iran-panel-reverses-disqualification-of-election-candidates.html?_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/07/world/middleeast/iran-panel-reverses-disqualification-of-election-candidates.html?_r=1


Processing URLs:  47%|████▋     | 467/1000 [25:35<10:24,  1.17s/it]

Error extracting text from http://www.nationalreview.com/article/433369/hillary-clinton-e-mail-scandal-justice-department-stonewall: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/433369/hillary-clinton-e-mail-scandal-justice-department-stonewall/


Processing URLs:  47%|████▋     | 470/1000 [25:39<09:59,  1.13s/it]

Error extracting text from http://www.reuters.com/article/us-usa-cybersecurity-russia-exclusive-idUSKCN0R12FE20150901: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cybersecurity-russia-exclusive-idUSKCN0R12FE20150901


Processing URLs:  47%|████▋     | 471/1000 [25:40<10:14,  1.16s/it]

Error extracting text from http://tass.ru/en/opinions/834547: 404 Client Error: Not Found for url: https://tass.ru/en/opinions/834547


Processing URLs:  47%|████▋     | 473/1000 [25:42<09:22,  1.07s/it]

Error extracting text from https://www.sofx.com/2016/12/31/philippines-president-duterte-says-u-s-special-forces-gtfo-daily-beast/: 403 Client Error: Forbidden for url: https://www.sofx.com/2016/12/31/philippines-president-duterte-says-u-s-special-forces-gtfo-daily-beast/


Processing URLs:  48%|████▊     | 476/1000 [25:47<11:23,  1.30s/it]

Error extracting text from http://www.ticotimes.net/2015/10/01/builders-vow-to-repair-leak-in-panama-canal: 403 Client Error: Forbidden for url: http://www.ticotimes.net/2015/10/01/builders-vow-to-repair-leak-in-panama-canal


Processing URLs:  48%|████▊     | 479/1000 [25:52<14:10,  1.63s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges/news/eu-pushing-for-ttip-talks%E2%80%99-completion-hefty-trade-agenda-in-2016: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges/news/eu-pushing-for-ttip-talks%E2%80%99-completion-hefty-trade-agenda-in-2016


Processing URLs:  48%|████▊     | 483/1000 [25:57<10:35,  1.23s/it]

Error extracting text from https://uk.reuters.com/article/us-yemen-cholera-un-idUKKBN19W1QF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.nytimes.com/2016/01/12/science/as-us-modernizes-nuclear-weapons-smaller-leaves-some-uneasy.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/12/science/as-us-modernizes-nuclear-weapons-smaller-leaves-some-uneasy.html?_r=0


Processing URLs:  48%|████▊     | 485/1000 [25:59<10:00,  1.17s/it]

Error extracting text from http://www.fda.gov/BiologicsBloodVaccines/DevelopmentApprovalProcess/BiologicalApprovalsbyYear/ucm482397.htm: 404 Client Error: Not Found for url: https://www.fda.gov/vaccines-blood-biologics/biological-approvals-year/2016-biological-license-application-approvals


Processing URLs:  49%|████▉     | 488/1000 [26:16<29:40,  3.48s/it]

Error extracting text from https://www.reuters.com/article/us-afghanistan-security/crime-casualties-undermine-u-s-gains-on-afghan-battlefield-idUSKBN1DX00W?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-security/crime-casualties-undermine-u-s-gains-on-afghan-battlefield-idUSKBN1DX00W?il=0


Processing URLs:  49%|████▉     | 492/1000 [26:21<15:39,  1.85s/it]

Error extracting text from http://www.mayoclinic.org/tests-procedures/gene-therapy/details/risks/cmc-20243698: 403 Client Error: Forbidden for url: https://www.mayoclinic.org/tests-procedures/gene-therapy/details/risks/cmc-20243698


Processing URLs:  50%|████▉     | 495/1000 [26:25<12:18,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-vietnam-idUSKCN0XA0N2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-vietnam-idUSKCN0XA0N2


Processing URLs:  50%|████▉     | 497/1000 [26:30<14:09,  1.69s/it]

Error extracting text from http://warisboring.com/the-21st-centurys-refugee-crisis-is-just-getting-started/: 403 Client Error: Forbidden for url: http://warisboring.com/the-21st-centurys-refugee-crisis-is-just-getting-started/


Processing URLs:  50%|████▉     | 499/1000 [26:32<11:59,  1.44s/it]

Error extracting text from https://2015burundi.crowdmap.com/page/index/3: 404 Client Error: Not Found for url: https://2015burundi.crowdmap.com/page/index/3


Processing URLs:  50%|█████     | 502/1000 [26:35<09:26,  1.14s/it]

Error extracting text from http://finance.yahoo.com/echarts?s=EURTRY=X&amp;t=5d&amp;l=on&amp;z=m&amp;q=l&amp;c=#{&quot;range&quot;:&quot;5y&quot;,&quot;allowChartStacking&quot;:true: 404 Client Error: Not Found for url: https://finance.yahoo.com/echarts?s=EURTRY=X&amp;t=5d&amp;l=on&amp;z=m&amp;q=l&amp;c=#%7B&quot;range&quot;:&quot;5y&quot;,&quot;allowChartStacking&quot;:true


Processing URLs:  51%|█████     | 509/1000 [27:56<2:46:26, 20.34s/it]

Error extracting text from https://www.betfair.com/exchange/plus/#/politics/market/1.123333545: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/plus/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3073cf950>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))


Processing URLs:  51%|█████▏    | 513/1000 [28:04<50:46,  6.26s/it]

Error extracting text from https://www.nytimes.com/2017/09/06/world/asia/cia-afghanistan-war.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/06/world/asia/cia-afghanistan-war.html


Processing URLs:  51%|█████▏    | 514/1000 [28:04<35:59,  4.44s/it]

Error extracting text from https://www.nytimes.com/2017/10/14/world/americas/chile-coup-cia-museum.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/14/world/americas/chile-coup-cia-museum.html


Processing URLs:  52%|█████▏    | 515/1000 [28:16<54:39,  6.76s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-05/what-can-venezuela-sell-to-pay-bondholders-no-one-really-knows


Processing URLs:  52%|█████▏    | 521/1000 [28:20<12:49,  1.61s/it]

Error extracting text from http://www.nytimes.com/2008/09/22/world/middleeast/22olmert.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2008/09/22/world/middleeast/22olmert.html


Processing URLs:  52%|█████▏    | 522/1000 [28:21<09:42,  1.22s/it]

Error extracting text from https://www.fastcompany.com/40495158/mysterious-hackers-theft-of-cyberweapons-from-nsa-exceeds-damage-done-by-snowden-leaks: 403 Client Error: Forbidden for url: https://www.fastcompany.com/40495158/mysterious-hackers-theft-of-cyberweapons-from-nsa-exceeds-damage-done-by-snowden-leaks


Processing URLs:  52%|█████▏    | 524/1000 [28:25<13:40,  1.72s/it]

Error extracting text from https://fcw.com/articles/2017/08/22/senate-intel-russia-cyber.aspx: 404 Client Error: NOT FOUND for url: https://www.nextgov.com/articles/2017/08/22/senate-intel-russia-cyber.aspx/


Processing URLs:  52%|█████▎    | 525/1000 [28:28<16:33,  2.09s/it]

Error extracting text from https://reut.rs/39AcgCh: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUSKBN29T05J


Processing URLs:  53%|█████▎    | 526/1000 [28:28<12:59,  1.64s/it]

Error extracting text from http://thehill.com/homenews/campaign/362668-pro-trump-groups-poll-finds-moore-up-1-in-alabama: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/362668-pro-trump-groups-poll-finds-moore-up-1-in-alabama/


Processing URLs:  53%|█████▎    | 528/1000 [28:31<10:08,  1.29s/it]

Error extracting text from http://seekingalpha.com/article/3967810-tesla-raise-delivery-guidance-2016: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/3967810-tesla-raise-delivery-guidance-2016


Processing URLs:  53%|█████▎    | 529/1000 [28:32<10:55,  1.39s/it]

Error extracting text from https://www.reuters.com/article/tennessee-blast/nashville-blast-investigation-leads-u-s-agents-to-suburban-home-idUSKBN2900CX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/tennessee-blast/nashville-blast-investigation-leads-u-s-agents-to-suburban-home-idUSKBN2900CX


Processing URLs:  54%|█████▎    | 536/1000 [28:39<06:17,  1.23it/s]

Error extracting text from http://www.autonews.com/article/20150302/BLOG06/150309994/could-toyota-mirais-hot-demand-and-2-year-backlog-undermine-its: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20150302/BLOG06/150309994/could-toyota-mirais-hot-demand-and-2-year-backlog-undermine-its


Processing URLs:  54%|█████▎    | 537/1000 [28:40<07:02,  1.10it/s]

Error extracting text from http://www.gov.me/en/News/159369/Deputy-Prime-Minister-Igor-Luksic-sends-Letter-of-Intent-to-NATO-Secretary-General-Jens-Stoltenberg.html: 404 Client Error: not found for url: https://www.gov.me/en/News/159369/Deputy-Prime-Minister-Igor-Luksic-sends-Letter-of-Intent-to-NATO-Secretary-General-Jens-Stoltenberg.html


Processing URLs:  54%|█████▍    | 539/1000 [28:43<07:39,  1.00it/s]

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-trump-putin-moscow-tower-deal-20170828-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/politics/ct-trump-putin-moscow-tower-deal-20170828-story.html


Processing URLs:  54%|█████▍    | 540/1000 [28:43<06:53,  1.11it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0VL16I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0VL16I


Processing URLs:  55%|█████▍    | 549/1000 [29:24<26:42,  3.55s/it]

Error extracting text from https://www.lme.com/en/Metals/Ferrous/LME-Steel-HRC-N-America-Platts#Trading+day+summary: 403 Client Error: Forbidden for url: https://www.lme.com/en/Metals/Ferrous/LME-Steel-HRC-N-America-Platts#Trading+day+summary


Processing URLs:  55%|█████▌    | 552/1000 [29:29<17:24,  2.33s/it]

Error extracting text from http://www.gallup.com/opinion/polling-matters/192695/american-public-opinion-terrorism-guns.aspx: 404 Client Error: Not Found for url: https://www.gallup.com/opinion/polling-matters/192695/american-public-opinion-terrorism-guns.aspx


Processing URLs:  56%|█████▌    | 557/1000 [29:36<11:50,  1.60s/it]

Error extracting text from https://www.who.int/csr/don/10-february-2021-ebola-drc/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/10-february-2021-ebola-drc/en/


Processing URLs:  56%|█████▌    | 558/1000 [29:38<11:55,  1.62s/it]

Error extracting text from http://toyotanews.pressroom.toyota.com/releases/april-2016-sales-chart.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/april-2016-sales-chart/


Processing URLs:  56%|█████▌    | 561/1000 [29:42<09:12,  1.26s/it]

Error extracting text from http://www.nytimes.com/2016/09/28/world/asia/afghanistan-corruption-financial-disclosure.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/28/world/asia/afghanistan-corruption-financial-disclosure.html


Processing URLs:  56%|█████▌    | 562/1000 [29:45<12:34,  1.72s/it]

URL filtered: https://www.bloomberg.com/gadfly/articles/2016-09-13/asia-s-junk-bond-rally-rests-on-a-chinese-house-of-cards


Processing URLs:  57%|█████▋    | 567/1000 [29:51<10:14,  1.42s/it]

Error extracting text from https://bit.ly/2Q4pvEe: 403 Client Error: Forbidden for url: https://www.ipsos.com/ipsos-mori/en-uk/snp-retains-strong-lead-independence-dominates-voters-concerns


Processing URLs:  57%|█████▋    | 569/1000 [29:53<07:29,  1.04s/it]

Error extracting text from https://www.advancedligo.mit.edu/: HTTPSConnectionPool(host='www.advancedligo.mit.edu', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
Error extracting text from http://www.nytimes.com/2012/12/24/world/asia/north-korean-rocket-had-military-purpose-seoul-says.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2012/12/24/world/asia/north-korean-rocket-had-military-purpose-seoul-says.html


Processing URLs:  58%|█████▊    | 578/1000 [30:01<03:55,  1.79it/s]

Error extracting text from http://beta.latimes.com/world/asia/la-fg-afghanistan-protest-20170703-story.html: 400 Client Error: Bad Request for url: http://beta.latimes.com/world/asia/la-fg-afghanistan-protest-20170703-story.html
URL filtered: https://twitter.com/hashtag/caucusfortrump
Error extracting text from https://onlinelibrary.wiley.com/doi/epdf/10.1002/bies.202000240: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bies.202000240
URL filtered: https://twitter.com/geopolitiquee/status/796676802894852100


Processing URLs:  58%|█████▊    | 584/1000 [31:06<1:53:18, 16.34s/it]

Error extracting text from http://www.thecountrycaller.com/28585-tesla-motors-inc-tsla-model-3-launch-to-be-delayed-by-two-years-car-and-driver/: HTTPConnectionPool(host='www.thecountrycaller.com', port=80): Max retries exceeded with url: /28585-tesla-motors-inc-tsla-model-3-launch-to-be-delayed-by-two-years-car-and-driver/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30787fb00>, 'Connection to www.thecountrycaller.com timed out. (connect timeout=60)'))


Processing URLs:  58%|█████▊    | 585/1000 [31:09<1:26:49, 12.55s/it]

Error extracting text from http://www.foxnews.com/world/2015/10/28/two-poets-jailed-in-iranian-hard-liners-crackdown/?intcmp=hplnws: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/10/28/two-poets-jailed-in-iranian-hard-liners-crackdown/?intcmp=hplnws


Processing URLs:  59%|█████▉    | 589/1000 [31:16<30:55,  4.51s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-22/south-africa-faces-downgrade-risk-with-budget-marred-by-protests


Processing URLs:  59%|█████▉    | 591/1000 [31:17<18:35,  2.73s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN15P23Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN15P23Z


Processing URLs:  59%|█████▉    | 592/1000 [31:18<15:16,  2.25s/it]

Error extracting text from http://nationalinterest.org/feature/irans-plan-syria-without-assad-14762: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/irans-plan-syria-without-assad-14762


Processing URLs:  60%|█████▉    | 597/1000 [31:23<09:10,  1.37s/it]

Error extracting text from http://www.cnbc.com/2016/02/16/oil-prices-spike-on-reports-of-saudi-russia-output-cut-talks.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/02/16/oil-prices-spike-on-reports-of-saudi-russia-output-cut-talks.html


Processing URLs:  60%|██████    | 600/1000 [31:29<11:41,  1.75s/it]

Error extracting text from https://presidencia.gob.do/noticias/escuelas-vocacionales-de-las-fuerzas-armadas-graduan-793-jovenes-en-varias-areas: 404 Client Error: Not Found for url: https://presidencia.gob.do/noticias/escuelas-vocacionales-de-las-fuerzas-armadas-graduan-793-jovenes-en-varias-areas
URL filtered: https://www.bloomberg.com/news/articles/2016-10-19/scotland-prepares-a-new-independence-vote-in-bid-to-beat-brexit
URL filtered: https://www.bloomberg.com/news/articles/2017-12-11/saudis-are-said-to-plan-80-gasoline-price-increase-in-january


Processing URLs:  60%|██████    | 604/1000 [31:31<05:30,  1.20it/s]

Error extracting text from http://www.brookings.edu/research/opinions/2013/07/31-russia-china-pacific-pivot-hill: 404 Client Error: Not Found for url: https://www.brookings.edu/articles/opinions/2013/07/31-russia-china-pacific-pivot-hill


Processing URLs:  60%|██████    | 605/1000 [31:31<04:55,  1.34it/s]

Error extracting text from https://www.nytimes.com/2017/06/06/us/politics/comey-sessions-trump.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/06/us/politics/comey-sessions-trump.html?_r=0


Processing URLs:  61%|██████    | 611/1000 [31:40<08:06,  1.25s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-bonds/venezuela-bonds-rally-ahead-of-monday-creditors-meeting-idUSKBN1DA2EG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-bonds/venezuela-bonds-rally-ahead-of-monday-creditors-meeting-idUSKBN1DA2EG


Processing URLs:  61%|██████    | 612/1000 [31:45<15:49,  2.45s/it]

Error extracting text from http://www.asean.org/images/2015/July/external_trade_statistic/table20_asof17June15.pdf: 404 Client Error: Not Found for url: https://asean.org/images/2015/July/external_trade_statistic/table20_asof17June15.pdf


Processing URLs:  61%|██████▏   | 613/1000 [31:46<13:46,  2.14s/it]

Error extracting text from https://asmbs.org/resources/weight-and-type-2-diabetes-after-bariatric-surgery-fact-sheet: 404 Client Error: Not Found for url: https://asmbs.org/resources/weight-and-type-2-diabetes-after-bariatric-surgery-fact-sheet


Processing URLs:  62%|██████▏   | 619/1000 [31:54<09:00,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-nunes-idUSKBN1781TE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-nunes-idUSKBN1781TE


Processing URLs:  62%|██████▏   | 620/1000 [31:55<07:38,  1.21s/it]

Error extracting text from http://amti.csis.org/south-china-sea-civilian-air-patrol-capability-and-the-u-s-japan-alliance/: 403 Client Error: Forbidden for url: http://amti.csis.org/south-china-sea-civilian-air-patrol-capability-and-the-u-s-japan-alliance/


Processing URLs:  62%|██████▏   | 623/1000 [31:57<04:43,  1.33it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=it&amp;u=https://www.repubblica.it/politica/2021/02/09/news/governo-draghi-ultime-notizie-di-oggi-286699764/&amp;prev=search&amp;pto=aue: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=it&amp;u=https://www.repubblica.it/politica/2021/02/09/news/governo-draghi-ultime-notizie-di-oggi-286699764/&amp;prev=search&amp;pto=aue
Error extracting text from https://www.reuters.com/article/us-russia-election-navalny-kremlin/kremlin-eyeing-election-says-opposition-leader-navalny-not-a-threat-idUSKBN1FI103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-election-navalny-kremlin/kremlin-eyeing-election-says-opposition-leader-navalny-not-a-threat-idUSKBN1FI103


Processing URLs:  63%|██████▎   | 630/1000 [32:12<08:15,  1.34s/it]

Error extracting text from http://www.reuters.com/article/2015/10/06/us-imf-g20-japan-idUSKCN0S003420151006: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/06/us-imf-g20-japan-idUSKCN0S003420151006


Processing URLs:  64%|██████▎   | 637/1000 [32:27<08:53,  1.47s/it]

Error extracting text from http://www.presstv.com/Detail/2016/03/09/454833/hillary-clinton-us-iran-ballistic-missile-jcpoa/: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2016/03/09/454833/hillary-clinton-us-iran-ballistic-missile-jcpoa/
Error extracting text from http://www.nytimes.com/2015/11/18/opinion/venezuelas-threatened-elections.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/18/opinion/venezuelas-threatened-elections.html?_r=0


Processing URLs:  64%|██████▍   | 638/1000 [32:27<06:34,  1.09s/it]

Error extracting text from http://www.reuters.com/article/us-china-military-idUSKCN0V714B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-military-idUSKCN0V714B


Processing URLs:  64%|██████▍   | 640/1000 [32:29<06:49,  1.14s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-icm-idUKKCN0YM1UF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  64%|██████▍   | 641/1000 [32:30<05:31,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKCN10Q28M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKCN10Q28M


Processing URLs:  64%|██████▍   | 645/1000 [32:40<08:50,  1.49s/it]

URL filtered: http://blogs.cfr.org/geographics/2015/12/04/sdr/?cid=soc-twitter-in-want_to_borrow_from_imf-120415


Processing URLs:  65%|██████▍   | 647/1000 [32:46<10:48,  1.84s/it]

Error extracting text from http://www.latimes.com/business/autos/la-fi-hy-lapd-chooses-bmw-20160608-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/autos/la-fi-hy-lapd-chooses-bmw-20160608-snap-story.html


Processing URLs:  65%|██████▍   | 649/1000 [32:47<07:36,  1.30s/it]

Error extracting text from https://www.justsecurity.org/43568/presidents-pardons-paradox-granting-aid-prosecution/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/43568/presidents-pardons-paradox-granting-aid-prosecution/
Error extracting text from http://www.nytimes.com/2016/12/13/us/politics/saudi-arabia-arms-sale-yemen-war.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/13/us/politics/saudi-arabia-arms-sale-yemen-war.html


Processing URLs:  65%|██████▌   | 650/1000 [32:47<05:35,  1.04it/s]

Error extracting text from http://www.reuters.com/article/us-japan-navy-southchinasea-china-idUSKBN16N167: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-japan-navy-southchinasea-china-idUSKBN16N167


Processing URLs:  65%|██████▌   | 652/1000 [32:50<07:12,  1.24s/it]

Error extracting text from http://peacekeeper.ru/en/?module=news&amp;action=view&amp;id=28639: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  65%|██████▌   | 653/1000 [32:51<06:17,  1.09s/it]

Error extracting text from https://bit.ly/3dKLxVA: 403 Client Error: Forbidden for url: https://nationalpost.com/opinion/john-ivison-new-evidence-in-the-meng-case-an-opportunity-for-lametti-to-end-it


Processing URLs:  66%|██████▌   | 659/1000 [33:10<13:35,  2.39s/it]

Error extracting text from http://www.nytimes.com/reuters/2016/02/25/world/middleeast/25reuters-mideast-crisis-syria-town.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/02/25/world/middleeast/25reuters-mideast-crisis-syria-town.html


Processing URLs:  66%|██████▌   | 662/1000 [33:14<08:39,  1.54s/it]

Error extracting text from http://sevendaynews.com/2016/09/16/media-system-of-katyusha-will-monitor-the-media-and-social-networks-for-government/: 403 Client Error: Forbidden for url: http://sevendaynews.com/2016/09/16/media-system-of-katyusha-will-monitor-the-media-and-social-networks-for-government/


Processing URLs:  66%|██████▋   | 663/1000 [33:15<07:29,  1.33s/it]

Error extracting text from http://www.hybridcars.com/december-2009-dashboard/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/december-2009-dashboard/


Processing URLs:  67%|██████▋   | 666/1000 [33:18<05:48,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-immigration-sadr-iraq-idUSKBN15D0H9?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-immigration-sadr-iraq-idUSKBN15D0H9?il=0


Processing URLs:  67%|██████▋   | 672/1000 [33:29<08:57,  1.64s/it]

Error extracting text from http://www.mangalorean.com/15-jihadis-killed-attacks-near-mosul/: 404 Client Error: Not Found for url: https://www.mangalorean.com/15-jihadis-killed-attacks-near-mosul/


Processing URLs:  67%|██████▋   | 673/1000 [33:29<07:11,  1.32s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/energy-environment/328132-should-us-falter-china-poised-to-capitalize-on-clean: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/energy-environment/328132-should-us-falter-china-poised-to-capitalize-on-clean/


Processing URLs:  68%|██████▊   | 676/1000 [33:31<04:46,  1.13it/s]

Error extracting text from http://www.thestreet.com/story/13489954/1/departure-of-vw-s-u-s-chief-is-a-serious-blow.html: 403 Client Error: Forbidden for url: https://www.thestreet.com/story/13489954/1/departure-of-vw-s-u-s-chief-is-a-serious-blow.html


Processing URLs:  68%|██████▊   | 679/1000 [33:34<04:27,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-un-idUSKCN10Y0S8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-un-idUSKCN10Y0S8


Processing URLs:  68%|██████▊   | 680/1000 [33:35<04:35,  1.16it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=55224#.V_bKBjKZOL4: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=55224#.V_bKBjKZOL4


Processing URLs:  68%|██████▊   | 683/1000 [33:39<04:56,  1.07it/s]

Error extracting text from http://www.autonews.com/article/20151214/OEM06/312149992/lg-chem-quietly-surges-in-battery-race: 403 Client Error: Forbidden for url: https://www.autonews.com/article/20151214/OEM06/312149992/lg-chem-quietly-surges-in-battery-race


Processing URLs:  69%|██████▊   | 686/1000 [33:43<05:59,  1.14s/it]

Error extracting text from https://shadowproof.com/2012/03/14/we-cant-win-a-war-with-iran-not-if-they-have-half-a-brain/: 403 Client Error: Forbidden for url: https://shadowproof.com/2012/03/14/we-cant-win-a-war-with-iran-not-if-they-have-half-a-brain/


Processing URLs:  69%|██████▉   | 691/1000 [33:51<07:15,  1.41s/it]

Error extracting text from https://bit.ly/31sp6hl: 403 Client Error: Forbidden for url: https://capx.co/salmonds-game-playing-could-be-a-headache-for-unionists-or-a-nightmare-for-the-nats/?omhide=true&utm_source=newsletter&utm_medium=email&utm_campaign=29/03/21&cmid=496f7a06-f1e9-4080-8ea4-bbfe5ba2c48d


Processing URLs:  69%|██████▉   | 692/1000 [33:51<05:25,  1.06s/it]

Error extracting text from https://www.nytimes.com/2017/05/02/world/asia/navy-south-china-sea.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/02/world/asia/navy-south-china-sea.html


Processing URLs:  69%|██████▉   | 693/1000 [33:52<05:47,  1.13s/it]

Error extracting text from http://www.jsonline.com/news/statepolitics/poll-wisconsinites-dont-want-walker-ryan-to-step-up-in-cleveland-b99710694z1-376410901.html: 404 Client Error: OK for url: https://www.jsonline.com/news/statepolitics/poll-wisconsinites-dont-want-walker-ryan-to-step-up-in-cleveland-b99710694z1-376410901.html/


Processing URLs:  70%|██████▉   | 695/1000 [34:02<14:34,  2.87s/it]

Error extracting text from https://dmv.ny.gov/forms/autonomousvehiclelaw.pdf: 404 Client Error: Not Found for url: https://dmv.ny.gov/forms/autonomousvehiclelaw.pdf


Processing URLs:  70%|██████▉   | 699/1000 [34:07<07:42,  1.54s/it]

Error extracting text from https://www.nytimes.com/2017/06/26/us/politics/syria-will-pay-a-heavy-price-for-another-chemical-attack-trump-says.html?emc=edit_na_20170626&amp;nl=breaking-news&amp;nlid=52725637&amp;ref=headline&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/26/us/politics/syria-will-pay-a-heavy-price-for-another-chemical-attack-trump-says.html?emc=edit_na_20170626&amp;nl=breaking-news&amp;nlid=52725637&amp;ref=headline&amp;_r=0


Processing URLs:  70%|███████   | 703/1000 [34:15<09:42,  1.96s/it]

Error extracting text from http://www.theepochtimes.com/n3/1961614-why-beijing-declared-us-billionaire-investor-george-soros-an-enemy-of-the-chinese-people/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/1961614-why-beijing-declared-us-billionaire-investor-george-soros-an-enemy-of-the-chinese-people/


Processing URLs:  70%|███████   | 704/1000 [34:15<07:33,  1.53s/it]

Error extracting text from http://thehill.com/homenews/administration/351132-wh-officials-fear-colleagues-are-wearing-a-wire-for-mueller-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/351132-wh-officials-fear-colleagues-are-wearing-a-wire-for-mueller-report/


Processing URLs:  70%|███████   | 705/1000 [34:16<05:58,  1.22s/it]

Error extracting text from http://amti.csis.org/xis-visit-to-cool-down-the-south-china-sea/: 403 Client Error: Forbidden for url: http://amti.csis.org/xis-visit-to-cool-down-the-south-china-sea/


Processing URLs:  71%|███████   | 711/1000 [34:34<18:21,  3.81s/it]

Error extracting text from http://www.painresearchforum.org/news/49969-biogen-idec-announces-plans-buy-nav17-program: 404 Client Error: Not Found for url: https://www.iasp-pain.org/publications/pain-research-forum/prf-news/49969-biogen-idec-announces-plans-buy-nav17-program


Processing URLs:  72%|███████▏  | 715/1000 [34:40<09:02,  1.90s/it]

Error extracting text from http://www.nytimes.com/2016/06/11/business/tesla-motors-model-s-suspension.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/11/business/tesla-motors-model-s-suspension.html


Processing URLs:  72%|███████▏  | 719/1000 [34:42<03:31,  1.33it/s]

URL filtered: http://nymag.com/daily/intelligencer/2017/05/trump-lobbies-on-twitter-to-eliminate-legislative-filibuster.html
Error extracting text from http://www.reuters.com/article/2015/08/27/saudi-reserves-idUSL5N1123ZL20150827: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/08/27/saudi-reserves-idUSL5N1123ZL20150827


Processing URLs:  72%|███████▏  | 720/1000 [34:43<03:48,  1.22it/s]

URL filtered: https://twitter.com/WHO/status/1217043229427761152


Processing URLs:  72%|███████▏  | 723/1000 [34:43<02:17,  2.01it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-protests-idUSKCN0WF0IX: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-protests-idUSKCN0WF0IX


Processing URLs:  73%|███████▎  | 731/1000 [34:57<06:29,  1.45s/it]

Error extracting text from http://www.newsweek.com/germany-greece-lesbos-412165: 403 Client Error: Forbidden for url: https://www.newsweek.com/germany-greece-lesbos-412165


Processing URLs:  73%|███████▎  | 734/1000 [35:00<05:02,  1.14s/it]

Error extracting text from https://www.nytimes.com/2021/01/03/us/politics/biden-russia-iran.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/03/us/politics/biden-russia-iran.html
URL filtered: https://twitter.com/elonmusk/status/1078180361346068480


Processing URLs:  74%|███████▎  | 736/1000 [35:02<04:00,  1.10it/s]

Error extracting text from https://www.justsecurity.org/75698/the-us-military-should-stay-out-of-mozambiques-cabo-delgado-send-diplomats-who-know-the-terrain/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/75698/the-us-military-should-stay-out-of-mozambiques-cabo-delgado-send-diplomats-who-know-the-terrain/


Processing URLs:  74%|███████▍  | 738/1000 [35:04<04:04,  1.07it/s]

Error extracting text from http://www.extremetech.com/extreme/215384-loophole-in-1970-clean-air-act-may-prevent-criminal-prosecution-of-vw: 403 Client Error: Forbidden for url: http://www.extremetech.com/extreme/215384-loophole-in-1970-clean-air-act-may-prevent-criminal-prosecution-of-vw


Processing URLs:  74%|███████▍  | 741/1000 [36:10<1:20:00, 18.53s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2020-12-24/eu-parliament-to-analyse-brexit-deal-before-giving-green-light-in-2021: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  74%|███████▍  | 745/1000 [36:29<30:44,  7.24s/it]

URL filtered: https://twitter.com/JakeSherman/status/652472514753507328


Processing URLs:  75%|███████▍  | 747/1000 [36:30<17:22,  4.12s/it]

Error extracting text from https://www.google.ca/amp/www.telegraph.co.uk/news/2016/06/30/adnan-syed-to-be-given-a-retrial-after-serial-podcast-questioned/amp/?client=ms-android-rogers-ca#: 404 Client Error: Not Found for url: https://www.telegraph.co.uk/news/2016/06/30/adnan-syed-to-be-given-a-retrial-after-serial-podcast-questioned/amp/


Processing URLs:  75%|███████▍  | 749/1000 [36:32<11:02,  2.64s/it]

Error extracting text from http://www.reuters.com/article/2015/10/02/brazil-fiscal-idUSL1N1220XY20151002: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/02/brazil-fiscal-idUSL1N1220XY20151002


Processing URLs:  76%|███████▌  | 755/1000 [36:44<07:54,  1.94s/it]

Error extracting text from http://www.barrons.com/articles/BL-231B-9912: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/BL-231B-9912


Processing URLs:  76%|███████▌  | 760/1000 [36:50<05:40,  1.42s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-10/29/c_134763565.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-10/29/c_134763565.htm


Processing URLs:  76%|███████▌  | 761/1000 [36:51<05:17,  1.33s/it]

Error extracting text from https://tradingeconomics.com/commodity/heating-oil: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/heating-oil


Processing URLs:  76%|███████▌  | 762/1000 [36:52<04:03,  1.02s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_50115.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_50115.htm


Processing URLs:  76%|███████▋  | 763/1000 [36:54<06:13,  1.58s/it]

Error extracting text from http://www.thenational.ae/world/middle-east/iraq-forces-launch-push-to-retake-town-south-of-mosul: 404 Client Error: Not Found for url: https://www.thenationalnews.com/mena/iraq-forces-launch-push-to-retake-town-south-of-mosul/


Processing URLs:  76%|███████▋  | 764/1000 [36:55<05:03,  1.28s/it]

Error extracting text from https://www.whitehouse.gov/the-press-office/2017/01/27/presidential-memorandum-rebuilding-us-armed-forces: 404 Client Error: Not Found for url: https://www.whitehouse.gov/the-press-office/2017/01/27/presidential-memorandum-rebuilding-us-armed-forces


Processing URLs:  76%|███████▋  | 765/1000 [36:55<03:49,  1.02it/s]

Error extracting text from https://casetext.com/case/men-v-selective-serv-sys-2: 403 Client Error: Forbidden for url: https://casetext.com/case/men-v-selective-serv-sys-2


Processing URLs:  77%|███████▋  | 770/1000 [37:00<02:46,  1.38it/s]

Error extracting text from http://www.washingtontimes.com/news/2015/sep/8/export-import-bank-helping-ship-us-jobs-overseas-w/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2015/sep/8/export-import-bank-helping-ship-us-jobs-overseas-w/


Processing URLs:  77%|███████▋  | 772/1000 [37:05<05:48,  1.53s/it]

Error extracting text from https://www.nytimes.com/live/2022/02/15/world/russia-ukraine-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/02/15/world/russia-ukraine-news


Processing URLs:  78%|███████▊  | 775/1000 [37:08<04:13,  1.12s/it]

Error extracting text from https://www.nytimes.com/2018/01/22/us/pennsylvania-maps-congress.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;re: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/22/us/pennsylvania-maps-congress.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;re


Processing URLs:  78%|███████▊  | 779/1000 [37:18<08:40,  2.36s/it]

Error extracting text from http://www.theaustralian.com.au/news/latest-news/skorea-us-japan-plan-sanctions-on-north/news-story/34376db3090696030a6b90e79ddee7fc: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/404.php


Processing URLs:  78%|███████▊  | 781/1000 [37:20<06:10,  1.69s/it]

Error extracting text from http://diario16.com/rajoy-y-la-investidura-de-nunca-acabar/: 403 Client Error: Forbidden for url: http://diario16plus.com/rajoy-y-la-investidura-de-nunca-acabar/


Processing URLs:  78%|███████▊  | 782/1000 [37:21<06:01,  1.66s/it]

URL filtered: https://twitter.com/DAlperovitch/status/1497222339733467136


Processing URLs:  78%|███████▊  | 785/1000 [37:25<05:13,  1.46s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/asi-van-keiko-y-ppk-encuestas-antes-primer-debate-noticia-1903092?ref=nota_politica&amp;ft=mod_leatambien&amp;e=titulo: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/asi-van-keiko-y-ppk-encuestas-antes-primer-debate-noticia-1903092/?ref=nota_politica&amp;ft=mod_leatambien&amp;e=titulo


Processing URLs:  79%|███████▊  | 787/1000 [37:27<04:01,  1.13s/it]

Error extracting text from http://news.yahoo.com/un-wants-send-experts-burundi-mass-graves-probe-223656530.html: 404 Client Error: Not Found for url: http://news.yahoo.com/un-wants-send-experts-burundi-mass-graves-probe-223656530.html


Processing URLs:  79%|███████▉  | 791/1000 [37:31<03:23,  1.03it/s]

Error extracting text from http://www.ghanaweb.com/GhanaHomePage/worldNews/Burundi-accuses-Kagame-of-trying-to-stoke-conflict-412827: 403 Client Error: Forbidden for url: https://www.ghanaweb.com/GhanaHomePage/worldNews/Burundi-accuses-Kagame-of-trying-to-stoke-conflict-412827


Processing URLs:  79%|███████▉  | 792/1000 [37:32<03:31,  1.02s/it]

Error extracting text from https://www.predictit.org/Contract/5642/Will-Turkish-voters-pass-a-constitutional-referendum-in-April-2017#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/5642/Will-Turkish-voters-pass-a-constitutional-referendum-in-April-2017#data


Processing URLs:  80%|███████▉  | 797/1000 [37:36<02:29,  1.36it/s]

Error extracting text from http://www.nytimes.com/2016/02/27/world/middleeast/syria-truce-comes-with-price-but-not-for-assad.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/27/world/middleeast/syria-truce-comes-with-price-but-not-for-assad.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-blasts-idUSKBN13107V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-blasts-idUSKBN13107V


Processing URLs:  80%|████████  | 800/1000 [37:36<01:07,  2.94it/s]

Error extracting text from http://www.khaama.com/karzai-to-ask-govt-leaders-for-loya-jirga-regarding-new-us-strategy-03597: 403 Client Error: Forbidden for url: http://www.khaama.com/karzai-to-ask-govt-leaders-for-loya-jirga-regarding-new-us-strategy-03597
URL filtered: https://twitter.com/EricTopol/status/1376181289104011269/photo/1
Error extracting text from http://www.nytimes.com/reuters/2016/02/25/world/asia/25reuters-china-defence-usa-exercises.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/02/25/world/asia/25reuters-china-defence-usa-exercises.html


Processing URLs:  80%|████████  | 802/1000 [37:38<01:29,  2.21it/s]

Error extracting text from http://www.wsj.com/articles/chinas-rigged-ipos-1449102648: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chinas-rigged-ipos-1449102648


Processing URLs:  80%|████████  | 803/1000 [37:41<04:25,  1.35s/it]

Error extracting text from http://www.reuters.com/article/us-china-indonesia-ship-idUSKCN0Z50FG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-indonesia-ship-idUSKCN0Z50FG?il=0


Processing URLs:  81%|████████  | 810/1000 [37:56<06:42,  2.12s/it]

Error extracting text from http://www.english.rfi.fr/general/20151205-bashar-al-assad-no-longer-has-go-french-foreign-fm-laurent-fabius: 403 Client Error: Forbidden for url: https://www.rfi.fr/en/general/20151205-bashar-al-assad-no-longer-has-go-french-foreign-fm-laurent-fabius
URL filtered: http://www.brookings.edu/blogs/up-front/posts/2015/09/10-fed-interest-rates-data-dependent-olson-wessel#.VfGWtTyU6qQ.twitter


Processing URLs:  81%|████████▏ | 813/1000 [37:57<03:17,  1.05s/it]

Error extracting text from https://www.wsj.com/articles/india-is-a-natural-u-s-ally-in-the-new-cold-war-11590600011: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/india-is-a-natural-u-s-ally-in-the-new-cold-war-11590600011


Processing URLs:  82%|████████▏ | 815/1000 [37:58<02:25,  1.27it/s]

Error extracting text from https://www.congress.gov/bill/114th-congress/senate-bill/2144/all-actions?overview=closed: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/senate-bill/2144/all-actions?overview=closed


Processing URLs:  82%|████████▏ | 818/1000 [38:00<01:51,  1.63it/s]

Error extracting text from https://shadowproof.com/2017/01/26/repealing-obamacare-the-push-to-eliminate-the-individual-mandate/: 403 Client Error: Forbidden for url: https://shadowproof.com/2017/01/26/repealing-obamacare-the-push-to-eliminate-the-individual-mandate/
Error extracting text from http://passblue.com/2016/07/21/men-voting-for-men-un-security-council-holds-its-first-straw-poll-to-pick-a-secretary-general/: 403 Client Error: Forbidden for url: https://passblue.com/2016/07/21/men-voting-for-men-un-security-council-holds-its-first-straw-poll-to-pick-a-secretary-general/


Processing URLs:  82%|████████▏ | 820/1000 [38:01<01:59,  1.51it/s]

Error extracting text from http://www.reuters.com/article/us-eu-google-antitrust-idUSKBN18I1EV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-google-antitrust-idUSKBN18I1EV


Processing URLs:  82%|████████▏ | 821/1000 [38:02<01:57,  1.52it/s]

Error extracting text from https://www.espn.com/olympics/story/_/id/31459936/full-blown-boycott-pushed-2022-winter-olympics-beijing: 403 Client Error: Forbidden for url: https://www.espn.com/olympics/story/_/id/31459936/full-blown-boycott-pushed-2022-winter-olympics-beijing


Processing URLs:  82%|████████▏ | 824/1000 [38:05<02:24,  1.22it/s]

Error extracting text from http://www.38north.org/2016/12/sinpo111916/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-usa-analy-idUSKBN16925O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-usa-analy-idUSKBN16925O


Processing URLs:  83%|████████▎ | 826/1000 [38:09<03:44,  1.29s/it]

Error extracting text from https://reut.rs/3cA8drs: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-usa-security/china-says-u-s-military-in-south-china-sea-not-good-for-peace-idUSKBN29U0P0


Processing URLs:  83%|████████▎ | 829/1000 [38:14<03:57,  1.39s/it]

Error extracting text from http://www.wsj.com/articles/china-to-build-naval-logistics-facility-in-djibouti-1448557719: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-to-build-naval-logistics-facility-in-djibouti-1448557719


error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserErro

Error extracting text from http://www.foliomag.com/2015/time-inc-adds-hulu-yahoo-zealot-networks-video-distribution-partners/: Document is empty


Processing URLs:  83%|████████▎ | 831/1000 [38:18<04:17,  1.52s/it]

Error extracting text from https://www.ipsos.com/ipsos-mori/en-uk/rishi-sunaks-job-satisfaction-ratings-remain-strong-even-labour-supporters: 403 Client Error: Forbidden for url: https://www.ipsos.com/ipsos-mori/en-uk/rishi-sunaks-job-satisfaction-ratings-remain-strong-even-labour-supporters


Processing URLs:  83%|████████▎ | 833/1000 [38:20<03:22,  1.21s/it]

URL filtered: https://twitter.com/lilolvvorn/status/718130093768503298


Processing URLs:  84%|████████▎ | 836/1000 [38:22<02:33,  1.07it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=55536#.WCgYJ_krKUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=55536#.WCgYJ_krKUk


Processing URLs:  84%|████████▍ | 842/1000 [38:34<04:22,  1.66s/it]

Error extracting text from http://www.wsj.com/articles/the-trick-to-making-better-forecasts-1443235983: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-trick-to-making-better-forecasts-1443235983


Processing URLs:  84%|████████▍ | 844/1000 [38:36<03:30,  1.35s/it]

Error extracting text from http://www.wsj.com/articles/time-launches-new-site-called-motto-for-young-women-1454502141: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/time-launches-new-site-called-motto-for-young-women-1454502141


Processing URLs:  84%|████████▍ | 845/1000 [38:38<04:01,  1.56s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-aramco-idUSKBN16D1ZO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-aramco-idUSKBN16D1ZO


Processing URLs:  85%|████████▍ | 849/1000 [38:42<02:43,  1.09s/it]

Error extracting text from https://www.theafricareport.com/in-depth/gerd-the-dam-of-discord/: 403 Client Error: Forbidden for url: https://www.theafricareport.com/in-depth/gerd-the-dam-of-discord/


Processing URLs:  85%|████████▌ | 850/1000 [38:44<03:21,  1.34s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges/news/us-italy-leaders-push-for-ttip-outcome-amid-questions-over-timeline: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges/news/us-italy-leaders-push-for-ttip-outcome-amid-questions-over-timeline


Processing URLs:  86%|████████▌ | 857/1000 [39:08<07:32,  3.17s/it]

Error extracting text from http://www.wsj.com/articles/trump-rides-a-blue-collar-wave-1447803248: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-rides-a-blue-collar-wave-1447803248


Processing URLs:  86%|████████▌ | 858/1000 [40:08<47:46, 20.19s/it]

Error extracting text from http://sanjose.granicus.com/MetaViewer.php?meta_id=564254: HTTPConnectionPool(host='sanjose.granicus.com', port=80): Max retries exceeded with url: /MetaViewer.php?meta_id=564254 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x302d11580>, 'Connection to sanjose.granicus.com timed out. (connect timeout=60)'))


Processing URLs:  86%|████████▌ | 859/1000 [41:08<1:15:26, 32.11s/it]

Error extracting text from http://www.usnews.com/opinion/articles/2016-10-04/questions-after-news-obama-agreed-to-lift-un-sanctions-on-iranian-banks: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  86%|████████▋ | 863/1000 [41:12<18:48,  8.24s/it]

Error extracting text from https://www.weforum.org/agenda/2016/10/restructuring-venezuelan-debt-navigating-new-rules: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2016/10/restructuring-venezuelan-debt-navigating-new-rules
Error extracting text from https://www.arabnews.com/node/1877861/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1877861/middle-east


Processing URLs:  87%|████████▋ | 870/1000 [41:32<04:48,  2.22s/it]

Error extracting text from http://www.nytimes.com/aponline/2015/10/07/us/politics/ap-us-export-import-bank.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2015/10/07/us/politics/ap-us-export-import-bank.html
URL filtered: https://www.instagram.com/benandjerrys/


Processing URLs:  87%|████████▋ | 872/1000 [41:34<03:08,  1.47s/it]

Error extracting text from http://www.reuters.com/article/2015/10/13/us-iran-nuclear-parliament-idUSKCN0S70F220151013: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/13/us-iran-nuclear-parliament-idUSKCN0S70F220151013


Processing URLs:  87%|████████▋ | 873/1000 [41:34<02:32,  1.20s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-idUSKBN1631WG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-idUSKBN1631WG


Processing URLs:  88%|████████▊ | 875/1000 [41:37<02:47,  1.34s/it]

Error extracting text from http://www.ibtimes.com/russia-turkey-crisis-27-russian-ships-blocked-after-moscow-detains-turkish-vessels-2231946: 403 Client Error: Forbidden for url: https://www.ibtimes.com/russia-turkey-crisis-27-russian-ships-blocked-after-moscow-detains-turkish-vessels-2231946


Processing URLs:  88%|████████▊ | 882/1000 [41:44<01:33,  1.26it/s]

Error extracting text from https://www.reuters.com/world/americas/brazilians-take-streets-again-demand-bolsonaros-impeachment-2021-07-24/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazilians-take-streets-again-demand-bolsonaros-impeachment-2021-07-24/


Processing URLs:  88%|████████▊ | 885/1000 [41:48<01:59,  1.04s/it]

Error extracting text from http://www.theindependent.com/news/world/burundi-gov-t-people-buried-without-notifying-families/article_39d1fe16-9d59-5747-bb18-b3ba8010a8f0.html: 404 Client Error: Not Found for url: https://theindependent.com/news/world/burundi-gov-t-people-buried-without-notifying-families/article_39d1fe16-9d59-5747-bb18-b3ba8010a8f0.html
Error extracting text from http://fox6now.com/2015/09/26/u-s-russia-may-be-seeking-proxy-in-case-syrias-al-assad-falls/: 403 Client Error: Forbidden for url: http://fox6now.com/2015/09/26/u-s-russia-may-be-seeking-proxy-in-case-syrias-al-assad-falls/


Processing URLs:  89%|████████▊ | 886/1000 [41:48<01:30,  1.26it/s]

Error extracting text from http://www.advisorperspectives.com/dshort/updates/PCE-Price-Index.php: 403 Client Error: Forbidden for url: https://www.advisorperspectives.com/dshort/updates/PCE-Price-Index.php


Processing URLs:  89%|████████▉ | 889/1000 [41:49<00:56,  1.96it/s]

Error extracting text from https://www.fastcoexist.com/3064623/the-worlds-first-national-drone-delivery-service-just-launched-in-rwanda: 403 Client Error: Forbidden for url: https://www.fastcoexist.com/3064623/the-worlds-first-national-drone-delivery-service-just-launched-in-rwanda
Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=5426: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=5426


Processing URLs:  89%|████████▉ | 891/1000 [41:51<01:12,  1.50it/s]

Error extracting text from http://commentators.com/u-s-on-verge-of-losing-iraq-completely-to-iran/: 404 Client Error: Not Found for url: http://commentators.com/u-s-on-verge-of-losing-iraq-completely-to-iran/


Processing URLs:  89%|████████▉ | 893/1000 [41:53<01:42,  1.05it/s]

Error extracting text from http://www.newsweek.com/mosul-isis-syria-iraq-troops-coalition-445182: 403 Client Error: Forbidden for url: https://www.newsweek.com/mosul-isis-syria-iraq-troops-coalition-445182


Processing URLs:  90%|████████▉ | 895/1000 [41:57<02:11,  1.26s/it]

Error extracting text from http://www.cdm.me/english/markovic-nato-is-the-greatest-civilisation-framework: 403 Client Error: Forbidden for url: https://www.cdm.me/english/markovic-nato-is-the-greatest-civilisation-framework


Processing URLs:  90%|████████▉ | 898/1000 [42:01<02:12,  1.30s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.brasil.gov.br/governo/2016/03/dilma-2018solicitar-a-minha-renuncia-e-reconhecer-que-nao-existe-base-para-impeachment2019-1&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.brasil.gov.br/governo/2016/03/dilma-2018solicitar-a-minha-renuncia-e-reconhecer-que-nao-existe-base-para-impeachment2019-1&amp;prev=search


Processing URLs:  90%|█████████ | 900/1000 [42:04<02:11,  1.31s/it]

URL filtered: https://tr.euronews.com/2021/12/06/abd-cin-in-uygurlara-yonelik-tutumu-nedeniyle-pekin-olimpiyatlar-na-boykot-uygulayacak?utm_medium=Social&amp;utm_source=Twitter#Echobox=1638817430


Processing URLs:  90%|█████████ | 903/1000 [42:05<01:10,  1.37it/s]

Error extracting text from http://www.nytimes.com/2015/09/12/business/economy/signs-of-weak-growth-and-tepid-inflation-ahead-of-fed-meeting.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/12/business/economy/signs-of-weak-growth-and-tepid-inflation-ahead-of-fed-meeting.html


Processing URLs:  90%|█████████ | 904/1000 [42:08<02:19,  1.45s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-04-25/turkey-cabinet-overhaul-said-coming-as-erdogan-eyes-party-return


Processing URLs:  91%|█████████ | 906/1000 [42:09<01:29,  1.05it/s]

Error extracting text from http://thehill.com/policy/finance/262076-senate-propels-ex-im-renewal-to-obamas-desk: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/262076-senate-propels-ex-im-renewal-to-obamas-desk/


Processing URLs:  91%|█████████ | 911/1000 [42:15<02:08,  1.44s/it]

Error extracting text from http://keepcontracostamoving.net/the-plan/#providing-pg: 436 Client Error:  for url: http://ww16.keepcontracostamoving.net/the-plan/?sub1=20240202-2332-5080-8735-c3eb17b026eb#providing-pg


Processing URLs:  91%|█████████ | 912/1000 [42:15<01:42,  1.17s/it]

Error extracting text from http://news.softpedia.com/news/norway-makes-it-official-accuses-china-of-hacking-and-stealing-military-secrets-501060.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/norway-makes-it-official-accuses-china-of-hacking-and-stealing-military-secrets-501060.shtml


Processing URLs:  92%|█████████▏| 918/1000 [43:29<11:19,  8.29s/it]

Error extracting text from http://www.worldoil.com/news/2016/8/18/dno-drilling-new-tawke-wells-on-back-of-profitable-second-quarter: 500 Server Error: Internal Server Error for url: https://worldoil.com/news/2016/8/18/dno-drilling-new-tawke-wells-on-back-of-profitable-second-quarter


Processing URLs:  93%|█████████▎| 928/1000 [43:56<02:25,  2.01s/it]

Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BU0JG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BU0JG
Error extracting text from http://english.aawsat.com/2016/07/article55354100/ankara-approve-assads-stay-power-transition: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/07/article55354100/ankara-approve-assads-stay-power-transition


Processing URLs:  93%|█████████▎| 932/1000 [44:16<06:41,  5.91s/it]

Error extracting text from http://www.vinereport.com/article/winds.of.winter.release.fans.still.unsure.if.book.will.make.it.in.time.for.game.of.thrones.6.release/5937.htm: 522 Server Error:  for url: https://www.vinereport.com/article/winds.of.winter.release.fans.still.unsure.if.book.will.make.it.in.time.for.game.of.thrones.6.release/5937.htm


Processing URLs:  94%|█████████▎| 935/1000 [45:22<22:56, 21.18s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2015/10/04/china-sending-senior-official-for-north-korean-anniversary: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  94%|█████████▎| 937/1000 [45:24<11:19, 10.79s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0ZS2E6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0ZS2E6


Processing URLs:  94%|█████████▍| 942/1000 [45:33<03:16,  3.39s/it]

Error extracting text from http://www.siliconbeat.com/2016/06/29/biz-break-elon-musk-much-plate/: 404 Client Error: Not Found for url: https://www.mercurynews.com/tag/siliconbeat/2016/06/29/biz-break-elon-musk-much-plate/


Processing URLs:  94%|█████████▍| 945/1000 [45:37<01:49,  1.99s/it]

URL filtered: http://www.reuters.com/article/us-france-election-facebook-idUSKBN15L0QU?il=0


Processing URLs:  95%|█████████▍| 949/1000 [45:44<01:22,  1.61s/it]

Error extracting text from http://news.yahoo.com/bank-japan-cut-forecast-economic-price-growth-report-064952431--finance.html: 404 Client Error: Not Found for url: http://news.yahoo.com/bank-japan-cut-forecast-economic-price-growth-report-064952431--finance.html


Processing URLs:  95%|█████████▌| 951/1000 [45:50<01:40,  2.04s/it]

Error extracting text from http://news.yahoo.com/eu-announce-end-iran-sanctions-sunday-sources-210112390.html: 404 Client Error: Not Found for url: http://news.yahoo.com/eu-announce-end-iran-sanctions-sunday-sources-210112390.html


Processing URLs:  95%|█████████▌| 953/1000 [45:55<01:30,  1.93s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-may-idUSKBN15L1TS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-may-idUSKBN15L1TS


Processing URLs:  96%|█████████▌| 956/1000 [46:00<01:19,  1.80s/it]

Error extracting text from https://jfedsrq.org/crconnect-section/why-does-israel-oppose-the-us-reentering-the-nuclear-agreement-with-iran: 403 Client Error: Forbidden for url: https://jfedsrq.org/crconnect-section/why-does-israel-oppose-the-us-reentering-the-nuclear-agreement-with-iran


Processing URLs:  96%|█████████▌| 958/1000 [46:01<00:45,  1.09s/it]

Error extracting text from https://trade.ec.europa.eu/doclib/docs/2012/june/tradoc_149616.pdf: 404 Client Error: Not Found for url: https://trade.ec.europa.eu/doclib/docs/2012/june/tradoc_149616.pdf
Error extracting text from http://www.nytimes.com/2015/12/21/world/asia/afghan-government-faces-new-set-of-rivals.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/21/world/asia/afghan-government-faces-new-set-of-rivals.html


Processing URLs:  96%|█████████▌| 960/1000 [46:04<00:50,  1.25s/it]

Error extracting text from http://news.nationalpost.com/news/canada/canadians-will-be-key-in-the-final-battle-against-isil-with-more-than-200-trainers-in-place: 403 Client Error: Forbidden for url: https://nationalpost.com/category/news//


Processing URLs:  96%|█████████▋| 963/1000 [46:12<01:27,  2.37s/it]

Error extracting text from https://www.reuters.com/article/us-russia-election-navalny-release/russian-opposition-leader-navalny-released-after-rally-lawyer-idUSKBN1FH0X4?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-election-navalny-release/russian-opposition-leader-navalny-released-after-rally-lawyer-idUSKBN1FH0X4?il=0


Processing URLs:  96%|█████████▋| 965/1000 [46:14<00:59,  1.71s/it]

Error extracting text from http://www.foxnews.com/politics/2015/10/09/exclusive-us-officials-conclude-iran-deal-violates-federal-law/: 404 Client Error: Not Found for url: https://www.foxnews.com/politics/2015/10/09/exclusive-us-officials-conclude-iran-deal-violates-federal-law/


Processing URLs:  97%|█████████▋| 966/1000 [46:16<01:05,  1.91s/it]

Error extracting text from https://detroit.craigslist.org/wyn/rvs/d/rockwood-fleetwood-southwind-32h-rv/7251692713.html: 404 Client Error: Not Found for url: https://detroit.craigslist.org/wyn/rvs/d/rockwood-fleetwood-southwind-32h-rv/7251692713.html


Processing URLs:  97%|█████████▋| 970/1000 [46:26<01:21,  2.72s/it]

URL filtered: http://www.bloomberg.com/view/articles/2016-06-09/u-s-taxpayers-are-funding-iran-s-military-expansion


Processing URLs:  97%|█████████▋| 972/1000 [46:26<00:43,  1.55s/it]

Error extracting text from https://www.nytimes.com/2021/06/27/opinion/covid-vaccine-variants.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/27/opinion/covid-vaccine-variants.html
URL filtered: http://www.bloomberg.com/news/articles/2016-08-04/bayer-said-to-review-monsanto-s-accounts-as-it-weighs-higher-bid


Processing URLs:  98%|█████████▊| 977/1000 [46:32<00:27,  1.21s/it]

Error extracting text from http://en.trend.az/world/turkey/2465631.html: 404 Client Error: Not Found for url: https://www.trend.az/world/turkey/2465631.html
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.dm.com.br/politica/2016/02/otoni-a-oposicao-perdeu-a-guerra-do-impeachment.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.dm.com.br/politica/2016/02/otoni-a-oposicao-perdeu-a-guerra-do-impeachment.html&amp;prev=search


Processing URLs:  98%|█████████▊| 979/1000 [46:34<00:21,  1.04s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/foreign-policy/288896-senate-should-fix-natos-montenegro-problem: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/foreign-policy/288896-senate-should-fix-natos-montenegro-problem/


Processing URLs:  98%|█████████▊| 983/1000 [47:32<02:30,  8.85s/it]

URL filtered: https://www.youtube.com/watch?v=ZFCre48A_zM


Processing URLs:  99%|█████████▊| 987/1000 [47:34<00:40,  3.12s/it]

Error extracting text from https://www.reuters.com/article/us-petrobras-bolsonaro-idUSKBN2AJ00Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-petrobras-bolsonaro-idUSKBN2AJ00Z


Processing URLs:  99%|█████████▉| 991/1000 [47:39<00:14,  1.61s/it]

URL filtered: https://www.youtube.com/watch?v=eKyhFnT_Iso


Processing URLs:  99%|█████████▉| 993/1000 [47:40<00:07,  1.11s/it]

Error extracting text from http://www.aina.org/news/20160818183152.htm: 404 Client Error:  for url: http://www.aina.org/news/20160818183152.htm


Processing URLs: 100%|█████████▉| 995/1000 [47:47<00:10,  2.11s/it]

Error extracting text from https://www.reuters.com/article/us-global-oil-idUSKBN19H02L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN19H02L


Processing URLs: 100%|█████████▉| 999/1000 [47:57<00:02,  2.49s/it]

Error extracting text from http://blogs.wsj.com/moneybeat/2016/11/01/tesla-sales-need-a-recharge/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2016/11/01/tesla-sales-need-a-recharge/


Processing URLs: 100%|██████████| 1000/1000 [48:08<00:00,  2.89s/it]
Processing URLs:   0%|          | 3/1000 [00:04<30:40,  1.85s/it]

Error extracting text from http://i100.independent.co.uk/article/how-excited-was-japanese-prime-minister-shinzo-abe-to-see-vladimir-putin-this-excited--b1xmIvPt1Px: 404 Client Error: Not Found for url: https://www.independent.co.uk/article/how-excited-was-japanese-prime-minister-shinzo-abe-to-see-vladimir-putin-this-excited--b1xmivpt1px


Processing URLs:   0%|          | 5/1000 [00:09<35:40,  2.15s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-02-02/italy-s-search-for-a-new-government-stalls-as-latest-talks-fail


Processing URLs:   1%|          | 8/1000 [00:15<35:29,  2.15s/it]

Error extracting text from https://www.ihs.com/industry/aerospace-defense-security.html: 403 Client Error: Forbidden for url: https://www.accuristech.com/industries/aerospace-defense


Processing URLs:   1%|▏         | 13/1000 [00:24<32:22,  1.97s/it]

Error extracting text from http://www.ipsos.pe/La_exclusion: 403 Client Error: Forbidden for url: https://www.ipsos.com/es-pe/La_exclusion


Processing URLs:   2%|▏         | 15/1000 [00:26<24:21,  1.48s/it]

Error extracting text from https://phys.org/news/2019-02-richard-branson-hell-space-july.html: 400 Client Error: Bad request for url: https://phys.org/news/2019-02-richard-branson-hell-space-july.html


Processing URLs:   2%|▏         | 16/1000 [00:27<20:55,  1.28s/it]

Error extracting text from http://mobile.nytimes.com/2015/12/03/world/middleeast/iran-nuclear-report-atomic-agency.html?referer=https://www.google.com/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/12/03/world/middleeast/iran-nuclear-report-atomic-agency.html?referer=https://www.google.com/


Processing URLs:   2%|▏         | 17/1000 [00:29<23:48,  1.45s/it]

Error extracting text from http://www.ibtimes.com/israeli-electric-grid-hit-paralyzing-cyberattack-no-blackouts-reported-2281969: 403 Client Error: Forbidden for url: https://www.ibtimes.com/israeli-electric-grid-hit-paralyzing-cyberattack-no-blackouts-reported-2281969


Processing URLs:   2%|▏         | 18/1000 [00:30<21:52,  1.34s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-12/farm-boom-fizzles-as-u-s-crop-surplus-expands-financial-strain


Processing URLs:   2%|▏         | 22/1000 [00:36<26:06,  1.60s/it]

Error extracting text from http://nationalinterest.org/feature/irans-elections-reformists-hardliners-the-%E2%80%98deep-state%E2%80%99-15212: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/irans-elections-reformists-hardliners-the-%E2%80%98deep-state%E2%80%99-15212


Processing URLs:   2%|▏         | 24/1000 [00:43<38:42,  2.38s/it]

Error extracting text from https://www.38north.org/2017/11/sinpo111617/: 403 Client Error: Forbidden for url: https://www.38north.org/2017/11/sinpo111617/


Processing URLs:   3%|▎         | 27/1000 [00:45<19:56,  1.23s/it]

Error extracting text from http://news.softpedia.com/news/us-agencies-recorded-77-183-cybersecurity-incidents-in-2015-10-percent-rise-502201.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/us-agencies-recorded-77-183-cybersecurity-incidents-in-2015-10-percent-rise-502201.shtml


Processing URLs:   3%|▎         | 31/1000 [00:51<19:53,  1.23s/it]

Error extracting text from https://www.nytimes.com/2017/07/18/world/middleeast/iran-deal-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/18/world/middleeast/iran-deal-trump.html


Processing URLs:   3%|▎         | 32/1000 [00:54<26:52,  1.67s/it]

Error extracting text from http://www.bridgingthegapresearch.org/_asset/s2b5pb/BTG_soda_tax_fact_sheet_April2014.pdf&quot: 404 Client Error: Not Found for url: https://bridgingthegap.ihrp.uic.edu/_asset/s2b5pb/BTG_soda_tax_fact_sheet_April2014.pdf&quot
URL filtered: https://www.statista.com/statistics/380586/number-of-mobile-facebook-users-in-france/


Processing URLs:   3%|▎         | 34/1000 [00:54<15:24,  1.04it/s]

Error extracting text from https://www.nytimes.com/2017/12/26/world/americas/homeland-security-customs-border-patrol.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/26/world/americas/homeland-security-customs-border-patrol.html


Processing URLs:   4%|▍         | 39/1000 [01:01<20:02,  1.25s/it]

Error extracting text from http://online.wsj.com/public/resources/documents/gsChart.pdf: 404 Client Error: Not Found for url: http://online.wsj.com/public/resources/documents/gsChart.pdf


Processing URLs:   4%|▍         | 44/1000 [01:05<10:29,  1.52it/s]

Error extracting text from http://www.nytimes.com/aponline/2016/07/21/world/ap-un-united-nations-next-secretary-general.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/07/21/world/ap-un-united-nations-next-secretary-general.html?_r=0


Processing URLs:   5%|▍         | 46/1000 [01:07<15:02,  1.06it/s]

URL filtered: https://www.youtube.com/watch?v=JKBk_Kfucs4


Processing URLs:   5%|▍         | 49/1000 [01:09<13:46,  1.15it/s]

Error extracting text from https://www.chathamhouse.org/2021/05/myths-and-misconceptions-debate-russia/myth-11-peoples-ukraine-belarus-and-russia-are-one: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/2021/05/myths-and-misconceptions-debate-russia/myth-11-peoples-ukraine-belarus-and-russia-are-one


Processing URLs:   5%|▌         | 51/1000 [01:11<12:51,  1.23it/s]

Error extracting text from https://www.reuters.com/lifestyle/sports/japans-olympics-minister-hopes-torch-relay-turns-around-public-sentiment-2021-03-25/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/lifestyle/sports/japans-olympics-minister-hopes-torch-relay-turns-around-public-sentiment-2021-03-25/


Processing URLs:   5%|▌         | 52/1000 [01:16<32:37,  2.06s/it]

URL filtered: https://twitter.com/navalny/status/954007354101698560


Processing URLs:   6%|▌         | 56/1000 [01:20<20:56,  1.33s/it]

Error extracting text from http://www.hybridcars.com/2008-hybrid-cars/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/2008-hybrid-cars/
URL filtered: https://twitter.com/navalny/status/956842601361141760
URL filtered: https://www.thedailybeast.com/twitter-has-turned-over-zero-new-russian-troll-accounts-to-congress


Processing URLs:   6%|▌         | 61/1000 [01:26<19:40,  1.26s/it]

Error extracting text from https://www.adl.org/blog/al-qaeda-releases-america-burns-video-framing-us-as-nation-in-crisis: 403 Client Error: Forbidden for url: https://www.adl.org/blog/al-qaeda-releases-america-burns-video-framing-us-as-nation-in-crisis


Processing URLs:   6%|▋         | 65/1000 [01:35<31:15,  2.01s/it]

URL filtered: https://techcrunch.com/2021/10/19/facebook-scales-back-its-crypto-ambitions-once-again/


Processing URLs:   7%|▋         | 67/1000 [01:36<19:00,  1.22s/it]

Error extracting text from https://thehill.com/policy/international/538962-myanmar-military-our-objective-is-to-hold-an-election-and-hand-power-to: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/538962-myanmar-military-our-objective-is-to-hold-an-election-and-hand-power-to/


Processing URLs:   7%|▋         | 68/1000 [01:36<16:24,  1.06s/it]

Error extracting text from http://www.wsj.com/articles/brazil-watchdog-rules-against-rousseff-fueling-impeachment-talk-1444267770: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-watchdog-rules-against-rousseff-fueling-impeachment-talk-1444267770


Processing URLs:   7%|▋         | 72/1000 [01:46<31:36,  2.04s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKBN1711BR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKBN1711BR


Processing URLs:   7%|▋         | 74/1000 [01:49<24:04,  1.56s/it]

Error extracting text from http://parlinfo.aph.gov.au/parlInfo/search/display/display.w3p;query=Id%3A%22legislation%2Fems%2Fr5550_ems_2c6c6336-590f-4596-abd1-90305b256f5b%22: 403 Client Error: Forbidden for url: http://parlinfo.aph.gov.au/parlInfo/search/display/display.w3p;query=Id%3A%22legislation%2Fems%2Fr5550_ems_2c6c6336-590f-4596-abd1-90305b256f5b%22


Processing URLs:   8%|▊         | 75/1000 [01:49<18:21,  1.19s/it]

Error extracting text from https://www.thestreet.com/story/13959561/1/forget-exxon-mobil-here-s-a-smarter-permian-basin-play-now.html: 403 Client Error: Forbidden for url: https://www.thestreet.com/story/13959561/1/forget-exxon-mobil-here-s-a-smarter-permian-basin-play-now.html


Processing URLs:   8%|▊         | 77/1000 [01:50<13:12,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-gdp-idUSKCN18B14M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-gdp-idUSKCN18B14M


Processing URLs:   8%|▊         | 81/1000 [01:57<23:58,  1.57s/it]

Error extracting text from http://www.sciencemag.org/news/2011/11/scientists-brace-media-storm-around-controversial-flu-studies: 403 Client Error: Forbidden for url: https://www.science.org/news/2011/11/scientists-brace-media-storm-around-controversial-flu-studies


Processing URLs:   8%|▊         | 83/1000 [02:04<40:45,  2.67s/it]

Error extracting text from http://www.telecomengine.com/article/hsbc-says-internet-banking-services-down-after-cyber-attack: 404 Client Error: Not Found for url: https://horizonhouse.com/article/hsbc-says-internet-banking-services-down-after-cyber-attack


Processing URLs:   8%|▊         | 84/1000 [02:05<31:05,  2.04s/it]

Error extracting text from http://thehill.com/policy/transportation/260884-obama-signs-two-week-highway-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/transportation/260884-obama-signs-two-week-highway-bill/


Processing URLs:   8%|▊         | 85/1000 [02:06<29:04,  1.91s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/myanmar-parliament-to/2559650.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/myanmar-parliament-to/2559650.html


Processing URLs:   9%|▉         | 88/1000 [03:08<4:50:06, 19.09s/it]

Error extracting text from https://www.betfair.com/exchange/#/cricket/event/27183450/market?marketId=1.113633330: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30748cf80>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))


Processing URLs:   9%|▉         | 91/1000 [04:12<6:22:00, 25.21s/it]

Error extracting text from http://www.syriadeeply.org/articles/2016/04/10312/syria-deeply-executive-summary-april-11/: HTTPConnectionPool(host='www.syriadeeply.org', port=80): Max retries exceeded with url: /articles/2016/04/10312/syria-deeply-executive-summary-april-11/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x307d78950>, 'Connection to www.syriadeeply.org timed out. (connect timeout=60)'))


Processing URLs:   9%|▉         | 92/1000 [04:15<4:43:01, 18.70s/it]

Error extracting text from http://investors.dna.com/2015-12-08-Recent-Outbreak-of-Zika-Virus-in-Brazil-Creates-Pressing-Need-for-Effective-Vector-Control-Solutions: HTTPConnectionPool(host='investors.dna.com', port=80): Max retries exceeded with url: /2015-12-08-Recent-Outbreak-of-Zika-Virus-in-Brazil-Creates-Pressing-Need-for-Effective-Vector-Control-Solutions (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3063229f0>: Failed to resolve 'investors.dna.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 94/1000 [04:17<2:36:29, 10.36s/it]

URL filtered: https://twitter.com/mfa_russia/status/915574791171657728/photo/1?ref_src=twsrc%5Etfw&amp;ref_url=https%3A%2F%2Fwww.rt.com%2Fpolitics%2F405752-foreign-ministry-denounces-another-desecration%2F


Processing URLs:  10%|█         | 101/1000 [04:26<50:42,  3.38s/it] 

URL filtered: https://twitter.com/GameOfThrones/status/672505659368148992


Processing URLs:  10%|█         | 103/1000 [04:27<31:00,  2.07s/it]

Error extracting text from https://uk.finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC: 404 Client Error: Not Found for url: https://uk.finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC


Processing URLs:  10%|█         | 104/1000 [04:28<26:26,  1.77s/it]

Error extracting text from http://thehill.com/homenews/senate/363734-republican-senator-says-roy-moore-shouldnt-undergo-trial-by-newspaper: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/363734-republican-senator-says-roy-moore-shouldnt-undergo-trial-by-newspaper/


Processing URLs:  11%|█         | 106/1000 [04:29<18:08,  1.22s/it]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-idUSKBN14X1SK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-idUSKBN14X1SK


Processing URLs:  11%|█         | 108/1000 [04:31<14:55,  1.00s/it]

Error extracting text from https://bit.ly/2KUuC7p: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2020/12/24/press-release-european-council-president-charles-michel-on-the-agreement-on-the-future-eu-uk-relationship/


Processing URLs:  11%|█         | 111/1000 [04:36<21:21,  1.44s/it]

Error extracting text from https://www.dailystar.com.lb/News/World/2016/Jul-11/361483-montenegro-calls-october-elections-amid-nato-splits.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/World/2016/Jul-11/361483-montenegro-calls-october-elections-amid-nato-splits.ashx


Processing URLs:  11%|█         | 112/1000 [04:38<23:28,  1.59s/it]

Error extracting text from http://www.cidcm.umd.edu/mar/: HTTPSConnectionPool(host='cidcm.umd.edu', port=443): Max retries exceeded with url: /mar/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  11%|█▏        | 114/1000 [04:39<17:22,  1.18s/it]

Error extracting text from http://www.38north.org/2017/10/mwilliams100117/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml
Error extracting text from http://www.reuters.com/article/us-space-spacex-idUSKBN13C085: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-space-spacex-idUSKBN13C085


Processing URLs:  12%|█▏        | 115/1000 [04:41<17:48,  1.21s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1593547/000139834421006410/fp0063501_485apos.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1593547/000139834421006410/fp0063501_485apos.htm


Processing URLs:  12%|█▏        | 116/1000 [04:42<17:13,  1.17s/it]

Error extracting text from https://stratasadvisors.com/Insights/101016-Iran-Oil: 404 Client Error: Not Found for url: https://www.stratasadvisors.com/Insights/101016-Iran-Oil


Processing URLs:  12%|█▏        | 117/1000 [04:43<15:03,  1.02s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-19/who-d-lose-most-from-u-s-ban-on-venezuela-crude-quicktake-q-a


Processing URLs:  12%|█▏        | 119/1000 [04:46<20:16,  1.38s/it]

Error extracting text from https://www.faa.gov/uas/beyond_the_basics/#waiver: 404 Client Error: Not Found for url: https://www.faa.gov/uas/beyond_the_basics/#waiver


Processing URLs:  12%|█▏        | 121/1000 [04:49<22:53,  1.56s/it]

Error extracting text from http://www.debtclocks.eu/eu-ranking-public-debt-in-percent-of-gdp.html: 404 Client Error: Not Found for url: https://www.haushaltssteuerung.de//eu-ranking-public-debt-in-percent-of-gdp.html


Processing URLs:  12%|█▏        | 124/1000 [04:55<25:06,  1.72s/it]

URL filtered: https://www.youtube.com/watch?v=fEJwM1Bh3R0


Processing URLs:  13%|█▎        | 126/1000 [04:56<19:42,  1.35s/it]

Error extracting text from http://www.urbanresearchmaps.org/nyredistricting: 403 Client Error: Forbidden for url: http://www.urbanresearchmaps.org/nyredistricting/


Processing URLs:  13%|█▎        | 128/1000 [04:57<12:04,  1.20it/s]

Error extracting text from http://www.reuters.com/article/us-oil-opec-iran-idUSKBN17H0NF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-opec-iran-idUSKBN17H0NF
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-philippines-idUSKBN1AI17W: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-philippines-idUSKBN1AI17W


Processing URLs:  13%|█▎        | 129/1000 [04:57<09:26,  1.54it/s]

Error extracting text from http://www.latimes.com/sns-bc-us-trump-russia-probe-20170922-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/sns-bc-us-trump-russia-probe-20170922-story.html


Processing URLs:  13%|█▎        | 130/1000 [04:58<09:04,  1.60it/s]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/02/07/0200000000AEN20160207000953315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  13%|█▎        | 133/1000 [05:01<15:31,  1.07s/it]

Error extracting text from http://www.caam.org.cn/hangye/20170206/1405204496.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/hangye/20170206/1405204496.html


Processing URLs:  13%|█▎        | 134/1000 [05:02<12:41,  1.14it/s]

Error extracting text from http://euandgreece.blogactiv.eu/2017/02/02/is-the-imf-necessary-for-the-3rd-greek-program/: 404 Client Error: Not Found for url: http://euandgreece.blogactiv.eu/2017/02/02/is-the-imf-necessary-for-the-3rd-greek-program/


Processing URLs:  14%|█▎        | 136/1000 [05:05<17:17,  1.20s/it]

Error extracting text from http://www.newsweek.com/poland-plans-removal-500-soviet-monuments-442661: 403 Client Error: Forbidden for url: https://www.newsweek.com/poland-plans-removal-500-soviet-monuments-442661


Processing URLs:  14%|█▍        | 138/1000 [05:06<13:24,  1.07it/s]

Error extracting text from http://www.reuters.com/article/2015/10/05/us-markets-stocks-idUSKCN0RZ13820151005: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/05/us-markets-stocks-idUSKCN0RZ13820151005


Processing URLs:  14%|█▍        | 140/1000 [05:13<30:40,  2.14s/it]

Error extracting text from http://buenosairesherald.com/article/210460/ruling-coalition-at-breaking-point-as-rousseff-loses-support: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/210460/ruling-coalition-at-breaking-point-as-rousseff-loses-support


Processing URLs:  14%|█▍        | 141/1000 [05:13<25:15,  1.76s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=55126#.V-mke5A8KrU: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=55126#.V-mke5A8KrU
URL filtered: https://twitter.com/Rouhani_ir


Processing URLs:  14%|█▍        | 143/1000 [05:14<15:46,  1.10s/it]

Error extracting text from https://larswericson.wordpress.com/2016/03/27/gitrep-27mar16pm: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/03/27/gitrep-27mar16pm


Processing URLs:  14%|█▍        | 144/1000 [05:15<13:54,  1.03it/s]

Error extracting text from https://olympics.com/tokyo-2020/en/torch/route/: 403 Client Error: Forbidden for url: https://olympics.com/tokyo-2020/en/torch/route/


Processing URLs:  15%|█▌        | 150/1000 [05:26<32:29,  2.29s/it]

Error extracting text from http://theiranproject.com/blog/2015/02/16/iran-oil-industry-needs-money/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=iran-oil-industry-needs-money


Processing URLs:  15%|█▌        | 154/1000 [05:38<54:37,  3.87s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/iran-reformists-moderates-unite-ahead-of-vote/2016/02/15/f47ec12e-d3fa-11e5-a65b-587e721fb231_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/iran-reformists-moderates-unite-ahead-of-vote/2016/02/15/f47ec12e-d3fa-11e5-a65b-587e721fb231_story.html


Processing URLs:  16%|█▌        | 155/1000 [05:38<39:19,  2.79s/it]

Error extracting text from http://www.geekwire.com: 403 Client Error: Forbidden for url: https://www.geekwire.com/
Error extracting text from https://blogs.wsj.com/moneybeat/2017/11/01/pdvsa-debt-payment-keeps-venezuelas-bondholders-on-edge/: 403 Client Error: Forbidden for url: https://blogs.wsj.com/moneybeat/2017/11/01/pdvsa-debt-payment-keeps-venezuelas-bondholders-on-edge/


Processing URLs:  16%|█▌        | 160/1000 [06:04<48:15,  3.45s/it]  

URL filtered: https://www.nytimes.com/2016/11/28/technology/facebook-germany-hate-speech-fake-news.html?pagewanted=all


Processing URLs:  17%|█▋        | 167/1000 [06:23<33:41,  2.43s/it]  



Processing URLs:  17%|█▋        | 169/1000 [06:29<37:16,  2.69s/it]

Error extracting text from http://38north.org/2017/03/punggye030917/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  17%|█▋        | 171/1000 [06:35<43:37,  3.16s/it]

Error extracting text from http://goo.gl/gLMwkF: 404 Client Error: Not Found for url: https://www.ipsos.com/es-pe/sites/default/files/opinion_data/Opinion%20Data%20Febrero%202016.pdf


Processing URLs:  18%|█▊        | 177/1000 [06:49<33:37,  2.45s/it]

error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.streetinsider.com/Credit+Ratings/Time+(TIME)+Downgraded+to+BB-+by+S%26P%3B+Outlook+is+Stable/11682765.html: Document is empty


Processing URLs:  18%|█▊        | 181/1000 [06:52<16:40,  1.22s/it]

Error extracting text from http://tass.ru/en/politics/855371: 404 Client Error: Not Found for url: https://tass.ru/en/politics/855371
Error extracting text from http://www.nytimes.com/2016/12/17/world/europe/russia-propaganda-elections.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/17/world/europe/russia-propaganda-elections.html


Processing URLs:  18%|█▊        | 182/1000 [06:54<16:44,  1.23s/it]

Error extracting text from https://www.cnbc.com/2017/06/01/reuters-america-sp-cuts-illinois-credit-rating-to-one-notch-above-junk.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/06/01/reuters-america-sp-cuts-illinois-credit-rating-to-one-notch-above-junk.html


Processing URLs:  18%|█▊        | 185/1000 [06:57<19:22,  1.43s/it]

URL filtered: https://www.youtube.com/watch?v=3CNeDtZmpjU
URL filtered: http://www.bloomberg.com/news/articles/2016-07-01/venezuelan-credit-dashboard-bonds-crude-export-prices-rebound


Processing URLs:  19%|█▉        | 189/1000 [07:00<13:18,  1.02it/s]

URL filtered: https://twitter.com/hazemaq/status/1404932764483342344?s=19


Processing URLs:  19%|█▉        | 194/1000 [07:04<12:58,  1.04it/s]

Error extracting text from https://www.reuters.com/article/britain-sterling/sterling-steady-as-outright-majority-for-scottish-nationalists-seen-unlikely-idUSL1N2MS0HP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-sterling/sterling-steady-as-outright-majority-for-scottish-nationalists-seen-unlikely-idUSL1N2MS0HP


Processing URLs:  20%|█▉        | 199/1000 [07:07<10:35,  1.26it/s]

Error extracting text from http://www.nytimes.com/2016/01/05/business/vw-sued-justice-department-emissions-scandal.html?emc=edit_th_20160105&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/05/business/vw-sued-justice-department-emissions-scandal.html?emc=edit_th_20160105&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  20%|██        | 203/1000 [07:11<10:03,  1.32it/s]

Error extracting text from https://www.weforum.org/agenda/2016/07/how-long-do-trade-deals-take-after-brexit/: 403 Client Error: Forbidden for url: https://www.weforum.org/agenda/2016/07/how-long-do-trade-deals-take-after-brexit/
Error extracting text from http://www.latimes.com/business/la-fi-oil-embargo-fight-20151202-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-oil-embargo-fight-20151202-story.html


Processing URLs:  21%|██        | 206/1000 [07:14<09:46,  1.35it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-19/venezuela-prepares-for-biggest-military-exercise-in-history


Processing URLs:  21%|██        | 212/1000 [07:17<08:51,  1.48it/s]

Error extracting text from http://theiowarepublican.com/2015/state-of-the-race-how-i-see-things-in-iowa-with-just-over-100-days-to-go/: 404 Client Error: Not Found for url: http://theiowarepublican.com/2015/state-of-the-race-how-i-see-things-in-iowa-with-just-over-100-days-to-go/


Processing URLs:  21%|██▏       | 213/1000 [07:18<08:32,  1.54it/s]

URL filtered: https://www.youtube.com/watch?v=ITvTWN_dcmg


Processing URLs:  22%|██▏       | 220/1000 [07:52<47:59,  3.69s/it]  

Error extracting text from https://angusreid.org/federal-election-post-debate/: 403 Client Error: Forbidden for url: https://angusreid.org/federal-election-post-debate/
Error extracting text from https://www.reuters.com/article/uk-health-coronavirus-variant-children/uk-coronavirus-variant-may-be-more-able-to-infect-children-scientists-idUKKBN28V2EV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-health-coronavirus-variant-children/uk-coronavirus-variant-may-be-more-able-to-infect-children-scientists-idUKKBN28V2EV


Processing URLs:  22%|██▏       | 224/1000 [07:55<21:34,  1.67s/it]

Error extracting text from https://www.middleeastmonitor.com/20161013-iraqi-shia-leader-mosul-operation-will-be-vengeance-for-hussein/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20161013-iraqi-shia-leader-mosul-operation-will-be-vengeance-for-hussein/


Processing URLs:  23%|██▎       | 227/1000 [07:57<13:22,  1.04s/it]

URL filtered: https://www.youtube.com/watch?v=RDvoBoxv028


Processing URLs:  23%|██▎       | 230/1000 [07:58<09:34,  1.34it/s]

Error extracting text from https://www.hindustantimes.com/world-news/us-iran-clash-on-sanctions-us-sees-possible-impasse-101618011940737-amp.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/world-news/us-iran-clash-on-sanctions-us-sees-possible-impasse-101618011940737-amp.html


Processing URLs:  24%|██▍       | 238/1000 [08:17<18:01,  1.42s/it]

Error extracting text from https://www.unmultimedia.org/radio/english/2016/07/afghanistan-violence-claims-record-death-toll/: HTTPSConnectionPool(host='www.unmultimedia.org', port=443): Max retries exceeded with url: /radio/english/2016/07/afghanistan-violence-claims-record-death-toll/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  25%|██▍       | 249/1000 [08:33<12:54,  1.03s/it]

Error extracting text from http://m.state.gov/md256697.htm: HTTPConnectionPool(host='m.state.gov', port=80): Max retries exceeded with url: /md256697.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307b2dd00>: Failed to resolve 'm.state.gov' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/12/01/us-brazil-economy-gdp-idUSKBN0TK48E20151201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/us-brazil-economy-gdp-idUSKBN0TK48E20151201


Processing URLs:  25%|██▌       | 250/1000 [08:35<14:54,  1.19s/it]

URL filtered: http://money.cnn.com/2017/02/06/technology/france-elections-fake-news-facebook-google/


Processing URLs:  25%|██▌       | 252/1000 [08:37<14:22,  1.15s/it]

Error extracting text from https://www.dni.gov/index.php/newsroom/congressional-testimonies/item/1845-statement-for-the-record-worldwide-threat-assessment-of-the-us-intelligence-community: 404 Client Error: Not Found for url: https://www.dni.gov/index.php/newsroom/congressional-testimonies/item/1845-statement-for-the-record-worldwide-threat-assessment-of-the-us-intelligence-community


Processing URLs:  26%|██▌       | 255/1000 [08:41<15:40,  1.26s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-02/china-south-korea-agree-to-seek-summit-with-japan-by-november


Processing URLs:  26%|██▌       | 258/1000 [08:48<22:47,  1.84s/it]

Error extracting text from http://www.trust.org/item/20151217222527-li7ah: 404 Client Error:  for url: https://www.trust.org:443/item/20151217222527-li7ah


Processing URLs:  26%|██▌       | 261/1000 [08:51<16:14,  1.32s/it]

Error extracting text from http://news.antiwar.com/2016/09/07/taliban-offensive-approaches-central-afghan-provincial-capital/: 403 Client Error: Forbidden for url: https://news.antiwar.com/2016/09/07/taliban-offensive-approaches-central-afghan-provincial-capital/
Error extracting text from http://www.reuters.com/article/us-space-spacex-idUSKBN14Y0H3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-space-spacex-idUSKBN14Y0H3


Processing URLs:  26%|██▌       | 262/1000 [08:51<12:01,  1.02it/s]

Error extracting text from http://www.reuters.com/article/2015/11/18/us-usa-fed-liftoff-idUSKCN0T720Q20151118: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/18/us-usa-fed-liftoff-idUSKCN0T720Q20151118


Processing URLs:  26%|██▋       | 263/1000 [08:51<09:25,  1.30it/s]

Error extracting text from http://www.wsj.com/articles/venezuela-pdvsa-restructures-debt-1477333172: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuela-pdvsa-restructures-debt-1477333172


Processing URLs:  27%|██▋       | 266/1000 [09:02<24:37,  2.01s/it]

Error extracting text from http://www.asahi.com/sp/ajw/articles/AJ201606030059.html: 404 Client Error: Not Found for url: https://www.asahi.com/sp/ajw/articles/AJ201606030059.html


Processing URLs:  27%|██▋       | 267/1000 [09:02<18:13,  1.49s/it]

Error extracting text from https://www.nytimes.com/2021/02/18/us/politics/biden-iran-nuclear.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/18/us/politics/biden-iran-nuclear.html


Processing URLs:  28%|██▊       | 279/1000 [09:24<26:29,  2.20s/it]

Error extracting text from https://www.eurogroupforanimals.org/news/opposition-eu-mercosur-deal-growing: 404 Client Error: Not Found for url: https://www.eurogroupforanimals.org/news/opposition-eu-mercosur-deal-growing


Processing URLs:  28%|██▊       | 280/1000 [09:26<28:04,  2.34s/it]

Error extracting text from http://www.theaustralian.com.au/in-depth/terror/syria-labor-leaves-option-of-interim-deal-with-bashar-al-assad-open/story-fnpdbcmu-1227613745321?sv=aaa246fba0de6f9faefa2232e628cdbc: 404 Client Error: Not Found for url: https://www.theaustralian.com.au/in-depth/terror/syria-labor-leaves-option-of-interim-deal-with-bashar-al-assad-open/story-fnpdbcmu-1227613745321?sv=aaa246fba0de6f9faefa2232e628cdbc&nk=c200725a47d7fea92ad5d8d1b9581652-1706878090


Processing URLs:  29%|██▊       | 286/1000 [09:40<23:58,  2.01s/it]

URL filtered: http://www.wsj.com/articles/brazils-embattled-president-dilma-rousseff-seizes-on-fight-against-zika-1455143055?utm_campaign=Contact+SNS+For+More+Referrer&amp;utm_medium=twitter&amp;utm_source=snsanalytics


Processing URLs:  30%|███       | 300/1000 [10:57<3:39:38, 18.83s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-07-09/trump-says-discussed-forming-cyber-security-unit-with-putin: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  30%|███       | 305/1000 [11:04<50:36,  4.37s/it]  

Error extracting text from https://www.afghanistan-analysts.org/when-the-political-agreement-runs-out-on-the-future-of-afghanistans-national-unity-government/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/when-the-political-agreement-runs-out-on-the-future-of-afghanistans-national-unity-government/


Processing URLs:  31%|███       | 308/1000 [12:09<3:53:47, 20.27s/it]

Error extracting text from http://www.spaceflightinsider.com/organizations/space-exploration-technologies/space-florida-seeking-funding-help-spacex-modify-lc-39a/: HTTPConnectionPool(host='www.spaceflightinsider.com', port=80): Max retries exceeded with url: /organizations/space-exploration-technologies/space-florida-seeking-funding-help-spacex-modify-lc-39a/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x305df2900>, 'Connection to www.spaceflightinsider.com timed out. (connect timeout=60)'))


Processing URLs:  31%|███       | 311/1000 [12:13<1:29:47,  7.82s/it]

Error extracting text from https://www.nytimes.com/2017/08/10/world/europe/putin-trump-embassy-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/10/world/europe/putin-trump-embassy-russia.html
URL filtered: https://www.bloomberg.com/view/articles/2017-08-29/felix-sater-is-a-lean-mean-trump-russia-machine


Processing URLs:  31%|███▏      | 314/1000 [12:20<49:01,  4.29s/it]  

Error extracting text from https://www.reuters.com/article/us-volvocars-geely-electric-idUSKBN19Q0BJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-volvocars-geely-electric-idUSKBN19Q0BJ


Processing URLs:  32%|███▏      | 317/1000 [12:24<29:00,  2.55s/it]

Error extracting text from https://www.debka.com/syrian-air-defenses-spread-target-us-well-israeli-overflights/: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /syrian-air-defenses-spread-target-us-well-israeli-overflights/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  32%|███▏      | 319/1000 [13:22<2:27:29, 12.99s/it]

Error extracting text from http://www.recode.net/2016/3/23/11587220/fueling-the-success-of-the-hydrogen-powered-toyota-mirai-is-a: Exceeded 30 redirects.
Error extracting text from http://www.reuters.com/article/2015/10/28/us-china-economy-imf-idUSKCN0SM1Z720151028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/us-china-economy-imf-idUSKCN0SM1Z720151028


Processing URLs:  32%|███▏      | 320/1000 [13:27<2:01:54, 10.76s/it]

Error extracting text from http://www.itsa.org/awards-media/industry-and-member-news/1691-seld-driving-cars-are-headed-to-contra-costa: 404 Client Error: Not Found for url: https://itsa.org/awards-media/industry-and-member-news/1691-seld-driving-cars-are-headed-to-contra-costa


Processing URLs:  32%|███▏      | 321/1000 [13:29<1:32:13,  8.15s/it]

Error extracting text from http://www.migrationobservatory.ox.ac.uk/briefings/migrants-uk-overview: 403 Client Error: Forbidden for url: http://migrationobservatory.ox.ac.uk/briefings/migrants-uk-overview


Processing URLs:  32%|███▏      | 324/1000 [13:32<38:26,  3.41s/it]  

Error extracting text from http://www.nytimes.com/2015/10/01/us/politics/government-shutdown-congress.html?emc=edit_th_20151001&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/01/us/politics/government-shutdown-congress.html?emc=edit_th_20151001&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  32%|███▎      | 325/1000 [13:33<27:41,  2.46s/it]

Error extracting text from https://www.nytimes.com/2017/05/14/us/politics/trump-republican-senators.html?emc=edit_th_20170515&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/14/us/politics/trump-republican-senators.html?emc=edit_th_20170515&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  33%|███▎      | 326/1000 [13:33<20:12,  1.80s/it]

Error extracting text from https://www.nytimes.com/2018/02/11/world/middleeast/israel-iran-syria-clash.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/11/world/middleeast/israel-iran-syria-clash.html


Processing URLs:  33%|███▎      | 329/1000 [13:33<08:24,  1.33it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/01/18/German-defense-minister-raises-prospect-of-Libya-mission.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/01/18/German-defense-minister-raises-prospect-of-Libya-mission.html


Processing URLs:  33%|███▎      | 330/1000 [13:35<12:36,  1.13s/it]

Error extracting text from https://www.essex.gov.uk/school-terms-and-holidays/academic-year-2021-to-2022: 404 Client Error: Not Found for url: https://www.essex.gov.uk/school-terms-and-holidays/academic-year-2021-to-2022


Processing URLs:  33%|███▎      | 332/1000 [13:38<13:37,  1.22s/it]

Error extracting text from https://www.militarynews.com/news/national/us-navy-says-it-won-t-be-deterred-by-chinese/image_c1200e4d-943d-527f-ac5e-b9e4d31940f7.html: 404 Client Error: Not Found for url: https://www.militarynews.com/news/national/us-navy-says-it-won-t-be-deterred-by-chinese/image_c1200e4d-943d-527f-ac5e-b9e4d31940f7.html


Processing URLs:  34%|███▎      | 337/1000 [13:46<13:40,  1.24s/it]

Error extracting text from http://www.scottaaronson.com/blog/?p=208: 406 Client Error: Not Acceptable for url: http://www.scottaaronson.com/blog/?p=208


Processing URLs:  34%|███▍      | 340/1000 [13:49<11:29,  1.04s/it]

Error extracting text from http://gpseducation.oecd.org/CountryProfile?primaryCountry=NLD&amp;treshold=10&amp;topic=PI: 403 Client Error: Forbidden for url: https://gpseducation.oecd.org/CountryProfile?primaryCountry=NLD&amp;treshold=10&amp;topic=PI


Processing URLs:  34%|███▍      | 343/1000 [14:00<22:38,  2.07s/it]

Error extracting text from https://www.nytimes.com/aponline/2017/08/06/world/asia/ap-as-united-states-russia-the-latest.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/aponline/2017/08/06/world/asia/ap-as-united-states-russia-the-latest.html
URL filtered: https://www.youtube.com/watch?v=pmN25Zb1xfg


Processing URLs:  35%|███▍      | 347/1000 [14:02<12:58,  1.19s/it]

Error extracting text from http://www.timesofisrael.com/liveblog_entry/putin-told-assad-hed-have-to-step-down-officials/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/liveblog_entry/putin-told-assad-hed-have-to-step-down-officials/


Processing URLs:  35%|███▍      | 349/1000 [14:03<08:49,  1.23it/s]

Error extracting text from http://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118#hsTxCWwlTClx2u0s.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/18/us-iran-nuclear-iaea-idUSKCN0T72B720151118#hsTxCWwlTClx2u0s.97


Processing URLs:  35%|███▌      | 351/1000 [14:05<08:40,  1.25it/s]

Error extracting text from http://www.business-standard.com/article/companies/ovl-seeks-oil-in-lieu-of-537-mn-due-from-venezuela-116091800223_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/companies/ovl-seeks-oil-in-lieu-of-537-mn-due-from-venezuela-116091800223_1.html


Processing URLs:  35%|███▌      | 353/1000 [14:07<08:36,  1.25it/s]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/south-africas-jacob-zuma-defeats-no-confidence-vote-in-parliament/articleshow/55359293.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/south-africas-jacob-zuma-defeats-no-confidence-vote-in-parliament/articleshow/55359293.cms
Error extracting text from https://medium.com/connected/icml-accepted-papers-stats-2018-1f9c0a9a6eaf: 403 Client Error: Forbidden for url: https://medium.com/connected/icml-accepted-papers-stats-2018-1f9c0a9a6eaf


Processing URLs:  36%|███▌      | 355/1000 [14:13<19:51,  1.85s/it]

URL filtered: https://www.thedailybeast.com/russia-recruited-youtubers-to-bash-racist-btch-hillary-clinton-over-rap-beats


Processing URLs:  36%|███▌      | 357/1000 [14:14<12:06,  1.13s/it]

Error extracting text from https://www.nytimes.com/2017/08/02/us/politics/tillerson-north-korea-negotiations-missile-test.html?emc=edit_th_20170803&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/02/us/politics/tillerson-north-korea-negotiations-missile-test.html?emc=edit_th_20170803&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0


Processing URLs:  36%|███▌      | 359/1000 [14:15<08:51,  1.21it/s]

Error extracting text from http://www.latimes.com/world/europe/la-fg-germany-shulz-2017-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/europe/la-fg-germany-shulz-2017-story.html


Processing URLs:  36%|███▌      | 360/1000 [14:15<07:22,  1.45it/s]

Error extracting text from http://in.rbth.com/news/2016/01/21/russian-navy-to-hold-drills-with-china-egypt-india-this-year_561151: HTTPConnectionPool(host='in.rbth.com', port=80): Max retries exceeded with url: /news/2016/01/21/russian-navy-to-hold-drills-with-china-egypt-india-this-year_561151 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3076d8380>: Failed to resolve 'in.rbth.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  36%|███▌      | 362/1000 [14:18<09:59,  1.06it/s]

Error extracting text from http://www.timesargus.com/article/20151030/OPINION01/151039972: 404 Client Error: Not Found for url: https://www.timesargus.com/article/20151030/opinion01/151039972/
Error extracting text from http://www.reuters.com/article/japan-economy-gdp-idUST9N13E01T20151207: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/japan-economy-gdp-idUST9N13E01T20151207


Processing URLs:  36%|███▋      | 363/1000 [14:19<10:48,  1.02s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-03/asian-futures-signal-more-stock-losses-on-draghi-disappointment


Processing URLs:  37%|███▋      | 367/1000 [14:23<10:33,  1.00s/it]

Error extracting text from http://www.businessinsider.com/r-spain-heading-for-more-deadlock-as-socialists-reject-backing-pm-rajoy-2016-7?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-spain-heading-for-more-deadlock-as-socialists-reject-backing-pm-rajoy-2016-7?IR=T


Processing URLs:  37%|███▋      | 374/1000 [14:33<15:08,  1.45s/it]

Error extracting text from http://www.economist.com/printedition/covers/2017-02-09/ap-e-eu-la-me-na-uk: 404 Client Error: Not Found for url: https://www.economist.com/weeklyedition/covers/2017-02-09/ap-e-eu-la-me-na-uk


Processing URLs:  38%|███▊      | 377/1000 [14:39<17:37,  1.70s/it]

Error extracting text from http://macedoniaonline.eu/content/view/28136/2/: 404 Client Error: Not Found for url: https://macedoniaonline.eu/content/view/28136/2


Processing URLs:  38%|███▊      | 378/1000 [14:40<17:21,  1.67s/it]

Error extracting text from https://www.reuters.com/world/when-biden-meets-putin-old-foes-could-cool-off-not-reset-2021-05-13/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/when-biden-meets-putin-old-foes-could-cool-off-not-reset-2021-05-13/
URL filtered: http://bigthink.com/neurobonkers/did-facebook-just-finally-kiss-goodbye-to-hoax-news


Processing URLs:  38%|███▊      | 382/1000 [14:41<07:40,  1.34it/s]

Error extracting text from http://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN16Y2QF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN16Y2QF?il=0
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKCN0YV1VO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKCN0YV1VO


Processing URLs:  38%|███▊      | 384/1000 [14:41<05:09,  1.99it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-wrapup-idUSKCN0ZR19P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-wrapup-idUSKCN0ZR19P


Processing URLs:  39%|███▉      | 388/1000 [14:46<07:36,  1.34it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-satellite-idUSKCN0VE082: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-satellite-idUSKCN0VE082


Processing URLs:  39%|███▉      | 389/1000 [14:47<08:37,  1.18it/s]

Error extracting text from https://cleantechnica.com/2016/08/11/china-electric-car-sales-188-still-dominated-byd/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/08/11/china-electric-car-sales-188-still-dominated-byd/


Processing URLs:  39%|███▉      | 392/1000 [14:51<10:10,  1.00s/it]

Error extracting text from http://www.policyexchange.org.uk/images/WolfsonPrize/depart%20default%20devalue%20wolfson.pdf: 403 Client Error: Forbidden for url: http://www.policyexchange.org.uk/images/WolfsonPrize/depart%20default%20devalue%20wolfson.pdf


Processing URLs:  39%|███▉      | 393/1000 [14:52<09:33,  1.06it/s]

URL filtered: https://www.youtube.com/watch?v=Nt9N-DV1Ccs


Processing URLs:  40%|███▉      | 395/1000 [14:52<06:20,  1.59it/s]

Error extracting text from http://allafrica.com/stories/201602100194.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201602100194.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x307d79610>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  40%|███▉      | 397/1000 [14:55<09:34,  1.05it/s]

Error extracting text from http://www.ohio.com/news/politics/state/political-ad-buys-in-ohio-cancelled-as-outside-groups-see-portman-pulling-ahead-1.709272: 404 Client Error: OK for url: https://www.beaconjournal.com/news/politics/state/political-ad-buys-in-ohio-cancelled-as-outside-groups-see-portman-pulling-ahead-1.709272/
URL filtered: http://www.bloomberg.com/news/articles/2015-10-29/malaysia-central-bank-said-to-take-interbank-dollar-deposits


Processing URLs:  40%|████      | 400/1000 [14:56<06:56,  1.44it/s]

Error extracting text from http://www.reuters.com/article/us-yemen-security-missiles-idUSKCN12C294: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-missiles-idUSKCN12C294


Processing URLs:  40%|████      | 402/1000 [14:58<06:54,  1.44it/s]

Error extracting text from http://boingboing.net/2016/06/27/chinas-10byear-pr-ministry.html: 403 Client Error: Forbidden for url: https://boingboing.net/2016/06/27/chinas-10byear-pr-ministry.html


Processing URLs:  40%|████      | 404/1000 [15:01<09:16,  1.07it/s]

Error extracting text from http://micanaldepanama.com/expansion/faq/: 403 Client Error: Forbidden for url: https://pancanal.com/expansion/faq/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://rodapenews.blogspot.com/2016/03/mirian-dutra-denuncia-jovelino-mineiro.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://rodapenews.blogspot.com/2016/03/mirian-dutra-denuncia-jovelino-mineiro.html&amp;prev=search


Processing URLs:  41%|████      | 406/1000 [15:03<08:35,  1.15it/s]

Error extracting text from https://www.nytimes.com/2017/03/07/us/politics/affordable-care-act-obama-care-health.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/07/us/politics/affordable-care-act-obama-care-health.html


Processing URLs:  41%|████      | 408/1000 [15:08<17:10,  1.74s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1114448/000117184317004057/f6k_071217.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1114448/000117184317004057/f6k_071217.htm


Processing URLs:  41%|████      | 409/1000 [15:10<15:58,  1.62s/it]

Error extracting text from http://www.newsweek.com/assad-says-aleppo-must-be-cleaned-siege-city-continues-509980: 403 Client Error: Forbidden for url: https://www.newsweek.com/assad-says-aleppo-must-be-cleaned-siege-city-continues-509980


Processing URLs:  42%|████▏     | 416/1000 [15:21<14:34,  1.50s/it]

Error extracting text from http://colombiapeace.org/2016/03/23/why-colombias-negotiators-couldnt-manage-a-cease-fire-by-march-23/: 403 Client Error: Forbidden for url: http://colombiapeace.org/2016/03/23/why-colombias-negotiators-couldnt-manage-a-cease-fire-by-march-23/
Error extracting text from http://www.reuters.com/article/us-afghanistan-minister-idUSKCN0XX0G7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-minister-idUSKCN0XX0G7


Processing URLs:  42%|████▏     | 418/1000 [15:21<09:25,  1.03it/s]

Error extracting text from http://www.japantimes.co.jp/news/2013/11/27/national/criticism-of-chinas-adiz-increases-japanese-airlines-do-a-policy-u-turn/#.Vskqx85yvIo: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2013/11/27/national/criticism-of-chinas-adiz-increases-japanese-airlines-do-a-policy-u-turn/#.Vskqx85yvIo


Processing URLs:  42%|████▏     | 419/1000 [15:23<10:22,  1.07s/it]

Error extracting text from https://www.independentsciencenews.org/commentaries/why-china-and-the-who-will-never-find-a-zoonotic-origin-for-the-covid19-pandemic-virus/: 403 Client Error: Forbidden for url: https://www.independentsciencenews.org/commentaries/why-china-and-the-who-will-never-find-a-zoonotic-origin-for-the-covid19-pandemic-virus/


Processing URLs:  42%|████▏     | 420/1000 [15:24<10:32,  1.09s/it]

Error extracting text from http://www.state.gov/s/greatlakes_drc/releases/2015/240838.htm: 404 Client Error: Not Found for url: https://www.state.gov/s/greatlakes_drc/releases/2015/240838.htm


Processing URLs:  42%|████▏     | 422/1000 [15:27<13:00,  1.35s/it]

URL filtered: https://www.youtube.com/watch?rid=3140355&amp;v=dCm11C3f-YI&amp;mid=69369&amp;app=desktop


Processing URLs:  43%|████▎     | 430/1000 [15:36<09:47,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-canada-marijuana-idUSKBN17F25I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-canada-marijuana-idUSKBN17F25I


Processing URLs:  43%|████▎     | 432/1000 [15:37<06:52,  1.38it/s]

Error extracting text from http://www.reuters.com/article/us-nato-montenegro-idUSKBN0TL0J620151202: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-montenegro-idUSKBN0TL0J620151202


Processing URLs:  43%|████▎     | 433/1000 [15:39<10:27,  1.11s/it]

Error extracting text from http://defensetech.org/2015/11/02/china-flies-armed-jets-over-disputed-islands/: 404 Client Error: Not Found for url: https://leon.bet/blog/contribution/collaborations-defensetech/2015/11/02/china-flies-armed-jets-over-disputed-islands/
URL filtered: http://www.bloomberg.com/news/articles/2015-11-19/goldman-sees-yellen-call-limiting-2016-u-s-stock-market-gains


Processing URLs:  44%|████▎     | 436/1000 [15:44<13:33,  1.44s/it]

Error extracting text from http://uk.mobile.reuters.com/article/idUKKCN10B071: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUKKCN10B071


Processing URLs:  44%|████▍     | 438/1000 [15:45<10:15,  1.10s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53097#.VrTqUtDAo2w: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53097#.VrTqUtDAo2w


Processing URLs:  44%|████▍     | 440/1000 [15:46<06:55,  1.35it/s]

Error extracting text from http://www.wsj.com/articles/keiko-fujimori-and-pedro-pablo-kuczynski-in-tight-race-for-president-of-peru-1464990377: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/keiko-fujimori-and-pedro-pablo-kuczynski-in-tight-race-for-president-of-peru-1464990377


Processing URLs:  44%|████▍     | 443/1000 [15:53<19:21,  2.09s/it]

URL filtered: https://www.washingtonpost.com/technology/2019/09/19/facebooks-mark-zuckerberg-dined-with-lawmakers-last-night-privacy-cryptocurrency-were-menu/


Processing URLs:  44%|████▍     | 445/1000 [15:53<11:28,  1.24s/it]

Error extracting text from http://nyti.ms/QdL4vA: 403 Client Error: Forbidden for url: http://www.nytimes.com/2012/11/23/us/politics/one-party-control-opens-states-to-partisan-rush.html?smid=tw-share


Processing URLs:  45%|████▍     | 448/1000 [15:58<12:32,  1.36s/it]

Error extracting text from http://finance.yahoo.com/news/u-november-employment-report-seen-052252580.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/u-november-employment-report-seen-052252580.html


Processing URLs:  45%|████▌     | 450/1000 [16:00<10:14,  1.12s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56191#.WLOKwxLyt8c: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56191#.WLOKwxLyt8c


Processing URLs:  45%|████▌     | 451/1000 [16:00<08:47,  1.04it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/358710-homeland-security-cyber-unit-on-alert-for-election-day: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/358710-homeland-security-cyber-unit-on-alert-for-election-day/


Processing URLs:  45%|████▌     | 452/1000 [16:00<06:57,  1.31it/s]

Error extracting text from http://www.cdm.me/english/luksic-over-70-percent-of-citizens-believe-that-we-will-be-a-member-of-nato: 403 Client Error: Forbidden for url: https://www.cdm.me/english/luksic-over-70-percent-of-citizens-believe-that-we-will-be-a-member-of-nato


Processing URLs:  45%|████▌     | 453/1000 [17:01<2:46:58, 18.32s/it]

Error extracting text from https://www.nafta-sec-alena.org/Home/Legal-Texts/North-American-Free-Trade-Agreement?mvid=1&amp;secid=d5a8ba07-1fb2-4f28-88d0-a8eac08611a2: HTTPSConnectionPool(host='www.nafta-sec-alena.org', port=443): Read timed out. (read timeout=60)


Processing URLs:  45%|████▌     | 454/1000 [17:03<2:03:02, 13.52s/it]

Error extracting text from https://www.ers.usda.gov/topics/crops/corn/trade/: 404 Client Error: Not Found for url: https://www.ers.usda.gov/topics/crops/corn/trade/


Processing URLs:  46%|████▌     | 455/1000 [17:04<1:28:04,  9.70s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-01-18/rustbelt-china-province-admits-it-faked-fiscal-data-from-2011-14


Processing URLs:  46%|████▌     | 457/1000 [17:04<48:50,  5.40s/it]  

Error extracting text from http://www.aina.org/news/20160907123752.htm: 404 Client Error:  for url: http://www.aina.org/news/20160907123752.htm


Processing URLs:  46%|████▋     | 464/1000 [17:19<18:22,  2.06s/it]

Error extracting text from http://pulitzercenter.org/reporting/welcome-demokrasi-how-erdogan-got-more-popular-ever: 403 Client Error: Forbidden for url: http://pulitzercenter.org/reporting/welcome-demokrasi-how-erdogan-got-more-popular-ever


Processing URLs:  46%|████▋     | 465/1000 [17:19<14:28,  1.62s/it]

Error extracting text from https://webcache.googleusercontent.com/search?q=cache:iaiAIG54eZ8J:english.yonhapnews.co.kr/news/2016/06/15/69/0200000000AEN20160615001300315F.html+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=au&amp;client=opera: 404 Client Error: Not Found for url: https://webcache.googleusercontent.com/search?q=cache:iaiAIG54eZ8J:english.yonhapnews.co.kr/news/2016/06/15/69/0200000000AEN20160615001300315F.html+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=au&amp;client=opera


Processing URLs:  47%|████▋     | 466/1000 [17:20<11:43,  1.32s/it]

Error extracting text from https://www.amnesty.org/en/countries/africa/burundi/report-burundi/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/location/africa/east-africa-the-horn-and-great-lakes/burundi/report-burundi/


Processing URLs:  47%|████▋     | 474/1000 [17:35<13:05,  1.49s/it]

Error extracting text from http://www.boardability.com/profile.php?id=demis_hassabis: 406 Client Error: Not Acceptable for url: http://www.boardability.com/profile.php?id=demis_hassabis
Error extracting text from http://www.reuters.com/article/us-turkey-referendum-germany-gabriel-idUSKBN16P0EA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-referendum-germany-gabriel-idUSKBN16P0EA


Processing URLs:  48%|████▊     | 476/1000 [17:38<12:43,  1.46s/it]

Error extracting text from http://news.antiwar.com/2016/08/28/mosul-fight-is-redrawing-the-map-of-northern-iraq/: 403 Client Error: Forbidden for url: https://news.antiwar.com/2016/08/28/mosul-fight-is-redrawing-the-map-of-northern-iraq/


Processing URLs:  48%|████▊     | 478/1000 [17:41<11:51,  1.36s/it]

Error extracting text from http://38north.org/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml
Error extracting text from http://www.reuters.com/article/2015/09/25/usa-boehner-eximbank-idUSL1N11V27S20150925: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/25/usa-boehner-eximbank-idUSL1N11V27S20150925


Processing URLs:  48%|████▊     | 481/1000 [17:46<13:11,  1.52s/it]

Error extracting text from http://www.ibtimes.com/south-china-sea-controversy-us-will-continue-operate-disputed-region-top-marine-corps-2369992: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-controversy-us-will-continue-operate-disputed-region-top-marine-corps-2369992


Processing URLs:  48%|████▊     | 482/1000 [17:47<11:37,  1.35s/it]

Error extracting text from https://www.makeuseof.com/tag/how-to-use-serval-mesh-to-chat-to-other-mobile-phones-without-a-phone-network-android/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  48%|████▊     | 484/1000 [17:49<11:26,  1.33s/it]

Error extracting text from http://in.reuters.com/article/india-brics-development-bank-idINKBN1720C9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  49%|████▉     | 490/1000 [18:04<15:52,  1.87s/it]

Error extracting text from http://www.businessinsider.com.au/russia-vs-saudi-arabia-in-chinas-oil-market-2016-2?r=US&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/russia-vs-saudi-arabia-in-chinas-oil-market-2016-2?r=US&amp;IR=T


Processing URLs:  50%|████▉     | 496/1000 [18:13<11:36,  1.38s/it]

Error extracting text from http://www.imagesatintl.com/possible-iskander-launch-syria/: 404 Client Error: Not Found for url: http://www.imagesatintl.com/possible-iskander-launch-syria/
Error extracting text from http://www.nytimes.com/2016/02/27/world/middleeast/syria-truce-comes-with-price-but-not-for-assad.html?mwrsm=Email&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/27/world/middleeast/syria-truce-comes-with-price-but-not-for-assad.html?mwrsm=Email&amp;_r=0


Processing URLs:  50%|████▉     | 497/1000 [18:13<08:38,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-aleppo-militants-idUSKCN0YQ0J0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-aleppo-militants-idUSKCN0YQ0J0


Processing URLs:  51%|█████     | 508/1000 [18:35<11:58,  1.46s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/politica/2016/03/oposicao-anuncia-que-ira-impedir-todas-as-votacoes-ate-que-impeachment-volte-a-pauta-00822691.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/politica/2016/03/oposicao-anuncia-que-ira-impedir-todas-as-votacoes-ate-que-impeachment-volte-a-pauta-00822691.html&amp;prev=search


Processing URLs:  51%|█████     | 510/1000 [18:36<08:02,  1.01it/s]

Error extracting text from https://www.business-standard.com/article/economy-policy/rbi-s-covid-relief-good-move-but-late-says-health-care-industry-121050501535_1.html: 403 Client Error: Forbidden for url: https://www.business-standard.com/article/economy-policy/rbi-s-covid-relief-good-move-but-late-says-health-care-industry-121050501535_1.html


Processing URLs:  51%|█████     | 511/1000 [18:42<19:26,  2.39s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-10/abe-adviser-says-oct-30-good-opportunity-for-more-boj-easing


Processing URLs:  51%|█████▏    | 513/1000 [18:43<12:21,  1.52s/it]

Error extracting text from https://cleantechnica.com/2016/11/02/us-nissan-leaf-sales-reach-100000/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/11/02/us-nissan-leaf-sales-reach-100000/


Processing URLs:  52%|█████▏    | 518/1000 [19:13<43:01,  5.36s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/japan-helps-87-companies-to-exit-china-after-pandemic-exposed-overreliance/2020/07/21/4889abd2-cb2f-11ea-99b0-8426e26d203b_story.htmlhttps://carnegieendowment.org/2020/10/21/south-korea-is-caught-between-china-and-united-states-pub-83019https://thediplomat.com/2020/11/south-korea-and-japan-continue-to-struggle-to-bridge-their-differences/: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/japan-helps-87-companies-to-exit-china-after-pandemic-exposed-overreliance/2020/07/21/4889abd2-cb2f-11ea-99b0-8426e26d203b_story.htmlhttps://carnegieendowment.org/2020/10/21/south-korea-is-caught-between-china-and-united-states-pub-83019https://thediplomat.com/2020/11/south-korea-and-japan-continue-to-struggle-to-bridge-their-differences/


Processing URLs:  52%|█████▏    | 519/1000 [19:13<31:35,  3.94s/it]

Error extracting text from https://lobelog.com/scientists-to-trump-the-jcpoa-is-working/: 403 Client Error: Forbidden for url: https://lobelog.com/scientists-to-trump-the-jcpoa-is-working/


Processing URLs:  52%|█████▏    | 521/1000 [19:15<18:54,  2.37s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/the-latest-trump-proud-to-support-gop-health-plan/articleshow/57526432.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/the-latest-trump-proud-to-support-gop-health-plan/articleshow/57526432.cms
Error extracting text from http://www.wsj.com/articles/the-paradox-hindering-syrian-peace-1456277745: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-paradox-hindering-syrian-peace-1456277745


Processing URLs:  52%|█████▏    | 522/1000 [19:16<15:51,  1.99s/it]

Error extracting text from http://www.ff.com/subscribe/: 404 Client Error: Not Found for url: http://www.ff.com/subscribe/


Processing URLs:  53%|█████▎    | 527/1000 [19:22<07:23,  1.07it/s]

Error extracting text from http://www.business-standard.com/article/news-ians/chinese-shares-reverse-fall-115113000551_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/chinese-shares-reverse-fall-115113000551_1.html
Error extracting text from http://www.reuters.com/article/us-sony-cyber-idUSKCN0VX1IR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-sony-cyber-idUSKCN0VX1IR
URL filtered: https://www.youtube.com/watch?v=ZytfMzTGFZU


Processing URLs:  53%|█████▎    | 531/1000 [19:33<17:30,  2.24s/it]

Error extracting text from http://news.usni.org/2016/01/28/panel-russian-nuclear-saber-rattling-prompting-nato-to-rethink-its-role: 403 Client Error: Forbidden for url: http://news.usni.org/2016/01/28/panel-russian-nuclear-saber-rattling-prompting-nato-to-rethink-its-role


Processing URLs:  53%|█████▎    | 533/1000 [19:42<26:58,  3.47s/it]

Error extracting text from http://www.philstar.com/headlines/2017/04/29/1695167/asean-accepting-south-china-sea-chinas-lake-says-analyst: 404 Client Error: Not Found for url: https://www.philstar.com/404?msg=article%20404%20-%201


Processing URLs:  54%|█████▍    | 540/1000 [19:50<09:30,  1.24s/it]

Error extracting text from https://www.newstimes.com/technology/businessinsider/article/US-signals-major-change-on-North-Korea-after-12606506.php: 403 Client Error: Forbidden for url: https://www.newstimes.com/technology/businessinsider/article/US-signals-major-change-on-North-Korea-after-12606506.php


Processing URLs:  54%|█████▍    | 542/1000 [19:52<06:59,  1.09it/s]

Error extracting text from https://www.telegraph.co.uk/politics/2021/01/22/covid-lockdown-when-end-uk-rules-national-restrictions-review/: Exceeded 30 redirects.
Error extracting text from http://www.reuters.com/article/us-thailand-king-constitution-idUSKBN14X0IF?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-thailand-king-constitution-idUSKBN14X0IF?il=0


Processing URLs:  55%|█████▍    | 547/1000 [19:55<06:33,  1.15it/s]

Error extracting text from http://www.cnbc.com/2015/08/14/reuters-america-timeline-push-to-relax-us-oil-export-ban-gathers-steam-in-2015.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/08/14/reuters-america-timeline-push-to-relax-us-oil-export-ban-gathers-steam-in-2015.html


Processing URLs:  55%|█████▌    | 554/1000 [20:06<09:09,  1.23s/it]

URL filtered: https://www.youtube.com/watch?v=If3SXJeZzMQ


Processing URLs:  56%|█████▌    | 559/1000 [20:08<03:57,  1.86it/s]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-eu-turkey-idUSKCN0YA1E9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-eu-turkey-idUSKCN0YA1E9


Processing URLs:  56%|█████▌    | 561/1000 [20:09<04:28,  1.64it/s]

Error extracting text from http://finance.yahoo.com/news/citigroup-says-saudi-arabia-best-114635299.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/citigroup-says-saudi-arabia-best-114635299.html


Processing URLs:  57%|█████▋    | 566/1000 [20:17<09:06,  1.26s/it]

Error extracting text from http://www.reuters.com/article/us-southkorea-politics-idUSKBN16H066: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-idUSKBN16H066


Processing URLs:  57%|█████▋    | 568/1000 [20:20<10:46,  1.50s/it]

Error extracting text from http://reut.rs/1XJV8u3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/us-trade-europe-usa-idUSKBN0TK4O420151201#DcKlrk0wHbkQcGam.97


Processing URLs:  57%|█████▋    | 569/1000 [20:22<11:44,  1.63s/it]

Error extracting text from https://www.dni.gov/index.php/newsroom/press-releases/press-releases-2021/item/2218-odni-statement-on-covid-19-origins: 404 Client Error: Not Found for url: https://www.dni.gov/index.php/newsroom/press-releases/press-releases-2021/item/2218-odni-statement-on-covid-19-origins


Processing URLs:  57%|█████▋    | 570/1000 [21:22<2:17:17, 19.16s/it]



Processing URLs:  58%|█████▊    | 577/1000 [21:51<24:54,  3.53s/it]  

Error extracting text from https://abcnews.go.com/International/wireStory/haiti-faces-fresh-instability-pm-scrutiny-80039222: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/haiti-faces-fresh-instability-pm-scrutiny-80039222


Processing URLs:  58%|█████▊    | 581/1000 [21:56<12:01,  1.72s/it]

Error extracting text from http://www.govtech.com/policy/Will-Changes-to-Californias-Driverless-Vehicle-Bill-Ease-Driver-Data-Protections.html: 403 Client Error: Forbidden for url: https://www.govtech.com/policy/Will-Changes-to-Californias-Driverless-Vehicle-Bill-Ease-Driver-Data-Protections.html


Processing URLs:  58%|█████▊    | 582/1000 [21:57<09:47,  1.41s/it]

URL filtered: https://twitter.com/RichardBSpencer


Processing URLs:  58%|█████▊    | 584/1000 [21:58<07:38,  1.10s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-02-27/trump-tells-large-health-insurers-to-expect-something-special


Processing URLs:  59%|█████▊    | 586/1000 [21:59<05:55,  1.16it/s]

Error extracting text from http://www.mars-one.com/faq/mission-to-mars/how-long-does-it-take-to-travel-to-mars: 404 Client Error: Not Found for url: http://www.mars-one.com/faq/mission-to-mars/how-long-does-it-take-to-travel-to-mars


Processing URLs:  59%|█████▊    | 587/1000 [22:02<08:04,  1.17s/it]

Error extracting text from http://infosoc.iis.ru/content/2014/201402_authors_eng.html: 404 Client Error: Not Found for url: http://infosoc.iis.ru/content/2014/201402_authors_eng.html


Processing URLs:  59%|█████▉    | 590/1000 [22:35<46:12,  6.76s/it]  

Error extracting text from http://uk.reuters.com/article/uk-britain-eu-scotland-may-referendum-idUKKBN16K2QG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  60%|█████▉    | 595/1000 [22:41<13:05,  1.94s/it]

Error extracting text from http://news.yahoo.com/peru-opens-vote-buying-probe-against-fujimori-015805306.html: 404 Client Error: Not Found for url: http://news.yahoo.com/peru-opens-vote-buying-probe-against-fujimori-015805306.html


Processing URLs:  60%|█████▉    | 598/1000 [22:52<16:10,  2.41s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-12-18/imf-approves-3-9-billion-in-aid-for-ukraine-as-debt-pinch-looms
Error extracting text from http://www.nytimes.com/2016/01/22/world/americas/zika-virus-may-be-linked-to-surge-in-rare-syndrome-in-brazil.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/22/world/americas/zika-virus-may-be-linked-to-surge-in-rare-syndrome-in-brazil.html


Processing URLs:  60%|██████    | 605/1000 [23:09<21:18,  3.24s/it]

Error extracting text from http://news.trust.org/item/20160314110302-vc71s: 404 Client Error:  for url: https://news.trust.org:443/item/20160314110302-vc71s


Processing URLs:  61%|██████    | 607/1000 [23:14<16:28,  2.51s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/286384-trump-sources-say-clinton-wont-be-charged: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/286384-trump-sources-say-clinton-wont-be-charged/


Processing URLs:  61%|██████▏   | 613/1000 [23:30<13:37,  2.11s/it]

Error extracting text from http://www.nytimes.com/2015/10/10/us/politics/a-biden-run-would-expose-foreign-policy-differences-with-hillary-clinton.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/10/us/politics/a-biden-run-would-expose-foreign-policy-differences-with-hillary-clinton.html?_r=0
URL filtered: https://www.bloomberg.com/news/articles/2017-03-07/oil-drops-below-53-as-report-shows-rising-u-s-crude-stockpiles


Processing URLs:  62%|██████▏   | 619/1000 [23:48<20:10,  3.18s/it]

Error extracting text from https://www.scientificamerican.com/article/finding-conclusive-animal-origins-of-the-coronavirus-will-take-time/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/finding-conclusive-animal-origins-of-the-coronavirus-will-take-time/


Processing URLs:  62%|██████▏   | 621/1000 [23:51<14:02,  2.22s/it]

Error extracting text from http://www.wsj.com/articles/germany-turkey-to-intensify-illegal-immigration-fight-1453482293: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germany-turkey-to-intensify-illegal-immigration-fight-1453482293


Processing URLs:  62%|██████▏   | 622/1000 [23:51<10:36,  1.68s/it]

Error extracting text from https://www.nytimes.com/2021/03/03/us/politics/house-voting-rights-bill.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/03/us/politics/house-voting-rights-bill.html


Processing URLs:  62%|██████▏   | 624/1000 [23:53<07:35,  1.21s/it]

Error extracting text from http://www.globalfiredata.org/regional: 404 Client Error: Not Found for url: http://www.globalfiredata.org/regional
URL filtered: https://www.youtube.com/results?search_query=Mayfair+Set


Processing URLs:  63%|██████▎   | 626/1000 [23:53<04:25,  1.41it/s]

Error extracting text from http://emarketalerts.forecast1.com/mic/eabstract.cfm?recno=237515: HTTPConnectionPool(host='emarketalerts.forecast1.com', port=80): Max retries exceeded with url: /mic/eabstract.cfm?recno=237515 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30082ea50>: Failed to resolve 'emarketalerts.forecast1.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  63%|██████▎   | 629/1000 [23:57<06:55,  1.12s/it]

Error extracting text from http://www.ibtimes.co.uk/chinas-xi-jinping-open-dutertes-proposal-turning-panatag-shoal-into-marine-sanctuary-1592580: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/chinas-xi-jinping-open-dutertes-proposal-turning-panatag-shoal-into-marine-sanctuary-1592580


Processing URLs:  63%|██████▎   | 631/1000 [23:59<05:43,  1.07it/s]

Error extracting text from http://news.yahoo.com/many-months-start-battle-mosul-coalition-190540445.html: 404 Client Error: Not Found for url: http://news.yahoo.com/many-months-start-battle-mosul-coalition-190540445.html


Processing URLs:  63%|██████▎   | 632/1000 [24:01<06:47,  1.11s/it]

Error extracting text from http://www.responsemagazine.com/direct-response-marketing/news/lawmakers-foreign-business-leaders-urge-fcc-halt-net-10383: HTTPConnectionPool(host='www.responsemagazine.com', port=80): Max retries exceeded with url: /direct-response-marketing/news/lawmakers-foreign-business-leaders-urge-fcc-halt-net-10383 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30364e690>: Failed to resolve 'www.responsemagazine.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  64%|██████▎   | 635/1000 [24:04<06:22,  1.05s/it]

Error extracting text from https://www.fda.gov/vaccines-blood-biologics/vaccines/emergency-use-authorization-vaccines-: 404 Client Error: Not Found for url: https://www.fda.gov/vaccines-blood-biologics/vaccines/emergency-use-authorization-vaccines-


Processing URLs:  64%|██████▎   | 636/1000 [24:05<06:35,  1.09s/it]

Error extracting text from http://www.cnbc.com/2017/07/22/congress-reaches-bipartisan-agreement-on-sweeping-russia-sanctions.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/07/22/congress-reaches-bipartisan-agreement-on-sweeping-russia-sanctions.html


Processing URLs:  64%|██████▎   | 637/1000 [24:07<07:31,  1.24s/it]

Error extracting text from http://www.ibtimes.com/adnan-syed-update-will-murder-conviction-be-overturned-new-evidence-2068533: 403 Client Error: Forbidden for url: https://www.ibtimes.com/adnan-syed-update-will-murder-conviction-be-overturned-new-evidence-2068533


Processing URLs:  64%|██████▍   | 639/1000 [24:17<16:50,  2.80s/it]

Error extracting text from http://www.reuters.com/article/us-usa-cyber-russia-idUSKCN12729B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-idUSKCN12729B


Processing URLs:  64%|██████▍   | 640/1000 [24:19<14:22,  2.40s/it]

Error extracting text from http://tiny.cc/j0anmy: 403 Client Error: Forbidden for url: https://tiny.cc/j0anmy


Processing URLs:  65%|██████▍   | 646/1000 [25:26<25:21,  4.30s/it]  

Error extracting text from https://news.abs-cbn.com/overseas/03/28/21/china-announces-further-military-exercises-in-disputed-south-china-sea: 403 Client Error: Forbidden for url: https://news.abs-cbn.com/overseas/03/28/21/china-announces-further-military-exercises-in-disputed-south-china-sea


Processing URLs:  65%|██████▍   | 648/1000 [25:30<18:22,  3.13s/it]

Error extracting text from https://supchina.com/2021/08/16/calls-intensify-for-a-boycott-of-beijing-2022/: 403 Client Error: Forbidden for url: https://thechinaproject.com/2021/08/16/calls-intensify-for-a-boycott-of-beijing-2022/


Processing URLs:  65%|██████▍   | 649/1000 [25:32<17:30,  2.99s/it]

URL filtered: https://www.investopedia.com/news/why-apple-could-reach-1-trillion-next-year/?utm_campaign=quote-bloomberg&amp;utm_source=bloomberg&amp;utm_medium=referral&amp;utm_term=fb-capture&amp;utm_content=/#ec%7Crss-bloomberg


Processing URLs:  65%|██████▌   | 652/1000 [25:37<12:18,  2.12s/it]

Error extracting text from http://www.forbesmiddleeast.com/en/news/read/2017/saudi-aramco-behind-the-world-s/articleid/11277#: 404 Client Error: Not Found for url: https://www.forbesmiddleeast.com/news/read/2017/saudi-aramco-behind-the-world-s/articleid/11277


Processing URLs:  66%|██████▌   | 656/1000 [26:13<58:05, 10.13s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/18864-union-parliament-sets-march-17-as-deadline-for-presidential-nominations.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/18864-union-parliament-sets-march-17-as-deadline-for-presidential-nominations.html


Processing URLs:  66%|██████▌   | 657/1000 [26:14<43:56,  7.69s/it]

Error extracting text from http://economictimes.indiatimes.com/news/defence/un-security-council-to-hold-emergency-meeting-on-north-korea/articleshow/59449775.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/defence/un-security-council-to-hold-emergency-meeting-on-north-korea/articleshow/59449775.cms


Processing URLs:  66%|██████▌   | 659/1000 [26:15<23:00,  4.05s/it]

Error extracting text from https://www.nytimes.com/2017/01/28/us/refugees-detained-at-us-airports-prompting-legal-challenges-to-trumps-immigration-order.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/28/us/refugees-detained-at-us-airports-prompting-legal-challenges-to-trumps-immigration-order.html


Processing URLs:  66%|██████▌   | 662/1000 [26:21<16:36,  2.95s/it]

Error extracting text from http://www.suffolk.edu/documents/SUPRC/2008-1-8-democrattables.pdf: 404 Client Error: Not Found for url: https://www.suffolk.edu/documents/SUPRC/2008-1-8-democrattables.pdf


Processing URLs:  67%|██████▋   | 671/1000 [26:34<06:49,  1.24s/it]

Error extracting text from https://www.reuters.com/article/us-usa-stocks/wall-street-hits-high-score-as-videogame-makers-rally-idUSKBN1D81M8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks/wall-street-hits-high-score-as-videogame-makers-rally-idUSKBN1D81M8


Processing URLs:  67%|██████▋   | 673/1000 [26:36<05:49,  1.07s/it]

Error extracting text from http://www.crossmap.com/news/after-massacre-in-benue-nigeria-muslim-herdsmen-remain-in-villages-sources-say-26033: 404 Client Error: Not Found for url: https://www.crossmap.com/news/after-massacre-in-benue-nigeria-muslim-herdsmen-remain-in-villages-sources-say-26033
Error extracting text from https://www.business-standard.com/article/defence/defence-minister-rajnath-singh-speaks-to-us-on-china-s-lac-intrusion-120053000028_1.html/: 403 Client Error: Forbidden for url: https://www.business-standard.com/article/defence/defence-minister-rajnath-singh-speaks-to-us-on-china-s-lac-intrusion-120053000028_1.html/


Processing URLs:  67%|██████▋   | 674/1000 [26:37<06:07,  1.13s/it]

Error extracting text from http://mobile.nytimes.com/2015/09/16/business/energy-environment/vote-near-to-repeal-ban-on-oil-exports-house-leader-says.html?referrer=: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/09/16/business/energy-environment/vote-near-to-repeal-ban-on-oil-exports-house-leader-says.html?referrer=


Processing URLs:  68%|██████▊   | 675/1000 [26:38<06:00,  1.11s/it]

Error extracting text from https://www.reuters.com/article/us-afghanistan-iran-disaster-idUSKBN2AD0B8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-iran-disaster-idUSKBN2AD0B8


Processing URLs:  68%|██████▊   | 677/1000 [26:39<04:20,  1.24it/s]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/10/30/0200000000AEN20151030005500320.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  68%|██████▊   | 683/1000 [26:47<06:10,  1.17s/it]

Error extracting text from https://www.wsj.com/articles/democrats-minimum-wage-setback-could-kick-start-talks-with-republicans-11615057218: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/democrats-minimum-wage-setback-could-kick-start-talks-with-republicans-11615057218


Processing URLs:  69%|██████▊   | 686/1000 [26:52<08:16,  1.58s/it]

Error extracting text from http://timesofindia.indiatimes.com/business/international-business/ivory-coast-rain-sun-boost-prospects-for-cocoa-main-crop/articleshow/60262706.cms: 410 Client Error: Gone for url: https://timesofindia.indiatimes.com/business/international-business/ivory-coast-rain-sun-boost-prospects-for-cocoa-main-crop/articleshow/60262706.cms


Processing URLs:  69%|██████▊   | 687/1000 [26:53<06:22,  1.22s/it]

Error extracting text from http://reneweconomy.com.au/2016/electric-vehicle-numbers-hit-1-3m-as-costs-predicted-to-beat-petrol-cars-37517: 403 Client Error: Forbidden for url: http://reneweconomy.com.au/2016/electric-vehicle-numbers-hit-1-3m-as-costs-predicted-to-beat-petrol-cars-37517


Processing URLs:  69%|██████▉   | 690/1000 [26:56<07:07,  1.38s/it]

Error extracting text from https://www.enca.com/technology/top-cybercrime-ring-disrupted-authorities-raid-moscow-offices-sources: 404 Client Error: Not Found for url: https://www.enca.com/technology/top-cybercrime-ring-disrupted-authorities-raid-moscow-offices-sources
URL filtered: https://www.bloomberg.com/news/articles/2016-12-15/rupee-slides-with-indian-bonds-as-fed-hike-raises-outflow-risk


Processing URLs:  69%|██████▉   | 692/1000 [26:57<04:58,  1.03it/s]

Error extracting text from https://www.thecipherbrief.com/article/europe/western-balkans-russias-crosshairs-1091?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=9e2f0dbf76-EMAIL_CAMPAIGN_2017_05_24&amp;utm_medium=email&amp;utm_term=0_02cbee778d-9e2f0dbf76-122492589: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/europe/western-balkans-russias-crosshairs-1091?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=9e2f0dbf76-EMAIL_CAMPAIGN_2017_05_24&amp;utm_medium=email&amp;utm_term=0_02cbee778d-9e2f0dbf76-122492589


Processing URLs:  69%|██████▉   | 693/1000 [26:59<05:28,  1.07s/it]

URL filtered: https://www.youtube.com/watch?v=sytrrdOPYzA&feature=youtu.be&t=19m10s


Processing URLs:  70%|██████▉   | 698/1000 [27:04<05:29,  1.09s/it]

Error extracting text from http://www.tradingeconomics.com/united-states/bankruptcies: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/united-states/bankruptcies


Processing URLs:  70%|██████▉   | 699/1000 [27:05<05:31,  1.10s/it]

Error extracting text from http://supchina.com/2017/09/25/live-fire-drill-djibouti-chinas-latest-political-current-affairs-news/: 403 Client Error: Forbidden for url: http://thechinaproject.com/2017/09/25/live-fire-drill-djibouti-chinas-latest-political-current-affairs-news/


Processing URLs:  71%|███████   | 709/1000 [27:26<06:53,  1.42s/it]

Error extracting text from http://www.wsj.com/articles/hilsenrath-analysis-jobs-report-lowers-chances-of-fed-rate-hike-this-month-1443790670: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/hilsenrath-analysis-jobs-report-lowers-chances-of-fed-rate-hike-this-month-1443790670


Processing URLs:  71%|███████▏  | 714/1000 [27:34<06:48,  1.43s/it]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/309104: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/309104


Processing URLs:  72%|███████▏  | 715/1000 [27:35<05:59,  1.26s/it]

Error extracting text from http://uk.reuters.com/article/us-turkey-security-idUKKCN0ZX07S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  72%|███████▏  | 716/1000 [27:37<06:25,  1.36s/it]

Error extracting text from http://www.reuters.com/article/us-asean-summit-idUSKBN17V035: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-summit-idUSKBN17V035


Processing URLs:  72%|███████▏  | 719/1000 [27:39<04:46,  1.02s/it]

URL filtered: https://www.youtube.com/watch?v=Av7kpPJqa


Processing URLs:  72%|███████▏  | 721/1000 [27:40<03:18,  1.41it/s]

Error extracting text from http://thehill.com/policy/defense/266899-pentagon-may-recommend-moving-us-forces-closer-to-battlefield-for-mosul: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/266899-pentagon-may-recommend-moving-us-forces-closer-to-battlefield-for-mosul/


Processing URLs:  72%|███████▏  | 723/1000 [27:45<05:57,  1.29s/it]

Error extracting text from http://www.nytimes.com/reuters/2016/03/15/world/americas/15reuters-argentina-defense-china: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/03/15/world/americas/15reuters-argentina-defense-china


Processing URLs:  73%|███████▎  | 726/1000 [27:48<04:41,  1.03s/it]

Error extracting text from https://www.uber.com/?exp=home_signup_form: 406 Client Error: Not Acceptable for url: https://www.uber.com/?exp=home_signup_form


Processing URLs:  73%|███████▎  | 727/1000 [27:50<06:16,  1.38s/it]

URL filtered: https://twitter.com/lrozen/status/688497254865457152


Processing URLs:  73%|███████▎  | 730/1000 [28:01<11:24,  2.54s/it]

Error extracting text from http://www.saudigazette.com.sa/index.cfm?method=home.regcon&amp;contentid=20150908255860: HTTPConnectionPool(host='www.saudigazette.com.sa', port=80): Max retries exceeded with url: /index.cfm?method=home.regcon&amp;contentid=20150908255860 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30804b1a0>: Failed to resolve 'www.saudigazette.com.sa' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/video/2016/10/15/iraq-prepares-for-mosul-push?videoId=370149988&amp;mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/video/2016/10/15/iraq-prepares-for-mosul-push?videoId=370149988&amp;mod=related&amp;channelName=worldNews


Processing URLs:  73%|███████▎  | 731/1000 [28:02<09:29,  2.12s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/anti-military-protests-myanmar-anniversary-1988-uprising-2021-08-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/anti-military-protests-myanmar-anniversary-1988-uprising-2021-08-08/


Processing URLs:  73%|███████▎  | 734/1000 [28:05<06:30,  1.47s/it]

Error extracting text from http://www.reuters.com/article/us-usa-turkey-military-idUSKBN1A42RR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-turkey-military-idUSKBN1A42RR


Processing URLs:  74%|███████▎  | 736/1000 [28:07<05:11,  1.18s/it]

Error extracting text from http://www.medialifemagazine.com/magazines-embrace-digital-print-struggles/: HTTPConnectionPool(host='www.medialifemagazine.com', port=80): Max retries exceeded with url: /magazines-embrace-digital-print-struggles/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff1cd040>: Failed to resolve 'www.medialifemagazine.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  74%|███████▍  | 742/1000 [28:12<03:32,  1.21it/s]

Error extracting text from http://www.todayonline.com/mideast/iran-hands-over-stockpile-enriched-uranium-russia: 403 Client Error: Forbidden for url: https://www.todayonline.com/mideast/iran-hands-over-stockpile-enriched-uranium-russia


Processing URLs:  74%|███████▍  | 743/1000 [28:13<04:07,  1.04it/s]

Error extracting text from http://oruoracle.com/editorial/the-truth-about-fake-news/: 404 Client Error: Not Found for url: https://oruoracle.com/editorial/the-truth-about-fake-news/


Processing URLs:  75%|███████▍  | 749/1000 [28:28<08:07,  1.94s/it]

Error extracting text from https://www.dailystar.com.lb/News/Middle-East/2016/Jul-22/363341-anti-daesh-coalition-meets-with-mosul-in-its-sights.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Jul-22/363341-anti-daesh-coalition-meets-with-mosul-in-its-sights.ashx


Processing URLs:  75%|███████▌  | 754/1000 [28:33<04:33,  1.11s/it]

Error extracting text from https://www.nytimes.com/2021/02/13/us/politics/robert-malley-iran-middle-east.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/13/us/politics/robert-malley-iran-middle-east.html


Processing URLs:  76%|███████▌  | 755/1000 [28:34<04:13,  1.04s/it]

Error extracting text from http://www.uscg.mil/history/articles/USCG_Mariel_History_1980.asp: 403 Client Error: Forbidden for url: http://www.uscg.mil/history/articles/USCG_Mariel_History_1980.asp


Processing URLs:  76%|███████▌  | 757/1000 [28:48<14:35,  3.60s/it]

Error extracting text from https://presidencia.gob.do/noticias/poder-ejecutivo-introduce-cambios-en-estamentos-militares: 404 Client Error: Not Found for url: https://presidencia.gob.do/noticias/poder-ejecutivo-introduce-cambios-en-estamentos-militares


Processing URLs:  76%|███████▋  | 763/1000 [29:08<11:29,  2.91s/it]

Error extracting text from https://publishingperspectives.com/2021/04/us-first-quarter-2021-print-book-sales-grew-29-percent-covid19/: 403 Client Error: Forbidden for url: https://publishingperspectives.com/2021/04/us-first-quarter-2021-print-book-sales-grew-29-percent-covid19/


Processing URLs:  76%|███████▋  | 765/1000 [30:10<1:14:35, 19.04s/it]

Error extracting text from http://aa.com.tr/en/politics/myanmar-hundreds-protest-attack-on-anti-drug-group/527626: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  77%|███████▋  | 767/1000 [30:16<42:27, 10.94s/it]

URL filtered: http://www.itpro.co.uk/networking/27554/facebook-reveals-plan-to-tackle-fake-news


Processing URLs:  78%|███████▊  | 781/1000 [30:31<02:00,  1.81it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics/senior-german-conservatives-urge-shift-to-right-as-merkel-picks-cabinet-idUSKCN1G80MW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/senior-german-conservatives-urge-shift-to-right-as-merkel-picks-cabinet-idUSKCN1G80MW
URL filtered: https://www.youtube.com/watch?v=h4C5SgeVK-Q
URL filtered: https://www.bloomberg.com/view/articles/2017-03-20/u-k-investors-have-too-much-faith-in-their-government
Error extracting text from https://panampost.com/sabrina-martin/2016/03/15/keiko-fujimori-leads-perus-presidential-race-after-ban-on-candidates/: 403 Client Error: Forbidden for url: https://panampost.com/sabrina-martin/2016/03/15/keiko-fujimori-leads-perus-presidential-race-after-ban-on-candidates/


Processing URLs:  78%|███████▊  | 782/1000 [30:32<01:46,  2.04it/s]

Error extracting text from https://www.nytimes.com/2017/07/13/business/net-neutrality-broadband-companies-fcc.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/13/business/net-neutrality-broadband-companies-fcc.html


Processing URLs:  78%|███████▊  | 784/1000 [30:36<05:02,  1.40s/it]

Error extracting text from http://www.undp.org/content/undp/en/home/presscenter/speeches/2017/09/05/achim-steiner-undp-administrator-statement-to-the-2nd-regular-session-of-the-undp-executive-board.html: 404 Client Error: Not Found for url: https://www.undp.org/content/undp/en/home/presscenter/speeches/2017/09/05/achim-steiner-undp-administrator-statement-to-the-2nd-regular-session-of-the-undp-executive-board


Processing URLs:  78%|███████▊  | 785/1000 [30:37<04:30,  1.26s/it]

URL filtered: https://www.youtube.com/watch?v=Gt3ouQKFN9Q
URL filtered: https://www.youtube.com/watch?v=ahQthnTmlc0


Processing URLs:  79%|███████▉  | 789/1000 [30:38<02:13,  1.58it/s]

Error extracting text from http://nationalinterest.org/feature/chinas-east-china-sea-adiz-gamble-past-present-south-china-13150: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/chinas-east-china-sea-adiz-gamble-past-present-south-china-13150
Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0XP05H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0XP05H


Processing URLs:  79%|███████▉  | 790/1000 [30:39<02:10,  1.61it/s]

Error extracting text from http://evobsession.com/ev-battery-manufacturer-sales-market-share-march-2015/: 403 Client Error: Forbidden for url: http://evobsession.com/ev-battery-manufacturer-sales-market-share-march-2015/


Processing URLs:  79%|███████▉  | 791/1000 [30:39<02:05,  1.66it/s]

Error extracting text from http://thehill.com/regulation/court-battles/290568-obama-allies-gop-will-cave-in-supreme-court-fight: 403 Client Error: Forbidden for url: https://thehill.com/regulation/court-battles/290568-obama-allies-gop-will-cave-in-supreme-court-fight/


Processing URLs:  79%|███████▉  | 794/1000 [30:43<02:56,  1.17it/s]

Error extracting text from http://thehill.com/blogs/pundits-blog/international-affairs/337087-opinion-its-not-comey-versus-trump-its-putin-versus: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/international-affairs/337087-opinion-its-not-comey-versus-trump-its-putin-versus/


Processing URLs:  80%|███████▉  | 795/1000 [30:43<02:25,  1.41it/s]

Error extracting text from https://friscofastball.com/2016/12/01/live-stock-coverage-quad-graphics-inc-on-focus-after-crashing-in-todays-session/: 404 Client Error: Not Found for url: https://friscofastball.com/2016/12/01/live-stock-coverage-quad-graphics-inc-on-focus-after-crashing-in-todays-session/


Processing URLs:  80%|███████▉  | 796/1000 [30:44<02:59,  1.14it/s]

Error extracting text from https://warisboring.com/the-united-states-is-getting-more-and-more-irritated-at-russias-nuke-treaty-violation-4feab0fa631e#.dndmm6pkt: 403 Client Error: Forbidden for url: https://warisboring.com/the-united-states-is-getting-more-and-more-irritated-at-russias-nuke-treaty-violation-4feab0fa631e#.dndmm6pkt


Processing URLs:  80%|███████▉  | 798/1000 [30:46<02:25,  1.39it/s]

Error extracting text from https://thehill.com/homenews/administration/597751-north-korea-tested-icbm-in-february-march-marking-serious-escalation: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/597751-north-korea-tested-icbm-in-february-march-marking-serious-escalation/


Processing URLs:  80%|████████  | 802/1000 [30:53<05:17,  1.61s/it]

Error extracting text from https://webbtelescope.org/contents/media/images/4180-Image: 403 Client Error: Forbidden for url: https://webbtelescope.org/contents/media/images/4180-Image


Processing URLs:  80%|████████  | 804/1000 [30:55<03:41,  1.13s/it]

Error extracting text from https://www.nytimes.com/2017/11/09/world/asia/afghanistan-war-troops.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/09/world/asia/afghanistan-war-troops.html


Processing URLs:  81%|████████  | 806/1000 [30:56<02:18,  1.40it/s]

Error extracting text from https://www.nytimes.com/2017/06/27/us/politics/trump-campaign-chiefs-firm-got-17-million-from-pro-russia-party.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/27/us/politics/trump-campaign-chiefs-firm-got-17-million-from-pro-russia-party.html


Processing URLs:  81%|████████  | 809/1000 [30:58<02:17,  1.39it/s]

Error extracting text from http://www.reuters.com/article/us-trade-europe-usa-idUSKCN0WC0GU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-europe-usa-idUSKCN0WC0GU


Processing URLs:  81%|████████▏ | 814/1000 [31:06<04:01,  1.30s/it]

Error extracting text from http://www.nytimes.com/2015/11/16/world/middleeast/tensions-in-iran-after-nuclear-deal-grow-in-hostility.html?partner=msft_msn: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/16/world/middleeast/tensions-in-iran-after-nuclear-deal-grow-in-hostility.html?partner=msft_msn


Processing URLs:  82%|████████▏ | 818/1000 [31:21<09:33,  3.15s/it]

Error extracting text from https://www.pnas.org/content/115/25/6506: 403 Client Error: Forbidden for url: https://www.pnas.org/content/115/25/6506


Processing URLs:  82%|████████▏ | 824/1000 [31:26<02:37,  1.12it/s]

Error extracting text from http://www.parliament.uk/documents/post/postpn389_cyber-security-in-the-uk.pdf: 403 Client Error: Forbidden for url: http://www.parliament.uk/documents/post/postpn389_cyber-security-in-the-uk.pdf
Error extracting text from https://www.wsj.com/articles/germanys-spd-to-decide-sunday-whether-to-form-merkel-alliance-1516293558: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/germanys-spd-to-decide-sunday-whether-to-form-merkel-alliance-1516293558


Processing URLs:  83%|████████▎ | 826/1000 [31:29<03:10,  1.10s/it]

Error extracting text from https://seekingalpha.com/article/4089910-ishares-j-p-morgan-usd-emerging-markets-bond-etf-questionable-holdings: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4089910-ishares-j-p-morgan-usd-emerging-markets-bond-etf-questionable-holdings


Processing URLs:  83%|████████▎ | 827/1000 [31:31<03:38,  1.26s/it]

Error extracting text from http://www.strategic-culture.org/news/2016/12/02/official-washington-info-wars.html: HTTPConnectionPool(host='www.strategic-culture.org', port=80): Max retries exceeded with url: /news/2016/12/02/official-washington-info-wars.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe74e990>: Failed to resolve 'www.strategic-culture.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  83%|████████▎ | 829/1000 [31:34<03:48,  1.34s/it]

Error extracting text from http://www.defensenews.com/story/defense/policy-budget/cyber/2015/11/23/mccain-obama-sanction-chinese-hackers/76267982/: 404 Client Error: Not Found for url: https://www.defensenews.com/home/2015/11/23/mccain-to-obama-sanction-chinese-hackers/


Processing URLs:  84%|████████▍ | 839/1000 [31:49<03:07,  1.17s/it]

Error extracting text from https://www.nytimes.com/2018/02/01/opinion/chief-justice-roberts-middle.html?action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=opinion-c-col-left-regi: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/01/opinion/chief-justice-roberts-middle.html?action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=opinion-c-col-left-regi


Processing URLs:  84%|████████▍ | 842/1000 [31:56<05:03,  1.92s/it]

Error extracting text from http://ca.gov/drought/topstory/top-story-53.html: 404 Client Error: Not Found for url: https://www.ca.gov/drought/topstory/top-story-53.html


Processing URLs:  84%|████████▍ | 844/1000 [31:59<04:38,  1.78s/it]

Error extracting text from http://www.peruthisweek.com/news-organization-states-monitor-elections-109032: HTTPConnectionPool(host='www.peruthisweek.com', port=80): Max retries exceeded with url: /news-organization-states-monitor-elections-109032 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x305b21010>: Failed to resolve 'www.peruthisweek.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  85%|████████▍ | 846/1000 [32:02<04:03,  1.58s/it]

Error extracting text from http://analystmarket.com/archives/2249: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=analystmarket.com


Processing URLs:  85%|████████▌ | 852/1000 [32:19<04:36,  1.87s/it]

Error extracting text from http://leadercall.com/2016/01/seoul-resumes-propaganda-broadcasting-on-border-with-north/: 404 Client Error: Not Found for url: http://leadercall.com/2016/01/seoul-resumes-propaganda-broadcasting-on-border-with-north/


Processing URLs:  85%|████████▌ | 853/1000 [32:21<04:52,  1.99s/it]

Error extracting text from https://www.cfr.org/global-conflicttracker/conflict/territorial-disputes-south-china-sea: 404 Client Error: Not Found for url: https://www.cfr.org/global-conflict%02tracker/conflict/territorial-disputes-south-china-sea


Processing URLs:  86%|████████▌ | 856/1000 [32:26<03:47,  1.58s/it]

Error extracting text from http://www.reuters.com/article/2015/11/10/us-iran-nuclear-deal-idUSKCN0SZ1Z720151110#trJFWrlfSJdfP2pA.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/10/us-iran-nuclear-deal-idUSKCN0SZ1Z720151110#trJFWrlfSJdfP2pA.97


Processing URLs:  86%|████████▌ | 859/1000 [32:27<01:36,  1.46it/s]

Error extracting text from https://www.scientificamerican.com/article/rumsfelds-wisdom/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/rumsfelds-wisdom/
Error extracting text from http://blogs.wsj.com/washwire/2016/02/21/how-irans-political-dynamics-conflict-with-hype-over-elections-influence/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2016/02/21/how-irans-political-dynamics-conflict-with-hype-over-elections-influence/
Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0X708N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0X708N


Processing URLs:  86%|████████▌ | 860/1000 [32:30<02:59,  1.28s/it]

Error extracting text from http://insa-meinungstrend.de/de/sonntagsfrage.php: 404 Client Error: Not Found for url: https://www.insa-consulere.de/meinungstrend/de/sonntagsfrage.php


Processing URLs:  86%|████████▌ | 861/1000 [32:31<02:56,  1.27s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-01-28/merkel-s-support-in-home-district-softens-ahead-of-election-bid


Processing URLs:  87%|████████▋ | 868/1000 [32:39<01:51,  1.18it/s]

Error extracting text from http://www.reuters.com/article/2015/09/14/us-northkorea-missile-idUSKCN0RE1NJ20150914: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/14/us-northkorea-missile-idUSKCN0RE1NJ20150914


Processing URLs:  87%|████████▋ | 869/1000 [32:42<03:21,  1.54s/it]

Error extracting text from http://www.start.umd.edu/search/content/al-Shabaab%20Africa: 404 Client Error: Not Found for url: https://www.start.umd.edu/search/content/al-Shabaab%20Africa


Processing URLs:  87%|████████▋ | 873/1000 [32:46<02:35,  1.23s/it]

Error extracting text from https://www.dsc.gov.ae/en-us/Themes/Pages/Tourism.aspx?Theme=30&amp;year=2018#DSC_Tab1: HTTPSConnectionPool(host='www.dsc.gov.ae', port=443): Max retries exceeded with url: /en-us/Themes/Pages/Tourism.aspx?Theme=30&amp;year=2018 (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  88%|████████▊ | 877/1000 [33:02<07:08,  3.49s/it]

Error extracting text from https://shar.es/1Yby8V: 404 Client Error: Not Found for url: https://shar.es/1Yby8V/


Processing URLs:  88%|████████▊ | 880/1000 [33:07<03:55,  1.96s/it]

Error extracting text from http://www.sandiegouniontribune.com/news/2016/jul/15/scott-swift-pacific-fleet-south-china-sea-amphibs/: 403 Client Error: Forbidden for url: https://www.sandiegouniontribune.com/news/2016/jul/15/scott-swift-pacific-fleet-south-china-sea-amphibs/
Error extracting text from http://www.reuters.com/article/us-oil-meeting-kemp-idUSKCN0XF2AR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-oil-meeting-kemp-idUSKCN0XF2AR


Processing URLs:  88%|████████▊ | 883/1000 [33:09<01:59,  1.02s/it]

Error extracting text from http://www.barrons.com/articles/venezuela-bondholders-position-for-regime-change-default-1493142548: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-bondholders-position-for-regime-change-default-1493142548


Processing URLs:  89%|████████▉ | 889/1000 [33:18<02:12,  1.19s/it]

Error extracting text from http://www.politicususa.com/2016/07/06/gop-close-dumping-trump-coup-plotters-votes-nomination-chaos.html: 403 Client Error: Forbidden for url: http://www.politicususa.com/2016/07/06/gop-close-dumping-trump-coup-plotters-votes-nomination-chaos.html


Processing URLs:  89%|████████▉ | 891/1000 [33:23<03:18,  1.82s/it]

Error extracting text from http://cphpost.dk/news/danish-foreign-minister-to-visit-iran.html: 404 Client Error: Not Found for url: http://cphpost.dk/news/danish-foreign-minister-to-visit-iran.html


Processing URLs:  89%|████████▉ | 892/1000 [33:26<03:55,  2.18s/it]

Error extracting text from https://in.reuters.com/article/us-usa-trade-china/biden-says-will-not-kill-phase-1-trade-deal-with-china-immediately-nyt-idUSKBN28C0HV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  89%|████████▉ | 893/1000 [33:27<03:21,  1.89s/it]

Error extracting text from http://www.ibtimes.co.uk/mosul-front-isis-infiltrators-sneak-behind-iraqi-lines-civilians-flee-jihadi-stronghold-1588803: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/mosul-front-isis-infiltrators-sneak-behind-iraqi-lines-civilians-flee-jihadi-stronghold-1588803


Processing URLs:  90%|████████▉ | 895/1000 [34:32<35:30, 20.29s/it]

Error extracting text from https://www.blm.gov/style/medialib/blm/nm/programs/planning/planning_docs.Par.53208.File.dat/A_Citizens_Guide_to_NEPA.pdf: 404 Client Error: Not Found for url: https://www.blm.gov/style/medialib/blm/nm/programs/planning/planning_docs.Par.53208.File.dat/A_Citizens_Guide_to_NEPA.pdf


Processing URLs:  90%|████████▉ | 896/1000 [34:32<24:43, 14.27s/it]

Error extracting text from https://www.nytimes.com/2021/01/08/world/biden-vaccine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/01/08/world/biden-vaccine.html


Processing URLs:  90%|████████▉ | 899/1000 [34:35<09:02,  5.37s/it]

Error extracting text from http://www.nytimes.com/2016/02/03/world/europe/david-cameron-european-union-britain-brexit.html?emc=edit_ee_20160203&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/03/world/europe/david-cameron-european-union-britain-brexit.html?emc=edit_ee_20160203&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0


Processing URLs:  90%|█████████ | 904/1000 [34:42<03:32,  2.21s/it]

Error extracting text from http://worldmaritimenews.com/archives/184842/expanded-panama-canals-systems-put-through-their-paces/: HTTPConnectionPool(host='worldmaritimenews.com', port=80): Max retries exceeded with url: /archives/184842/expanded-panama-canals-systems-put-through-their-paces/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feaf3aa0>: Failed to resolve 'worldmaritimenews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  91%|█████████ | 906/1000 [34:43<02:21,  1.51s/it]

URL filtered: https://www.youtube.com/watch?v=mjnSw-ZYCig


Processing URLs:  91%|█████████ | 908/1000 [34:45<01:54,  1.25s/it]

Error extracting text from http://www.parliament.scot/gd/visitandlearn/Education/16285.aspx: 403 Client Error: Forbidden for url: https://www.parliament.scot/gd/visitandlearn/Education/16285.aspx


Processing URLs:  91%|█████████ | 911/1000 [34:47<01:19,  1.12it/s]

URL filtered: https://twitter.com/ulalaunch/status/1400800541198934017
Error extracting text from http://www.reuters.com/article/us-peru-election-idUSKCN0YS0KW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-idUSKCN0YS0KW


Processing URLs:  91%|█████████▏| 914/1000 [34:54<02:07,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-asean-summit-secgen-idUSKBN17U1S3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-summit-secgen-idUSKBN17U1S3


Processing URLs:  92%|█████████▏| 917/1000 [34:55<01:11,  1.16it/s]

Error extracting text from http://greece.greekreporter.com/2016/03/09/wsj-risk-of-grexit-might-return-in-july/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2016/03/09/wsj-risk-of-grexit-might-return-in-july/


Processing URLs:  92%|█████████▏| 918/1000 [34:56<01:13,  1.12it/s]

Error extracting text from https://bit.ly/3egKSMm: HTTPSConnectionPool(host='datakingz.com', port=443): Max retries exceeded with url: /2021/03/01/snp-set-to-win-slim-majority-in-elections-leaving-indyref-2-on-a-knife-edge/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'datakingz.com'. (_ssl.c:1000)")))


Processing URLs:  92%|█████████▏| 923/1000 [35:04<01:45,  1.37s/it]

Error extracting text from http://www.iranhumanrights.org/2016/01/hardliners-shut-pro-rouhani-elections/: 403 Client Error: Forbidden for url: http://www.iranhumanrights.org/2016/01/hardliners-shut-pro-rouhani-elections/


Processing URLs:  92%|█████████▎| 925/1000 [35:08<02:05,  1.67s/it]

Error extracting text from http://www.iar-gwu.org/content/clash-civilizations-ukraine-not-quite: 404 Client Error: Not Found for url: https://www.iar-gwu.org/content/clash-civilizations-ukraine-not-quite


Processing URLs:  93%|█████████▎| 927/1000 [35:12<02:10,  1.79s/it]

Error extracting text from http://magicvalley.com/news/world/asia/s-korea-says-time-to-consider-nuclear-talks-without-north/article_7648c888-a8cd-5fdd-968f-9f3a9736d48f.html: 404 Client Error: Not Found for url: https://magicvalley.com/news/world/asia/s-korea-says-time-to-consider-nuclear-talks-without-north/article_7648c888-a8cd-5fdd-968f-9f3a9736d48f.html


Processing URLs:  93%|█████████▎| 931/1000 [36:14<21:33, 18.75s/it]

Error extracting text from https://archive.is/eZDqv: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  93%|█████████▎| 932/1000 [36:15<15:24, 13.59s/it]

Error extracting text from http://www.thedickinsonpress.com/energy/oil/3850759-senators-seek-more-democrats-repeal-us-oil-export-ban: 404 Client Error: Not Found for url: https://www.thedickinsonpress.com/energy/oil/3850759-senators-seek-more-democrats-repeal-us-oil-export-ban


Processing URLs:  93%|█████████▎| 933/1000 [36:18<11:36, 10.40s/it]

Error extracting text from https://robinkelly.house.gov/what-a-government-shutdown-means-for-you: 404 Client Error: Not Found for url: https://robinkelly.house.gov/what-a-government-shutdown-means-for-you


Processing URLs:  94%|█████████▍| 939/1000 [36:27<02:13,  2.19s/it]

Error extracting text from http://business.financialpost.com/news/agriculture/long-path-to-legalization-marijuana-companies-not-convinced-of-legal-recreational-market-by-2018: 403 Client Error: Forbidden for url: https://financialpost.com/news/agriculture/long-path-to-legalization-marijuana-companies-not-convinced-of-legal-recreational-market-by-2018


Processing URLs:  94%|█████████▍| 940/1000 [36:30<02:21,  2.36s/it]

Error extracting text from http://tass.ru/en/world/842049: 404 Client Error: Not Found for url: https://tass.ru/en/world/842049


Processing URLs:  94%|█████████▍| 942/1000 [36:34<02:00,  2.08s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-mattis-idUSKBN1542VA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-mattis-idUSKBN1542VA


Processing URLs:  94%|█████████▍| 944/1000 [36:37<01:41,  1.80s/it]

Error extracting text from http://www.worldbulletin.net/news/169740/slovenia-deploys-army-to-border-to-block-refugees: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/news/169740/slovenia-deploys-army-to-border-to-block-refugees


Processing URLs:  95%|█████████▍| 946/1000 [36:43<02:20,  2.60s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-labour-china-insight-idUSKBN1AT00Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-labour-china-insight-idUSKBN1AT00Q


Processing URLs:  95%|█████████▌| 953/1000 [37:03<02:36,  3.34s/it]

URL filtered: https://twitter.com/ConceptualJames


Processing URLs:  96%|█████████▌| 957/1000 [37:23<04:07,  5.76s/it]

Error extracting text from https://www.investopedia.com/terms/b/blackswan.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/b/blackswan.asp


Processing URLs:  96%|█████████▌| 960/1000 [37:27<01:46,  2.66s/it]

Error extracting text from https://www.france24.com/en/europe/20210914-putin-self-isolating-after-covid-spreads-among-inner-circle: 403 Client Error: Forbidden for url: https://www.france24.com/en/europe/20210914-putin-self-isolating-after-covid-spreads-among-inner-circle
Error extracting text from http://www.nytimes.com/2016/06/29/technology/airbnb-sues-san-francisco-over-a-law-it-had-helped-pass.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/29/technology/airbnb-sues-san-francisco-over-a-law-it-had-helped-pass.html?_r=0


Processing URLs:  96%|█████████▌| 961/1000 [37:28<01:27,  2.23s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/14/gitrep-13may2016pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/14/gitrep-13may2016pm/


Processing URLs:  96%|█████████▌| 962/1000 [37:30<01:19,  2.10s/it]

Error extracting text from http://www.reuters.com/article/2015/10/23/china-logistics-idUSL3N12N24K20151023: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/23/china-logistics-idUSL3N12N24K20151023


Processing URLs:  97%|█████████▋| 966/1000 [37:32<00:38,  1.12s/it]

Error extracting text from http://beyondparallel.csis.org/rok-elections-and-dprk-provocations/: 403 Client Error: Forbidden for url: http://beyondparallel.csis.org/rok-elections-and-dprk-provocations/


Processing URLs:  97%|█████████▋| 973/1000 [37:42<00:30,  1.12s/it]

Error extracting text from http://www.nationalreview.com/article/432516/iran-nuclear-deal-iaea-reports-violations-disclosure: 404 Client Error: Not Found for url: https://www.nationalreview.com/article/432516/iran-nuclear-deal-iaea-reports-violations-disclosure/
Error extracting text from http://www.latimes.com/opinion/op-ed/la-oe-taubes-teicholz-us-news-best-diet-problems-20180128-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/opinion/op-ed/la-oe-taubes-teicholz-us-news-best-diet-problems-20180128-story.html
Error extracting text from https://www.reuters.com/article/alphabet-uber-trial/in-waymo-trial-what-fired-uber-exec-may-not-say-could-be-key-idUSKBN1FK1M6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/alphabet-uber-trial/in-waymo-trial-what-fired-uber-exec-may-not-say-could-be-key-idUSKBN1FK1M6


Processing URLs:  98%|█████████▊| 977/1000 [37:44<00:16,  1.37it/s]

Error extracting text from https://committees.parliament.uk/publications/5722/documents/56559/default/: 403 Client Error: Forbidden for url: https://committees.parliament.uk/publications/5722/documents/56559/default/
URL filtered: https://www.stratfor.com/analysis/tracing-islamic-states-path-destruction?utm_source=LinkedIn&amp;utm_medium=social&amp;utm_campaign=article


Processing URLs:  98%|█████████▊| 983/1000 [37:49<00:12,  1.34it/s]

Error extracting text from http://macaudailytimes.com.mo/myanmar-insurgents-fight-on-despite-advent-of-democracy.html: 500 Server Error: Internal Server Error for url: http://macaudailytimes.com.mo/myanmar-insurgents-fight-on-despite-advent-of-democracy.html


Processing URLs:  99%|█████████▊| 986/1000 [37:53<00:14,  1.04s/it]

Error extracting text from http://www.scoop.it/t/viral-bioinformatics: 404 Client Error: Not Found for url: https://www.scoop.it/topic/viral-bioinformatics
Error extracting text from https://www.pakistantoday.com.pk/2017/11/11/un-contradicts-us-says-at-least-10-civilians-died-in-afghanistan-airstrike/: 403 Client Error: Forbidden for url: https://www.pakistantoday.com.pk/2017/11/11/un-contradicts-us-says-at-least-10-civilians-died-in-afghanistan-airstrike/


Processing URLs:  99%|█████████▉| 994/1000 [38:08<00:12,  2.06s/it]

Error extracting text from http://www.asahi.com/ajw/articles/AJ201606030059.html: 404 Client Error: Not Found for url: https://www.asahi.com/ajw/articles/AJ201606030059.html


Processing URLs: 100%|██████████| 1000/1000 [38:15<00:00,  2.30s/it]


Error extracting text from http://www.wsj.com/articles/camerons-bid-to-redefine-u-k-ties-to-eu-faces-big-test-1453957262: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/camerons-bid-to-redefine-u-k-ties-to-eu-faces-big-test-1453957262


Processing URLs:   0%|          | 1/1000 [00:01<21:24,  1.29s/it]

Error extracting text from http://www.reuters.com/article/us-usa-obamacare-ryan-idUSKBN16E2VG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-ryan-idUSKBN16E2VG


Processing URLs:   0%|          | 2/1000 [00:03<26:30,  1.59s/it]

Error extracting text from http://www.understandingwar.org/sites/default/files/ISIS%20Sanctuary%2031%20MAR%202016_1.pdf: 404 Client Error: Not Found for url: https://www.understandingwar.org/sites/default/files/ISIS%20Sanctuary%2031%20MAR%202016_1.pdf


Processing URLs:   1%|          | 7/1000 [00:12<22:10,  1.34s/it]

Error extracting text from http://mashable.com/2014/12/18/nsa-track-sony-hackers/?utm_cid=mash-com-Tw-main-link#GJgMGJuxESqT: 404 Client Error: Not Found for url: https://mashable.com/2014/12/18/nsa-track-sony-hackers/?utm_cid=mash-com-Tw-main-link#GJgMGJuxESqT


Processing URLs:   1%|          | 8/1000 [00:12<18:18,  1.11s/it]

Error extracting text from http://thehill.com/homenews/senate/361225-graham-on-moore-we-are-about-to-give-away-a-seat-that-can-determine-the: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/361225-graham-on-moore-we-are-about-to-give-away-a-seat-that-can-determine-the/


Processing URLs:   1%|          | 10/1000 [00:13<13:30,  1.22it/s]

Error extracting text from http://www.realcleardefense.com/articles/2015/12/09/chinas_undeclared_cyber_war_on_the_us_108774.html: 403 Client Error: HTTP Forbidden for url: https://www.realcleardefense.com/articles/2015/12/09/chinas_undeclared_cyber_war_on_the_us_108774.html


Processing URLs:   1%|▏         | 14/1000 [00:23<25:52,  1.57s/it]

Error extracting text from http://www.securitynewsdesk.com/asis-europe-2016-warns-uk-isis-attacks-prior-eu-referendum/: HTTPSConnectionPool(host='www.securitynewsdesk.com', port=443): Max retries exceeded with url: /asis-europe-2016-warns-uk-isis-attacks-prior-eu-referendum/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1000)')))


Processing URLs:   2%|▏         | 16/1000 [01:24<5:12:07, 19.03s/it]

Error extracting text from http://dailytimes.com.pk/pakistan/15-Jun-16/3500-terrorists-killed-500-soldiers-martyred-in-operations-dg-ispr: HTTPConnectionPool(host='dailytimes.com.pk', port=80): Read timed out. (read timeout=60)


Processing URLs:   2%|▏         | 17/1000 [01:25<3:42:35, 13.59s/it]

URL filtered: https://twitter.com/realDonaldTrump/status/232572505238433794?ref_src=twsrc%5Etfw


Processing URLs:   2%|▏         | 19/1000 [01:25<2:00:28,  7.37s/it]

Error extracting text from http://www.reuters.com/article/us-france-shooting-syria-idUSKCN0T30Q520151114#8cKdE8OtClo8OiDS.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-france-shooting-syria-idUSKCN0T30Q520151114#8cKdE8OtClo8OiDS.97


Processing URLs:   2%|▏         | 20/1000 [01:27<1:39:22,  6.08s/it]

Error extracting text from http://www.reuters.com/article/us-poland-protest-idUSKCN0XY0EY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-poland-protest-idUSKCN0XY0EY


Processing URLs:   2%|▏         | 22/1000 [01:32<1:13:57,  4.54s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/ppk-43-y-keiko-fujimori-41-eventual-segunda-vuelta-noticia-1887854: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/ppk-43-y-keiko-fujimori-41-eventual-segunda-vuelta-noticia-1887854/


Processing URLs:   4%|▎         | 35/1000 [01:58<33:13,  2.07s/it]  

Error extracting text from http://www.reuters.com/article/2015/09/08/markets-bonds-usa-idUSL1N11E1W920150908: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/08/markets-bonds-usa-idUSL1N11E1W920150908


Processing URLs:   4%|▎         | 36/1000 [01:59<26:10,  1.63s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/03/08/world/iran-conducts-new-missile-tests-in-defiance-of-u-s-sanctions/#.Vt7k_JMrJE4: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/03/08/world/iran-conducts-new-missile-tests-in-defiance-of-u-s-sanctions/#.Vt7k_JMrJE4


Processing URLs:   4%|▎         | 37/1000 [02:01<26:48,  1.67s/it]

Error extracting text from http://www.ibtimes.com/north-korea-preparing-thermonuclear-weapon-tests-report-2247006: 403 Client Error: Forbidden for url: https://www.ibtimes.com/north-korea-preparing-thermonuclear-weapon-tests-report-2247006


Processing URLs:   4%|▍         | 39/1000 [02:02<19:43,  1.23s/it]

Error extracting text from https://www.osw.waw.pl/en),: HTTPSConnectionPool(host='www.osw.waw.pl', port=443): Max retries exceeded with url: /en), (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: https://www.youtube.com/watch?v=Nf_mxYKDT7I
Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:843e834460094affbfa4e94959390da3: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:843e834460094affbfa4e94959390da3 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3022d6e40>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   4%|▍         | 43/1000 [02:05<14:50,  1.07it/s]

Error extracting text from https://www.poundsterlinglive.com/eur/3858-hsbc-and-ubs-forecast-parity-in-euro-to-pound: 500 Server Error: Internal Server Error for url: https://www.poundsterlinglive.com/eur/3858-hsbc-and-ubs-forecast-parity-in-euro-to-pound


Processing URLs:   5%|▍         | 47/1000 [03:08<4:30:44, 17.05s/it]

Error extracting text from https://sports.ladbrokes.com/en-gb/betting/politics/british/next-general-election/next-general-election/220401605/: HTTPSConnectionPool(host='sports.ladbrokes.com', port=443): Max retries exceeded with url: /en-gb/betting/politics/british/next-general-election/next-general-election/220401605/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3022d7530>, 'Connection to sports.ladbrokes.com timed out. (connect timeout=60)'))


Processing URLs:   5%|▍         | 48/1000 [03:09<3:17:08, 12.43s/it]

Error extracting text from https://theconversation.com/labour-party-conference-the-dispute-around-rule-changes-explained-in-brief-168740: 403 Client Error: Forbidden for url: https://theconversation.com/labour-party-conference-the-dispute-around-rule-changes-explained-in-brief-168740


Processing URLs:   5%|▌         | 50/1000 [03:10<1:43:46,  6.55s/it]

Error extracting text from http://www.universalweather.com/blog/2013/09/ads-b-requirements-coming-into-effect/: 404 Client Error: Not Found for url: https://www.universalweather.com/blog/2013/09/ads-b-requirements-coming-into-effect/
Error extracting text from http://www.reuters.com/article/us-venezuela-politics-eu/eu-to-impose-arms-embargo-on-venezuela-lays-basis-for-sanctions-diplomats-idUSKBN1D82CD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-eu/eu-to-impose-arms-embargo-on-venezuela-lays-basis-for-sanctions-diplomats-idUSKBN1D82CD


Processing URLs:   5%|▌         | 54/1000 [03:13<34:16,  2.17s/it]  

Error extracting text from https://medium.com/war-is-boring/we-went-inside-a-german-training-camp-for-kurdish-troops-71bd8cc92164: 403 Client Error: Forbidden for url: https://medium.com/war-is-boring/we-went-inside-a-german-training-camp-for-kurdish-troops-71bd8cc92164
URL filtered: http://www.bloomberg.com/news/articles/2015-10-27/venezuela-avoiding-default-again-shows-doomsayers-may-be-wrong


Processing URLs:   6%|▌         | 57/1000 [03:15<20:01,  1.27s/it]

Error extracting text from http://www.businessinsider.com/r-kazakhstans-nazarbayev-says-ready-to-host-syria-peace-talks-in-astana-2016-12: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-kazakhstans-nazarbayev-says-ready-to-host-syria-peace-talks-in-astana-2016-12
Error extracting text from http://www.reuters.com/article/us-usa-cyber-election/senators-to-introduce-bill-to-boost-cyber-defenses-of-voting-systems-idUSKBN1D022S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-election/senators-to-introduce-bill-to-boost-cyber-defenses-of-voting-systems-idUSKBN1D022S


Processing URLs:   6%|▌         | 61/1000 [03:19<14:41,  1.06it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-un-idUSKBN1AL0NU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-un-idUSKBN1AL0NU
Error extracting text from http://www.nytimes.com/2015/10/29/world/middleeast/syria-talks-vienna-iran.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/29/world/middleeast/syria-talks-vienna-iran.html?_r=0


Processing URLs:   6%|▋         | 64/1000 [03:23<16:09,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/impeachment-proceedings-opened-against-brazilian-president-1449092751: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/impeachment-proceedings-opened-against-brazilian-president-1449092751


Processing URLs:   7%|▋         | 66/1000 [04:27<4:50:52, 18.69s/it]

Error extracting text from https://www.seattletimes.com/entertainment/how-the-turner-diaries-inspires-white-supremacists/: HTTPSConnectionPool(host='www.seattletimes.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   7%|▋         | 68/1000 [04:29<2:33:18,  9.87s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-spratlys-idUSKBN16Z005?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-spratlys-idUSKBN16Z005?il=0


Processing URLs:   7%|▋         | 69/1000 [04:30<1:49:02,  7.03s/it]

Error extracting text from https://www.nytimes.com/live/2022/03/30/world/ukraine-russia-war-news/russian-soldiers-have-shot-down-their-own-planes-the-uk-spy-chief-says: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/03/30/world/ukraine-russia-war-news/russian-soldiers-have-shot-down-their-own-planes-the-uk-spy-chief-says


Processing URLs:   7%|▋         | 70/1000 [04:30<1:17:40,  5.01s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-work-idUSKCN0WW1QB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-work-idUSKCN0WW1QB


Processing URLs:   7%|▋         | 73/1000 [04:32<32:44,  2.12s/it]  

Error extracting text from http://www.bignewsnetwork.com/news/241541899/colombian-president-rejects-farc-peace-negotiation-extension: 403 Client Error: Forbidden for url: https://www.bignewsnetwork.com/news/241541899/colombian-president-rejects-farc-peace-negotiation-extension


Processing URLs:   7%|▋         | 74/1000 [04:33<26:34,  1.72s/it]

Error extracting text from https://www.nytimes.com/2017/09/07/technology/amazon-headquarters-north-america.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/07/technology/amazon-headquarters-north-america.html


Processing URLs:   8%|▊         | 75/1000 [05:04<2:43:09, 10.58s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18805-struggle-over-the-presidency-appears-to-postpone-nomination.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18805-struggle-over-the-presidency-appears-to-postpone-nomination.html
URL filtered: https://www.bloomberg.com/news/articles/2016-09-26/saudi-aramco-sees-oil-demand-steady-as-supply-growth-slows


Processing URLs:   8%|▊         | 81/1000 [05:12<41:50,  2.73s/it]  

Error extracting text from http://www.jada.or.jp/contents/data/ranking.html: 404 Client Error: Not Found for url: http://www.jada.or.jp/contents/data/ranking.html


Processing URLs:   8%|▊         | 82/1000 [05:12<32:31,  2.13s/it]

Error extracting text from https://www.amazon.com/Sea-Power-History-Geopolitics-Worlds/dp/1524778354: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Sea-Power-History-Geopolitics-Worlds/dp/1524778354


Processing URLs:   9%|▊         | 86/1000 [05:18<30:00,  1.97s/it]

Error extracting text from http://www.texasinsider.org/conservatives-vow-to-not-back-down-in-spending-fight/: 404 Client Error: Not Found for url: https://www.texasinsider.org/conservatives-vow-to-not-back-down-in-spending-fight/


Processing URLs:   9%|▉         | 88/1000 [05:21<27:10,  1.79s/it]

Error extracting text from http://www.lseg.com/sites/default/files/content/documents/Rupee%20bonds%20factsheet%20-%20final_0.pdf: 404 Client Error: Not Found for url: https://www.lseg.com/sites/default/files/content/documents/rupee%20bonds%20factsheet%20-%20final_0.pdf


Processing URLs:   9%|▉         | 90/1000 [05:24<23:09,  1.53s/it]

Error extracting text from http://www.reuters.com/article/2015/11/06/china-ipo-idUSL3N1313NV20151106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/china-ipo-idUSL3N1313NV20151106
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKCN12708S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKCN12708S


Processing URLs:   9%|▉         | 92/1000 [05:32<41:53,  2.77s/it]

Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-20-february-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-20-february-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303418fe0>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   9%|▉         | 94/1000 [05:38<42:20,  2.80s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/index.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/index.htm


Processing URLs:  10%|▉         | 95/1000 [05:38<33:57,  2.25s/it]

Error extracting text from https://www.nytimes.com/2017/11/11/world/asia/afghanistan-iran-syria-revolutionary-guards.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/11/world/asia/afghanistan-iran-syria-revolutionary-guards.html


Processing URLs:  10%|▉         | 98/1000 [05:40<17:23,  1.16s/it]

Error extracting text from http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html?emc=edit_th_20151028&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/28/us/politics/house-votes-overwhelmingly-to-reopen-the-ex-im-bank.html?emc=edit_th_20151028&amp;nl=todaysheadlines&amp;nlid=28699183
Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BU0JG?feedType=RSS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-debt-idUSL2N1BU0JG?feedType=RSS


Processing URLs:  10%|▉         | 99/1000 [05:40<13:00,  1.15it/s]

Error extracting text from http://www.reuters.com/article/us-usa-colombia-idUSKCN0VD2XM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-colombia-idUSKCN0VD2XM
Error extracting text from http://www.reuters.com/article/us-britain-election-may-idUSKBN19033J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-may-idUSKBN19033J


Processing URLs:  10%|█         | 104/1000 [05:44<11:07,  1.34it/s]

Error extracting text from http://www.reuters.com/article/2015/09/16/us-usa-oilexports-senate-idUSKCN0RG2RT20150916: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/16/us-usa-oilexports-senate-idUSKCN0RG2RT20150916


Processing URLs:  11%|█         | 108/1000 [05:46<06:48,  2.18it/s]

Error extracting text from https://www.wsj.com/articles/the-last-battle-for-democracy-in-venezuela-1498240335: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-last-battle-for-democracy-in-venezuela-1498240335
Error extracting text from https://www.reuters.com/article/us-trade-nafta-canada-exclusive/exclusive-canada-increasingly-convinced-trump-will-pull-out-of-nafta-idUSKBN1EZ2K4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-nafta-canada-exclusive/exclusive-canada-increasingly-convinced-trump-will-pull-out-of-nafta-idUSKBN1EZ2K4
Error extracting text from https://www.reuters.com/article/us-crypto-currency-ecb-idUSKBN2AN104: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-crypto-currency-ecb-idUSKBN2AN104


Processing URLs:  11%|█         | 109/1000 [05:47<09:41,  1.53it/s]

Error extracting text from http://uk.reuters.com/article/opec-oil/table-opec-oil-output-rises-by-50000-bpd-in-september-reuters-survey-idUKL8N1MA38C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.reuters.com/article/us-usa-treasury-securities-idUSKBN13B2XG: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-treasury-securities-idUSKBN13B2XG
Error extracting text from http://www.reuters.com/article/us-russia-nato-idUSKCN0XG2FL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-nato-idUSKCN0XG2FL


Processing URLs:  11%|█         | 112/1000 [06:04<1:01:44,  4.17s/it]

Error extracting text from http://www.debtclocks.eu/public-debt-and-budget-deficits-comparison-of-the-eu-member-states.html: 404 Client Error: Not Found for url: https://www.haushaltssteuerung.de//public-debt-and-budget-deficits-comparison-of-the-eu-member-states.html


Processing URLs:  11%|█▏        | 113/1000 [06:06<52:36,  3.56s/it]  

Error extracting text from http://tass.ru/en/politics/840104: 404 Client Error: Not Found for url: https://tass.ru/en/politics/840104
URL filtered: http://www.bloomberg.com/news/articles/2016-01-06/north-korea-says-it-conducted-successful-nuclear-bomb-test


Processing URLs:  12%|█▏        | 118/1000 [06:12<27:50,  1.89s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160629/1505194736.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160629/1505194736.html
URL filtered: http://www.bloomberg.com/bw/articles/2014-09-12/nfls-next-commissioner-a-guide-to-the-replacing-roger-goodell


Processing URLs:  12%|█▏        | 120/1000 [06:12<17:16,  1.18s/it]

Error extracting text from http://www.cdm.me/english/russia-to-officially-oppose-montenegros-accession-to-nato: 403 Client Error: Forbidden for url: https://www.cdm.me/english/russia-to-officially-oppose-montenegros-accession-to-nato


Processing URLs:  12%|█▏        | 124/1000 [06:19<22:45,  1.56s/it]

Error extracting text from https://parstoday.com/en/news/world-i131611-russia_will_soon_test_sarmat_icbm_capable_of_beating_any_defenses: HTTPSConnectionPool(host='parstoday.com', port=443): Max retries exceeded with url: /en/news/world-i131611-russia_will_soon_test_sarmat_icbm_capable_of_beating_any_defenses (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30341b500>: Failed to resolve 'parstoday.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  13%|█▎        | 129/1000 [06:27<24:50,  1.71s/it]

Error extracting text from https://www.faa.gov/uas/request_waiver/waiver_safety_explanation_guidelines/: 404 Client Error: Not Found for url: https://www.faa.gov/uas/request_waiver/waiver_safety_explanation_guidelines/


Processing URLs:  13%|█▎        | 131/1000 [06:32<28:57,  2.00s/it]

Error extracting text from https://www.barrons.com/articles/what-is-section-230-and-what-does-it-do-everything-you-need-to-know-51609452187: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/what-is-section-230-and-what-does-it-do-everything-you-need-to-know-51609452187


Processing URLs:  14%|█▍        | 138/1000 [06:40<12:15,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-budget-ceiling-idUSKCN1B42FS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-budget-ceiling-idUSKCN1B42FS
Error extracting text from https://www.axios.com/iran-rejects-nuclear-deal-us-vienna-talks-raisi-303b4a34-b9a8-4073-9f93-c1174ab85685.html: 403 Client Error: Forbidden for url: https://www.axios.com/iran-rejects-nuclear-deal-us-vienna-talks-raisi-303b4a34-b9a8-4073-9f93-c1174ab85685.html


Processing URLs:  14%|█▍        | 140/1000 [06:44<20:33,  1.43s/it]

Error extracting text from http://www.hellenicshippingnews.com/kuwait-oil-price-improvement-linked-to-growth-in-china-and-japans-economies/: 404 Client Error: Not Found for url: https://www.hellenicshippingnews.com/kuwait-oil-price-improvement-linked-to-growth-in-china-and-japans-economies/
Error extracting text from http://www.reuters.com/article/us-northkorea-missile-idUSKCN0XK08U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-idUSKCN0XK08U


Processing URLs:  14%|█▍        | 143/1000 [06:46<10:39,  1.34it/s]

Error extracting text from http://www.wsj.com/articles/effort-to-force-vote-on-ex-im-bank-reauthorization-gains-some-republican-support-1443718801&gt: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/effort-to-force-vote-on-ex-im-bank-reauthorization-gains-some-republican-support-1443718801&gt
Error extracting text from https://rentberry.com/: 403 Client Error: Forbidden for url: https://rentberry.com/
URL filtered: http://www.bloomberg.com/news/articles/2015-11-23/core-comeback-may-herald-deflation-s-demise-for-central-bankers
URL filtered: http://www.bloomberg.com/news/articles/2016-02-16/secret-petro-diplomacy-starts-to-pay-off-in-echo-of-1999-shock


Processing URLs:  15%|█▌        | 150/1000 [06:53<15:50,  1.12s/it]

Error extracting text from http://www.nytimes.com/2016/04/18/world/americas/brazil-dilma-rousseff-impeachment-vote.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/18/world/americas/brazil-dilma-rousseff-impeachment-vote.html


Processing URLs:  15%|█▌        | 151/1000 [06:54<14:45,  1.04s/it]

Error extracting text from http://nationalinterest.org/feature/south-china-sea-clashes-are-fracturing-asean-16699: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/south-china-sea-clashes-are-fracturing-asean-16699
URL filtered: https://www.youtube.com/watch?v=zHiXgZRm_dU


Processing URLs:  15%|█▌        | 154/1000 [06:57<12:35,  1.12it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-12/roy-moore-trails-in-senate-poll-after-harassment-accusations


Processing URLs:  16%|█▌        | 158/1000 [07:02<16:25,  1.17s/it]

Error extracting text from http://news.sky.com/story/1691479/eu-remain-campaign-raises-14m-from-donors: 404 Client Error: Not Found for url: https://news.sky.com/story/1691479/eu-remain-campaign-raises-14m-from-donors


Processing URLs:  16%|█▌        | 160/1000 [07:12<38:48,  2.77s/it]

Error extracting text from http://blogs.channel4.com/factcheck/factcheck-sun-win-elections/20827: 404 Client Error: Not Found for url: https://www.channel4.com/news/factcheck/factcheck-sun-win-elections/20827


Processing URLs:  16%|█▌        | 161/1000 [07:12<29:05,  2.08s/it]

Error extracting text from http://www.levantinegroup.com/#!Iraq-Battle-Fronts-Update-ISIS-continues-counter-offensive-in-Ramadi-as-speculation-over-Fallujah-operation-mounts/c21xo/56b0e3bd0cf2dc1600ded0b1: 404 Client Error: Not Found for url: http://www.levantinegroup.com/#!Iraq-Battle-Fronts-Update-ISIS-continues-counter-offensive-in-Ramadi-as-speculation-over-Fallujah-operation-mounts/c21xo/56b0e3bd0cf2dc1600ded0b1


Processing URLs:  16%|█▋        | 165/1000 [07:19<24:46,  1.78s/it]

Error extracting text from https://www.reuters.com/article/us-northkorea-missiles/south-korea-offers-talks-with-north-ahead-of-olympics-idUSKBN1ER041: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles/south-korea-offers-talks-with-north-ahead-of-olympics-idUSKBN1ER041


Processing URLs:  17%|█▋        | 166/1000 [07:20<21:04,  1.52s/it]

Error extracting text from http://www.dispatch.com/content/stories/business/2015/09/12/ge-reportedly-nixed-cincinnati-for-new-hq-over-export-import-bank-fight.html: 404 Client Error: OK for url: https://www.dispatch.com/content/stories/business/2015/09/12/ge-reportedly-nixed-cincinnati-for-new-hq-over-export-import-bank-fight.html


Processing URLs:  17%|█▋        | 167/1000 [15:20<33:16:59, 143.84s/it]

Error extracting text from https://www.thespainreport.com/articles/818-160803125002-pp-and-ciudadanos-pressure-psoe-and-will-continue-talks-after-rajoy-and-rivera-meet-in-congress: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/818-160803125002-pp-and-ciudadanos-pressure-psoe-and-will-continue-talks-after-rajoy-and-rivera-meet-in-congress (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303419040>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  17%|█▋        | 170/1000 [15:22<12:37:29, 54.76s/it] 

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-china-idUSKCN1BB0OH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-china-idUSKCN1BB0OH
URL filtered: http://www.bloomberg.com/news/articles/2015-10-22/russia-said-to-back-early-syria-vote-to-give-assad-new-mandate
Error extracting text from http://www.wsj.com/articles/u-s-pursues-new-tack-in-vw-emissions-probe-1457464973: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-pursues-new-tack-in-vw-emissions-probe-1457464973


Processing URLs:  17%|█▋        | 171/1000 [15:24<9:37:32, 41.80s/it] 

Error extracting text from https://hacker.house/lab/redstar-os-3-0-remote-arbitrary-command-injection/: 404 Client Error: Not Found for url: https://hacker.house/lab/redstar-os-3-0-remote-arbitrary-command-injection/


Processing URLs:  17%|█▋        | 174/1000 [15:29<3:57:48, 17.27s/it]

Error extracting text from https://www.sec.gov/Archives/edgar/data/1609351/000160935116000107/spark-201693010xq.htm: 403 Client Error: Forbidden for url: https://www.sec.gov/Archives/edgar/data/1609351/000160935116000107/spark-201693010xq.htm
URL filtered: https://twitter.com/imillhiser/status/915231293209792512


Processing URLs:  18%|█▊        | 177/1000 [15:30<1:43:43,  7.56s/it]

Error extracting text from https://www.transportation.gov/AV: 403 Client Error: Forbidden for url: https://www.transportation.gov/AV


Processing URLs:  18%|█▊        | 180/1000 [15:35<54:03,  3.96s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-03-25/bolsonaro-put-on-notice-by-house-speaker-lira-on-virus-response
Error extracting text from http://www.reuters.com/article/us-vietnam-china-conflict-insight-idUSKBN0U000320151217: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-vietnam-china-conflict-insight-idUSKBN0U000320151217


Processing URLs:  18%|█▊        | 184/1000 [15:41<29:32,  2.17s/it]

Error extracting text from http://www.presstv.com/Detail/2015/12/05/440341/Iraq-Turkey-incursion-Abadi-prime-minister: 403 Client Error: Forbidden for url: https://www.presstv.com/Detail/2015/12/05/440341/Iraq-Turkey-incursion-Abadi-prime-minister


Processing URLs:  19%|█▊        | 186/1000 [15:43<23:36,  1.74s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-04/22/c_135304490.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-04/22/c_135304490.htm


Processing URLs:  19%|█▉        | 188/1000 [15:45<16:41,  1.23s/it]

Error extracting text from http://thehill.com/homenews/senate/314506-mccain-leans-toward-voting-for-tillerson: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/314506-mccain-leans-toward-voting-for-tillerson/


Processing URLs:  19%|█▉        | 189/1000 [15:46<17:06,  1.27s/it]

Error extracting text from http://www.atlanticcouncil.org/images/publications/Ten_Arguments_for_TTIP_web_0420.pdf: 404 Client Error: Not Found for url: https://www.atlanticcouncil.org/images/publications/Ten_Arguments_for_TTIP_web_0420.pdf


Processing URLs:  19%|█▉        | 190/1000 [16:46<4:12:33, 18.71s/it]

Error extracting text from http://www.usnews.com/news/business/articles/2015/12/04/opec-ponders-strategy-but-lacks-options-to-raise-oil-price: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  19%|█▉        | 193/1000 [17:23<3:51:04, 17.18s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-10-03/venezuelan-credit-dashboard-october-payments-total-1-8-billion


Processing URLs:  20%|█▉        | 196/1000 [17:24<1:35:12,  7.11s/it]

Error extracting text from http://www.reuters.com/article/us-california-electriccars-insight-idUSKCN1173LK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-california-electriccars-insight-idUSKCN1173LK


Processing URLs:  20%|█▉        | 197/1000 [17:26<1:17:19,  5.78s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/myanmar-anti-coup-fighters-retreat-town-us-makes-appeal-2021-05-16/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/myanmar-anti-coup-fighters-retreat-town-us-makes-appeal-2021-05-16/


Processing URLs:  20%|██        | 200/1000 [17:27<36:17,  2.72s/it]  

Error extracting text from https://www.yahoo.com/news/spain-pm-warns-elections-break-political-deadlock-002831803.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/spain-pm-warns-elections-break-political-deadlock-002831803.html


Processing URLs:  20%|██        | 205/1000 [17:37<30:25,  2.30s/it]

Error extracting text from http://38north.org/2016/02/sohae020516: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  21%|██        | 206/1000 [17:38<23:44,  1.79s/it]

Error extracting text from http://thehill.com/opinion/white-house/361193-The-feds-need-to-be-held-accountable-for-role-in-Russia-scandal: 403 Client Error: Forbidden for url: https://thehill.com/opinion/white-house/361193-The-feds-need-to-be-held-accountable-for-role-in-Russia-scandal/


Processing URLs:  21%|██        | 207/1000 [17:40<23:28,  1.78s/it]

Error extracting text from http://www.ibtimes.com/turkey-pkk-conflict-killed-162-civilians-august-rights-group-says-2258153: 403 Client Error: Forbidden for url: https://www.ibtimes.com/turkey-pkk-conflict-killed-162-civilians-august-rights-group-says-2258153


Processing URLs:  21%|██        | 212/1000 [18:48<4:14:18, 19.36s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-07-14/former-soviet-spy-attended-meeting-with-donald-trump-jr-and-russian-lawyer: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  21%|██▏       | 213/1000 [18:50<3:05:27, 14.14s/it]

Error extracting text from http://www.ibtimes.com/south-china-sea-controversy-pentagon-sends-uss-john-c-stennis-aircraft-carrier-2330154: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-controversy-pentagon-sends-uss-john-c-stennis-aircraft-carrier-2330154


Processing URLs:  21%|██▏       | 214/1000 [18:52<2:16:41, 10.43s/it]

Error extracting text from http://www.defense.gov/Portals/1/Documents/pubs/2015_China_Military_Power_Report.pdf: 404 Client Error: Not Found for url: https://www.defense.gov/Portals/1/Documents/pubs/2015_China_Military_Power_Report.pdf


Processing URLs:  22%|██▏       | 216/1000 [18:55<1:16:41,  5.87s/it]

Error extracting text from http://www.wsj.com/articles/russia-bulks-up-force-in-syria-starts-flying-drone-missions-1442856005: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russia-bulks-up-force-in-syria-starts-flying-drone-missions-1442856005


Processing URLs:  22%|██▏       | 218/1000 [19:28<2:41:24, 12.38s/it]

Error extracting text from http://www.todayszaman.com/business_refugees-employed-to-erect-wall-along-turkeys-syrian-border_412972.html: 522 Server Error:  for url: http://www.todayszaman.com/business_refugees-employed-to-erect-wall-along-turkeys-syrian-border_412972.html


Processing URLs:  22%|██▏       | 224/1000 [19:34<26:55,  2.08s/it]  

Error extracting text from http://www.nytimes.com/aponline/2016/09/05/world/asia/ap-as-southeast-asia-south-china-sea.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/09/05/world/asia/ap-as-southeast-asia-south-china-sea.html?_r=0
Error extracting text from http://english.aawsat.com/2016/09/article55357785/isis-enrolls-arab-fighters-mosul: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/09/article55357785/isis-enrolls-arab-fighters-mosul


Processing URLs:  22%|██▎       | 225/1000 [19:34<19:44,  1.53s/it]

Error extracting text from https://www.yahoo.com/news/italy-pm-matteo-renzi-visit-iran-april-12-111252870.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/italy-pm-matteo-renzi-visit-iran-april-12-111252870.html


Processing URLs:  23%|██▎       | 228/1000 [19:42<24:53,  1.93s/it]

URL filtered: https://www.rferl.org/a/facebook-submitting-russia-ads-congress/28749656.html


Processing URLs:  23%|██▎       | 230/1000 [19:43<15:11,  1.18s/it]

Error extracting text from http://www.amazon.com/The-Second-Machine-Age-Technologies/dp/0393239357: 500 Server Error: Internal Server Error for url: https://www.amazon.com/The-Second-Machine-Age-Technologies/dp/0393239357


Processing URLs:  23%|██▎       | 232/1000 [19:48<23:17,  1.82s/it]

URL filtered: https://www.youtube.com/watch?v=8qaiD_dwr0U


Processing URLs:  24%|██▎       | 236/1000 [19:50<12:21,  1.03it/s]

URL filtered: https://www.bloomberg.com/news/articles/2021-08-11/dubai-airports-sees-rise-in-traffic-after-41-first-half-drop


Processing URLs:  24%|██▍       | 243/1000 [19:59<12:09,  1.04it/s]

Error extracting text from http://www.balkaninsight.com/en/article/serbian-church-urges-montenegro-to-hold-referendum-on-nato-01-04-2016#sthash.nkrfNYZV.dpuf: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/serbian-church-urges-montenegro-to-hold-referendum-on-nato-01-04-2016#sthash.nkrfNYZV.dpuf


Processing URLs:  24%|██▍       | 244/1000 [20:00<12:31,  1.01it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-10/cameron-targeting-u-k-deal-on-eu-in-february-before-referendum


Processing URLs:  25%|██▍       | 248/1000 [20:04<14:13,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-student-idUSKCN0V00I8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-student-idUSKCN0V00I8


Processing URLs:  25%|██▌       | 251/1000 [20:08<13:07,  1.05s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKBN16S2KY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKBN16S2KY


Processing URLs:  25%|██▌       | 252/1000 [20:09<11:50,  1.05it/s]

Error extracting text from https://abcnews.go.com/International/wireStory/official-maduro-allies-win-91-seats-venezuela-vote-74652316: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/official-maduro-allies-win-91-seats-venezuela-vote-74652316


Processing URLs:  25%|██▌       | 253/1000 [20:11<17:58,  1.44s/it]

URL filtered: https://www.youtube.com/watch?v=IYBA9JD5oW4


Processing URLs:  26%|██▌       | 255/1000 [20:11<10:26,  1.19it/s]

Error extracting text from http://autoweek.com/article/autonomous-cars/tesla-model-s-autopilot-strikes-again-dallas-crash: 403 Client Error: Forbidden for url: http://autoweek.com/article/autonomous-cars/tesla-model-s-autopilot-strikes-again-dallas-crash


Processing URLs:  26%|██▌       | 258/1000 [20:33<1:04:07,  5.18s/it]

Error extracting text from https://www.washingtonpost.com/national/atlanta-cybercrime-experts-investigating-equifax-hack/2017/11/12/3acd239e-c7b2-11e7-b506-8a10ed11ecf5_story.html?utm_term=.ca6098b9a0a5: 404 Client Error: Not Found for url: https://www.washingtonpost.com/national/atlanta-cybercrime-experts-investigating-equifax-hack/2017/11/12/3acd239e-c7b2-11e7-b506-8a10ed11ecf5_story.html?utm_term=.ca6098b9a0a5


Processing URLs:  26%|██▌       | 260/1000 [20:34<34:12,  2.77s/it]  

Error extracting text from http://www.consilium.europa.eu/en/council-eu/voting-system/qualified-majority/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/council-eu/voting-system/qualified-majority/
Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2015/11/27/brazils-corruption-arrests-renew-rousseff-impeachment-risk/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2015/11/27/brazils-corruption-arrests-renew-rousseff-impeachment-risk/


Processing URLs:  26%|██▋       | 263/1000 [20:58<1:35:47,  7.80s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-11/venezuela-default-fears-subside-as-shortest-pdvsa-bonds-climb


Processing URLs:  26%|██▋       | 265/1000 [20:59<54:41,  4.46s/it]  

Error extracting text from http://www.businessinsider.com/china-has-deployed-8-surface-to-air-missiles-on-a-contested-island-in-the-south-china-sea-2016-2: 404 Client Error: Not Found for url: https://www.businessinsider.com/china-has-deployed-8-surface-to-air-missiles-on-a-contested-island-in-the-south-china-sea-2016-2


Processing URLs:  27%|██▋       | 267/1000 [21:04<43:23,  3.55s/it]

error getting summary:
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.bnm.gov.my/?ch=en_speech&amp;pg=en_speech_all&amp;ac=90&amp;lang=en: Document is empty


Processing URLs:  27%|██▋       | 270/1000 [21:10<37:27,  3.08s/it]

URL filtered: https://twitter.com/chris1reuters/status/745989166769541121/photo/1?ref_src=twsrc%5Etfw


Processing URLs:  28%|██▊       | 275/1000 [22:14<3:31:01, 17.46s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-12-26/south-korean-ruling-party-splits-over-impeached-president: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  28%|██▊       | 277/1000 [22:29<2:28:50, 12.35s/it]

URL filtered: https://twitter.com/HughSykes/status/793063085015785472


Processing URLs:  28%|██▊       | 279/1000 [22:35<1:39:20,  8.27s/it]

Error extracting text from http://www.parl.gc.ca/HouseChamberBusiness/ChamberCalendar.aspx?Language=E: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  28%|██▊       | 283/1000 [22:40<38:10,  3.19s/it]  

Error extracting text from http://thehill.com/policy/cybersecurity/274594-obama-extends-cyber-sanctions-powers: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/274594-obama-extends-cyber-sanctions-powers/


Processing URLs:  28%|██▊       | 284/1000 [22:40<28:02,  2.35s/it]

Error extracting text from http://www.wsj.com/articles/brazil-quits-venezuela-election-monitor-mission-1445406961: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-quits-venezuela-election-monitor-mission-1445406961


Processing URLs:  29%|██▊       | 286/1000 [22:45<27:55,  2.35s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-08-25/nord-stream-2-won-t-be-exempted-from-eu-rules-german-court-says


Processing URLs:  29%|██▉       | 289/1000 [22:48<20:37,  1.74s/it]

Error extracting text from https://www.justsecurity.org/28720/cyber-attack-dam-armed-attack/: 403 Client Error: Forbidden for url: https://www.justsecurity.org/28720/cyber-attack-dam-armed-attack/


Processing URLs:  29%|██▉       | 292/1000 [23:04<44:07,  3.74s/it]

Error extracting text from http://www.loebner.net/Prizef/loebner-prize.html: 404 Client Error: Not Found for url: http://www.loebner.net/Prizef/loebner-prize.html


Processing URLs:  30%|██▉       | 298/1000 [24:15<3:49:13, 19.59s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-03-14/support-for-scottish-independence-at-highest-ever-survey: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  30%|███       | 300/1000 [24:17<1:56:49, 10.01s/it]

Error extracting text from https://www.reuters.com/world/uk/uk-pm-johnson-hold-talks-with-biden-this-month-times-2021-09-12/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/uk-pm-johnson-hold-talks-with-biden-this-month-times-2021-09-12/


Processing URLs:  30%|███       | 304/1000 [24:29<52:13,  4.50s/it]  

Error extracting text from http://www.mdjonline.com/neighbor_newspapers/extra/news/russ-feingold-lead-over-ron-johnson-narrows-to-points-in/article_b8a61d30-72df-5c0e-9d2d-6ecdc51a68f7.html: 404 Client Error: Not Found for url: https://www.mdjonline.com/neighbor_newspapers/extra/news/russ-feingold-lead-over-ron-johnson-narrows-to-points-in/article_b8a61d30-72df-5c0e-9d2d-6ecdc51a68f7.html


Processing URLs:  31%|███       | 306/1000 [24:31<30:48,  2.66s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/chinese-military-aircraft/2706006.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/chinese-military-aircraft/2706006.html
Error extracting text from http://www.reuters.com/article/us-usa-cyber-rules/trump-administration-releases-rules-on-disclosing-cyber-flaws-idUSKBN1DF0A0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-rules/trump-administration-releases-rules-on-disclosing-cyber-flaws-idUSKBN1DF0A0


Processing URLs:  31%|███       | 307/1000 [24:31<22:31,  1.95s/it]

Error extracting text from https://www.nytimes.com/live/2022/03/29/world/ukraine-russia-war/the-chief-of-the-un-food-agency-warns-of-a-crisis-not-seen-since-world-war-ii: 403 Client Error: Forbidden for url: https://www.nytimes.com/live/2022/03/29/world/ukraine-russia-war/the-chief-of-the-un-food-agency-warns-of-a-crisis-not-seen-since-world-war-ii


Processing URLs:  31%|███▏      | 314/1000 [24:42<13:43,  1.20s/it]

Error extracting text from https://hayspost.com/posts/65d94ad7-10c6-4485-9027-2b51bcc43815: 403 Client Error: Forbidden for url: https://hayspost.com/posts/65d94ad7-10c6-4485-9027-2b51bcc43815


Processing URLs:  32%|███▏      | 322/1000 [24:59<21:42,  1.92s/it]

Error extracting text from https://www.amnesty.org/en/countries/europe-and-central-asia/ukraine/report-ukraine: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/location/europe-and-central-asia/ukraine/report-ukraine


Processing URLs:  32%|███▏      | 323/1000 [25:01<20:28,  1.81s/it]

Error extracting text from http://www.channelnewsasia.com/news/world/obama-says-chinese-led/2750896.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/world/obama-says-chinese-led/2750896.html


Processing URLs:  33%|███▎      | 327/1000 [25:07<17:18,  1.54s/it]

Error extracting text from https://www.tokendata.io/upcoming: 404 Client Error: Not Found for url: https://research.tokendata.io/upcoming


Processing URLs:  33%|███▎      | 329/1000 [25:12<25:41,  2.30s/it]

Error extracting text from http://www.iowafarmertoday.com/news/regional/fed-conference-highlights-downturn-decisions/article_1a389962-eeda-11e6-ab20-8f405756ad9a.html: 404 Client Error: Not Found for url: https://agupdate.com/iowafarmertoday/news/regional/fed-conference-highlights-downturn-decisions/article_1a389962-eeda-11e6-ab20-8f405756ad9a.html


Processing URLs:  33%|███▎      | 331/1000 [25:13<14:41,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-shirqat-idUSKBN12Z0OB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-shirqat-idUSKBN12Z0OB


Processing URLs:  33%|███▎      | 333/1000 [25:17<17:04,  1.54s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56777#.WSXqbVK-Ku4: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56777#.WSXqbVK-Ku4


Processing URLs:  34%|███▍      | 338/1000 [25:30<38:25,  3.48s/it]

Error extracting text from http://tmsnrt.rs/297CN7w: 404 Client Error:  for url: https://news.trust.org:443/item/20160702112910-auma1


Processing URLs:  34%|███▍      | 344/1000 [26:37<3:35:20, 19.70s/it]

Error extracting text from http://english.irib.ir/news/iran1/item/220869-expediency-council-secretary-urges-rouhani-to-order-increasing-iranian-missiles-range: HTTPConnectionPool(host='english.irib.ir', port=80): Max retries exceeded with url: /news/iran1/item/220869-expediency-council-secretary-urges-rouhani-to-order-increasing-iranian-missiles-range (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x300935100>, 'Connection to english.irib.ir timed out. (connect timeout=60)'))


Processing URLs:  35%|███▍      | 346/1000 [26:40<1:55:01, 10.55s/it]

Error extracting text from http://en.trend.az/business/energy/2456059.html: 404 Client Error: Not Found for url: https://www.trend.az/business/energy/2456059.html


Processing URLs:  35%|███▍      | 348/1000 [26:43<1:01:35,  5.67s/it]

Error extracting text from https://www.nytimes.com/2022/02/08/world/asia/north-korea-icbm-china.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/02/08/world/asia/north-korea-icbm-china.html


Processing URLs:  35%|███▌      | 352/1000 [26:46<19:30,  1.81s/it]  

Error extracting text from http://www.reuters.com/article/2015/09/02/uk-opec-oil-idUKKCN0R21QV20150902: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/02/uk-opec-oil-idUKKCN0R21QV20150902


Processing URLs:  36%|███▌      | 357/1000 [26:53<19:28,  1.82s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/50656093.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/50656093.cms


Processing URLs:  36%|███▌      | 360/1000 [27:18<1:21:51,  7.67s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-27/singapore-defaults-seen-as-bellwether-for-asia-distress-in-2017


Processing URLs:  36%|███▋      | 363/1000 [27:21<38:03,  3.58s/it]  

Error extracting text from http://www.reuters.com/article/2015/11/14/us-syria-crisis-talks-idUSKCN0T22HN20151114#Ft8iWJ9OgoX24Aje.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/14/us-syria-crisis-talks-idUSKCN0T22HN20151114#Ft8iWJ9OgoX24Aje.99


Processing URLs:  36%|███▋      | 364/1000 [27:22<30:41,  2.90s/it]

Error extracting text from http://www.reuters.com/article/2015/10/22/us-brazil-politics-rousseff-idUSKCN0SG2R920151022: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/22/us-brazil-politics-rousseff-idUSKCN0SG2R920151022


Processing URLs:  37%|███▋      | 367/1000 [27:24<16:05,  1.52s/it]

Error extracting text from http://abcnews.go.com/Politics/wireStory/pentagon-lays-plan-back-mosul-raqqa-36270318: 404 Client Error: Not Found for url: https://abcnews.go.com/Politics/wireStory/pentagon-lays-plan-back-mosul-raqqa-36270318
Error extracting text from https://www.congress.gov/treaty-document/110th-congress/20: 403 Client Error: Forbidden for url: https://www.congress.gov/treaty-document/110th-congress/20


Processing URLs:  37%|███▋      | 368/1000 [27:24<14:04,  1.34s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-03/venezuela-is-seeking-debt-relief-and-confusing-its-bondholders


Processing URLs:  37%|███▋      | 370/1000 [28:26<2:22:33, 13.58s/it]

Error extracting text from http://www.post-gazette.com/news/politics-nation/2018/01/30/The-Supreme-Court-may-have-signaled-that-it-might-block-Pennsylvania-s-ruling-against-partisan-gerrymandering/stories/201801290222: HTTPConnectionPool(host='www.post-gazette.com', port=80): Max retries exceeded with url: /news/politics-nation/2018/01/30/The-Supreme-Court-may-have-signaled-that-it-might-block-Pennsylvania-s-ruling-against-partisan-gerrymandering/stories/201801290222 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x307f0eff0>, 'Connection to www.post-gazette.com timed out. (connect timeout=60)'))


Processing URLs:  37%|███▋      | 371/1000 [28:26<1:51:24, 10.63s/it]

Error extracting text from http://www.democraticaudit.com/?p=17107: 403 Client Error: Forbidden for url: http://www.democraticaudit.com/?p=17107


Processing URLs:  37%|███▋      | 374/1000 [28:27<49:40,  4.76s/it]  

Error extracting text from https://www.nytimes.com/2018/02/13/world/middleeast/netanyahu-israel-corruption.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/13/world/middleeast/netanyahu-israel-corruption.html
URL filtered: https://m.youtube.com/watch?v=qoz5Za-LBDU
Error extracting text from http://www.nytimes.com/2015/10/04/opinion/the-power-of-precise-predictions.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/04/opinion/the-power-of-precise-predictions.html


Processing URLs:  38%|███▊      | 375/1000 [28:28<41:53,  4.02s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-08/u-s-missile-defense-system-could-reshape-north-asia-security


Processing URLs:  38%|███▊      | 379/1000 [28:32<21:17,  2.06s/it]

Error extracting text from http://thehill.com/policy/international/201132-republicans-demand-obama-get-tougher-with-putin-on-ukraine: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/201132-republicans-demand-obama-get-tougher-with-putin-on-ukraine/


Processing URLs:  38%|███▊      | 384/1000 [28:38<12:28,  1.21s/it]

Error extracting text from http://www.wsj.com/articles/china-completes-runway-on-artificial-island-in-south-china-sea-1443184818?tesla=y: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-completes-runway-on-artificial-island-in-south-china-sea-1443184818?tesla=y


Processing URLs:  39%|███▊      | 387/1000 [28:59<47:14,  4.62s/it]  

Error extracting text from https://www.washingtonpost.com/politics/trump-welcomes-president-of-finland-to-white-house/2017/08/28/fa2c2092-8c29-11e7-9c53-6a169beb0953_story.html?utm_term=.b49404d83ad1: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/trump-welcomes-president-of-finland-to-white-house/2017/08/28/fa2c2092-8c29-11e7-9c53-6a169beb0953_story.html?utm_term=.b49404d83ad1
Error extracting text from http://www.wsj.com/articles/greeces-2017-budget-plan-sticks-with-robust-growth-forecast-1475508075: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/greeces-2017-budget-plan-sticks-with-robust-growth-forecast-1475508075


Processing URLs:  39%|███▉      | 389/1000 [29:00<25:47,  2.53s/it]

Error extracting text from http://www.tradearabia.com/news/INTNEWS_300815.html: 400 Client Error: Bad Request for url: http://www.tradearabia.com/news/INTNEWS_300815.html


Processing URLs:  39%|███▉      | 392/1000 [29:03<14:21,  1.42s/it]

Error extracting text from http://www.scotsman.com/news/politics/snp-to-target-soft-no-voters-in-new-scottish-independence-drive-1-4215038: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/snp-to-target-soft-no-voters-in-new-scottish-independence-drive-1-4215038


Processing URLs:  39%|███▉      | 393/1000 [29:07<21:20,  2.11s/it]

Error extracting text from http://www.ambafrance-ir.org/-L-Ambassadeur-: 429 Client Error: Too Many Requests for url: https://ir.ambafrance.org/-L-Ambassadeur-


Processing URLs:  40%|████      | 404/1000 [29:31<19:13,  1.94s/it]

Error extracting text from http://en.trend.az/business/economy/2531660.html: 404 Client Error: Not Found for url: https://www.trend.az/business/economy/2531660.html


Processing URLs:  41%|████      | 408/1000 [29:37<13:18,  1.35s/it]

Error extracting text from http://thehill.com/campaign-polls/362305-poll-roy-moore-up-5-points-on-dem-opponent: 403 Client Error: Forbidden for url: https://thehill.com/campaign-polls/362305-poll-roy-moore-up-5-points-on-dem-opponent/


Processing URLs:  41%|████      | 410/1000 [29:37<08:40,  1.13it/s]

Error extracting text from http://thehill.com/policy/energy-environment/256313-supporters-of-ending-the-oil-export-ban-eye-a-deal: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/256313-supporters-of-ending-the-oil-export-ban-eye-a-deal/


Processing URLs:  41%|████▏     | 413/1000 [30:09<1:19:14,  8.10s/it]

Error extracting text from https://www.investopedia.com/articles/economics/08/determining-oil-prices.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/economics/08/determining-oil-prices.asp


Processing URLs:  42%|████▏     | 415/1000 [30:11<42:37,  4.37s/it]  

Error extracting text from https://www.wsj.com/articles/covid-19-coronavirus-lab-leak-virology-origins-pandemic-11633462827?mod=djemalertNEWS: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/covid-19-coronavirus-lab-leak-virology-origins-pandemic-11633462827?mod=djemalertNEWS


Processing URLs:  42%|████▏     | 418/1000 [30:16<27:38,  2.85s/it]

Error extracting text from http://www.businessinsider.com/r-spacex-could-be-grounded-for-9-12-months-ula-chief-2016-9?IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-spacex-could-be-grounded-for-9-12-months-ula-chief-2016-9?IR=T


Processing URLs:  42%|████▏     | 419/1000 [30:17<21:01,  2.17s/it]

Error extracting text from http://thehill.com/homenews/campaign/363693-gop-strategist-donates-to-alabama-democrat: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/363693-gop-strategist-donates-to-alabama-democrat/


Processing URLs:  42%|████▏     | 421/1000 [30:19<15:39,  1.62s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0RO01320150924: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0RO01320150924


Processing URLs:  42%|████▎     | 425/1000 [30:26<14:04,  1.47s/it]

Error extracting text from http://www.wsj.com/articles/philippines-fidel-ramos-leaves-for-china-monday-for-south-china-sea-talks-1470565429: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/philippines-fidel-ramos-leaves-for-china-monday-for-south-china-sea-talks-1470565429
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-fighting-idUSKBN1370DY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-fighting-idUSKBN1370DY


Processing URLs:  43%|████▎     | 427/1000 [30:27<10:14,  1.07s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-economy-idUSKCN1062PH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-economy-idUSKCN1062PH


Processing URLs:  43%|████▎     | 433/1000 [30:31<07:46,  1.21it/s]

Error extracting text from http://abokifx.com/south-africa-reclaims-spot-africas-second-biggest-economy-business-tech/: 404 Client Error: Not Found for url: https://abokifx.com/south-africa-reclaims-spot-africas-second-biggest-economy-business-tech/


Processing URLs:  43%|████▎     | 434/1000 [30:33<10:40,  1.13s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-russia-casualtie/russian-toll-in-syria-battle-was-300-killed-and-wounded-sources-idUSKCN1FZ2DZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-casualtie/russian-toll-in-syria-battle-was-300-killed-and-wounded-sources-idUSKCN1FZ2DZ


Processing URLs:  44%|████▍     | 442/1000 [30:41<09:58,  1.07s/it]

Error extracting text from https://www.nytimes.com/2017/05/12/world/americas/venezuela-protests-maduro.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/05/12/world/americas/venezuela-protests-maduro.html
URL filtered: https://www.youtube.com/watch?v=fis-9Zqu2Ro


Processing URLs:  44%|████▍     | 445/1000 [30:43<07:57,  1.16it/s]

Error extracting text from http://nationalinterest.org/feature/hezbollah-winning-the-war-syria-19229: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/hezbollah-winning-the-war-syria-19229


Processing URLs:  45%|████▍     | 446/1000 [30:44<09:03,  1.02it/s]

Error extracting text from http://www.newsweek.com/911-lawsuit-saudi-bin-laden-583984: 403 Client Error: Forbidden for url: https://www.newsweek.com/911-lawsuit-saudi-bin-laden-583984


Processing URLs:  45%|████▍     | 447/1000 [30:46<10:09,  1.10s/it]

URL filtered: https://twitter.com/PeterYoachim/status/898616685086691328


Processing URLs:  45%|████▌     | 451/1000 [30:49<07:50,  1.17it/s]

Error extracting text from http://www.autoevolution.com/news/more-volkswagen-execs-knew-about-dieselgate-as-early-as-2006-103923.html: 403 Client Error: Forbidden for url: https://www.autoevolution.com/news/more-volkswagen-execs-knew-about-dieselgate-as-early-as-2006-103923.html


Processing URLs:  45%|████▌     | 454/1000 [30:53<09:32,  1.05s/it]

Error extracting text from http://thehill.com/policy/defense/282825-house-gop-to-roll-out-national-security-agenda: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/282825-house-gop-to-roll-out-national-security-agenda/


Processing URLs:  46%|████▌     | 456/1000 [30:55<09:50,  1.09s/it]

Error extracting text from http://ecodiario.eleconomista.es/politica/noticias/7680436/07/16/La-alternativa-del-PSOE-para-que-haya-investidura-ceder-algunos-de-sus-diputados-a-Rajoy.html: 403 Client Error: Forbidden for url: http://ecodiario.eleconomista.es/politica/noticias/7680436/07/16/La-alternativa-del-PSOE-para-que-haya-investidura-ceder-algunos-de-sus-diputados-a-Rajoy.html


Processing URLs:  46%|████▌     | 458/1000 [30:58<10:04,  1.12s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/nato-chief-praises-montenegro-progress-membership-bid-34489449: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/nato-chief-praises-montenegro-progress-membership-bid-34489449


Processing URLs:  46%|████▋     | 463/1000 [31:24<31:18,  3.50s/it]

Error extracting text from http://www.cnbc.com/2016/08/16/reuters-america-corrected-update-1-foreigners-sell-us-treasuries-for-3rd-month-in-june-data.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/08/16/reuters-america-corrected-update-1-foreigners-sell-us-treasuries-for-3rd-month-in-june-data.html
URL filtered: https://www.youtube.com/watch?v=UlUVeaZ14AQ


Processing URLs:  47%|████▋     | 468/1000 [31:28<11:18,  1.27s/it]

Error extracting text from https://www.fire.ca.gov/stats-events/: 403 Client Error: Forbidden for url: https://www.fire.ca.gov/stats-events/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-russia-idUSKBN17F2GH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-russia-idUSKBN17F2GH


Processing URLs:  47%|████▋     | 471/1000 [31:35<13:53,  1.57s/it]

Error extracting text from https://www.reuters.com/article/us-saundersonmeyer-southafrica-commentar/commentary-what-south-africas-ramaphosa-must-do-next-idUSKBN1EC2S8?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-12-19&amp;utm_term=US%20Reuters%20News%20Now: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saundersonmeyer-southafrica-commentar/commentary-what-south-africas-ramaphosa-must-do-next-idUSKBN1EC2S8?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=US%20Reuters%20News%20Now%202017-12-19&amp;utm_term=US%20Reuters%20News%20Now


Processing URLs:  48%|████▊     | 476/1000 [31:39<06:38,  1.31it/s]

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3779404: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3779404
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-mood-idUSKBN13B18V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-mood-idUSKBN13B18V


Processing URLs:  48%|████▊     | 479/1000 [31:42<06:07,  1.42it/s]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://maringa.odiario.com/politica/2016/03/oposicao-pede-renuncia-de-dilma-para-aproximar-pmdb-de-impeachment/2098388/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://maringa.odiario.com/politica/2016/03/oposicao-pede-renuncia-de-dilma-para-aproximar-pmdb-de-impeachment/2098388/&amp;prev=search


Processing URLs:  49%|████▉     | 488/1000 [31:57<08:16,  1.03it/s]

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=998878: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=998878


Processing URLs:  49%|████▉     | 489/1000 [31:57<07:33,  1.13it/s]

Error extracting text from http://beforeitsnews.com/alternative/2016/11/the-economist-the-world-in-2017-cover-leaked-ominous-nuclear-death-on-tarot-cards-3441270.html: 404 Client Error: Not Found for url: https://beforeitsnews.com/alternative/2016/11/the-economist-the-world-in-2017-cover-leaked-ominous-nuclear-death-on-tarot-cards-3441270.html


Processing URLs:  49%|████▉     | 492/1000 [32:02<11:01,  1.30s/it]

Error extracting text from http://www.businessinsider.com/r-esms-regling-says-confident-imf-will-participate-in-greek-program--2015-9: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-esms-regling-says-confident-imf-will-participate-in-greek-program--2015-9


Processing URLs:  50%|████▉     | 495/1000 [32:17<26:10,  3.11s/it]

URL filtered: http://www.bloombergview.com/articles/2015-10-19/trump-candidacy-will-fade-as-other-republicans-rise


Processing URLs:  50%|████▉     | 499/1000 [32:22<16:34,  1.99s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/forecasts-cases.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/forecasts-cases.html


Processing URLs:  50%|█████     | 502/1000 [32:27<13:34,  1.64s/it]

Error extracting text from https://www.straitstimes.com/world/united-states/several-drugmakers-working-on-oral-medication-to-treat-covid-19: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  50%|█████     | 504/1000 [32:30<12:11,  1.47s/it]

Error extracting text from http://apne.ws/1SMYGJN: 404 Client Error: Not Found for url: http://trib.al/1SMYGJN


Processing URLs:  51%|█████     | 509/1000 [32:36<11:04,  1.35s/it]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-usa-prisoners-insight-idUSKCN0UR01R20160113: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-usa-prisoners-insight-idUSKCN0UR01R20160113


Processing URLs:  51%|█████     | 510/1000 [32:38<10:46,  1.32s/it]

Error extracting text from http://www.sandandgravel.com/news/article.asp?v1=22866: 404 Client Error: Not Found for url: https://www.clarksons.net/wfr/dredgers/news/article.asp?v1=22866
URL filtered: https://www.bloomberg.com/politics/articles/2017-01-15/trump-calls-nato-obsolete-and-dismisses-eu-in-german-interview?cmpid=socialflow-twitter-business&amp;utm_content=business&amp;utm_campaign=socialflow-organic&amp;utm_source=twitter&amp;utm_medium=social


Processing URLs:  51%|█████     | 512/1000 [32:41<12:13,  1.50s/it]

Error extracting text from http://www.bcv.org.ve/: HTTPSConnectionPool(host='www.bcv.org.ve', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  52%|█████▏    | 519/1000 [32:50<07:27,  1.08it/s]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-info.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fcases-updates%2Fvariant-surveillance%2Fvariant-info.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-info.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fcases-updates%2Fvariant-surveillance%2Fvariant-info.html
Error extracting text from http://www.reuters.com/article/us-turkey-security-police-idUSKCN12408Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-security-police-idUSKCN12408Z


Processing URLs:  52%|█████▏    | 521/1000 [33:51<2:20:08, 17.55s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2017-11-13/spain-to-brief-eu-on-alleged-cyber-meddling-in-catalonia: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
URL filtered: https://www.youtube.com/watch?v=vFr3K2DORc8&amp;internalcountrycode=JP


Processing URLs:  53%|█████▎    | 526/1000 [33:58<39:54,  5.05s/it]  

Error extracting text from https://www.socialeurope.eu/2016/01/turkeys-military-onslaught-kurds-europe-us-silent/: 403 Client Error: Forbidden for url: https://www.socialeurope.eu/2016/01/turkeys-military-onslaught-kurds-europe-us-silent/


Processing URLs:  53%|█████▎    | 527/1000 [34:00<33:57,  4.31s/it]

Error extracting text from http://www.globeatnight.org/infographic/2015: 404 Client Error: Not Found for url: https://globeatnight.org/infographic/2015/


Processing URLs:  53%|█████▎    | 529/1000 [34:02<20:55,  2.67s/it]

Error extracting text from http://www.tradingeconomics.com/egypt/gdp-growth-annual/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/egypt/gdp-growth-annual/forecast


Processing URLs:  53%|█████▎    | 531/1000 [34:06<17:21,  2.22s/it]

URL filtered: https://www.youtube.com/watch?v=tyop0d30UqQ
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NZ1CYM6K50YB01-27VV5EC1KUCFQOLS0JQ637N5U6


Processing URLs:  53%|█████▎    | 534/1000 [34:09<12:45,  1.64s/it]

Error extracting text from http://www.protecdental.com/news/sugar-tax-could-lower-dental-caries-and-health-care-expenses: 404 Client Error: Not Found for url: https://protecdental.com/news/sugar-tax-could-lower-dental-caries-and-health-care-expenses


Processing URLs:  54%|█████▎    | 536/1000 [34:15<15:19,  1.98s/it]

Error extracting text from https://larswericson.wordpress.com/2016/04/21/gitrep-20apr16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/04/21/gitrep-20apr16pm/


Processing URLs:  54%|█████▎    | 537/1000 [34:15<12:22,  1.60s/it]

Error extracting text from http://www.chinapost.com.tw/asia/2017/08/17/500055/p2/yingluck-verdict.htm: HTTPConnectionPool(host='www.chinapost.com.tw', port=80): Max retries exceeded with url: /asia/2017/08/17/500055/p2/yingluck-verdict.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fbdef320>: Failed to resolve 'www.chinapost.com.tw' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  54%|█████▍    | 539/1000 [34:20<15:14,  1.98s/it]

Error extracting text from http://www.blog.tommullins.com.au/2015/03/elo-ratings-for-one-day-international.html: HTTPConnectionPool(host='www.blog.tommullins.com.au', port=80): Max retries exceeded with url: /2015/03/elo-ratings-for-one-day-international.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fbdee9f0>: Failed to resolve 'www.blog.tommullins.com.au' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  54%|█████▍    | 543/1000 [34:22<06:55,  1.10it/s]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0Y81JL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0Y81JL
Error extracting text from http://www.washingtontimes.com/news/2016/sep/28/us-deploys-more-troops-to-iraq-for-showdown-with-i/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/sep/28/us-deploys-more-troops-to-iraq-for-showdown-with-i/
URL filtered: https://www.youtube.com/watch?v=7p8H_ESwrWg


Processing URLs:  55%|█████▍    | 548/1000 [34:30<10:53,  1.45s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_107711.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_107711.htm


error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserErro

Error extracting text from http://infobrics.org/blog/news/2016/11/08/20660/: Document is empty


Processing URLs:  55%|█████▌    | 552/1000 [34:35<10:16,  1.38s/it]

Error extracting text from http://www.ibtimes.com/iran-us-navy-dispute-live-updates-iranian-military-holds-10-american-sailors-irans-2262162: 403 Client Error: Forbidden for url: https://www.ibtimes.com/iran-us-navy-dispute-live-updates-iranian-military-holds-10-american-sailors-irans-2262162


Processing URLs:  56%|█████▌    | 557/1000 [34:41<10:49,  1.47s/it]

Error extracting text from http://www.atimes.com/indo-russian-supersonic-missile-threatens-chinas-security/: 404 Client Error: Not Found for url: https://atimes.com/indo-russian-supersonic-missile-threatens-chinas-security/


Processing URLs:  56%|█████▌    | 560/1000 [34:47<13:36,  1.86s/it]

URL filtered: https://www.youtube.com/watch?v=K34m6AaFTg8


Processing URLs:  56%|█████▋    | 565/1000 [34:51<07:27,  1.03s/it]

Error extracting text from https://thehill.com/opinion/538105-budowsky-joe-manchins-rendezvous-with-destiny: 403 Client Error: Forbidden for url: https://thehill.com/opinion/538105-budowsky-joe-manchins-rendezvous-with-destiny/


Processing URLs:  57%|█████▋    | 567/1000 [34:53<06:58,  1.04it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/organization-american-states-head-moves-venezuela-39502351: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/organization-american-states-head-moves-venezuela-39502351


Processing URLs:  57%|█████▋    | 569/1000 [35:28<1:14:59, 10.44s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/18920-constitution-debate-divides-ma-ba-tha.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/18920-constitution-debate-divides-ma-ba-tha.html


Processing URLs:  57%|█████▋    | 572/1000 [35:34<33:22,  4.68s/it]  

Error extracting text from http://www.cdm.me/politika/istrazivanje-ipsosa-za-nato-52-odsto-gradana: 403 Client Error: Forbidden for url: https://www.cdm.me/politika/istrazivanje-ipsosa-za-nato-52-odsto-gradana


Processing URLs:  57%|█████▊    | 575/1000 [35:35<13:02,  1.84s/it]

Error extracting text from http://www.google.com/trends/explore#cat=0-18&amp;q=iphone%206&amp;gprop=froogle&amp;cmpt=date&amp;tz=Etc%2FGMT%2B4: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#cat=0-18&amp;q=iphone%206&amp;gprop=froogle&amp;cmpt=date&amp;tz=Etc%2FGMT%2B4
Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950530001150: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950530001150 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304723cb0>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://blogs.wsj.com/washwire/2015/12/07/how-long-will-reporter-jason-rezaian-be-imprisoned-in-iran/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/washwire/2015/12/07/how-long-will-reporter-jason-rezaian-be-imprisoned-in-iran/


Processing URLs:  58%|█████▊    | 577/1000 [35:36<08:22,  1.19s/it]

Error extracting text from http://www.reuters.com/article/us-un-election-idUSKCN125099: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-election-idUSKCN125099


Processing URLs:  58%|█████▊    | 578/1000 [35:36<07:12,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-global-policy-divergence-idUSKBN0TJ22720151130: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-global-policy-divergence-idUSKBN0TJ22720151130


Processing URLs:  58%|█████▊    | 581/1000 [35:42<11:08,  1.60s/it]

Error extracting text from http://1tv.ge/en/news/view/125980.html: 404 Client Error: Not Found for url: https://1tv.ge/en/news/view/125980.html


Processing URLs:  58%|█████▊    | 583/1000 [35:44<10:25,  1.50s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/the-latest-sen-mccain-says-hell-support-rex-tillerson/articleshow/56719875.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/the-latest-sen-mccain-says-hell-support-rex-tillerson/articleshow/56719875.cms


Processing URLs:  58%|█████▊    | 585/1000 [35:45<06:25,  1.08it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKCN18A12B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKCN18A12B


Processing URLs:  59%|█████▊    | 586/1000 [35:45<05:07,  1.35it/s]

Error extracting text from https://www.nytimes.com/2018/01/28/us/politics/rod-rosenstein-carter-page-secret-memo.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/28/us/politics/rod-rosenstein-carter-page-secret-memo.html


Processing URLs:  59%|█████▉    | 588/1000 [35:50<10:34,  1.54s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NYYMTU6K50XV01-1Q5CJ2M85SO432OMHP42BLS2RM


Processing URLs:  59%|█████▉    | 592/1000 [35:53<07:00,  1.03s/it]

Error extracting text from http://thehill.com/homenews/administration/345111-trump-you-can-thank-congress-for-us-russia-relationship-at-all-time: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/345111-trump-you-can-thank-congress-for-us-russia-relationship-at-all-time/


Processing URLs:  59%|█████▉    | 593/1000 [35:54<05:33,  1.22it/s]

Error extracting text from http://www.balkans.com/open-news.php?uniquenumber=210706: 404 Client Error: Not Found for url: http://www.balkans.com/open-news.php?uniquenumber=210706


Processing URLs:  59%|█████▉    | 594/1000 [35:54<05:14,  1.29it/s]

Error extracting text from http://agingbiotech.info/companies/: 403 Client Error: Forbidden for url: http://agingbiotech.info/companies/


Processing URLs:  60%|█████▉    | 597/1000 [35:58<07:26,  1.11s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/taliban-myanmar-junta-unlikely-be-let-into-un-now-diplomats-2021-12-01/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/taliban-myanmar-junta-unlikely-be-let-into-un-now-diplomats-2021-12-01/


Processing URLs:  60%|██████    | 602/1000 [36:01<04:09,  1.59it/s]

Error extracting text from https://www.un.org/development/desa/dpad/publication/world-economic-situation-and-prospects-february-2021-briefing-no-146/: 403 Client Error: Forbidden for url: https://www.un.org/development/desa/dpad/publication/world-economic-situation-and-prospects-february-2021-briefing-no-146/
Error extracting text from http://www.washingtontimes.com/news/2017/feb/23/hundreds-scientists-urge-trump-withdraw-un-climate/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/feb/23/hundreds-scientists-urge-trump-withdraw-un-climate/


Processing URLs:  61%|██████    | 609/1000 [36:11<06:58,  1.07s/it]

Error extracting text from https://www.humboldtforum.org/en/programme-2/: 403 Client Error: Forbidden for url: https://www.humboldtforum.org/en/programme-2/
Error extracting text from http://www.balkaninsight.com/en/article/serbian-church-urges-montenegro-to-hold-referendum-on-nato-01-04-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/serbian-church-urges-montenegro-to-hold-referendum-on-nato-01-04-2016


Processing URLs:  61%|██████▏   | 614/1000 [36:18<08:19,  1.29s/it]

Error extracting text from https://tradingeconomics.com/commodity/baltic: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/commodity/baltic


Processing URLs:  62%|██████▏   | 616/1000 [36:32<23:39,  3.70s/it]

Error extracting text from http://www.bankofengland.co.uk/statistics/pages/yieldcurve/default.aspx: 500 Server Error: Internal Server Error for url: https://www.bankofengland.co.uk/statistics/pages/yieldcurve/default.aspx


Processing URLs:  62%|██████▏   | 619/1000 [36:34<10:25,  1.64s/it]

Error extracting text from https://www.wsj.com/articles/dreamers-fate-hangs-over-efforts-to-avert-government-shutdown-1512642601: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/dreamers-fate-hangs-over-efforts-to-avert-government-shutdown-1512642601


Processing URLs:  62%|██████▎   | 625/1000 [36:44<08:23,  1.34s/it]

Error extracting text from http://larepublica.pe/politica/772365-marcha-no-keiko-miles-se-manifiestan-hoy-en-el-peru-y-el-mundo: 403 Client Error: Forbidden for url: https://larepublica.pe/politica/772365-marcha-no-keiko-miles-se-manifiestan-hoy-en-el-peru-y-el-mundo


Processing URLs:  63%|██████▎   | 628/1000 [36:49<09:31,  1.54s/it]

Error extracting text from http://thehill.com/homenews/administration/353211-mueller-begins-interviewing-white-house-staff-in-collusion-probe: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/353211-mueller-begins-interviewing-white-house-staff-in-collusion-probe/


Processing URLs:  63%|██████▎   | 632/1000 [37:01<13:45,  2.24s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-04/global-military-spending-rose-in-2015-stockholm-peace-institute


Processing URLs:  63%|██████▎   | 634/1000 [37:02<09:28,  1.55s/it]

Error extracting text from https://slatestarcodex.com/2018/01/03/ssc-survey-results-2018/: 403 Client Error: Forbidden for url: https://slatestarcodex.com/2018/01/03/ssc-survey-results-2018/


Processing URLs:  64%|██████▍   | 642/1000 [37:11<05:16,  1.13it/s]

Error extracting text from http://warontherocks.com/2016/01/do-we-know-what-we-are-doing-in-afghanistan-this-year/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/01/do-we-know-what-we-are-doing-in-afghanistan-this-year/


Processing URLs:  64%|██████▍   | 645/1000 [37:13<03:42,  1.59it/s]

Error extracting text from http://www.nytimes.com/2016/05/19/business/tesla-to-offer-2-billion-in-stock-to-meet-model-3-production-goal.html?emc=edit_th_20160519&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/19/business/tesla-to-offer-2-billion-in-stock-to-meet-model-3-production-goal.html?emc=edit_th_20160519&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  65%|██████▍   | 647/1000 [37:16<05:38,  1.04it/s]

Error extracting text from https://www.aa.com.tr/en/middle-east/iran-s-zarif-speaks-with-hamas-chief-pledges-support/2236280: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Error extracting text from http://www.nytimes.com/2015/10/20/opinion/putins-partition-plan-for-syria.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/20/opinion/putins-partition-plan-for-syria.html?_r=0


Processing URLs:  65%|██████▍   | 648/1000 [37:16<04:32,  1.29it/s]

Error extracting text from https://psmag.com/news/how-an-election-can-be-hacked: 403 Client Error: Forbidden for url: https://psmag.com/news/how-an-election-can-be-hacked


Processing URLs:  65%|██████▌   | 650/1000 [37:18<04:39,  1.25it/s]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/US-agrees-spy-plane-deployment-in-Singapore-amid-China-tensions?page=2: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/US-agrees-spy-plane-deployment-in-Singapore-amid-China-tensions?page=2


Processing URLs:  65%|██████▌   | 653/1000 [37:21<04:41,  1.23it/s]

Error extracting text from http://www.timesofisrael.com/all-the-presidents-arguments-an-afternoon-with-abbas/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/all-the-presidents-arguments-an-afternoon-with-abbas/


Processing URLs:  65%|██████▌   | 654/1000 [37:39<33:52,  5.87s/it]

Error extracting text from http://www.ew.com/recap/the-good-wife-season-7-episode-11: 406 Client Error: Not Acceptable for url: https://www.ew.com/recap/the-good-wife-season-7-episode-11


Processing URLs:  66%|██████▌   | 656/1000 [37:42<21:19,  3.72s/it]

Error extracting text from http://en.portnews.ru/news/248536/: 403 Client Error: Forbidden for url: https://en.portnews.ru/news/248536/


Processing URLs:  66%|██████▌   | 658/1000 [37:46<16:26,  2.89s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/trump-nomination-to-lead-state-picks-up-support-in-senate/articleshow/56724270.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/trump-nomination-to-lead-state-picks-up-support-in-senate/articleshow/56724270.cms


Processing URLs:  66%|██████▌   | 660/1000 [37:47<09:29,  1.67s/it]

Error extracting text from http://www.reuters.com/article/us-israel-palestinians-settlement-idUSKBN1711K6?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-palestinians-settlement-idUSKBN1711K6?il=0


Processing URLs:  67%|██████▋   | 666/1000 [37:55<06:07,  1.10s/it]

Error extracting text from https://www.whitehouse.gov/sites/default/files/docs/jcpoa_key_excerpts.pdf: 404 Client Error: Not Found for url: https://www.whitehouse.gov/sites/default/files/docs/jcpoa_key_excerpts.pdf


Processing URLs:  67%|██████▋   | 668/1000 [37:59<08:21,  1.51s/it]

Error extracting text from https://www.state.gov/e/eb/tfs/spi/iran/jcpoa/: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/


Processing URLs:  67%|██████▋   | 670/1000 [38:07<15:19,  2.78s/it]

Error extracting text from http://www.rollcall.com/news/politics/poll-pennsylvania-senate-race-virtually-tied-katie-mcginty-pat-toomey: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/poll-pennsylvania-senate-race-virtually-tied-katie-mcginty-pat-toomey


Processing URLs:  67%|██████▋   | 672/1000 [38:09<10:17,  1.88s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-11-15/russian-hackers-aren-t-the-nsa-s-biggest-problem


Processing URLs:  68%|██████▊   | 676/1000 [38:11<05:08,  1.05it/s]

Error extracting text from http://europe.autonews.com/article/20150927/ANE/150929837/bosch-warned-vw-about-illegal-software-use-in-diesel-cars-report-says: 403 Client Error: Forbidden for url: https://europe.autonews.com/article/20150927/ANE/150929837/bosch-warned-vw-about-illegal-software-use-in-diesel-cars-report-says


Processing URLs:  68%|██████▊   | 678/1000 [38:14<05:48,  1.08s/it]

Error extracting text from https://joebiden.com/empowerworkers/: 404 Client Error: Not Found for url: https://joebiden.com/empowerworkers/


Processing URLs:  68%|██████▊   | 682/1000 [38:20<07:56,  1.50s/it]

Error extracting text from http://www.broadbandchoices.co.uk/news/2014/04/line-rental-prices-110414: 403 Client Error: Forbidden for url: https://www.broadbandchoices.co.uk/news/2014/04/line-rental-prices-110414


Processing URLs:  69%|██████▊   | 686/1000 [38:24<05:25,  1.04s/it]

Error extracting text from http://www.nytimes.com/2016/01/08/opinion/a-cultural-revolution-in-malaysia.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/08/opinion/a-cultural-revolution-in-malaysia.html


Processing URLs:  69%|██████▉   | 688/1000 [38:28<07:22,  1.42s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-18/china-s-hong-kong-dilemma


Processing URLs:  69%|██████▉   | 690/1000 [38:29<04:38,  1.11it/s]

Error extracting text from http://www.chicagotribune.com/sns-wp-japan-russia-1e3d151e-b179-11e5-b820-eea4d64be2a1-20160102-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/sns-wp-japan-russia-1e3d151e-b179-11e5-b820-eea4d64be2a1-20160102-story.html


Processing URLs:  69%|██████▉   | 692/1000 [38:32<05:43,  1.11s/it]

Error extracting text from http://apanews.net/en/news/ivory-coast-several-casualties-recorded-in-police-school-attack: 403 Client Error: Forbidden for url: https://apanews.net/en/news/ivory-coast-several-casualties-recorded-in-police-school-attack


Processing URLs:  69%|██████▉   | 693/1000 [38:32<05:09,  1.01s/it]

Error extracting text from https://covid19.govt.nz/covid-19-vaccines/covid-19-vaccine-rollout-groups/: 404 Client Error: Not Found for url: https://covid19.govt.nz/covid-19-vaccines/covid-19-vaccine-rollout-groups/


Processing URLs:  69%|██████▉   | 694/1000 [38:37<10:19,  2.02s/it]

URL filtered: https://twitter.com/ChrisCEOHopson/status/1475540046677790723
URL filtered: https://www.youtube.com/watch?v=mzBBBrKfjIQ


Processing URLs:  70%|██████▉   | 697/1000 [38:38<05:29,  1.09s/it]

URL filtered: https://twitter.com/firefoxx66


Processing URLs:  70%|██████▉   | 699/1000 [38:39<04:09,  1.21it/s]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-OEDTAY6K50XT01-67F2N5LCR7CTCB4H9G8D703SI4


Processing URLs:  70%|███████   | 703/1000 [38:42<04:00,  1.23it/s]

Error extracting text from http://www.latimes.com/politics/la-na-pol-neilsen-dhs-secretary-20171011-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-neilsen-dhs-secretary-20171011-story.html


Processing URLs:  71%|███████   | 709/1000 [38:57<05:11,  1.07s/it]

Error extracting text from http://www.nytimes.com/2016/12/01/world/asia/myanmars-leader-faulted-for-silence-as-army-campaigns-against-rohingya.html?emc=edit_ee_20161202&amp;nl=todaysheadlines-europe&amp;nlid=70183565: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/01/world/asia/myanmars-leader-faulted-for-silence-as-army-campaigns-against-rohingya.html?emc=edit_ee_20161202&amp;nl=todaysheadlines-europe&amp;nlid=70183565


Processing URLs:  71%|███████   | 710/1000 [39:08<19:52,  4.11s/it]

Error extracting text from https://www.washingtonpost.com/business/the-latest-putin-avoids-criticizing-trump-climate-decision/2017/06/02/bf888c4a-4797-11e7-8de1-cec59a9bf4b1_story.html?utm_term=.512b20a62a4e: 404 Client Error: Not Found for url: https://www.washingtonpost.com/business/the-latest-putin-avoids-criticizing-trump-climate-decision/2017/06/02/bf888c4a-4797-11e7-8de1-cec59a9bf4b1_story.html?utm_term=.512b20a62a4e


Processing URLs:  71%|███████   | 711/1000 [39:10<16:09,  3.36s/it]

Error extracting text from http://www.who.int/csr/don/17-january-2017-ah7n9-china/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/17-january-2017-ah7n9-china/en/


Processing URLs:  71%|███████▏  | 714/1000 [39:14<09:32,  2.00s/it]

Error extracting text from http://www.maritime-executive.com/article/iranian-oil-exports-near-four-year-high: 404 Client Error: Not Found for url: https://www.maritime-executive.com/403.shtml


Processing URLs:  72%|███████▏  | 715/1000 [39:16<09:09,  1.93s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/community/organizations/business-employers/bars-restaurants.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/community/organizations/business-employers/bars-restaurants.html


Processing URLs:  72%|███████▏  | 719/1000 [39:22<06:10,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-ruling-china-idUSKCN10604M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-ruling-china-idUSKCN10604M


Processing URLs:  72%|███████▏  | 723/1000 [39:25<04:05,  1.13it/s]

Error extracting text from http://english.ahram.org.eg/NewsContent/3/12/254759/Business/Economy/Egypts-economic-growth-rate-up-to--pct-in-Q-of--.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/3/12/254759/Business/Economy/Egypts-economic-growth-rate-up-to--pct-in-Q-of--.aspx


Processing URLs:  73%|███████▎  | 726/1000 [39:29<04:43,  1.03s/it]

Error extracting text from https://www.nytimes.com/2017/10/14/us/voting-russians-hacking-states-.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/14/us/voting-russians-hacking-states-.html


Processing URLs:  73%|███████▎  | 729/1000 [39:33<05:27,  1.21s/it]

URL filtered: https://www.youtube.com/watch?v=Eb1hojMXsTI


Processing URLs:  73%|███████▎  | 732/1000 [39:34<03:40,  1.22it/s]

Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:9a09ad26de2844efb517e3ab087899d8: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:9a09ad26de2844efb517e3ab087899d8 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30748fd10>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/noticia/brasil/apos-prisao-de-marqueteiro-oposicao-tenta-turbinar-manifestacoes-pro-impeachment&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/noticia/brasil/apos-prisao-de-marqueteiro-oposicao-tenta-turbinar-manifestacoes-pro-impeachment&amp;prev=search
Error extracting text from http://www.chinapost.com.tw/china/national-news/2016/04/08/462883/China-Politburo

Processing URLs:  74%|███████▎  | 736/1000 [39:36<02:59,  1.47it/s]

Error extracting text from https://www.hotcars.com/kirov-class-battlecruiser-detailed-look/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  74%|███████▎  | 737/1000 [39:37<02:51,  1.53it/s]

URL filtered: https://www.politico.com/story/2017/10/29/facebook-russia-narrative-244285


Processing URLs:  74%|███████▍  | 743/1000 [39:55<07:57,  1.86s/it]

Error extracting text from http://telegraphvoice.com/2015/12/15/volkswagen-blames-small-group-of-employees-for-dieselgate/: 404 Client Error: Not Found for url: http://telegraphvoice.com/2015/12/15/volkswagen-blames-small-group-of-employees-for-dieselgate/


Processing URLs:  74%|███████▍  | 744/1000 [39:55<06:01,  1.41s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13Z0G9?feedName=topNews&amp;feedType=RSS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN13Z0G9?feedName=topNews&amp;feedType=RSS


Processing URLs:  75%|███████▍  | 746/1000 [39:57<05:06,  1.21s/it]

Error extracting text from http://www.straitstimes.com/sport/olympics-should-move-due-to-zika-concerns-say-150-experts: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  75%|███████▍  | 747/1000 [39:59<05:42,  1.35s/it]

URL filtered: https://www.bloomberg.com/opinion/articles/2021-03-19/russia-vs-ukraine-crimea-s-water-crisis-is-an-impossible-problem-for-putin


Processing URLs:  75%|███████▍  | 749/1000 [40:00<03:45,  1.12it/s]

Error extracting text from http://doi.org/10.1586/17474086.1.1.51: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/full/10.1586/17474086.1.1.51


Processing URLs:  75%|███████▌  | 754/1000 [40:09<06:10,  1.50s/it]

Error extracting text from https://www.africanews.com/2021/03/29/communities-host-displaced-people-and-share-farmland-to-fight-hunger-in-cabo-delgado//: 404 Client Error: Not Found for url: https://www.africanews.com/2021/03/29/communities-host-displaced-people-and-share-farmland-to-fight-hunger-in-cabo-delgado//


Processing URLs:  76%|███████▌  | 757/1000 [40:12<04:39,  1.15s/it]

Error extracting text from http://abcnews.go.com/Technology/wireStory/russias-hackers-week-pry-clinton-camp-50933115: 404 Client Error: Not Found for url: https://abcnews.go.com/Technology/wireStory/russias-hackers-week-pry-clinton-camp-50933115


Processing URLs:  76%|███████▌  | 758/1000 [40:12<03:30,  1.15it/s]

Error extracting text from https://www.wsj.com/articles/crude-rally-fizzles-on-concerns-over-u-s-stockpiles-1487767928: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/crude-rally-fizzles-on-concerns-over-u-s-stockpiles-1487767928


Processing URLs:  76%|███████▌  | 759/1000 [40:12<03:04,  1.31it/s]

URL filtered: https://www.msn.com/en-us/money/companies/twitter-has-permanently-banned-the-corporate-mypillow-account-after-founder-mike-lindell-posted-from-it/ar-BB1djZtV


Processing URLs:  76%|███████▋  | 763/1000 [40:19<05:07,  1.30s/it]

URL filtered: https://twitter.com/canadianpolling/status/1438664964177289217


Processing URLs:  76%|███████▋  | 765/1000 [40:20<03:25,  1.15it/s]

Error extracting text from http://globalriskinsights.com/2016/02/beijing-retains-south-china-sea-as-core-interest/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2016/02/beijing-retains-south-china-sea-as-core-interest/


Processing URLs:  77%|███████▋  | 766/1000 [41:21<58:14, 14.94s/it]

Error extracting text from https://sports.ladbrokes.com/event/politics/uk/uk-politics/boris-johnson-specials/228803216/all-markets: HTTPSConnectionPool(host='sports.ladbrokes.com', port=443): Max retries exceeded with url: /event/politics/uk/uk-politics/boris-johnson-specials/228803216/all-markets (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303b876e0>, 'Connection to sports.ladbrokes.com timed out. (connect timeout=60)'))


Processing URLs:  77%|███████▋  | 769/1000 [41:28<27:27,  7.13s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-lenders-idUSKCN0VI0UH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-lenders-idUSKCN0VI0UH


Processing URLs:  77%|███████▋  | 773/1000 [41:30<08:42,  2.30s/it]

Error extracting text from http://www.heraldnet.com/article/20160110/OPINION/160119952: 403 Client Error: Forbidden for url: http://www.heraldnet.com/article/20160110/OPINION/160119952


Processing URLs:  78%|███████▊  | 776/1000 [41:43<12:51,  3.44s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0042682205005830: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0042682205005830


Processing URLs:  79%|███████▉  | 791/1000 [42:10<07:54,  2.27s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1392AF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKBN1392AF


Processing URLs:  79%|███████▉  | 793/1000 [42:10<04:40,  1.36s/it]

Error extracting text from http://www.humanosphere.org/world-politics/2016/09/hope-that-a-woman-will-be-the-next-u-n-secretary-general-is-fading/: 404 Client Error: Not Found for url: http://www.humanosphere.org/world-politics/2016/09/hope-that-a-woman-will-be-the-next-u-n-secretary-general-is-fading/


Processing URLs:  80%|███████▉  | 798/1000 [42:15<03:51,  1.15s/it]

Error extracting text from http://m.fredericksburg.com/news/news-wire/turkish-foreign-minister-pays-surprise-visit-to-iran/article_b4ab368b-92ff-5e16-bcfe-1394c5805ec8.html?mode=jqm: 404 Client Error: 404 Not Found for url: http://m.fredericksburg.com/news/news-wire/turkish-foreign-minister-pays-surprise-visit-to-iran/article_b4ab368b-92ff-5e16-bcfe-1394c5805ec8.html?mode=jqm


Processing URLs:  80%|████████  | 800/1000 [42:17<03:47,  1.14s/it]

Error extracting text from https://esa.un.org/unpd/wpp/: HTTPSConnectionPool(host='esa.un.org', port=443): Max retries exceeded with url: /unpd/wpp/ (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  80%|████████  | 802/1000 [42:21<05:16,  1.60s/it]

URL filtered: https://twitter.com/GJ_Analytics


Processing URLs:  81%|████████  | 807/1000 [42:24<02:45,  1.17it/s]

Error extracting text from https://www.bbc.co.uk/news/av/world-europe-55840961&amp;gt: 404 Client Error: Not Found for url: https://www.bbc.co.uk/news/av/world-europe-55840961&amp;gt


Processing URLs:  81%|████████  | 809/1000 [42:29<04:10,  1.31s/it]

Error extracting text from http://www.everyonecounts.com/introduction-to-elect/: 404 Client Error: Not Found for url: http://www.everyonecounts.com/introduction-to-elect/


Processing URLs:  81%|████████  | 812/1000 [42:31<03:05,  1.01it/s]

Error extracting text from https://www.reuters.com/world/india/iran-china-russia-hold-naval-drills-north-indian-ocean-2022-01-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/india/iran-china-russia-hold-naval-drills-north-indian-ocean-2022-01-21/


Processing URLs:  81%|████████▏ | 814/1000 [42:32<02:02,  1.52it/s]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2020/10/15/remarks-by-president-charles-michel-following-the-first-working-session-of-the-european-council/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2020/10/15/remarks-by-president-charles-michel-following-the-first-working-session-of-the-european-council/


Processing URLs:  82%|████████▏ | 820/1000 [42:41<04:49,  1.61s/it]

Error extracting text from http://www.aijac.org.au/news/article/khamanei-tries-to-re-write-nuclear-deal: 403 Client Error: Forbidden for url: https://www.aijac.org.au/news/article/khamanei-tries-to-re-write-nuclear-deal


Processing URLs:  82%|████████▏ | 823/1000 [42:43<03:15,  1.11s/it]

Error extracting text from http://warontherocks.com/2016/10/emails-and-influence-investigating-russias-attack-on-the-u-s-political-system/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/10/emails-and-influence-investigating-russias-attack-on-the-u-s-political-system/


Processing URLs:  83%|████████▎ | 826/1000 [42:54<07:07,  2.46s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0XZ0NI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0XZ0NI


Processing URLs:  83%|████████▎ | 829/1000 [42:57<04:02,  1.42s/it]

Error extracting text from https://medium.com/@elias.brockman/will-the-s-p-500-fall-by-more-than-20-in-2017-5005e444bd59: 403 Client Error: Forbidden for url: https://medium.com/@elias.brockman/will-the-s-p-500-fall-by-more-than-20-in-2017-5005e444bd59


Processing URLs:  83%|████████▎ | 830/1000 [42:58<03:26,  1.22s/it]

Error extracting text from https://www.nytimes.com/2017/04/01/world/europe/brexit-scotland-independence-vote.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/01/world/europe/brexit-scotland-independence-vote.html?_r=0


Processing URLs:  83%|████████▎ | 831/1000 [43:01<04:56,  1.75s/it]

Error extracting text from http://cherna.gora.me/news/mvpei-members-will-ratify-the-protocol-by-autumn/: 404 Client Error: Not Found for url: http://cherna.gora.me/news/mvpei-members-will-ratify-the-protocol-by-autumn/


Processing URLs:  83%|████████▎ | 833/1000 [43:08<08:22,  3.01s/it]

Error extracting text from http://www.harvard-dc.org/article.html?aid=1171: 404 Client Error: Not Found for url: https://harvard-dc.org/article.html?aid=1171
URL filtered: https://www.bloomberg.com/news/articles/2017-04-19/iran-may-keep-same-oil-output-if-others-extend-cuts-kuwait-says


Processing URLs:  84%|████████▍ | 838/1000 [43:14<04:20,  1.61s/it]

Error extracting text from http://www.reuters.com/article/us-pakistan-blast-idUSKCN10J0I7?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-pakistan-blast-idUSKCN10J0I7?il=0


Processing URLs:  84%|████████▍ | 842/1000 [43:16<02:25,  1.08it/s]

Error extracting text from http://greece.greekreporter.com/2015/12/07/greece-at-risk-of-colliding-with-creditors-wants-imf-out-of-bailout-program/: 403 Client Error: Forbidden for url: https://greece.greekreporter.com/2015/12/07/greece-at-risk-of-colliding-with-creditors-wants-imf-out-of-bailout-program/


Processing URLs:  84%|████████▍ | 844/1000 [43:17<01:50,  1.41it/s]

Error extracting text from https://www.bls.gov/: 403 Client Error: Forbidden for url: https://www.bls.gov/


Processing URLs:  84%|████████▍ | 845/1000 [43:18<01:49,  1.41it/s]

URL filtered: https://twitter.com/the_47th/status/826086141438865409


Processing URLs:  85%|████████▍ | 848/1000 [43:24<03:43,  1.47s/it]

Error extracting text from http://www.edgar.gov: HTTPConnectionPool(host='www.edgar.gov', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3022d6f90>: Failed to resolve 'www.edgar.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  85%|████████▌ | 851/1000 [43:28<03:22,  1.36s/it]

Error extracting text from http://www.kba.de/SharedDocs/Publikationen/DE/Statistik/Fahrzeuge/FZ/2016_monatlich/FZ11/fz11_2016_04_pdf.pdf?__blob=publicationFile&amp;v=2: 404 Client Error: Not Found for url: https://www.kba.de/SharedDocs/Publikationen/DE/Statistik/Fahrzeuge/FZ/2016_monatlich/FZ11/fz11_2016_04_pdf.pdf?__blob=publicationFile&amp;v=2


Processing URLs:  85%|████████▌ | 853/1000 [43:29<02:49,  1.16s/it]

Error extracting text from http://nationalinterest.org/feature/america-britain-three-priorities-boris-johnson-john-kerry-17268: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/america-britain-three-priorities-boris-johnson-john-kerry-17268


Processing URLs:  86%|████████▌ | 856/1000 [43:32<02:16,  1.05it/s]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/1756805355x0x727007/230F2B72-823C-49E4-99EB-41E2DC490E47/Q4_13_Shareholder_Letter.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/1756805355x0x727007/230F2B72-823C-49E4-99EB-41E2DC490E47/Q4_13_Shareholder_Letter.pdf


Processing URLs:  86%|████████▌ | 857/1000 [43:35<03:16,  1.38s/it]

Error extracting text from https://armscontrolcenter.org/congress-does-not-need-to-review-the-iran-nuclear-deal-again/: 403 Client Error: Forbidden for url: https://armscontrolcenter.org/congress-does-not-need-to-review-the-iran-nuclear-deal-again/


Processing URLs:  86%|████████▌ | 859/1000 [43:36<02:20,  1.00it/s]

Error extracting text from http://uk.mobile.reuters.com/article/idUKKCN0XW1F9?irpc=932: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUKKCN0XW1F9?irpc=932
Error extracting text from http://english.aawsat.com/adel-al-salmi/world-news/threat-war-stokes-conflict-iran: 403 Client Error: Forbidden for url: http://english.aawsat.com/adel-al-salmi/world-news/threat-war-stokes-conflict-iran


Processing URLs:  86%|████████▌ | 861/1000 [43:38<02:20,  1.01s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/348054-russias-propaganda-machine-amplifies-alt-right: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/348054-russias-propaganda-machine-amplifies-alt-right/


Processing URLs:  86%|████████▌ | 862/1000 [43:47<07:52,  3.42s/it]

URL filtered: https://twitter.com/realDonaldTrump/


Processing URLs:  86%|████████▋ | 864/1000 [44:00<10:38,  4.69s/it]

URL filtered: https://www.washingtonpost.com/news/the-switch/wp/2017/01/11/facebook-is-starting-its-own-journalism-project/?utm_term=.bbdbf482cfcb


Processing URLs:  87%|████████▋ | 867/1000 [44:02<05:31,  2.49s/it]

Error extracting text from http://news.yahoo.com/brazils-supreme-court-suspends-rousseff-impeachment-103107077.html: 404 Client Error: Not Found for url: http://news.yahoo.com/brazils-supreme-court-suspends-rousseff-impeachment-103107077.html


Processing URLs:  87%|████████▋ | 872/1000 [44:11<04:16,  2.00s/it]

Error extracting text from http://www.wsj.com/articles/kingdom-comedown-falling-oil-prices-shock-saudi-middle-class-1474623003: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/kingdom-comedown-falling-oil-prices-shock-saudi-middle-class-1474623003


Processing URLs:  88%|████████▊ | 875/1000 [44:15<03:13,  1.55s/it]

Error extracting text from https://www.gov.ie/en/press-release/28e8c1-government-approves-phasing-out-of-fur-farming/?referrer=http://www.agriculture.gov.ie/press/pressreleases/2019/june/title,128816,en.html: 405 Client Error: Not Allowed for url: https://www.gov.ie/en/press-release/28e8c1-government-approves-phasing-out-of-fur-farming/?referrer=http://www.agriculture.gov.ie/press/pressreleases/2019/june/title,128816,en.html


Processing URLs:  88%|████████▊ | 876/1000 [44:15<02:24,  1.17s/it]

Error extracting text from https://www.nytimes.com/2017/04/05/us/politics/filibuster-gorsuch-nomination-republicans.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/05/us/politics/filibuster-gorsuch-nomination-republicans.html?_r=1


Processing URLs:  88%|████████▊ | 879/1000 [44:16<01:04,  1.89it/s]

URL filtered: https://www.youtube.com/watch?feature=player_embedded&amp;v=u8vDElXEMWw
Error extracting text from https://postimg.org/image/favvr1srl/: HTTPSConnectionPool(host='postimg.org', port=443): Max retries exceeded with url: /image/favvr1srl/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x304fcea20>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  88%|████████▊ | 880/1000 [44:17<01:15,  1.60it/s]

Error extracting text from http://www.un.org/press/en/2016/sc12307.doc.htm: 403 Client Error: Forbidden for url: https://www.un.org/press/en/2016/sc12307.doc.htm


Processing URLs:  88%|████████▊ | 881/1000 [44:18<01:36,  1.24it/s]

Error extracting text from http://www.newsweek.com/putin-asked-assad-step-down-syrian-president-418537: 403 Client Error: Forbidden for url: https://www.newsweek.com/putin-asked-assad-step-down-syrian-president-418537
URL filtered: https://m.youtube.com/watch?v=MsbEU7TFBdk


Processing URLs:  88%|████████▊ | 883/1000 [44:19<01:20,  1.45it/s]

Error extracting text from http://www.connectionivoirienne.net/116029/cote-divoire-un-mort-et-trois-blesses-dans-une-manifestation-des-agents-du-complexe-sucrier-de-ferkessedougou: HTTPSConnectionPool(host='www.connectionivoirienne.net', port=443): Max retries exceeded with url: /116029/cote-divoire-un-mort-et-trois-blesses-dans-une-manifestation-des-agents-du-complexe-sucrier-de-ferkessedougou (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  89%|████████▉ | 890/1000 [44:38<03:24,  1.86s/it]

Error extracting text from http://www.reuters.com/article/2015/09/20/us-eurozone-greece-election-idUSKCN0RJ0US20150920: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/20/us-eurozone-greece-election-idUSKCN0RJ0US20150920
URL filtered: https://www.youtube.com/watch?v=phr1pOFK1V8


Processing URLs:  89%|████████▉ | 892/1000 [44:40<02:43,  1.51s/it]

Error extracting text from http://www.sfchronicle.com/news/world/article/Saudi-Arabia-hosting-Syrian-opposition-ahead-of-6674817.php: 404 Client Error: Not Found for url: http://www.sfchronicle.com/news/world/article/Saudi-Arabia-hosting-Syrian-opposition-ahead-of-6674817.php


Processing URLs:  89%|████████▉ | 894/1000 [44:44<02:56,  1.67s/it]

Error extracting text from http://en.parliran.ir: HTTPConnectionPool(host='en.parliran.ir', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fece3d70>: Failed to resolve 'en.parliran.ir' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|████████▉ | 898/1000 [44:52<02:36,  1.53s/it]

Error extracting text from http://ritholtz.com/2016/12/annual-predictions-2017-edition/: 403 Client Error: Forbidden for url: https://ritholtz.com/2016/12/annual-predictions-2017-edition/


Processing URLs:  90%|█████████ | 900/1000 [44:55<02:47,  1.67s/it]

Error extracting text from http://www.wsj.com/articles/u-s-state-department-officials-call-for-strikes-against-syrias-assad-1466121933: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-state-department-officials-call-for-strikes-against-syrias-assad-1466121933


Processing URLs:  90%|█████████ | 902/1000 [45:00<03:16,  2.01s/it]

Error extracting text from http://www.aae.wisc.edu/coxhead/papers/GDN/GDN-8.pdf: 404 Client Error: Not Found for url: https://aae.wisc.edu/coxhead/papers/GDN/GDN-8.pdf


Processing URLs:  91%|█████████ | 906/1000 [45:04<02:14,  1.43s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-09-18/nigeria-lawmakers-to-probe-biafra-separatist-crisis-saraki-says


Processing URLs:  91%|█████████ | 908/1000 [45:05<01:17,  1.19it/s]

Error extracting text from http://www.wsj.com/articles/u-s-bid-to-prosecute-bp-staff-in-gulf-oil-spill-falls-flat-1456532116: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-bid-to-prosecute-bp-staff-in-gulf-oil-spill-falls-flat-1456532116


Processing URLs:  91%|█████████ | 911/1000 [45:06<00:51,  1.74it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKBN0UI20520160105: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKBN0UI20520160105
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NXCUO46JIJV901-1OP9T9RTRF73B3CF12EGG79TCP
Error extracting text from http://www.nytimes.com/2015/10/13/world/asia/us-asia-south-china-sea-patrols.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/13/world/asia/us-asia-south-china-sea-patrols.html


Processing URLs:  92%|█████████▏| 919/1000 [45:21<02:32,  1.89s/it]

URL filtered: https://www.youtube.com/watch?v=GfOjxVMDpoo
Error extracting text from http://www.reuters.com/article/us-usa-mexico-trade-idUSKBN16H27V?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-mexico-trade-idUSKBN16H27V?il=0


Processing URLs:  92%|█████████▏| 922/1000 [45:30<03:23,  2.61s/it]

Error extracting text from https://www.iarpa.gov/challenges/gfchallenge.html: 404 Client Error: Not Found for url: https://www.iarpa.gov/challenges/gfchallenge.html


Processing URLs:  92%|█████████▏| 923/1000 [45:31<02:29,  1.95s/it]

Error extracting text from http://english.alarabiya.net/en/perspective/analysis/2016/09/04/Tapping-the-potential-of-Gulf-China-partnership.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/perspective/analysis/2016/09/04/Tapping-the-potential-of-Gulf-China-partnership.html


Processing URLs:  92%|█████████▏| 924/1000 [45:31<01:58,  1.56s/it]

Error extracting text from https://www.barchart.com/futures/quotes/CBZ21/volatility-greeks/dec-21: 403 Client Error: Forbidden for url: https://www.barchart.com/futures/quotes/CBZ21/volatility-greeks/dec-21


Processing URLs:  93%|█████████▎| 926/1000 [45:34<01:55,  1.56s/it]

Error extracting text from http://yosemite.epa.gov/opa/admpress.nsf/21b8983ffa5d0e4685257dd4006b85e2/dfc8e33b5ab162b985257ec40057813b!OpenDocument: HTTPSConnectionPool(host='yosemite.epa.gov', port=443): Max retries exceeded with url: /opa/admpress.nsf/21b8983ffa5d0e4685257dd4006b85e2/dfc8e33b5ab162b985257ec40057813b!OpenDocument (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  93%|█████████▎| 927/1000 [45:36<01:48,  1.48s/it]

URL filtered: https://twitter.com/kanyewest/status/1279575273365594112


Processing URLs:  93%|█████████▎| 930/1000 [45:38<01:14,  1.07s/it]

URL filtered: https://www.youtube.com/embed/TlqKFlU7YAs&amp;quot
Error extracting text from https://www.opais.co.mz/forcas-conjuntas-recuperam-mocimboa-da-praia-em-cabo-delgado/: HTTPSConnectionPool(host='www.opais.co.mz', port=443): Max retries exceeded with url: /forcas-conjuntas-recuperam-mocimboa-da-praia-em-cabo-delgado/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3050d7470>: Failed to resolve 'www.opais.co.mz' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  94%|█████████▎| 935/1000 [45:42<01:10,  1.08s/it]

Error extracting text from http://mobile.nytimes.com/2015/04/24/us/cash-flowed-to-clinton-foundation-as-russians-pressed-for-control-of-uranium-company.html?_r=1&amp;referer=: 403 Client Error: Forbidden for url: https://www.nytimes.com/2015/04/24/us/cash-flowed-to-clinton-foundation-as-russians-pressed-for-control-of-uranium-company.html?_r=1&amp;referer=


Processing URLs:  94%|█████████▎| 937/1000 [45:48<02:03,  1.97s/it]

Error extracting text from http://www.parl.gc.ca/LegisInfo/Home.aspx?Language=E&amp;BillType=House+Government+Bill&amp;ParliamentSession=42-1&amp;Page=1: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  94%|█████████▍| 938/1000 [45:48<01:38,  1.59s/it]

Error extracting text from http://thehill.com/homenews/house/258431-house-approves-budget-deal: 403 Client Error: Forbidden for url: https://thehill.com/homenews/house/258431-house-approves-budget-deal/


Processing URLs:  94%|█████████▍| 940/1000 [45:49<01:00,  1.00s/it]

Error extracting text from http://www.nytimes.com/2015/12/31/upshot/donald-trumps-strongest-supporters-a-certain-kind-of-democrat.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/31/upshot/donald-trumps-strongest-supporters-a-certain-kind-of-democrat.html?_r=0


Processing URLs:  94%|█████████▍| 942/1000 [45:55<01:45,  1.82s/it]

Error extracting text from http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/s_pv_7739.pdf: 403 Client Error: Forbidden for url: http://www.securitycouncilreport.org/atf/cf/%7B65BFCF9B-6D27-4E9C-8CD3-CF6E4FF96FF9%7D/s_pv_7739.pdf


Processing URLs:  95%|█████████▍| 946/1000 [45:59<00:58,  1.09s/it]

Error extracting text from https://addons.mozilla.org/de/firefox/addon/refcontrol/versions/: 404 Client Error: Not Found for url: https://addons.mozilla.org/de/firefox/addon/refcontrol/versions/


Processing URLs:  95%|█████████▌| 951/1000 [46:08<01:33,  1.92s/it]

Error extracting text from https://www.reuters.com/business/energy/russias-gazprom-says-it-has-completed-nord-stream-2-construction-2021-09-10/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/russias-gazprom-says-it-has-completed-nord-stream-2-construction-2021-09-10/


Processing URLs:  96%|█████████▌| 957/1000 [46:22<01:02,  1.45s/it]

Error extracting text from http://onlinelibrary.wiley.com/doi/10.1002/2015JD023929/full: 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/10.1002/2015JD023929/full


Processing URLs:  96%|█████████▌| 959/1000 [46:23<00:45,  1.12s/it]

Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN163059: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN163059


Processing URLs:  96%|█████████▌| 960/1000 [46:24<00:35,  1.13it/s]

Error extracting text from https://meaningofstrife.files.wordpress.com/2012/06/buddha-happiness.jpg?w=630&amp;h=451: 404 Client Error: Not Found for url: https://meaningofstrife.files.wordpress.com/2012/06/buddha-happiness.jpg?w=630&amp;h=451


Processing URLs:  96%|█████████▌| 961/1000 [46:24<00:31,  1.26it/s]

Error extracting text from http://on.wsj.com/1YTrbd5: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nfl-union-closer-to-deal-stripping-roger-goodell-of-discipline-power-1458001524?mod=wsj_nview_latest


Processing URLs:  97%|█████████▋| 974/1000 [46:46<00:41,  1.59s/it]

Error extracting text from https://cleantechnica.com/2016/11/21/china-electric-car-sales-baic-shines/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/11/21/china-electric-car-sales-baic-shines/


Processing URLs:  98%|█████████▊| 977/1000 [46:55<00:44,  1.96s/it]

Error extracting text from http://www.shape.nato.int/: 403 Client Error: Forbidden for url: http://www.shape.nato.int/


Processing URLs:  98%|█████████▊| 979/1000 [46:55<00:21,  1.03s/it]

Error extracting text from http://www.wsj.com/articles/odebrecht-ex-ceo-sentenced-to-19-years-in-prison-1457449835: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/odebrecht-ex-ceo-sentenced-to-19-years-in-prison-1457449835
Error extracting text from http://www.reuters.com/article/us-yemen-security-usa-emirates-idUSKBN1AJ2UW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-usa-emirates-idUSKBN1AJ2UW


Processing URLs:  98%|█████████▊| 981/1000 [46:55<00:11,  1.68it/s]

Error extracting text from https://www.fxempire.com/forecasts/article/crude-oil-price-forecast-crude-oil-gives-up-early-gains-on-tuesday-749836: 403 Client Error: Forbidden for url: https://www.fxempire.com/forecasts/article/crude-oil-price-forecast-crude-oil-gives-up-early-gains-on-tuesday-749836
Error extracting text from http://www.nytimes.com/2016/06/01/world/middleeast/iran-ali-larijani.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/01/world/middleeast/iran-ali-larijani.html


Processing URLs:  98%|█████████▊| 982/1000 [46:55<00:08,  2.17it/s]

Error extracting text from http://www.reuters.com/article/2015/11/10/usa-money-idUSL1N13510620151110#PEtTJxWDjvcE6wH1.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/10/usa-money-idUSL1N13510620151110#PEtTJxWDjvcE6wH1.97


Processing URLs:  98%|█████████▊| 985/1000 [47:00<00:20,  1.36s/it]

Error extracting text from http://www.gov.me/en/News/158507/Peter-Pavel-Chairman-of-the-NATO-Military-Committee-Montenegro-to-be-valuable-member-of-NATO.html: 404 Client Error: not found for url: https://www.gov.me/en/News/158507/Peter-Pavel-Chairman-of-the-NATO-Military-Committee-Montenegro-to-be-valuable-member-of-NATO.html


Processing URLs:  99%|█████████▊| 987/1000 [47:01<00:12,  1.02it/s]

Error extracting text from http://venturebeat.com/2016/01/26/apple-sees-iphone-sales-grow-just-0-4-to-74-8-million-in-q1-2016-ipad-sales-decline-25/: 403 Client Error: Forbidden for url: https://venturebeat.com/2016/01/26/apple-sees-iphone-sales-grow-just-0-4-to-74-8-million-in-q1-2016-ipad-sales-decline-25/


Processing URLs:  99%|█████████▉| 989/1000 [47:03<00:09,  1.11it/s]

Error extracting text from https://thehill.com/policy/international/587295-thousands-of-russian-troops-withdrawing-from-ukraine-border-report: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/587295-thousands-of-russian-troops-withdrawing-from-ukraine-border-report/


Processing URLs:  99%|█████████▉| 990/1000 [47:07<00:18,  1.85s/it]

Error extracting text from http://www.psaresearch.com/images/TOPMAGAZINES.pdf: 404 Client Error: Not Found for url: https://www.psaresearch.com/images/TOPMAGAZINES.pdf


Processing URLs:  99%|█████████▉| 994/1000 [47:20<00:16,  2.67s/it]

Error extracting text from http://news.investors.com/technology-click/122115-786286-apple-march-quarter-could-be-trough-in-iphone-sales.htm: 403 Client Error: Forbidden for url: https://news.investors.com/technology-click/122115-786286-apple-march-quarter-could-be-trough-in-iphone-sales.htm


Processing URLs: 100%|██████████| 1000/1000 [47:30<00:00,  2.85s/it]
Processing URLs:   0%|          | 4/1000 [00:03<10:51,  1.53it/s]

Error extracting text from http://www.reuters.com/article/us-eurozone-greece-review-idUSKCN0X41R6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-greece-review-idUSKCN0X41R6


Processing URLs:   0%|          | 5/1000 [00:03<07:52,  2.11it/s]

Error extracting text from https://eurovision.tv/event/turin-2022/participants: 403 Client Error: Forbidden for url: https://eurovision.tv/event/turin-2022/participants


Processing URLs:   1%|          | 9/1000 [00:08<12:50,  1.29it/s]

Error extracting text from http://www.nigeriatoday.ng/2016/08/max-du-preez-thinks-weve-just-seen-a-massive-political-shift/: HTTPConnectionPool(host='www.nigeriatoday.ng', port=80): Max retries exceeded with url: /2016/08/max-du-preez-thinks-weve-just-seen-a-massive-political-shift/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023def60>: Failed to resolve 'www.nigeriatoday.ng' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   1%|          | 10/1000 [00:08<10:08,  1.63it/s]

Error extracting text from https://www.nytimes.com/2017/03/22/us/politics/health-care-vote-republicans.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/22/us/politics/health-care-vote-republicans.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:   1%|▏         | 13/1000 [00:11<13:22,  1.23it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-epa-idUSKBN1640S9: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-epa-idUSKBN1640S9


Processing URLs:   2%|▏         | 18/1000 [00:15<09:15,  1.77it/s]

Error extracting text from http://www.wsj.com/articles/gulf-cooperation-council-labels-hezbollah-a-terrorist-group-1456926654: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gulf-cooperation-council-labels-hezbollah-a-terrorist-group-1456926654
Error extracting text from http://www.reuters.com/article/us-usa-trump-budget-idUSKBN16M1DO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-budget-idUSKBN16M1DO


Processing URLs:   2%|▏         | 19/1000 [00:15<07:37,  2.15it/s]

Error extracting text from http://news.yahoo.com/brazil-police-search-ex-presidents-home-corruption-probe-105112080.html: 404 Client Error: Not Found for url: http://news.yahoo.com/brazil-police-search-ex-presidents-home-corruption-probe-105112080.html


Processing URLs:   2%|▎         | 25/1000 [00:38<45:26,  2.80s/it]  

Error extracting text from http://www.reuters.com/article/us-britain-eu-money/cracking-deadlock-on-brexit-bill-may-require-eu-summit-talks-idUSKCN1BC595: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-money/cracking-deadlock-on-brexit-bill-may-require-eu-summit-talks-idUSKCN1BC595


Processing URLs:   3%|▎         | 27/1000 [01:14<2:37:29,  9.71s/it]

Error extracting text from http://thehill.com/blogs/floor-action/senate/267382-senate-turning-to-north-korea-sanctions-next-month: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/267382-senate-turning-to-north-korea-sanctions-next-month/


Processing URLs:   3%|▎         | 32/1000 [01:21<47:13,  2.93s/it]  

Error extracting text from http://www.mediate.com/articles/noll1.cfm: 403 Client Error: Forbidden for url: http://www.mediate.com/articles/noll1.cfm


Processing URLs:   4%|▎         | 35/1000 [01:26<31:15,  1.94s/it]

Error extracting text from http://news.sky.com/story/1613086/ramadi-totally-encircled-by-iraqi-forces: 404 Client Error: Not Found for url: https://news.sky.com/story/1613086/ramadi-totally-encircled-by-iraqi-forces


Processing URLs:   4%|▎         | 37/1000 [02:27<5:06:37, 19.10s/it]

Error extracting text from https://www.economist.com/the-world-ahead/2020/11/16/governments-must-judge-if-the-economic-recovery-needs-more-help: HTTPSConnectionPool(host='www.economist.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   4%|▍         | 40/1000 [02:30<1:52:15,  7.02s/it]

Error extracting text from https://balkaneu.com/north-macedonia-the-first-phase-of-the-census-begins-next-week/: 404 Client Error: Not Found for url: https://balkaneu.com/north-macedonia-the-first-phase-of-the-census-begins-next-week/
Error extracting text from http://www.reuters.com/article/us-iran-europe-rouhani-idUSKCN0V20ON: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-europe-rouhani-idUSKCN0V20ON


Processing URLs:   4%|▍         | 41/1000 [02:30<1:19:26,  4.97s/it]

Error extracting text from https://www.dhs.gov/news/2015/02/17/statement-secretary-jeh-c-johnson-concerning-district-courts-ruling-concerning-dapa: 403 Client Error: Forbidden for url: https://www.dhs.gov/news/2015/02/17/statement-secretary-jeh-c-johnson-concerning-district-courts-ruling-concerning-dapa


Processing URLs:   5%|▌         | 51/1000 [02:54<22:38,  1.43s/it]  

Error extracting text from http://www.reuters.com/article/us-saudi-oil-idUSKCN0XZ0EE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-idUSKCN0XZ0EE


Processing URLs:   5%|▌         | 54/1000 [02:57<15:24,  1.02it/s]

Error extracting text from http://www.basnews.com/index.php/en/news/kurdistan/269177: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/kurdistan/269177


Processing URLs:   6%|▌         | 56/1000 [02:58<13:20,  1.18it/s]

Error extracting text from http://www.data.consilium.europa.eu/doc/document/ST-14359-2020-INIT/en/pdf: HTTPConnectionPool(host='www.data.consilium.europa.eu', port=80): Max retries exceeded with url: /doc/document/ST-14359-2020-INIT/en/pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304b08320>: Failed to resolve 'www.data.consilium.europa.eu' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   6%|▌         | 58/1000 [02:59<10:14,  1.53it/s]

Error extracting text from https://www.middleeastmonitor.com/20191008-yemen-imminent-agreement-to-hand-over-aden-to-saudi-arabia/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20191008-yemen-imminent-agreement-to-hand-over-aden-to-saudi-arabia/


Processing URLs:   6%|▌         | 61/1000 [03:03<16:40,  1.07s/it]

Error extracting text from http://uk.reuters.com/article/saudi-aramco-ipo-regulations-idUKL5N1E12VM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   6%|▋         | 63/1000 [03:05<14:27,  1.08it/s]

Error extracting text from http://news.yahoo.com/saudi-executes-47-including-top-shiite-cleric-075812543.html: 404 Client Error: Not Found for url: http://news.yahoo.com/saudi-executes-47-including-top-shiite-cleric-075812543.html


Processing URLs:   6%|▋         | 65/1000 [03:07<16:19,  1.05s/it]

Error extracting text from http://www.ibtimes.com/dilma-rousseff-approval-ratings-poll-shows-brazilian-president-tax-hike-unpopular-2158668: 403 Client Error: Forbidden for url: https://www.ibtimes.com/dilma-rousseff-approval-ratings-poll-shows-brazilian-president-tax-hike-unpopular-2158668


Processing URLs:   7%|▋         | 71/1000 [03:11<09:28,  1.63it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-22/tesla-says-model-3-on-track-as-quarterly-loss-beats-estimates
Error extracting text from http://www.reuters.com/article/us-netherlands-election-turkey-idUSKBN16F1CN?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-netherlands-election-turkey-idUSKBN16F1CN?il=0


Processing URLs:   7%|▋         | 73/1000 [03:12<07:28,  2.06it/s]

URL filtered: https://www.foreignaffairs.com/articles/turkey/2017-08-02/turkey-and-eus-diplomatic-stalemate?cid=%3Fcid%3Dsoc-twitter-eu_turkey_diplomatic_stalemate_paywall_free-080217


Processing URLs:   8%|▊         | 85/1000 [03:53<42:47,  2.81s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2017-08-09/u-s-imposes-financial-sanctions-on-eight-more-venezuelans


Processing URLs:   9%|▉         | 90/1000 [03:59<25:58,  1.71s/it]

Error extracting text from http://www.takepart.com/article/2016/04/14/big-soda-wins-california-soda-tax-dies-legislature?cmpid=tp-ptnr-huffpost&amp;utm_source=huffpost&amp;utm_medium=partner&amp;utm_campaign=tp-traffic: 404 Client Error: Not Found for url: https://participant.com/article/2016/04/14/big-soda-wins-california-soda-tax-dies-legislature?cmpid=tp-ptnr-huffpost&amp;utm_source=huffpost&amp;utm_medium=partner&amp;utm_campaign=tp-traffic


Processing URLs:   9%|▉         | 91/1000 [03:59<20:15,  1.34s/it]

Error extracting text from http://www.nationmultimedia.com/business/Thailand-India-to-step-up-bid-for-FTA-30286167.html: 404 Client Error: Not Found for url: https://www.nationmultimedia.com/business/Thailand-India-to-step-up-bid-for-FTA-30286167.html


Processing URLs:  10%|▉         | 97/1000 [04:33<2:01:22,  8.07s/it]

Error extracting text from http://www.boydsteamship.com/booking.php: HTTPSConnectionPool(host='boydsteamship.com', port=443): Max retries exceeded with url: /web/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  10%|█         | 100/1000 [05:37<5:21:55, 21.46s/it]

Error extracting text from http://www.sacbee.com/news/state/california/water-and-drought/article56090190.html: HTTPConnectionPool(host='www.sacbee.com', port=80): Read timed out. (read timeout=60)
URL filtered: https://twitter.com/svrf/status/879712735759659009/video/1


Processing URLs:  10%|█         | 103/1000 [05:50<2:47:06, 11.18s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/02/21/736105/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/02/21/736105/story.html


Processing URLs:  11%|█         | 106/1000 [05:54<1:13:52,  4.96s/it]

Error extracting text from http://tass.ru/en/world/847791: 404 Client Error: Not Found for url: https://tass.ru/en/world/847791
Error extracting text from http://www.reuters.com/article/us-france-politics-column-idUSKCN10P0FY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-france-politics-column-idUSKCN10P0FY


Processing URLs:  11%|█         | 107/1000 [05:56<1:01:44,  4.15s/it]

Error extracting text from https://www.reuters.com/article/us-houston-dolls/houston-officials-block-brothel-from-featuring-sex-dolls-idUSKCN1MD2GQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-houston-dolls/houston-officials-block-brothel-from-featuring-sex-dolls-idUSKCN1MD2GQ


Processing URLs:  11%|█         | 112/1000 [06:11<53:17,  3.60s/it]  

Error extracting text from http://www.thenation.com/article/how-private-contractors-have-created-shadow-nsa/: 404 Client Error: Not Found for url: https://www.thenation.com/article/how-private-contractors-have-created-shadow-nsa/


Processing URLs:  11%|█▏        | 114/1000 [06:14<35:35,  2.41s/it]

Error extracting text from https://www.neweurope.eu/article/eu-commission-excludes-grexit-awaits-athens-prior-actions-votes/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/article/eu-commission-excludes-grexit-awaits-athens-prior-actions-votes/


Processing URLs:  12%|█▏        | 115/1000 [06:14<26:26,  1.79s/it]

Error extracting text from http://www.wsj.com/articles/kuwait-names-finance-minister-anas-al-saleh-as-new-oil-minister-1448814869: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/kuwait-names-finance-minister-anas-al-saleh-as-new-oil-minister-1448814869


Processing URLs:  12%|█▏        | 117/1000 [06:16<20:17,  1.38s/it]

Error extracting text from https://www.rbi.com/Cache/IRCache/33ba8475-a7df-ed29-3361-27ba8f6bb380.PDF?O=PDF&T=&Y=&D=&FID=33ba8475-a7df-ed29-3361-27ba8f6bb380&iid=4591210: 403 Client Error: Forbidden for url: https://www.rbi.com/Cache/IRCache/33ba8475-a7df-ed29-3361-27ba8f6bb380.PDF?O=PDF&T=&Y=&D=&FID=33ba8475-a7df-ed29-3361-27ba8f6bb380&iid=4591210


Processing URLs:  12%|█▏        | 120/1000 [06:20<21:06,  1.44s/it]

Error extracting text from http://berkshirehathaway.com/reports.html: 403 Client Error: Forbidden for url: https://berkshirehathaway.com/reports.html


Processing URLs:  12%|█▏        | 121/1000 [06:22<22:13,  1.52s/it]

Error extracting text from http://en.abna24.com/service/africa/archive/2016/03/12/740446/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/africa/archive/2016/03/12/740446/story.html


Processing URLs:  13%|█▎        | 127/1000 [06:31<19:36,  1.35s/it]

Error extracting text from https://beta.ctvnews.ca/national/world/2021/12/28/1_5720949.html).: 404 Client Error: Not Found for url: https://beta.ctvnews.ca/national/world/2021/12/28/1_5720949.html).


Processing URLs:  13%|█▎        | 128/1000 [06:31<16:51,  1.16s/it]

Error extracting text from http://aranews.net/2016/04/isis-captures-ten-villages-syrian-army-forces-southern-aleppo/: 404 Client Error: Not Found for url: http://aranews.net/2016/04/isis-captures-ten-villages-syrian-army-forces-southern-aleppo/


Processing URLs:  13%|█▎        | 130/1000 [06:34<16:42,  1.15s/it]

Error extracting text from http://www.newsweek.com/joe-biden-hillary-clinton-financial-backer-372221: 403 Client Error: Forbidden for url: https://www.newsweek.com/joe-biden-hillary-clinton-financial-backer-372221


Processing URLs:  13%|█▎        | 132/1000 [06:37<22:22,  1.55s/it]

Error extracting text from https://blogs.sciencemag.org/pipeline/archives/2021/04/15/merck-keeps-plowing-on: 403 Client Error: Forbidden for url: https://www.science.org/doi/10.1126/blog-post.17084


Processing URLs:  14%|█▎        | 137/1000 [06:45<18:51,  1.31s/it]

Error extracting text from https://www.reuters.com/article/us-usa-trump-russia-microsoft/microsoft-looks-at-whether-russians-bought-u-s-ads-on-search-engine-idUSKBN1CE2K3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-microsoft/microsoft-looks-at-whether-russians-bought-u-s-ads-on-search-engine-idUSKBN1CE2K3


Processing URLs:  14%|█▍        | 139/1000 [06:48<18:04,  1.26s/it]

Error extracting text from http://www.futuredirections.org.au/publication/south-asian-association-regional-co-operation-part-two-next/: HTTPConnectionPool(host='www.futuredirections.org.au', port=80): Max retries exceeded with url: /publication/south-asian-association-regional-co-operation-part-two-next/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe2712b0>: Failed to resolve 'www.futuredirections.org.au' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  14%|█▍        | 144/1000 [06:59<31:50,  2.23s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-01-27/trump-said-to-not-plan-nafta-withdrawal-notice-in-state-of-union


Processing URLs:  15%|█▍        | 147/1000 [08:08<4:15:31, 17.97s/it]

Error extracting text from https://www.coinschedule.com/: HTTPSConnectionPool(host='www.coinschedule.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ffd5d520>, 'Connection to www.coinschedule.com timed out. (connect timeout=60)'))


Processing URLs:  15%|█▍        | 148/1000 [09:09<6:54:12, 29.17s/it]

Error extracting text from http://dfat.gov.au/trade/agreements/rcep/pages/regional-comprehensive-economic-partnership.aspx: HTTPSConnectionPool(host='www.dfat.gov.au', port=443): Read timed out. (read timeout=60)


Processing URLs:  15%|█▌        | 151/1000 [09:14<2:48:14, 11.89s/it]

Error extracting text from https://www.nytimes.com/2017/12/01/opinion/ehud-barak-israel-netanyahu.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/01/opinion/ehud-barak-israel-netanyahu.html


Processing URLs:  15%|█▌        | 153/1000 [09:19<1:37:41,  6.92s/it]

Error extracting text from http://blog.independencelive.net/did-nicola-sturgeon-just-announce-indyref2/: HTTPConnectionPool(host='blog.independencelive.net', port=80): Max retries exceeded with url: /did-nicola-sturgeon-just-announce-indyref2/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301661a00>: Failed to resolve 'blog.independencelive.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  16%|█▋        | 164/1000 [09:37<15:50,  1.14s/it]  

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0YK0LT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0YK0LT
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-obama-idUSKCN0XI1AH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-obama-idUSKCN0XI1AH


Processing URLs:  17%|█▋        | 166/1000 [09:44<30:44,  2.21s/it]

Error extracting text from https://www.predictit.org/Browse/Group/64/Super-Tuesday: 403 Client Error: Forbidden for url: https://www.predictit.org/Browse/Group/64/Super-Tuesday


Processing URLs:  17%|█▋        | 169/1000 [09:47<20:42,  1.49s/it]

URL filtered: https://www.youtube.com/watch?v=ZDSz00s34to


Processing URLs:  17%|█▋        | 174/1000 [09:53<14:57,  1.09s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/economy-budget/332058-senators-on-both-sides-are-friends-of-the-filibuster: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/economy-budget/332058-senators-on-both-sides-are-friends-of-the-filibuster/


Processing URLs:  18%|█▊        | 183/1000 [10:04<14:02,  1.03s/it]

Error extracting text from http://mennoworld.org/2017/07/17/columns/washington-witness-more-arms-for-nigeria/: 403 Client Error: Forbidden for url: https://anabaptistworld.org
Error extracting text from http://www.france24.com/en/20160411-held-raqa-mosul-must-fall-2016-french-defence-chief: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160411-held-raqa-mosul-must-fall-2016-french-defence-chief


Processing URLs:  19%|█▊        | 186/1000 [11:08<4:19:06, 19.10s/it]

Error extracting text from http://thestandard.com.ph/opinion/columns/back-channel-by-alejandro-del-rosario/210754/saving-face-and-its-consequences.html: HTTPConnectionPool(host='thestandard.com.ph', port=80): Max retries exceeded with url: /opinion/columns/back-channel-by-alejandro-del-rosario/210754/saving-face-and-its-consequences.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ffd5fc50>, 'Connection to thestandard.com.ph timed out. (connect timeout=60)'))


Processing URLs:  20%|█▉        | 195/1000 [11:23<29:08,  2.17s/it]  

Error extracting text from http://www.wsj.com/articles/iaea-finds-some-iranian-nuclear-weapons-activity-continued-after-2003-1449078659: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iaea-finds-some-iranian-nuclear-weapons-activity-continued-after-2003-1449078659


Processing URLs:  20%|█▉        | 197/1000 [11:25<20:34,  1.54s/it]

Error extracting text from http://1tvnews.af/en/news/afghanistan/22113-afghan-unity-government-leaders-agree-on-nominees-for-defense-minister-spy-chief: 406 Client Error: Not Acceptable for url: http://1tvnews.af/en/news/afghanistan/22113-afghan-unity-government-leaders-agree-on-nominees-for-defense-minister-spy-chief


Processing URLs:  20%|██        | 200/1000 [11:27<10:49,  1.23it/s]

Error extracting text from https://abcnews.go.com/International/wireStory/launching-donor-conference-amid-fears-famine-yemen-76178739: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/launching-donor-conference-amid-fears-famine-yemen-76178739
Error extracting text from http://bigstory.ap.org/article/e72f2dc4a38b47f8a25f279522b3b4d7/told-he-must-go-syrias-assad-may-outlast-obama-office: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/e72f2dc4a38b47f8a25f279522b3b4d7/told-he-must-go-syrias-assad-may-outlast-obama-office (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3042e5520>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  21%|██        | 211/1000 [11:59<56:12,  4.27s/it]  

Error extracting text from http://www.japantimes.co.jp/news/2016/03/26/world/iran-denies-supporting-u-s-bank-hacking/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/03/26/world/iran-denies-supporting-u-s-bank-hacking/


Processing URLs:  21%|██▏       | 213/1000 [12:02<38:09,  2.91s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-02-08/iran-and-north-korea-resumed-cooperation-on-missiles-un-says


Processing URLs:  22%|██▏       | 216/1000 [12:06<27:22,  2.10s/it]

Processing URLs:  22%|██▏       | 218/1000 [12:08<20:32,  1.58s/it]

Error extracting text from http://blogs.wsj.com/moneybeat/2015/11/10/apple-shares-tumble-on-iphone-demand-worries/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2015/11/10/apple-shares-tumble-on-iphone-demand-worries/


Processing URLs:  22%|██▏       | 224/1000 [12:15<15:16,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-idUSKCN0WD21Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-idUSKCN0WD21Y


Processing URLs:  23%|██▎       | 231/1000 [12:26<15:10,  1.18s/it]

Error extracting text from https://www.wsj.com/articles/trump-considers-contenders-to-be-his-new-social-media-outlet-after-big-tech-crackdown-11621013567: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-considers-contenders-to-be-his-new-social-media-outlet-after-big-tech-crackdown-11621013567


Processing URLs:  23%|██▎       | 233/1000 [12:35<43:42,  3.42s/it]

Error extracting text from http://www.australianetworknews.com/winds-winter-release-story-news-everything-know-far/: HTTPConnectionPool(host='www.australianetworknews.com', port=80): Max retries exceeded with url: /winds-winter-release-story-news-everything-know-far/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304fcfe30>: Failed to resolve 'www.australianetworknews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  23%|██▎       | 234/1000 [12:36<33:01,  2.59s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-hyperinflation-comment-4133da6e-d0c1-11e5-90d3-34c2c42653ac-20160211-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-hyperinflation-comment-4133da6e-d0c1-11e5-90d3-34c2c42653ac-20160211-story.html


Processing URLs:  24%|██▎       | 236/1000 [12:37<20:29,  1.61s/it]

Error extracting text from http://www.wsj.com/articles/calls-grow-for-brazilian-presidents-ouster-1458489587: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/calls-grow-for-brazilian-presidents-ouster-1458489587


Processing URLs:  24%|██▍       | 238/1000 [12:40<17:36,  1.39s/it]

Error extracting text from http://fr.africatime.com/burundi/articles/le-burundi-pays-le-plus-malheureux-du-monde: 404 Client Error: Not Found for url: http://fr.africatime.com/burundi/articles/le-burundi-pays-le-plus-malheureux-du-monde


Processing URLs:  24%|██▍       | 241/1000 [12:41<08:51,  1.43it/s]

Error extracting text from http://www.huffingtonpost.ca/2016/04/16/saudi-arabia-oil_n_9708840.html: 502 Server Error: Bad Gateway for url: https://www.huffingtonpost.ca/2016/04/16/saudi-arabia-oil_n_9708840.html
Error extracting text from https://www.reuters.com/article/us-usa-biden-state-arms/biden-will-seek-to-extend-new-start-treaty-must-decide-how-long-idUSKBN29O2QC?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-biden-state-arms/biden-will-seek-to-extend-new-start-treaty-must-decide-how-long-idUSKBN29O2QC?il=0


Processing URLs:  24%|██▍       | 245/1000 [13:48<4:03:36, 19.36s/it]

Error extracting text from https://bit.ly/2NuWRKW: HTTPSConnectionPool(host='www.aa.com.tr', port=443): Read timed out. (read timeout=60)


Processing URLs:  25%|██▌       | 250/1000 [13:55<55:39,  4.45s/it]  

Error extracting text from http://www.ibtimes.co.uk/recep-tayyip-erdogan-says-west-siding-coup-plotters-terrorists-1573951: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/recep-tayyip-erdogan-says-west-siding-coup-plotters-terrorists-1573951


Processing URLs:  25%|██▌       | 254/1000 [14:01<27:46,  2.23s/it]

Error extracting text from http://www.pancanal.com/eng/pr/press-releases/2015/12/01/pr567.html: 403 Client Error: Forbidden for url: http://www.pancanal.com/eng/pr/press-releases/2015/12/01/pr567.html


Processing URLs:  26%|██▌       | 255/1000 [14:03<24:16,  1.95s/it]

Error extracting text from http://www.pajhwok.com/en/2017/07/16/govt-forces-reclaim-helmand%E2%80%99s-nawa-after-operation: 404 Client Error: Not Found for url: https://pajhwok.com/en/2017/07/16/govt-forces-reclaim-helmand%E2%80%99s-nawa-after-operation


Processing URLs:  26%|██▌       | 257/1000 [14:05<17:19,  1.40s/it]

Error extracting text from http://www.reuters.com/article/us-spain-election-rajoy-idUSKCN0ZD0SY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-election-rajoy-idUSKCN0ZD0SY


Processing URLs:  26%|██▌       | 260/1000 [14:11<22:10,  1.80s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-merkel-idUSKBN15L12U?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-merkel-idUSKBN15L12U?il=0


Processing URLs:  26%|██▋       | 264/1000 [14:17<19:40,  1.60s/it]

URL filtered: https://instagram.com/p/96IW2hGRPr/


Processing URLs:  27%|██▋       | 268/1000 [14:23<19:20,  1.59s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-19/vw-clean-diesel-scheme-exposed-as-u-s-weighs-criminal-charges


Processing URLs:  27%|██▋       | 272/1000 [14:29<18:42,  1.54s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/burundi-killed-violence-linked-presidents-term-39362090: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/burundi-killed-violence-linked-presidents-term-39362090


Processing URLs:  27%|██▋       | 274/1000 [14:34<23:49,  1.97s/it]

Error extracting text from https://www.google.com/amp/s/uk.mobile.reuters.com/article/amp/idUKKBN28Y1I4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/amp/idUKKBN28Y1I4


Processing URLs:  28%|██▊       | 275/1000 [14:36<25:01,  2.07s/it]

URL filtered: https://www.youtube.com/watch?v=RBq_THTWNM4


Processing URLs:  28%|██▊       | 278/1000 [14:41<24:02,  2.00s/it]

Error extracting text from http://1049maxcountry.com/agricultural/the-land-value-wave-dips/: 404 Client Error: Not Found for url: https://ruralradio.com/maxcountry/news/the-land-value-wave-dips/


Processing URLs:  28%|██▊       | 285/1000 [14:57<24:15,  2.04s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-pdvsa-idUSKCN0UQ2C720160112: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-pdvsa-idUSKCN0UQ2C720160112


Processing URLs:  29%|██▉       | 291/1000 [15:04<09:50,  1.20it/s]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-charges-two-in-telekom-bribe-case-11-17-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-charges-two-in-telekom-bribe-case-11-17-2015


Processing URLs:  29%|██▉       | 294/1000 [15:07<10:19,  1.14it/s]

Error extracting text from http://www.wsj.com/articles/brazilian-president-dilma-rousseffs-approval-ratings-remain-depressed-1443629542: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazilian-president-dilma-rousseffs-approval-ratings-remain-depressed-1443629542
URL filtered: https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/
URL filtered: https://www.bloomberg.com/graphics/2017-oil-projections/


Processing URLs:  30%|██▉       | 299/1000 [15:11<09:37,  1.21it/s]

Error extracting text from http://www.scotsman.com/news/nicola-sturgeon-announces-plans-for-second-indyref-1-4162797: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/nicola-sturgeon-announces-plans-for-second-indyref-1-4162797


Processing URLs:  30%|███       | 305/1000 [15:29<19:01,  1.64s/it]

Error extracting text from https://www.state.gov/documents/organization/245317.pdf: 404 Client Error: Not Found for url: https://www.state.gov/state-gov-website-modernization/
Error extracting text from http://www.worldbulletin.net/headlines/173116/anticipating-attack-syrians-flee-isil-held-raqqa: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/headlines/173116/anticipating-attack-syrians-flee-isil-held-raqqa


Processing URLs:  31%|███       | 306/1000 [15:30<16:10,  1.40s/it]

Error extracting text from http://allafrica.com/stories/201602080571.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201602080571.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303201370>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  31%|███       | 311/1000 [15:37<18:35,  1.62s/it]

Error extracting text from http://www.world-nuclear.org/info/Country-Profiles/Countries-T-Z/Turkey/: 404 Client Error: Not Found for url: https://www.world-nuclear.org/info/Country-Profiles/Countries-T-Z/Turkey/


Processing URLs:  31%|███       | 312/1000 [15:39<21:05,  1.84s/it]

Error extracting text from http://www.iec.org.af/media-section/press-releases/546-donduct-2016: HTTPConnectionPool(host='www.iec.org.af', port=80): Max retries exceeded with url: /media-section/press-releases/546-donduct-2016 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303203b90>: Failed to resolve 'www.iec.org.af' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  32%|███▏      | 318/1000 [15:45<10:50,  1.05it/s]

Error extracting text from https://www.reuters.com/world/us/biden-says-hes-confident-he-will-be-able-meet-putin-soon-2021-05-07/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/biden-says-hes-confident-he-will-be-able-meet-putin-soon-2021-05-07/
Error extracting text from http://www.cdm.me/english/foreign-ministry-integration-into-nato-is-not-directed-against-anyone: 403 Client Error: Forbidden for url: https://www.cdm.me/english/foreign-ministry-integration-into-nato-is-not-directed-against-anyone


Processing URLs:  32%|███▏      | 321/1000 [15:47<09:24,  1.20it/s]

Error extracting text from http://business.financialpost.com/business-insider/here-are-the-breakeven-oil-prices-for-every-drilling-project-in-the-world: 403 Client Error: Forbidden for url: https://financialpost.com/business-insider/here-are-the-breakeven-oil-prices-for-every-drilling-project-in-the-world


Processing URLs:  32%|███▏      | 323/1000 [15:51<13:24,  1.19s/it]

Error extracting text from http://www.nytimes.com/2016/10/05/world/asia/china-president-xi-jinping-successor.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/05/world/asia/china-president-xi-jinping-successor.html


Processing URLs:  32%|███▏      | 324/1000 [16:38<2:39:49, 14.18s/it]

URL filtered: http://timesofindia.indiatimes.com/world/europe/is-trains-400-fighters-to-attack-europe-in-wave-of-bloodshed/articleshow/51534590.cms?utm_source=toimobile&amp;utm_medium=Linkedin&amp;utm_campaign=referral


Processing URLs:  34%|███▎      | 336/1000 [16:55<15:22,  1.39s/it]  

Error extracting text from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2835841: 403 Client Error: Forbidden for url: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2835841


Processing URLs:  34%|███▍      | 338/1000 [16:56<10:17,  1.07it/s]

Error extracting text from http://www.newsletter.co.uk/news/northern-ireland-news/we-will-not-guarantee-support-for-stormont-house-deal-sdlp-1-7066853#ixzz3rj4JiA4i: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/northern-ireland-news/we-will-not-guarantee-support-for-stormont-house-deal-sdlp-1-7066853#ixzz3rj4JiA4i


Processing URLs:  34%|███▍      | 341/1000 [17:00<13:30,  1.23s/it]

Error extracting text from http://thinkprogress.org/climate/2015/04/13/3646004/electric-car-batteries-price/: 403 Client Error: Forbidden for url: https://thinkprogress.org/climate/2015/04/13/3646004/electric-car-batteries-price/


Processing URLs:  34%|███▍      | 344/1000 [17:04<13:23,  1.23s/it]

Error extracting text from http://www.nytimes.com/2015/09/10/us/politics/new-justice-dept-rules-aimed-at-prosecutingcorporate-: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/10/us/politics/new-justice-dept-rules-aimed-at-prosecutingcorporate-
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NYF8NU6JTSED01-6QG8GJOKULGNR28II2VJDPD7KP


Processing URLs:  35%|███▍      | 346/1000 [17:06<11:59,  1.10s/it]

Error extracting text from https://inhomelandsecurity.com/kaspersky-software-hack-us-intelligence/: 403 Client Error: Forbidden for url: https://amuedge.com/kaspersky-software-hack-us-intelligence/


Processing URLs:  35%|███▌      | 352/1000 [17:15<12:34,  1.16s/it]

Error extracting text from http://www.adweek.com/news/press/longtime-real-simple-editor-steps-down-amid-changes-time-inc-173074: 403 Client Error: Forbidden for url: https://www.adweek.com/news/press/longtime-real-simple-editor-steps-down-amid-changes-time-inc-173074


Processing URLs:  35%|███▌      | 354/1000 [17:17<11:23,  1.06s/it]

Error extracting text from http://www.elevenmyanmar.com/politics/myanmar-presidential-nomination-date-moved: 404 Client Error: Not Found for url: https://www.elevenmyanmar.com/politics/myanmar-presidential-nomination-date-moved


Processing URLs:  36%|███▌      | 355/1000 [17:18<10:55,  1.02s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=56651#.WSXpxFK-Ku4: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=56651#.WSXpxFK-Ku4


Processing URLs:  36%|███▌      | 357/1000 [17:19<06:47,  1.58it/s]

Error extracting text from http://news.nationalpost.com/news/the-secret-pact-between-russia-and-syria-drafted-in-august-gives-moscow-a-carte-blanche: 403 Client Error: Forbidden for url: https://nationalpost.com/category/news//
Error extracting text from https://www.reuters.com/article/us-somalia-security/somalias-peacekeeping-mission-could-be-hurt-by-cut-in-force-size-mission-chief-idUSKBN1DY2MV?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-security/somalias-peacekeeping-mission-could-be-hurt-by-cut-in-force-size-mission-chief-idUSKBN1DY2MV?il=0


Processing URLs:  36%|███▌      | 359/1000 [18:00<1:36:28,  9.03s/it]

Error extracting text from http://www.seattlepi.com/news/us/article/Jackpot-fixing-investigation-expands-to-more-6707355.php: 403 Client Error: Forbidden for url: https://www.seattlepi.com/news/us/article/Jackpot-fixing-investigation-expands-to-more-6707355.php


Processing URLs:  37%|███▋      | 366/1000 [18:05<14:39,  1.39s/it]  

Error extracting text from http://www.reuters.com/article/us-brazil-corruption-idUSKBN0TY1W020151215: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-corruption-idUSKBN0TY1W020151215
Error extracting text from http://www.reuters.com/article/us-trade-ttip-kerry-idUSKCN0ZY139: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-ttip-kerry-idUSKCN0ZY139


Processing URLs:  37%|███▋      | 370/1000 [18:10<15:08,  1.44s/it]

Error extracting text from https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msab118/6257226: 403 Client Error: Forbidden for url: https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msab118/6257226


Processing URLs:  37%|███▋      | 372/1000 [18:14<17:24,  1.66s/it]

Error extracting text from http://carnegie-mec.org/2015/12/16/kurdistan-s-political-armies-challenge-of-unifying-peshmerga-forces/in5p: 403 Client Error: Forbidden for url: http://carnegie-mec.org/2015/12/16/kurdistan-s-political-armies-challenge-of-unifying-peshmerga-forces/in5p


Processing URLs:  37%|███▋      | 373/1000 [18:14<13:03,  1.25s/it]

Error extracting text from https://www.predictit.org/api/marketdata/all/: 403 Client Error: Forbidden for url: https://www.predictit.org/api/marketdata/all/


Processing URLs:  37%|███▋      | 374/1000 [18:17<18:42,  1.79s/it]

Error extracting text from http://www.stardem.com/ap/business/article_66a9ce39-ff71-528c-9072-bd73b05782ec.html: 404 Client Error: Not Found for url: https://www.stardem.com/ap/business/article_66a9ce39-ff71-528c-9072-bd73b05782ec.html


Processing URLs:  38%|███▊      | 378/1000 [18:22<10:22,  1.00s/it]

Error extracting text from http://www.reuters.com/article/us-australia-security-northkorea-idUSKBN1AC00V: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-australia-security-northkorea-idUSKBN1AC00V


Processing URLs:  38%|███▊      | 379/1000 [18:22<08:47,  1.18it/s]

Error extracting text from http://www.yenisafak.com/en/world/russia-to-take-step-to-allow-visa-free-travel-for-turks-2610343: 422 Client Error:  for url: http://www.yenisafak.com/en/world/russia-to-take-step-to-allow-visa-free-travel-for-turks-2610343


Processing URLs:  38%|███▊      | 383/1000 [18:42<33:03,  3.21s/it]

Error extracting text from http://www.rollcall.com/news/politics/ohio-democrats-set-take-portmans-private-sector-life: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/ohio-democrats-set-take-portmans-private-sector-life


Processing URLs:  39%|███▊      | 386/1000 [18:46<21:37,  2.11s/it]

Error extracting text from http://fox6now.com/2015/02/20/u-s-official-plans-in-place-for-25000-strong-iraqi-spring-push-to-retake-mosul/#: 404 Client Error: Not Found for url: https://www.fox6now.com/2015/02/20/u-s-official-plans-in-place-for-25000-strong-iraqi-spring-push-to-retake-mosul/


Processing URLs:  39%|███▉      | 389/1000 [18:50<16:12,  1.59s/it]

Error extracting text from https://www.reuters.com/article/britain-eu-germany-diplomacy-idUSKBN28P1P5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/britain-eu-germany-diplomacy-idUSKBN28P1P5


Processing URLs:  39%|███▉      | 393/1000 [19:03<22:13,  2.20s/it]

URL filtered: http://www.reuters.com/article/us-israel-netanyahu-idUSKBN1AP1NN?utm_source=twitter&amp;utm_medium=Social


Processing URLs:  40%|████      | 400/1000 [19:13<12:17,  1.23s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-06-07/turkey-chooses-sides-in-gulf-conflict-as-erdogan-defends-qatar
Error extracting text from http://www.latimes.com/world/asia/la-fg-norkor-missile-2-20150919-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/asia/la-fg-norkor-missile-2-20150919-story.html


Processing URLs:  40%|████      | 402/1000 [19:17<17:31,  1.76s/it]

Error extracting text from http://www.latimes.com/world/afghanistan-pakistan/la-fg-afghanistan-taliban-20160425-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/afghanistan-pakistan/la-fg-afghanistan-taliban-20160425-story.html


Processing URLs:  41%|████      | 410/1000 [19:24<07:38,  1.29it/s]

Error extracting text from http://www.who.int/csr/don/11-june-2015-mers-saudi-arabia/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/11-june-2015-mers-saudi-arabia/en/
Error extracting text from http://www.reuters.com/article/us-britain-eu-johnson-idUSKCN1B50R1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-johnson-idUSKCN1B50R1


Processing URLs:  41%|████▏     | 414/1000 [19:27<06:04,  1.61it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-astana-idUSKBN16I0J1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-astana-idUSKBN16I0J1


Processing URLs:  42%|████▏     | 417/1000 [19:31<10:49,  1.11s/it]

Error extracting text from https://www.reuters.com/world/americas/amazon-fires-surge-anew-brazil-cleared-forest-burns-2021-09-03/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/amazon-fires-surge-anew-brazil-cleared-forest-burns-2021-09-03/


Processing URLs:  42%|████▏     | 421/1000 [19:38<16:16,  1.69s/it]

Error extracting text from http://thephilippinestar.ph/articles/2016-03-05/news/us-navy-deploys-strike-group-to-south-china-sea/142726: HTTPConnectionPool(host='thephilippinestar.ph', port=80): Max retries exceeded with url: /articles/2016-03-05/news/us-navy-deploys-strike-group-to-south-china-sea/142726 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301662990>: Failed to resolve 'thephilippinestar.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 422/1000 [19:38<14:17,  1.48s/it]

Error extracting text from https://abcnews.go.com/Health/wireStory/north-macedonia-speeds-vaccinations-eu-aid-arrives-77491196: 404 Client Error: Not Found for url: https://abcnews.go.com/Health/wireStory/north-macedonia-speeds-vaccinations-eu-aid-arrives-77491196


Processing URLs:  42%|████▏     | 424/1000 [19:40<11:37,  1.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-16/as-impeachment-fades-brazil-graft-probe-corners-rousseff-allies


Processing URLs:  43%|████▎     | 428/1000 [20:48<2:44:06, 17.21s/it]

Error extracting text from http://aa.com.tr/en/europe/hungarian-ambassador-calls-turkey-key-ally-of-europe/733198: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  43%|████▎     | 429/1000 [20:49<2:03:51, 13.02s/it]

Error extracting text from http://www.cnbc.com/2015/12/07/reuters-america-update-4-oil-falls-towards-2015-low-on-opec-inaction-strong-dollar.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/12/07/reuters-america-update-4-oil-falls-towards-2015-low-on-opec-inaction-strong-dollar.html


Processing URLs:  43%|████▎     | 430/1000 [20:52<1:35:08, 10.01s/it]

URL filtered: https://www.bloomberg.com/gadfly/articles/2017-06-25/opec-looks-totally-bewildered-by-the-oil-market


Processing URLs:  43%|████▎     | 433/1000 [21:11<1:23:48,  8.87s/it]

Error extracting text from http://www.polioeradication.org/mediaroom/newsstories/Is-Africa-polio-free-/tabid/526/news/1264/Default.aspx: 404 Client Error: Not Found for url: https://polioeradication.org/mediaroom/newsstories/Is-Africa-polio-free-/tabid/526/news/1264/Default.aspx


Processing URLs:  43%|████▎     | 434/1000 [21:12<1:02:52,  6.67s/it]

Error extracting text from http://www.latimes.com/politics/washington/la-na-essential-washington-updates-russia-denies-that-intelligence-agents-1487166625-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/washington/la-na-essential-washington-updates-russia-denies-that-intelligence-agents-1487166625-htmlstory.html


Processing URLs:  44%|████▎     | 436/1000 [21:14<37:16,  3.97s/it]  

URL filtered: https://mobile.twitter.com/EuropeElects


Processing URLs:  44%|████▍     | 439/1000 [21:15<18:09,  1.94s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN11305C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN11305C


Processing URLs:  44%|████▍     | 443/1000 [21:22<13:45,  1.48s/it]

Error extracting text from https://www.reuters.com/world/europe/close-ally-kremlin-critic-navalny-leaves-russia-amid-crackdown-rt-ren-tv-cite-2021-08-08/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/close-ally-kremlin-critic-navalny-leaves-russia-amid-crackdown-rt-ren-tv-cite-2021-08-08/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-opposition-idUSKBN1721PS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-opposition-idUSKBN1721PS


Processing URLs:  44%|████▍     | 445/1000 [21:23<10:51,  1.17s/it]

Error extracting text from https://www.theafricareport.com/86356/ethiopia-gerd-what-would-push-egypts-sisi-to-resort-to-force/: 403 Client Error: Forbidden for url: https://www.theafricareport.com/86356/ethiopia-gerd-what-would-push-egypts-sisi-to-resort-to-force/


Processing URLs:  45%|████▌     | 450/1000 [21:32<14:26,  1.58s/it]

Error extracting text from https://carnegieendowment.org/2021/11/12/ukraine-putin-s-unfinished-business-pub-85771: 403 Client Error: Forbidden for url: https://carnegieendowment.org/2021/11/12/ukraine-putin-s-unfinished-business-pub-85771


Processing URLs:  45%|████▌     | 453/1000 [21:36<11:27,  1.26s/it]

Error extracting text from http://nerdist.com/us-mecha-builder-challenges-japan-to-robo-duel/: 403 Client Error: Forbidden for url: http://nerdist.com/us-mecha-builder-challenges-japan-to-robo-duel/


Processing URLs:  46%|████▌     | 456/1000 [21:41<14:54,  1.64s/it]

Error extracting text from http://insideevs.com/november-2016-plug-electric-vehicle-sales-report-card/: 410 Client Error: Gone for url: https://insideevs.com:443/news/330344/november-2016-plug-in-electric-vehicle-sales-report-card/


Processing URLs:  46%|████▌     | 457/1000 [21:42<13:40,  1.51s/it]

Error extracting text from http://www.cnbc.com/2016/01/07/: 404 Client Error: Not Found for url: https://www.cnbc.com/2016/01/07/


Processing URLs:  46%|████▌     | 460/1000 [21:45<08:47,  1.02it/s]

Error extracting text from https://phys.org/news/2016-11-theory-gravity-dark.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-11-theory-gravity-dark.html


Processing URLs:  46%|████▌     | 461/1000 [21:45<07:15,  1.24it/s]

Error extracting text from http://www.reuters.tv/v/45H/2017/01/12/turkish-lawmakers-close-debate-with-brawl-again: HTTPConnectionPool(host='www.reuters.tv', port=80): Max retries exceeded with url: /v/45H/2017/01/12/turkish-lawmakers-close-debate-with-brawl-again (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3023ddb50>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  46%|████▌     | 462/1000 [21:46<07:51,  1.14it/s]

Error extracting text from http://www.caam.org.cn/hangye/20161230/1005203391.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/hangye/20161230/1005203391.html


Processing URLs:  46%|████▋     | 463/1000 [21:50<14:36,  1.63s/it]

Error extracting text from http://pswww.ifa.hawaii.edu/pswww/: HTTPConnectionPool(host='pswww.ifa.hawaii.edu', port=80): Max retries exceeded with url: /pswww/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3016616a0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  46%|████▋     | 464/1000 [21:50<10:50,  1.21s/it]

Error extracting text from http://www.wsj.com/articles/for-a-day-syrias-cease-fire-revives-peaceful-protest-1457113287: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/for-a-day-syrias-cease-fire-revives-peaceful-protest-1457113287


Processing URLs:  47%|████▋     | 466/1000 [21:51<06:55,  1.29it/s]

Error extracting text from http://www.nytimes.com/2015/11/11/world/middleeast/obama-assad-syria.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/11/world/middleeast/obama-assad-syria.html
URL filtered: http://www.bloomberg.com/news/articles/2015-10-31/u-s-says-greece-must-lift-bank-governance-to-build-on-progress


Processing URLs:  47%|████▋     | 468/1000 [21:52<06:16,  1.41it/s]

URL filtered: https://news.yahoo.com/over-2-000-flights-cancelled-163346664.html?ncid=twitter_yahoonewst_sjwumo1bpf4&amp;guccounter=1


Processing URLs:  47%|████▋     | 470/1000 [21:53<06:01,  1.47it/s]

URL filtered: https://www.bloomberg.com/news/articles/2018-01-09/kia-unveils-plan-for-16-new-electric-or-hybrid-vehicles-by-2025


Processing URLs:  47%|████▋     | 474/1000 [21:57<07:12,  1.22it/s]

Error extracting text from https://www.nytimes.com/2021/03/22/world/europe/scotland-nicola-sturgeon.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/22/world/europe/scotland-nicola-sturgeon.html


Processing URLs:  48%|████▊     | 475/1000 [21:58<07:44,  1.13it/s]

Error extracting text from http://www.malaysia-chronicle.com/index.php?option=com_k2&amp;view=item&amp;id=606063:najib-reiterates-no-ringgit-peg-or-capital-controls&amp;Itemid=3: HTTPConnectionPool(host='www.malaysia-chronicle.com', port=80): Max retries exceeded with url: /index.php?option=com_k2&amp;view=item&amp;id=606063:najib-reiterates-no-ringgit-peg-or-capital-controls&amp;Itemid=3 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ffd5f170>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  48%|████▊     | 476/1000 [21:59<08:34,  1.02it/s]

Error extracting text from https://bit.ly/3aXfBep: 403 Client Error: Forbidden for url: https://www.kyivpost.com/eastern-europe/unian-ukraine-intelligence-says-putin-haunted-by-health-issues.html?cn-reloaded=1


Processing URLs:  48%|████▊     | 478/1000 [22:10<23:34,  2.71s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-shirqat-idUSKCN11S0MW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-shirqat-idUSKCN11S0MW


Processing URLs:  48%|████▊     | 479/1000 [22:13<22:28,  2.59s/it]

Error extracting text from http://www.miragenews.com/turkey-reopens-checkpoint-along-irans-border/: 404 Client Error: Not Found for url: https://www.miragenews.com/turkey-reopens-checkpoint-along-irans-border/


Processing URLs:  48%|████▊     | 484/1000 [22:18<09:16,  1.08s/it]

Error extracting text from http://www.reuters.com/article/2015/11/06/us-markets-stocks-idUSKCN0SV1KV20151106: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-markets-stocks-idUSKCN0SV1KV20151106


Processing URLs:  49%|████▊     | 486/1000 [22:19<05:34,  1.54it/s]

Error extracting text from https://www.thenigerianvoice.com/news/297454/ethiopias-tigray-war-a-deadly-dangerous-stalemate.html: 403 Client Error: Forbidden for url: https://www.thenigerianvoice.com/news/297454/ethiopias-tigray-war-a-deadly-dangerous-stalemate.html


Processing URLs:  49%|████▊     | 487/1000 [22:20<08:16,  1.03it/s]

Error extracting text from https://www.strtrade.com/news-publications-TTIP-negotiations-market-access-tariffs-110515.html: 404 Client Error: Not Found for url: https://www.strtrade.com/news-publications-TTIP-negotiations-market-access-tariffs-110515


Processing URLs:  49%|████▉     | 489/1000 [22:24<12:31,  1.47s/it]

URL filtered: https://t.co/pwBXxviVQk&quot;&gt;pic.twitter.com/pwBXxviVQk&lt;/a&gt;&lt;/p&gt;&amp;mdash


Processing URLs:  49%|████▉     | 491/1000 [22:25<07:54,  1.07it/s]

URL filtered: http://www.bloomberg.com/news/articles/2014-06-11/with-the-machine-hp-may-have-invented-a-new-kind-of-computer


Processing URLs:  50%|████▉     | 495/1000 [22:28<06:19,  1.33it/s]

Error extracting text from http://globalnation.inquirer.net/135574/naia-to-close-2-runways-for-arrival-of-japans-imperial-couple: 403 Client Error: Forbidden for url: https://globalnation.inquirer.net/135574/naia-to-close-2-runways-for-arrival-of-japans-imperial-couple


Processing URLs:  50%|████▉     | 496/1000 [22:28<06:35,  1.28it/s]

URL filtered: http://www.bloomberg.com/politics/articles/2016-10-03/north-carolina-poll


Processing URLs:  50%|█████     | 502/1000 [22:45<20:34,  2.48s/it]

Error extracting text from http://www.ibtimes.com/el-nino-ethiopia-amid-severe-drought-food-crisis-who-deploys-emergency-response-team-2214401: 403 Client Error: Forbidden for url: https://www.ibtimes.com/el-nino-ethiopia-amid-severe-drought-food-crisis-who-deploys-emergency-response-team-2214401


Processing URLs:  51%|█████     | 506/1000 [22:51<15:14,  1.85s/it]

Error extracting text from https://www.agweb.com/agday/blog/straight-from-dc-agricultural-perspectives/the-four-famines/: 403 Client Error: Forbidden for url: https://www.agweb.com/agday/blog/straight-from-dc-agricultural-perspectives/the-four-famines/
URL filtered: https://www.bloomberg.com/news/articles/2021-07-09/u-s-frets-that-time-is-running-out-to-revive-iran-nuclear-deal&gt


Processing URLs:  51%|█████     | 510/1000 [22:55<08:40,  1.06s/it]

Error extracting text from http://www.latimes.com/world/mexico-americas/la-fg-brazil-leaks-20160531-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/mexico-americas/la-fg-brazil-leaks-20160531-snap-story.html
URL filtered: http://www.bloombergview.com/articles/2016-03-11/republicans-new-target-the-pre-convention


Processing URLs:  51%|█████▏    | 514/1000 [22:59<10:14,  1.26s/it]

Error extracting text from https://www.ipsos-mori.com/researchpublications/researcharchive/3726/Economist-Ipsos-MORI-April-2016-Issues-Index.aspx: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-uk/researchpublications/researcharchive/3726/Economist-Ipsos-MORI-April-2016-Issues-Index.aspx


Processing URLs:  52%|█████▏    | 515/1000 [23:00<08:01,  1.01it/s]

Error extracting text from http://www.wsj.com/articles/syria-strains-to-bolster-depleted-military-1470413365: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syria-strains-to-bolster-depleted-military-1470413365


Processing URLs:  52%|█████▏    | 516/1000 [23:00<06:17,  1.28it/s]

Error extracting text from http://www.wsj.com/articles/clinton-might-not-be-the-nominee-1464733898: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/clinton-might-not-be-the-nominee-1464733898


Processing URLs:  52%|█████▏    | 517/1000 [23:00<04:58,  1.62it/s]

Error extracting text from http://www.wsj.com/articles/fomc-minutes-fed-held-off-on-rate-increase-amid-worries-about-low-inflation-1444327409: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fomc-minutes-fed-held-off-on-rate-increase-amid-worries-about-low-inflation-1444327409


Processing URLs:  52%|█████▏    | 520/1000 [23:05<08:57,  1.12s/it]

Error extracting text from https://muckrack.com/andrew-sabisky/articles: 403 Client Error: Forbidden for url: https://muckrack.com/andrew-sabisky/articles


Processing URLs:  52%|█████▏    | 521/1000 [24:06<2:29:56, 18.78s/it]

Error extracting text from https://www.seattletimes.com/seattle-news/crime/violence-erupts-during-may-day-protests-in-portland/: HTTPSConnectionPool(host='www.seattletimes.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  53%|█████▎    | 526/1000 [24:13<34:14,  4.33s/it]  

Error extracting text from https://www.google.com/amp/s/europe.newsweek.com/theresa-may-popularity-most-popular-opinion-poll-542259/amp?client=safari: 403 Client Error: Forbidden for url: https://www.newsweek.com


Processing URLs:  53%|█████▎    | 530/1000 [24:18<14:42,  1.88s/it]

Error extracting text from http://thehill.com/policy/transportation/261633-ryan-foresees-good-majority-support-for-highway-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/transportation/261633-ryan-foresees-good-majority-support-for-highway-bill/


Processing URLs:  53%|█████▎    | 533/1000 [24:23<11:04,  1.42s/it]

Error extracting text from http://www.nytimes.com/2015/10/23/world/middleeast/us-considering-ways-to-shield-syrian-civilians.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/23/world/middleeast/us-considering-ways-to-shield-syrian-civilians.html?_r=0


Processing URLs:  54%|█████▎    | 535/1000 [24:24<07:17,  1.06it/s]

Error extracting text from http://www.japantimes.co.jp/news/2016/05/11/national/politics-diplomacy/putin-may-pay-visit-abes-home-turf-yamaguchi/#.VzMFxMhXeEd: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/05/11/national/politics-diplomacy/putin-may-pay-visit-abes-home-turf-yamaguchi/#.VzMFxMhXeEd


Processing URLs:  54%|█████▍    | 544/1000 [24:39<15:54,  2.09s/it]

Error extracting text from http://www.reuters.com/article/us-usa-congress-epa-idUSKBN15W131: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-epa-idUSKBN15W131


Processing URLs:  55%|█████▍    | 545/1000 [24:42<16:07,  2.13s/it]

Error extracting text from https://www.cruz.senate.gov/?p=press_release&amp;id=5482: 404 Client Error: Not Found for url: https://www.cruz.senate.gov/404?notfound=/.noindex.html
URL filtered: https://www.youtube.com/watch?v=pJRRNiCY9-s


Processing URLs:  55%|█████▌    | 550/1000 [24:47<10:20,  1.38s/it]

Error extracting text from http://cleantechnica.com/2016/02/15/renewables-69-of-new-us-electricity-capacity-in-2015/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2016/02/15/renewables-69-of-new-us-electricity-capacity-in-2015/


Processing URLs:  55%|█████▌    | 552/1000 [24:50<10:11,  1.36s/it]

Error extracting text from https://www.middleeastmonitor.com/20171001-turkish-red-cresent-sends-food-aid-packages-to-yemen/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20171001-turkish-red-cresent-sends-food-aid-packages-to-yemen/


Processing URLs:  55%|█████▌    | 554/1000 [24:53<10:40,  1.44s/it]

Error extracting text from http://aranews.net/2016/07/kurds-demand-proper-representation-global-anti-isis-coalition-meetings/: 404 Client Error: Not Found for url: http://aranews.net/2016/07/kurds-demand-proper-representation-global-anti-isis-coalition-meetings/


Processing URLs:  56%|█████▌    | 562/1000 [25:16<16:21,  2.24s/it]

Error extracting text from http://www.calchannel.com/live-webcast/: 404 Client Error: Not Found for url: http://www.calchannel.com/live-webcast/


Processing URLs:  57%|█████▋    | 566/1000 [25:23<13:13,  1.83s/it]

URL filtered: https://www.bloomberg.com/view/articles/2017-10-05/how-justice-kennedy-could-give-both-parties-a-win


Processing URLs:  57%|█████▋    | 569/1000 [25:25<08:05,  1.13s/it]

Error extracting text from https://www.sec.gov/rules/rulemaking-index.shtml: 403 Client Error: Forbidden for url: https://www.sec.gov/rules/rulemaking-index.shtml


Processing URLs:  57%|█████▋    | 574/1000 [26:34<41:34,  5.86s/it]  

Error extracting text from http://www.nytimes.com/2015/10/06/business/trans-pacific-partnership-trade-deal-is-reached.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/06/business/trans-pacific-partnership-trade-deal-is-reached.html?_r=0


Processing URLs:  57%|█████▊    | 575/1000 [26:35<32:33,  4.60s/it]

URL filtered: https://www.youtube.com/watch?v=0zFAvzf0Mv0


Processing URLs:  58%|█████▊    | 579/1000 [26:38<12:44,  1.82s/it]

Error extracting text from https://www.wsj.com/articles/white-house-to-test-federal-local-sharing-of-drone-regulation-1508320807: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/white-house-to-test-federal-local-sharing-of-drone-regulation-1508320807


Processing URLs:  58%|█████▊    | 580/1000 [26:40<12:40,  1.81s/it]

Error extracting text from http://asianjournal.com/news/philippine-us-military-exercises-to-begin-this-month/: 404 Client Error: Not Found for url: https://asianjournal.com/news/philippine-us-military-exercises-to-begin-this-month/


Processing URLs:  58%|█████▊    | 581/1000 [26:43<15:18,  2.19s/it]

Error extracting text from http://sputniknews.com/politics/20151023/1029024714/japan-putin-visit: 404 Client Error: Not Found for url: https://sputnikglobe.com/politics/20151023/1029024714/japan-putin-visit/


Processing URLs:  58%|█████▊    | 583/1000 [26:45<10:28,  1.51s/it]

Error extracting text from https://www.businessinsider.com.au/scott-galloway-amazon-will-spin-off-amazon-web-services-ignition-2018-12?r=US&amp;IR=T: 404 Client Error: Not Found for url: https://www.businessinsider.com.au/scott-galloway-amazon-will-spin-off-amazon-web-services-ignition-2018-12?r=US&amp;IR=T


Processing URLs:  58%|█████▊    | 584/1000 [26:46<10:48,  1.56s/it]

Error extracting text from http://www.cnbc.com/2017/06/30/china-outraged-by-planned-1-point-42-billion-us-arms-sales-to-taiwan.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/06/30/china-outraged-by-planned-1-point-42-billion-us-arms-sales-to-taiwan.html


Processing URLs:  59%|█████▉    | 589/1000 [26:53<07:47,  1.14s/it]

URL filtered: https://twitter.com/338canada/status/1439049214865125377
Error extracting text from http://www.latimes.com/politics/la-na-trump-backer-20151209-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-trump-backer-20151209-story.html


Processing URLs:  59%|█████▉    | 591/1000 [26:56<08:51,  1.30s/it]

URL filtered: https://www.bloomberg.com/news/articles/2013-12-17/yanukovych-and-putin-russia-will-invest-15-billion-in-ukraine


Processing URLs:  59%|█████▉    | 593/1000 [26:56<05:34,  1.22it/s]

Error extracting text from http://venturebeat.com/2015/12/19/u-s-reviews-possible-back-door-in-juniper-networks-code/: 403 Client Error: Forbidden for url: https://venturebeat.com/2015/12/19/u-s-reviews-possible-back-door-in-juniper-networks-code/


Processing URLs:  59%|█████▉    | 594/1000 [26:57<05:11,  1.30it/s]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-21/un-chief-ban-signals-willingness-to-seek-south-korea-presidency


Processing URLs:  60%|██████    | 604/1000 [27:23<10:59,  1.67s/it]

Error extracting text from https://www.donaldjtrump.com/press-releases/donald-j.-trump-statement-on-preventing-muslim-immigration: 403 Client Error: Forbidden for url: https://www.donaldjtrump.com/press-releases/donald-j.-trump-statement-on-preventing-muslim-immigration
URL filtered: https://twitter.com/KirkegaardEmil/status/717395490703286272


Processing URLs:  61%|██████    | 606/1000 [27:25<08:20,  1.27s/it]

Error extracting text from http://www.ibtimes.com/us-confirms-blackenergy-malware-used-ukrainian-power-plant-hack-2263008: 403 Client Error: Forbidden for url: https://www.ibtimes.com/us-confirms-blackenergy-malware-used-ukrainian-power-plant-hack-2263008
URL filtered: https://twitter.com/eucopresident/status/689450487754522624


Processing URLs:  61%|██████    | 610/1000 [27:27<04:48,  1.35it/s]

Error extracting text from https://news.usni.org/2021/08/12/pentagon-sending-3000-troops-to-evacuate-u-s-embassy-in-afghanistan?utm_source=USNI+News&amp;utm_campaign=a6616bd15f-USNI_NEWS_DAILY&amp;utm_medium=email&amp;utm_term=0_0dd4a1450b-a6616bd15f-231045397&amp;mc_cid=a6616bd15f&amp;mc_eid=2394c94b5b: 403 Client Error: Forbidden for url: https://news.usni.org/2021/08/12/pentagon-sending-3000-troops-to-evacuate-u-s-embassy-in-afghanistan?utm_source=USNI+News&amp;utm_campaign=a6616bd15f-USNI_NEWS_DAILY&amp;utm_medium=email&amp;utm_term=0_0dd4a1450b-a6616bd15f-231045397&amp;mc_cid=a6616bd15f&amp;mc_eid=2394c94b5b
Error extracting text from http://www.nytimes.com/2016/05/02/world/europe/brexit-referendum-companies-neutral-eu.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/02/world/europe/brexit-referendum-companies-neutral-eu.html?_r=0


Processing URLs:  61%|██████    | 612/1000 [27:27<02:59,  2.16it/s]

Error extracting text from http://www.reuters.com/article/2015/12/01/us-mideast-crisis-usa-military-idUSKBN0TK50G20151201#OH4Fdg2Y2JmMdHkq.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/us-mideast-crisis-usa-military-idUSKBN0TK50G20151201#OH4Fdg2Y2JmMdHkq.97
Error extracting text from http://www.nytimes.com/reuters/2016/02/17/world/asia/17reuters-southchinasea-china.html?ref=world&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/reuters/2016/02/17/world/asia/17reuters-southchinasea-china.html?ref=world&amp;_r=0


Processing URLs:  62%|██████▏   | 615/1000 [27:32<06:30,  1.01s/it]

Error extracting text from http://www.imdb.com/title/tt0133093/: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0133093/


Processing URLs:  62%|██████▏   | 617/1000 [27:37<10:42,  1.68s/it]

Error extracting text from http://www.levc.com/corporate/news/london-taxi-company-becomes-levc/: 404 Client Error: Not Found for url: https://levc.com/corporate/news/london-taxi-company-becomes-levc/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=de&amp;u=http://www.kba.de/DE/Statistik/Fahrzeuge/Neuzulassungen/MonatlicheNeuzulassungen/2016/201604GV1monatlich/201604_nzbarometer/201604_n_barometer.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=de&amp;u=http://www.kba.de/DE/Statistik/Fahrzeuge/Neuzulassungen/MonatlicheNeuzulassungen/2016/201604GV1monatlich/201604_nzbarometer/201604_n_barometer.html&amp;prev=search


Processing URLs:  62%|██████▏   | 618/1000 [27:53<37:22,  5.87s/it]

Error extracting text from https://www.almasdarnews.com/article/expert-isis-receiving-weapons-directly-eastern-europe-via-saudi-arabia-turkey/: 522 Server Error:  for url: https://www.almasdarnews.com/article/expert-isis-receiving-weapons-directly-eastern-europe-via-saudi-arabia-turkey/


Processing URLs:  62%|██████▏   | 620/1000 [27:57<24:31,  3.87s/it]

Error extracting text from https://www.wcl.american.edu/faculty/metcalfe/: 404 Client Error: Not Found for url: https://www.wcl.american.edu/community/faculty/profile/metcalfe/
URL filtered: https://www.google.com/amp/s/www.businessinsider.com/mark-zuckerberg-alex-jones-facebook-ban-soften-lenient-report-2021-2%3famp


Processing URLs:  62%|██████▏   | 622/1000 [27:59<16:35,  2.63s/it]

Error extracting text from http://www.nola.com/military/index.ssf/2016/10/russia_warns_us_not_to_attack.html: 404 Client Error: Not Found for url: https://www.nola.com/military/index.ssf/2016/10/russia_warns_us_not_to_attack.html


Processing URLs:  62%|██████▎   | 625/1000 [28:02<09:59,  1.60s/it]

Error extracting text from http://www.reuters.com/article/2015/11/19/us-mideast-crisis-syria-group-idUSKCN0T81EY20151119: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/19/us-mideast-crisis-syria-group-idUSKCN0T81EY20151119


Processing URLs:  63%|██████▎   | 627/1000 [28:06<11:08,  1.79s/it]

Error extracting text from http://www.timesofisrael.com/israeli-source-iran-will-first-invest-freed-up-funds-in-military/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/israeli-source-iran-will-first-invest-freed-up-funds-in-military/


Processing URLs:  63%|██████▎   | 630/1000 [28:26<32:59,  5.35s/it]

Error extracting text from http://www.almasdarnews.com/article/russian-jets-strike-isis-in-east-aleppo-to-propel-the-syrian-armys-advance-on-kuweires-airbase/: 522 Server Error:  for url: https://www.almasdarnews.com/article/russian-jets-strike-isis-in-east-aleppo-to-propel-the-syrian-armys-advance-on-kuweires-airbase/


Processing URLs:  63%|██████▎   | 631/1000 [28:27<24:54,  4.05s/it]

Error extracting text from http://www.wsj.com/articles/ban-ki-moon-returns-to-south-korea-in-bid-to-lead-it-148420504: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/ban-ki-moon-returns-to-south-korea-in-bid-to-lead-it-148420504


Processing URLs:  64%|██████▎   | 637/1000 [28:34<07:35,  1.25s/it]

Error extracting text from https://ixquick-proxy.com/do/spg/show_picture.pl?l=english&amp;rais=1&amp;oiu=http%3A%2F%2Fwww.pakistantv.tv%2Fwp-content%2Fuploads%2F2016%2F05%2FNawaz-Sharif-in-ICU-Latest.jpg&amp;sp=6a0369639c162914c618612795c59a08: HTTPSConnectionPool(host='ixquick-proxy.com', port=443): Max retries exceeded with url: /do/spg/show_picture.pl?l=english&amp;rais=1&amp;oiu=http%3A%2F%2Fwww.pakistantv.tv%2Fwp-content%2Fuploads%2F2016%2F05%2FNawaz-Sharif-in-ICU-Latest.jpg&amp;sp=6a0369639c162914c618612795c59a08 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fc8b7140>: Failed to resolve 'ixquick-proxy.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  64%|██████▍   | 643/1000 [28:42<06:04,  1.02s/it]

Error extracting text from http://www.reuters.com/article/illinois-bonds/update-2-illinois-to-sell-6-75-bln-of-bonds-by-year-end-idUSL2N1M31YM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/illinois-bonds/update-2-illinois-to-sell-6-75-bln-of-bonds-by-year-end-idUSL2N1M31YM


Processing URLs:  64%|██████▍   | 645/1000 [28:44<04:54,  1.20it/s]

Error extracting text from https://www.yahoo.com/news/kinshasa-faces-unrest-congolese-president-refuses-step-down-174506688.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/kinshasa-faces-unrest-congolese-president-refuses-step-down-174506688.html


Processing URLs:  65%|██████▍   | 649/1000 [30:39<1:50:21, 18.87s/it]

Error extracting text from http://www.komodoexercise.org/#!about/cee5: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304922030>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2016/01/29/upshot/surge-for-sanders-or-trump-in-iowa-voter-registration-doesnt-suggest-it.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/29/upshot/surge-for-sanders-or-trump-in-iowa-voter-registration-doesnt-suggest-it.html


Processing URLs:  65%|██████▌   | 650/1000 [30:39<1:22:57, 14.22s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-idUSKBN15304O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-idUSKBN15304O


Processing URLs:  65%|██████▌   | 651/1000 [30:56<1:26:19, 14.84s/it]

Error extracting text from https://www.investopedia.com/chinas-economic-strength-to-continue-into-2021-5100844: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/chinas-economic-strength-to-continue-into-2021-5100844


Processing URLs:  65%|██████▌   | 652/1000 [30:57<1:04:06, 11.05s/it]

URL filtered: https://www.youtube.com/watch?v=OeQ6jYzt6cM


Processing URLs:  66%|██████▌   | 655/1000 [31:00<30:09,  5.24s/it]  

Error extracting text from http://www.reuters.com/article/2015/11/07/southchinasea-china-idUSL3N13203O20151107#stzSMPsuITKEwcF3.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/07/southchinasea-china-idUSL3N13203O20151107#stzSMPsuITKEwcF3.97


Processing URLs:  66%|██████▌   | 659/1000 [31:08<14:12,  2.50s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-idUSKBN15V0Q8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-idUSKBN15V0Q8


Processing URLs:  66%|██████▌   | 660/1000 [31:09<10:50,  1.91s/it]

Error extracting text from https://news.mongabay.com/2021/04/cattle-driven-clearing-continues-in-brazils-triunfo-do-xingu-protected-area/: 403 Client Error: Forbidden for url: https://news.mongabay.com/2021/04/cattle-driven-clearing-continues-in-brazils-triunfo-do-xingu-protected-area/


Processing URLs:  66%|██████▌   | 661/1000 [31:10<10:19,  1.83s/it]

Error extracting text from http://the-japan-news.com/news/article/0002502625: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002502625


Processing URLs:  66%|██████▌   | 662/1000 [31:11<08:59,  1.60s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/seoul-time-push-nuclear-talks-north-korea-36436894: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/seoul-time-push-nuclear-talks-north-korea-36436894


Processing URLs:  67%|██████▋   | 668/1000 [31:19<05:21,  1.03it/s]

Error extracting text from http://www.nti.org/about/global-nuclear-policy/: 403 Client Error: Forbidden for url: https://www.nti.org/about/global-nuclear-policy/


Processing URLs:  67%|██████▋   | 672/1000 [31:34<17:40,  3.23s/it]

Error extracting text from https://www.stripes.com/news/us-allies-grapple-with-countering-russia-s-cyberoffensive-1.493254: 404 Client Error: Not Found for url: https://www.stripes.com/theaters/us-allies-grapple-with-countering-russia-s-cyberoffensive-1.493254


Processing URLs:  67%|██████▋   | 674/1000 [31:37<12:15,  2.26s/it]

Error extracting text from http://www.arabnews.com/node/1217631/business-economy: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1217631/business-economy


Processing URLs:  68%|██████▊   | 678/1000 [31:42<07:17,  1.36s/it]

Error extracting text from http://www.wsj.com/articles/u-s-military-officials-aim-to-bolster-troop-presence-in-europe-1447034653: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-military-officials-aim-to-bolster-troop-presence-in-europe-1447034653


Processing URLs:  69%|██████▉   | 688/1000 [32:00<10:41,  2.06s/it]

URL filtered: https://www.youtube.com/watch?v=MMFj8uDubsE


Processing URLs:  69%|██████▉   | 692/1000 [32:07<08:41,  1.69s/it]

Error extracting text from http://thehill.com/opinion/international/359196-were-not-putting-up-a-fight-against-russias-cyber-warfare: 403 Client Error: Forbidden for url: https://thehill.com/opinion/international/359196-were-not-putting-up-a-fight-against-russias-cyber-warfare/


Processing URLs:  69%|██████▉   | 693/1000 [32:08<07:55,  1.55s/it]

Error extracting text from http://www.ibtimes.co.uk/iraqs-victory-over-isis-ramadi-pivotal-daesh-are-far-beaten-1535243: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/iraqs-victory-over-isis-ramadi-pivotal-daesh-are-far-beaten-1535243


Processing URLs:  70%|██████▉   | 695/1000 [32:10<06:05,  1.20s/it]

Error extracting text from https://travel.state.gov/content/travel/en/News/visas-news/safely-resuming-travel-by-vaccine-requirement-and-rescission-of-travel-restrictions.html: 404 Client Error: Not Found for url: https://travel.state.gov/content/travel/en/News/visas-news/safely-resuming-travel-by-vaccine-requirement-and-rescission-of-travel-restrictions.html


Processing URLs:  70%|██████▉   | 696/1000 [32:10<05:04,  1.00s/it]

Error extracting text from https://www.whitehouse.gov/briefing-room/presidential-actions/executive-orders: 404 Client Error: Not Found for url: https://www.whitehouse.gov/briefing-room/presidential-actions/executive-orders


Processing URLs:  70%|██████▉   | 698/1000 [32:14<07:20,  1.46s/it]

URL filtered: https://twitter.com/USAmbUN/status/1424542745830514688


Processing URLs:  70%|███████   | 703/1000 [32:25<12:22,  2.50s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-10/venezuela-bonds-drop-for-third-day-on-heightened-default-concern


Processing URLs:  70%|███████   | 705/1000 [32:26<07:24,  1.51s/it]

Error extracting text from http://www.wsj.com/articles/japan-shares-rise-on-weak-yen-1447810897: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/japan-shares-rise-on-weak-yen-1447810897


Processing URLs:  71%|███████   | 706/1000 [32:28<08:13,  1.68s/it]

Error extracting text from http://interactive.aljazeera.com/aje/2016/syria_why_aleppo_matters/: 404 Client Error: Not Found for url: https://interactive.aljazeera.com/aje/2016/syria_why_aleppo_matters/


Processing URLs:  71%|███████   | 710/1000 [32:35<07:25,  1.54s/it]

URL filtered: https://www.youtube.com/watch?v=jKk2PoJCQ0g


Processing URLs:  72%|███████▏  | 715/1000 [32:42<07:36,  1.60s/it]

Error extracting text from http://doi.org/10.4103/0973-1229.91426: HTTPSConnectionPool(host='viva99.io', port=443): Max retries exceeded with url: /text.asp?2012/10/1/158/91426 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303a76630>: Failed to resolve 'viva99.io' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  72%|███████▏  | 718/1000 [32:45<04:44,  1.01s/it]

URL filtered: http://thehill.com/opinion/technology/358055-in-russia-meddling-facebook-twitter-dont-know-what-theyre-up-against
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://rota2014.blogspot.com/2016/03/marcelo-odebrecht-comparsa-de-lula.html&amp;usg=ALkJrhiEcYvaDHItEMz9AmFev4S2ELo7_w: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://rota2014.blogspot.com/2016/03/marcelo-odebrecht-comparsa-de-lula.html&amp;usg=ALkJrhiEcYvaDHItEMz9AmFev4S2ELo7_w


Processing URLs:  72%|███████▎  | 725/1000 [32:58<08:41,  1.90s/it]

Error extracting text from http://blogs.wsj.com/riskandcompliance/2016/04/27/corruption-currents-panama-papers-database-coming-soon/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/riskandcompliance/2016/04/27/corruption-currents-panama-papers-database-coming-soon/


Processing URLs:  73%|███████▎  | 728/1000 [33:00<05:29,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-impeachment-idUSKCN11007P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-impeachment-idUSKCN11007P


Processing URLs:  73%|███████▎  | 731/1000 [34:02<1:18:30, 17.51s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/haiti/article70385237.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  73%|███████▎  | 733/1000 [34:07<44:27,  9.99s/it]

Error extracting text from https://www.freightwaves.com/news/amazon-logistics-historic-growth-spurt-in-context: 403 Client Error: Forbidden for url: https://www.freightwaves.com/news/amazon-logistics-historic-growth-spurt-in-context


Processing URLs:  73%|███████▎  | 734/1000 [34:07<31:37,  7.13s/it]

Error extracting text from http://news.abs-cbn.com/news/05/02/17/ph-to-china-pag-asa-island-construction-is-legal: 403 Client Error: Forbidden for url: http://news.abs-cbn.com/news/05/02/17/ph-to-china-pag-asa-island-construction-is-legal


Processing URLs:  74%|███████▎  | 737/1000 [34:10<14:10,  3.23s/it]

Error extracting text from http://thephilippinestar.ph/articles/2016-03-14/news/china-to-set-up-international-maritime-judicial-center/143928: HTTPConnectionPool(host='thephilippinestar.ph', port=80): Max retries exceeded with url: /articles/2016-03-14/news/china-to-set-up-international-maritime-judicial-center/143928 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30153aa50>: Failed to resolve 'thephilippinestar.ph' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  74%|███████▍  | 738/1000 [34:11<10:59,  2.52s/it]

Error extracting text from http://seekingalpha.com/news/3244327-hurdles-mount-saudi-aramco-ipo: 403 Client Error: Forbidden for url: https://seekingalpha.com/news/3244327-hurdles-mount-saudi-aramco-ipo


Processing URLs:  74%|███████▍  | 741/1000 [34:15<06:39,  1.54s/it]

Error extracting text from http://www.mintpressnews.com/saudi-king-hospitalized-for-dementia/210145/: 403 Client Error: Forbidden for url: http://www.mintpressnews.com/saudi-king-hospitalized-for-dementia/210145/


Processing URLs:  75%|███████▍  | 746/1000 [34:24<07:40,  1.81s/it]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-syria-usa-idUKKBN1722UU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  75%|███████▍  | 747/1000 [34:26<08:26,  2.00s/it]

URL filtered: https://www.youtube.com/watch?v=e2cjVhUrmII


Processing URLs:  75%|███████▍  | 749/1000 [34:29<07:16,  1.74s/it]

Error extracting text from http://newsok.com/vp-joe-biden-says-he-will-not-run-for-president-in-2016/article/5454992?newsletter=breaking-email-dynamic: 404 Client Error: OK for url: https://www.oklahoman.com/vp-joe-biden-says-he-will-not-run-for-president-in-2016/article/5454992/


Processing URLs:  75%|███████▌  | 752/1000 [34:32<05:09,  1.25s/it]

Error extracting text from https://www.google.com/amp/s/www.wired.com/story/waymo-google-arizona-phoenix-driverless-self-driving-cars/amp: 403 Client Error: Forbidden for url: https://www.wired.com/story/waymo-google-arizona-phoenix-driverless-self-driving-cars/amp


Processing URLs:  76%|███████▌  | 755/1000 [34:54<19:53,  4.87s/it]

Error extracting text from https://www.poundsterlinglive.com/gbp-live-today/5076-pound-to-euro-citi-analysis-forecast: 500 Server Error: Internal Server Error for url: https://www.poundsterlinglive.com/gbp-live-today/5076-pound-to-euro-citi-analysis-forecast


Processing URLs:  76%|███████▌  | 759/1000 [34:58<08:21,  2.08s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-02/venezuela-bondholders-spooked-as-sanctions-spur-default-fears


Processing URLs:  76%|███████▌  | 761/1000 [34:59<05:27,  1.37s/it]

Error extracting text from http://www.ipcinfo.org/ipcinfo-detail-forms/ipcinfo-map-detail/en/c/471270/: 404 Client Error: Not Found for url: https://www.ipcinfo.org/ipcinfo-detail-forms/ipcinfo-map-detail/en/c/471270/


Processing URLs:  76%|███████▋  | 763/1000 [35:00<03:53,  1.01it/s]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NZ1CYM6K50YB01-1STN880ULK84DHJMRAN7V3B7EU


Processing URLs:  76%|███████▋  | 765/1000 [35:01<02:41,  1.45it/s]

Error extracting text from http://www.politics.co.uk/news/2016/07/19/boris-johnson-once-outed-mi6-spy-for-a-laugh: 403 Client Error: Forbidden for url: http://www.politics.co.uk/news/2016/07/19/boris-johnson-once-outed-mi6-spy-for-a-laugh


Processing URLs:  77%|███████▋  | 771/1000 [35:11<05:26,  1.43s/it]

Error extracting text from http://www.nytimes.com/2016/11/15/opinion/colombias-revised-peace-accord.html?emc=edit_ee_20161115&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/15/opinion/colombias-revised-peace-accord.html?emc=edit_ee_20161115&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0


Processing URLs:  77%|███████▋  | 772/1000 [35:12<04:59,  1.31s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2017/03/07/96/0200000000AEN20170307009200315F.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  77%|███████▋  | 773/1000 [35:12<03:47,  1.00s/it]

Error extracting text from https://www.nytimes.com/2017/11/28/technology/uber-waymo-lawsuit.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/28/technology/uber-waymo-lawsuit.html


Processing URLs:  77%|███████▋  | 774/1000 [35:12<02:58,  1.27it/s]

Error extracting text from http://www.wsj.com/articles/iraqi-gas-facility-stormed-by-gunmen-1469954013: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/iraqi-gas-facility-stormed-by-gunmen-1469954013


Processing URLs:  78%|███████▊  | 775/1000 [35:13<02:33,  1.46it/s]

Error extracting text from https://arabic.cnn.com/middle-east/article/2021/07/24/khalfan-qatar-iran-correct-policy&amp;ved=2ahUKEwiWwp-My67yAhUTrRQKHYFKBdkQxfQBMAF6BAgIEAM&amp;usg=AOvVaw2xzG_KdgAXncpwgsh5dFib: 404 Client Error: Not Found for url: https://arabic.cnn.com/middle-east/article/2021/07/24/khalfan-qatar-iran-correct-policy&amp;ved=2ahUKEwiWwp-My67yAhUTrRQKHYFKBdkQxfQBMAF6BAgIEAM&amp;usg=AOvVaw2xzG_KdgAXncpwgsh5dFib


Processing URLs:  78%|███████▊  | 778/1000 [35:17<03:36,  1.02it/s]

Error extracting text from http://news.yahoo.com/panama-president-urges-canal-consortium-focus-expansion-020151490.html: 404 Client Error: Not Found for url: http://news.yahoo.com/panama-president-urges-canal-consortium-focus-expansion-020151490.html
URL filtered: https://twitter.com/laurnorman/status/1450122885956243460


Processing URLs:  78%|███████▊  | 785/1000 [36:00<16:43,  4.67s/it]

Error extracting text from http://www.reuters.com/article/2015/08/21/us-southchinasea-china-pentagon-idUSKCN0QQ0S920150821#wf6CUKSF8CM75cLB.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/08/21/us-southchinasea-china-pentagon-idUSKCN0QQ0S920150821#wf6CUKSF8CM75cLB.97


Processing URLs:  79%|███████▊  | 786/1000 [36:01<12:49,  3.60s/it]

Error extracting text from http://www.governor.ny.gov/news/governor-cuomo-announces-cruise-automation-applying-begin-first-fully-autonomous-vehicle: 403 Client Error: Forbidden for url: https://www.governor.ny.gov/news/governor-cuomo-announces-cruise-automation-applying-begin-first-fully-autonomous-vehicle


Processing URLs:  79%|███████▉  | 789/1000 [36:04<06:05,  1.73s/it]

Error extracting text from http://www.reuters.com/article/us-tesla-prices-idUSKCN0ZT1HH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-prices-idUSKCN0ZT1HH


Processing URLs:  79%|███████▉  | 790/1000 [36:04<04:29,  1.28s/it]

Error extracting text from https://www.khaama.com/intelligence-chief-defense-minister-nominees-introduced-to-parliament-for-voting-01200: 403 Client Error: Forbidden for url: https://www.khaama.com/intelligence-chief-defense-minister-nominees-introduced-to-parliament-for-voting-01200


Processing URLs:  79%|███████▉  | 792/1000 [36:05<02:42,  1.28it/s]

Error extracting text from http://www.nytimes.com/2015/10/23/world/middleeast/us-considering-ways-to-shield-syrian-civilians.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/23/world/middleeast/us-considering-ways-to-shield-syrian-civilians.html


Processing URLs:  80%|███████▉  | 795/1000 [36:14<06:23,  1.87s/it]

Error extracting text from http://newsroom.toyota.co.jp/en/corporate/companyinformation/manufacturing: 404 Client Error: Not Found for url: https://global.toyota/en/company/profile/manufacturing/


Processing URLs:  80%|███████▉  | 797/1000 [36:16<04:37,  1.37s/it]

Error extracting text from https://www.nytimes.com/2017/12/10/us/politics/richard-shelby-roy-moore.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/10/us/politics/richard-shelby-roy-moore.html


Processing URLs:  80%|████████  | 803/1000 [36:36<08:52,  2.70s/it]

Error extracting text from http://thehill.com/policy/energy-environment/320793-climate-skeptics-ask-trump-to-withdraw-from-un-agency: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/320793-climate-skeptics-ask-trump-to-withdraw-from-un-agency/


Processing URLs:  80%|████████  | 805/1000 [36:39<05:48,  1.79s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/judge-hear-arguments-dakota-access-oil-pipeline-work-45794754: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/judge-hear-arguments-dakota-access-oil-pipeline-work-45794754
Error extracting text from http://www.reuters.com/article/us-usa-obamacare-congress-idUSKBN15V2EW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-obamacare-congress-idUSKBN15V2EW


Processing URLs:  81%|████████  | 806/1000 [36:39<04:10,  1.29s/it]

Error extracting text from http://www.reuters.com/article/us-usa-cyber-russia-hpe-specialreport/special-report-hp-enterprise-let-russia-scrutinize-cyberdefense-system-used-by-pentagon-idUSKCN1C716M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-hpe-specialreport/special-report-hp-enterprise-let-russia-scrutinize-cyberdefense-system-used-by-pentagon-idUSKCN1C716M


Processing URLs:  81%|████████  | 810/1000 [36:42<02:52,  1.10it/s]

Error extracting text from https://www.wsj.com/articles/north-korea-threatens-absolute-force-as-u-s-south-hold-military-drills-1503392504: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-threatens-absolute-force-as-u-s-south-hold-military-drills-1503392504


Processing URLs:  81%|████████  | 811/1000 [36:43<03:03,  1.03it/s]

Error extracting text from http://ir.teslamotors.com/static-files/725970e6-eda5-47ab-96e1-422d4045f799: HTTPConnectionPool(host='ir.teslamotors.com', port=80): Max retries exceeded with url: /static-files/725970e6-eda5-47ab-96e1-422d4045f799 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303419430>: Failed to resolve 'ir.teslamotors.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  82%|████████▏ | 815/1000 [36:57<07:29,  2.43s/it]

Error extracting text from https://www.wsj.com/articles/venezuelan-debt-crisis-will-be-huge-and-devilishly-complex-1509724546?cx_testId=16&amp;cx_testVariant=cx&amp;cx_artPos=2&amp;cx_tag=contextual&amp;cx_navSource=newsReel: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelan-debt-crisis-will-be-huge-and-devilishly-complex-1509724546?cx_testId=16&amp;cx_testVariant=cx&amp;cx_artPos=2&amp;cx_tag=contextual&amp;cx_navSource=newsReel


Processing URLs:  82%|████████▏ | 821/1000 [37:05<03:36,  1.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-21/tipping-point-looms-for-south-africa-as-economy-s-despair-grows
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://flaviochaves.com.br/2016/03/18/dilma-rousseff-fraudou-o-diario-oficial-na-nomeacao-de-lula-que-ela-nao-tinha-assinado/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://flaviochaves.com.br/2016/03/18/dilma-rousseff-fraudou-o-diario-oficial-na-nomeacao-de-lula-que-ela-nao-tinha-assinado/&amp;prev=search


Processing URLs:  83%|████████▎ | 828/1000 [37:16<03:45,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-eu-trade-usa-idUSKCN12T095?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-trade-usa-idUSKCN12T095?il=0


Processing URLs:  83%|████████▎ | 830/1000 [37:19<03:47,  1.34s/it]

Error extracting text from http://www.who.int/csr/don/28-june-2017-ah7n9-china/en/: 404 Client Error: Not Found for url: https://www.who.int/csr/don/28-june-2017-ah7n9-china/en/


Processing URLs:  83%|████████▎ | 832/1000 [37:24<05:30,  1.97s/it]

Error extracting text from http://kipac-web.stanford.edu/how-special-3-sigma: HTTPConnectionPool(host='kipac-web.stanford.edu', port=80): Max retries exceeded with url: /how-special-3-sigma (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feaf23c0>: Failed to resolve 'kipac-web.stanford.edu' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  84%|████████▍ | 838/1000 [37:32<03:04,  1.14s/it]

Error extracting text from http://www.tradingeconomics.com/italy/inflation-cpi/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/italy/inflation-cpi/forecast
Error extracting text from http://www.reuters.com/article/us-britain-eu-may-idUSKBN18P21T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-may-idUSKBN18P21T


Processing URLs:  84%|████████▍ | 839/1000 [37:36<05:40,  2.11s/it]

Error extracting text from http://www.trtworld.com/turkey/erdogan-says-assad-must-be-excluded-from-syria-transition-190093: 404 Client Error: Not Found for url: https://www.trtworld.com:443/turkey/erdogan-says-assad-must-be-excluded-from-syria-transition-190093


Processing URLs:  84%|████████▍ | 840/1000 [37:38<05:29,  2.06s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/brazil-braces-for-protests-as-ex-president-lula-joins-cabinet/article29271440/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/brazil-braces-for-protests-as-ex-president-lula-joins-cabinet/article29271440/


Processing URLs:  85%|████████▌ | 850/1000 [38:03<04:41,  1.88s/it]

Error extracting text from https://www.iaea.org/NuclearPower/Downloadable/Meetings/2013/2013-02-11-02-14-TM-INIG/20.smirnov.pdf: 404 Client Error: Not Found for url: https://www.iaea.org/NuclearPower/Downloadable/Meetings/2013/2013-02-11-02-14-TM-INIG/20.smirnov.pdf


Processing URLs:  85%|████████▌ | 851/1000 [38:04<04:01,  1.62s/it]

URL filtered: https://m.youtube.com/watch?v=DvRkSMCDX3o


Processing URLs:  86%|████████▌ | 856/1000 [38:12<03:30,  1.46s/it]

Error extracting text from http://thehill.com/homenews/336692-media-likely-to-be-excluded-from-trump-putin-meeting-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/336692-media-likely-to-be-excluded-from-trump-putin-meeting-report/


Processing URLs:  86%|████████▌ | 858/1000 [38:14<02:59,  1.26s/it]

Error extracting text from https://www.gjopen.com/comments/1335342).: 404 Client Error: Not Found for url: https://www.gjopen.com/comments/1335342).


Processing URLs:  86%|████████▌ | 859/1000 [38:16<03:07,  1.33s/it]

URL filtered: https://twitter.com/NarangVipin/status/1378162325581135873?s=20


Processing URLs:  86%|████████▋ | 863/1000 [38:22<03:22,  1.48s/it]

Error extracting text from http://www.irrawaddy.com/election/news/military-chief-pledges-apolitical-duty-to-nations-defense: 403 Client Error: Forbidden for url: http://www.irrawaddy.com/election/news/military-chief-pledges-apolitical-duty-to-nations-defense
Error extracting text from https://scontent-mia1-1.xx.fbcdn.net/v/t1.0-9/13332855_10154214850354591_5618410058977117036_n.jpg?oh=5e44af08f67fb857ec0c386d44c34547&amp;oe=57CD2C5A: HTTPSConnectionPool(host='scontent-mia1-1.xx.fbcdn.net', port=443): Max retries exceeded with url: /v/t1.0-9/13332855_10154214850354591_5618410058977117036_n.jpg?oh=5e44af08f67fb857ec0c386d44c34547&amp;oe=57CD2C5A (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fbded790>: Failed to resolve 'scontent-mia1-1.xx.fbcdn.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  86%|████████▋ | 865/1000 [38:25<03:17,  1.46s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-23/yellen-defends-seven-years-of-zero-interest-in-letter-to-nader


Processing URLs:  87%|████████▋ | 869/1000 [38:26<01:32,  1.42it/s]

Error extracting text from http://english.ahram.org.eg/NewsContent/3/12/180351/Business/Economy/World-Bank-cuts-Egypts-predicted--growth-rate.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/3/12/180351/Business/Economy/World-Bank-cuts-Egypts-predicted--growth-rate.aspx
Error extracting text from https://www.newspapers.com/newspage/106698692/: 403 Client Error: Forbidden for url: https://www.newspapers.com/newspage/106698692/


Processing URLs:  87%|████████▋ | 873/1000 [38:35<03:22,  1.60s/it]

URL filtered: https://twitter.com/elizondogabriel/status/714895908438204416?ref_src=twsrc^google|twcamp^serp|twgr^tweet


Processing URLs:  88%|████████▊ | 879/1000 [38:55<05:27,  2.71s/it]

Error extracting text from http://www.atimes.com/article/us-indifferent-scope-syria-peace-talks-shrunk/: 404 Client Error: Not Found for url: https://atimes.com/article/us-indifferent-scope-syria-peace-talks-shrunk/


Processing URLs:  88%|████████▊ | 880/1000 [38:56<04:09,  2.08s/it]

Error extracting text from https://www.hypermind.com/en/offers/hypermind-prescience/: 404 Client Error: Not Found for url: https://www.hypermind.com/en/offers/hypermind-prescience/
URL filtered: https://www.bloomberg.com/politics/articles/2017-04-21/brexit-bill-shouldn-t-delay-trade-talks-too-long-say-leaders


Processing URLs:  88%|████████▊ | 883/1000 [38:57<02:20,  1.20s/it]

Error extracting text from https://www.csoonline.com/article/3236716/authentication/how-hackers-crack-passwords-and-why-you-cant-stop-them.html: 404 Client Error: Not Found for url: https://www.csoonline.com/article/3236716/authentication/how-hackers-crack-passwords-and-why-you-cant-stop-them.html


Processing URLs:  89%|████████▊ | 886/1000 [39:01<01:55,  1.02s/it]

Error extracting text from https://www.researchgate.net/publication/235654523_Voters_versus_Terrorists_Analyzing_the_Effect_of_Terrorist_Events_on_Voter_Turnout: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/235654523_Voters_versus_Terrorists_Analyzing_the_Effect_of_Terrorist_Events_on_Voter_Turnout


Processing URLs:  89%|████████▉ | 888/1000 [39:03<01:59,  1.07s/it]

Error extracting text from http://in.reuters.com/article/venezuela-politics-idINKBN1A31WB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  89%|████████▉ | 889/1000 [39:03<01:31,  1.21it/s]

Error extracting text from http://www.barrons.com/articles/venezuelas-pdvsa-debt-in-selective-default-s-p-says-1477438907: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuelas-pdvsa-debt-in-selective-default-s-p-says-1477438907


Processing URLs:  89%|████████▉ | 890/1000 [39:04<01:46,  1.04it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/tear-gas-fired-montenegro-anti-government-protest-34559363: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/tear-gas-fired-montenegro-anti-government-protest-34559363


Processing URLs:  89%|████████▉ | 893/1000 [39:07<01:37,  1.10it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-hackers/u-s-authorities-identify-six-russian-officials-in-dnc-hack-wsj-idUSKBN1D21MI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-hackers/u-s-authorities-identify-six-russian-officials-in-dnc-hack-wsj-idUSKBN1D21MI


Processing URLs:  90%|████████▉ | 895/1000 [39:09<01:28,  1.19it/s]

Error extracting text from https://reut.rs/33tu3rd: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/uk/uk-parliament-standards-watchdog-investigating-pm-johnsons-foreign-travel-2021-05-10/


Processing URLs:  90%|████████▉ | 898/1000 [39:13<01:43,  1.02s/it]

Error extracting text from http://www.reuters.com/article/venezuela-bonds-cac-idUSL2N1551WL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-bonds-cac-idUSL2N1551WL
Error extracting text from https://www.thelocal.no/20201229/norways-centre-party-the-british-have-a-better-deal-than-the-eea: 403 Client Error: Forbidden for url: https://www.thelocal.no/20201229/norways-centre-party-the-british-have-a-better-deal-than-the-eea


Processing URLs:  90%|█████████ | 902/1000 [39:18<01:51,  1.14s/it]

Error extracting text from https://www.google.com/trends/explore#q=eu+leave%2C: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#q=eu+leave%2C


Processing URLs:  90%|█████████ | 903/1000 [39:28<06:16,  3.88s/it]

Error extracting text from http://www.focus-fen.net/news/2016/04/18/403939/die-zeit-german-chancellors-approval-rating-goes-down-after-boehmermann-ruling.html: HTTPConnectionPool(host='www.focus-fen.net', port=80): Max retries exceeded with url: /news/2016/04/18/403939/die-zeit-german-chancellors-approval-rating-goes-down-after-boehmermann-ruling.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30618b830>: Failed to resolve 'www.focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|█████████ | 905/1000 [39:30<03:55,  2.48s/it]

Error extracting text from https://www.yahoo.com/gma/north-korea-claims-tested-first-intercontinental-missile-070907829--abc-news-topstories.html: 404 Client Error: Not Found for url: https://www.yahoo.com/gma/north-korea-claims-tested-first-intercontinental-missile-070907829--abc-news-topstories.html


Processing URLs:  92%|█████████▏| 919/1000 [40:06<01:59,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-eurozone-crisis-greece-schaeueble-idUSKBN17W05T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eurozone-crisis-greece-schaeueble-idUSKBN17W05T
URL filtered: http://macro-man.blogspot.de/2016/06/trading-brexit-vote.html?utm_source=twitterfeed&amp;utm_medium=twitter


Processing URLs:  92%|█████████▎| 925/1000 [40:12<01:04,  1.17it/s]

Error extracting text from http://www.reuters.com/article/us-tesla-outlook-idUSKBN1621W5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-outlook-idUSKBN1621W5


Processing URLs:  93%|█████████▎| 927/1000 [40:13<00:47,  1.52it/s]

Error extracting text from https://thecounter.org/next-weeks-amazon-union-vote-bessamer-alabama-rwdsu/: 403 Client Error: Forbidden for url: https://thecounter.org/next-weeks-amazon-union-vote-bessamer-alabama-rwdsu/
Error extracting text from http://www.nytimes.com/2016/08/09/world/asia/china-spratly-islands-south-china-sea.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/09/world/asia/china-spratly-islands-south-china-sea.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  93%|█████████▎| 928/1000 [40:13<00:45,  1.57it/s]

Error extracting text from http://www.bostonherald.com/news/national/2016/02/ap_exclusive_iranian_drone_first_over_us_carrier_since_2014: 404 Client Error: Not Found for url: https://www.bostonherald.com/news/national/2016/02/ap_exclusive_iranian_drone_first_over_us_carrier_since_2014
URL filtered: https://www.bnnbloomberg.ca/cop26-finally-set-rules-on-carbon-markets-what-does-it-mean-1.1681308


Processing URLs:  93%|█████████▎| 931/1000 [40:16<01:01,  1.13it/s]

Error extracting text from http://www.newsfultoncounty.com/politics/news/1516287-skorea-kept-on-alert-by-possible-nkorean-missile-launch: 403 Client Error: Forbidden for url: https://www.newsfultoncounty.com/politics/news/1516287-skorea-kept-on-alert-by-possible-nkorean-missile-launch


Processing URLs:  93%|█████████▎| 934/1000 [40:23<01:43,  1.57s/it]

URL filtered: https://www.washingtonpost.com/business/technology/russian-facebook-ads-showed-a-black-woman-firing-a-rifle-amid-efforts-to-stoke-racial-strife/2017/10/02/e4e78312-a785-11e7-b3aa-c0e2e1d41e38_story.html?utm_term=.6a38fc2b0176
Error extracting text from http://www.reuters.com/article/us-usa-venezuela-sanctions-idUSKCN1B5216: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-venezuela-sanctions-idUSKCN1B5216


Processing URLs:  94%|█████████▎| 937/1000 [40:23<00:49,  1.28it/s]

Error extracting text from https://www.reuters.com/article/us-usa-cyber-northkorea/u-s-blames-north-korea-for-wannacry-cyber-attack-idUSKBN1ED00Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-northkorea/u-s-blames-north-korea-for-wannacry-cyber-attack-idUSKBN1ED00Q


Processing URLs:  94%|█████████▍| 941/1000 [40:30<01:28,  1.49s/it]

Error extracting text from http://www.clingendael.nl/sites/default/files/Clingendael_Policy_Brief_Foreign%20Policy%20Responses_September2015.pdf: 404 Client Error: Not Found for url: https://www.clingendael.org/sites/default/files/Clingendael_Policy_Brief_Foreign%20Policy%20Responses_September2015.pdf


Processing URLs:  94%|█████████▍| 942/1000 [40:33<01:51,  1.92s/it]

Error extracting text from https://icrtopblog.org/tag/burundi/: 404 Client Error: Not Found for url: https://icrtopblog.org/tag/burundi/


Processing URLs:  94%|█████████▍| 945/1000 [40:50<04:07,  4.51s/it]

Error extracting text from http://thehill.com/blogs/blog-briefing-room/news/336168-russia-special-counsel-investigation-includes-manafort-may: 403 Client Error: Forbidden for url: https://thehill.com/blogs/blog-briefing-room/news/336168-russia-special-counsel-investigation-includes-manafort-may/


Processing URLs:  95%|█████████▍| 948/1000 [40:51<01:48,  2.10s/it]

Error extracting text from http://mobile.nytimes.com/2016/03/03/world/middleeast/iran-elections.html?_r=0&amp;referer=http://www.theatlantic.com/international/archive/2016/03/iran-election-results-winner/472128/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/03/03/world/middleeast/iran-elections.html?_r=0&amp;referer=http://www.theatlantic.com/international/archive/2016/03/iran-election-results-winner/472128/


Processing URLs:  95%|█████████▌| 950/1000 [40:53<01:16,  1.52s/it]

Error extracting text from http://www.telegraph.co.uk/news/newstopics/eureferendum/12193963/EU-referendum-Exclusive-Telegraph-poll-says-Leave-campaign-most-likely-to-win-in-June.htmlhttp://www.telegraph.co.uk/news/politics/12194138/Remain-or-leave-It-all-rests-on-the-risk-factor.html: 404 Client Error: Not Found for url: https://www.telegraph.co.uk/news/newstopics/eureferendum/12193963/EU-referendum-Exclusive-Telegraph-poll-says-Leave-campaign-most-likely-to-win-in-June.htmlhttp:/www.telegraph.co.uk/news/politics/12194138/Remain-or-leave-It-all-rests-on-the-risk-factor.html


Processing URLs:  96%|█████████▌| 955/1000 [41:09<02:47,  3.72s/it]

Error extracting text from http://www.washingtontimes.com/news/2017/feb/16/neil-gorsuch-supreme-court-confirmation-hearing-se/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/feb/16/neil-gorsuch-supreme-court-confirmation-hearing-se/


Processing URLs:  96%|█████████▌| 960/1000 [41:23<01:39,  2.48s/it]

Error extracting text from http://www.reuters.com/article/us-southkorea-politics-idUSKBN15G3IV?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-idUSKBN15G3IV?il=0


Processing URLs:  96%|█████████▋| 964/1000 [41:29<00:59,  1.65s/it]

Error extracting text from https://www.aace.com/files/2014-congression-i-work-schedule.pdf: 404 Client Error: Not Found for url: https://www.aace.com/files/2014-congression-i-work-schedule.pdf


Processing URLs:  97%|█████████▋| 971/1000 [42:42<09:37, 19.90s/it]

Error extracting text from http://www.philstar.com:8080/business/2017/03/21/1682989/rcep-talks-seen-wrap-soon: HTTPConnectionPool(host='www.philstar.com', port=8080): Read timed out. (read timeout=60)


Processing URLs:  97%|█████████▋| 974/1000 [42:55<04:15,  9.82s/it]

Error extracting text from https://www.entornointeligente.com/nicaragua-se-acerca-peligrosamente-a-una-dictadura/: 404 Client Error: Not Found for url: https://entornointeligente.com/nicaragua-se-acerca-peligrosamente-a-una-dictadura/


Processing URLs:  98%|█████████▊| 975/1000 [43:56<10:26, 25.07s/it]

Error extracting text from http://aa.com.tr/en/politics/unsc-resolution-gives-syria-s-assad-2-more-years-in-power/494968: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  98%|█████████▊| 978/1000 [43:57<03:13,  8.78s/it]

Error extracting text from https://de.sputniknews.com/zeitungen/20161003/312792019/montenegro-nato-wahlkampagne.html: HTTPSConnectionPool(host='de.sputniknews.com', port=443): Max retries exceeded with url: /zeitungen/20161003/312792019/montenegro-nato-wahlkampagne.html (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3018665a0>: Failed to resolve 'de.sputniknews.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-usa-trade-nafta-mexico-idUSKBN1582UQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trade-nafta-mexico-idUSKBN1582UQ


Processing URLs:  98%|█████████▊| 979/1000 [43:58<02:16,  6.52s/it]

Error extracting text from http://www.go-baduk-weiqi.de/gewinnspiel-lee-sedol-gegen-alphago/: 436 Client Error:  for url: http://www.go-baduk-weiqi.de/gewinnspiel-lee-sedol-gegen-alphago/
Error extracting text from https://www.reuters.com/article/us-germany-politics-coalition/germanys-fdp-does-not-expect-coalition-to-form-before-christmas-idUSKBN1CJ0DO?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-coalition/germanys-fdp-does-not-expect-coalition-to-form-before-christmas-idUSKBN1CJ0DO?il=0
URL filtered: https://www.youtube.com/watch?v=-UTEiKuQC5U


Processing URLs:  98%|█████████▊| 982/1000 [44:00<00:57,  3.21s/it]

Error extracting text from http://washingtontimes.com/news/2017/mar/3/legal-marijuana-likely-coming-canada-matter-months: 403 Client Error: Forbidden for url: https://www.washingtontimes.com:443/news/2017/mar/3/legal-marijuana-likely-coming-canada-matter-months


Processing URLs:  98%|█████████▊| 983/1000 [44:01<00:45,  2.69s/it]

URL filtered: https://twitter.com/hashtag/ttip?lang=en
URL filtered: http://www.bloomberg.com/view/articles/2016-08-11/venezuela-has-good-reasons-to-avoid-default
Error extracting text from http://kids.clerk.house.gov/high-school/lesson.html?intID=17: HTTPConnectionPool(host='kids.clerk.house.gov', port=80): Max retries exceeded with url: /high-school/lesson.html?intID=17 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303a75820>: Failed to resolve 'kids.clerk.house.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  99%|█████████▊| 987/1000 [44:01<00:17,  1.31s/it]

Error extracting text from http://pressroom.toyota.com/releases/mazda+denso+toyota+sign+joint+technology+development+contract+electric+vehicles.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/mazda+denso+toyota+sign+joint+technology+development+contract+electric+vehicles/


Processing URLs:  99%|█████████▉| 988/1000 [44:02<00:15,  1.27s/it]

Error extracting text from http://ssrn.com/abstract=2583528: 403 Client Error: Forbidden for url: https://www.ssrn.com/abstract=2583528


Processing URLs:  99%|█████████▉| 990/1000 [44:05<00:12,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-oil-idUSKCN0Z025R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-idUSKCN0Z025R


Processing URLs:  99%|█████████▉| 992/1000 [44:07<00:08,  1.07s/it]

Error extracting text from http://maritime-executive.com/article/ex-im-bank-faces-hurdles-to-approval: 404 Client Error: Not Found for url: https://maritime-executive.com/403.shtml


Processing URLs:  99%|█████████▉| 993/1000 [44:13<00:15,  2.28s/it]

Error extracting text from http://www.quotenet.com/bond/search?borrower=153: HTTPConnectionPool(host='www.quotenet.com', port=80): Max retries exceeded with url: /bond/search?borrower=153 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x305b23770>: Failed to resolve 'www.quotenet.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  99%|█████████▉| 994/1000 [44:13<00:10,  1.76s/it]

Error extracting text from http://www.wsj.com/articles/s-p-cuts-outlook-on-south-africa-to-negative-1449252238: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/s-p-cuts-outlook-on-south-africa-to-negative-1449252238


Processing URLs: 100%|█████████▉| 996/1000 [44:14<00:04,  1.02s/it]

Error extracting text from https://www.nytimes.com/2017/06/02/us/politics/trump-comey-russia.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/02/us/politics/trump-comey-russia.html?_r=0
Error extracting text from https://www.reuters.com/article/us-britain-eu-tusk-idUSKCN1ME1M1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-tusk-idUSKCN1ME1M1


Processing URLs: 100%|██████████| 1000/1000 [44:19<00:00,  2.66s/it]


Error extracting text from http://thehill.com/homenews/campaign/364140-moore-bumps-jones-from-top-spot-in-alabama-senate-poll: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/364140-moore-bumps-jones-from-top-spot-in-alabama-senate-poll/


Processing URLs:   0%|          | 2/1000 [00:00<01:04, 15.44it/s]

URL filtered: http://www.facebook.com/photo
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-russia-turkey-davutogl-idUSKBN0TR14F20151208: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-russia-turkey-davutogl-idUSKBN0TR14F20151208


Processing URLs:   0%|          | 4/1000 [00:12<58:34,  3.53s/it]

Error extracting text from http://www.buenosairesherald.com/article/202805/psdb-demands-brazil-speaker-cunha-quit: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/202805/psdb-demands-brazil-speaker-cunha-quit
URL filtered: https://www.youtube.com/watch?v=vbeWsxWy1XU


Processing URLs:   1%|          | 8/1000 [00:17<35:09,  2.13s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/china-deploys-missiles-on-contested-island/article28779439/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/china-deploys-missiles-on-contested-island/article28779439/


Processing URLs:   1%|          | 10/1000 [00:21<34:16,  2.08s/it]

Error extracting text from http://training.goodjudgment.com: HTTPConnectionPool(host='training.goodjudgment.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3076da0f0>: Failed to resolve 'training.goodjudgment.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   1%|▏         | 13/1000 [00:21<16:32,  1.01s/it]

Error extracting text from http://emarketalerts.forecast1.com/mic/eabstract.cfm?recno=237154: HTTPConnectionPool(host='emarketalerts.forecast1.com', port=80): Max retries exceeded with url: /mic/eabstract.cfm?recno=237154 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe8419d0>: Failed to resolve 'emarketalerts.forecast1.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-philippines-usa-defence-idUSKBN15A18Z: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-philippines-usa-defence-idUSKBN15A18Z


Processing URLs:   2%|▏         | 21/1000 [00:45<46:59,  2.88s/it]  

URL filtered: https://www.youtube.com/watch?v=JlYLT5vhPfo&amp;t=777s


Processing URLs:   2%|▏         | 23/1000 [00:50<45:49,  2.81s/it]

Error extracting text from http://predictwise.com/politics/uk-politics: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/uk-politics


Processing URLs:   2%|▎         | 25/1000 [00:52<30:22,  1.87s/it]

Error extracting text from https://trends.google.com/trends/explore?date=today%203-m&amp;q=bump%20stocks: 429 Client Error: unknown for url: https://trends.google.com/trends/explore?date=today%203-m&amp;q=bump%20stocks


Processing URLs:   3%|▎         | 28/1000 [00:54<16:48,  1.04s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-arms-taurus-idUSKBN1612LA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-arms-taurus-idUSKBN1612LA


Processing URLs:   3%|▎         | 32/1000 [01:03<22:33,  1.40s/it]

Error extracting text from http://www.nytimes.com/2016/07/28/opinion/is-the-iran-saudi-cold-war-heating-up.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/28/opinion/is-the-iran-saudi-cold-war-heating-up.html?_r=0
Error extracting text from http://www.reuters.com/article/us-bangladesh-violence-idUSKCN0Z12B5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-bangladesh-violence-idUSKCN0Z12B5


Processing URLs:   3%|▎         | 33/1000 [01:03<17:03,  1.06s/it]

Error extracting text from https://www.nytimes.com/2017/02/10/opinion/sunday/charles-schumer-judge-gorsuch-we-wont-be-fooled-again.html?emc=edit_th_20170211&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/10/opinion/sunday/charles-schumer-judge-gorsuch-we-wont-be-fooled-again.html?emc=edit_th_20170211&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0


Processing URLs:   4%|▎         | 37/1000 [01:09<15:50,  1.01it/s]

Error extracting text from https://www.wsj.com/articles/how-american-shale-drillers-flipped-opecs-script-1495618203?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/how-american-shale-drillers-flipped-opecs-script-1495618203?mod=e2fb
Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://g1.globo.com/jornal-nacional/noticia/2016/03/investigadores-suspeitas-contra-lula-tem-base-em-provas-e-depoimentos.html&amp;usg=ALkJrhgotoOVbbjsC-Byexz2L13EezDOYQ: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://g1.globo.com/jornal-nacional/noticia/2016/03/investigadores-suspeitas-contra-lula-tem-base-em-provas-e-depoimentos.html&amp;usg=ALkJrhgotoOVbbjsC-Byexz2L13EezDOYQ


Processing URLs:   4%|▍         | 38/1000 [01:09<11:37,  1.38it/s]

Error extracting text from http://www.reuters.com/article/us-usa-congress-oilexports-idUSKBN0TU2SX20151212: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-oilexports-idUSKBN0TU2SX20151212


Processing URLs:   4%|▍         | 40/1000 [01:11<13:31,  1.18it/s]

Error extracting text from http://www.imdb.com/title/tt3498820/releaseinfo?mode=desktop: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt3498820/releaseinfo?mode=desktop


Processing URLs:   5%|▍         | 46/1000 [01:24<34:27,  2.17s/it]

Error extracting text from https://www.fxempire.com/forecasts/article/oil-price-fundamental-weekly-forecast-market-needs-opec-extension-news-to-feed-rally-407502: 403 Client Error: Forbidden for url: https://www.fxempire.com/forecasts/article/oil-price-fundamental-weekly-forecast-market-needs-opec-extension-news-to-feed-rally-407502


Processing URLs:   5%|▍         | 48/1000 [01:25<22:15,  1.40s/it]

URL filtered: https://www.youtube.com/watch?v=vDkdjrEqu1I&amp;nohtml5=False


Processing URLs:   5%|▌         | 50/1000 [01:28<23:52,  1.51s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950306000745: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950306000745 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30650f0e0>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 52/1000 [01:29<19:08,  1.21s/it]

Error extracting text from http://www.newsweek.com/russias-lavrov-warns-sweden-nato-membership-453890: 403 Client Error: Forbidden for url: https://www.newsweek.com/russias-lavrov-warns-sweden-nato-membership-453890
URL filtered: https://www.forbes.com/sites/mattperez/2020/06/29/youtube-bans-white-supremacists-stefan-molyneux-richard-spencer-david-duke/#6daab6f45ff1


Processing URLs:   6%|▌         | 57/1000 [01:34<14:41,  1.07it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-philippines-idUSKBN1780NR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-philippines-idUSKBN1780NR


Processing URLs:   6%|▌         | 58/1000 [01:38<28:33,  1.82s/it]

Error extracting text from http://www.quadcapital.com/Rating%20Agency%20Credit%20Ratings.pdf: 403 Client Error: Forbidden for url: https://www.cbrecap.com/en/contact-us/charlie-knudsenRating%20Agency%20Credit%20Ratings.pdf
URL filtered: https://www.youtube.com/watch?v=UsNnkax2wNA


Processing URLs:   6%|▌         | 60/1000 [01:39<20:37,  1.32s/it]

Error extracting text from https://www.cia.gov/library/publications/resources/the-world-factbook/geos/my.html: 404 Client Error: Not Found for url: https://www.cia.gov/library/publications/resources/the-world-factbook/geos/my.html


Processing URLs:   6%|▌         | 61/1000 [01:41<20:16,  1.30s/it]

Error extracting text from http://www.gov.me/en/News/156481/Government-announces-candidancy-of-FM-Luksic-for-UN-Secretary-General-to-submit-request-for.html: 404 Client Error: not found for url: https://www.gov.me/en/News/156481/Government-announces-candidancy-of-FM-Luksic-for-UN-Secretary-General-to-submit-request-for.html


Processing URLs:   6%|▋         | 64/1000 [01:46<25:32,  1.64s/it]

Error extracting text from http://www.reuters.com/article/us-nato-russia-idUSKCN0YZ0LW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-russia-idUSKCN0YZ0LW


Processing URLs:   7%|▋         | 66/1000 [01:46<15:35,  1.00s/it]

Error extracting text from http://www.wsj.com/articles/global-stocks-up-as-brexit-vote-gets-underway-1466667893: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/global-stocks-up-as-brexit-vote-gets-underway-1466667893


Processing URLs:   7%|▋         | 73/1000 [01:54<14:11,  1.09it/s]

Error extracting text from http://english.ahram.org.eg/NewsContent/3/12/245462/Business/Economy/Number-of-August-tourists-in-Egypt-down-pct-agains.aspx: 403 Client Error: Forbidden for url: http://english.ahram.org.eg/NewsContent/3/12/245462/Business/Economy/Number-of-August-tourists-in-Egypt-down-pct-agains.aspx


Processing URLs:   8%|▊         | 75/1000 [02:01<32:49,  2.13s/it]

Error extracting text from https://africanspress.org/2017/08/31/40-tons-of-south-sudan-weapons-land-at-entebbe/: 436 Client Error:  for url: http://ww16.africanspress.org/2017/08/31/40-tons-of-south-sudan-weapons-land-at-entebbe/?sub1=20240203-0150-5061-8555-5aebec1b532b


Processing URLs:   8%|▊         | 81/1000 [02:14<32:47,  2.14s/it]

URL filtered: https://www.youtube.com/watch?v=qrwlk7_GF9g


Processing URLs:   9%|▊         | 86/1000 [02:16<12:03,  1.26it/s]

Error extracting text from http://www.wsj.com/articles/lockheed-says-qatar-budget-woes-could-delay-defense-deal-1461692108: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/lockheed-says-qatar-budget-woes-could-delay-defense-deal-1461692108
Error extracting text from http://www.reuters.com/article/panama-canal-idUSL1N13M01V20151127#v4xuQzLpv80KlrpX.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/panama-canal-idUSL1N13M01V20151127#v4xuQzLpv80KlrpX.99


Processing URLs:   9%|▉         | 88/1000 [02:17<10:20,  1.47it/s]

Error extracting text from http://www.crisisgroup.org/en/regions/asia/north-east-asia/north-korea/269-north-korea-beyond-the-six-party-talks.aspx: 404 Client Error: Not Found for url: https://www.crisisgroup.org/en/regions/asia/north-east-asia/north-korea/269-north-korea-beyond-the-six-party-talks.aspx


Processing URLs:   9%|▉         | 91/1000 [02:22<18:48,  1.24s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/kerry-warns-beijing-over/2846396.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/kerry-warns-beijing-over/2846396.html


Processing URLs:  10%|▉         | 95/1000 [02:31<29:27,  1.95s/it]

Error extracting text from http://warontherocks.com/2016/08/china-signals-resolve-with-bomber-flights-over-the-south-china-sea/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/08/china-signals-resolve-with-bomber-flights-over-the-south-china-sea/


Processing URLs:  10%|▉         | 98/1000 [02:35<22:29,  1.50s/it]

Error extracting text from http://eng.mod.gov.cn/MilitaryExercises/2016-01/02/content_4634962.htm: 404 Client Error: Not Found for url: http://eng.mod.gov.cn/MilitaryExercises/2016-01/02/content_4634962.htm


Processing URLs:  10%|█         | 101/1000 [02:39<20:50,  1.39s/it]

Error extracting text from http://www.cnbc.com/2015/12/22/reuters-america-update-1-japan-govt-targets-growth-fiscal-reform-in-record-budget-spending-plan.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/12/22/reuters-america-update-1-japan-govt-targets-growth-fiscal-reform-in-record-budget-spending-plan.html
Error extracting text from https://www.reuters.com/business/aerospace-defense/south-korea-sees-imminent-prospect-north-icbm-test-newspaper-2022-03-14/?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=EBB%2003.14.2022&amp;utm_term=Editorial%20-%20Early%20Bird%20Brief: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/aerospace-defense/south-korea-sees-imminent-prospect-north-icbm-test-newspaper-2022-03-14/?utm_source=Sailthru&amp;utm_medium=email&amp;utm_campaign=EBB%2003.14.2022&amp;utm_term=Editorial%20-%20Early%20Bird%20Brief


Processing URLs:  10%|█         | 104/1000 [02:40<12:19,  1.21it/s]

Error extracting text from https://www.nytimes.com/2017/08/23/world/asia/afghanistan-taliban-helmand-suicide-attack.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/23/world/asia/afghanistan-taliban-helmand-suicide-attack.html


Processing URLs:  11%|█         | 106/1000 [02:43<13:22,  1.11it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-24/boko-haram-burns-down-hospital-houses-in-northeastern-nigeria


Processing URLs:  11%|█         | 109/1000 [02:44<09:30,  1.56it/s]

Error extracting text from http://www.debka.com/: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.fabiocampana.com.br/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.fabiocampana.com.br/&amp;prev=search


Processing URLs:  11%|█         | 112/1000 [02:47<10:48,  1.37it/s]

Error extracting text from http://www.ata.gov.al/en/na-nato-membership-of-montenegroprotocol-of-north-atlantic-treaty-ratified/: 403 Client Error: Forbidden for url: http://www.ata.gov.al/en/na-nato-membership-of-montenegroprotocol-of-north-atlantic-treaty-ratified/


Processing URLs:  11%|█▏        | 114/1000 [03:01<1:02:54,  4.26s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/japan-fm-says-russia-key-to-resolving-syria-n-korea-issues/2016/01/19/9ead07f8-bea4-11e5-98c8-7fab78677d51_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/japan-fm-says-russia-key-to-resolving-syria-n-korea-issues/2016/01/19/9ead07f8-bea4-11e5-98c8-7fab78677d51_story.html


Processing URLs:  12%|█▏        | 116/1000 [03:07<52:11,  3.54s/it]  

Error extracting text from http://smile.amazon.com/Deep-State-Constitution-Shadow-Government/dp/0525428348: 500 Server Error: Internal Server Error for url: https://www.amazon.com:443/Deep-State-Constitution-Shadow-Government/dp/0525428348


Processing URLs:  12%|█▏        | 118/1000 [03:10<38:13,  2.60s/it]

URL filtered: http://www.bloomberg.com/politics/articles/2015-09-14/biden-secretly-met-with-top-obama-bundler-during-new-york-swing


Processing URLs:  12%|█▏        | 122/1000 [03:12<17:02,  1.16s/it]

Error extracting text from http://www.reuters.com/article/us-germany-turkey-merkel/in-shift-merkel-backs-end-to-eu-turkey-membership-talks-idUSKCN1BE15B?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-turkey-merkel/in-shift-merkel-backs-end-to-eu-turkey-membership-talks-idUSKCN1BE15B?il=0


Processing URLs:  13%|█▎        | 126/1000 [03:14<09:21,  1.56it/s]

Error extracting text from http://www.basnews.com/index.php/en/news/iraq/289622: 403 Client Error: Forbidden for url: http://www.basnews.com/index.php/en/news/iraq/289622


Processing URLs:  13%|█▎        | 129/1000 [03:17<14:36,  1.01s/it]

Error extracting text from http://www.newsweek.com/merkel-germany-deport-migrants-refugees-428032: 403 Client Error: Forbidden for url: https://www.newsweek.com/merkel-germany-deport-migrants-refugees-428032


Processing URLs:  13%|█▎        | 130/1000 [03:18<13:58,  1.04it/s]

Error extracting text from http://thehill.com/video/in-the-news/267773-live-with-kelly-and-michael-race-pigs-to-predict-iowa-caucus-winner: 403 Client Error: Forbidden for url: https://thehill.com/video/in-the-news/267773-live-with-kelly-and-michael-race-pigs-to-predict-iowa-caucus-winner/


Processing URLs:  13%|█▎        | 133/1000 [03:24<21:43,  1.50s/it]

Error extracting text from http://www.stripes.com/news/iranian-ships-approach-us-vessel-carrying-centcom-chief-1.418669: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/iranian-ships-approach-us-vessel-carrying-centcom-chief-1.418669


Processing URLs:  14%|█▎        | 135/1000 [03:36<48:51,  3.39s/it]  

Error extracting text from https://missilethreat.csis.org/missile/pukkuksong-2/: 403 Client Error: Forbidden for url: https://missilethreat.csis.org/missile/pukkuksong-2/


Processing URLs:  14%|█▎        | 136/1000 [03:36<37:20,  2.59s/it]

Error extracting text from http://www.aol.com/article/2015/10/26/poll-trump-seen-favorably-by-11-percent-of-hispanics/21254161/?icid=maing-grid7|main5|dl1|sec1_lnk3%26pLid%3D862828259: 404 Client Error: Not Found for url: https://www.aol.com/article/2015/10/26/poll-trump-seen-favorably-by-11-percent-of-hispanics/21254161/?icid=maing-grid7%7Cmain5%7Cdl1%7Csec1_lnk3%26pLid%3D862828259


Processing URLs:  14%|█▍        | 141/1000 [03:42<16:55,  1.18s/it]

Error extracting text from http://news.xinhuanet.com/english/2015-10/15/c_134717013.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2015-10/15/c_134717013.htm
Error extracting text from http://www.reuters.com/article/us-health-zika-wolbachia-idUSKCN0XV2UU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-zika-wolbachia-idUSKCN0XV2UU


Processing URLs:  14%|█▍        | 143/1000 [03:45<21:54,  1.53s/it]

Error extracting text from https://www.euronuclear.org/info/encyclopedia/n/nuclear-power-plant-europe.htm: 404 Client Error: Not Found for url: https://www.euronuclear.org/info/encyclopedia/n/nuclear-power-plant-europe.htm


Processing URLs:  14%|█▍        | 144/1000 [03:51<37:27,  2.63s/it]

Error extracting text from http://theiranproject.com/blog/2015/09/22/parliament-to-make-decision-on-jcpoa-next-week/: 404 Client Error: Not Found for url: https://www.theiranproject.com/var/www/theiranproject.ir/web/url_converter.php?url=parliament-to-make-decision-on-jcpoa-next-week


Processing URLs:  14%|█▍        | 145/1000 [03:52<33:32,  2.35s/it]

Error extracting text from http://ftnnews.com/tours/30653-egypt-s-tourist-arrivals-drop-41-9-percent.html: 404 Client Error: Not Found for url: https://ftnnews.com/tours/30653-egypt-s-tourist-arrivals-drop-41-9-percent.html


Processing URLs:  15%|█▍        | 147/1000 [03:55<25:49,  1.82s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/12/19/France-demands-Assad-exit-assurances.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/12/19/France-demands-Assad-exit-assurances.html


Processing URLs:  15%|█▍        | 149/1000 [03:59<26:53,  1.90s/it]

Error extracting text from http://www.tv360nigeria.com/un-condemns-rise-burundi-violence/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  15%|█▌        | 150/1000 [04:00<23:20,  1.65s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/killed-dozen-injured-burundi-blasts-36943034: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/killed-dozen-injured-burundi-blasts-36943034


Processing URLs:  15%|█▌        | 153/1000 [04:03<15:49,  1.12s/it]

Error extracting text from http://news.investors.com/091415-770936-congress-top-priorities-tax-reform-avoid-shutdown.htm#ixzz3lpTRmLWS: 403 Client Error: Forbidden for url: https://news.investors.com/091415-770936-congress-top-priorities-tax-reform-avoid-shutdown.htm#ixzz3lpTRmLWS


Processing URLs:  16%|█▌        | 156/1000 [04:07<13:39,  1.03it/s]

URL filtered: https://www.bloomberg.com/politics/articles/2017-03-26/pboc-s-zhou-signals-financial-opening-will-require-negotiation
Error extracting text from http://www.reuters.com/article/us-britain-eu-poll-idUSKBN0UL0ZR20160107?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-poll-idUSKBN0UL0ZR20160107?mod=related&amp;channelName=worldNews


Processing URLs:  16%|█▌        | 157/1000 [04:07<12:01,  1.17it/s]

Error extracting text from https://corpgov.law.harvard.edu/2019/10/31/spin-offs-unraveled/: 403 Client Error: Forbidden for url: https://corpgov.law.harvard.edu/2019/10/31/spin-offs-unraveled/


Processing URLs:  16%|█▌        | 159/1000 [04:09<13:59,  1.00it/s]

Error extracting text from http://abcnews.go.com/International/wireStory/afghanistan-seeks-3b-aid-corruption-concerns-persist-42522878: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/afghanistan-seeks-3b-aid-corruption-concerns-persist-42522878


Processing URLs:  16%|█▌        | 162/1000 [04:11<08:31,  1.64it/s]

URL filtered: https://m.facebook.com/help/733019746855448
Error extracting text from http://www.barrons.com/articles/fed-fear-emerging-market-bonds-remain-big-winners-1493837865: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/fed-fear-emerging-market-bonds-remain-big-winners-1493837865
URL filtered: https://twitter.com/realDonaldTrump/status/824229586091307008


Processing URLs:  16%|█▋        | 164/1000 [04:11<05:36,  2.48it/s]

Error extracting text from http://www.infotep.gov.do/art.php?id=1202: 403 Client Error: Forbidden for url: http://www.infotep.gov.do/art.php?id=1202


Processing URLs:  17%|█▋        | 166/1000 [05:15<3:31:24, 15.21s/it]

Error extracting text from https://www.betfair.com/exchange/plus/#/politics/market/1.118739911: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/plus/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x303a753a0>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))


Processing URLs:  17%|█▋        | 167/1000 [05:18<2:45:22, 11.91s/it]

Error extracting text from http://www.latinamericanpost.com/article/keiko-fujimori-seems-set-to-become-perus-next-pre/#: 404 Client Error: Not Found for url: https://latinamericanpost.com/article/keiko-fujimori-seems-set-to-become-perus-next-pre/


Processing URLs:  17%|█▋        | 168/1000 [05:19<2:08:35,  9.27s/it]

Error extracting text from https://www.faa.gov/uas/media/Part_107_Summary.pdf: 404 Client Error: Not Found for url: https://www.faa.gov/uas/media/Part_107_Summary.pdf


Processing URLs:  17%|█▋        | 169/1000 [05:20<1:34:53,  6.85s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-15/twitter-threat-may-endanger-venezuela-s-oil-for-cash-china-deals
URL filtered: https://www.bloomberg.com/news/videos/2017-11-22/tesla-burns-through-about-8-000-every-minute-video


Processing URLs:  18%|█▊        | 181/1000 [05:34<16:53,  1.24s/it]  

Error extracting text from https://www.google.com/trends/explore#q=%2Fm%2F0jzt8tx%2C%20%2Fm%2F0wrshm2%2C%20%2Fm%2F0cc846d%2C%20%2Fm%2F0125zrjx%2C%20%2Fm%2F0n15g8q&amp;geo=US&amp;cmpt=q&amp;tz=Etc%2FGMT-2: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#q=%2Fm%2F0jzt8tx%2C%20%2Fm%2F0wrshm2%2C%20%2Fm%2F0cc846d%2C%20%2Fm%2F0125zrjx%2C%20%2Fm%2F0n15g8q&amp;geo=US&amp;cmpt=q&amp;tz=Etc%2FGMT-2


Processing URLs:  18%|█▊        | 184/1000 [05:37<12:30,  1.09it/s]

Error extracting text from http://www.nytimes.com/2016/06/10/business/tesla-model-s-nhtsa-suspension-failure.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/10/business/tesla-model-s-nhtsa-suspension-failure.html?_r=0


Processing URLs:  18%|█▊        | 185/1000 [05:37<09:38,  1.41it/s]

Error extracting text from http://news.yahoo.com/iran-nuclear-review-panel-says-deal-flawed-103101551.html: 404 Client Error: Not Found for url: http://news.yahoo.com/iran-nuclear-review-panel-says-deal-flawed-103101551.html


Processing URLs:  19%|█▊        | 187/1000 [05:39<12:27,  1.09it/s]

Error extracting text from http://in.reuters.com/article/us-davos-meeting-dollar-idINKBN15111E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  19%|█▉        | 190/1000 [05:42<13:35,  1.01s/it]

Error extracting text from https://www.tuko.co.ke/248083-kdf-reacts-raila-releases-documents-claiming-military-rig-elections.html: 410 Client Error: Gone for url: https://www.tuko.co.ke/248083-kdf-reacts-raila-releases-documents-claiming-military-rig-elections.html


Processing URLs:  19%|█▉        | 193/1000 [05:45<11:35,  1.16it/s]

Error extracting text from http://www.cdm.me/english/lute-us-will-support-montenegro-for-nato-invitation: 403 Client Error: Forbidden for url: https://www.cdm.me/english/lute-us-will-support-montenegro-for-nato-invitation


Processing URLs:  19%|█▉        | 194/1000 [05:46<11:27,  1.17it/s]

Error extracting text from http://finance.yahoo.com/news/corn-etf-pops-usda-pares-173548621.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/corn-etf-pops-usda-pares-173548621.html


Processing URLs:  20%|█▉        | 195/1000 [06:46<4:09:37, 18.61s/it]

Error extracting text from https://www.betonline.ag/sportsbook/futures-and-props/politics-futures/2022-philippine-presidential-election: HTTPSConnectionPool(host='www.betonline.ag', port=443): Max retries exceeded with url: /sportsbook/futures-and-props/politics-futures/2022-philippine-presidential-election (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x301865100>, 'Connection to www.betonline.ag timed out. (connect timeout=60)'))
URL filtered: https://www.bloomberg.com/news/articles/2017-01-03/venezuelan-credit-dashboard-can-default-be-avoided-in-2017


Processing URLs:  20%|█▉        | 198/1000 [07:09<2:34:42, 11.57s/it]

Error extracting text from http://www.washingtonpost.com/world/national-security/us-cyber-pros-test-skills-in-exercise-meant-to-stop-attacks/2016/03/09/1c7d5bca-e5ff-11e5-a9ce-681055c7a05f_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/national-security/us-cyber-pros-test-skills-in-exercise-meant-to-stop-attacks/2016/03/09/1c7d5bca-e5ff-11e5-a9ce-681055c7a05f_story.html
Error extracting text from https://www.congress.gov/bill/114th-congress/house-bill/757: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/house-bill/757


Processing URLs:  20%|██        | 200/1000 [07:11<1:28:25,  6.63s/it]

Error extracting text from http://www.wsj.com/articles/u-s-china-agree-to-sanction-north-korea-on-nuclear-program-1456383545: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-china-agree-to-sanction-north-korea-on-nuclear-program-1456383545


Processing URLs:  21%|██        | 208/1000 [07:24<20:49,  1.58s/it]  

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/10/28/bond-swap-for-venezuelas-pdvsa-delays-default-risk-for-now/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/10/28/bond-swap-for-venezuelas-pdvsa-delays-default-risk-for-now/


Processing URLs:  21%|██        | 212/1000 [07:35<28:42,  2.19s/it]

URL filtered: https://www.facebook.com/help/733019746855448


Processing URLs:  21%|██▏       | 214/1000 [08:35<3:17:36, 15.08s/it]

Error extracting text from http://archive.is/5NLwM: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  22%|██▏       | 220/1000 [08:45<47:50,  3.68s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-10-15/bitcoin-futures-etf-said-not-to-face-sec-opposition-at-deadline?sref=PFCuwcPr&amp;utm_campaign=socialflow-organic&amp;utm_content=business&amp;utm_source=twitter&amp;utm_medium=social&amp;cmpid=socialflow-twitter-business


Processing URLs:  22%|██▏       | 223/1000 [09:04<1:16:31,  5.91s/it]

Error extracting text from http://www.upmc-cbn.org/report_archive/2009/2009-SW-H1N1-Issue-Briefs/2009-06-11-A_Closer_Look_at_WHO_Pandemic_Declaration.html: HTTPConnectionPool(host='www.upmc-cbn.org', port=80): Max retries exceeded with url: /report_archive/2009/2009-SW-H1N1-Issue-Briefs/2009-06-11-A_Closer_Look_at_WHO_Pandemic_Declaration.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3063227b0>: Failed to resolve 'www.upmc-cbn.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://urbandevelopment.berlin/index.php/2021/04/29/opening-of-the-humboldt-forum-interview-with-michael-mathis/: HTTPSConnectionPool(host='urbandevelopment.berlin', port=443): Max retries exceeded with url: /index.php/2021/04/29/opening-of-the-humboldt-forum-interview-with-michael-mathis/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x306323590>: Failed to resolve 'urbandevelopment.berlin' ([Errno 8] 

Processing URLs:  22%|██▎       | 225/1000 [09:04<45:35,  3.53s/it]  

Error extracting text from https://www.thelocal.de/20220323/german-consumer-prices-set-to-rise-steeply-amid-war-in-ukraine/: 403 Client Error: Forbidden for url: https://www.thelocal.de/20220323/german-consumer-prices-set-to-rise-steeply-amid-war-in-ukraine


Processing URLs:  23%|██▎       | 228/1000 [09:06<25:21,  1.97s/it]

Error extracting text from https://www.jns.org/opinion/israels-blame-my-predecessor-iran-strategy-revealed/: 403 Client Error: Forbidden for url: https://www.jns.org/opinion/israels-blame-my-predecessor-iran-strategy-revealed/


Processing URLs:  23%|██▎       | 230/1000 [09:11<25:40,  2.00s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/06/22/0200000000AEN20160622010600315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  23%|██▎       | 231/1000 [09:11<19:24,  1.51s/it]

Error extracting text from http://www.wsj.com/articles/russian-special-forces-seen-as-key-to-aleppo-victory-1481884200: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/russian-special-forces-seen-as-key-to-aleppo-victory-1481884200


Processing URLs:  23%|██▎       | 233/1000 [09:13<15:12,  1.19s/it]

Error extracting text from http://www.reuters.com/article/2015/12/03/usa-oilexports-house-idUSL1N13S20620151203#qm1tBxA8xbzZBEx9.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/03/usa-oilexports-house-idUSL1N13S20620151203#qm1tBxA8xbzZBEx9.99


Processing URLs:  24%|██▎       | 235/1000 [09:17<17:59,  1.41s/it]

Error extracting text from https://www.yahoo.com/tech/nissan-confirms-next-gen-leaf-135003615.html: 404 Client Error: Not Found for url: https://www.yahoo.com/tech/nissan-confirms-next-gen-leaf-135003615.html


Processing URLs:  24%|██▎       | 236/1000 [09:18<16:34,  1.30s/it]

URL filtered: https://www.youtube.com/watch?v=35rErQtJ6uA


Processing URLs:  24%|██▍       | 238/1000 [09:20<14:03,  1.11s/it]

Error extracting text from http://blogs.spectator.co.uk/2016/05/brexit-odds-live-updates-on-percentage-chance-of-uk-leaving-the-eu/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2016/05/brexit-odds-live-updates-on-percentage-chance-of-uk-leaving-the-eu/


Processing URLs:  24%|██▍       | 244/1000 [09:33<18:04,  1.44s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-mosul-idUSKCN12I1NB?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-mosul-idUSKCN12I1NB?il=0


Processing URLs:  25%|██▍       | 247/1000 [09:43<28:42,  2.29s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/virginia-state-senator-travels-syria-praises-assad-38709529: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/virginia-state-senator-travels-syria-praises-assad-38709529


Processing URLs:  25%|██▌       | 250/1000 [09:47<21:47,  1.74s/it]

Error extracting text from http://uk.reuters.com/article/uk-eurozone-greece-reforms-idUKKCN0V41TP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  25%|██▌       | 252/1000 [10:52<4:07:29, 19.85s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-12-11/barack-obama-records-robocall-urging-alabama-voters-to-reject-roy-moore: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  25%|██▌       | 254/1000 [10:54<2:08:27, 10.33s/it]

Error extracting text from https://www.freightwaves.com/news/in-the-eye-of-the-congestion-storm-qa-with-port-of-las-gene-seroka: 403 Client Error: Forbidden for url: https://www.freightwaves.com/news/in-the-eye-of-the-congestion-storm-qa-with-port-of-las-gene-seroka


Processing URLs:  26%|██▌       | 255/1000 [10:56<1:36:55,  7.81s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/aung-san-suu-kyi-meets/2456946.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/aung-san-suu-kyi-meets/2456946.html


Processing URLs:  26%|██▌       | 257/1000 [11:01<1:02:56,  5.08s/it]

URL filtered: https://www.youtube.com/watch?v=2sky1tt8vLA


Processing URLs:  26%|██▌       | 262/1000 [12:05<3:41:04, 17.97s/it]

Error extracting text from http://blogs.rollcall.com/218/breaking-cr-vote/: HTTPConnectionPool(host='blogs.rollcall.com', port=80): Max retries exceeded with url: /218/breaking-cr-vote/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ff3d9b20>, 'Connection to blogs.rollcall.com timed out. (connect timeout=60)'))


Processing URLs:  26%|██▋       | 264/1000 [13:06<5:26:33, 26.62s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2018-01-12/twenty-one-hours-and-counting-german-coalition-talks-drag-on: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 267/1000 [13:09<2:03:56, 10.15s/it]

Error extracting text from https://www.newsweek.com/biden-irans-survival-strategy-opinion-1608643: 403 Client Error: Forbidden for url: https://www.newsweek.com/biden-irans-survival-strategy-opinion-1608643


Processing URLs:  27%|██▋       | 268/1000 [13:11<1:31:16,  7.48s/it]

Error extracting text from http://raconteur.net/business/is-future-cyber-crime-a-nightmare-scenario: 404 Client Error: Not Found for url: https://www.raconteur.net/business/is-future-cyber-crime-a-nightmare-scenario


Processing URLs:  27%|██▋       | 271/1000 [13:12<34:43,  2.86s/it]  

Error extracting text from https://www.applegazette.com/apple-inc/apples-spinoff-companies/: 403 Client Error: Forbidden for url: https://www.applegazette.com/apple-inc/apples-spinoff-companies/
Error extracting text from http://www.rand.org/blog/2016/12/beyond-the-headlines-rands-christopher-paul-discusses.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2016/12/beyond-the-headlines-rands-christopher-paul-discusses.html


Processing URLs:  27%|██▋       | 273/1000 [13:14<22:05,  1.82s/it]

Error extracting text from http://www.nytimes.com/2015/12/21/world/asia/afghan-government-faces-new-set-of-rivals.html?emc=edit_ee_20151221&amp;nl=todaysheadlines-europe&amp;nlid=70183565: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/21/world/asia/afghan-government-faces-new-set-of-rivals.html?emc=edit_ee_20151221&amp;nl=todaysheadlines-europe&amp;nlid=70183565


Processing URLs:  28%|██▊       | 279/1000 [13:38<49:28,  4.12s/it]  

Error extracting text from https://www.arabnews.com/node/1842531/world: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1842531/world


Processing URLs:  28%|██▊       | 283/1000 [13:45<28:00,  2.34s/it]

Error extracting text from http://www.levada.ru/eng/: 404 Client Error: Not Found for url: https://www.levada.ru/eng/
Error extracting text from https://www.reuters.com/world/middle-east/iran-hails-palestinian-victory-warns-deadly-blows-against-israel-2021-05-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/iran-hails-palestinian-victory-warns-deadly-blows-against-israel-2021-05-21/


Processing URLs:  29%|██▉       | 291/1000 [13:54<11:30,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-un-idUSKBN0TV02620151212: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-un-idUSKBN0TV02620151212


Processing URLs:  29%|██▉       | 293/1000 [13:55<08:04,  1.46it/s]

Error extracting text from https://www.wsj.com/articles/oil-slips-as-opec-meeting-disappoints-1496051940: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-slips-as-opec-meeting-disappoints-1496051940
URL filtered: https://twitter.com/AmarAmarasingam/status/787697937803468800
URL filtered: https://twitter.com/D3M0_Anon/status/1026464797133422593


Processing URLs:  30%|██▉       | 296/1000 [13:55<04:17,  2.73it/s]

Error extracting text from http://www.wsj.com/articles/china-economic-growth-slows-to-6-9-on-year-in-2015-1453169398: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-economic-growth-slows-to-6-9-on-year-in-2015-1453169398


Processing URLs:  30%|██▉       | 297/1000 [13:56<05:13,  2.24it/s]

Error extracting text from http://www.lloyds.com/~/media/lloyds/reports/emerging%20risk%20reports/solar%20storm%20risk%20to%20the%20north%20american%20electric%20grid.pdf: HTTPSConnectionPool(host='www.lloyds.com', port=443): Max retries exceeded with url: /~/media/lloyds/reports/emerging%20risk%20reports/solar%20storm%20risk%20to%20the%20north%20american%20electric%20grid.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  30%|██▉       | 299/1000 [13:56<05:03,  2.31it/s]

Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-peacetalks-idUSKCN12I0O2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-peacetalks-idUSKCN12I0O2


Processing URLs:  30%|███       | 301/1000 [13:57<04:57,  2.35it/s]

Error extracting text from http://www.portman.senate.gov/public/index.cfm/press-releases?ID=53CBFD89-45E7-4C47-9E19-AF3205EA7799: HTTPConnectionPool(host='www.portman.senate.gov', port=80): Max retries exceeded with url: /public/index.cfm/press-releases?ID=53CBFD89-45E7-4C47-9E19-AF3205EA7799 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306321af0>: Failed to resolve 'www.portman.senate.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|███       | 302/1000 [14:01<14:50,  1.28s/it]

Error extracting text from http://tass.ru/en/politics/858942: 404 Client Error: Not Found for url: https://tass.ru/en/politics/858942


Processing URLs:  31%|███       | 312/1000 [14:16<12:21,  1.08s/it]

Error extracting text from http://www.businessinsider.com/r-putin-and-erdogan-to-meet-next-month-amid-growing-rapprochement-2016-7: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-putin-and-erdogan-to-meet-next-month-amid-growing-rapprochement-2016-7


Processing URLs:  32%|███▏      | 315/1000 [14:17<08:14,  1.39it/s]

Error extracting text from http://www.nrttv.com/en/Details.aspx?Jimare=9655: 403 Client Error: Forbidden for url: https://www.nrttv.com/en/Details.aspx?Jimare=9655


Processing URLs:  32%|███▏      | 319/1000 [14:23<11:40,  1.03s/it]

Error extracting text from http://www.wsj.com/articles/u-s-rate-rise-would-benefit-world-economy-says-philippines-1441797036: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-rate-rise-would-benefit-world-economy-says-philippines-1441797036


Processing URLs:  32%|███▏      | 322/1000 [14:36<32:50,  2.91s/it]

Error extracting text from https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;amp: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;amp


Processing URLs:  32%|███▏      | 323/1000 [14:40<37:05,  3.29s/it]

Error extracting text from http://www.startribune.com/amnesty-documents-chilling-abuses-by-armed-groups-in-syria/385492161/: 404 Client Error: Not Found for url: https://www.startribune.com/amnesty-documents-chilling-abuses-by-armed-groups-in-syria/385492161/


Processing URLs:  33%|███▎      | 326/1000 [14:42<17:14,  1.53s/it]

Error extracting text from http://www.reuters.com/article/us-usa-tax-senate-idUSKBN15G50O: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax-senate-idUSKBN15G50O


Processing URLs:  33%|███▎      | 327/1000 [15:44<3:37:41, 19.41s/it]

Error extracting text from http://www.aipchina.org/Version/201512/Html/MaterialVersion.htm: HTTPConnectionPool(host='www.aipchina.org', port=80): Read timed out. (read timeout=60)


Processing URLs:  33%|███▎      | 328/1000 [15:44<2:34:02, 13.75s/it]

Error extracting text from http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2816%2900651-6/abstract: 403 Client Error: Forbidden for url: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2816%2900651-6/abstract


Processing URLs:  33%|███▎      | 332/1000 [16:12<1:42:25,  9.20s/it]

Error extracting text from http://www.ibtimes.com/italy-permits-armed-us-drones-fly-out-sigonella-air-base-attacks-against-isis-2318872: 403 Client Error: Forbidden for url: https://www.ibtimes.com/italy-permits-armed-us-drones-fly-out-sigonella-air-base-attacks-against-isis-2318872


Processing URLs:  33%|███▎      | 334/1000 [16:15<58:27,  5.27s/it]  

Error extracting text from http://www.newsweek.com/russia-tells-us-it-will-not-attend-2016-nuclear-security-summit-282449: 403 Client Error: Forbidden for url: https://www.newsweek.com/russia-tells-us-it-will-not-attend-2016-nuclear-security-summit-282449


Processing URLs:  34%|███▍      | 339/1000 [16:25<22:39,  2.06s/it]

Error extracting text from http://www.arabnews.com/node/1217656/saudi-arabia: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1217656/saudi-arabia


Processing URLs:  34%|███▍      | 340/1000 [16:26<19:38,  1.78s/it]

Error extracting text from http://www.nytimes.com/2016/02/17/world/middleeast/us-had-cyberattack-planned-if-iran-nuclear-negotiations-failed.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/17/world/middleeast/us-had-cyberattack-planned-if-iran-nuclear-negotiations-failed.html?_r=0


Processing URLs:  34%|███▍      | 345/1000 [16:35<17:19,  1.59s/it]

Error extracting text from http://allafrica.com/stories/201609071004.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201609071004.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fdf5dd90>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  35%|███▍      | 346/1000 [16:39<24:27,  2.24s/it]

Error extracting text from http://www.reuters.com/article/us-iran-banking-insight-idUSKCN1091QM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-banking-insight-idUSKCN1091QM


Processing URLs:  35%|███▍      | 347/1000 [16:47<43:28,  3.99s/it]

Error extracting text from http://www.defenseindustrydaily.com/russia-to-order-french-mistral-lhds-05749/: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  35%|███▍      | 349/1000 [16:51<33:04,  3.05s/it]

Error extracting text from https://www.defcon.org/: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  35%|███▌      | 351/1000 [16:59<37:31,  3.47s/it]

Error extracting text from https://www.yahoo.com/news/europe-close-limits-refugee-influx-tusk-063809482.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/europe-close-limits-refugee-influx-tusk-063809482.html


Processing URLs:  36%|███▌      | 356/1000 [18:06<3:30:44, 19.63s/it]

Error extracting text from https://www.betfair.com/exchange/plus/#/politics/market/1.128390571: HTTPSConnectionPool(host='www.betfair.com', port=443): Max retries exceeded with url: /exchange/plus/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fdf5f560>, 'Connection to www.betfair.com timed out. (connect timeout=60)'))


Processing URLs:  36%|███▌      | 358/1000 [18:07<1:46:49,  9.98s/it]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2015/12/17/IAEA-Iran-sanctions-ending-in-Jan-not-impossible-.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2015/12/17/IAEA-Iran-sanctions-ending-in-Jan-not-impossible-.html


Processing URLs:  36%|███▌      | 361/1000 [18:11<43:56,  4.13s/it]  

Error extracting text from http://fuelfix.com/blog/2016/11/11/market-currents-is-iran-overestimating-its-oil-production/: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2016/11/11/market-currents-is-iran-overestimating-its-oil-production/
Error extracting text from http://www.reuters.com/article/us-southkorea-china-idUSKCN0HZ09M20141010: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-china-idUSKCN0HZ09M20141010


Processing URLs:  36%|███▋      | 363/1000 [18:12<23:06,  2.18s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu/brexit-carnage-shellfish-trucks-protest-in-london-over-export-delays-idUSKBN29N0UK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/brexit-carnage-shellfish-trucks-protest-in-london-over-export-delays-idUSKBN29N0UK
Error extracting text from http://www.reuters.com/article/2015/11/20/china-bonds-idUSL3N13F1UI20151120#ZbJ8a7D7EewmirLI.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/20/china-bonds-idUSL3N13F1UI20151120#ZbJ8a7D7EewmirLI.99


Processing URLs:  36%|███▋      | 365/1000 [18:14<16:50,  1.59s/it]

Error extracting text from http://www.ibtimes.com/government-shutdown-2015-will-federal-government-shut-down-again-2221847: 403 Client Error: Forbidden for url: https://www.ibtimes.com/government-shutdown-2015-will-federal-government-shut-down-again-2221847


Processing URLs:  37%|███▋      | 367/1000 [18:18<19:18,  1.83s/it]

Error extracting text from https://www.reuters.com/article/us-venezuela-politics-unrest-analysis-idUSKBN1AN2GB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-unrest-analysis-idUSKBN1AN2GB


Processing URLs:  37%|███▋      | 370/1000 [18:21<13:55,  1.33s/it]

Error extracting text from https://www.dw.com/en/zoran-zaev-on-conflict-zone/av-58582668: 404 Client Error: Not Found for url: https://www.dw.com/en/zoran-zaev-on-conflict-zone/av-58582668


Processing URLs:  37%|███▋      | 371/1000 [18:21<12:04,  1.15s/it]

Error extracting text from http://www.wsj.com/articles/nextev-china-backed-electric-car-co-applies-for-tax-credit-in-california-1465326359: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nextev-china-backed-electric-car-co-applies-for-tax-credit-in-california-1465326359


Processing URLs:  37%|███▋      | 372/1000 [18:25<19:45,  1.89s/it]

URL filtered: https://www.bloomberg.com/news/articles/2016-12-19/russia-turkey-thaw-seen-withstanding-assassination-of-envoy


Processing URLs:  37%|███▋      | 374/1000 [19:28<2:31:21, 14.51s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-01-26/apple-forecasts-first-sales-drop-since-2003-on-iphone-slowdown


Processing URLs:  38%|███▊      | 376/1000 [19:29<1:35:29,  9.18s/it]

Error extracting text from http://www.swissinfo.ch/eng/assad-says-family-do-not-own-syria--ready-to-be-voted-out/42941416: 404 Client Error: Not Found for url: https://www.swissinfo.ch/eng/assad-says-family-do-not-own-syria--ready-to-be-voted-out/42941416


Processing URLs:  38%|███▊      | 381/1000 [19:35<29:53,  2.90s/it]  

Error extracting text from http://www.kurdistan24.net/en/news/dc14b055-b566-4ccf-a28c-4d59081cfe9f/Islamic-State-remains-a-threat-to-Kurdistan-until-Mosul-liberated-: 403 Client Error: Forbidden for url: https://www.kurdistan24.net/en/news/dc14b055-b566-4ccf-a28c-4d59081cfe9f/Islamic-State-remains-a-threat-to-Kurdistan-until-Mosul-liberated-


Processing URLs:  38%|███▊      | 383/1000 [19:36<18:55,  1.84s/it]

Error extracting text from http://www.straitstimes.com/asia/incident-may-push-jakarta-to-take-more-active-role-in-territorial-rows: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  39%|███▊      | 387/1000 [19:41<12:33,  1.23s/it]

Error extracting text from http://www.nytimes.com/2016/01/20/world/middleeast/ayatollah-ali-khamenei-iran.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/20/world/middleeast/ayatollah-ali-khamenei-iran.html


Processing URLs:  39%|███▉      | 388/1000 [19:45<22:21,  2.19s/it]

Error extracting text from http://www.thenational.ae/business/economy/saudi-wins-approval-from-imf-on-transformation-plan: 404 Client Error: Not Found for url: https://www.thenationalnews.com/business/economy/saudi-wins-approval-from-imf-on-transformation-plan/
URL filtered: https://www.usatoday.com/story/news/politics/onpolitics/2017/08/24/onpolitics-today-how-russian-twitter-accounts-push-pro-trump-propaganda/600157001/


Processing URLs:  40%|███▉      | 395/1000 [19:57<18:13,  1.81s/it]

Error extracting text from https://www.spglobal.com/marketintelligence/en/pages/toc-primer/lcd-primer#sec24: 403 Client Error: Forbidden for url: https://pitchbook.com/leveraged-commentary-data/leveraged-loan-primer#sec24


Processing URLs:  40%|████      | 400/1000 [20:05<15:18,  1.53s/it]

Error extracting text from http://www.nytimes.com/2015/10/01/us/politics/hillary-clinton-camp-begins-to-fear-run-by-joe-biden.html?emc=edit_th_20151001&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/01/us/politics/hillary-clinton-camp-begins-to-fear-run-by-joe-biden.html?emc=edit_th_20151001&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  40%|████      | 404/1000 [20:17<24:16,  2.44s/it]

Error extracting text from http://www.caranddriver.com/news/2016-toyota-mirai-fuel-cell-sedan-photos-and-info-news: 403 Client Error: Forbidden for url: http://www.caranddriver.com/news/2016-toyota-mirai-fuel-cell-sedan-photos-and-info-news


Processing URLs:  41%|████      | 407/1000 [20:20<13:32,  1.37s/it]

Error extracting text from http://www.latimes.com/politics/la-na-pol-sater-trump-20170223-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-sater-trump-20170223-story.html


Processing URLs:  41%|████▏     | 413/1000 [20:30<13:21,  1.37s/it]

Error extracting text from https://www.nytimes.com/2017/02/20/science/hubble-constant-universe-expanding-speed.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/20/science/hubble-constant-universe-expanding-speed.html


Processing URLs:  41%|████▏     | 414/1000 [20:32<13:57,  1.43s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2016-12-18/china-s-drone-seizure-sows-concern-that-u-s-secrets-were-stolen-iwux2t29


Processing URLs:  42%|████▏     | 416/1000 [20:33<08:48,  1.10it/s]

Error extracting text from https://www.whitehouse.gov/briefing-room/vetoed-legislation: 404 Client Error: Not Found for url: https://www.whitehouse.gov/briefing-room/vetoed-legislation


Processing URLs:  42%|████▏     | 421/1000 [20:44<22:21,  2.32s/it]

Error extracting text from http://www.buenosairesherald.com/article/221874/pdvsa-awards-oil-contracts-worth-us$32b: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/221874/pdvsa-awards-oil-contracts-worth-us32b


Processing URLs:  42%|████▏     | 423/1000 [20:48<18:06,  1.88s/it]

Error extracting text from https://seekingalpha.com/article/4419414-merck-keeps-plowing-on: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4419414-merck-keeps-plowing-on


Processing URLs:  42%|████▎     | 425/1000 [20:58<29:37,  3.09s/it]

Error extracting text from http://www.nytimes.com/2015/11/17/business/dealbook/the-challenges-for-volkswagens-internal-investigation.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/17/business/dealbook/the-challenges-for-volkswagens-internal-investigation.html?_r=0


Processing URLs:  43%|████▎     | 426/1000 [21:04<38:17,  4.00s/it]

Error extracting text from http://cajnewsafrica.com/2017/05/18/millions-more-face-starvation-in-nigeria/: 404 Client Error: Not Found for url: https://www.cajnewsafrica.com/2017/05/18/millions-more-face-starvation-in-nigeria/


Processing URLs:  43%|████▎     | 427/1000 [21:06<31:13,  3.27s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-01-26/u-s-states-ease-covid-restrictions-even-as-variants-take-hold?sref=x7nYEkiY


Processing URLs:  43%|████▎     | 429/1000 [29:06<17:48:10, 112.24s/it]

Error extracting text from https://www.thespainreport.com//articles/833-160811112825-psoe-repeats-no-ciudadanos-rejects-formal-coalition: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: //articles/833-160811112825-psoe-repeats-no-ciudadanos-rejects-formal-coalition (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2ff717080>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  43%|████▎     | 433/1000 [29:12<5:45:47, 36.59s/it]  

URL filtered: https://www.youtube.com/watch?v=njG7p6CSbCU
Error extracting text from http://www.business-standard.com/article/news-ani/chinese-fury-continues-against-south-china-sea-judgement-116071900167_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ani/chinese-fury-continues-against-south-china-sea-judgement-116071900167_1.html


Processing URLs:  44%|████▎     | 436/1000 [29:15<2:29:51, 15.94s/it]

Error extracting text from http://www.reuters.com/article/us-venezuela-oil-idUSKBN19V1SM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-oil-idUSKBN19V1SM


Processing URLs:  44%|████▍     | 442/1000 [29:28<36:48,  3.96s/it]  

Error extracting text from https://www.bbc.co.uk/news/world-asia-pacific-60186538.: 404 Client Error: Not Found for url: https://www.bbc.co.uk/news/world-asia-pacific-60186538.
URL filtered: http://www.bloomberg.com/news/articles/2015-10-30/fed-s-updated-model-of-economy-suggests-it-s-time-to-raise-rates


Processing URLs:  44%|████▍     | 445/1000 [29:32<21:03,  2.28s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-business-idUSKBN1AJ35J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-business-idUSKBN1AJ35J


Processing URLs:  45%|████▍     | 447/1000 [29:35<16:14,  1.76s/it]

Error extracting text from http://english.alarabiya.net/en/business/economy/2017/11/07/Qatar-c-bank-s-foreign-reserves-liquidity-drop-in-September.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/business/economy/2017/11/07/Qatar-c-bank-s-foreign-reserves-liquidity-drop-in-September.html


Processing URLs:  45%|████▍     | 448/1000 [29:38<20:05,  2.18s/it]

URL filtered: https://www.youtube.com/watch?v=1Q60yBQG8XI&amp;feature=youtu.be


Processing URLs:  45%|████▌     | 454/1000 [29:43<09:13,  1.01s/it]

Error extracting text from http://www.reuters.com/article/2015/10/27/us-apple-results-idUSKCN0SL2VY20151027: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/27/us-apple-results-idUSKCN0SL2VY20151027


Processing URLs:  46%|████▌     | 455/1000 [29:44<07:19,  1.24it/s]

Error extracting text from https://www.nytimes.com/2017/04/18/us/russian-bombers-alaska.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/18/us/russian-bombers-alaska.html


Processing URLs:  47%|████▋     | 467/1000 [30:06<16:31,  1.86s/it]

URL filtered: https://www.youtube.com/watch?v=UZK2FZGKAd0


Processing URLs:  47%|████▋     | 470/1000 [30:06<07:54,  1.12it/s]

Error extracting text from http://www.reuters.com/article/idUSKCN0UP1BL20160111: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0UP1BL20160111


Processing URLs:  48%|████▊     | 475/1000 [30:12<10:23,  1.19s/it]

Error extracting text from https://understandrussia.com/time/: HTTPSConnectionPool(host='understandrussia.com', port=443): Max retries exceeded with url: /time/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fae62f90>: Failed to resolve 'understandrussia.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  48%|████▊     | 481/1000 [30:27<18:21,  2.12s/it]

Error extracting text from https://www.wsj.com/articles/irans-fast-boats-stop-harassing-u-s-navy-baffling-military-1516897301: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/irans-fast-boats-stop-harassing-u-s-navy-baffling-military-1516897301


Processing URLs:  48%|████▊     | 485/1000 [30:30<09:31,  1.11s/it]

Error extracting text from https://www.nytimes.com/2017/08/30/health/gene-therapy-cancer.html?mcubz=1&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/30/health/gene-therapy-cancer.html?mcubz=1&amp;_r=0


Processing URLs:  49%|████▊     | 486/1000 [30:31<08:12,  1.04it/s]

Error extracting text from http://thehill.com/blogs/congress-blog/technology/288525-predicting-heightened-malicious-cyber-activity-the-old: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/technology/288525-predicting-heightened-malicious-cyber-activity-the-old/


Processing URLs:  49%|████▉     | 488/1000 [30:32<06:04,  1.40it/s]

Error extracting text from http://news.sky.com/story/1588256/france-drops-20-bombs-on-is-stronghold-raqqa: 404 Client Error: Not Found for url: https://news.sky.com/story/1588256/france-drops-20-bombs-on-is-stronghold-raqqa
Error extracting text from http://www.reuters.com/article/idUSKCN0YD0UJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0YD0UJ


Processing URLs:  49%|████▉     | 490/1000 [30:34<06:07,  1.39it/s]

Error extracting text from http://www.wsj.com/articles/a-default-in-china-spreads-anxiety-among-investors-1485513181: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-default-in-china-spreads-anxiety-among-investors-1485513181


Processing URLs:  49%|████▉     | 493/1000 [30:37<08:04,  1.05it/s]

Error extracting text from https://www.thenation.com/article/after-last-weeks-terror-attack-will-turkey-retreat-from-neo-ottomanism/: 404 Client Error: Not Found for url: https://www.thenation.com/article/after-last-weeks-terror-attack-will-turkey-retreat-from-neo-ottomanism/


Processing URLs:  49%|████▉     | 494/1000 [31:39<2:43:03, 19.33s/it]

Error extracting text from https://politics.concordmonitor.com/2015/11/politics-election/clinton-supporters-top-attendance-at-democrats-annual-jefferson-jackson-dinner/: HTTPSConnectionPool(host='politics.concordmonitor.com', port=443): Max retries exceeded with url: /2015/11/politics-election/clinton-supporters-top-attendance-at-democrats-annual-jefferson-jackson-dinner/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x307f7d4f0>, 'Connection to politics.concordmonitor.com timed out. (connect timeout=60)'))


Processing URLs:  50%|████▉     | 495/1000 [31:39<1:54:39, 13.62s/it]

Error extracting text from https://www.nytimes.com/2017/09/01/us/politics/russia-election-hacking.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/01/us/politics/russia-election-hacking.html


Processing URLs:  50%|████▉     | 499/1000 [31:46<38:22,  4.60s/it]  

URL filtered: https://twitter.com/SamRamani2/status/1499737208852566016


Processing URLs:  50%|█████     | 504/1000 [31:54<18:43,  2.26s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/22/749057/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/22/749057/story.html


Processing URLs:  50%|█████     | 505/1000 [31:55<15:28,  1.88s/it]

Error extracting text from https://www.yahoo.com/finance/news/four-ways-north-koreas-nuclear-041727602.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/four-ways-north-koreas-nuclear-041727602.html


Processing URLs:  51%|█████     | 508/1000 [31:58<10:15,  1.25s/it]

Error extracting text from https://thehill.com/policy/international/567112-us-donating-50m-in-humanitarian-aid-to-myanmar: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/567112-us-donating-50m-in-humanitarian-aid-to-myanmar/


Processing URLs:  51%|█████     | 509/1000 [32:00<12:48,  1.57s/it]

Error extracting text from http://en.trend.az/iran/politics/2499008.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2499008.html


Processing URLs:  51%|█████     | 511/1000 [32:05<15:04,  1.85s/it]

URL filtered: https://twitter.com/ragipsoylu/status/1503397795323162633


Processing URLs:  51%|█████▏    | 513/1000 [32:06<10:52,  1.34s/it]

Error extracting text from http://www.kyivpost.com/article/opinion/op-ed/willem-gert-aldershoff-west-must-put-rock-hard-conditions-for-continued-aid-to-ukraine-417352.html: 403 Client Error: Forbidden for url: https://www.kyivpost.com/article/opinion/op-ed/willem-gert-aldershoff-west-must-put-rock-hard-conditions-for-continued-aid-to-ukraine-417352.html


Processing URLs:  52%|█████▏    | 516/1000 [32:08<07:00,  1.15it/s]

Error extracting text from https://www.realclearpolitics.com/epolls/2020/president/us/general_election_trump_vs_biden-6247.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2020/president/us/general_election_trump_vs_biden-6247.html
Error extracting text from http://www.nytimes.com/2016/03/22/us/politics/john-roberts-criticized-supreme-court-confirmation-process-before-there-was-a-vacancy.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/22/us/politics/john-roberts-criticized-supreme-court-confirmation-process-before-there-was-a-vacancy.html


Processing URLs:  52%|█████▏    | 520/1000 [32:13<08:48,  1.10s/it]

Error extracting text from http://www.independent.ie/business/irish/be-careful-what-you-wish-for-when-it-comes-to-a-sustained-oil-price-low-31497818.html&gt: 404 Client Error: Not Found for url: https://www.independent.ie/business/irish/be-careful-what-you-wish-for-when-it-comes-to-a-sustained-oil-price-low-31497818.html&gt


Processing URLs:  52%|█████▏    | 521/1000 [32:14<07:38,  1.04it/s]

Error extracting text from http://theconversation.com/cyberattack-on-ukraine-grid-heres-how-it-worked-and-perhaps-why-it-was-done-52802: 403 Client Error: Forbidden for url: http://theconversation.com/cyberattack-on-ukraine-grid-heres-how-it-worked-and-perhaps-why-it-was-done-52802


Processing URLs:  53%|█████▎    | 526/1000 [32:30<22:51,  2.89s/it]

Error extracting text from http://eureferendum.com/default.aspx: 404 Client Error: Not Found for url: http://eureferendum.com/default.php


Processing URLs:  54%|█████▎    | 536/1000 [32:52<12:22,  1.60s/it]

Error extracting text from http://www.bradycampaign.org/key-gun-violence-statistics: 403 Client Error: Forbidden for url: https://www.bradyunited.org/


Processing URLs:  54%|█████▍    | 539/1000 [32:55<07:44,  1.01s/it]

Error extracting text from http://splash247.com/pdvsa-payment-problems-leave-millions-of-barrels-of-bp-crude-in-limbo/: 403 Client Error: Forbidden for url: https://splash247.com/pdvsa-payment-problems-leave-millions-of-barrels-of-bp-crude-in-limbo/


Processing URLs:  54%|█████▍    | 540/1000 [32:58<11:50,  1.55s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-04-16/erdogan-declares-victory-as-opposition-contest-turkish-result


Processing URLs:  54%|█████▍    | 542/1000 [32:59<09:28,  1.24s/it]

Error extracting text from http://pocketnow.com/2015/12/29/iphone-6s-shipments: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  54%|█████▍    | 544/1000 [33:02<09:14,  1.22s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-05-11/germany-s-rust-belt-state-tests-merkel-in-last-election-warmup


Processing URLs:  55%|█████▌    | 552/1000 [33:11<07:12,  1.04it/s]

Error extracting text from https://icgstrategicteam.wikispaces.com/file/view/Source%20Reliability%20Evaluation.xls/456184382/Source%20Reliability%20Evaluation.xls: HTTPSConnectionPool(host='icgstrategicteam.wikispaces.com', port=443): Max retries exceeded with url: /file/view/Source%20Reliability%20Evaluation.xls/456184382/Source%20Reliability%20Evaluation.xls (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30695a960>: Failed to resolve 'icgstrategicteam.wikispaces.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  56%|█████▌    | 556/1000 [33:15<07:13,  1.02it/s]

Error extracting text from http://www.parliament.uk/about/how/elections-and-voting/general/#jump-link-0: 403 Client Error: Forbidden for url: http://www.parliament.uk/about/how/elections-and-voting/general/#jump-link-0


Processing URLs:  56%|█████▌    | 558/1000 [33:18<07:25,  1.01s/it]

Error extracting text from https://www.wort.lu/en/international/referendum-brexit-sparks-debate-on-united-ireland-vote-59196a8ca5e74263e13bf763: 404 Client Error: Not Found for url: https://www.luxtimes.lu/?sourceID=59196a8ca5e74263e13bf763
Error extracting text from https://www.researchgate.net/figure/World-Temperature-Map-November-2018-to-March-2019_fig1_342118086: 403 Client Error: Forbidden for url: https://www.researchgate.net/figure/World-Temperature-Map-November-2018-to-March-2019_fig1_342118086


Processing URLs:  56%|█████▌    | 560/1000 [33:20<07:31,  1.03s/it]

Error extracting text from https://theconversation.com/morrison-still-enjoys-strong-ratings-in-separate-polls-indicating-labors-gains-may-be-short-lived-157129: 403 Client Error: Forbidden for url: https://theconversation.com/morrison-still-enjoys-strong-ratings-in-separate-polls-indicating-labors-gains-may-be-short-lived-157129


Processing URLs:  56%|█████▋    | 564/1000 [33:29<13:59,  1.92s/it]

URL filtered: https://www.bloomberglaw.com/product/blaw/exp_blp/ewogICAgImN0eHQiOiAiRE9DIiwKICAgICJpZCI6ICJPWVAyRFo2SzUwWFM/cmVzb3VyY2VfaWQ9NzA2YWM


Processing URLs:  57%|█████▋    | 568/1000 [33:39<16:47,  2.33s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/south-korea-us-begin-military-drills-as-north-korea-threatens-attack/article29058863/: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/south-korea-us-begin-military-drills-as-north-korea-threatens-attack/article29058863/


Processing URLs:  57%|█████▋    | 569/1000 [33:39<12:48,  1.78s/it]

Error extracting text from https://www.nti.org/learn/treaties-and-regimes/treaty-between-the-united-states-of-america-and-the-russian-federation-on-measures-for-the-further-reduction-and-limitation-of-strategic-offensive-arms/: 403 Client Error: Forbidden for url: https://www.nti.org/learn/treaties-and-regimes/treaty-between-the-united-states-of-america-and-the-russian-federation-on-measures-for-the-further-reduction-and-limitation-of-strategic-offensive-arms/


Processing URLs:  57%|█████▋    | 574/1000 [33:44<06:35,  1.08it/s]

Error extracting text from http://nyti.ms/1YR7AdU: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/06/opinion/sunday/let-math-save-our-democracy.html?smid=tw-share
Error extracting text from http://www.reuters.com/article/us-volkswagen-emissions-investigation-idUSKCN0V02E7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-volkswagen-emissions-investigation-idUSKCN0V02E7


Processing URLs:  59%|█████▊    | 586/1000 [34:09<10:38,  1.54s/it]

Error extracting text from https://www.nytimes.com/2016/11/22/world/asia/philippines-rodrigo-duterte-scarborough-shoal-china.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/11/22/world/asia/philippines-rodrigo-duterte-scarborough-shoal-china.html?_r=0


Processing URLs:  59%|█████▉    | 588/1000 [34:10<08:11,  1.19s/it]

Error extracting text from http://in.reuters.com/article/iran-oil-mou-idINKBN15I1R0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  59%|█████▉    | 590/1000 [34:12<06:59,  1.02s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/03/11/asia-pacific/china-set-to-begin-operating-civilian-flights-to-and-from-disputed-south-china-sea-next-year/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/03/11/asia-pacific/china-set-to-begin-operating-civilian-flights-to-and-from-disputed-south-china-sea-next-year/


Processing URLs:  60%|█████▉    | 598/1000 [34:25<10:39,  1.59s/it]

Error extracting text from http://en.trend.az/iran/politics/2476607.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2476607.html


Processing URLs:  60%|██████    | 602/1000 [34:29<07:53,  1.19s/it]

Error extracting text from https://www.icrc.org/en/document/massive-scaling-urgently-needed-tackle-hunger-crisis-says-icrcs-director-operations: 403 Client Error: Forbidden for url: https://www.icrc.org/en/document/massive-scaling-urgently-needed-tackle-hunger-crisis-says-icrcs-director-operations


Processing URLs:  60%|██████    | 603/1000 [34:30<07:13,  1.09s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN19Q1N2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN19Q1N2


Processing URLs:  61%|██████    | 606/1000 [34:32<05:03,  1.30it/s]

Error extracting text from https://www.business-standard.com/article/sports/ioc-push-ahead-with-plans-to-open-tokyo-olympics-during-state-of-emergency-121042800133_1.html: 403 Client Error: Forbidden for url: https://www.business-standard.com/article/sports/ioc-push-ahead-with-plans-to-open-tokyo-olympics-during-state-of-emergency-121042800133_1.html


Processing URLs:  61%|██████    | 609/1000 [34:34<03:15,  2.00it/s]

Error extracting text from http://www.nytimes.com/2015/12/15/upshot/how-trump-could-win-and-why-he-probably-wont.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/15/upshot/how-trump-could-win-and-why-he-probably-wont.html


Processing URLs:  61%|██████    | 612/1000 [34:38<06:45,  1.05s/it]

Error extracting text from http://www.imdb.com/title/tt0031725/?ref_=nv_sr_4: 403 Client Error: Forbidden for url: https://www.imdb.com/title/tt0031725/?ref_=nv_sr_4


Processing URLs:  62%|██████▏   | 620/1000 [34:54<11:50,  1.87s/it]

Error extracting text from https://www.nytimes.com/2021/04/11/world/middleeast/iran-nuclear-natanz.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/11/world/middleeast/iran-nuclear-natanz.html
Error extracting text from http://www.financialexpress.com/article/world-news/brazils-dilma-rousseff-says-impeachment-aimed-at-corruption-probe/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/world-news/brazils-dilma-rousseff-says-impeachment-aimed-at-corruption-probe/


Processing URLs:  62%|██████▏   | 622/1000 [34:55<06:53,  1.09s/it]

Error extracting text from http://theiowarepublican.com/2015/iowa-caucus-perspective-ground-game-edition/: 404 Client Error: Not Found for url: http://theiowarepublican.com/2015/iowa-caucus-perspective-ground-game-edition/


Processing URLs:  62%|██████▎   | 625/1000 [34:57<05:37,  1.11it/s]

Error extracting text from http://concorde.ua/en/research/daily/eu-to-release-ukraine-loan-after-imf-tranche-poroshenko-says-15556/: 404 Client Error: Not Found for url: https://concorde.ua/en/research/daily/eu-to-release-ukraine-loan-after-imf-tranche-poroshenko-says-15556/
Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-thaad-idUSKBN1AD2ES: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-thaad-idUSKBN1AD2ES


Processing URLs:  63%|██████▎   | 627/1000 [34:59<05:06,  1.22it/s]

Error extracting text from http://en.trend.az/iran/politics/2454052.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/politics/2454052.html
Error extracting text from http://www.reuters.com/article/us-ethiopia-violence-idUSKCN10J0Z8?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ethiopia-violence-idUSKCN10J0Z8?il=0


Processing URLs:  63%|██████▎   | 629/1000 [35:01<04:45,  1.30it/s]

Error extracting text from http://www.yenisafak.com/en/news/bozkir-not-very-hopeful-on-turkey-eu-visa-free-deal-2465908: 422 Client Error:  for url: http://www.yenisafak.com/en/news/bozkir-not-very-hopeful-on-turkey-eu-visa-free-deal-2465908


Processing URLs:  63%|██████▎   | 630/1000 [35:02<05:04,  1.21it/s]

Error extracting text from http://www.businessinsider.com/r-rouhani-allies-face-tough-challenge-in-votes-to-shape-iran-2016-2: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-rouhani-allies-face-tough-challenge-in-votes-to-shape-iran-2016-2


Processing URLs:  63%|██████▎   | 634/1000 [35:09<08:40,  1.42s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/355996-pentagon-pressed-on-source-code-disclosures-to-russia: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/355996-pentagon-pressed-on-source-code-disclosures-to-russia/
URL filtered: https://twitter.com/veskogarcevic


Processing URLs:  64%|██████▎   | 636/1000 [35:33<38:41,  6.38s/it]

Error extracting text from http://www.ict.org.il/Article/1573/Between-Ramadi-and-Mosul-the-War-against-ISIS: 404 Client Error: Not Found for url: https://ict.org.il/Article/1573/Between-Ramadi-and-Mosul-the-War-against-ISIS


Processing URLs:  64%|██████▎   | 637/1000 [36:34<1:59:07, 19.69s/it]

Error extracting text from https://dc.isda.org/documents/2017/11/pdvsa-pt2-dc-decision-nov-7.pdf: HTTPSConnectionPool(host='dc.isda.org', port=443): Max retries exceeded with url: /documents/2017/11/pdvsa-pt2-dc-decision-nov-7.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30650cf50>, 'Connection to dc.isda.org timed out. (connect timeout=60)'))


Processing URLs:  64%|██████▍   | 638/1000 [36:35<1:30:17, 14.97s/it]

URL filtered: https://www.youtube.com/watch?v=KpyVENBPj5c
URL filtered: http://www.bloomberg.com/news/articles/2015-11-24/fed-rate-odds-rise-to-74-in-bond-market-as-pimco-sees-liftoff


Processing URLs:  64%|██████▍   | 643/1000 [36:38<28:50,  4.85s/it]  

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13941201001023: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13941201001023 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303ecb170>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  65%|██████▍   | 647/1000 [36:43<14:48,  2.52s/it]

Error extracting text from https://www.itgovernance.eu/blog/bundestag-cyber-attack-confirmed/: 403 Client Error: Forbidden for url: https://www.itgovernance.eu/blog/bundestag-cyber-attack-confirmed/


Processing URLs:  65%|██████▍   | 649/1000 [36:45<10:43,  1.83s/it]

Error extracting text from http://www.reuters.com/article/us-spain-politics-investiture-idUSKCN10T1S2?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-investiture-idUSKCN10T1S2?mod=related&amp;channelName=worldNews


Processing URLs:  65%|██████▌   | 651/1000 [36:47<07:47,  1.34s/it]

Error extracting text from http://www.wsj.com/articles/syrias-yellow-brick-road-1450652687: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syrias-yellow-brick-road-1450652687


Processing URLs:  65%|██████▌   | 654/1000 [36:49<04:30,  1.28it/s]

Error extracting text from http://strategicstudiesinstitute.army.mil/pubs/parameters/articles/98spring/thomas.htm: HTTPConnectionPool(host='strategicstudiesinstitute.army.mil', port=80): Max retries exceeded with url: /pubs/parameters/articles/98spring/thomas.htm (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebda90>: Failed to resolve 'strategicstudiesinstitute.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nrttv.com/en/Details.aspx?Jimare=9549: 403 Client Error: Forbidden for url: https://www.nrttv.com/en/Details.aspx?Jimare=9549


Processing URLs:  66%|██████▌   | 655/1000 [36:49<03:20,  1.72it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-saudi-missiles-idUSKCN0VS28D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-saudi-missiles-idUSKCN0VS28D


Processing URLs:  66%|██████▋   | 663/1000 [36:58<05:45,  1.03s/it]

Error extracting text from http://thehill.com/regulation/labor/260852-new-hampshire-seiu-branch-backs-sanders: 403 Client Error: Forbidden for url: https://thehill.com/regulation/labor/260852-new-hampshire-seiu-branch-backs-sanders/


Processing URLs:  66%|██████▋   | 665/1000 [37:02<08:43,  1.56s/it]

Error extracting text from http://uk.reuters.com/article/2015/09/21/uk-mideast-crisis-syria-drones-exclusive-idUKKCN0RL1C920150921: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  67%|██████▋   | 673/1000 [37:14<07:26,  1.37s/it]

Error extracting text from https://tradingeconomics.com/venezuela/rating: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/venezuela/rating


Processing URLs:  67%|██████▋   | 674/1000 [37:15<07:15,  1.34s/it]

Error extracting text from https://www.publictechnology.net/articles/news/two-thirds-workers-did-not-work-home-all-2020-ons-data-finds: 403 Client Error: Forbidden for url: https://www.publictechnology.net/articles/news/two-thirds-workers-did-not-work-home-all-2020-ons-data-finds


Processing URLs:  68%|██████▊   | 677/1000 [37:18<04:56,  1.09it/s]

Error extracting text from https://www.nytimes.com/2018/02/24/world/canada/canada-nafta-trade.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/24/world/canada/canada-nafta-trade.html


Processing URLs:  68%|██████▊   | 678/1000 [37:19<05:54,  1.10s/it]

URL filtered: https://www.youtube.com/watch?v=u75XQdTxZRc


Processing URLs:  68%|██████▊   | 681/1000 [37:22<05:21,  1.01s/it]

Error extracting text from http://en.trend.az/world/other/2481477.html: 404 Client Error: Not Found for url: https://www.trend.az/world/other/2481477.html


Processing URLs:  68%|██████▊   | 683/1000 [37:24<05:22,  1.02s/it]

Error extracting text from http://www.cnbc.com/2017/01/24/mexico-may-leave-nafta-if-renegotiation-unfavorable-minister.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/01/24/mexico-may-leave-nafta-if-renegotiation-unfavorable-minister.html


Processing URLs:  68%|██████▊   | 684/1000 [38:24<1:32:20, 17.53s/it]

Error extracting text from http://www.nasdaq.com/article/exclusive-brexit-poll-update-do-the-math-cm617893: HTTPConnectionPool(host='www.nasdaq.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  69%|██████▊   | 686/1000 [38:28<51:27,  9.83s/it]  

Error extracting text from http://www.theepochtimes.com/n3/2119872-new-communist-party-discipline-regulations-brings-impetus-to-xi-jinpings-anti-corruption-drive/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2119872-new-communist-party-discipline-regulations-brings-impetus-to-xi-jinpings-anti-corruption-drive/


Processing URLs:  69%|██████▊   | 687/1000 [38:29<37:30,  7.19s/it]

Error extracting text from http://www.themoscowtimes: HTTPConnectionPool(host='www.themoscowtimes', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebc830>: Failed to resolve 'www.themoscowtimes' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://twitter.com/i/moments/759015382409621505


Processing URLs:  69%|██████▉   | 690/1000 [38:29<16:47,  3.25s/it]

Error extracting text from http://www.wsj.com/articles/china-charts-deeper-focus-on-latin-america-1480096963: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-charts-deeper-focus-on-latin-america-1480096963


Processing URLs:  70%|██████▉   | 695/1000 [38:37<09:52,  1.94s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/eu-exit-could-trim-uk-gdp-by-15-to-45-by-2019-imf-head-christine-lagarde/articleshow/53047066.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/eu-exit-could-trim-uk-gdp-by-15-to-45-by-2019-imf-head-christine-lagarde/articleshow/53047066.cms


Processing URLs:  70%|██████▉   | 697/1000 [38:39<07:05,  1.40s/it]

Error extracting text from http://www.nytimes.com/2016/11/14/opinion/beijing-tightens-its-grip-in-hong-kong-again.html?emc=edit_ee_20161114&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/14/opinion/beijing-tightens-its-grip-in-hong-kong-again.html?emc=edit_ee_20161114&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0


Processing URLs:  70%|██████▉   | 698/1000 [38:42<09:40,  1.92s/it]

Error extracting text from http://www.iarpa.gov/index.php/research-programs/create/create-baa: 404 Client Error: Not Found for url: https://www.iarpa.gov/index.php/research-programs/create/create-baa


Processing URLs:  70%|███████   | 703/1000 [38:50<07:50,  1.58s/it]

Error extracting text from https://coronavirus.upenn.edu/announcements: 404 Client Error: Not Found for url: https://wellness.upenn.edu/announcements


Processing URLs:  70%|███████   | 705/1000 [38:52<07:15,  1.48s/it]

Error extracting text from http://www.sci-tech-today.com/news/Waymo-Wants-Uber-Trial-Delayed/story.xhtml?story_id=102007LGEOWI: 404 Client Error: Not Found for url: https://www.sci-tech-today.com/news/Waymo-Wants-Uber-Trial-Delayed/story.xhtml?story_id=102007LGEOWI


Processing URLs:  71%|███████   | 707/1000 [38:58<09:08,  1.87s/it]

Error extracting text from http://www.wsj.com/articles/greece-has-no-debt-problem-for-a-decade-says-esm-head-1477565544: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/greece-has-no-debt-problem-for-a-decade-says-esm-head-1477565544


Processing URLs:  71%|███████   | 710/1000 [39:04<10:04,  2.09s/it]

Error extracting text from https://www.faa.gov/uas/request_waiver/waivers_granted/media/107W-2016-00001A_CNN_CoW.pdf: 404 Client Error: Not Found for url: https://www.faa.gov/uas/request_waiver/waivers_granted/media/107W-2016-00001A_CNN_CoW.pdf


Processing URLs:  71%|███████   | 712/1000 [39:05<06:36,  1.38s/it]

Error extracting text from https://www.uefa.com/uefachampionsleague/clubs/: 403 Client Error: Forbidden for url: https://www.uefa.com/uefachampionsleague/clubs/


Processing URLs:  72%|███████▏  | 717/1000 [39:07<02:30,  1.88it/s]

URL filtered: https://www.bloomberg.com/amp/news/articles/2017-11-03/pdvsa-bonds-slump-after-venezuela-calls-for-restructuring-chart
Error extracting text from http://www.reuters.com/article/venezuela-pdvsa-bond-idUSC2N13F00C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-pdvsa-bond-idUSC2N13F00C


Processing URLs:  72%|███████▏  | 722/1000 [39:19<10:46,  2.32s/it]

URL filtered: https://twitter.com/peterboghossian/status/1281047567468392448


Processing URLs:  72%|███████▏  | 724/1000 [39:19<06:09,  1.34s/it]

Error extracting text from https://www.kickstarter.com/projects/megabots/support-team-usa-in-the-giant-robot-duel/description: 403 Client Error: Forbidden for url: https://www.kickstarter.com/projects/megabots/support-team-usa-in-the-giant-robot-duel/description


Processing URLs:  73%|███████▎  | 728/1000 [39:27<08:45,  1.93s/it]

URL filtered: https://twitter.com/ClintEhrlich/status/1499282708970696707


Processing URLs:  73%|███████▎  | 731/1000 [39:36<10:43,  2.39s/it]

Error extracting text from https://www.washingtonpost.com/politics/a-look-at-the-known-ties-between-trump-associates-and-russia/2017/03/03/a151728a-ffeb-11e6-9b78-824ccab94435_story.html?utm_term=.4d5b60036bd4: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/a-look-at-the-known-ties-between-trump-associates-and-russia/2017/03/03/a151728a-ffeb-11e6-9b78-824ccab94435_story.html?utm_term=.4d5b60036bd4
Error extracting text from http://investors.dna.com/2016-01-19-Expansion-of-Oxitecs-Vector-Control-Solution-in-Brazil-Attacking-Source-of-Zika-Virus-and-Dengue-Fever-after-Positive-Program-Results: HTTPConnectionPool(host='investors.dna.com', port=80): Max retries exceeded with url: /2016-01-19-Expansion-of-Oxitecs-Vector-Control-Solution-in-Brazil-Attacking-Source-of-Zika-Virus-and-Dengue-Fever-after-Positive-Program-Results (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebf3b0>: Failed to resolve 'investors.dna.com' ([Errno 8]

Processing URLs:  73%|███████▎  | 733/1000 [39:36<06:02,  1.36s/it]

Error extracting text from http://www.nytimes.com/2016/01/02/world/africa/al-qaeda-uses-video-of-trump-for-recruiting.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/02/world/africa/al-qaeda-uses-video-of-trump-for-recruiting.html
Error extracting text from http://www.financialexpress.com/article/fe-columnist/will-india-lead-the-indo-pacific-century/272672/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/article/fe-columnist/will-india-lead-the-indo-pacific-century/272672/


Processing URLs:  74%|███████▍  | 738/1000 [39:46<07:37,  1.75s/it]

Error extracting text from http://time.com/5058148/roy-moore-doug-jones-alabama/: 404 Client Error: Not Found for url: https://time.com/5058148/roy-moore-doug-jones-alabama/


Processing URLs:  74%|███████▍  | 743/1000 [39:54<05:37,  1.31s/it]

Error extracting text from http://www.borgenmagazine.com/drones-in-humanitarian-aid/: 403 Client Error: Forbidden for url: http://www.borgenmagazine.com/drones-in-humanitarian-aid/


Processing URLs:  75%|███████▍  | 746/1000 [39:58<04:21,  1.03s/it]

Error extracting text from http://www.nato.int/cps/en/natolive/topics_49212.htm: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natolive/topics_49212.htm


Processing URLs:  75%|███████▌  | 751/1000 [40:06<04:36,  1.11s/it]

Error extracting text from http://allafrica.com/stories/201607160043.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607160043.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x303eca6c0>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-sees-scope-for-more-spending-in-boost-for-coalition-talks-idUSKBN1FL5BA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-sees-scope-for-more-spending-in-boost-for-coalition-talks-idUSKBN1FL5BA


Processing URLs:  75%|███████▌  | 753/1000 [40:08<04:00,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKBN19I2RV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN19I2RV


Processing URLs:  76%|███████▌  | 759/1000 [40:18<07:13,  1.80s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-22/oil-s-bad-timing-puts-pressure-on-drillers-as-banks-review-loans
URL filtered: https://twitter.com/AP/status/1355195551554412548


Processing URLs:  76%|███████▋  | 763/1000 [40:20<03:46,  1.05it/s]

Error extracting text from http://www.laprensasa.com/309_america-in-english/3738541_colombian-rebel-leader-certain-of-success-in-peace-talks-with-gov-t.html: 404 Client Error: Not Found for url: http://www.laprensasa.com/309_america-in-english/3738541_colombian-rebel-leader-certain-of-success-in-peace-talks-with-gov-t.html


Processing URLs:  76%|███████▋  | 765/1000 [40:21<03:01,  1.30it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-weapons-idUSKCN1152VD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-weapons-idUSKCN1152VD


Processing URLs:  77%|███████▋  | 770/1000 [40:40<10:05,  2.63s/it]

Error extracting text from https://www.nlm.nih.gov/services/ctphases.html: 404 Client Error: Not Found for url: https://www.nlm.nih.gov/services/ctphases.html


Processing URLs:  77%|███████▋  | 772/1000 [40:43<07:45,  2.04s/it]

Error extracting text from https://www.google.com/trends/explore#q=brexit%2C%20remain%20eu%2C%20leave%20eu%2C%20stay%20eu%2C%20quit%20eu&amp;geo=GB&amp;cmpt=q&amp;tz=Etc%2FGMT%2B4: 429 Client Error: unknown for url: https://trends.google.com/trends/explore#q=brexit%2C%20remain%20eu%2C%20leave%20eu%2C%20stay%20eu%2C%20quit%20eu&amp;geo=GB&amp;cmpt=q&amp;tz=Etc%2FGMT%2B4


Processing URLs:  77%|███████▋  | 774/1000 [40:45<05:23,  1.43s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/interaktivnye-metody-resheniya-zadachi-mnogokriterialnoy-optimizatsii-obzor&amp;usg=ALkJrhijni8GTlkMqaKc3fZh_NbEsryNsw: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=ru&amp;u=http://cyberleninka.ru/article/n/interaktivnye-metody-resheniya-zadachi-mnogokriterialnoy-optimizatsii-obzor&amp;usg=ALkJrhijni8GTlkMqaKc3fZh_NbEsryNsw


Processing URLs:  78%|███████▊  | 778/1000 [40:48<03:01,  1.22it/s]

Error extracting text from https://www.nytimes.com/2021/07/08/us/politics/intelligence-agencies-science.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/08/us/politics/intelligence-agencies-science.html


Processing URLs:  78%|███████▊  | 779/1000 [40:49<03:11,  1.15it/s]

Error extracting text from http://nvs24.com/news/world/Japanese-PM-to-visit-Russia-this-spring-meet-with-Putin-media-3341401.html: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=nvs24.com


Processing URLs:  78%|███████▊  | 783/1000 [40:55<05:11,  1.43s/it]

Error extracting text from http://www.banknotenews.com/files/tag-burundi.php: 404 Client Error: Not Found for url: https://www.banknotenews.com/files/tag-burundi.php


Processing URLs:  79%|███████▊  | 786/1000 [41:05<09:02,  2.53s/it]

Error extracting text from http://en.trend.az/business/energy/2654978.html: 404 Client Error: Not Found for url: https://www.trend.az/business/energy/2654978.html


Processing URLs:  79%|███████▉  | 789/1000 [41:07<04:23,  1.25s/it]

Error extracting text from http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=117005: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_Po_detail.htm?No=117005
Error extracting text from https://www.eureporter.co/politics/2016/10/01/czarnyprotest-members-of-the-european-parliament-stand-shoulder-to-shoulder-with-polish-women/: 403 Client Error: Forbidden for url: https://www.eureporter.co/politics/2016/10/01/czarnyprotest-members-of-the-european-parliament-stand-shoulder-to-shoulder-with-polish-women/


Processing URLs:  79%|███████▉  | 792/1000 [41:10<03:51,  1.11s/it]

Error extracting text from http://www.japantimes.co.jp/news/2015/09/21/world/nuclear-watchdog-chief-amano-pays-ceremonial-visit-irans-parchin-military-site/#.Vf9tn99Viko: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/09/21/world/nuclear-watchdog-chief-amano-pays-ceremonial-visit-irans-parchin-military-site/#.Vf9tn99Viko


Processing URLs:  80%|███████▉  | 795/1000 [41:14<03:28,  1.02s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-ruling-stakes-idUSKCN0ZS02U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-ruling-stakes-idUSKCN0ZS02U


Processing URLs:  80%|███████▉  | 796/1000 [41:44<32:53,  9.67s/it]

Error extracting text from https://www.washingtonpost.com/world/russian-warplanes-strike-deep-inside-islamic-states-heartland/2015/10/02/ace6dfcc-6866-11e5-bdb6-6861: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/russian-warplanes-strike-deep-inside-islamic-states-heartland/2015/10/02/ace6dfcc-6866-11e5-bdb6-6861/


Processing URLs:  80%|███████▉  | 798/1000 [41:57<25:46,  7.66s/it]

Error extracting text from https://www.hertz.ag/ag-industry/current-headlines/0702bf4e10222015115200/: 404 Client Error: Not Found for url: https://www.hertz.ag/ag-industry/current-headlines/0702bf4e10222015115200


Processing URLs:  80%|████████  | 802/1000 [42:01<08:08,  2.47s/it]

Error extracting text from http://www.nytimes.com/2015/12/29/business/dealbook/shkreli-volkswagen-and-other-stars-in-white-collar-crime.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/29/business/dealbook/shkreli-volkswagen-and-other-stars-in-white-collar-crime.html


Processing URLs:  81%|████████  | 807/1000 [42:10<06:15,  1.95s/it]

Error extracting text from http://www.theepochtimes.com/n3/2122089-china-admits-there-is-no-plan-b-for-economy/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2122089-china-admits-there-is-no-plan-b-for-economy/


Processing URLs:  82%|████████▏ | 816/1000 [42:24<06:03,  1.97s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-09/why-venezuela-struggles-so-hard-to-avoid-default-quicktake-q-a


Processing URLs:  82%|████████▏ | 818/1000 [43:24<45:16, 14.93s/it]

Error extracting text from http://mergermarketgroup.com/publication/monthly-ma-insider-may-2017/#.WSwImTOB3-Z: HTTPConnectionPool(host='mergermarketgroup.com', port=80): Max retries exceeded with url: /publication/monthly-ma-insider-may-2017/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3081a70b0>, 'Connection to mergermarketgroup.com timed out. (connect timeout=60)'))


Processing URLs:  82%|████████▏ | 821/1000 [43:29<20:27,  6.86s/it]

Error extracting text from http://www.wsj.com/articles/brazilian-police-search-offices-of-former-president-1457095180: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazilian-police-search-offices-of-former-president-1457095180


Processing URLs:  82%|████████▏ | 822/1000 [43:32<17:03,  5.75s/it]

URL filtered: http://www.spiegel.de/international/europe/death-of-brexit-at-the-hands-of-theresa-may-a-1152330.html?utm_source=dlvr.it&amp;utm_medium=facebook#ref=rss
Error extracting text from http://www.ohchr.org/EN/NewsEvents/Pages/DisplayNews.aspx?NewsID=17012: 403 Client Error: Forbidden for url: https://www.ohchr.org/EN/NewsEvents/Pages/DisplayNews.aspx?NewsID=17012


Processing URLs:  83%|████████▎ | 826/1000 [43:37<08:51,  3.06s/it]

Error extracting text from http://timesofindia.indiatimes.com/world/europe/UK-warship-dispatched-to-Aegean-Sea-to-turn-back-migrants/articleshow/51285620.cms: 410 Client Error: Gone for url: https://timesofindia.indiatimes.com/world/europe/UK-warship-dispatched-to-Aegean-Sea-to-turn-back-migrants/articleshow/51285620.cms


Processing URLs:  83%|████████▎ | 827/1000 [43:39<07:49,  2.71s/it]

Error extracting text from http://thecipherbrief.com/article/russian-influence-latin-america: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/russian-influence-latin-america


Processing URLs:  83%|████████▎ | 831/1000 [43:45<05:09,  1.83s/it]

Error extracting text from http://the-japan-news.com/news/article/0002648430: 404 Client Error: Not Found for url: https://japannews.yomiuri.co.jp/news/article/0002648430


Processing URLs:  83%|████████▎ | 833/1000 [43:50<06:15,  2.25s/it]

Error extracting text from http://isc.independent.gov.uk/news-archive/20december2017: 404 Client Error: Not Found for url: https://isc.independent.gov.uk:443/news-archive/20december2017


Processing URLs:  83%|████████▎ | 834/1000 [43:51<05:09,  1.86s/it]

Error extracting text from http://c.yang: HTTPConnectionPool(host='c.yang', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3081a5610>: Failed to resolve 'c.yang' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  84%|████████▎ | 836/1000 [43:52<03:23,  1.24s/it]

Error extracting text from https://servir.ciat.cgiar.org/2021-forecasts-show-average-fire-risk-in-the-southern-amazon-in-contrast-to-2020/: HTTPSConnectionPool(host='servir.ciat.cgiar.org', port=443): Max retries exceeded with url: /2021-forecasts-show-average-fire-risk-in-the-southern-amazon-in-contrast-to-2020/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'servir.ciat.cgiar.org'. (_ssl.c:1000)")))


Processing URLs:  84%|████████▍ | 839/1000 [43:53<01:54,  1.41it/s]

URL filtered: https://socialblade.com/youtube/user/tseries/realtime
Error extracting text from http://www.nytimes.com/2015/12/05/opinion/end-the-gun-epidemic-in-america.html?ref=topics&amp;_r=1: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/05/opinion/end-the-gun-epidemic-in-america.html?ref=topics&amp;_r=1


Processing URLs:  84%|████████▍ | 840/1000 [43:53<01:31,  1.74it/s]

Error extracting text from https://www.fintechfutures.com/2020/12/stripe-chases-100bn-valuation-with-no-sign-of-ipo/: 403 Client Error: Forbidden for url: https://www.fintechfutures.com/2020/12/stripe-chases-100bn-valuation-with-no-sign-of-ipo/


Processing URLs:  84%|████████▍ | 841/1000 [43:54<01:51,  1.42it/s]

URL filtered: http://www.t-online.de/nachrichten/ausland/id_78504588/iran-immer-mehr-ignorieren-verbote-von-facebook-und-twitter.html


Processing URLs:  84%|████████▍ | 844/1000 [43:59<02:39,  1.02s/it]

Error extracting text from http://www.nytimes.com/2015/12/15/upshot/why-very-low-interest-rates-may-stick-around.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/15/upshot/why-very-low-interest-rates-may-stick-around.html


Processing URLs:  85%|████████▍ | 846/1000 [44:01<02:56,  1.15s/it]

Error extracting text from https://www.debtclocks.eu/public-debt-and-budget-deficit-of-the-eurozone.html: HTTPSConnectionPool(host='www.debtclocks.eu', port=443): Max retries exceeded with url: /public-debt-and-budget-deficit-of-the-eurozone.html (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1000)')))


Processing URLs:  85%|████████▍ | 847/1000 [44:03<03:13,  1.26s/it]

Error extracting text from http://www.newsweek.com/no-fly-zone-over-syria-achievable-384171: 403 Client Error: Forbidden for url: https://www.newsweek.com/no-fly-zone-over-syria-achievable-384171


Processing URLs:  85%|████████▍ | 849/1000 [44:06<03:31,  1.40s/it]

Error extracting text from http://election.princeton.edu/2016/04/22/two-ways-to-estimate-primary-outcomes-without-polls/: HTTPSConnectionPool(host='election.princeton.edu2016', port=443): Max retries exceeded with url: /04/22/two-ways-to-estimate-primary-outcomes-without-polls/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303418290>: Failed to resolve 'election.princeton.edu2016' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  86%|████████▌ | 859/1000 [44:25<03:06,  1.32s/it]

Error extracting text from http://www.chicagotribune.com/news/opinion/commentary/ct-donald-trump-iowa-caucuses-ted-cruz-20160115-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/opinion/commentary/ct-donald-trump-iowa-caucuses-ted-cruz-20160115-story.html
Error extracting text from https://www.reuters.com/article/us-afghanistan-governor-atta-noor/stand-off-over-powerful-afghan-governor-foreshadows-bitter-election-fight-idUSKBN1EW07N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-governor-atta-noor/stand-off-over-powerful-afghan-governor-foreshadows-bitter-election-fight-idUSKBN1EW07N


Processing URLs:  86%|████████▋ | 863/1000 [44:29<02:13,  1.03it/s]

Error extracting text from http://www.latimes.com/world/europe/la-fg-jason-rezaian-release-20160118-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/europe/la-fg-jason-rezaian-release-20160118-story.html


Processing URLs:  87%|████████▋ | 866/1000 [44:37<03:38,  1.63s/it]

Error extracting text from https://www.nytimes.com/2017/02/14/world/europe/russia-cruise-missile-arms-control-treaty.html?emc=edit_na_20170214&amp;nl=breaking-news&amp;nlid=70183565&amp;ref=cta&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/14/world/europe/russia-cruise-missile-arms-control-treaty.html?emc=edit_na_20170214&amp;nl=breaking-news&amp;nlid=70183565&amp;ref=cta&amp;_r=0


Processing URLs:  87%|████████▋ | 870/1000 [44:43<02:49,  1.30s/it]

Error extracting text from http://www.nytimes.com/2016/05/18/us/politics/consensus-supreme-court-roberts.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/18/us/politics/consensus-supreme-court-roberts.html


Processing URLs:  87%|████████▋ | 872/1000 [44:46<02:53,  1.35s/it]

Error extracting text from http://europe.newsweek.com/zimbabwe-mugabes-zanu-pf-dismiss-opposition-protests-448291?rm=eu: 403 Client Error: Forbidden for url: https://www.newsweek.com/zimbabwe-mugabes-zanu-pf-dismiss-opposition-protests-448291


Processing URLs:  88%|████████▊ | 876/1000 [44:50<02:17,  1.11s/it]

Error extracting text from https://globalguessing.com/contact/: HTTPSConnectionPool(host='www.thirdimage.media', port=443): Max retries exceeded with url: /contact/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.thirdimage.media'. (_ssl.c:1000)")))


Processing URLs:  88%|████████▊ | 879/1000 [44:53<01:36,  1.25it/s]

Error extracting text from https://www.reuters.com/world/us/us-laser-focused-potential-terrorist-attack-by-taliban-foes-says-security-2021-08-19/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/us-laser-focused-potential-terrorist-attack-by-taliban-foes-says-security-2021-08-19/
URL filtered: https://www.youtube.com/watch?v=t4LWIP7SAjY


Processing URLs:  88%|████████▊ | 884/1000 [45:00<02:05,  1.08s/it]

Error extracting text from https://www.graphicnews.com/en/pages/35135/EU_Massive_EU_Brexit_bill_: 403 Client Error: Forbidden for url: https://www.graphicnews.com/en/pages/35135/EU_Massive_EU_Brexit_bill_
Error extracting text from http://www.business-standard.com/article/news-ians/nato-members-urged-to-increase-budgets-over-security-threats-115101300076_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/nato-members-urged-to-increase-budgets-over-security-threats-115101300076_1.html


Processing URLs:  88%|████████▊ | 885/1000 [45:01<01:50,  1.04it/s]

Error extracting text from https://www.insidesport.co/tokyo-olympics-not-just-covid-now-intense-heat-prediction-in-japan-a-problem-for-organisers/: 410 Client Error: Gone for url: https://www.insidesport.in/tokyo-olympics-not-just-covid-now-intense-heat-prediction-in-japan-a-problem-for-organisers/


Processing URLs:  89%|████████▊ | 887/1000 [45:04<02:35,  1.38s/it]

Error extracting text from http://en.apa.az/news/238354: 500 Server Error: Internal Server Error for url: https://en.apa.az/news/238354


Processing URLs:  89%|████████▉ | 888/1000 [45:04<02:02,  1.10s/it]

Error extracting text from http://caselaw.findlaw.com/us-supreme-court/478/109.html: 403 Client Error: Forbidden for url: https://caselaw.findlaw.com/us-supreme-court/478/109.html


Processing URLs:  89%|████████▉ | 889/1000 [45:06<02:14,  1.21s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7405773/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7405773/


Processing URLs:  89%|████████▉ | 890/1000 [45:06<01:41,  1.09it/s]

URL filtered: https://www.coindesk.com/bloomberg-analyst-bitcoin-etf-2021


Processing URLs:  89%|████████▉ | 893/1000 [45:11<02:13,  1.25s/it]

Error extracting text from http://www.israelhayom.com/site/newsletter_article.php?id=29439: 403 Client Error: Forbidden for url: https://www.israelhayom.com/site/newsletter_article.php?id=29439
URL filtered: http://www.bloomberg.com/news/articles/2015-03-11/venezuela-s-5-9-billion-cash-burn-raises-bond-concerns


Processing URLs:  90%|████████▉ | 895/1000 [45:12<01:53,  1.08s/it]

Error extracting text from https://www.reuters.com/world/food-prices-hit-record-high-february-un-agency-says-2022-03-04/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/food-prices-hit-record-high-february-un-agency-says-2022-03-04/


Processing URLs:  90%|████████▉ | 897/1000 [45:14<01:39,  1.03it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-11-07/pdvsa-credit-rating-cut-to-c-by-fitch-as-default-imminent


Processing URLs:  90%|█████████ | 905/1000 [45:25<02:14,  1.42s/it]

Error extracting text from http://news.antiwar.com/2017/01/10/iraq-pm-turkey-ties-cannot-improve-without-troop-pullout/: 403 Client Error: Forbidden for url: https://news.antiwar.com/2017/01/10/iraq-pm-turkey-ties-cannot-improve-without-troop-pullout/


Processing URLs:  91%|█████████ | 906/1000 [45:27<02:23,  1.53s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/nato-should-press-ahead-with-new-memberships-cee-countries/articleshow/51142202.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/world-news/nato-should-press-ahead-with-new-memberships-cee-countries/articleshow/51142202.cms


Processing URLs:  91%|█████████ | 908/1000 [45:30<02:22,  1.54s/it]

Error extracting text from https://oversight.house.gov/sites/democrats.oversight.house.gov/files/Committee%20on%20Oversight%20and%20Reform%20Coronavirus%20Relief%20Measures%20Legislation.pdf: 404 Client Error: Not Found for url: https://oversight.house.gov/sites/democrats.oversight.house.gov/files/Committee%20on%20Oversight%20and%20Reform%20Coronavirus%20Relief%20Measures%20Legislation.pdf


Processing URLs:  91%|█████████ | 911/1000 [45:38<03:10,  2.14s/it]

Error extracting text from http://www.defense.gov/News-Article-View/Article/643280/strikes-target-isil-terrorists-in-syria-iraq: 404 Client Error: Not Found for url: https://www.defense.gov/News-Article-View/Article/643280/strikes-target-isil-terrorists-in-syria-iraq
Error extracting text from http://blogs.wsj.com/chinarealtime/2016/07/20/china-warns-off-south-china-sea-protests/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/chinarealtime/2016/07/20/china-warns-off-south-china-sea-protests/


Processing URLs:  92%|█████████▏| 918/1000 [45:45<02:02,  1.49s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-01-27/merkel-cabinet-reshuffled-as-social-democrats-position-for-vote


Processing URLs:  92%|█████████▏| 922/1000 [45:48<01:19,  1.02s/it]

Error extracting text from http://www.wsj.com/articles/crude-rises-modestly-as-u-s-stockpiles-fall-1449743438: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/crude-rises-modestly-as-u-s-stockpiles-fall-1449743438


Processing URLs:  92%|█████████▏| 923/1000 [45:52<02:08,  1.67s/it]

Error extracting text from http://tass.ru/en/politics/866477: 404 Client Error: Not Found for url: https://tass.ru/en/politics/866477


Processing URLs:  92%|█████████▏| 924/1000 [45:53<01:52,  1.48s/it]

Error extracting text from http://atieva.com/join/open-vacancies: 404 Client Error: Unknown site for url: http://atieva.com/join/open-vacancies


Processing URLs:  93%|█████████▎| 927/1000 [46:59<23:16, 19.13s/it]

Error extracting text from http://www.miamiherald.com/news/business/international-business/article48346215.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 931/1000 [47:01<06:07,  5.32s/it]

Error extracting text from http://world.kbs.co.kr/english/news/news_IK_detail.htm?No=113743&amp;id=IK: 404 Client Error: Not Found for url: http://world.kbs.co.kr/english/news/news_IK_detail.htm?No=113743&amp;id=IK
URL filtered: https://www.youtube.com/watch?v=H61yfxU2AHg&amp;feature=youtu.be
Error extracting text from http://www.realclearpolitics.com/epolls/2016/senate/nc/north_carolina_senate_burr_vs_d_ross-5693.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/senate/nc/north_carolina_senate_burr_vs_d_ross-5693.html


Processing URLs:  94%|█████████▎| 937/1000 [47:25<06:31,  6.21s/it]

Error extracting text from https://www.almasdarnews.com/article/iraqi-army-captures-mahana-village-just-60-kilometers-isis-held-mosul/: 522 Server Error:  for url: https://www.almasdarnews.com/article/iraqi-army-captures-mahana-village-just-60-kilometers-isis-held-mosul/
URL filtered: https://twitter.com/thebrexitpoll/status/734116782991155202


Processing URLs:  94%|█████████▍| 944/1000 [47:29<01:15,  1.34s/it]

Error extracting text from http://budapestbeacon.com/featured-articles/eu-may-apply-rule-of-law-mechanism-to-hungary-as-new-issues-arise-says-emmons/40704: 403 Client Error: Forbidden for url: http://budapestbeacon.com/featured-articles/eu-may-apply-rule-of-law-mechanism-to-hungary-as-new-issues-arise-says-emmons/40704


Processing URLs:  94%|█████████▍| 945/1000 [48:29<16:48, 18.34s/it]

Error extracting text from https://www.usnews.com/news/national-news/articles/2017-09-18/robert-mueller-likely-has-donald-trumps-tax-returns: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  95%|█████████▍| 946/1000 [48:29<11:44, 13.05s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0WW1YO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0WW1YO


Processing URLs:  95%|█████████▌| 950/1000 [48:55<06:50,  8.22s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/fresh-wave-of-airstrikes-hit-syrias-aleppo/2016/10/14/d1eb06de-91e3-11e6-bc00-1a9756d4111b_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/fresh-wave-of-airstrikes-hit-syrias-aleppo/2016/10/14/d1eb06de-91e3-11e6-bc00-1a9756d4111b_story.html


Processing URLs:  96%|█████████▌| 956/1000 [49:02<01:23,  1.90s/it]

Error extracting text from http://www.parliament.uk/documents/post/postpn389_cyber-security-in-the-UK.pdf: 403 Client Error: Forbidden for url: http://www.parliament.uk/documents/post/postpn389_cyber-security-in-the-UK.pdf


Processing URLs:  96%|█████████▌| 957/1000 [49:13<03:07,  4.37s/it]

Error extracting text from https://www.washingtonpost.com/world/middle_east/sudan-appoints-prime-minister-for-first-time-since-1989/2017/03/02/02387b0e-ff4a-11e6-9b78-824ccab94435_story.html?_hsenc=p2ANqtz---_F-yw5-g-TD049YhlnMbGJ7uHoMhuRUd0662Wagthe7d8LLIBUSCYyGe9kj9bQ3mswiyviw3RqB1xNBMcve9Kpe9ig&amp;_hsmi=43762671&amp;utm_campaign=Daily%20Fringe%20&amp;utm_content=43762671&amp;utm_medium=email&amp;utm_source=hs_email&amp;utm_term=.c27c4753eb59: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/middle_east/sudan-appoints-prime-minister-for-first-time-since-1989/2017/03/02/02387b0e-ff4a-11e6-9b78-824ccab94435_story.html?_hsenc=p2ANqtz---_F-yw5-g-TD049YhlnMbGJ7uHoMhuRUd0662Wagthe7d8LLIBUSCYyGe9kj9bQ3mswiyviw3RqB1xNBMcve9Kpe9ig&amp;_hsmi=43762671&amp;utm_campaign=Daily%20Fringe%20&amp;utm_content=43762671&amp;utm_medium=email&amp;utm_source=hs_email&amp;utm_term=.c27c4753eb59


Processing URLs:  96%|█████████▌| 959/1000 [49:24<03:42,  5.43s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-05-08/johnson-rejects-scottish-referendum-calls-u-k-elections-update


Processing URLs:  96%|█████████▌| 961/1000 [49:24<02:00,  3.10s/it]

Error extracting text from http://aranews.net/2016/03/six-german-jihadis-reported-dead-coalition-airstrike-hit-isis-training-camp-north-iraq/: 404 Client Error: Not Found for url: http://aranews.net/2016/03/six-german-jihadis-reported-dead-coalition-airstrike-hit-isis-training-camp-north-iraq/


Processing URLs:  96%|█████████▌| 962/1000 [49:26<01:40,  2.64s/it]

Error extracting text from http://www.novelrank.com/asin/0307408868: HTTPSConnectionPool(host='www.novelrank.com', port=443): Max retries exceeded with url: /asin/0307408868 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  96%|█████████▋| 963/1000 [49:30<01:54,  3.09s/it]

Error extracting text from https://bit.ly/3ccmvy5: 404 Client Error: Not Found for url: http://thehighlandtimes.com/news/2021/03/09/greensill-capital-collapse-raises-concerns-over-the-future-of-gfg-alliance/liberty-steel/


Processing URLs:  97%|█████████▋| 966/1000 [49:33<00:59,  1.75s/it]

Error extracting text from http://peakoil.com/production/opec-is-studying-a-proposal-for-emergency-meeting: 403 Client Error: Forbidden for url: https://peakoil.com/production/opec-is-studying-a-proposal-for-emergency-meeting


Processing URLs:  97%|█████████▋| 967/1000 [49:36<01:03,  1.92s/it]

Error extracting text from http://uk.reuters.com/article/uk-russia-oil-doha-iran-idUKKCN0X1126: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
URL filtered: http://www.bloomberg.com/news/articles/2016-08-01/venezuelan-credit-dashboard-726-million-comes-due-in-august


Processing URLs:  97%|█████████▋| 970/1000 [49:39<00:41,  1.40s/it]

Error extracting text from http://en.apa.az/world-news/asia-news/iraqi-air-force-kills-19-isis-commanders-in-mosul.html: 404 Client Error: Not Found for url: https://en.apa.az/world-news/asia-news/iraqi-air-force-kills-19-isis-commanders-in-mosul.html


Processing URLs:  97%|█████████▋| 973/1000 [49:45<00:50,  1.85s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-06/poland-ramps-up-pressure-on-top-court-after-talk-of-compromise


Processing URLs:  98%|█████████▊| 978/1000 [49:50<00:21,  1.05it/s]

Error extracting text from https://www.tandfonline.com/doi/abs/10.1080/13619469408581315?journalCode=fcbh19: 403 Client Error: Forbidden for url: https://www.tandfonline.com/doi/abs/10.1080/13619469408581315?journalCode=fcbh19


Processing URLs:  98%|█████████▊| 980/1000 [49:55<00:31,  1.57s/it]

Error extracting text from https://cyberlaw.stanford.edu/wiki/index.php/Automated_Driving:_Legislative_and_Regulatory_Action#Enacted: 404 Client Error: Not Found for url: https://cyberlaw.stanford.edu/wiki/index.php/Automated_Driving:_Legislative_and_Regulatory_Action#Enacted


Processing URLs:  98%|█████████▊| 984/1000 [50:00<00:18,  1.18s/it]

Error extracting text from https://www.ctc.usma.edu/posts/a-view-from-the-ct-foxhole-an-interview-with-john-brennan-director-cia: HTTPSConnectionPool(host='www.ctc.usma.edu', port=443): Max retries exceeded with url: /posts/a-view-from-the-ct-foxhole-an-interview-with-john-brennan-director-cia (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.ctc.usma.edu'. (_ssl.c:1000)")))


Processing URLs:  98%|█████████▊| 985/1000 [50:00<00:14,  1.06it/s]

Error extracting text from http://asoiaf.westeros.org/index.php/topic/53288-the-winds-of-winter-the-latest-info-updated-4-april-2015/: 403 Client Error: Forbidden for url: http://asoiaf.westeros.org/index.php/topic/53288-the-winds-of-winter-the-latest-info-updated-4-april-2015/


Processing URLs:  99%|█████████▊| 987/1000 [50:01<00:09,  1.32it/s]

Error extracting text from http://www.publications.parliament.uk/pa/.../62/62.pdf: 403 Client Error: Forbidden for url: https://publications.parliament.uk/pa/.../62/62.pdf


Processing URLs:  99%|█████████▉| 991/1000 [50:07<00:10,  1.19s/it]

Error extracting text from https://www.thenation.com/article/the-lost-alternatives-of-mikhail-gorbachev/: 404 Client Error: Not Found for url: https://www.thenation.com/article/the-lost-alternatives-of-mikhail-gorbachev/


Processing URLs: 100%|█████████▉| 995/1000 [50:14<00:08,  1.60s/it]

URL filtered: https://twitter.com/AHoweBlogger/status/937795422575263755


Processing URLs: 100%|█████████▉| 997/1000 [50:14<00:03,  1.04s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-24/all-opec-members-except-iran-libya-pledge-to-attend-doha-talks


Processing URLs: 100%|█████████▉| 999/1000 [50:15<00:00,  1.48it/s]

Error extracting text from http://www.wsj.com/articles/pentagon-not-ready-to-launch-biggest-spy-satellites-on-spacex-rockets-1470768726: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pentagon-not-ready-to-launch-biggest-spy-satellites-on-spacex-rockets-1470768726


Processing URLs: 100%|██████████| 1000/1000 [50:16<00:00,  3.02s/it]


Error extracting text from http://insideevs.com/norway-ev-sales-surge-in-september-with-volume-deliveries-of-tesla-model-x/: 404 Client Error: Not Found for url: https://insideevs.com:443/norway-ev-sales-surge-in-september-with-volume-deliveries-of-tesla-model-x/


Processing URLs:   0%|          | 4/1000 [01:05<6:53:25, 24.90s/it]

Error extracting text from https://www.usnews.com/news/top-news/articles/2017-11-11/trump-says-agrees-with-us-intelligence-assessment-of-russian-meddling: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   1%|          | 6/1000 [01:08<3:07:30, 11.32s/it]

Error extracting text from http://www.businessinsider.com.au/michigan-bill-autonomous-cars-2016-8?r=UK&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/michigan-bill-autonomous-cars-2016-8?r=UK&amp;IR=T
URL filtered: http://www.bloomberg.com/news/videos/2015-12-04/opec-agrees-to-increase-daily-production-target


Processing URLs:   1%|          | 10/1000 [01:19<1:32:27,  5.60s/it]

URL filtered: https://www.youtube.com/watch?v=GQMlWwIXg3M


Processing URLs:   1%|▏         | 14/1000 [01:25<49:46,  3.03s/it]  

Error extracting text from http://micanaldepanama.com/ampliacion/: 403 Client Error: Forbidden for url: https://pancanal.com/ampliacion/


Processing URLs:   2%|▏         | 17/1000 [01:30<31:26,  1.92s/it]

Error extracting text from https://www.pnas.org/content/118/23/e2022239118: 403 Client Error: Forbidden for url: https://www.pnas.org/content/118/23/e2022239118


Processing URLs:   2%|▏         | 19/1000 [01:34<31:27,  1.92s/it]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/09/19/sp-downgrades-venezuelas-pdvsa-swap-is-default/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/09/19/sp-downgrades-venezuelas-pdvsa-swap-is-default/


Processing URLs:   2%|▏         | 21/1000 [01:37<26:06,  1.60s/it]

Error extracting text from http://www.eleccionesenperu.com/noticias-intencion-voto-segun-nivel-educativo-encuesta-ipsos-3070.html: 436 Client Error:  for url: http://www.eleccionesenperu.com/noticias-intencion-voto-segun-nivel-educativo-encuesta-ipsos-3070.html


Processing URLs:   2%|▏         | 22/1000 [01:39<28:31,  1.75s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/latest-suicide-car-bombs-slow-iraqs-mosul-advance-43502670: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/latest-suicide-car-bombs-slow-iraqs-mosul-advance-43502670


Processing URLs:   3%|▎         | 27/1000 [01:45<26:50,  1.65s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-14/oil-freeze-everything-you-need-to-know-about-the-doha-summit
URL filtered: https://www.politico.com/story/2017/10/27/twitter-russia-election-data-244226


Processing URLs:   3%|▎         | 34/1000 [01:51<15:57,  1.01it/s]

URL filtered: https://www.bloomberg.com/quote/CL1:COM?sref=x7nYEkiY


Processing URLs:   4%|▎         | 36/1000 [01:52<11:34,  1.39it/s]

Error extracting text from http://www.poynter.org/news/mediawire/376765/high-stakes-foreign-trading-the-fate-of-jason-rezaian/: 403 Client Error: Forbidden for url: http://www.poynter.org/news/mediawire/376765/high-stakes-foreign-trading-the-fate-of-jason-rezaian/


Processing URLs:   4%|▍         | 42/1000 [02:06<30:16,  1.90s/it]

Error extracting text from https://www.berlin.de/en/tickets/miscellaneous/finally-open/2021-07-20-n-a-5a51c91a-45be-4e92-883a-c7d14a47590b/: 404 Client Error: Not Found for url: https://www.berlin.de/en/tickets/miscellaneous/finally-open/2021-07-20-n-a-5a51c91a-45be-4e92-883a-c7d14a47590b/


Processing URLs:   4%|▍         | 45/1000 [02:09<18:14,  1.15s/it]

Error extracting text from http://www.nytimes.com/2016/05/18/world/middleeast/isis-bombing-baghdad-iraq-market.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/18/world/middleeast/isis-bombing-baghdad-iraq-market.html


Processing URLs:   5%|▍         | 47/1000 [02:12<21:59,  1.38s/it]

Error extracting text from http://mobile.nytimes.com/2016/04/15/world/middleeast/irans-president-is-squeezed-by-opposition-leaders-demand-for-a-trial.html?ref=topics&amp;referer=http://topics.nytimes.com/top/reference/timestopics/people/k/mehdi_karroubi/index.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/04/15/world/middleeast/irans-president-is-squeezed-by-opposition-leaders-demand-for-a-trial.html?ref=topics&amp;referer=http://topics.nytimes.com/top/reference/timestopics/people/k/mehdi_karroubi/index.html


Processing URLs:   5%|▍         | 49/1000 [02:15<22:11,  1.40s/it]

Error extracting text from http://aranews.net/2016/05/us-led-coalition-major-general-isis-weaker-mosul-fight-will-hard/: 404 Client Error: Not Found for url: http://aranews.net/2016/05/us-led-coalition-major-general-isis-weaker-mosul-fight-will-hard/


Processing URLs:   6%|▌         | 56/1000 [02:28<18:06,  1.15s/it]

Error extracting text from http://www.komodoexercise.org/#!exercises/c1mi3: HTTPConnectionPool(host='www.komodoexercise.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303ec8aa0>: Failed to resolve 'www.komodoexercise.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://blogs.wsj.com/moneybeat/2016/06/10/will-brexit-hurt-the-euro-the-market-seems-to-think-not/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/moneybeat/2016/06/10/will-brexit-hurt-the-euro-the-market-seems-to-think-not/
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/politica/2016/01/pdt-lanca-ciro-gomes-para-presidente-e-apoio-a-dilma-contra-impeachment-00748649.html&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://br.blastingnews.com/politica/201

Processing URLs:   6%|▌         | 58/1000 [02:32<25:46,  1.64s/it]

Error extracting text from http://buenosairesherald.com/article/203562/venezuela-opposition-slams-%E2%80%98late-incomplete-wrong%E2%80%99-unasur-mission: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/203562/venezuela-opposition-slams-%E2%80%98late-incomplete-wrong%E2%80%99-unasur-mission


Processing URLs:   6%|▌         | 61/1000 [02:35<21:25,  1.37s/it]

Error extracting text from http://www.csmonitor.com/USA/Military/2015/1223/In-Ramadi-battle-a-potential-model-for-rolling-back-ISIS: 500 Server Error: Internal Server Error for url: https://www.csmonitor.com/USA/Military/2015/1223/In-Ramadi-battle-a-potential-model-for-rolling-back-ISIS


Processing URLs:   6%|▋         | 63/1000 [02:37<17:56,  1.15s/it]

Error extracting text from http://www.businessinsider.com/r-nato-urges-montenegro-to-prove-readiness-for-accession-2015-10: 404 Client Error: Not Found for url: https://www.businessinsider.com/r-nato-urges-montenegro-to-prove-readiness-for-accession-2015-10


Processing URLs:   6%|▋         | 65/1000 [02:39<16:35,  1.06s/it]

Error extracting text from http://uk.reuters.com/article/us-usa-afghanistan-obama-exclusive-idUKKCN0YW055: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:   7%|▋         | 68/1000 [02:44<19:32,  1.26s/it]

Error extracting text from https://www.yahoo.com/news/m/9a6d1d49-7fe2-34fb-97fd-4429aaa35113/ss_monkey-wrench-in-the-drive-on.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/m/9a6d1d49-7fe2-34fb-97fd-4429aaa35113/ss_monkey-wrench-in-the-drive-on.html


Processing URLs:   7%|▋         | 70/1000 [02:47<21:33,  1.39s/it]

Error extracting text from http://elections.huffingtonpost.com/pollster/2016-new-hampshire-presidential-democratic-caucus: 404 Client Error: Not Found for url: https://elections.huffingtonpost.com/pollster/2016-new-hampshire-presidential-democratic-caucus


Processing URLs:   7%|▋         | 72/1000 [02:50<20:31,  1.33s/it]

Error extracting text from https://medium.com/@pmakela1/tactical-lessons-from-mosul-infantry-skills-matter-da3ced9ab8d6#.s0dt482vm: 403 Client Error: Forbidden for url: https://medium.com/@pmakela1/tactical-lessons-from-mosul-infantry-skills-matter-da3ced9ab8d6#.s0dt482vm


Processing URLs:   7%|▋         | 74/1000 [02:53<19:37,  1.27s/it]

Error extracting text from http://www.automobilwoche.de/article/20160714/AGENTURMELDUNGEN/307149907/kaufpramie-fur-elektroautos-vor-allem-privatleute-wollen-kaufpramie-fur-elektroautos: 403 Client Error: Forbidden for url: https://www.automobilwoche.de/article/20160714/AGENTURMELDUNGEN/307149907/kaufpramie-fur-elektroautos-vor-allem-privatleute-wollen-kaufpramie-fur-elektroautos


Processing URLs:   8%|▊         | 77/1000 [02:56<17:14,  1.12s/it]

Error extracting text from http://www.businessinsider.com.au/iran-deal-jason-rezaian-2015-11: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/iran-deal-jason-rezaian-2015-11


Processing URLs:   8%|▊         | 80/1000 [02:59<12:02,  1.27it/s]

Error extracting text from http://www.yenisafak.com/en/world/russia-delivers-weapons-to-ypg-in-syria-2350006: 422 Client Error:  for url: http://www.yenisafak.com/en/world/russia-delivers-weapons-to-ypg-in-syria-2350006
Error extracting text from http://www.nytimes.com/2016/03/11/world/middleeast/iran-executions-at-highest-level-since-89.html?ref=todayspaper: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/11/world/middleeast/iran-executions-at-highest-level-since-89.html?ref=todayspaper


Processing URLs:   8%|▊         | 81/1000 [02:59<09:11,  1.67it/s]

Error extracting text from https://www.carolinajournal.com/opinion-article/how-republican-is-north-carolina/: 403 Client Error: Forbidden for url: https://www.carolinajournal.com/opinion-article/how-republican-is-north-carolina/


Processing URLs:   8%|▊         | 85/1000 [03:04<14:48,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-usa-trump-russia-idUSKBN14Z041: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-russia-idUSKBN14Z041


Processing URLs:   9%|▉         | 92/1000 [03:22<22:59,  1.52s/it]

Error extracting text from http://finance.yahoo.com/video/china-warns-u-over-south-064705003.html: 400 Client Error: Invalid HTTP Request for url: https://finance.yahoo.com/video/china-warns-u-over-south-064705003.html
URL filtered: https://btcmanager.com/facebook-diem-libra-testnet-52-million-transactions/?utm_source=onesignal&amp;utm_medium=push&amp;utm_campaign=push%notification


Processing URLs:  10%|▉         | 96/1000 [03:25<17:31,  1.16s/it]

Error extracting text from http://eswi.org/knowledge-center/wp-content/uploads/sites/11/2014/07/global-influenza-vaccine-distribution-Vaccine-2015.pdf: 404 Client Error: Not Found for url: https://eswi.org/knowledge-center/wp-content/uploads/sites/11/2014/07/global-influenza-vaccine-distribution-Vaccine-2015.pdf


Processing URLs:  10%|█         | 100/1000 [03:28<10:33,  1.42it/s]

Error extracting text from https://www.chathamhouse.org/expert/comment/shift-syrian-constitution-could-help-assad-survive#: 403 Client Error: Forbidden for url: https://www.chathamhouse.org/expert/comment/shift-syrian-constitution-could-help-assad-survive
URL filtered: https://www.bloomberg.com/view/articles/2018-06-21/trump-s-space-force-is-no-joke


Processing URLs:  10%|█         | 103/1000 [03:30<08:37,  1.73it/s]

Error extracting text from http://www.reuters.com/article/us-usa-turkey-idUSKCN18C1ZT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-turkey-idUSKCN18C1ZT
URL filtered: https://www.bloomberg.com/news/articles/2018-01-01/north-korea-s-olympics-peace-bid-tests-u-s-south-korea-alliance


Processing URLs:  11%|█         | 111/1000 [03:46<41:15,  2.78s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-03-02/the-big-problem-with-china-s-bridge-and-tunnel-addiction


Processing URLs:  12%|█▏        | 116/1000 [03:47<11:54,  1.24it/s]

Error extracting text from http://www.nytimes.com/2016/01/16/world/americas/venezuela-declares-emergency-as-its-economy-falters.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/16/world/americas/venezuela-declares-emergency-as-its-economy-falters.html?_r=0
URL filtered: https://www.nytimes.com/2020/11/05/technology/donald-trump-twitter.html
Error extracting text from http://english.aawsat.com/2016/06/article55352692/operations-liberate-mosul-commences-phase-two-hamzah-mustafa: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/06/article55352692/operations-liberate-mosul-commences-phase-two-hamzah-mustafa


Processing URLs:  12%|█▏        | 120/1000 [03:53<17:19,  1.18s/it]

Error extracting text from http://www.newshub.co.nz/world/clinton-wants-further-iran-missile-sanctions-2016031013#axzz42aQruF8J: 404 Client Error: Not Found for url: https://www.newshub.co.nz/home/world/2016/03/clinton-wants-further-iran-missile-sanctions.html#axzz42aQruF8J


Processing URLs:  13%|█▎        | 130/1000 [04:07<19:45,  1.36s/it]

URL filtered: https://www.facebook.com/GuyVerhofstadt/posts/10154832083240016:0


Processing URLs:  13%|█▎        | 134/1000 [04:07<07:16,  1.98it/s]

URL filtered: https://twitter.com/jbloom_lab/status/1407445604029009923?s=21
Error extracting text from http://www.nytimes.com/2016/09/21/world/europe/erdogan-turkey-unga-2016-united-nations.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/09/21/world/europe/erdogan-turkey-unga-2016-united-nations.html
URL filtered: https://www.bloomberg.com/news/articles/2016-11-28/southeast-asia-currency-slide-inflates-a-20-billion-debt-bill


Processing URLs:  14%|█▍        | 139/1000 [04:10<08:30,  1.69it/s]

Error extracting text from https://www.reuters.com/article/us-ukraine-crisis-russia/russia-warns-u-s-warships-to-steer-clear-of-crimea-for-their-own-good-idUSKBN2C00WD?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-crisis-russia/russia-warns-u-s-warships-to-steer-clear-of-crimea-for-their-own-good-idUSKBN2C00WD?il=0


Processing URLs:  14%|█▍        | 141/1000 [04:12<09:10,  1.56it/s]

Error extracting text from http://thehill.com/policy/national-security/348295-top-trump-lawyer-asked-putin-spokesman-for-help-with-real-estate: 403 Client Error: Forbidden for url: https://thehill.com/policy/national-security/348295-top-trump-lawyer-asked-putin-spokesman-for-help-with-real-estate/


Processing URLs:  14%|█▍        | 145/1000 [04:20<23:54,  1.68s/it]

Error extracting text from http://www.mid.ru/en/foreign_policy/news/-/asset_publisher/cKNonkJE02Bw/content/id/2124391: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  15%|█▌        | 150/1000 [04:37<40:28,  2.86s/it]  

Error extracting text from http://www.securityweek.com/nsa-chief-worries-about-cyber-attack-us-infrastructure: 403 Client Error: Forbidden for url: https://www.securityweek.com/nsa-chief-worries-about-cyber-attack-us-infrastructure


Processing URLs:  16%|█▌        | 155/1000 [04:43<20:41,  1.47s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/watch-out-china-russia-are-working-together-sea-15767: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/watch-out-china-russia-are-working-together-sea-15767


Processing URLs:  16%|█▌        | 156/1000 [04:46<26:04,  1.85s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-10-19/fed-s-waller-sees-tapering-next-month-rate-hike-some-time-off


Processing URLs:  16%|█▌        | 158/1000 [04:47<15:58,  1.14s/it]

Error extracting text from http://www.japantimes.co.jp/news/2015/11/17/national/politics-diplomacy/abe-may-visit-russia-before-putin-visits-japan-russian-official/#.VlwGCb_3hKB: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/11/17/national/politics-diplomacy/abe-may-visit-russia-before-putin-visits-japan-russian-official/#.VlwGCb_3hKB


Processing URLs:  16%|█▌        | 159/1000 [04:47<15:08,  1.08s/it]

Error extracting text from http://www.reuters.com/article/2015/09/30/japan-economy-idUSL3N11Z5SQ20150930: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/30/japan-economy-idUSL3N11Z5SQ20150930


Processing URLs:  16%|█▋        | 164/1000 [04:54<17:13,  1.24s/it]

Error extracting text from https://www.nytimes.com/2017/04/06/world/middleeast/us-said-to-weigh-military-responses-to-syrian-chemical-attack.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/06/world/middleeast/us-said-to-weigh-military-responses-to-syrian-chemical-attack.html


Processing URLs:  17%|█▋        | 167/1000 [04:57<16:07,  1.16s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/02/07/0200000000AEN20160207001900315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: http://www.nato.int/cps/en/natohq/news_128096.htm?utm_source=facebook&amp;utm_medium=smc&amp;utm_campaign=160215+montenegro+acc+talks


Processing URLs:  17%|█▋        | 173/1000 [05:03<12:35,  1.09it/s]

Error extracting text from http://www.wsj.com/articles/volkswagen-nears-deal-to-make-electric-cars-in-china-1473229192: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/volkswagen-nears-deal-to-make-electric-cars-in-china-1473229192


Processing URLs:  18%|█▊        | 177/1000 [05:07<10:50,  1.27it/s]

Error extracting text from http://www.adweek.com/fishbowlny/sam-jacobs-time-digital/386756: 403 Client Error: Forbidden for url: https://www.adweek.com/fishbowlny/sam-jacobs-time-digital/386756
Error extracting text from http://www.reuters.com/article/us-g20-china-idUSKCN1140CN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-g20-china-idUSKCN1140CN


Processing URLs:  18%|█▊        | 179/1000 [05:12<20:19,  1.49s/it]

Error extracting text from http://nerdist.com/japan-accepts-americas-giant-robot-duel/: 403 Client Error: Forbidden for url: http://nerdist.com/japan-accepts-americas-giant-robot-duel/


Processing URLs:  18%|█▊        | 183/1000 [05:16<12:35,  1.08it/s]

Error extracting text from https://news.usni.org/2016/02/26/pacom-harris-u-s-would-ignore-a-destabilizing-chinese-south-china-sea-air-defense-identification-zone: 403 Client Error: Forbidden for url: https://news.usni.org/2016/02/26/pacom-harris-u-s-would-ignore-a-destabilizing-chinese-south-china-sea-air-defense-identification-zone


Processing URLs:  19%|█▊        | 187/1000 [05:32<47:31,  3.51s/it]

Error extracting text from http://computer-go.org/pipermail/computer-go/2016-March/008741.html: 404 Client Error: Not Found for url: https://www.computer-go.org/pipermail/computer-go/2016-March/008741.html


Processing URLs:  19%|█▉        | 188/1000 [05:33<37:16,  2.75s/it]

Error extracting text from http://www.un.org/africarenewal/magazine/october-2005/niger-famine-foretold: 403 Client Error: Forbidden for url: https://www.un.org/africarenewal/magazine/october-2005/niger-famine-foretold


Processing URLs:  19%|█▉        | 190/1000 [05:46<53:19,  3.95s/it]  

Error extracting text from https://www.nytimes.com/2016/12/15/world/asia/china-spratly-islands.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/12/15/world/asia/china-spratly-islands.html


Processing URLs:  19%|█▉        | 194/1000 [05:50<22:45,  1.69s/it]

Error extracting text from http://www.opensecrets.org/overview/topindivs.php: 403 Client Error: Forbidden for url: https://www.opensecrets.org/overview/topindivs.php
URL filtered: http://www.spiegel.de/netzwelt/web/facebook-staatsanwaltschaft-ermittelt-gegen-mark-zuckerberg-a-1119746.html


Processing URLs:  20%|█▉        | 198/1000 [06:04<31:55,  2.39s/it]

Error extracting text from http://allafrica.com/stories/201801120010.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201801120010.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fe655af0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  20%|██        | 200/1000 [06:07<27:32,  2.07s/it]

Error extracting text from http://www.arkansasbusiness.com/article/108045/leading-from-the-front-on-ex-im-bank-randy-zook-commentary?utm_source=enews_110615&amp;utm_medium=email&amp;utm_content=daily-report&amp;utm_campaign=newsletter&amp;enews_zone=2020: 403 Client Error: Forbidden for url: https://www.arkansasbusiness.com/article/leading-from-the-front-on-ex-im-bank-randy-zook-commentary?utm_source=enews_110615&amp;utm_medium=email&amp;utm_content=daily-report&amp;utm_campaign=newsletter&amp;enews_zone=2020


Processing URLs:  21%|██        | 206/1000 [06:18<21:05,  1.59s/it]

Error extracting text from https://ahvalnews.com/turkey-politics/erdogan-sacks-trade-minister-after-nepotism-allegations: 403 Client Error: Forbidden for url: https://ahvalnews.com/turkey-politics/erdogan-sacks-trade-minister-after-nepotism-allegations


Processing URLs:  21%|██        | 207/1000 [06:18<17:29,  1.32s/it]

Error extracting text from http://www.govtech.com/policy/Pennsylvania-Pulls-Ahead-of-the-Pack-on-Self-Driving-Vehicles.html: 403 Client Error: Forbidden for url: https://www.govtech.com/policy/Pennsylvania-Pulls-Ahead-of-the-Pack-on-Self-Driving-Vehicles.html


Processing URLs:  21%|██        | 209/1000 [06:20<13:42,  1.04s/it]

Error extracting text from http://www.barrons.com/articles/venezuela-has-will-have-cash-makes-185m-debt-payment-1506032693: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/venezuela-has-will-have-cash-makes-185m-debt-payment-1506032693


Processing URLs:  21%|██        | 211/1000 [06:21<10:18,  1.28it/s]

Error extracting text from http://www.reuters.com/article/2015/11/25/eurozone-ecb-quandary-idUSL8N13K4UJ201511257: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/25/eurozone-ecb-quandary-idUSL8N13K4UJ201511257


Processing URLs:  21%|██        | 212/1000 [06:22<08:29,  1.55it/s]

Error extracting text from https://www.reuters.com/article/us-tennessee-blast/motor-home-explodes-in-nashville-possible-human-remains-found-near-site-idUSKBN28Z0SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tennessee-blast/motor-home-explodes-in-nashville-possible-human-remains-found-near-site-idUSKBN28Z0SB


Processing URLs:  22%|██▏       | 219/1000 [06:29<12:09,  1.07it/s]

Error extracting text from http://www.humanosphere.org/world-politics/2016/04/colombia-farc-struggle-reach-agreement-peace-deal/: 404 Client Error: Not Found for url: http://www.humanosphere.org/world-politics/2016/04/colombia-farc-struggle-reach-agreement-peace-deal/


Processing URLs:  22%|██▏       | 221/1000 [06:34<20:37,  1.59s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-22/in-crisis-hit-brazil-etf-traders-actually-see-less-volatility


Processing URLs:  23%|██▎       | 227/1000 [06:40<14:02,  1.09s/it]

Error extracting text from http://www.arabnews.com/news/449684: 403 Client Error: Forbidden for url: https://www.arabnews.com/news/449684


Processing URLs:  23%|██▎       | 231/1000 [06:49<28:11,  2.20s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://congressoemfoco.uol.com.br/noticias/mapa-do-impeachment-mostra-maioria-da-camara-ainda-indecisa/&amp;usg=ALkJrhiMnIoV_5eyDQqZHEWa6JN3lJdNgQ: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://congressoemfoco.uol.com.br/noticias/mapa-do-impeachment-mostra-maioria-da-camara-ainda-indecisa/&amp;usg=ALkJrhiMnIoV_5eyDQqZHEWa6JN3lJdNgQ


Processing URLs:  23%|██▎       | 232/1000 [06:51<28:14,  2.21s/it]

Error extracting text from http://www.themoscowtimes.com/business/article/why-russia-is-expanding-its-syrian-naval-base/531986.html: 500 Server Error: Internal Server Error for url: https://www.themoscowtimes.com/business/article/why-russia-is-expanding-its-syrian-naval-base/531986.html


Processing URLs:  24%|██▎       | 235/1000 [06:56<21:49,  1.71s/it]

Error extracting text from http://www.wsj.com/articles/opec-head-to-discuss-oil-output-cap-with-iran-iraq-venezuela-say-sources-1455697107: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/opec-head-to-discuss-oil-output-cap-with-iran-iraq-venezuela-say-sources-1455697107


Processing URLs:  24%|██▎       | 236/1000 [06:58<20:39,  1.62s/it]

Error extracting text from http://in.reuters.com/article/mideast-crisis-iraq-mosul-idINKBN0UC0LN20151229: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  24%|██▎       | 237/1000 [06:58<15:33,  1.22s/it]

Error extracting text from https://www.wsj.com/articles/google-says-it-wont-allow-its-artificial-intelligence-in-military-weapons-1528398091: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/google-says-it-wont-allow-its-artificial-intelligence-in-military-weapons-1528398091


Processing URLs:  24%|██▍       | 241/1000 [07:03<15:23,  1.22s/it]

Error extracting text from http://ir.sparktx.com/phoenix.zhtml?c=253900&amp;p=irol-newsArticle&amp;ID=2286691: 403 Client Error: Forbidden for url: http://ir.sparktx.com/phoenix.zhtml?c=253900&amp;p=irol-newsArticle&amp;ID=2286691


Processing URLs:  24%|██▍       | 243/1000 [07:15<39:33,  3.14s/it]

Error extracting text from https://bitbet.us/bet/1249/alphago-will-defeat-lee-sedol-overall-in-march/: 403 Client Error: Forbidden for url: https://bitbet.us/bet/1249/alphago-will-defeat-lee-sedol-overall-in-march/
Error extracting text from http://www.farsnews.com/newstext.php?nn=13941207001732: HTTPConnectionPool(host='www.farsnews.com', port=80): Max retries exceeded with url: /newstext.php?nn=13941207001732 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5ca10>: Failed to resolve 'www.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  25%|██▍       | 247/1000 [07:18<19:54,  1.59s/it]

Error extracting text from https://www.thecipherbrief.com/column/expert-view/russian-b-team-1090: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/column/expert-view/russian-b-team-1090


Processing URLs:  25%|██▍       | 248/1000 [07:19<16:57,  1.35s/it]

Error extracting text from http://thehill.com/policy/energy-environment/317254-senator-army-corps-told-to-clear-way-for-dakota-access-construction: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/317254-senator-army-corps-told-to-clear-way-for-dakota-access-construction/


Processing URLs:  25%|██▍       | 249/1000 [07:19<13:16,  1.06s/it]

Error extracting text from https://www.nytimes.com/2017/12/02/us/politics/republicans-tax-cuts.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/02/us/politics/republicans-tax-cuts.html


Processing URLs:  26%|██▌       | 256/1000 [07:40<20:51,  1.68s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2016/06/03/0200000000AEN20160603000300315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  26%|██▌       | 259/1000 [08:44<2:54:25, 14.12s/it]

Error extracting text from http://www.reuters.com/article/2015/11/16/us-japan-economy-gdp-idUSKCN0T41CC20151116: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/16/us-japan-economy-gdp-idUSKCN0T41CC20151116


Processing URLs:  26%|██▌       | 261/1000 [08:46<1:28:48,  7.21s/it]

Error extracting text from https://www.nytimes.com/2021/06/17/health/covid-pill-antiviral.html?action=click&amp;module=Top%20Stories&amp;pgtype=Homepage: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/17/health/covid-pill-antiviral.html?action=click&amp;module=Top%20Stories&amp;pgtype=Homepage


Processing URLs:  26%|██▋       | 263/1000 [08:48<49:28,  4.03s/it]  

Error extracting text from http://www.nytimes.com/aponline/2016/01/30/us/politics/ap-us-clinton-emails.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/01/30/us/politics/ap-us-clinton-emails.html


Processing URLs:  26%|██▋       | 264/1000 [08:49<38:54,  3.17s/it]

Error extracting text from http://fuelfix.com/blog/2016/04/21/iran-rules-out-oil-production-freeze/: 403 Client Error: Forbidden for url: https://www.houstonchronicle.com/business/fuelfix/blog/2016/04/21/iran-rules-out-oil-production-freeze/


Processing URLs:  26%|██▋       | 265/1000 [09:51<4:15:33, 20.86s/it]

Error extracting text from http://aa.com.tr/en/asia-pacific/taliban-attacks-repelled-in-afghanistans-helmand/876415: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  27%|██▋       | 267/1000 [09:52<2:09:14, 10.58s/it]

Error extracting text from https://nyti.ms/3GORIEb: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/12/09/world/europe/ukraine-military-russia-invasion.html


Processing URLs:  27%|██▋       | 268/1000 [09:53<1:31:31,  7.50s/it]

Error extracting text from http://www.todayonline.com/world/brazil-court-authorizes-probes-former-rousseff-top-aide-sao-paulo-mayor: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/brazil-court-authorizes-probes-former-rousseff-top-aide-sao-paulo-mayor


Processing URLs:  27%|██▋       | 269/1000 [09:53<1:04:53,  5.33s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-oil-kemp-idUSKCN0ZL1X6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-kemp-idUSKCN0ZL1X6


Processing URLs:  27%|██▋       | 274/1000 [10:04<37:19,  3.08s/it]  

URL filtered: https://twitter.com/elonmusk/status/1357422126161145856


Processing URLs:  28%|██▊       | 278/1000 [10:13<27:45,  2.31s/it]

Error extracting text from https://shar.es/1CAQbC: 404 Client Error: Not Found for url: https://shar.es/1CAQbC/


Processing URLs:  28%|██▊       | 281/1000 [10:18<20:52,  1.74s/it]

Error extracting text from http://predictwise.com/politics/2016-congress-senate#Link3: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016-congress-senate#Link3
Error extracting text from http://www.nytimes.com/2016/08/11/world/europe/poland-debate-values.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/11/world/europe/poland-debate-values.html?_r=0


Processing URLs:  28%|██▊       | 283/1000 [10:22<23:27,  1.96s/it]

Error extracting text from http://atimes.com/2015/12/groups-slam-bill-giving-immunity-to-myanmars-former-leaders/: 404 Client Error: Not Found for url: https://atimes.com/2015/12/groups-slam-bill-giving-immunity-to-myanmars-former-leaders/


Processing URLs:  29%|██▊       | 286/1000 [10:24<13:33,  1.14s/it]

Error extracting text from http://www.balkans.com/open-news.php?uniquenumber=211653: 404 Client Error: Not Found for url: http://www.balkans.com/open-news.php?uniquenumber=211653


Processing URLs:  29%|██▉       | 288/1000 [10:29<19:00,  1.60s/it]

Error extracting text from http://www.todayonline.com/world/iraq-pm-tells-kurds-not-use-mosul-battle-expand-territory?cx_tag=similartd&amp;cid=tg:recos:similartd:standard#cxrecs_s: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/iraq-pm-tells-kurds-not-use-mosul-battle-expand-territory?cx_tag=similartd&amp;cid=tg:recos:similartd:standard#cxrecs_s


Processing URLs:  29%|██▉       | 293/1000 [10:36<20:37,  1.75s/it]

URL filtered: https://twitter.com/NASAWebb/status/1474083585216892939
URL filtered: https://www.youtube.com/watch?v=Zu8xxzhMZ_Y


Processing URLs:  30%|██▉       | 296/1000 [10:36<09:49,  1.20it/s]

Error extracting text from https://www.whitehouse.gov/the-press-office/2017/01/25/presidential-executive-order-enhancing-public-safety-interior-united: 404 Client Error: Not Found for url: https://www.whitehouse.gov/the-press-office/2017/01/25/presidential-executive-order-enhancing-public-safety-interior-united


Processing URLs:  30%|██▉       | 299/1000 [10:49<30:01,  2.57s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKBN15E1JK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-assad-idUSKBN15E1JK


Processing URLs:  30%|███       | 301/1000 [10:53<25:48,  2.22s/it]

Error extracting text from http://www.newindianexpress.com/world/2017/apr/15/what-would-hardliner-ebrahim-raisi-mean-for-iran-and-the-world-1593886--1.html: 404 Client Error: Not Found for url: https://www.newindianexpress.com/world/2017/apr/15/what-would-hardliner-ebrahim-raisi-mean-for-iran-and-the-world-1593886--1.html


Processing URLs:  30%|███       | 302/1000 [10:54<24:04,  2.07s/it]

URL filtered: https://www.youtube.com/watch?v=wvDlMxGNe74


Processing URLs:  31%|███       | 307/1000 [11:03<21:19,  1.85s/it]

Error extracting text from http://africanarguments.org/2017/07/10/kenya-2017-elections-will-be-like-none-before-heres-why/: 403 Client Error: Forbidden for url: http://africanarguments.org/2017/07/10/kenya-2017-elections-will-be-like-none-before-heres-why/


Processing URLs:  31%|███       | 308/1000 [11:04<17:35,  1.53s/it]

Error extracting text from http://joshblackman.com/blog/2017/01/20/undone-with-his-first-executive-order-president-trump-begins-the-repeal-of-obamacare/: 403 Client Error: Forbidden for url: http://joshblackman.com/blog/2017/01/20/undone-with-his-first-executive-order-president-trump-begins-the-repeal-of-obamacare/


Processing URLs:  31%|███▏      | 313/1000 [11:11<15:32,  1.36s/it]

Error extracting text from http://strategicstudiesinstitute.army.mil/pubs/display.cfm?pubID: HTTPConnectionPool(host='strategicstudiesinstitute.army.mil', port=80): Max retries exceeded with url: /pubs/display.cfm?pubID (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fe8419d0>: Failed to resolve 'strategicstudiesinstitute.army.mil' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  32%|███▏      | 316/1000 [11:15<12:01,  1.05s/it]

Error extracting text from http://www.cfr.org/afghanistan/strategic-reversal-afghanistan/p37947: 404 Client Error: Not Found for url: https://www.cfr.org/afghanistan/strategic-reversal-afghanistan/p37947
URL filtered: http://www.bloomberg.com/news/articles/2016-08-11/erdogan-s-approval-rating-soars-in-turkey-following-coup-attempt
Error extracting text from http://www.reuters.com/article/2015/10/08/us-brazil-rousseff-idUSKCN0S124S20151008: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/08/us-brazil-rousseff-idUSKCN0S124S20151008


Processing URLs:  32%|███▏      | 318/1000 [11:18<14:10,  1.25s/it]

Error extracting text from http://nationalinterest.org/feature/the-wests-silver-lining-turkey-russia-tensions-14509: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/the-wests-silver-lining-turkey-russia-tensions-14509


Processing URLs:  32%|███▎      | 325/1000 [11:32<21:35,  1.92s/it]

Error extracting text from http://www.barrons.com/articles/at-t-still-sees-time-warner-buy-by-year-end-1503309193: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/at-t-still-sees-time-warner-buy-by-year-end-1503309193


Processing URLs:  33%|███▎      | 326/1000 [11:32<16:07,  1.43s/it]

Error extracting text from http://www.wsj.com/articles/south-korean-president-park-geun-hyes-ouster-to-trigger-shift-on-u-s-policy-1489167265: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-korean-president-park-geun-hyes-ouster-to-trigger-shift-on-u-s-policy-1489167265


Processing URLs:  33%|███▎      | 329/1000 [11:41<23:22,  2.09s/it]

Error extracting text from https://www.nytimes.com/2021/05/18/us/politics/will-breyer-retire-supreme-court.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/05/18/us/politics/will-breyer-retire-supreme-court.html


Processing URLs:  33%|███▎      | 330/1000 [11:44<26:48,  2.40s/it]

Error extracting text from http://www.theweek.co.uk/eu-referendum/65461/eu-referendum-poll-shows-drop-in-support-for-brexit: 404 Client Error: Not Found for url: https://theweek.com/eu-referendum/65461/eu-referendum-poll-shows-drop-in-support-for-brexit


Processing URLs:  33%|███▎      | 332/1000 [11:47<22:48,  2.05s/it]

URL filtered: https://worldview.stratfor.com/article/us-reviews-its-stance-iran?utm_source=Twitter&amp;utm_medium=social&amp;utm_campaign=article


Processing URLs:  34%|███▎      | 336/1000 [11:55<18:53,  1.71s/it]

Error extracting text from https://www.wsj.com/articles/sweeping-tax-bill-heads-to-trump-for-his-signature-1513792578: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/sweeping-tax-bill-heads-to-trump-for-his-signature-1513792578


Processing URLs:  34%|███▍      | 338/1000 [11:58<19:15,  1.75s/it]

Error extracting text from http://www.math.uchicago.edu/calendar?calendar=Combinatorics%20and%20Theoretical%20Computer%20Science: 404 Client Error: Not Found for url: http://www.math.uchicago.edu/calendar?calendar=Combinatorics%20and%20Theoretical%20Computer%20Science


Processing URLs:  34%|███▍      | 340/1000 [12:05<25:20,  2.30s/it]

URL filtered: https://www.nytimes.com/2020/03/24/technology/virus-facebook-usage-traffic.html


Processing URLs:  34%|███▍      | 342/1000 [12:09<25:00,  2.28s/it]

Error extracting text from http://www.newsmovingmarkets.com/2016/02/venezuela-sovereign-debt-default-risk-cds.html: 404 Client Error: Not Found for url: http://www.newsmovingmarkets.com/2016/02/venezuela-sovereign-debt-default-risk-cds.html


Processing URLs:  35%|███▌      | 350/1000 [12:18<09:44,  1.11it/s]

Error extracting text from https://www.macrotrends.net/2480/brent-crude-oil-prices-10-year-daily-chart: 403 Client Error: Forbidden for url: https://www.macrotrends.net/2480/brent-crude-oil-prices-10-year-daily-chart
Error extracting text from http://www.france24.com/en/20151201-usa-deploys-special-forces-iraq-fight-group-terrorism-islamic-state: 403 Client Error: Forbidden for url: http://www.france24.com/en/20151201-usa-deploys-special-forces-iraq-fight-group-terrorism-islamic-state


Processing URLs:  35%|███▌      | 354/1000 [12:21<08:54,  1.21it/s]

Error extracting text from https://redfieldandwiltonstrategies.com/latest-gb-voting-intention-2-august-2021/: 403 Client Error: Forbidden for url: https://redfieldandwiltonstrategies.com/latest-gb-voting-intention-2-august-2021/


Processing URLs:  36%|███▌      | 355/1000 [12:23<11:16,  1.05s/it]

Error extracting text from http://www.ibtimes.com/electricity-through-solar-power-now-cheaper-fossil-fuels-wef-says-new-report-2465707: 403 Client Error: Forbidden for url: https://www.ibtimes.com/electricity-through-solar-power-now-cheaper-fossil-fuels-wef-says-new-report-2465707


Processing URLs:  36%|███▌      | 357/1000 [12:24<09:04,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-northkorea-missiles-oil-idUSKBN1A20GI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-oil-idUSKBN1A20GI
URL filtered: https://twitter.com/zerohedge/status/1281679937410404352
URL filtered: http://www.latimes.com/nation/la-na-pol-twitter-russia-20170928-story.html


Processing URLs:  36%|███▌      | 360/1000 [12:25<06:30,  1.64it/s]

Error extracting text from http://www.ibtimes.co.uk/senior-iraqi-military-commander-killed-by-isis-snipers-south-mosul-1565571: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/senior-iraqi-military-commander-killed-by-isis-snipers-south-mosul-1565571


Processing URLs:  36%|███▋      | 363/1000 [12:32<16:36,  1.57s/it]

Error extracting text from https://www.commentarymagazine.com/foreign-policy/middle-east/iran/iran-violates-deal-now/: 404 Client Error: Not Found for url: https://www.commentary.org/foreign-policy/middle-east/iran/iran-violates-deal-now/


Processing URLs:  37%|███▋      | 367/1000 [12:55<43:19,  4.11s/it]

Error extracting text from http://www.veteranstoday.com/2014/03/10/mind-control-in-the-21st-century/: 404 Client Error: Not Found for url: https://veteranstoday.com/2014/03/10/mind-control-in-the-21st-century/


Processing URLs:  37%|███▋      | 368/1000 [12:57<35:00,  3.32s/it]

Error extracting text from http://atimes.com/2016/02/the-us-navys-real-china-challenge-an-anti-access-swarm-strike/: 404 Client Error: Not Found for url: https://atimes.com/2016/02/the-us-navys-real-china-challenge-an-anti-access-swarm-strike/
Error extracting text from http://bigstory.ap.org/article/b10ccf5dadb54a8a87990c0053400c24/bulgaria-replaces-candidate-united-nations-top-job: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/b10ccf5dadb54a8a87990c0053400c24/bulgaria-replaces-candidate-united-nations-top-job (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304721550>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  37%|███▋      | 370/1000 [12:57<19:51,  1.89s/it]

Error extracting text from https://www.nytimes.com/2017/08/28/business/dealbook/vix-trading.html?emc=edit_th_20170829&amp;nl=todaysheadlines&amp;nlid=77825025: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/28/business/dealbook/vix-trading.html?emc=edit_th_20170829&amp;nl=todaysheadlines&amp;nlid=77825025


Processing URLs:  37%|███▋      | 373/1000 [13:00<14:26,  1.38s/it]

Error extracting text from https://cleantechnica.com/2016/12/22/china-electric-cars-sales-record-43441-november/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/12/22/china-electric-cars-sales-record-43441-november/


Processing URLs:  38%|███▊      | 375/1000 [14:03<3:13:50, 18.61s/it]

Error extracting text from http://government.ru/media/files/AF8APZOAtbAdQmORxrwCwvyNIGZAeGew.pdf: HTTPConnectionPool(host='government.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  38%|███▊      | 382/1000 [14:10<24:11,  2.35s/it]

Error extracting text from http://warontherocks.com/2016/03/are-cia-backed-syrian-rebels-really-fighting-pentagon-backed-syrian-rebels/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/03/are-cia-backed-syrian-rebels-really-fighting-pentagon-backed-syrian-rebels/
Error extracting text from http://www.reuters.com/article/us-bis-annualreport-policy-idUSKCN0ZC0FP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-bis-annualreport-policy-idUSKCN0ZC0FP


Processing URLs:  38%|███▊      | 383/1000 [14:11<19:59,  1.94s/it]

Error extracting text from https://www.reuters.com/article/us-russia-putin-weapons/putin-calls-on-u-s-to-extend-new-start-arms-control-treaty-for-one-year-idUSKBN28R1TO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-putin-weapons/putin-calls-on-u-s-to-extend-new-start-arms-control-treaty-for-one-year-idUSKBN28R1TO


Processing URLs:  38%|███▊      | 385/1000 [14:11<12:12,  1.19s/it]

Error extracting text from http://www.vocativ.com/304836/migrants-panic-as-greece-plans-for-deportations/: 404 Client Error: Not Found for url: http://www.vocativ.com/304836/migrants-panic-as-greece-plans-for-deportations/


Processing URLs:  39%|███▊      | 386/1000 [14:12<10:36,  1.04s/it]

Error extracting text from http://allafrica.com/stories/201605040038.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201605040038.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3040e8110>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  39%|███▊      | 387/1000 [14:14<11:47,  1.15s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-internet-cybercrime-idUSKBN0U60Y820151223: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-internet-cybercrime-idUSKBN0U60Y820151223


Processing URLs:  39%|███▉      | 392/1000 [14:18<07:48,  1.30it/s]

Error extracting text from http://www.nytimes.com/2015/11/12/business/dealbook/a-debate-with-bernanke-over-the-feds-easy-money-policies.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/12/business/dealbook/a-debate-with-bernanke-over-the-feds-easy-money-policies.html
Error extracting text from http://www.reuters.com: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/


Processing URLs:  40%|███▉      | 395/1000 [14:26<17:29,  1.73s/it]

Error extracting text from https://abcnews.go.com/Health/wireStory/latest-covid-19-deaths-hit-daily-record-russia-78684786: 404 Client Error: Not Found for url: https://abcnews.go.com/Health/wireStory/latest-covid-19-deaths-hit-daily-record-russia-78684786


Processing URLs:  40%|███▉      | 396/1000 [14:26<13:44,  1.37s/it]

Error extracting text from http://www.agweb.com/crops/soybeans/: 403 Client Error: Forbidden for url: http://www.agweb.com/crops/soybeans/


Processing URLs:  40%|███▉      | 399/1000 [14:28<08:00,  1.25it/s]

Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-16-january-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-16-january-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x305687380>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  40%|████      | 402/1000 [14:29<05:41,  1.75it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKCN0VY18B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKCN0VY18B


Processing URLs:  41%|████      | 406/1000 [14:43<23:01,  2.33s/it]

Error extracting text from http://english.sina.com/world/2014/0602/705974.html: 404 Client Error: Not Found for url: https://www.sina.com.cn/404.html


Processing URLs:  41%|████      | 408/1000 [14:45<14:54,  1.51s/it]

Error extracting text from https://www.neweurope.eu/press-release/press-release-contrasting-views-on-the-state-of-greeces-adjustment-programme/: 403 Client Error: Forbidden for url: https://www.neweurope.eu/press-release/press-release-contrasting-views-on-the-state-of-greeces-adjustment-programme/


Processing URLs:  41%|████      | 412/1000 [14:57<24:14,  2.47s/it]

Error extracting text from http://www.gov.me/en/News/153894/Main-target-of-Democratic-Front-s-protests-in-Podgorica-was-sovereignty-of-Montenegro-PM-dukanovic-said-in-Parliament.html: 404 Client Error: not found for url: https://www.gov.me/en/News/153894/Main-target-of-Democratic-Front-s-protests-in-Podgorica-was-sovereignty-of-Montenegro-PM-dukanovic-said-in-Parliament.html


Processing URLs:  42%|████▏     | 415/1000 [16:03<3:14:26, 19.94s/it]

Error extracting text from https://www.nasdaq.com/market-activity/commodities/bz%3Anmx: HTTPSConnectionPool(host='www.nasdaq.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  42%|████▏     | 419/1000 [16:09<55:23,  5.72s/it]

Error extracting text from https://missilethreat.csis.org/north-korea-missile-launches-1984-present/: 403 Client Error: Forbidden for url: https://missilethreat.csis.org/north-korea-missile-launches-1984-present/
Error extracting text from http://www.reuters.com/article/us-spain-politics-idUSKCN11418V?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-idUSKCN11418V?il=0


Processing URLs:  42%|████▎     | 425/1000 [16:18<15:43,  1.64s/it]

Error extracting text from https://www.wsj.com/articles/north-korea-missile-launch-hands-trump-foreign-policy-crisis-1499182303: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/north-korea-missile-launch-hands-trump-foreign-policy-crisis-1499182303


Processing URLs:  43%|████▎     | 426/1000 [16:19<15:36,  1.63s/it]

URL filtered: https://twitter.com/pranayrvaddi/status/1354092436264542208


Processing URLs:  43%|████▎     | 429/1000 [16:22<11:21,  1.19s/it]

Error extracting text from https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?crid=1SMJ7JSALNHCC&amp;keywords=dictators+handbook&amp;qid=1647397944&amp;sprefix=dictator%2Caps%2C117&amp;sr=8-1: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Dictators-Handbook-Behavior-Almost-Politics/dp/1610391845/ref=sr_1_1?crid=1SMJ7JSALNHCC&amp;keywords=dictators+handbook&amp;qid=1647397944&amp;sprefix=dictator%2Caps%2C117&amp;sr=8-1


Processing URLs:  43%|████▎     | 434/1000 [16:39<24:29,  2.60s/it]

Error extracting text from http://www.khmertimeskh.com/news/20710/myanmar---s-president-exits-with-offer-to-help-nld/: 404 Client Error: Not Found for url: https://www.khmertimeskh.com/news/20710/myanmar---s-president-exits-with-offer-to-help-nld/


Processing URLs:  44%|████▎     | 436/1000 [16:41<18:14,  1.94s/it]

Error extracting text from http://www.publicpolicypolling.com/main/us-senate-2016/: 404 Client Error: Not Found for url: https://www.publicpolicypolling.com/main/us-senate-2016/


Processing URLs:  44%|████▍     | 440/1000 [16:45<11:01,  1.18s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/357210-trump-officials-grilled-by-lawmakers-over-russian-cyber-firm: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/357210-trump-officials-grilled-by-lawmakers-over-russian-cyber-firm/


Processing URLs:  44%|████▍     | 441/1000 [16:46<09:25,  1.01s/it]

Error extracting text from http://thehill.com/policy/defense/258282-senators-shred-obamas-syria-strategy: 403 Client Error: Forbidden for url: https://thehill.com/policy/defense/258282-senators-shred-obamas-syria-strategy/


Processing URLs:  44%|████▍     | 443/1000 [16:47<06:32,  1.42it/s]

Error extracting text from http://www.reuters.com/article/us-tesla-ratings-consumerreports-idUSKBN17S14Q?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-ratings-consumerreports-idUSKBN17S14Q?il=0


Processing URLs:  44%|████▍     | 444/1000 [16:48<08:38,  1.07it/s]

URL filtered: https://www.youtube.com/watch?v=SKRma7PDW10


Processing URLs:  45%|████▍     | 448/1000 [17:03<33:52,  3.68s/it]

Error extracting text from http://www.reuters.com/article/us-usa-autoshow-china-electric-idUSKBN14V1H3?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-autoshow-china-electric-idUSKBN14V1H3?il=0


Processing URLs:  45%|████▌     | 451/1000 [17:07<21:49,  2.39s/it]

Error extracting text from http://docdro.id/cGzM6xj: 404 Client Error: Not Found for url: http://docdro.id/cGzM6xj


Processing URLs:  45%|████▌     | 452/1000 [17:07<16:53,  1.85s/it]

Error extracting text from https://www.france24.com/en/20200909-ethiopia-s-tigray-region-defies-pm-abiy-with-illegal-election-1: 403 Client Error: Forbidden for url: https://www.france24.com/en/20200909-ethiopia-s-tigray-region-defies-pm-abiy-with-illegal-election-1


Processing URLs:  46%|████▌     | 455/1000 [17:11<12:49,  1.41s/it]

Error extracting text from http://aaj.tv/2017/04/asif-explains-how-india-exploiting-rivers/: 404 Client Error: Not Found for url: https://www.aaj.tv/2017/04/asif-explains-how-india-exploiting-rivers/


Processing URLs:  46%|████▌     | 456/1000 [17:11<09:54,  1.09s/it]

Error extracting text from https://www.nytimes.com/2017/11/20/world/europe/germany-merkel-coalition.html?rref=collection%2Fsectioncollection%2Fworld&amp;action=click&amp;contentCollection=world&amp;region=rank&amp;amp: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/20/world/europe/germany-merkel-coalition.html?rref=collection%2Fsectioncollection%2Fworld&amp;action=click&amp;contentCollection=world&amp;region=rank&amp;amp
URL filtered: https://twitter.com/AlborzHabibi/status/707981796999340032


Processing URLs:  46%|████▌     | 462/1000 [17:20<11:40,  1.30s/it]

Error extracting text from http://www.latimes.com/politics/la-pol-ca-russia-rohrabacher-subcommittee-20171026-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-pol-ca-russia-rohrabacher-subcommittee-20171026-story.html


Processing URLs:  46%|████▋     | 464/1000 [17:25<15:45,  1.76s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-27/rousseff-future-on-hold-as-brazilians-take-break-from-crisis


Processing URLs:  47%|████▋     | 466/1000 [17:27<12:08,  1.36s/it]

Error extracting text from http://nucleardiner.com/2015/07/31/jcpoa-the-arak-reactor/: HTTPConnectionPool(host='nucleardiner.com', port=80): Max retries exceeded with url: /2015/07/31/jcpoa-the-arak-reactor/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3073ce6f0>: Failed to resolve 'nucleardiner.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  47%|████▋     | 468/1000 [17:27<07:55,  1.12it/s]

Error extracting text from http://www.wsj.com/articles/turkey-launches-fresh-incursion-into-syria-1472911575: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkey-launches-fresh-incursion-into-syria-1472911575


Processing URLs:  47%|████▋     | 472/1000 [17:36<10:49,  1.23s/it]

Error extracting text from https://eurovision.tv/story/183-million-viewers-welcome-back-the-eurovision-song-contest: 403 Client Error: Forbidden for url: https://eurovision.tv/story/183-million-viewers-welcome-back-the-eurovision-song-contest


Processing URLs:  48%|████▊     | 476/1000 [17:42<11:50,  1.36s/it]

Error extracting text from http://www.reuters.com/article/us-usa-northkorea-missiletest-launch-idUSKBN18Q2E0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-northkorea-missiletest-launch-idUSKBN18Q2E0
URL filtered: https://www.youtube.com/watch?v=3WioP042zLA


Processing URLs:  48%|████▊     | 480/1000 [17:46<09:35,  1.11s/it]

Error extracting text from http://www.legalink.ch/Root/Sites/legalink/Resources/Questionnaires/IPOs/Asia/Legalink%20IPO_China.pdf: 403 Client Error: Forbidden for url: https://www.legalink.ch/Root/Sites/legalink/Resources/Questionnaires/IPOs/Asia/Legalink%20IPO_China.pdf


Processing URLs:  48%|████▊     | 484/1000 [17:53<11:20,  1.32s/it]

Error extracting text from https://www.reuters.com/article/venezuela-economy/venezuela-says-it-has-paid-2027-bond-coupon-idUSL2N1M12B7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/venezuela-economy/venezuela-says-it-has-paid-2027-bond-coupon-idUSL2N1M12B7


Processing URLs:  49%|████▉     | 490/1000 [19:00<1:18:38,  9.25s/it]

Error extracting text from http://www.wsj.com/articles/u-s-european-union-race-to-meet-deadline-on-safe-harbor-data-pact-1454223602: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-european-union-race-to-meet-deadline-on-safe-harbor-data-pact-1454223602


Processing URLs:  49%|████▉     | 491/1000 [19:02<1:00:07,  7.09s/it]

URL filtered: https://www.youtube.com/watch?v=n-azWgGI9iU
URL filtered: https://www.bloomberg.com/news/articles/2016-10-17/russia-s-mega-india-oil-deal-takes-turf-war-to-mideast-backyard


Processing URLs:  50%|████▉     | 499/1000 [19:12<17:32,  2.10s/it]

URL filtered: https://twitter.com/ast309/status/898596613328740352


Processing URLs:  50%|█████     | 504/1000 [19:15<07:17,  1.13it/s]

Error extracting text from https://www.sciencedaily.com/releases/2015/12/151201141244.htm: 403 Client Error: Forbidden for url: https://www.sciencedaily.com/releases/2015/12/151201141244.htm
Error extracting text from https://www.opensecrets.org/overview/donordemographics.php: 403 Client Error: Forbidden for url: https://www.opensecrets.org/overview/donordemographics.php


Processing URLs:  51%|█████     | 508/1000 [19:22<09:14,  1.13s/it]

Error extracting text from https://www.reuters.com/world/europe/russia-extends-house-arrest-kremlin-critic-navalnys-spokesperson-2021-07-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/russia-extends-house-arrest-kremlin-critic-navalnys-spokesperson-2021-07-21/


Processing URLs:  51%|█████     | 510/1000 [19:36<28:22,  3.47s/it]

Error extracting text from https://www.predictit.org/markets/detail/4366/Which-party-will-control-the-Senate-after-2020-election: 403 Client Error: Forbidden for url: https://www.predictit.org/markets/detail/4366/Which-party-will-control-the-Senate-after-2020-election


Processing URLs:  51%|█████     | 512/1000 [19:37<16:06,  1.98s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-shelling-idUSKCN0VM0Q0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-shelling-idUSKCN0VM0Q0


Processing URLs:  52%|█████▏    | 516/1000 [19:44<13:18,  1.65s/it]

Error extracting text from http://www.reuters.com/article/us-ceraweek-saudi-shale-idUSKBN16G2TJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ceraweek-saudi-shale-idUSKBN16G2TJ


Processing URLs:  52%|█████▏    | 517/1000 [19:45<10:01,  1.25s/it]

Error extracting text from https://www.nytimes.com/2017/04/08/world/asia/china-xi-jinping-president-trump-xinhua.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/08/world/asia/china-xi-jinping-president-trump-xinhua.html?_r=0


Processing URLs:  52%|█████▏    | 518/1000 [19:46<10:11,  1.27s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-01/11/c_134998337.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-01/11/c_134998337.htm


Processing URLs:  52%|█████▏    | 519/1000 [19:47<09:10,  1.15s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-blm-bg-editorial-putin-c8e9042c-d4b6-11e5-a65b-587e721fb231-20160216-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-blm-bg-editorial-putin-c8e9042c-d4b6-11e5-a65b-587e721fb231-20160216-story.html


Processing URLs:  52%|█████▏    | 520/1000 [19:47<06:54,  1.16it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKCN0X428P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-rebels-idUSKCN0X428P
Error extracting text from https://www.reuters.com/business/swift-says-it-its-examining-which-entities-are-subject-sanctions-2022-03-01/).: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/swift-says-it-its-examining-which-entities-are-subject-sanctions-2022-03-01/).


Processing URLs:  53%|█████▎    | 527/1000 [19:57<11:16,  1.43s/it]

URL filtered: https://www.youtube.com/watch?v=PYjehPH3Y78&amp;t


Processing URLs:  53%|█████▎    | 531/1000 [20:01<10:24,  1.33s/it]

Error extracting text from https://www.newsweek.com/taiwan-lawmaker-accuses-chinas-xi-jinping-hypocrisy-after-davos-speech-1564826: 403 Client Error: Forbidden for url: https://www.newsweek.com/taiwan-lawmaker-accuses-chinas-xi-jinping-hypocrisy-after-davos-speech-1564826


Processing URLs:  53%|█████▎    | 532/1000 [20:03<10:15,  1.32s/it]

Error extracting text from https://tradingeconomics.com/euro-area/inflation-cpi: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/euro-area/inflation-cpi


Processing URLs:  54%|█████▎    | 535/1000 [20:05<07:08,  1.08it/s]

Error extracting text from http://21stcenturywire.com/2016/04/10/burundi-geopolitical-jewel-in-the-cross-hairs-of-regime-change-and-hybrid-war-pundits/: 403 Client Error: Forbidden for url: http://21stcenturywire.com/2016/04/10/burundi-geopolitical-jewel-in-the-cross-hairs-of-regime-change-and-hybrid-war-pundits/


Processing URLs:  54%|█████▎    | 537/1000 [20:08<09:57,  1.29s/it]

Error extracting text from https://www.ipsos-mori.com/researchpublications/researcharchive/3731/Half-of-people-in-nine-European-countries-believe-UK-will-vote-to-leave-the-EU.aspx: 403 Client Error: Forbidden for url: https://www.ipsos.com/en-uk/researchpublications/researcharchive/3731/Half-of-people-in-nine-European-countries-believe-UK-will-vote-to-leave-the-EU.aspx
Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0VU0XE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear-idUSKCN0VU0XE


Processing URLs:  54%|█████▍    | 539/1000 [20:08<05:47,  1.33it/s]

Error extracting text from http://www.nytimes.com/2015/09/15/world/asia/north-korea-plans-rocket-launch-that-could-lead-to-missile.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/15/world/asia/north-korea-plans-rocket-launch-that-could-lead-to-missile.html


Processing URLs:  54%|█████▍    | 540/1000 [20:09<05:16,  1.45it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2015/09/03/venezuela-default-risk-overstated: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2015/09/03/venezuela-default-risk-overstated


Processing URLs:  55%|█████▍    | 545/1000 [20:13<05:22,  1.41it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-08-03/iraq-s-kirkuk-province-spurns-plan-to-ship-its-oil-to-iran
Error extracting text from http://www.nytimes.com/2016/01/19/us/politics/evangelicals-see-donald-trump-as-man-of-conviction-if-not-faith.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/19/us/politics/evangelicals-see-donald-trump-as-man-of-conviction-if-not-faith.html


Processing URLs:  55%|█████▌    | 550/1000 [20:19<07:42,  1.03s/it]

Error extracting text from http://business.financialpost.com/investing/global-investor/imagine-a-world-in-2016-where-black-swans-really-do-come-true: 403 Client Error: Forbidden for url: https://financialpost.com/investing/global-investor/imagine-a-world-in-2016-where-black-swans-really-do-come-true


Processing URLs:  55%|█████▌    | 552/1000 [20:21<06:34,  1.14it/s]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/myanmar-sets-up-committee/2337734.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/myanmar-sets-up-committee/2337734.html
Error extracting text from https://www.arabnews.com/node/1766066/middle-east: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1766066/middle-east


Processing URLs:  55%|█████▌    | 553/1000 [20:22<06:01,  1.24it/s]

Error extracting text from https://sanjosespotlight.com/masks-will-be-required-indoors-in-santa-clara-county/: 403 Client Error: Forbidden for url: https://sanjosespotlight.com/masks-will-be-required-indoors-in-santa-clara-county/


Processing URLs:  55%|█████▌    | 554/1000 [20:23<06:21,  1.17it/s]

Error extracting text from https://run2-13tev.web.cern.ch/background/lhc-season-2-new-frontiers-physics: 404 Client Error: Not Found for url: https://run2-13tev.web.cern.ch/background/lhc-season-2-new-frontiers-physics


Processing URLs:  56%|█████▌    | 557/1000 [20:28<10:22,  1.41s/it]

Error extracting text from http://allafrica.com/stories/201511231595.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201511231595.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x306189580>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  56%|█████▌    | 559/1000 [20:35<15:05,  2.05s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-is-insight-idUSKCN0VB11X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-is-insight-idUSKCN0VB11X


Processing URLs:  56%|█████▋    | 565/1000 [20:48<14:00,  1.93s/it]

Error extracting text from http://syriadirect.org/news/economist-new-syrian-central-bank-policy-sign-of-%E2%80%98massively-depleted%E2%80%99-foreign-reserves/: 404 Client Error: Not Found for url: http://syriadirect.org/news/economist-new-syrian-central-bank-policy-sign-of-%E2%80%98massively-depleted%E2%80%99-foreign-reserves/


Processing URLs:  57%|█████▋    | 569/1000 [20:56<17:02,  2.37s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2016/Oct-13/376270-iran-warships-sent-to-waters-off-conflict-hit-yemen.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Oct-13/376270-iran-warships-sent-to-waters-off-conflict-hit-yemen.ashx


Processing URLs:  58%|█████▊    | 580/1000 [21:30<16:28,  2.35s/it]

Error extracting text from https://www.gjopen.com/comments/1371551.: 404 Client Error: Not Found for url: https://www.gjopen.com/comments/1371551.


Processing URLs:  58%|█████▊    | 583/1000 [22:08<1:19:29, 11.44s/it]

Error extracting text from http://www.president-office.gov.mm/en/: 522 Server Error:  for url: https://www.president-office.gov.mm/en/


Processing URLs:  58%|█████▊    | 585/1000 [22:10<41:25,  5.99s/it]

Error extracting text from http://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN16U33K: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN16U33K


Processing URLs:  59%|█████▊    | 587/1000 [22:13<25:30,  3.71s/it]

Error extracting text from http://finance.yahoo.com/q?s=LMT: 404 Client Error: Not Found for url: https://finance.yahoo.com/q?s=LMT


Processing URLs:  59%|█████▉    | 590/1000 [22:16<12:04,  1.77s/it]

Error extracting text from https://www.yahoo.com/news/syrias-assad-says-willing-hold-early-presidential-vote-140606893.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/syrias-assad-says-willing-hold-early-presidential-vote-140606893.html


Processing URLs:  59%|█████▉    | 591/1000 [22:22<20:29,  3.01s/it]

URL filtered: https://www.bloomberg.com/news/articles/2018-02-13/u-s-strikes-said-to-kill-scores-of-russian-fighters-in-syria


Processing URLs:  59%|█████▉    | 593/1000 [22:22<11:36,  1.71s/it]

Error extracting text from https://townhall.com/tipsheet/mattvespa/2022/03/14/putin-places-spy-chiefs-under-house-arrest-over-ukrainian-fiasco-n2604484: 403 Client Error: Forbidden for url: https://townhall.com/tipsheet/mattvespa/2022/03/14/putin-places-spy-chiefs-under-house-arrest-over-ukrainian-fiasco-n2604484


Processing URLs:  59%|█████▉    | 594/1000 [22:22<09:16,  1.37s/it]

Error extracting text from https://venturebeat.com/2021/02/12/u-s-console-sales-just-had-the-best-january-in-more-than-a-generation/: 403 Client Error: Forbidden for url: https://venturebeat.com/2021/02/12/u-s-console-sales-just-had-the-best-january-in-more-than-a-generation/


Processing URLs:  60%|█████▉    | 595/1000 [23:22<1:52:54, 16.73s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/article77494532.html#storylink=cpy: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  60%|█████▉    | 597/1000 [23:23<1:00:10,  8.96s/it]

Error extracting text from https://www.wsj.com/articles/maduro-seeks-to-diffuse-tensions-in-venezuela-1491050119: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/maduro-seeks-to-diffuse-tensions-in-venezuela-1491050119


Processing URLs:  60%|█████▉    | 598/1000 [23:24<44:30,  6.64s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2016-11-24/china-puts-oil-over-politics-in-deal-to-boost-venezuela-output


Processing URLs:  60%|██████    | 600/1000 [23:25<25:35,  3.84s/it]

Error extracting text from http://maritime-executive.com/article/date-set-for-panama-canal-lock-fix: 404 Client Error: Not Found for url: https://maritime-executive.com/403.shtml


Processing URLs:  60%|██████    | 602/1000 [23:27<17:14,  2.60s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu-scotland-idUSKCN18A0C4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKCN18A0C4


Processing URLs:  60%|██████    | 604/1000 [23:29<11:47,  1.79s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/26/750041/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/26/750041/story.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKCN10Z07J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-turkey-idUSKCN10Z07J


Processing URLs:  61%|██████    | 606/1000 [23:43<30:44,  4.68s/it]

Error extracting text from https://www.washingtonpost.com/world/the_americas/nail-biter-race-for-perus-presidency-remains-tight/2016/06/07/be784a92-2d18-11e6-b9d5-3c3063f8332c_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/the_americas/nail-biter-race-for-perus-presidency-remains-tight/2016/06/07/be784a92-2d18-11e6-b9d5-3c3063f8332c_story.html


Processing URLs:  61%|██████    | 607/1000 [23:45<25:01,  3.82s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-12-02/inflation-pickup-too-little-too-late-to-keep-ecb-from-easing


Processing URLs:  61%|██████    | 609/1000 [23:45<14:37,  2.24s/it]

Error extracting text from http://election.princeton.edu/2016/11/24/a-lower-court-win-on-partisan-gerrymandering/: HTTPSConnectionPool(host='election.princeton.edu2016', port=443): Max retries exceeded with url: /11/24/a-lower-court-win-on-partisan-gerrymandering/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x303418a70>: Failed to resolve 'election.princeton.edu2016' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  61%|██████▏   | 614/1000 [23:51<09:29,  1.48s/it]

Error extracting text from https://www.sightmagazine.com.au/news/6330-fulani-attacks-in-nigeria-kill-more-than-1-200-investigation-finds: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  62%|██████▏   | 615/1000 [23:53<11:01,  1.72s/it]

URL filtered: https://www.stratfor.com/analysis/eu-britain-will-leave-behind?utm_source=LinkedIn&amp;utm_medium=social&amp;utm_campaign=article


Processing URLs:  62%|██████▏   | 617/1000 [23:54<07:58,  1.25s/it]

Error extracting text from http://uk.reuters.com/article/mideast-crisis-turkey-russia-nuclear-idUKL8N13Y31G20151209: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  62%|██████▏   | 622/1000 [25:15<2:08:45, 20.44s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/article68956712.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  62%|██████▎   | 625/1000 [25:21<55:43,  8.92s/it]  

URL filtered: https://techcrunch.com/2017/11/15/study-russian-twitter-bots-sent-45k-brexit-tweets-close-to-vote/
Error extracting text from http://www.nytimes.com/1998/01/09/world/house-graft-tracing-bhutto-millions-special-report-bhutto-clan-leaves-trail.html?pagewanted=all: 403 Client Error: Forbidden for url: http://www.nytimes.com/1998/01/09/world/house-graft-tracing-bhutto-millions-special-report-bhutto-clan-leaves-trail.html?pagewanted=all


Processing URLs:  63%|██████▎   | 634/1000 [25:31<07:42,  1.26s/it]

Error extracting text from http://www.panarmenian.net/eng/world/news/65948/: 403 Client Error: Forbidden for url: http://www.panarmenian.net/eng/world/news/65948/


Processing URLs:  64%|██████▎   | 636/1000 [25:33<05:33,  1.09it/s]

Error extracting text from http://www.reuters.com/article/us-brazil-rousseff-idUSKBN0TZ37D20151217: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-rousseff-idUSKBN0TZ37D20151217


Processing URLs:  64%|██████▍   | 638/1000 [26:35<1:53:40, 18.84s/it]

Error extracting text from http://www.usnews.com/news/business/articles/2015/12/10/vw-indicates-investigation-wont-spare-top-managers: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  64%|██████▍   | 639/1000 [26:37<1:22:41, 13.74s/it]

Error extracting text from https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/experts-bet-on-first-deepfakes-political-scandal: 404 Client Error: Not Found for url: https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/experts-bet-on-first-deepfakes-political-scandal


Processing URLs:  64%|██████▍   | 642/1000 [26:39<30:50,  5.17s/it]  

Error extracting text from http://thehill.com/homenews/campaign/312076-trump-computers-have-complicated-lives-very-greatly: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/312076-trump-computers-have-complicated-lives-very-greatly/


Processing URLs:  64%|██████▍   | 643/1000 [26:40<22:31,  3.78s/it]

URL filtered: https://www.reuters.com/article/us-germany-cyber-facebook/german-domestic-spy-agency-hits-out-at-silicon-valley-idUSKBN1DR1XA


Processing URLs:  65%|██████▍   | 649/1000 [26:44<07:10,  1.23s/it]

Error extracting text from http://www.dvb.no/news/suu-kyi-pleased-with-first-days-of-new-reign/60607: 403 Client Error: Forbidden for url: https://www.dvb.no/news/suu-kyi-pleased-with-first-days-of-new-reign/60607


Processing URLs:  65%|██████▌   | 653/1000 [26:49<06:13,  1.08s/it]

Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-police-explainer/netanyahu-what-happens-next-idUSKCN1FX2WZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-police-explainer/netanyahu-what-happens-next-idUSKCN1FX2WZ?il=0


Processing URLs:  66%|██████▌   | 655/1000 [27:00<17:29,  3.04s/it]

Error extracting text from http://www.parl.gc.ca/parliamentarians/en/partystandings: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  66%|██████▌   | 660/1000 [27:16<12:44,  2.25s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/latest_polls/president/: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/latest_polls/president/


Processing URLs:  66%|██████▋   | 663/1000 [27:19<08:49,  1.57s/it]

URL filtered: https://www.linkedin.com/in/valette-liedtke-hendrickson-ph-d-07835824/


Processing URLs:  67%|██████▋   | 666/1000 [27:20<04:35,  1.21it/s]

Error extracting text from http://www.reuters.com/article/idUSKCN0XF29U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XF29U


Processing URLs:  68%|██████▊   | 676/1000 [27:40<05:42,  1.06s/it]

Error extracting text from http://warontherocks.com/2016/02/saving-ourselves-from-water-torture-in-the-south-china-sea/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/02/saving-ourselves-from-water-torture-in-the-south-china-sea/
Error extracting text from http://news.yahoo.com/us-government-informs-congress-plan-sell-two-warships-164022476.html: 404 Client Error: Not Found for url: http://news.yahoo.com/us-government-informs-congress-plan-sell-two-warships-164022476.html
Error extracting text from https://tas-nextev.taleo.net/careersection/nextev_mobile/jobdetail.ftl?job=16000115: HTTPSConnectionPool(host='tas-nextev.taleo.net', port=443): Max retries exceeded with url: /careersection/nextev_mobile/jobdetail.ftl?job=16000115 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x307b2ee70>: Failed to resolve 'tas-nextev.taleo.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 681/1000 [27:48<06:58,  1.31s/it]

Error extracting text from http://www.nytimes.com/2016/07/13/world/asia/south-china-sea-hague-ruling-philippines.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/13/world/asia/south-china-sea-hague-ruling-philippines.html


Processing URLs:  68%|██████▊   | 684/1000 [27:51<04:52,  1.08it/s]

Error extracting text from http://news.softpedia.com/news/hackers-steal-research-and-user-data-from-japanese-nuclear-research-lab-509380.shtml: 403 Client Error: Forbidden for url: https://news.softpedia.com/news/hackers-steal-research-and-user-data-from-japanese-nuclear-research-lab-509380.shtml
Error extracting text from http://www.reuters.com/article/us-eu-gazprom-nordstream-analysis-idUSKBN16V20S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-gazprom-nordstream-analysis-idUSKBN16V20S


Processing URLs:  69%|██████▊   | 687/1000 [27:56<08:32,  1.64s/it]

Error extracting text from https://reut.rs/3jsWyMH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics-draghi-factbox/factbox-the-obstacles-to-draghi-forming-a-government-in-italy-idUSKBN2A50T2?il=0


Processing URLs:  69%|██████▉   | 692/1000 [28:02<06:31,  1.27s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.oregional.com.br/2016/03/sinval-malheiros-aparece-como-indeciso-em-relacao-ao-impeachment-de-dilma_320340&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://www.oregional.com.br/2016/03/sinval-malheiros-aparece-como-indeciso-em-relacao-ao-impeachment-de-dilma_320340&amp;prev=search


Processing URLs:  69%|██████▉   | 694/1000 [28:16<22:29,  4.41s/it]

Error extracting text from https://www.washingtonpost.com/business/judge-to-hear-arguments-on-dakota-access-oil-pipeline-work/2017/02/28/8806165c-fd7d-11e6-9b78-824ccab94435_story.html?utm_term=.132bcb7b3a1b: 404 Client Error: Not Found for url: https://www.washingtonpost.com/business/judge-to-hear-arguments-on-dakota-access-oil-pipeline-work/2017/02/28/8806165c-fd7d-11e6-9b78-824ccab94435_story.html?utm_term=.132bcb7b3a1b


Processing URLs:  70%|██████▉   | 695/1000 [28:17<17:08,  3.37s/it]

Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950129001259: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950129001259 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304dd5eb0>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  70%|██████▉   | 698/1000 [28:19<09:11,  1.82s/it]

Error extracting text from http://www.timesofisrael.com/iran-removes-core-of-arak-reactor-looks-to-sanctions-relief/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/iran-removes-core-of-arak-reactor-looks-to-sanctions-relief/


Processing URLs:  70%|███████   | 701/1000 [28:23<07:32,  1.51s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XO0RK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XO0RK


Processing URLs:  70%|███████   | 705/1000 [28:30<07:10,  1.46s/it]

Error extracting text from http://www.un.org/depts/los/convention_agreements/texts/unclos/unclos_e.pdf: 403 Client Error: Forbidden for url: https://www.un.org/depts/los/convention_agreements/texts/unclos/unclos_e.pdf
Error extracting text from http://www.nytimes.com/2015/10/04/world/americas/chinas-ambitious-rail-projects-crash-into-harsh-realities-in-latin-america.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/04/world/americas/chinas-ambitious-rail-projects-crash-into-harsh-realities-in-latin-america.html


Processing URLs:  71%|███████   | 706/1000 [29:02<48:56,  9.99s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/19102-nld-leader-calls-region-speakers-to-nay-pyi-taw.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/19102-nld-leader-calls-region-speakers-to-nay-pyi-taw.html


Processing URLs:  71%|███████   | 711/1000 [29:11<13:21,  2.77s/it]

Error extracting text from http://www.consilium.europa.eu/media/32236/15-euco-art50-guidelines-en.pdf: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/media/32236/15-euco-art50-guidelines-en.pdf


Processing URLs:  71%|███████   | 712/1000 [29:12<10:09,  2.12s/it]

Error extracting text from http://thehill.com/homenews/campaign/361295-moore-campaign-attempts-to-discredit-accusers-story-of-sexual-assault: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/361295-moore-campaign-attempts-to-discredit-accusers-story-of-sexual-assault/


Processing URLs:  71%|███████▏  | 713/1000 [29:13<09:11,  1.92s/it]

Error extracting text from http://www.migrationobservatory.ox.ac.uk/briefings/who-counts-migrant-definitions-and-their-consequences: 403 Client Error: Forbidden for url: http://migrationobservatory.ox.ac.uk/briefings/who-counts-migrant-definitions-and-their-consequences


Processing URLs:  72%|███████▏  | 719/1000 [29:19<05:02,  1.08s/it]

URL filtered: http://www.bloomberg.com/quote/BNKTR:TI


Processing URLs:  72%|███████▏  | 724/1000 [29:27<06:13,  1.35s/it]

Error extracting text from https://balkaninsight.com/2021/04/01/north-macedonias-postponed-census-mired-in-legal-limbo/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/04/01/north-macedonias-postponed-census-mired-in-legal-limbo/


Processing URLs:  73%|███████▎  | 729/1000 [29:32<04:13,  1.07it/s]

Error extracting text from http://www.wsj.com/articles/lawmakers-urge-u-s-to-block-irans-wto-bid-1475793488: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/lawmakers-urge-u-s-to-block-irans-wto-bid-1475793488
URL filtered: https://www.youtube.com/watch?v=QwbqiMktZfM


Processing URLs:  73%|███████▎  | 733/1000 [30:34<1:12:25, 16.28s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-09-05/trump-family-and-associates-to-be-in-russia-probe-crosshairs: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  74%|███████▎  | 735/1000 [30:49<52:05, 11.80s/it]  

URL filtered: https://twitter.com/earlywarnproj/status/709763927731806208
Error extracting text from http://news.yahoo.com/oil-export-ban-play-final-stage-talks-budget-183258539--politics.html: 404 Client Error: Not Found for url: http://news.yahoo.com/oil-export-ban-play-final-stage-talks-budget-183258539--politics.html


Processing URLs:  74%|███████▍  | 741/1000 [31:06<18:41,  4.33s/it]

Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:01c0407bb0524641977a6dc5ff9bed64: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:01c0407bb0524641977a6dc5ff9bed64 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761c980>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  74%|███████▍  | 743/1000 [31:09<12:58,  3.03s/it]

Error extracting text from https://thecipherbrief.com/article/tech/why-target-dnc-1092: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/tech/why-target-dnc-1092


Processing URLs:  74%|███████▍  | 744/1000 [31:11<12:36,  2.96s/it]

Error extracting text from http://www.boxofficemojo.com/movies: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/


Processing URLs:  75%|███████▍  | 747/1000 [31:16<08:32,  2.03s/it]

Error extracting text from https://www.wsj.com/articles/what-comes-after-the-trump-trade-for-markets-1490285715: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/what-comes-after-the-trump-trade-for-markets-1490285715


Processing URLs:  75%|███████▌  | 752/1000 [31:28<08:12,  1.98s/it]

Error extracting text from https://www.yahoo.com/finance/news/obliviously-huge-deal-us-navy-201336151.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/obliviously-huge-deal-us-navy-201336151.html


Processing URLs:  75%|███████▌  | 754/1000 [31:30<05:14,  1.28s/it]

Error extracting text from https://www.scotsman.com/news/politics/scottish-election-2021-record-number-of-voters-registered-for-thursdays-poll-3223484: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/scottish-election-2021-record-number-of-voters-registered-for-thursdays-poll-3223484


Processing URLs:  76%|███████▌  | 757/1000 [31:35<05:40,  1.40s/it]

Error extracting text from http://www.wsj.com/articles/inside-ubers-new-self-driving-cars-in-pittsburgh-1473847202: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/inside-ubers-new-self-driving-cars-in-pittsburgh-1473847202


Processing URLs:  76%|███████▌  | 758/1000 [31:45<15:43,  3.90s/it]

Error extracting text from https://www.washingtonpost.com/politics/whitehouse/qanda-the-complex-issues-of-the-russia-probe-special-counsel/2017/06/16/89390ef4-52d6-11e7-b74e-0d2785d3083d_story.html?utm_term=.d4b0f82f7fe7: 404 Client Error: Not Found for url: https://www.washingtonpost.com/politics/whitehouse/qanda-the-complex-issues-of-the-russia-probe-special-counsel/2017/06/16/89390ef4-52d6-11e7-b74e-0d2785d3083d_story.html?utm_term=.d4b0f82f7fe7


Processing URLs:  76%|███████▌  | 759/1000 [31:46<13:09,  3.28s/it]

URL filtered: https://www.youtube.com/watch?v=4bv_ALKkTjQ


Processing URLs:  77%|███████▋  | 767/1000 [31:59<06:35,  1.70s/it]

URL filtered: https://twitter.com/NatSecCNN/status/768860904666566656


Processing URLs:  77%|███████▋  | 771/1000 [33:02<1:02:56, 16.49s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-03-29/gop-predicts-neil-gorsuch-will-be-confirmed-to-supreme-court: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  78%|███████▊  | 775/1000 [33:09<20:16,  5.41s/it]  

Error extracting text from https://www.nytimes.com/2017/08/11/us/politics/combative-trump-pulls-his-punches-for-one-man-putin.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/11/us/politics/combative-trump-pulls-his-punches-for-one-man-putin.html


Processing URLs:  78%|███████▊  | 781/1000 [33:20<09:32,  2.62s/it]

Error extracting text from https://www.ctc.usma.edu/posts/a-caliphate-under-strain-the-documentary-evidence: HTTPSConnectionPool(host='www.ctc.usma.edu', port=443): Max retries exceeded with url: /posts/a-caliphate-under-strain-the-documentary-evidence (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.ctc.usma.edu'. (_ssl.c:1000)")))


Processing URLs:  78%|███████▊  | 782/1000 [33:21<07:07,  1.96s/it]

Error extracting text from https://seekingalpha.com/article/4101256-t-time-warner-latest-merger-update: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4101256-t-time-warner-latest-merger-update


Processing URLs:  78%|███████▊  | 784/1000 [33:23<05:02,  1.40s/it]

Error extracting text from http://www.au.af.mil/au/ssq/: HTTPConnectionPool(host='www.au.af.mil', port=80): Max retries exceeded with url: /au/ssq/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304723230>: Failed to resolve 'www.au.af.mil' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▊  | 787/1000 [33:26<04:05,  1.15s/it]

URL filtered: https://www.facebook.com/berkeleybreathed/photos/a.114529165244512.10815.108793262484769/1558033984227349/?type=3&amp;theater


Processing URLs:  79%|███████▉  | 793/1000 [33:31<02:34,  1.34it/s]

Error extracting text from http://app.debka.com/p/article/25613/Rushed-evacuation-of-US-nukes-from-Incirlik: 404 Client Error: Not Found for url: http://app.debka.com/p/article/25613/Rushed-evacuation-of-US-nukes-from-Incirlik
Error extracting text from https://www.wsj.com/articles/oil-prices-rise-on-compliance-on-opec-production-cuts-1486726340: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/oil-prices-rise-on-compliance-on-opec-production-cuts-1486726340


Processing URLs:  80%|████████  | 801/1000 [33:44<04:53,  1.47s/it]

Error extracting text from http://www.newsweek.com/david-cameron-brexit-tata-steel-panama-papers-ids-resignation-445522: 403 Client Error: Forbidden for url: https://www.newsweek.com/david-cameron-brexit-tata-steel-panama-papers-ids-resignation-445522


Processing URLs:  81%|████████  | 808/1000 [34:00<07:16,  2.27s/it]

Error extracting text from https://www.google.ch/amp/www.telegraph.co.uk/news/2018/01/13/jacob-zuma-booed-pressure-grows-south-african-president-step/amp/: 404 Client Error: Not Found for url: https://www.telegraph.co.uk/news/2018/01/13/jacob-zuma-booed-pressure-grows-south-african-president-step/amp/


Processing URLs:  81%|████████  | 810/1000 [34:04<06:23,  2.02s/it]

Error extracting text from http://en.trend.az/business/energy/2438665.html: 404 Client Error: Not Found for url: https://www.trend.az/business/energy/2438665.html


Processing URLs:  81%|████████  | 811/1000 [34:04<04:38,  1.48s/it]

Error extracting text from https://www.wsj.com/articles/in-afghan-debate-is-there-a-lesson-in-the-2011-pullout-from-iraq-1501752604?mod=nwsrl_brussels_beat&amp;cx_refModule=nwsrl: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-afghan-debate-is-there-a-lesson-in-the-2011-pullout-from-iraq-1501752604?mod=nwsrl_brussels_beat&amp;cx_refModule=nwsrl


Processing URLs:  81%|████████▏ | 813/1000 [34:05<02:51,  1.09it/s]

Error extracting text from http://www.timesofisrael.com/iranian-navy-announces-20-military-drills-over-coming-months/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/iranian-navy-announces-20-military-drills-over-coming-months/


Processing URLs:  82%|████████▏ | 818/1000 [34:11<03:20,  1.10s/it]

Error extracting text from http://www.postwesternworld.com/2016/02/13/venezuela-declared-historical/: 406 Client Error: Not Acceptable for url: http://www.postwesternworld.com/2016/02/13/venezuela-declared-historical/


Processing URLs:  82%|████████▏ | 821/1000 [34:15<02:54,  1.03it/s]

Error extracting text from http://www.nytimes.com/2015/10/31/business/international/bank-of-japan-stimulus-abenomics.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/31/business/international/bank-of-japan-stimulus-abenomics.html?_r=0
Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-anc-to-force-zuma-to-quit-as-president-enca-tv-idUSKBN1F90A5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-anc-to-force-zuma-to-quit-as-president-enca-tv-idUSKBN1F90A5


Processing URLs:  82%|████████▏ | 824/1000 [34:16<01:45,  1.66it/s]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKCN1B4230: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKCN1B4230
Error extracting text from http://www.reuters.com/article/2015/10/28/us-brazil-rousseff-impeachment-idUSKCN0SM2NC20151028: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/28/us-brazil-rousseff-impeachment-idUSKCN0SM2NC20151028


Processing URLs:  82%|████████▎ | 825/1000 [34:16<01:28,  1.97it/s]

Error extracting text from http://www.reuters.com/article/us-usa-venezuela-military-response-idUSKBN1AS014: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-venezuela-military-response-idUSKBN1AS014


Processing URLs:  83%|████████▎ | 830/1000 [34:22<01:55,  1.48it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-china-idUSKCN10R10R?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-china-idUSKCN10R10R?il=0
Error extracting text from http://www.nytimes.com/2016/06/10/business/tesla-model-s-nhtsa-suspension-failure.html?emc=edit_th_20160610&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/10/business/tesla-model-s-nhtsa-suspension-failure.html?emc=edit_th_20160610&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  83%|████████▎ | 831/1000 [34:22<01:32,  1.84it/s]

Error extracting text from http://www.wsj.com/articles/the-new-dictators-club-1471908089: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/the-new-dictators-club-1471908089


Processing URLs:  83%|████████▎ | 832/1000 [34:24<02:19,  1.21it/s]

Error extracting text from https://www.stripes.com/news/afghan-forces-break-from-tradition-and-launch-winter-offensive-1.440504: 404 Client Error: Not Found for url: https://www.stripes.com/news/afghan-forces-break-from-tradition-and-launch-winter-offensive-1.440504


Processing URLs:  84%|████████▍ | 840/1000 [34:30<01:45,  1.52it/s]

Error extracting text from http://www.bakermckenzie.com/-/media/images/insight/publications/2017/01/gtf/globaltransactions2017.pdf: 403 Client Error: Forbidden for url: http://www.bakermckenzie.com/-/media/images/insight/publications/2017/01/gtf/globaltransactions2017.pdf


Processing URLs:  84%|████████▍ | 843/1000 [34:31<01:08,  2.31it/s]

Error extracting text from http://www.nato.int/cps/en/natohq/topics_49736.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/topics_49736.htm?selectedLocale=en
Error extracting text from http://www.totalsportek.com/cricket/icc-twenty20-world-cup-2016-predictions/: HTTPConnectionPool(host='www.totalsportek.com', port=80): Max retries exceeded with url: /cricket/icc-twenty20-world-cup-2016-predictions/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ff3db8f0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  84%|████████▍ | 845/1000 [34:35<02:22,  1.08it/s]

Error extracting text from https://www.devex.com/news/un-secretary-general-dialogues-did-anyone-come-out-on-top-88039: 403 Client Error: Forbidden for url: https://www.devex.com/news/un-secretary-general-dialogues-did-anyone-come-out-on-top-88039


Processing URLs:  85%|████████▍ | 848/1000 [34:43<04:48,  1.90s/it]

Error extracting text from http://www.takepart.com/article/2016/04/14/big-soda-wins-california-soda-tax-dies-legislature: 404 Client Error: Not Found for url: https://participant.com/article/2016/04/14/big-soda-wins-california-soda-tax-dies-legislature


Processing URLs:  85%|████████▌ | 854/1000 [34:50<02:39,  1.09s/it]

Error extracting text from http://stage.peaceinsider.com/: HTTPConnectionPool(host='stage.peaceinsider.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306189370>: Failed to resolve 'stage.peaceinsider.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/12/01/us-usa-fed-evans-idUSKBN0TK5FB20151201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/us-usa-fed-evans-idUSKBN0TK5FB20151201


Processing URLs:  86%|████████▌ | 857/1000 [35:00<05:28,  2.30s/it]

Error extracting text from https://pledgetimes.com/2021/06/disinformation-why-america-is-demanding-the-extradition-of-a-russian-businessman/: 404 Client Error: Not Found for url: https://pledgetimes.com/2021/06/disinformation-why-america-is-demanding-the-extradition-of-a-russian-businessman/
URL filtered: https://www.youtube.com/watch?v=FMLI5qI8WU0


Processing URLs:  86%|████████▌ | 859/1000 [35:00<03:08,  1.34s/it]

Error extracting text from http://www.realclearpolitics.com/articles/2017/01/26/trumps_dizzying_first_week_sets_pace_for_lawmakers.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/articles/2017/01/26/trumps_dizzying_first_week_sets_pace_for_lawmakers.html


Processing URLs:  86%|████████▌ | 860/1000 [35:00<02:34,  1.10s/it]

Error extracting text from http://www.wsj.com/articles/china-rethinks-its-alliance-with-reeling-venezuela-1473628506: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-rethinks-its-alliance-with-reeling-venezuela-1473628506


Processing URLs:  86%|████████▌ | 862/1000 [35:02<02:16,  1.01it/s]

Error extracting text from http://www.reuters.tv/v/2Gu/2016/10/17/iraqis-meet-isis-resistance-as-battle-for-mosul-begins: HTTPConnectionPool(host='www.reuters.tv', port=80): Max retries exceeded with url: /v/2Gu/2016/10/17/iraqis-meet-isis-resistance-as-battle-for-mosul-begins (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306188590>: Failed to resolve 'www.reuters.tv' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  86%|████████▋ | 863/1000 [35:03<02:04,  1.10it/s]

Error extracting text from http://www.tradingeconomics.com/venezuela/rating: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/venezuela/rating


Processing URLs:  87%|████████▋ | 866/1000 [35:06<02:09,  1.04it/s]

Error extracting text from http://allafrica.com/stories/201607280845.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201607280845.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x304a22d20>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from https://www.mofa.go.jp/press/release/press4e_001968.html: 403 Client Error: Forbidden for url: https://www.mofa.go.jp/press/release/press4e_001968.html


Processing URLs:  87%|████████▋ | 868/1000 [35:18<06:25,  2.92s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKCN0VH213: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-usa-idUSKCN0VH213


Processing URLs:  87%|████████▋ | 871/1000 [35:21<03:24,  1.58s/it]

Error extracting text from http://thehill.com/homenews/administration/315634-rubio-to-vote-for-tillerson: 403 Client Error: Forbidden for url: https://thehill.com/homenews/administration/315634-rubio-to-vote-for-tillerson/
URL filtered: http://www.bloomberg.com/news/articles/2016-03-22/frantic-phone-call-failed-to-contain-china-indonesia-sea-spat


Processing URLs:  87%|████████▋ | 873/1000 [35:22<02:27,  1.16s/it]

Error extracting text from http://data.unhcr.org/mediterranean/regional.php#_ga=1.245950604.904439853.1452068963: 404 Client Error: Not Found for url: https://data.unhcr.org:443/mediterranean/regional.php#_ga=1.245950604.904439853.1452068963


Processing URLs:  88%|████████▊ | 875/1000 [35:30<04:49,  2.32s/it]

Error extracting text from https://www.nord-stream2.com/media-info/news-events/the-offshore-part-of-one-line-of-nord-stream-2-has-been-mechanically-completed-149/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /media-info/news-events/the-offshore-part-of-one-line-of-nord-stream-2-has-been-mechanically-completed-149/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe655820>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  88%|████████▊ | 879/1000 [35:34<02:56,  1.46s/it]

Error extracting text from http://www.aina.org/news/20160729192605.htm: 404 Client Error:  for url: http://www.aina.org/news/20160729192605.htm
Error extracting text from http://www.reuters.com/article/us-southkorea-politics-ban-idUSKBN14W0A4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southkorea-politics-ban-idUSKBN14W0A4


Processing URLs:  88%|████████▊ | 880/1000 [35:36<03:02,  1.52s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/10/11/national/crime-legal/toyama-tritium-researchers-data-targeted-cyberattacks/#.WAGiySSTPUs: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/10/11/national/crime-legal/toyama-tritium-researchers-data-targeted-cyberattacks/#.WAGiySSTPUs


Processing URLs:  88%|████████▊ | 885/1000 [35:40<01:34,  1.22it/s]

Error extracting text from http://www.balkaninsight.com/en/article/bosnia-seeks-better-ties-with-iran-02-22-2016: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/bosnia-seeks-better-ties-with-iran-02-22-2016


Processing URLs:  89%|████████▊ | 886/1000 [35:41<01:45,  1.08it/s]

Error extracting text from http://www.peruthisweek.com/news-cpi-survey-keiko-ppk-109095: HTTPConnectionPool(host='www.peruthisweek.com', port=80): Max retries exceeded with url: /news-cpi-survey-keiko-ppk-109095 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304a23440>: Failed to resolve 'www.peruthisweek.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://thehill.com/policy/technology/560839-facebook-antitrust-victory-poses-big-test-for-new-ftc-chief


Processing URLs:  89%|████████▉ | 888/1000 [35:44<02:15,  1.21s/it]

URL filtered: https://nypost.com/2020/12/30/biden-picks-ex-facebook-lawyer-to-be-white-house-staff-secretary/
URL filtered: https://www.youtube.com/watch?v=iE5-Gz2cfLc


Processing URLs:  89%|████████▉ | 891/1000 [35:45<01:13,  1.49it/s]

Error extracting text from http://www.inquirer.net/specials/exclusive-china-militarization-south-china-sea: 403 Client Error: Forbidden for url: https://www.inquirer.net/specials/exclusive-china-militarization-south-china-sea


Processing URLs:  89%|████████▉ | 892/1000 [35:46<01:26,  1.24it/s]

Error extracting text from https://www.reuters.com/article/us-usa-china-alaska/top-american-chinese-diplomats-clash-publicly-at-start-of-first-talks-of-biden-presidency-idUSKBN2BA2A7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-china-alaska/top-american-chinese-diplomats-clash-publicly-at-start-of-first-talks-of-biden-presidency-idUSKBN2BA2A7


Processing URLs:  90%|████████▉ | 897/1000 [35:48<00:46,  2.22it/s]

Error extracting text from http://www.latimes.com/world/europe/la-fg-spain-politics-deadlock-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/europe/la-fg-spain-politics-deadlock-snap-story.html
URL filtered: https://www.youtube.com/watch?v=7EPxjoatllI


Processing URLs:  90%|████████▉ | 898/1000 [35:48<00:40,  2.50it/s]

Error extracting text from https://www.wsj.com/articles/eu-set-to-sanction-syria-scientists-military-officers-over-chemical-attacks-1500244200: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/eu-set-to-sanction-syria-scientists-military-officers-over-chemical-attacks-1500244200


Processing URLs:  90%|█████████ | 900/1000 [35:52<01:43,  1.04s/it]

Error extracting text from https://pca-cpa.org/wp-content/uploads/sites/175/2016/07/PH-CN-20160712-Award.pdf: 404 Client Error: Not Found for url: https://pca-cpa.org/wp-content/uploads/sites/175/2016/07/PH-CN-20160712-Award.pdf


Processing URLs:  90%|█████████ | 902/1000 [35:59<03:39,  2.24s/it]

Error extracting text from http://fleminggazette.com/5138/u-s-usa-pledges-to-raise-iran-missile-test-at-un-security/: 404 Client Error: Not Found for url: http://www.fleminggazette.com/5138/u-s-usa-pledges-to-raise-iran-missile-test-at-un-security/


Processing URLs:  90%|█████████ | 903/1000 [36:00<03:05,  1.91s/it]

Error extracting text from https://news.google.com/newspapers?nid=897&amp;dat=19491201&amp;id=PtEKAAAAIBAJ&amp;sjid=FFADAAAAIBAJ&amp;pg=3186,1095855&amp;hl=en: 404 Client Error: Not Found for url: https://news.google.com/newspapers?nid=897&amp;dat=19491201&amp;id=PtEKAAAAIBAJ&amp;sjid=FFADAAAAIBAJ&amp;pg=3186,1095855&amp;hl=en


Processing URLs:  90%|█████████ | 905/1000 [36:03<02:49,  1.78s/it]

Error extracting text from http://www.nba.com/2016/news/07/21/nba-statement-all-star-game-relocation-from-charlotte/: 404 Client Error: Not Found for url: https://www.nba.com/2016/news/07/21/nba-statement-all-star-game-relocation-from-charlotte


Processing URLs:  91%|█████████ | 907/1000 [36:04<01:36,  1.03s/it]

Error extracting text from https://www.nti.org/analysis/articles/cns-north-korea-missile-test-database/.: 403 Client Error: Forbidden for url: https://www.nti.org/analysis/articles/cns-north-korea-missile-test-database/
Error extracting text from http://www.nytimes.com/2015/12/19/world/asia/us-bomber-mistakenly-flew-near-disputed-island-in-south-china-sea.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/19/world/asia/us-bomber-mistakenly-flew-near-disputed-island-in-south-china-sea.html?_r=0


Processing URLs:  91%|█████████ | 911/1000 [36:35<11:40,  7.87s/it]

Error extracting text from http://www.washingtonpost.com/news/wonkblog/wp/2015/09/18/why-the-fed-might-not-raise-rates-anytime-soon/: 404 Client Error: Not Found for url: https://www.washingtonpost.com/news/wonkblog/wp/2015/09/18/why-the-fed-might-not-raise-rates-anytime-soon/


Processing URLs:  91%|█████████▏| 913/1000 [36:37<06:10,  4.26s/it]

Error extracting text from http://www.reuters.com/article/us-health-diabetes-medtronic-idUSKCN11Z04Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-diabetes-medtronic-idUSKCN11Z04Y


Processing URLs:  92%|█████████▏| 917/1000 [36:45<03:17,  2.38s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/North-Korea-may-be-preparing-for-missile-launch-Japan-gov-t-source: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/North-Korea-may-be-preparing-for-missile-launch-Japan-gov-t-source


Processing URLs:  92%|█████████▏| 918/1000 [36:45<02:25,  1.77s/it]

Error extracting text from https://africanarguments.org/2021/04/chad-france-firmly-backs-continuity-but-will-the-people/: 403 Client Error: Forbidden for url: https://africanarguments.org/2021/04/chad-france-firmly-backs-continuity-but-will-the-people/


Processing URLs:  92%|█████████▏| 919/1000 [36:46<02:08,  1.59s/it]

Error extracting text from https://yhoo.it/36QwR3W: 404 Client Error: Not Found for url: https://finance.yahoo.com/amphtml/news/biggest-iranian-flotilla-yet-en-025239653.html?guccounter=1


Processing URLs:  92%|█████████▏| 921/1000 [36:48<01:35,  1.20s/it]

Error extracting text from https://www.nytimes.com/2017/12/16/us/politics/unidentified-flying-object-navy.html?action=click&contentCollection=Politics&module=RelatedCoverage&region=Marginalia&pgtype=article: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/16/us/politics/unidentified-flying-object-navy.html?action=click&contentCollection=Politics&module=RelatedCoverage&region=Marginalia&pgtype=article


Processing URLs:  92%|█████████▏| 922/1000 [36:49<01:32,  1.19s/it]

URL filtered: https://mobile.twitter.com/EuropeElects/status/842486476847898626


Processing URLs:  92%|█████████▏| 924/1000 [36:50<01:03,  1.20it/s]

Error extracting text from http://ndb.int/China-inks-deal-with-BRICS-bank-approving-525-mn-yuan-loan.php: HTTPConnectionPool(host='ndb.int', port=80): Max retries exceeded with url: /China-inks-deal-with-BRICS-bank-approving-525-mn-yuan-loan.php (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303418ef0>: Failed to resolve 'ndb.int' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  92%|█████████▎| 925/1000 [36:51<00:53,  1.40it/s]

Error extracting text from http://www.scotsman.com/news/politics/alyn-smith-mep-eu-open-to-scottish-independence-post-brexit-1-4183136: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/alyn-smith-mep-eu-open-to-scottish-independence-post-brexit-1-4183136


Processing URLs:  93%|█████████▎| 928/1000 [37:05<03:12,  2.68s/it]

Error extracting text from https://www.nytimes.com/2017/01/25/us/politics/refugees-immigrants-wall-trump.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/01/25/us/politics/refugees-immigrants-wall-trump.html


Processing URLs:  93%|█████████▎| 933/1000 [37:13<02:05,  1.87s/it]

Error extracting text from http://www.turkishweekly.net/2016/07/12/news/eu-commissioner-sends-warm-message-to-turkey-on-visa-deal/: 404 Client Error: Not Found for url: https://turkishweekly.net/2016/07/12/news/eu-commissioner-sends-warm-message-to-turkey-on-visa-deal/
Error extracting text from http://www.nytimes.com/2016/02/20/world/europe/kosovo-opposition-tear-gas-parliament.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/20/world/europe/kosovo-opposition-tear-gas-parliament.html


Processing URLs:  94%|█████████▎| 937/1000 [37:15<00:57,  1.10it/s]

Error extracting text from https://www.tuko.co.ke/247833-uhuru-lures-north-eastern-voters-multi-billion-projects.html: 410 Client Error: Gone for url: https://www.tuko.co.ke/247833-uhuru-lures-north-eastern-voters-multi-billion-projects.html
Error extracting text from https://blog.openai.com/ai-and-compute/: HTTPSConnectionPool(host='blog.openai.com', port=443): Max retries exceeded with url: /ai-and-compute/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3061897c0>: Failed to resolve 'blog.openai.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://bigstory.ap.org/article/3327956a32d7485a86725e141caa0b3b/carter-arrives-iraq-talks-how-beef-fight: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/3327956a32d7485a86725e141caa0b3b/carter-arrives-iraq-talks-how-beef-fight (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304a21790>: Failed t

Processing URLs:  94%|█████████▍| 938/1000 [37:17<01:07,  1.09s/it]

Error extracting text from https://www.reuters.com/article/us-iraq-security-targets/iran-believed-to-have-deliberately-missed-u-s-forces-in-iraq-strikes-sources-idUSKBN1Z7283: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iraq-security-targets/iran-believed-to-have-deliberately-missed-u-s-forces-in-iraq-strikes-sources-idUSKBN1Z7283


Processing URLs:  94%|█████████▍| 943/1000 [37:21<00:46,  1.23it/s]

Error extracting text from http://www.reuters.com/article/us-usa-oilexports-congress-idUSKBN0TR2V520151208: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-oilexports-congress-idUSKBN0TR2V520151208


Processing URLs:  94%|█████████▍| 944/1000 [37:22<00:37,  1.49it/s]

Error extracting text from https://www.nytimes.com/2022/02/01/world/europe/putin-russia-ukraine.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2022/02/01/world/europe/putin-russia-ukraine.html


Processing URLs:  95%|█████████▍| 948/1000 [37:26<00:58,  1.13s/it]

URL filtered: https://www.youtube.com/watch?v=RerJWv5vwxc


Processing URLs:  95%|█████████▌| 951/1000 [37:29<00:48,  1.00it/s]

Error extracting text from http://1forall.us/teach-the-first-amendment/the-first-amendment/#a4: 403 Client Error: Forbidden for url: https://mtsu.edu/first-amendment/page/about-1-for-all#a4
URL filtered: http://www.businessinsider.com/russians-facebook-black-lives-matter-muslim-group-disinformation-2017-9


Processing URLs:  95%|█████████▌| 953/1000 [37:29<00:31,  1.50it/s]

Error extracting text from https://www.amazon.com/Amazon-Prime-Air/b?node=8037720011: 503 Server Error: Service Unavailable for url: https://www.amazon.com/Amazon-Prime-Air/b?node=8037720011


Processing URLs:  96%|█████████▌| 958/1000 [37:39<01:04,  1.53s/it]

Error extracting text from http://en.trend.az/azerbaijan/business/2462109.html: 404 Client Error: Not Found for url: https://www.trend.az/azerbaijan/business/2462109.html


Processing URLs:  96%|█████████▌| 962/1000 [37:51<01:48,  2.85s/it]

Error extracting text from http://www.ibtimes.com/isis-cyber-attack-us-government-planes-threatened-malware-hacking-islamic-state-2242272: 403 Client Error: Forbidden for url: https://www.ibtimes.com/isis-cyber-attack-us-government-planes-threatened-malware-hacking-islamic-state-2242272
URL filtered: https://www.kcrg.com/2021/03/18/us-russia-ties-nosedive-after-biden-putin-tit-for-tat/?utm_source=facebook&amp;utm_medium=social&amp;utm_campaign=snd&amp;utm_content=kcrg&amp;fbclid=IwAR3ukQ6KFDfxioxWuWteyNzlel_Ee9XxdyTE_smU6UTIy2O2q6wGQuAD0yM


Processing URLs:  97%|█████████▋| 966/1000 [37:53<00:41,  1.21s/it]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2015/06/22-russia-sanctions/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2015/06/22-russia-sanctions/
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-france-idUSKCN0YV15P: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-france-idUSKCN0YV15P


Processing URLs:  97%|█████████▋| 967/1000 [37:58<01:11,  2.16s/it]

Error extracting text from http://www.livetradingnews.com/iran-to-privatize-state-assets-119136.htm: 404 Client Error: Not Found for url: https://www.livetradingnews.com/iran-to-privatize-state-assets-119136.htm


Processing URLs:  97%|█████████▋| 969/1000 [38:01<00:57,  1.87s/it]

Error extracting text from https://reut.rs/3CFEEzD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/healthcare-pharmaceuticals/biden-says-us-intelligence-community-divided-covid-origin-2021-05-26/


Processing URLs:  97%|█████████▋| 973/1000 [38:07<00:36,  1.34s/it]

Error extracting text from https://www.nytimes.com/2020/06/19/world/asia/afghanistan-us-troop-withdrawal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/06/19/world/asia/afghanistan-us-troop-withdrawal.html


Processing URLs:  97%|█████████▋| 974/1000 [38:18<01:55,  4.44s/it]

URL filtered: https://gizmodo.com/facebook-finally-rolls-out-disputed-news-tag-everyone-w-1792959827
Error extracting text from https://www.reuters.com/article/us-germany-politics/merkels-allies-further-defying-spd-seek-cuts-to-tax-and-asylum-seeker-benefits-idUSKBN1EQ0YR?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkels-allies-further-defying-spd-seek-cuts-to-tax-and-asylum-seeker-benefits-idUSKBN1EQ0YR?il=0


Processing URLs:  98%|█████████▊| 977/1000 [38:20<00:50,  2.21s/it]

Error extracting text from https://www.strifeblog.org/2020/11/27/daniel-ortega-weapons-of-covid-19-destruction/: 503 Server Error: Service Unavailable for url: https://www.strifeblog.org/2020/11/27/daniel-ortega-weapons-of-covid-19-destruction/
Error extracting text from https://www.reuters.com/technology/us-china-hawks-seek-cut-sales-chip-making-tools-beijing-2021-04-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/technology/us-china-hawks-seek-cut-sales-chip-making-tools-beijing-2021-04-15/


Processing URLs:  98%|█████████▊| 979/1000 [38:21<00:33,  1.60s/it]

Error extracting text from http://baotintuc.vn/the-gioi/montenegro-gia-nhap-nato-nam-2017-20160219070037038.htm: HTTPSConnectionPool(host='baotintuc.vn', port=443): Max retries exceeded with url: /the-gioi/montenegro-gia-nhap-nato-nam-2017-20160219070037038.htm (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  98%|█████████▊| 980/1000 [39:21<04:25, 13.29s/it]

Error extracting text from http://www.sacbee.com/news/politics-government/article142585424.html: HTTPConnectionPool(host='www.sacbee.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  98%|█████████▊| 984/1000 [39:32<01:32,  5.75s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-arms-idUSKBN14G12Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-arms-idUSKBN14G12Y


Processing URLs:  99%|█████████▊| 987/1000 [39:35<00:34,  2.63s/it]

Error extracting text from http://www.wsj.com/articles/nato-linked-websites-go-down-cyberattack-suspected-1468001918: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/nato-linked-websites-go-down-cyberattack-suspected-1468001918


Processing URLs:  99%|█████████▉| 990/1000 [39:40<00:19,  1.94s/it]

Error extracting text from http://nationalinterest.org/feature/chinas-east-china-sea-adiz-gamble-past-present-south-china-13150?page=4: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/chinas-east-china-sea-adiz-gamble-past-present-south-china-13150?page=4


Processing URLs:  99%|█████████▉| 991/1000 [39:40<00:13,  1.45s/it]

Error extracting text from https://www.oecd.org/tax/beps/: 403 Client Error: Forbidden for url: https://www.oecd.org/tax/beps/


Processing URLs:  99%|█████████▉| 992/1000 [39:58<00:49,  6.24s/it]

Error extracting text from https://www.investopedia.com/articles/03/122203.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/03/122203.asp


Processing URLs:  99%|█████████▉| 994/1000 [39:58<00:19,  3.20s/it]

Error extracting text from http://splash247.com/panamas-president-expects-may-completion-of-canal-expansion/: 403 Client Error: Forbidden for url: https://splash247.com/panamas-president-expects-may-completion-of-canal-expansion/


Processing URLs: 100%|█████████▉| 996/1000 [40:04<00:11,  2.86s/it]

Error extracting text from https://www.amazon.com/Vory-Russias-Super-Mafia/dp/0300243200/ref=sr_1_1: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Vory-Russias-Super-Mafia/dp/0300243200/ref=sr_1_1


Processing URLs: 100%|██████████| 1000/1000 [40:13<00:00,  2.41s/it]
Processing URLs:   0%|          | 2/1000 [00:02<14:02,  1.18it/s]

Error extracting text from https://www.whitehouse.gov/the-press-office/2016/03/16/executive-order-blocking-property-government-north-korea-and-workershttps://www.gjopen.com/questions/97-before-the-end-of-2016-will-a-north-american-country-the-eu-or-an-eu-member-state-impose-sanctions-on-another-country-in-response-to-a-cyber-attack-or-cyber-espionage#comments_exclude_forecast_info: 404 Client Error: Not Found for url: https://obamawhitehouse.archives.gov/the-press-office/2016/03/16/executive-order-blocking-property-government-north-korea-and-workershttps://www.gjopen.com/questions/97-before-the-end-of-2016-will-a-north-american-country-the-eu-or-an-eu-member-state-impose-sanctions-on-another-country-in-response-to-a-cyber-attack-or-cyber-espionage#comments_exclude_forecast_info
Error extracting text from http://www.reuters.com/article/us-un-israel-report-hezbollah-idUSKBN16P0LA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-israel-report-hezbollah-idUS

Processing URLs:   1%|          | 6/1000 [00:06<18:17,  1.10s/it]

URL filtered: https://twitter.com/florian_krammer/status/1397888405149732865


Processing URLs:   1%|▏         | 13/1000 [01:44<6:42:59, 24.50s/it]

Error extracting text from http://collection.lib.tpu.ru/list/f/?query=cuba.authorityAuthorCode%3D%22RU%5CTPU%5Cpers%5C32542%22: HTTPConnectionPool(host='collection.lib.tpu.ru', port=80): Max retries exceeded with url: /list/f/?query=cuba.authorityAuthorCode%3D%22RU%5CTPU%5Cpers%5C32542%22 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x300a5e4b0>, 'Connection to collection.lib.tpu.ru timed out. (connect timeout=60)'))
Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:d944a858ce0040bbb5dfbf82566f36e4: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:d944a858ce0040bbb5dfbf82566f36e4 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300a5e000>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   2%|▏         | 23/1000 [01:55<30:19,  1.86s/it]

Error extracting text from https://www.nytimes.com/2017/09/23/world/asia/afghanistan-taliban-oruzgan-hospitals.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/09/23/world/asia/afghanistan-taliban-oruzgan-hospitals.html?_r=0


Processing URLs:   2%|▎         | 25/1000 [01:57<25:14,  1.55s/it]

URL filtered: http://www.bloomberg.com/quote/USDTRY:CUR
URL filtered: https://twitter.com/BahmanKalbasi/status/688362184208863232


Processing URLs:   3%|▎         | 29/1000 [01:59<13:47,  1.17it/s]

Error extracting text from http://www.pravdareport.com/video/10-02-2016/133294-chechen-0/: 404 Client Error: Not Found for url: https://www.pravda.ru/video/10-02-2016/133294-chechen-0/


Processing URLs:   3%|▎         | 30/1000 [01:59<11:30,  1.41it/s]

Error extracting text from https://www.nytimes.com/2021/08/25/world/europe/navalny-interview-excerpts.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/25/world/europe/navalny-interview-excerpts.html


Processing URLs:   4%|▎         | 35/1000 [02:08<27:02,  1.68s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-07-23/zuma-to-survive-no-confidence-vote-top-opposition-lawmaker-says


Processing URLs:   4%|▍         | 41/1000 [02:17<24:07,  1.51s/it]

Error extracting text from http://www.wsj.com/articles/u-s-move-to-expand-nuclear-sanctions-on-north-korea-angers-china-1454712713: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-move-to-expand-nuclear-sanctions-on-north-korea-angers-china-1454712713


Processing URLs:   4%|▍         | 42/1000 [02:18<20:57,  1.31s/it]

Error extracting text from http://capx.co/parliamentary-rules-make-june-referendum-ambitious/: 403 Client Error: Forbidden for url: http://capx.co/parliamentary-rules-make-june-referendum-ambitious/


Processing URLs:   4%|▍         | 45/1000 [02:28<39:35,  2.49s/it]

Error extracting text from http://38north.org/2015/09/sohae090315: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:   5%|▍         | 48/1000 [02:30<21:25,  1.35s/it]

Error extracting text from https://www.reuters.com/article/us-portugal-websummit-europol/fast-growing-cyber-crime-threatens-financial-sector-europol-idUSKBN1D82QS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-portugal-websummit-europol/fast-growing-cyber-crime-threatens-financial-sector-europol-idUSKBN1D82QS


Processing URLs:   5%|▍         | 49/1000 [02:31<20:40,  1.30s/it]

URL filtered: https://www.youtube.com/watch?v=lWA2pjMjpBs


Processing URLs:   5%|▌         | 53/1000 [02:38<25:48,  1.63s/it]

Error extracting text from http://www.newsweek.com/its-time-deal-north-korea-411029: 403 Client Error: Forbidden for url: https://www.newsweek.com/its-time-deal-north-korea-411029


Processing URLs:   6%|▌         | 55/1000 [02:39<16:51,  1.07s/it]

Error extracting text from https://www.mofa.go.jp/ecm/ep/page23e_000337.html: 403 Client Error: Forbidden for url: https://www.mofa.go.jp/ecm/ep/page23e_000337.html


Processing URLs:   6%|▌         | 56/1000 [02:41<19:33,  1.24s/it]

URL filtered: https://twitter.com/LovedayM/status/697707171228745729


Processing URLs:   6%|▌         | 58/1000 [02:41<12:32,  1.25it/s]

Error extracting text from http://blogs.reuters.com/great-debate/2016/02/24/the-many-ways-senate-republicans-can-block-obamas-supreme-court-nominee/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2016/02/24/the-many-ways-senate-republicans-can-block-obamas-supreme-court-nominee/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f12b0>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   6%|▌         | 60/1000 [02:43<12:59,  1.21it/s]

Error extracting text from https://www.lloyds.com/news-and-insight/risk-insight/library/natural-environment/solar-storm: HTTPSConnectionPool(host='www.lloyds.com', port=443): Max retries exceeded with url: /news-and-insight/risk-insight/library/natural-environment/solar-storm (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
URL filtered: https://about.fb.com/news/2021/06/facebook-response-to-oversight-board-recommendations-trump/


Processing URLs:   6%|▋         | 63/1000 [02:49<25:51,  1.66s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-05/this-company-has-built-a-profile-on-every-american-adult


Processing URLs:   6%|▋         | 65/1000 [02:51<21:09,  1.36s/it]

Error extracting text from http://www.cnbc.com/2017/03/27/us-senate-panel-to-question-trump-son-in-law-on-russians.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2017/03/27/us-senate-panel-to-question-trump-son-in-law-on-russians.html


Processing URLs:   7%|▋         | 66/1000 [02:51<17:25,  1.12s/it]

Error extracting text from https://www.nytimes.com/2016/06/08/world/middleeast/defiant-assad-vows-to-retake-every-inch-of-syria-from-his-foes.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/06/08/world/middleeast/defiant-assad-vows-to-retake-every-inch-of-syria-from-his-foes.html?_r=0


Processing URLs:   7%|▋         | 72/1000 [03:04<30:07,  1.95s/it]

Error extracting text from http://press.ihs.com/press-release/aerospace-defense-security/islamic-states-caliphate-shrinks-14-percent-2015: 403 Client Error: Forbidden for url: https://investor.spglobal.com/news-releases/default.aspx


Processing URLs:   8%|▊         | 77/1000 [03:08<17:25,  1.13s/it]

Error extracting text from https://www.amny.com/coronavirus/governor-cuomo-allows-nyc-restaurants-to-stay-open-until-midnight-starting-april-19/: 403 Client Error: Forbidden for url: https://www.amny.com/coronavirus/governor-cuomo-allows-nyc-restaurants-to-stay-open-until-midnight-starting-april-19/


Processing URLs:   8%|▊         | 81/1000 [03:24<38:45,  2.53s/it]

Error extracting text from https://www.timesofisrael.com/polls-show-support-for-netanyahu-may-be-rising-after-police-allegations/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/polls-show-support-for-netanyahu-may-be-rising-after-police-allegations/


Processing URLs:   8%|▊         | 82/1000 [03:24<28:58,  1.89s/it]

Error extracting text from http://www.businessinsider.com.au/media-society-panel-event-polling-eu-referendum-brexit-statistics-2016-5?r=UK&amp;IR=T: 404 Client Error: Not Found for url: http://www.businessinsider.com.au/media-society-panel-event-polling-eu-referendum-brexit-statistics-2016-5?r=UK&amp;IR=T


Processing URLs:   8%|▊         | 84/1000 [03:26<19:15,  1.26s/it]

Error extracting text from https://www.nytimes.com/2017/08/28/us/politics/trump-tower-putin-felix-sater.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/28/us/politics/trump-tower-putin-felix-sater.html


Processing URLs:   8%|▊         | 85/1000 [03:26<14:31,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/turkey-says-russian-fighter-jet-violated-its-airspace-with-syria-1444040488: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/turkey-says-russian-fighter-jet-violated-its-airspace-with-syria-1444040488


Processing URLs:   9%|▉         | 91/1000 [03:37<28:53,  1.91s/it]

Error extracting text from http://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/ab385337-2d62-493e-b695-f4350bd8f4d1.pdf: 404 Client Error: Not Found for url: https://www.monmouth.edu/assets/0/32212254770/32212254991/32212254992/32212254994/32212254995/30064771087/ab385337-2d62-493e-b695-f4350bd8f4d1.pdf


Processing URLs:   9%|▉         | 93/1000 [03:39<22:12,  1.47s/it]

Error extracting text from http://www.latimes.com/politics/la-na-pol-polling-differences-20160809-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/politics/la-na-pol-polling-differences-20160809-snap-story.html


Processing URLs:  10%|▉         | 98/1000 [03:46<17:16,  1.15s/it]

Error extracting text from https://www.bbhub.io/bnef/sites/4/2016/10/BNEF_McKinsey_The-Future-of-Mobility_11-10-16.pdf: 404 Client Error: Not Found for url: https://www.bbhub.io/bnef/sites/4/2016/10/BNEF_McKinsey_The-Future-of-Mobility_11-10-16.pdf


Processing URLs:  10%|█         | 100/1000 [03:47<13:30,  1.11it/s]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20140113/1605112305.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20140113/1605112305.html
Error extracting text from https://www.reuters.com/article/us-israel-netanyahu-protests/thousands-march-in-tel-aviv-to-protest-against-netanyahu-corruption-idUSKBN1E30RW?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-netanyahu-protests/thousands-march-in-tel-aviv-to-protest-against-netanyahu-corruption-idUSKBN1E30RW?il=0


Processing URLs:  11%|█         | 108/1000 [04:15<27:18,  1.84s/it]

Error extracting text from https://www.reuters.com/article/us-italy-politics/italys-renzi-would-like-ex-ecbs-draghi-to-head-italy-government-source-idUSKBN2A00B0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-italy-politics/italys-renzi-would-like-ex-ecbs-draghi-to-head-italy-government-source-idUSKBN2A00B0
Error extracting text from https://www.hg.org/article.asp?id=7879: 403 Client Error: Forbidden for url: https://www.hg.org/article.asp?id=7879


Processing URLs:  11%|█         | 111/1000 [04:19<24:33,  1.66s/it]

Error extracting text from https://www.cia.gov/library/publications/the-world-factbook/: 404 Client Error: Not Found for url: https://www.cia.gov/library/publications/the-world-factbook/


Processing URLs:  11%|█▏        | 113/1000 [04:24<28:58,  1.96s/it]

Error extracting text from http://www.nytimes.com/2015/11/05/business/economy/fed-yellen-congress-interest-rates.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/05/business/economy/fed-yellen-congress-interest-rates.html


Processing URLs:  12%|█▏        | 115/1000 [04:25<17:31,  1.19s/it]

Error extracting text from http://www.nti.org/analysis/tools/table/133/: 403 Client Error: Forbidden for url: https://www.nti.org/analysis/tools/table/133/


Processing URLs:  12%|█▏        | 117/1000 [04:26<12:51,  1.14it/s]

Error extracting text from http://www.nytimes.com/2015/11/02/world/asia/china-japan-and-south-korea-conduct-first-trilateral-meeting-in-3-years.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/02/world/asia/china-japan-and-south-korea-conduct-first-trilateral-meeting-in-3-years.html


Processing URLs:  12%|█▏        | 121/1000 [04:33<19:41,  1.34s/it]

Error extracting text from http://thehill.com/blogs/floor-action/senate/253818-dems-suggest-tying-ex-im-bank-to-spending-bill: 403 Client Error: Forbidden for url: https://thehill.com/blogs/floor-action/senate/253818-dems-suggest-tying-ex-im-bank-to-spending-bill/


Processing URLs:  12%|█▏        | 123/1000 [04:37<21:09,  1.45s/it]

URL filtered: https://twitter.com/KyivPost/status/1497533776364285953
URL filtered: https://www.bloomberg.com/politics/articles/2017-01-31/slow-pace-of-obamacare-repeal-leaves-house-conservatives-fuming?bpolANews=true


Processing URLs:  13%|█▎        | 126/1000 [04:37<10:17,  1.41it/s]

Error extracting text from http://www.scotsman.com/news/uk/theresa-may-preparing-for-indyref2-call-once-brexit-starts-1-4377532: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/uk/theresa-may-preparing-for-indyref2-call-once-brexit-starts-1-4377532


Processing URLs:  13%|█▎        | 132/1000 [05:58<4:32:25, 18.83s/it]

Error extracting text from http://en.kremlin.ru/catalog/persons/356/events: HTTPConnectionPool(host='en.kremlin.ru', port=80): Read timed out. (read timeout=60)


Processing URLs:  13%|█▎        | 133/1000 [05:59<3:18:15, 13.72s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-02-23/time-inc-said-to-be-interested-in-joining-fray-of-yahoo-suitors


Processing URLs:  14%|█▎        | 135/1000 [06:00<1:53:14,  7.85s/it]

Error extracting text from https://reut.rs/3pbXOow: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-china/india-china-agree-to-pull-back-troops-from-disputed-himalayan-lake-idUSKBN2AB0EX?il=0


Processing URLs:  14%|█▍        | 139/1000 [06:18<1:11:25,  4.98s/it]

Error extracting text from http://38north.org/2015/12/punggye123015/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  15%|█▍        | 146/1000 [06:28<18:16,  1.28s/it]  

Error extracting text from https://www.wionews.com/world/myanmar-did-not-make-a-request-to-speak-unga-president-abdulla-shahid-clarifies-to-wion-416617: 403 Client Error: Forbidden for url: https://www.wionews.com/world/myanmar-did-not-make-a-request-to-speak-unga-president-abdulla-shahid-clarifies-to-wion-416617


Processing URLs:  15%|█▍        | 149/1000 [06:34<26:43,  1.88s/it]

Error extracting text from https://www.21stcenturysciencetech.com/Articles_2012/Spring-Summer_2012/04_Biospere_Noosphere.pdf: 406 Client Error: Not Acceptable for url: https://www.21stcenturysciencetech.com/Articles_2012/Spring-Summer_2012/04_Biospere_Noosphere.pdf


Processing URLs:  15%|█▌        | 152/1000 [06:47<48:01,  3.40s/it]  

Error extracting text from http://www.reuters.com/article/us-britain-eu-usa-congress-idUSKCN0ZE1M7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-usa-congress-idUSKCN0ZE1M7


Processing URLs:  16%|█▌        | 156/1000 [06:49<18:32,  1.32s/it]

Error extracting text from http://www.reuters.com/article/us-france-submarines-india-minister-idUSKCN10Z0CC?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-france-submarines-india-minister-idUSKCN10Z0CC?il=0


Processing URLs:  16%|█▌        | 157/1000 [06:51<19:35,  1.39s/it]

Error extracting text from http://www.stripes.com/news/b-52-b-1-b-2-bombers-share-tarmac-on-guam-for-first-time-1.423418: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/b-52-b-1-b-2-bombers-share-tarmac-on-guam-for-first-time-1.423418
Error extracting text from http://news.usni.org/2016/01/27/pacom-co-harris-more-u-s-south-china-sea-freedom-of-navigation-missions-are-coming: 403 Client Error: Forbidden for url: http://news.usni.org/2016/01/27/pacom-co-harris-more-u-s-south-china-sea-freedom-of-navigation-missions-are-coming


Processing URLs:  16%|█▌        | 160/1000 [06:54<15:19,  1.09s/it]

Error extracting text from https://www.breakingtravelnews.com/news/article/expo-2020-welcomes-millions-of-guest-in-first-six-weeks/: 403 Client Error: Forbidden for url: https://www.breakingtravelnews.com/news/article/expo-2020-welcomes-millions-of-guest-in-first-six-weeks/


Processing URLs:  16%|█▋        | 164/1000 [06:58<14:02,  1.01s/it]

Error extracting text from http://www.leginfo.ca.gov/.html/wat_table_of_contents.html: 404 Client Error: Not found for url: http://www.leginfo.ca.gov/.html/wat_table_of_contents.html
Error extracting text from http://www.nytimes.com/2015/12/22/world/asia/after-victory-in-myanmar-aung-san-suu-kyi-quietly-shapes-a-transition.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/22/world/asia/after-victory-in-myanmar-aung-san-suu-kyi-quietly-shapes-a-transition.html?_r=0


Processing URLs:  17%|█▋        | 166/1000 [07:59<2:58:14, 12.82s/it]

Error extracting text from https://www.usnews.com/news/world/articles/2022-03-27/houthis-say-prisoner-swap-deal-includes-16-saudis-brother-of-yemen-president: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/us-ukraine-crisis-malware-idUSKBN0UE0ZZ20151231: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ukraine-crisis-malware-idUSKBN0UE0ZZ20151231


Processing URLs:  17%|█▋        | 167/1000 [09:00<6:17:29, 27.19s/it]

Error extracting text from http://www.andina.com.pe/Ingles/noticia-peru-fujimori-ahead-of-kuczynski-according-to-ipsos-614688.aspx: HTTPConnectionPool(host='www.andina.com.pe', port=80): Max retries exceeded with url: /Ingles/noticia-peru-fujimori-ahead-of-kuczynski-according-to-ipsos-614688.aspx (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303008650>, 'Connection to www.andina.com.pe timed out. (connect timeout=60)'))


Processing URLs:  17%|█▋        | 168/1000 [10:00<8:31:43, 36.90s/it]

Error extracting text from http://tech.firstpost.com/news-analysis/apple-iphone-sales-to-crash-in-2016-says-analyst-but-apple-could-easily-prove-it-wrong-293516.html: HTTPConnectionPool(host='tech.firstpost.com', port=80): Max retries exceeded with url: /news-analysis/apple-iphone-sales-to-crash-in-2016-says-analyst-but-apple-could-easily-prove-it-wrong-293516.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x303009700>, 'Connection to tech.firstpost.com timed out. (connect timeout=60)'))


Processing URLs:  17%|█▋        | 171/1000 [10:04<3:06:22, 13.49s/it]

Error extracting text from http://bit.ly/1WRvOGI: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2016/Jun-17/357547-iraqi-pm-says-to-declare-victory-in-falluja-after-rapid-advances.ashx
Error extracting text from http://www.thecarconnection.com/compare/toyota_mirai_2016_choices: 403 Client Error: Forbidden for url: http://www.thecarconnection.com/compare/toyota_mirai_2016_choices


Processing URLs:  17%|█▋        | 174/1000 [10:10<1:21:31,  5.92s/it]

Error extracting text from http://www.reuters.com/article/us-north-dakota-pipeline-burgum-idUSKBN14X16L?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-north-dakota-pipeline-burgum-idUSKBN14X16L?il=0


Processing URLs:  18%|█▊        | 177/1000 [10:22<1:02:09,  4.53s/it]

Error extracting text from https://www.fremont.gov/2295/SunShares-Program: 503 Server Error: Service Unavailable for url: https://www.fremont.gov/2295/SunShares-Program


Processing URLs:  18%|█▊        | 178/1000 [10:22<45:19,  3.31s/it]  

Error extracting text from http://www.caracaschronicles.com/mapa/index.php?situacionPais=12: 403 Client Error: Forbidden for url: http://www.caracaschronicles.com/mapa/index.php?situacionPais=12


Processing URLs:  18%|█▊        | 180/1000 [10:23<25:14,  1.85s/it]

Error extracting text from http://news.yahoo.com/cameron-meet-juncker-brexit-talks-close-deal-031359358.html: 404 Client Error: Not Found for url: http://news.yahoo.com/cameron-meet-juncker-brexit-talks-close-deal-031359358.html


Processing URLs:  18%|█▊        | 181/1000 [10:24<23:01,  1.69s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKBN0MD0GR20150317: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKBN0MD0GR20150317


Processing URLs:  18%|█▊        | 184/1000 [10:27<13:23,  1.02it/s]

Error extracting text from http://www.chron.com/news/article/Most-New-Hampshire-Democratic-superdelegates-back-6629512.php: 403 Client Error: Forbidden for url: https://www.chron.com/news/article/Most-New-Hampshire-Democratic-superdelegates-back-6629512.php
Error extracting text from http://www.nytimes.com/2015/11/18/us/politics/ben-carson-is-struggling-to-grasp-foreign-policy-advisers-say.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/18/us/politics/ben-carson-is-struggling-to-grasp-foreign-policy-advisers-say.html


Processing URLs:  19%|█▊        | 187/1000 [10:30<13:14,  1.02it/s]

Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN1640PW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN1640PW
URL filtered: http://www.bloomberg.com/news/articles/2015-10-15/impeaching-a-brazilian-president-is-complicated-a-quick-guide


Processing URLs:  19%|█▉        | 193/1000 [10:37<13:42,  1.02s/it]

Error extracting text from https://balkaninsight.com/2021/03/29/north-macedonia-postpones-census-over-vaccination-delays/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/03/29/north-macedonia-postpones-census-over-vaccination-delays/


Processing URLs:  20%|█▉        | 195/1000 [10:46<33:02,  2.46s/it]

Error extracting text from https://www.nytimes.com/2017/07/06/science/cern-quarks-charm-baryon.html?emc=edit_mbe_20170707&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;src=twr&amp;te=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/06/science/cern-quarks-charm-baryon.html?emc=edit_mbe_20170707&amp;nl=morning-briefing-europe&amp;nlid=77825025&amp;src=twr&amp;te=1


Processing URLs:  20%|█▉        | 196/1000 [10:47<26:01,  1.94s/it]

Error extracting text from https://phys.org/news/2021-05-boeing-starliner-capsule-aiming-july.html: 400 Client Error: Bad request for url: https://phys.org/news/2021-05-boeing-starliner-capsule-aiming-july.html


Processing URLs:  20%|█▉        | 197/1000 [10:48<23:05,  1.73s/it]

Error extracting text from https://www.lesswrong.com/posts/krgNxiooRfnP9L4ZD/follow-up-to-petrov-day-2019: 403 Client Error: Forbidden for url: https://www.lesswrong.com/posts/krgNxiooRfnP9L4ZD/follow-up-to-petrov-day-2019


Processing URLs:  20%|██        | 200/1000 [10:50<13:02,  1.02it/s]

Error extracting text from http://www.who.int/mediacentre/news/statements/2015/ihr-ec-poliovirus/en/: 404 Client Error: Not Found for url: https://www.who.int/mediacentre/news/statements/2015/ihr-ec-poliovirus/en/
Error extracting text from http://www.abc.es/espana/abci-rareza-abstencion-pp-y-psoe-nunca-facilitaron-investiduras-rivales-201607300252_noticia.html: 403 Client Error: Forbidden for url: http://www.abc.es/espana/abci-rareza-abstencion-pp-y-psoe-nunca-facilitaron-investiduras-rivales-201607300252_noticia.html
Error extracting text from http://www.energy.vffunds.com/power_generation.html: HTTPConnectionPool(host='www.energy.vffunds.com', port=80): Max retries exceeded with url: /power_generation.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3011f1a60>: Failed to resolve 'www.energy.vffunds.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  20%|██        | 203/1000 [10:54<13:51,  1.04s/it]

Error extracting text from http://www.businessinsider.com/ap-with-little-room-to-maneuver-syrias-rebels-head-for-talks-2017-1: 404 Client Error: Not Found for url: https://www.businessinsider.com/ap-with-little-room-to-maneuver-syrias-rebels-head-for-talks-2017-1


Processing URLs:  21%|██        | 207/1000 [10:58<13:25,  1.02s/it]

Error extracting text from https://www.reuters.com/world/africa/eu-military-mission-mozambique-be-approved-next-month-2021-06-23/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/africa/eu-military-mission-mozambique-be-approved-next-month-2021-06-23/


Processing URLs:  21%|██        | 209/1000 [11:02<19:10,  1.45s/it]

Error extracting text from http://www.abqjournal.com/657113/news/vote-on-the-oil-export-ban-today.html: 404 Client Error: Not Found for url: https://www.abqjournal.com/657113/news/vote-on-the-oil-export-ban-today.html


Processing URLs:  22%|██▏       | 216/1000 [11:15<22:29,  1.72s/it]

Error extracting text from http://tass.ru/en/defense/848350: 404 Client Error: Not Found for url: https://tass.ru/en/defense/848350
URL filtered: https://twitter.com/marsadiraq/status/799594238698786817


Processing URLs:  22%|██▏       | 219/1000 [11:26<34:02,  2.62s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16S244: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-scotland-idUSKBN16S244
URL filtered: https://twitter.com/GertonvdAkker/status/761644882859614208/photo/1


Processing URLs:  22%|██▏       | 222/1000 [11:31<26:35,  2.05s/it]

Error extracting text from http://www.wsj.com/articles/shell-meets-investigators-about-role-in-nigeria-deal-1459337791: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/shell-meets-investigators-about-role-in-nigeria-deal-1459337791
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=da&amp;u=http://politiken.dk/indland/ECE2953997/df-flygtninge-skal-aflevere-dyre-armbaandsure-ved-graensen/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=da&amp;u=http://politiken.dk/indland/ECE2953997/df-flygtninge-skal-aflevere-dyre-armbaandsure-ved-graensen/&amp;prev=search


Processing URLs:  22%|██▏       | 224/1000 [11:31<16:35,  1.28s/it]

Error extracting text from http://www.reuters.com/article/us-china-corruption-idUSKCN12006D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-corruption-idUSKCN12006D


Processing URLs:  23%|██▎       | 231/1000 [12:01<35:22,  2.76s/it]  

Error extracting text from http://www.wsj.com/articles/u-s-backed-force-takes-key-syrian-city-from-islamic-state-1471037799: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-backed-force-takes-key-syrian-city-from-islamic-state-1471037799


Processing URLs:  23%|██▎       | 233/1000 [12:05<30:10,  2.36s/it]

Error extracting text from http://russiapedia.rt.com/on-this-day/december-5/: 404 Client Error: Not Found for url: https://russiapedia.rt.com/on-this-day/december-5/


Processing URLs:  23%|██▎       | 234/1000 [12:07<29:00,  2.27s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/latest-refugee-chief-fears-eu-turkey-deal-43983726: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/latest-refugee-chief-fears-eu-turkey-deal-43983726


Processing URLs:  24%|██▎       | 235/1000 [12:08<25:31,  2.00s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-02/treasuries-gain-after-jobs-figures-cast-doubt-on-fed-increase


Processing URLs:  24%|██▍       | 241/1000 [12:16<19:43,  1.56s/it]

Error extracting text from http://thehill.com/homenews/news/340918-russia-steps-up-spying-efforts-after-election-report: 403 Client Error: Forbidden for url: https://thehill.com/homenews/news/340918-russia-steps-up-spying-efforts-after-election-report/


Processing URLs:  24%|██▍       | 242/1000 [12:18<18:16,  1.45s/it]

Error extracting text from http://scottwalkerwatch.com/koch-brothers/walkers-punked-phone-call/: 404 Client Error: Not Found for url: http://scottwalkerwatch.com/koch-brothers/walkers-punked-phone-call/


Processing URLs:  24%|██▍       | 245/1000 [12:23<22:51,  1.82s/it]

Error extracting text from http://38north.org/2015/11/punggye110615/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  25%|██▍       | 246/1000 [12:26<27:07,  2.16s/it]

Error extracting text from http://www.petra.gov.jo/Public_News/Nws_NewsDetails.aspx?Site_Id=1&amp;lang=2&amp;NewsID=265643&amp;CatID=13&amp;Type=Home&amp;GType=1: 404 Client Error:  for url: https://www.petra.gov.jo/Public_News/Nws_NewsDetails.aspx?Site_Id=1&amp;lang=2&amp;NewsID=265643&amp;CatID=13&amp;Type=Home&amp;GType=1


Processing URLs:  25%|██▌       | 250/1000 [12:32<23:21,  1.87s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://aluizioamorim.blogspot.com/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://aluizioamorim.blogspot.com/&amp;prev=search


Processing URLs:  25%|██▌       | 253/1000 [12:35<16:20,  1.31s/it]

Error extracting text from http://thehill.com/policy/energy-environment/262874-deal-to-end-oil-export-ban-in-sight: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/262874-deal-to-end-oil-export-ban-in-sight/


Processing URLs:  25%|██▌       | 254/1000 [12:36<15:12,  1.22s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/business/global-sovereign-debt-to-hit-new-all-time-high-sp/articleshow/57337690.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/news/international/business/global-sovereign-debt-to-hit-new-all-time-high-sp/articleshow/57337690.cms


Processing URLs:  26%|██▌       | 255/1000 [13:37<3:35:18, 17.34s/it]

Error extracting text from http://archive.is/20120721195102/http://www.af.mil/news/story.asp?storyID=123012699: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  26%|██▌       | 262/1000 [13:46<37:31,  3.05s/it]  

Error extracting text from http://www.reuters.com/article/us-venezuela-citigroup-idUSKCN1162L6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-citigroup-idUSKCN1162L6


Processing URLs:  26%|██▋       | 263/1000 [13:46<27:41,  2.25s/it]

Error extracting text from http://english.aawsat.com/2016/08/article55355828/war-fatalities-expose-iranian-army-flow-syria: 403 Client Error: Forbidden for url: http://english.aawsat.com/2016/08/article55355828/war-fatalities-expose-iranian-army-flow-syria


Processing URLs:  27%|██▋       | 267/1000 [13:57<28:07,  2.30s/it]

Error extracting text from http://www.nytimes.com/2015/03/03/world/middleeast/iraq-tikrit-isis.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/03/03/world/middleeast/iraq-tikrit-isis.html?_r=0


Processing URLs:  27%|██▋       | 269/1000 [14:00<22:45,  1.87s/it]

Error extracting text from https://academic.oup.com/mnras/article-lookup/doi/10.1093/mnras/20.1.13: 403 Client Error: Forbidden for url: https://academic.oup.com/mnras/article-lookup/doi/10.1093/mnras/20.1.13


Processing URLs:  27%|██▋       | 271/1000 [14:02<17:06,  1.41s/it]

Error extracting text from https://www.nytimes.com/2017/07/27/world/asia/north-korea-hacking-cybersecurity.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/27/world/asia/north-korea-hacking-cybersecurity.html


Processing URLs:  28%|██▊       | 275/1000 [14:06<15:01,  1.24s/it]

Error extracting text from http://www.cnbc.com/2015/09/01/reuters-america-update-1-china-signs-off-on-5-bln-loan-to-boost-venezuela-oil-output-maduro.html: 404 Client Error: Not Found for url: https://www.cnbc.com/2015/09/01/reuters-america-update-1-china-signs-off-on-5-bln-loan-to-boost-venezuela-oil-output-maduro.html


Processing URLs:  28%|██▊       | 276/1000 [14:08<15:17,  1.27s/it]

Error extracting text from https://arcaspace.com/en/LAS.htm: 404 Client Error: Not Found for url: https://www.arcaspace.com/en/LAS.htm


Processing URLs:  28%|██▊       | 279/1000 [14:23<34:36,  2.88s/it]

Error extracting text from http://www.france24.com/en/20160217-france-city-london-brexit-banks-financial-regulation: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160217-france-city-london-brexit-banks-financial-regulation


Processing URLs:  28%|██▊       | 280/1000 [14:25<31:34,  2.63s/it]

Error extracting text from https://ec.europa.eu/neighbourhood-enlargement/countries/check-current-status_en: 403 Client Error: Forbidden for url: https://neighbourhood-enlargement.ec.europa.eu/enlargement-policy/negotiations-status_en


Processing URLs:  28%|██▊       | 282/1000 [14:27<20:52,  1.74s/it]

Error extracting text from http://www.balkaninsight.com/en/article/montenegro-to-probe-police-actions-against-protesters-10-20-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-to-probe-police-actions-against-protesters-10-20-2015


Processing URLs:  29%|██▊       | 286/1000 [14:32<15:11,  1.28s/it]

Error extracting text from https://www.reuters.com/article/us-wesfarmers-divestiture-coles/australias-wesfarmers-in-no-hurry-for-deals-after-coles-spinoff-idUSKCN1MF08Y: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-wesfarmers-divestiture-coles/australias-wesfarmers-in-no-hurry-for-deals-after-coles-spinoff-idUSKCN1MF08Y
Error extracting text from https://www.reuters.com/business/aerospace-defense/american-airlines-reports-smaller-loss-pickup-travel-offsets-omicron-blip-2022-04-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/aerospace-defense/american-airlines-reports-smaller-loss-pickup-travel-offsets-omicron-blip-2022-04-21/


Processing URLs:  29%|██▉       | 290/1000 [14:37<14:41,  1.24s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-05/26/c_135389649.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-05/26/c_135389649.htm


Processing URLs:  29%|██▉       | 291/1000 [14:39<14:29,  1.23s/it]

Error extracting text from https://www.parliament.uk/about/how/elections-and-voting/general/: 403 Client Error: Forbidden for url: https://www.parliament.uk/about/how/elections-and-voting/general/
URL filtered: https://www.youtube.com/watch?v=4vuW6tQ0218


Processing URLs:  30%|██▉       | 295/1000 [14:44<16:57,  1.44s/it]

Error extracting text from http://www.torontosun.com/2013/07/13/canadas-time-for-action-on-arctic-sovereignty: 403 Client Error: Forbidden for url: https://torontosun.com/2013/07/13/canadas-time-for-action-on-arctic-sovereignty


Processing URLs:  30%|██▉       | 296/1000 [14:45<13:11,  1.12s/it]

Error extracting text from http://www.nti.org/analysis/articles/global-dialogue-faq-cyber-threat-nuclear-facilities/: 403 Client Error: Forbidden for url: https://www.nti.org/analysis/articles/global-dialogue-faq-cyber-threat-nuclear-facilities/


Processing URLs:  30%|███       | 303/1000 [14:58<19:58,  1.72s/it]

Error extracting text from http://uk.reuters.com/article/uk-afghanistan-minister-idUKKCN0XX0GA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  30%|███       | 304/1000 [15:00<19:31,  1.68s/it]

Error extracting text from http://warontherocks.com/2015/09/turkish-winter-is-coming/: 403 Client Error: Forbidden for url: http://warontherocks.com/2015/09/turkish-winter-is-coming/


Processing URLs:  30%|███       | 305/1000 [15:00<15:12,  1.31s/it]

Error extracting text from https://thehill.com/homenews/556587-sinema-defends-filibuster-sparking-progressive-fury: 403 Client Error: Forbidden for url: https://thehill.com/homenews/556587-sinema-defends-filibuster-sparking-progressive-fury/


Processing URLs:  31%|███       | 307/1000 [16:02<3:39:58, 19.05s/it]

Error extracting text from http://www.mcclatchydc.com/news/nation-world/world/article179799311.html: HTTPConnectionPool(host='www.mcclatchydc.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  31%|███       | 311/1000 [16:05<57:37,  5.02s/it]  

Error extracting text from http://www.timesofisrael.com/in-call-with-riyadh-trump-commits-to-rigorously-enforce-iran-deal/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/in-call-with-riyadh-trump-commits-to-rigorously-enforce-iran-deal/
Error extracting text from https://sg.news.yahoo.com/china-blames-philippines-stirring-trouble-dispute-041639582.html: 404 Client Error: Not Found for url: https://sg.news.yahoo.com/china-blames-philippines-stirring-trouble-dispute-041639582.html


Processing URLs:  32%|███▏      | 315/1000 [16:14<27:40,  2.42s/it]

Error extracting text from http://adage.com/article/media/image-ad-company-signs-partnership-time/301586/: 403 Client Error: Forbidden for url: https://adage.com/article/media/image-ad-company-signs-partnership-time/301586/


Processing URLs:  32%|███▏      | 319/1000 [16:19<16:50,  1.48s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-pullout-idUSKCN0WG23C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-pullout-idUSKCN0WG23C


Processing URLs:  32%|███▏      | 320/1000 [16:20<15:14,  1.34s/it]

Error extracting text from http://www.caam.org.cn/AutomotivesStatistics/20160318/1305187601.html: 404 Client Error: Not Found for url: http://www.caam.org.cn/AutomotivesStatistics/20160318/1305187601.html


Processing URLs:  32%|███▏      | 322/1000 [16:22<12:33,  1.11s/it]

Error extracting text from http://www.nytimes.com/2015/12/31/world/europe/russia-putin-turkey-sanctions.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/31/world/europe/russia-putin-turkey-sanctions.html


Processing URLs:  33%|███▎      | 326/1000 [16:25<07:35,  1.48it/s]

URL filtered: https://www.linkedin.com/company/deepmind/people/
Error extracting text from http://www.nytimes.com/2015/09/18/upshot/yellen-blinks-on-interest-rates.html?rref=homepage&amp;module=Ribbon&amp;version=origin&amp;region=Header&amp;action=click&amp;contentCollection=Home%20Page&amp;pgtype=Multimedia: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/18/upshot/yellen-blinks-on-interest-rates.html?rref=homepage&amp;module=Ribbon&amp;version=origin&amp;region=Header&amp;action=click&amp;contentCollection=Home%20Page&amp;pgtype=Multimedia


Processing URLs:  33%|███▎      | 328/1000 [16:26<07:40,  1.46it/s]

Error extracting text from http://www.reuters.com/article/us-nato-summit-afghanistan-idUSKCN0ZP0AC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nato-summit-afghanistan-idUSKCN0ZP0AC


Processing URLs:  33%|███▎      | 330/1000 [16:29<10:16,  1.09it/s]

Error extracting text from https://mizzima.com/article/ambassador-u-kyaw-moe-tun-speaks-mizzima: 403 Client Error: Forbidden for url: https://mizzima.com/article/ambassador-u-kyaw-moe-tun-speaks-mizzima


Processing URLs:  34%|███▎      | 335/1000 [16:36<14:18,  1.29s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53007#.VrToZtDAo2x: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53007#.VrToZtDAo2x
Error extracting text from https://www.reuters.com/article/us-refinery-operations-pdvsa-paraguana/top-venezuela-refineries-at-34-percent-of-capacity-union-documents-idUSKBN1CL2T0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-refinery-operations-pdvsa-paraguana/top-venezuela-refineries-at-34-percent-of-capacity-union-documents-idUSKBN1CL2T0


Processing URLs:  34%|███▍      | 338/1000 [16:40<15:29,  1.40s/it]

URL filtered: https://www.youtube.com/watch?v=LkqiDu1BQXY


Processing URLs:  34%|███▍      | 343/1000 [16:54<27:37,  2.52s/it]

URL filtered: https://www.bloomberg.com/graphics/2017-oil-rigs/


Processing URLs:  35%|███▍      | 348/1000 [17:00<16:56,  1.56s/it]

Error extracting text from https://www.afghanistan-analysts.org/afghanistan-has-now-a-constitutional-cabinet-eleven-minister-candidates-received-votes-of-confidence/: 403 Client Error: Forbidden for url: https://www.afghanistan-analysts.org/afghanistan-has-now-a-constitutional-cabinet-eleven-minister-candidates-received-votes-of-confidence/


Processing URLs:  35%|███▍      | 349/1000 [17:03<20:06,  1.85s/it]



Processing URLs:  35%|███▌      | 353/1000 [17:06<10:06,  1.07it/s]

Error extracting text from http://bit.ly/1X5xkVv: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/06/05/Tuesday-first-day-of-Ramadan-Saudi-Arabia.html
Error extracting text from https://thenewdaily.com.au/news/politics/australian-politics/2021/07/17/atkins-morrison-early-election/: 403 Client Error: Forbidden for url: https://thenewdaily.com.au/news/politics/australian-politics/2021/07/17/atkins-morrison-early-election/


Processing URLs:  36%|███▌      | 357/1000 [17:09<08:16,  1.30it/s]

Error extracting text from https://www.yahoo.com/news/defense-secretary-carter-makes-surprise-visit-afghanistan-054354459--politics.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/defense-secretary-carter-makes-surprise-visit-afghanistan-054354459--politics.html


Processing URLs:  36%|███▌      | 360/1000 [17:13<09:54,  1.08it/s]

Error extracting text from https://www.reuters.com/article/us-turkey-eu-parliament-idUSKBN19R194: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-eu-parliament-idUSKBN19R194


Processing URLs:  36%|███▋      | 365/1000 [17:17<07:35,  1.39it/s]

Error extracting text from https://www.nytimes.com/2018/07/06/world/asia/thai-cave-rescue-divers.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/07/06/world/asia/thai-cave-rescue-divers.html


Processing URLs:  37%|███▋      | 366/1000 [17:18<08:10,  1.29it/s]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=55525#.WCgZrfkrKUk: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=55525#.WCgZrfkrKUk


Processing URLs:  37%|███▋      | 370/1000 [17:40<53:02,  5.05s/it]

Error extracting text from https://www.manchin.senate.gov/newsroom/press-releases/manchin-reintroduces-legislation-to-simplify-student-loan-repayment-programs: 503 Server Error: Service Unavailable for url: https://www.manchin.senate.gov/newsroom/press-releases/manchin-reintroduces-legislation-to-simplify-student-loan-repayment-programs


Processing URLs:  38%|███▊      | 379/1000 [17:56<16:48,  1.62s/it]

Error extracting text from http://www.pocket-lint.com/news/139769-what-is-google-waymo-and-when-can-you-expect-the-first-cars: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


Processing URLs:  38%|███▊      | 380/1000 [18:57<3:18:16, 19.19s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-09-07/donald-trump-jr-heads-to-capitol-to-explain-2016-meeting: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  38%|███▊      | 381/1000 [18:58<2:23:49, 13.94s/it]

Error extracting text from https://jpt.spe.org/us-beats-out-russia-to-remain-no-1-in-lng-sales-to-europe: 403 Client Error: Forbidden for url: https://jpt.spe.org/us-beats-out-russia-to-remain-no-1-in-lng-sales-to-europe


Processing URLs:  38%|███▊      | 384/1000 [19:03<56:28,  5.50s/it]  

Error extracting text from http://www.reuters.com/article/us-global-oil-iea-idUSKCN0WP0S0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-iea-idUSKCN0WP0S0


Processing URLs:  39%|███▉      | 394/1000 [19:40<29:57,  2.97s/it]  

Error extracting text from https://labour.org.uk/wp-content/uploads/2020/04/Rule-Book-2020.pdf: 403 Client Error: Forbidden for url: https://labour.org.uk/wp-content/uploads/2020/04/Rule-Book-2020.pdf
URL filtered: https://www.youtube.com/embed/Q9-KPvEhGh4&quot


Processing URLs:  40%|███▉      | 396/1000 [19:40<16:57,  1.68s/it]

Error extracting text from https://www.wsj.com/articles/greece-seeks-inclusion-in-ecbs-stimulus-plan-1496078306: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/greece-seeks-inclusion-in-ecbs-stimulus-plan-1496078306


Processing URLs:  40%|████      | 400/1000 [19:46<15:12,  1.52s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0V01S6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKCN0V01S6


Processing URLs:  40%|████      | 402/1000 [19:48<11:31,  1.16s/it]

Error extracting text from https://www.amnesty.org/en/press-releases/2015/11/iran-death-penalty-facts/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/press-release/2015/11/iran-death-penalty-facts/


Processing URLs:  40%|████      | 403/1000 [19:54<24:21,  2.45s/it]

URL filtered: https://twitter.com/Robotbeat/status/1076553768760676353


Processing URLs:  41%|████      | 406/1000 [19:56<13:25,  1.36s/it]

Error extracting text from https://ctovision.com/: 403 Client Error: Forbidden for url: https://ctovision.com/
Error extracting text from http://www.latimes.com/nation/la-na-flynn-lobbying-20170425-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-na-flynn-lobbying-20170425-story.html


Processing URLs:  41%|████      | 409/1000 [19:59<11:04,  1.13s/it]

Error extracting text from https://www.nytimes.com/2020/01/08/climate/2019-temperatures.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/01/08/climate/2019-temperatures.html


Processing URLs:  41%|████      | 411/1000 [20:02<10:18,  1.05s/it]

Error extracting text from http://www.nytimes.com/2016/11/29/business/economy/us-economy-grew-at-3-2-rate-in-3rd-quarter.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/29/business/economy/us-economy-grew-at-3-2-rate-in-3rd-quarter.html


Processing URLs:  42%|████▏     | 417/1000 [20:14<22:03,  2.27s/it]

URL filtered: http://www.politico.eu/article/half-of-first-time-voters-support-angela-merkel-survey/?utm_content=buffer6e284&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  42%|████▏     | 420/1000 [20:22<22:35,  2.34s/it]

Error extracting text from http://www.rand.org/search.html?query=self+driving+cars&amp;sortby=relevance: 403 Client Error: Forbidden for url: https://www.rand.org/search.html?query=self+driving+cars&amp;sortby=relevance
Error extracting text from http://english.farsnews.com/newstext.aspx?nn=13940820000692: HTTPConnectionPool(host='english.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13940820000692 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303a75340>: Failed to resolve 'english.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  42%|████▏     | 424/1000 [20:31<20:43,  2.16s/it]

Error extracting text from http://nationalinterest.org/feature/the-russia-iran-alliance-weaker-you-think-15685?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/the-russia-iran-alliance-weaker-you-think-15685?page=2


Processing URLs:  43%|████▎     | 427/1000 [20:37<20:32,  2.15s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-09-16/brazil-s-opposition-takes-most-decisive-steps-toward-impeachment


Processing URLs:  43%|████▎     | 432/1000 [20:41<12:47,  1.35s/it]

URL filtered: https://twitter.com/navalny/status/960609997116887042


Processing URLs:  44%|████▎     | 436/1000 [20:45<09:58,  1.06s/it]

Error extracting text from https://www.nytimes.com/2020/12/19/world/europe/coronavirus-uk-new-variant.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/19/world/europe/coronavirus-uk-new-variant.html


Processing URLs:  44%|████▎     | 437/1000 [20:48<12:49,  1.37s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3865852/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3865852/


Processing URLs:  44%|████▍     | 438/1000 [20:48<11:11,  1.20s/it]

Error extracting text from http://www.shiitenews.org/index.php/iraq/item/23984-iraqi-fm-calls-on-arab-states-to-pressure-turkey-into-recalling-troops: 404 Client Error: Not Found for url: http://www.shiitenews.org/index.php/iraq/item/23984-iraqi-fm-calls-on-arab-states-to-pressure-turkey-into-recalling-troops


Processing URLs:  44%|████▍     | 441/1000 [20:51<08:10,  1.14it/s]

Error extracting text from http://www.nytimes.com/2015/09/11/world/middleeast/whats-next-for-the-iran-nuclear-deal.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/09/11/world/middleeast/whats-next-for-the-iran-nuclear-deal.html


Processing URLs:  45%|████▍     | 447/1000 [21:06<16:58,  1.84s/it]

Error extracting text from http://www.sense-eu.info: 410 Client Error: Gone for url: http://www.sense-eu.info/


Processing URLs:  45%|████▌     | 451/1000 [21:10<10:15,  1.12s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-08-04/imf-sees-single-digit-saudi-budget-gap-in-2017-on-spending-cuts
Error extracting text from http://www.nytimes.com/2015/11/19/business/economy/fed-minutes-interest-rate-increase.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/19/business/economy/fed-minutes-interest-rate-increase.html


Processing URLs:  45%|████▌     | 452/1000 [21:11<11:16,  1.23s/it]

URL filtered: https://www.youtube.com/watch?v=fUdmR-NlW6o


Processing URLs:  45%|████▌     | 454/1000 [21:24<30:19,  3.33s/it]

URL filtered: https://twitter.com/cullenroche/status/1480387674976313346?t=zFEbGpZ_qrP7owo4HG0RQg&amp;s=19


Processing URLs:  46%|████▌     | 456/1000 [21:59<1:17:26,  8.54s/it]

URL filtered: https://twitter.com/camwolfe/status/1399880579282313216


Processing URLs:  46%|████▌     | 458/1000 [21:59<50:54,  5.64s/it]  

Error extracting text from https://www.wsj.com/articles/why-central-banks-are-stockpiling-foreign-reserves-1488914148: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/why-central-banks-are-stockpiling-foreign-reserves-1488914148


Processing URLs:  46%|████▌     | 459/1000 [22:00<42:02,  4.66s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/02/23/asia-pacific/s-korea-u-s-delay-talks-on-thaad-missile-shield-amid-talks-with-china/#.Vsz65fA8KrV: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/02/23/asia-pacific/s-korea-u-s-delay-talks-on-thaad-missile-shield-amid-talks-with-china/#.Vsz65fA8KrV


Processing URLs:  46%|████▌     | 460/1000 [22:00<33:14,  3.69s/it]

Error extracting text from http://www.themalaymailonline.com/features/article/make-peace-not-war-colombia-rebels-take-peace-lessons-in-jungle-camps: 404 Client Error: Not Found for url: https://www.malaymail.com/features/article/make-peace-not-war-colombia-rebels-take-peace-lessons-in-jungle-camps


Processing URLs:  46%|████▌     | 462/1000 [22:03<24:49,  2.77s/it]

Error extracting text from https://www.channelnewsasia.com/news/world/brazilian-amazon-deforestation-hits-record-for-may-14949678: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/world/brazilian-amazon-deforestation-hits-record-may-1830666


Processing URLs:  46%|████▋     | 464/1000 [22:08<22:42,  2.54s/it]

Error extracting text from http://tass.ru/en/search?query=assad: 404 Client Error: Not Found for url: https://tass.ru/en/search?query=assad


Processing URLs:  47%|████▋     | 467/1000 [22:14<19:46,  2.23s/it]

Error extracting text from http://www.hybridcars.com/hyundai-and-kia-now-have-as-many-hybrids-as-toyota/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/hyundai-and-kia-now-have-as-many-hybrids-as-toyota/


Processing URLs:  47%|████▋     | 468/1000 [22:15<15:12,  1.72s/it]

Error extracting text from http://www.sciencedirect.com/science/article/pii/S0038092X15003394: 403 Client Error: Forbidden for url: http://www.sciencedirect.com/science/article/pii/S0038092X15003394


Processing URLs:  48%|████▊     | 479/1000 [22:38<16:52,  1.94s/it]

Error extracting text from http://in.reuters.com/article/2015/10/06/us-imf-g20-japan-idINKCN0S01T320151006: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  48%|████▊     | 481/1000 [22:40<12:22,  1.43s/it]

Error extracting text from http://www.ritholtz.com/blog/2013/05/nikkei-downtrend-1982-present/: 403 Client Error: Forbidden for url: https://ritholtz.com/blog/2013/05/nikkei-downtrend-1982-present/


Processing URLs:  48%|████▊     | 484/1000 [23:42<2:42:03, 18.84s/it]

Error extracting text from http://www.usnews.com/news/politics/articles/2016-10-19/ap-fact-check-trump-gets-facts-wrong-on-start-treaty: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  49%|████▊     | 486/1000 [23:43<1:20:55,  9.45s/it]

Error extracting text from http://www.straitstimes.com/world/europe/spains-king-asks-acting-pm-to-form-govt: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  49%|████▊     | 487/1000 [23:45<1:01:47,  7.23s/it]

Error extracting text from http://ekurd.net/biden-iraq-cooperate-kurdistan-2016-08-04: 403 Client Error: Forbidden for url: https://ekurd.net/biden-iraq-cooperate-kurdistan-2016-08-04


Processing URLs:  49%|████▉     | 488/1000 [23:47<49:08,  5.76s/it]  

Error extracting text from http://www.futuredirections.org.au/publication/south-asian-association-regional-co-operation-part-one-problems-saarc/: HTTPConnectionPool(host='www.futuredirections.org.au', port=80): Max retries exceeded with url: /publication/south-asian-association-regional-co-operation-part-one-problems-saarc/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3061882c0>: Failed to resolve 'www.futuredirections.org.au' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  49%|████▉     | 490/1000 [23:50<31:52,  3.75s/it]

Error extracting text from http://ncr-iran.org/en/news/iran-world/20161-lord-maginnis-italy-s-renzi-is-sending-an-ill-advised-signal-with-iran-visit: 403 Client Error: Forbidden for url: https://ncr-iran.org/en/news/iran-world/20161-lord-maginnis-italy-s-renzi-is-sending-an-ill-advised-signal-with-iran-visit


Processing URLs:  49%|████▉     | 493/1000 [23:57<25:43,  3.04s/it]

URL filtered: https://www.youtube.com/watch?v=P9SBMIgbOcA


Processing URLs:  50%|████▉     | 497/1000 [24:02<14:23,  1.72s/it]

Error extracting text from https://www.congress.gov/bill/114th-congress/senate-bill/2276: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/114th-congress/senate-bill/2276


Processing URLs:  50%|████▉     | 498/1000 [24:03<12:15,  1.46s/it]

Error extracting text from https://www.moodys.com/sites/products/DefaultResearch/2006800000445742.pdfx: 404 Client Error: Not Found for url: https://www.moodys.com/sites/products/DefaultResearch/2006800000445742.pdfx


Processing URLs:  50%|█████     | 503/1000 [24:11<11:10,  1.35s/it]

Error extracting text from https://finance.yahoo.com/news/bitcoin-price-volatility-expectations-slip-180126704.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/bitcoin-price-volatility-expectations-slip-180126704.html


Processing URLs:  50%|█████     | 504/1000 [24:12<09:16,  1.12s/it]

Error extracting text from http://thehill.com/homenews/news/361569-former-alabama-police-officer-we-were-told-to-make-sure-moore-didnt-hang-around: 403 Client Error: Forbidden for url: https://thehill.com/homenews/news/361569-former-alabama-police-officer-we-were-told-to-make-sure-moore-didnt-hang-around/


Processing URLs:  51%|█████     | 508/1000 [24:17<08:26,  1.03s/it]

Error extracting text from http://www.reuters.com/article/2015/12/02/us-northkorea-nuclear-tunnel-idUSKBN0TL2UF20151202: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/02/us-northkorea-nuclear-tunnel-idUSKBN0TL2UF20151202


Processing URLs:  51%|█████     | 509/1000 [24:18<09:14,  1.13s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O3N4GA6K50XV01-6R70TNJJIAP8DSHUPU1L5UN2BT


Processing URLs:  51%|█████     | 511/1000 [24:23<15:13,  1.87s/it]

URL filtered: https://twitter.com/BBCBreaking/status/1475509168362737664


Processing URLs:  52%|█████▏    | 516/1000 [24:38<22:33,  2.80s/it]

Error extracting text from https://www.nytimes.com/2020/12/18/opinion/coronavirus-vaccine-doses.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/12/18/opinion/coronavirus-vaccine-doses.html


Processing URLs:  52%|█████▏    | 520/1000 [24:45<14:54,  1.86s/it]

Error extracting text from https://abcnews.go.com/International/wireStory/netanyahu-israel-risk-friction-us-iran-78019750: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/netanyahu-israel-risk-friction-us-iran-78019750


Processing URLs:  52%|█████▏    | 521/1000 [24:46<13:07,  1.64s/it]

Error extracting text from http://arcgis.sd.gov/server/denr/spillsviewer/: 404 Client Error: Not Found for url: http://arcgis.sd.gov/server/denr/spillsviewer/


Processing URLs:  52%|█████▏    | 524/1000 [24:50<11:13,  1.42s/it]

Error extracting text from http://www.cdm.me/english/jasavic-we-made-no-deals-speculations-about-decision-of-positive-montenegro-are-unfounded-and: 403 Client Error: Forbidden for url: https://www.cdm.me/english/jasavic-we-made-no-deals-speculations-about-decision-of-positive-montenegro-are-unfounded-and


Processing URLs:  52%|█████▎    | 525/1000 [24:52<11:49,  1.49s/it]

Error extracting text from http://www.middle-east-online.com/english/?id=74752: 404 Client Error: Not Found for url: https://www.middle-east-online.com/english/?id=74752


Processing URLs:  53%|█████▎    | 531/1000 [25:02<11:20,  1.45s/it]

Error extracting text from https://www.nytimes.com/2020/03/02/us/supreme-court-obamacare-appeal.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/03/02/us/supreme-court-obamacare-appeal.html


Processing URLs:  53%|█████▎    | 533/1000 [25:04<07:40,  1.01it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-kirkuk-idUSKCN12O1VU?mod=related&amp;channelName=worldNews: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-kirkuk-idUSKCN12O1VU?mod=related&amp;channelName=worldNews


Processing URLs:  53%|█████▎    | 534/1000 [26:04<2:25:26, 18.73s/it]

Error extracting text from https://euroinsight.mni-news.com/posts/imf-participation-in-greek-bailout-programme-not-yet-assured-7308: HTTPSConnectionPool(host='euroinsight.mni-news.com', port=443): Max retries exceeded with url: /posts/imf-participation-in-greek-bailout-programme-not-yet-assured-7308 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3040e85f0>, 'Connection to euroinsight.mni-news.com timed out. (connect timeout=60)'))


Processing URLs:  54%|█████▎    | 535/1000 [26:04<1:42:20, 13.21s/it]

Error extracting text from https://unherd.com/2021/04/how-anarchists-captured-portland/: 403 Client Error: Forbidden for url: https://unherd.com/2021/04/how-anarchists-captured-portland/


Processing URLs:  54%|█████▎    | 537/1000 [26:05<50:53,  6.59s/it]  

Error extracting text from http://www.reuters.com/article/us-usa-trump-tax-idUSKCN1BA14X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-tax-idUSKCN1BA14X
Error extracting text from https://www.thelocal.de/20180114/germanys-spd-at-odds-over-coalition-plan: 403 Client Error: Forbidden for url: https://www.thelocal.de/20180114/germanys-spd-at-odds-over-coalition-plan


Processing URLs:  54%|█████▍    | 540/1000 [26:07<20:05,  2.62s/it]

Error extracting text from http://www.iol.co.za/news/politics/ehrenreich-turns-on-zuma-2062009: 403 Client Error: Forbidden for url: http://www.iol.co.za/news/politics/ehrenreich-turns-on-zuma-2062009


Processing URLs:  54%|█████▍    | 542/1000 [26:11<16:26,  2.15s/it]

Error extracting text from http://www.wsj.com/articles/trump-and-cruz-have-trouble-in-the-middle-1452461353: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trump-and-cruz-have-trouble-in-the-middle-1452461353


Processing URLs:  55%|█████▍    | 547/1000 [26:42<44:31,  5.90s/it]  

Error extracting text from http://www.nytimes.com/2016/01/14/opinion/campaign-stops/why-i-will-never-vote-for-donald-trump.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/14/opinion/campaign-stops/why-i-will-never-vote-for-donald-trump.html


Processing URLs:  55%|█████▍    | 549/1000 [26:44<25:43,  3.42s/it]

Error extracting text from http://ajw.asahi.com/article/asia/korean_peninsula/AJ201510020026: HTTPConnectionPool(host='ajw.asahi.com', port=80): Max retries exceeded with url: /article/asia/korean_peninsula/AJ201510020026 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3040e8a70>: Failed to resolve 'ajw.asahi.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  55%|█████▌    | 552/1000 [26:48<16:08,  2.16s/it]

URL filtered: https://www.youtube.com/watch?v=RT0K70bTysM


Processing URLs:  56%|█████▌    | 556/1000 [26:54<13:15,  1.79s/it]

Error extracting text from http://m.nouvelobs.com/monde/20160222.REU9475/matteo-renzi-se-rendra-en-visite-en-iran-en-avril.html?xtref=https%3A%2F%2Fwww.google.it%2F#https://www.google.it/: 404 Client Error: Not Found for url: https://www.nouvelobs.com/monde/20160222.REU9475/matteo-renzi-se-rendra-en-visite-en-iran-en-avril.html?xtref=https%3A%2F%2Fwww.google.it%2F#https://www.google.it/


Processing URLs:  56%|█████▌    | 557/1000 [26:54<10:52,  1.47s/it]

Error extracting text from http://thehill.com/homenews/campaign/362798-gay-gop-group-urges-good-christians-to-oppose-moore: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/362798-gay-gop-group-urges-good-christians-to-oppose-moore/


Processing URLs:  56%|█████▌    | 558/1000 [27:00<18:47,  2.55s/it]

Error extracting text from http://nwpr.org/post/northwest-governors-vow-resist-trump-plans-gut-climate-change-rules: 404 Client Error: Not Found for url: https://www.nwpb.org/post/northwest-governors-vow-resist-trump-plans-gut-climate-change-rules


Processing URLs:  56%|█████▌    | 559/1000 [27:00<14:00,  1.91s/it]

Error extracting text from https://www.nytimes.com/2017/06/21/world/middleeast/saudi-arabia-crown-prince-mohammed-bin-salman.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/06/21/world/middleeast/saudi-arabia-crown-prince-mohammed-bin-salman.html


Processing URLs:  56%|█████▌    | 562/1000 [27:04<09:32,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-peru-election-poll-idUSKCN0VU0U8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-peru-election-poll-idUSKCN0VU0U8
URL filtered: https://twitter.com/kylieatwood/status/685897148890624000


Processing URLs:  56%|█████▋    | 565/1000 [27:08<10:53,  1.50s/it]

Error extracting text from https://www.reuters.com/article/us-trade-nafta-canada-exclusive/exclusive-canada-convinced-trump-will-soon-pull-plug-on-nafta-sources-idUSKBN1EZ2K4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-nafta-canada-exclusive/exclusive-canada-convinced-trump-will-soon-pull-plug-on-nafta-sources-idUSKBN1EZ2K4


Processing URLs:  57%|█████▋    | 566/1000 [27:08<08:36,  1.19s/it]

Error extracting text from https://www.predictit.org/Market/2461/Who-will-be-elected-German-chancellor-in-2017: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/2461/Who-will-be-elected-German-chancellor-in-2017


Processing URLs:  57%|█████▋    | 568/1000 [27:13<11:56,  1.66s/it]

Error extracting text from http://www.ibtimes.com/rwanda-recruiting-training-burundian-rebels-us-envoy-alarmed-over-credible-reports-2303256: 403 Client Error: Forbidden for url: https://www.ibtimes.com/rwanda-recruiting-training-burundian-rebels-us-envoy-alarmed-over-credible-reports-2303256


Processing URLs:  57%|█████▋    | 569/1000 [35:13<16:31:26, 138.02s/it]

Error extracting text from https://www.thespainreport.com/newsitems/622-160709100532-update-pedro-sanchez-breaks-13-day-silence-to-say-psoe-will-vote-against-rajoy: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /newsitems/622-160709100532-update-pedro-sanchez-breaks-13-day-silence-to-say-psoe-will-vote-against-rajoy (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30601f320>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  57%|█████▊    | 575/1000 [35:21<2:07:29, 18.00s/it]  

Error extracting text from http://www.c-span.org/video/?401781-1/defense-secretary-ashton-carter-testimony-us-strategy-isis: 403 Client Error: Forbidden for url: https://www.c-span.org/video/?401781-1/defense-secretary-ashton-carter-testimony-us-strategy-isis


Processing URLs:  58%|█████▊    | 577/1000 [37:21<4:00:07, 34.06s/it]

Error extracting text from https://www.asia-first.com/newsletter/japan-minister-says-recession-is-just-an-illusion.html: HTTPSConnectionPool(host='www.asia-first.com', port=443): Max retries exceeded with url: /newsletter/japan-minister-says-recession-is-just-an-illusion.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x30601fd10>, 'Connection to www.asia-first.com timed out. (connect timeout=60)'))


Processing URLs:  58%|█████▊    | 578/1000 [37:22<2:48:11, 23.91s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-wilders-idUSKCN0ZA0HO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-wilders-idUSKCN0ZA0HO


Processing URLs:  58%|█████▊    | 581/1000 [37:45<1:45:27, 15.10s/it]

Error extracting text from http://www.ew.com/article/2015/11/08/donald-trump-snl-ratings: 406 Client Error: Not Acceptable for url: https://www.ew.com/article/2015/11/08/donald-trump-snl-ratings


Processing URLs:  59%|█████▊    | 586/1000 [37:47<23:45,  3.44s/it]  

URL filtered: https://www.bloombergquint.com/global-economics/biden-will-need-more-than-52-billion-to-counter-china-in-chips
Error extracting text from http://www.nytimes.com/2015/12/20/us/politics/donald-trump-campaign-lags-in-mobilizing-iowa-caucus-voters.html?smid=nytcore-iphone-share&amp;smprod=nytcore-iphone: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/20/us/politics/donald-trump-campaign-lags-in-mobilizing-iowa-caucus-voters.html?smid=nytcore-iphone-share&amp;smprod=nytcore-iphone


Processing URLs:  59%|█████▉    | 590/1000 [38:05<25:02,  3.67s/it]

Error extracting text from http://en.trend.az/iran/society/2479046.html: 404 Client Error: Not Found for url: https://www.trend.az/iran/society/2479046.html


Processing URLs:  59%|█████▉    | 593/1000 [38:10<18:14,  2.69s/it]

URL filtered: https://twitter.com/ThreshedThought/status/1497304143836454921
URL filtered: https://www.youtube.com/watch?v=wM6exo00T5I


Processing URLs:  60%|█████▉    | 596/1000 [38:12<09:55,  1.47s/it]

Error extracting text from https://tradingeconomics.com/oman/rating: 405 Client Error: Not Allowed for url: https://tradingeconomics.com/oman/rating


Processing URLs:  60%|█████▉    | 598/1000 [38:15<09:26,  1.41s/it]

Error extracting text from https://www.reuters.com/article/us-usa-tax-healthcare/u-s-senate-tax-bill-accomplishes-major-obamacare-repeal-goal-idUSKBN1DW07T?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-tax-healthcare/u-s-senate-tax-bill-accomplishes-major-obamacare-repeal-goal-idUSKBN1DW07T?il=0


Processing URLs:  60%|██████    | 601/1000 [38:15<05:34,  1.19it/s]

Error extracting text from https://www.reuters.com/article/us-cyber-hbo-indictment/u-s-prosecutors-charge-iranian-in-game-of-thrones-hack-idUSKBN1DL1YT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-hbo-indictment/u-s-prosecutors-charge-iranian-in-game-of-thrones-hack-idUSKBN1DL1YT


Processing URLs:  61%|██████    | 606/1000 [38:22<07:55,  1.21s/it]

Error extracting text from https://nationalinterest.org/blog/buzz/rs-28-sarmat-new-russian-icbm-enter-combat-duty-2022-174616: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/buzz/rs-28-sarmat-new-russian-icbm-enter-combat-duty-2022-174616


Processing URLs:  61%|██████    | 609/1000 [38:24<05:54,  1.10it/s]

Error extracting text from http://www.timesofisrael.com/israel-bitterly-rejects-obamas-claim-it-now-backs-iran-nuclear-deal/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/israel-bitterly-rejects-obamas-claim-it-now-backs-iran-nuclear-deal/


Processing URLs:  61%|██████    | 610/1000 [38:25<05:32,  1.17it/s]

Error extracting text from https://bit.ly/2PM733c: 403 Client Error: Forbidden for url: https://www.financialexpress.com/defence/china-refuses-to-vacate-four-friction-points-in-ladakh-heres-everything-you-need-to-know-about-gogra-and-hot-springs/2236611/


Processing URLs:  61%|██████    | 611/1000 [38:26<06:05,  1.06it/s]

Error extracting text from https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L:2020:442:FULL&amp;from=EN: 404 Client Error: Not Found for url: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L:2020:442:FULL&amp;from=EN


Processing URLs:  61%|██████    | 612/1000 [38:27<05:43,  1.13it/s]

Error extracting text from http://www.bmgresearch.co.uk/scots-2017-independence-vote/: 403 Client Error: Forbidden for url: http://www.bmgresearch.co.uk/scots-2017-independence-vote/


Processing URLs:  61%|██████▏   | 613/1000 [38:29<08:50,  1.37s/it]

Error extracting text from http://www.ap.com/mobile: 404 Client Error: Not Found for url: https://www.ap.com/mobile
Error extracting text from http://criticalinfrastructurealliance.com/?s=Rocket+kitten: HTTPConnectionPool(host='criticalinfrastructurealliance.com', port=80): Max retries exceeded with url: /?s=Rocket+kitten (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306e0ea20>: Failed to resolve 'criticalinfrastructurealliance.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  62%|██████▏   | 616/1000 [38:30<04:33,  1.40it/s]

Error extracting text from https://www.nytimes.com/2020/02/03/world/asia/coronavirus-china.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/02/03/world/asia/coronavirus-china.html


Processing URLs:  62%|██████▏   | 618/1000 [38:33<06:24,  1.01s/it]

Error extracting text from https://reut.rs/2YLpS7L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-scotland/judges-not-voters-may-decide-on-scottish-independence-vote-idUSKBN2A51IV


Processing URLs:  62%|██████▎   | 625/1000 [38:46<10:23,  1.66s/it]

URL filtered: http://www.dailymail.co.uk/news/article-4042194/Facebook-fact-checker-arbitrate-fake-news-accused-defrauding-website-pay-prostitutes-staff-includes-escort-porn-star-Vice-Vixen-domme.html


Processing URLs:  63%|██████▎   | 627/1000 [38:48<08:15,  1.33s/it]

Error extracting text from http://www.ibtimes.com/panama-canal-expansion-be-complete-may-panamas-president-juan-carlos-varela-says-2247279: 403 Client Error: Forbidden for url: https://www.ibtimes.com/panama-canal-expansion-be-complete-may-panamas-president-juan-carlos-varela-says-2247279


Processing URLs:  63%|██████▎   | 631/1000 [38:54<09:11,  1.49s/it]

URL filtered: http://www.bloomberg.com/politics/trackers/2016-03-16/-no-question-mcconnell-will-cave-on-garland-nomination-reid


Processing URLs:  63%|██████▎   | 634/1000 [38:56<05:51,  1.04it/s]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/18/748118/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/18/748118/story.html
Error extracting text from http://www.reuters.com/article/us-southchinasea-vietnam-china/china-brushes-off-vietnam-protests-over-south-china-sea-drills-idUSKCN1BH110: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-vietnam-china/china-brushes-off-vietnam-protests-over-south-china-sea-drills-idUSKCN1BH110


Processing URLs:  64%|██████▍   | 638/1000 [39:09<13:00,  2.15s/it]

Error extracting text from http://finance.yahoo.com/news/trump-even-cheating-spouse-shouldnt-stop-iowa-caucus-195646721--election.html: 404 Client Error: Not Found for url: https://finance.yahoo.com/news/trump-even-cheating-spouse-shouldnt-stop-iowa-caucus-195646721--election.html


Processing URLs:  64%|██████▍   | 640/1000 [39:13<11:14,  1.87s/it]

Error extracting text from https://www.reuters.com/article/us-nicaragua-russia/russian-warships-visit-cold-war-ally-nicaragua-idUSTRE4BD00620081214: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nicaragua-russia/russian-warships-visit-cold-war-ally-nicaragua-idUSTRE4BD00620081214


Processing URLs:  64%|██████▍   | 642/1000 [39:14<07:04,  1.19s/it]

Error extracting text from http://www.reuters.com/article/2015/11/03/us-usa-election-trump-fed-idUSKCN0SS22A20151103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/03/us-usa-election-trump-fed-idUSKCN0SS22A20151103


Processing URLs:  65%|██████▍   | 648/1000 [39:24<06:33,  1.12s/it]

Error extracting text from http://www.nytimes.com/2016/10/03/world/colombia-peace-deal-defeat.html?emc=edit_na_20161002&amp;nlid=52725637&amp;ref=headline&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/03/world/colombia-peace-deal-defeat.html?emc=edit_na_20161002&amp;nlid=52725637&amp;ref=headline&amp;_r=0


Processing URLs:  65%|██████▌   | 651/1000 [39:27<06:25,  1.10s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-05-02/u-a-e-minister-sees-logic-in-oil-cuts-extension-beyond-june


Processing URLs:  65%|██████▌   | 653/1000 [39:30<07:05,  1.23s/it]

Error extracting text from http://www.radio.gov.pk/03-Apr-2017/erdogan-calls-on-turkish-voters-in-europe: 404 Client Error: Not Found for url: https://www.radio.gov.pk/03-Apr-2017/erdogan-calls-on-turkish-voters-in-europe


Processing URLs:  66%|██████▌   | 655/1000 [39:32<06:15,  1.09s/it]

Error extracting text from http://neven1.typepad.com/: 403 Client Error: Forbidden for url: https://neven1.typepad.com/


Processing URLs:  66%|██████▌   | 657/1000 [39:37<09:09,  1.60s/it]

Error extracting text from http://www.nytimes.com/2015/11/04/world/middleeast/backlash-against-us-in-iran-seems-to-gather-force-after-nuclear-deal.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/04/world/middleeast/backlash-against-us-in-iran-seems-to-gather-force-after-nuclear-deal.html


Processing URLs:  66%|██████▌   | 661/1000 [39:43<07:31,  1.33s/it]

Error extracting text from https://www.hkex.com.hk/eng/listing/listreq_pro/listreq/equities.htm: 404 Client Error: Not Found for url: https://www.hkex.com.hk/eng/listing/listreq_pro/listreq/equities.htm


Processing URLs:  66%|██████▋   | 664/1000 [39:45<04:27,  1.25it/s]

Error extracting text from http://blogs.barrons.com/emergingmarketsdaily/2016/11/21/venezuelas-pdvsa-inching-closer-to-debt-default/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/emergingmarketsdaily/2016/11/21/venezuelas-pdvsa-inching-closer-to-debt-default/
Error extracting text from https://www.reuters.com/article/us-venezuela-politics-lawmakers-idUSKBN19H0E8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-lawmakers-idUSKBN19H0E8


Processing URLs:  66%|██████▋   | 665/1000 [40:03<34:06,  6.11s/it]

Error extracting text from https://www.thebalance.com/oil-price-forecast-3306219: 406 Client Error: Not Acceptable for url: https://www.thebalancemoney.com:443/oil-price-forecast-3306219


Processing URLs:  67%|██████▋   | 670/1000 [40:15<15:14,  2.77s/it]

Error extracting text from http://www.francophonie.org/CP-97e-session-du-CPF-46886.html: HTTPConnectionPool(host='www.francophonie.org', port=80): Max retries exceeded with url: /CP-97e-session-du-CPF-46886.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30300b860>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  67%|██████▋   | 672/1000 [40:18<11:23,  2.08s/it]

Error extracting text from http://www.medialifemagazine.com/think-papers-websites-are-gaining-think-again/: HTTPConnectionPool(host='www.medialifemagazine.com', port=80): Max retries exceeded with url: /think-papers-websites-are-gaining-think-again/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303009010>: Failed to resolve 'www.medialifemagazine.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  68%|██████▊   | 676/1000 [40:25<09:18,  1.72s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-russia-fp-comment-33714a66-995f-11e7-87fc-c3f7ee4035c9-20170914-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-russia-fp-comment-33714a66-995f-11e7-87fc-c3f7ee4035c9-20170914-story.html


Processing URLs:  68%|██████▊   | 678/1000 [40:32<13:49,  2.58s/it]

Error extracting text from http://www.whec.com/news/self-driving-cars-is-rochester-ready/4774758/: 404 Client Error: Not Found for url: https://www.whec.com/news/self-driving-cars-is-rochester-ready/4774758/


Processing URLs:  68%|██████▊   | 679/1000 [40:33<10:59,  2.05s/it]

Error extracting text from http://english.yonhapnews.co.kr/national/2016/06/22/0301000000AEN20160622000400315.html?08691448: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  68%|██████▊   | 681/1000 [40:34<05:52,  1.10s/it]

Error extracting text from https://www.nytimes.com/2016/03/15/world/asia/china-labor-strike-protest.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/03/15/world/asia/china-labor-strike-protest.html
Error extracting text from https://www.yahoo.com/tech/nextev-unveil-1-million-1-160000164.html: 404 Client Error: Not Found for url: https://www.yahoo.com/tech/nextev-unveil-1-million-1-160000164.html


Processing URLs:  68%|██████▊   | 682/1000 [40:34<04:20,  1.22it/s]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-russia-idUSKCN11H051?ftcamp=crm/email//nbe/FirstFTAsia/product: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-russia-idUSKCN11H051?ftcamp=crm/email//nbe/FirstFTAsia/product


Processing URLs:  69%|██████▊   | 686/1000 [40:40<06:13,  1.19s/it]

Error extracting text from http://www.straitstimes.com/asia/east-asia/china-steps-up-criticism-of-us-over-possible-air-defence-zone-in-south-china-sea: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  69%|██████▊   | 687/1000 [40:41<06:10,  1.18s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/Policy-Politics/Redesign-of-North-Korea-ICBM-will-delay-deployment-to-2020-US-website: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/Policy-Politics/Redesign-of-North-Korea-ICBM-will-delay-deployment-to-2020-US-website


Processing URLs:  69%|██████▉   | 689/1000 [40:44<06:57,  1.34s/it]

Error extracting text from http://in.reuters.com/article/vietnam-china-protests-idINKBN0UG0HV20160103: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  69%|██████▉   | 690/1000 [40:46<07:17,  1.41s/it]

Error extracting text from http://thehill.com/opinion/campaign/364255-sen-shelby-could-save-alabama-from-moore: 403 Client Error: Forbidden for url: https://thehill.com/opinion/campaign/364255-sen-shelby-could-save-alabama-from-moore/
URL filtered: http://fivethirtyeight.com/features/the-new-game-of-thrones-season-starts-in-april-but-what-about-that-book/?ex_cid=538twitter


Processing URLs:  69%|██████▉   | 693/1000 [40:47<04:04,  1.26it/s]

Error extracting text from http://www.bangkokpost.com/news/asia/1140120/two-dead-in-fighting-in-myanmar-town-on-china-border: 404 Client Error: Not Found for url: https://www.bangkokpost.com/news/asia/1140120/two-dead-in-fighting-in-myanmar-town-on-china-border
Error extracting text from http://blogs.wsj.com/economics/2016/03/04/all-clear-on-recession-risk-not-yet/: 403 Client Error: Forbidden for url: http://blogs.wsj.com/economics/2016/03/04/all-clear-on-recession-risk-not-yet/
URL filtered: https://www.facebook.com/DonaldTrump/posts/10157185178405725


Processing URLs:  70%|███████   | 700/1000 [40:51<03:42,  1.35it/s]

Error extracting text from http://gazettereview.com/2016/07/u-s-will-send-560-troops-iraq/: 403 Client Error: Forbidden for url: http://gazettereview.com/2016/07/u-s-will-send-560-troops-iraq/


Processing URLs:  70%|███████   | 705/1000 [40:56<03:23,  1.45it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-06-27/50-cent-isn-t-the-only-one-seeking-big-protection-in-vix-options
Error extracting text from http://www.nytimes.com/2016/11/24/world/middleeast/iraq-mosul-isis-civilians.html?emc=edit_ee_20161124&amp;nl=todaysheadlines-europe&amp;nlid=77825025: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/24/world/middleeast/iraq-mosul-isis-civilians.html?emc=edit_ee_20161124&amp;nl=todaysheadlines-europe&amp;nlid=77825025


Processing URLs:  71%|███████   | 707/1000 [40:57<02:56,  1.66it/s]

Error extracting text from http://www.wsj.com/articles/ban-ki-moon-returns-to-south-korea-in-bid-to-lead-it-1484205041: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/ban-ki-moon-returns-to-south-korea-in-bid-to-lead-it-1484205041


Processing URLs:  71%|███████   | 708/1000 [40:59<04:14,  1.15it/s]

Error extracting text from http://www.prensa.com/in_english/Canal_21_4372272734.html: 404 Client Error: Not Found for url: https://www.prensa.com/in_english/Canal_21_4372272734.html


Processing URLs:  71%|███████▏  | 713/1000 [41:03<03:17,  1.45it/s]

Error extracting text from http://www.yorkshirepost.co.uk/news/exclusive-labour-expects-eu-referendum-will-be-held-in-june-2016-1-7676374#ixzz3xHdkj0l4: 403 Client Error: Forbidden for url: https://www.yorkshirepost.co.uk/news/exclusive-labour-expects-eu-referendum-will-be-held-in-june-2016-1-7676374#ixzz3xHdkj0l4


Processing URLs:  72%|███████▏  | 716/1000 [41:07<05:15,  1.11s/it]

Error extracting text from http://www.reuters.com/article/us-syria-aleppo-commentary-idUSKCN0XW25N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-syria-aleppo-commentary-idUSKCN0XW25N


Processing URLs:  72%|███████▏  | 719/1000 [41:12<07:00,  1.50s/it]

URL filtered: https://twitter.com/CNN/status/1486520801046282240


Processing URLs:  72%|███████▏  | 721/1000 [41:18<10:31,  2.27s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-25/temer-earns-first-win-as-brazil-congress-approves-fiscal-target


Processing URLs:  72%|███████▏  | 723/1000 [41:19<07:12,  1.56s/it]

Error extracting text from http://in.reuters.com/article/tillerson-asia-southkorea-idINKBN16O0BD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  72%|███████▎  | 725/1000 [41:23<07:48,  1.70s/it]

Error extracting text from https://www.econlib.org/i-fear-stagflation-and-general-price-controls-are-coming/: 403 Client Error: Forbidden for url: https://www.econlib.org/i-fear-stagflation-and-general-price-controls-are-coming/


Processing URLs:  73%|███████▎  | 727/1000 [41:25<06:17,  1.38s/it]

Error extracting text from http://www.sciencemag.org/news/2016/09/nasa-moves-rejoin-sped-gravitational-wave-mission: 403 Client Error: Forbidden for url: https://www.science.org/news/2016/09/nasa-moves-rejoin-sped-gravitational-wave-mission
Error extracting text from https://www.reuters.com/article/us-venezuela-politics-idUSKBN18Y3B3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKBN18Y3B3


Processing URLs:  73%|███████▎  | 732/1000 [41:30<04:45,  1.06s/it]

Error extracting text from https://www.reuters.com/article/brazil-politics/covid-19-not-enough-to-impeach-bolsonaro-brazils-likely-lower-house-head-says-idUSL1N2K22WU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/brazil-politics/covid-19-not-enough-to-impeach-bolsonaro-brazils-likely-lower-house-head-says-idUSL1N2K22WU
Error extracting text from http://www.reuters.com/article/us-spain-politics-idUSKCN1181AB?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-spain-politics-idUSKCN1181AB?il=0


Processing URLs:  73%|███████▎  | 733/1000 [41:32<05:47,  1.30s/it]

Error extracting text from http://marketrealist.com/2017/02/goldman-sachs-believes-chance-market-disruption/?utm_source=yahoo&amp;utm_medium=feed: 404 Client Error: Not Found for url: https://marketrealist.com:443/2017/02/goldman-sachs-believes-chance-market-disruption/?utm_source=yahoo&amp;utm_medium=feed


Processing URLs:  73%|███████▎  | 734/1000 [41:34<06:18,  1.42s/it]

Error extracting text from https://www.sec.gov/news/press-release/2017-219: 403 Client Error: Forbidden for url: https://www.sec.gov/news/press-release/2017-219


Processing URLs:  74%|███████▎  | 735/1000 [41:35<05:25,  1.23s/it]

Error extracting text from http://gawker.com/trump-embraces-putins-praise-nobody-has-proven-that-h-1748985215: 404 Client Error: Not Found for url: https://gawker.com/trump-embraces-putins-praise-nobody-has-proven-that-h-1748985215


Processing URLs:  74%|███████▎  | 737/1000 [41:39<08:14,  1.88s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-01-22/russia-seeks-syria-peace-with-turkey-iran-as-u-s-sidelined


Processing URLs:  74%|███████▍  | 741/1000 [41:44<06:26,  1.49s/it]

Error extracting text from http://bit.ly/2flWD5E: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950822001202 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30153a540>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  75%|███████▍  | 749/1000 [42:05<12:36,  3.01s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-islamicstate-idUSKCN0Y01TL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-islamicstate-idUSKCN0Y01TL


Processing URLs:  75%|███████▌  | 752/1000 [42:09<06:56,  1.68s/it]

Error extracting text from http://www.hg.org/article.asp?id=31688: 403 Client Error: Forbidden for url: http://www.hg.org/article.asp?id=31688


Processing URLs:  76%|███████▌  | 755/1000 [42:17<08:44,  2.14s/it]

Error extracting text from http://thehill.com/blogs/congress-blog/politics/353084-the-significance-of-the-supreme-court-case-on-extreme-partisan: 403 Client Error: Forbidden for url: https://thehill.com/blogs/congress-blog/politics/353084-the-significance-of-the-supreme-court-case-on-extreme-partisan/


Processing URLs:  76%|███████▌  | 759/1000 [42:33<12:10,  3.03s/it]

Error extracting text from http://uk.reuters.com/article/uk-colombia-peace-idUKKBN1370PZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  76%|███████▋  | 763/1000 [42:36<04:42,  1.19s/it]

Error extracting text from http://www.nytimes.com/2015/10/02/us/politics/congress-debt-limit.html?emc=edit_th_20151002&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/02/us/politics/congress-debt-limit.html?emc=edit_th_20151002&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  76%|███████▋  | 764/1000 [42:38<06:15,  1.59s/it]

Error extracting text from http://tass.ru/en/politics/854231: 404 Client Error: Not Found for url: https://tass.ru/en/politics/854231


Processing URLs:  77%|███████▋  | 768/1000 [43:43<1:13:47, 19.08s/it]

Error extracting text from http://www.usnews.com/news/the-report/articles/2015/10/02/washington-whispers-an-unclear-future-for-journalist-jason-rezaian: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  77%|███████▋  | 770/1000 [43:46<37:45,  9.85s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2938.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2938.htm


Processing URLs:  78%|███████▊  | 775/1000 [43:56<12:34,  3.35s/it]

Error extracting text from http://nerdist.com/google-deepmind-beats-legend-lee-sedol-in-game-one-of-five-game-go-series/: 403 Client Error: Forbidden for url: http://nerdist.com/google-deepmind-beats-legend-lee-sedol-in-game-one-of-five-game-go-series/


Processing URLs:  78%|███████▊  | 779/1000 [44:10<10:59,  2.98s/it]

Error extracting text from https://www.nytimes.com/2017/12/12/world/asia/afghanistan-ashraf-ghani-elections.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/12/world/asia/afghanistan-ashraf-ghani-elections.html?_r=0


Processing URLs:  78%|███████▊  | 780/1000 [44:10<08:06,  2.21s/it]

Error extracting text from https://www.sciencedirect.com/science/article/pii/S0960982216302470?via%3Dihub: 403 Client Error: Forbidden for url: https://www.sciencedirect.com/science/article/pii/S0960982216302470?via%3Dihub


Processing URLs:  78%|███████▊  | 782/1000 [44:13<06:35,  1.81s/it]

URL filtered: http://time.com/4207995/north-korea-trash-balloons/?xid=time_socialflow_facebook


Processing URLs:  78%|███████▊  | 784/1000 [44:14<04:02,  1.12s/it]

Error extracting text from http://www.latimes.com/nation/la-fg-trump-thanks-russia-20170811-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/la-fg-trump-thanks-russia-20170811-story.html


Processing URLs:  79%|███████▊  | 786/1000 [44:18<05:44,  1.61s/it]

Error extracting text from https://www.drugabuse.gov/publications/teaching-packets/neurobiology-ecstasy/section-ii/1-how-does-ecstasy-work-serotonin-pathways-in-brain: 404 Client Error: Not Found for url: https://nida.nih.gov/publications/teaching-packets/neurobiology-ecstasy/section-ii/1-how-does-ecstasy-work-serotonin-pathways-in-brain


Processing URLs:  79%|███████▉  | 789/1000 [44:28<08:27,  2.41s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0X51OC: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0X51OC


Processing URLs:  79%|███████▉  | 792/1000 [44:37<09:22,  2.70s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-03/23/c_135213624.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-03/23/c_135213624.htm


Processing URLs:  80%|███████▉  | 795/1000 [44:49<11:59,  3.51s/it]

Error extracting text from http://bankruptcydata.com/p/data-research: 404 Client Error: Not Found for url: https://www.bankruptcydata.com/p/data-research


Processing URLs:  80%|███████▉  | 797/1000 [44:50<06:26,  1.90s/it]

Error extracting text from http://www.reuters.com/article/us-india-brics-idUSKBN1711LV?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-india-brics-idUSKBN1711LV?il=0


Processing URLs:  80%|███████▉  | 799/1000 [44:52<04:59,  1.49s/it]

Error extracting text from http://blogs.barrons.com/asiastocks/2015/10/04/why-bank-of-japan-wont-announce-another-round-of-qe/: 403 Client Error: Forbidden for url: http://blogs.barrons.com/asiastocks/2015/10/04/why-bank-of-japan-wont-announce-another-round-of-qe/


Processing URLs:  80%|████████  | 802/1000 [44:59<05:55,  1.79s/it]

Error extracting text from https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://rota2014.blogspot.com/2016/03/policia-federal-cumpre-ao-menos-15.html&amp;usg=ALkJrhg0oYiFrlHwQQjJ4-UYj70GyqHj8Q: 404 Client Error: Not Found for url: https://translate.googleusercontent.com/translate_c?depth=1&amp;hl=en&amp;prev=search&amp;rurl=translate.google.com&amp;sl=pt-BR&amp;u=http://rota2014.blogspot.com/2016/03/policia-federal-cumpre-ao-menos-15.html&amp;usg=ALkJrhg0oYiFrlHwQQjJ4-UYj70GyqHj8Q


Processing URLs:  80%|████████  | 805/1000 [45:06<07:06,  2.19s/it]

Error extracting text from http://osp.od.nih.gov/sites/default/files/NSABB_Working_Group_Draft_Report.pdf: 404 Client Error: Not Found for url: https://osp.od.nih.gov/sites/default/files/NSABB_Working_Group_Draft_Report.pdf


Processing URLs:  81%|████████  | 808/1000 [45:08<03:53,  1.22s/it]

Error extracting text from http://www.arabnews.com/node/1194446/world: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1194446/world
URL filtered: https://www.bloomberg.com/news/articles/2018-02-09/uber-settles-waymo-driverless-car-lawsuit-averts-jury-verdict-jdg486do


Processing URLs:  81%|████████  | 810/1000 [45:09<03:11,  1.01s/it]

Error extracting text from http://tass.ru/en/world/829999: 404 Client Error: Not Found for url: https://tass.ru/en/world/829999


Processing URLs:  82%|████████▏ | 816/1000 [45:24<05:29,  1.79s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-21/brazil-impeachment-papers-about-to-drop-as-crisis-hits-new-stage
Error extracting text from http://www.worldbulletin.net/todays-news/178794/rohingya-groups-say-rakhine-deaths-now-excuse-for-purge: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/todays-news/178794/rohingya-groups-say-rakhine-deaths-now-excuse-for-purge


Processing URLs:  82%|████████▏ | 821/1000 [45:34<03:56,  1.32s/it]

Error extracting text from http://www.investmentnews.com/article/20150907/FREE/150909938/federal-reserve-may-wait-a-long-time-for-second-rate-hike: 403 Client Error: Forbidden for url: http://www.investmentnews.com/article/20150907/FREE/150909938/federal-reserve-may-wait-a-long-time-for-second-rate-hike


Processing URLs:  82%|████████▏ | 824/1000 [45:42<05:29,  1.87s/it]

Error extracting text from http://www.businessinsider.my/brexit-betting-the-odds-have-moved-even-more-in-remains-favour-2016-6/?r=UK&amp;IR=T#vjOucF524PHp7p1F.97: HTTPConnectionPool(host='www.businessinsider.my', port=80): Max retries exceeded with url: /brexit-betting-the-odds-have-moved-even-more-in-remains-favour-2016-6/?r=UK&amp;IR=T (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30761e780>: Failed to resolve 'www.businessinsider.my' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  83%|████████▎ | 826/1000 [45:44<03:49,  1.32s/it]

Error extracting text from http://www.wsj.com/articles/brazil-records-worst-fiscal-result-on-record-in-2015-1454072346: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-records-worst-fiscal-result-on-record-in-2015-1454072346


Processing URLs:  83%|████████▎ | 827/1000 [45:45<03:44,  1.30s/it]

Error extracting text from http://mobile.nytimes.com/2016/03/03/world/middleeast/iran-elections.html?_r=1&amp;referer=http://www.theatlantic.com/international/archive/2016/03/iran-election-results-winner/472128/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/03/03/world/middleeast/iran-elections.html?_r=1&amp;referer=http://www.theatlantic.com/international/archive/2016/03/iran-election-results-winner/472128/


Processing URLs:  83%|████████▎ | 830/1000 [45:53<05:34,  1.97s/it]

URL filtered: https://www.youtube.com/watch?v=IfLoayApkk4


Processing URLs:  84%|████████▍ | 840/1000 [46:18<12:32,  4.70s/it]

Error extracting text from https://www.weeklystandard.com/blogs/biden-if-i-dont-move-ill-be-demoted-secretary-state-or-something_1042269.html: Exceeded 30 redirects.


Processing URLs:  84%|████████▍ | 842/1000 [46:21<07:55,  3.01s/it]

Error extracting text from http://www.reuters.com/article/us-usa-trump-advisers-idUSKCN1BB075: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-advisers-idUSKCN1BB075


Processing URLs:  84%|████████▍ | 843/1000 [46:21<05:41,  2.18s/it]

Error extracting text from http://india-wris.nrsc.gov.in/wrpinfo/index.php?title=Narmada: HTTPConnectionPool(host='india-wris.nrsc.gov.in', port=80): Max retries exceeded with url: /wrpinfo/index.php?title=Narmada (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307b2fc20>: Failed to resolve 'india-wris.nrsc.gov.in' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  84%|████████▍ | 845/1000 [46:24<04:12,  1.63s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKCN0Z508B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKCN0Z508B


Processing URLs:  85%|████████▍ | 848/1000 [46:28<03:26,  1.36s/it]

Error extracting text from http://www.mb.com.ph/us-sees-new-flashpoint-in-south-china-sea/: 403 Client Error: Forbidden for url: https://mb.com.ph/us-sees-new-flashpoint-in-south-china-sea/


Processing URLs:  86%|████████▌ | 856/1000 [47:41<46:10, 19.24s/it]

Error extracting text from http://www.post-gazette.com/news/politics-nation/2017/11/29/Analysis-Roy-Moore-can-still-win-the-Alabama-Senate-Race-Here-s-why/stories/201711290304: HTTPConnectionPool(host='www.post-gazette.com', port=80): Max retries exceeded with url: /news/politics-nation/2017/11/29/Analysis-Roy-Moore-can-still-win-the-Alabama-Senate-Race-Here-s-why/stories/201711290304 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x307d7afc0>, 'Connection to www.post-gazette.com timed out. (connect timeout=60)'))


Processing URLs:  86%|████████▌ | 857/1000 [48:41<1:15:02, 31.49s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-04-04/the-latest-suicide-attacks-near-baghdad-kill-at-least-10: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  86%|████████▌ | 859/1000 [48:47<39:31, 16.82s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-11-25/gold-shrugs-off-turkey-tensions-as-focus-returns-to-u-s-rates


Processing URLs:  86%|████████▌ | 861/1000 [48:49<22:02,  9.51s/it]

Error extracting text from https://www.cdc.gov/coronavirus/2019-ncov/more/science-and-research/scientific-brief-emerging-variants.html: 404 Client Error: Not Found for url: https://www.cdc.gov/coronavirus/2019-ncov/more/science-and-research/scientific-brief-emerging-variants.html


Processing URLs:  86%|████████▌ | 862/1000 [48:51<17:39,  7.68s/it]

Error extracting text from http://www.nationalreview.com/corner/429206/fda-drug-approval-process-risk-averse: 404 Client Error: Not Found for url: https://www.nationalreview.com/corner/429206/fda-drug-approval-process-risk-averse/


Processing URLs:  86%|████████▋ | 865/1000 [48:52<07:34,  3.37s/it]

URL filtered: https://www.youtube.com/watch?v=Zh7eAG2jJkA
Error extracting text from http://www.timesofisrael.com/state-prosecutor-hopes-netnyahu-corruption-case-wont-take-too-long/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/state-prosecutor-hopes-netnyahu-corruption-case-wont-take-too-long/


Processing URLs:  87%|████████▋ | 866/1000 [48:55<07:38,  3.42s/it]

Error extracting text from http://www.scmagazine.com/update-schumer-confirms-expected-indictment-of-iranian-hackers-for-ny-dam-breach/article/483351/: 404 Client Error: Not Found for url: https://www.scmagazine.com/news/update-schumer-confirms-expected-indictment-of-iranian-hackers-for-ny-dam-breach


Processing URLs:  87%|████████▋ | 867/1000 [48:56<05:53,  2.66s/it]

Error extracting text from https://www.nytimes.com/2021/02/25/us/politics/biden-syria-airstrike-iran.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/25/us/politics/biden-syria-airstrike-iran.html


Processing URLs:  87%|████████▋ | 869/1000 [49:03<06:45,  3.09s/it]

URL filtered: http://www.ctvnews.ca/mobile/sci-tech/facebook-to-launch-fake-news-education-tool-but-won-t-flag-content-1.3357474


Processing URLs:  88%|████████▊ | 880/1000 [49:28<04:03,  2.03s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-poland-kaczynski-idUSKCN0ZA24W?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-poland-kaczynski-idUSKCN0ZA24W?il=0


Processing URLs:  88%|████████▊ | 882/1000 [49:45<11:56,  6.07s/it]

Error extracting text from http://www.focus-fen.net/news/2015/12/14/392414/no-points-of-discord-in-the-ttip-negotiations.html: HTTPConnectionPool(host='www.focus-fen.net', port=80): Max retries exceeded with url: /news/2015/12/14/392414/no-points-of-discord-in-the-ttip-negotiations.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306e0c110>: Failed to resolve 'www.focus-fen.net' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.youtube.com/watch?v=jCPlYfAsQ_M


Processing URLs:  88%|████████▊ | 884/1000 [49:47<07:15,  3.75s/it]

Error extracting text from http://news.xinhuanet.com/english/china/2012-04/10/c_131518307.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/china/2012-04/10/c_131518307.htm


Processing URLs:  89%|████████▊ | 887/1000 [49:49<03:21,  1.79s/it]

Error extracting text from http://www.nytimes.com/2016/12/16/world/asia/united-nations-ban-ki-moon-south-korea-president.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/16/world/asia/united-nations-ban-ki-moon-south-korea-president.html?_r=0


Processing URLs:  89%|████████▉ | 888/1000 [49:53<04:41,  2.51s/it]

URL filtered: https://twitter.com/joepike/status/1341760441773678592?s=21


Processing URLs:  89%|████████▉ | 891/1000 [49:54<02:04,  1.14s/it]

Error extracting text from http://www.reuters.com/article/us-eu-summit-balkans-idUSKBN16G32S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-eu-summit-balkans-idUSKBN16G32S
Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2020/12/29/eu-uk-trade-and-cooperation-agreement-council-adopts-decision-on-the-signing/#: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2020/12/29/eu-uk-trade-and-cooperation-agreement-council-adopts-decision-on-the-signing/


Processing URLs:  89%|████████▉ | 892/1000 [49:54<01:36,  1.11it/s]

Error extracting text from https://cointelegraph.com/news/many-pieces-of-the-diem-puzzle-still-missing-as-launch-gets-delayed: 403 Client Error: Forbidden for url: https://cointelegraph.com/news/many-pieces-of-the-diem-puzzle-still-missing-as-launch-gets-delayed


Processing URLs:  89%|████████▉ | 894/1000 [49:58<02:17,  1.29s/it]

Error extracting text from http://www.wsj.com/articles/this-week-in-asian-insecurity-1470956129: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/this-week-in-asian-insecurity-1470956129


Processing URLs:  90%|█████████ | 900/1000 [50:05<01:35,  1.05it/s]

Error extracting text from http://www.nytimes.com/2016/03/25/world/europe/radovan-karadzic-verdict.html?ribbon-ad-idx=10&amp;rref=homepage&amp;module=ArrowsNav&amp;contentCollection=Middle%20East&amp;action=click&amp;region=FixedLeft&amp;pgtype=article: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/25/world/europe/radovan-karadzic-verdict.html?ribbon-ad-idx=10&amp;rref=homepage&amp;module=ArrowsNav&amp;contentCollection=Middle%20East&amp;action=click&amp;region=FixedLeft&amp;pgtype=article


Processing URLs:  90%|█████████ | 902/1000 [50:08<01:50,  1.13s/it]

Error extracting text from http://mnras.oxfordjournals.org/content/early/2016/11/01/mnras.stw2798.abstract: 403 Client Error: Forbidden for url: http://mnras.oxfordjournals.org/content/early/2016/11/01/mnras.stw2798.abstract


Processing URLs:  90%|█████████ | 903/1000 [50:08<01:30,  1.08it/s]

Error extracting text from http://www.carscoops.com/2015/10/denmarks-first-toyota-mirai-gets.html: 403 Client Error: Forbidden for url: https://www.carscoops.com:443/2015/10/denmarks-first-toyota-mirai-gets.html


Processing URLs:  91%|█████████ | 906/1000 [50:18<03:48,  2.43s/it]

Error extracting text from https://www.gjopen.com/comments/1325867).: 404 Client Error: Not Found for url: https://www.gjopen.com/comments/1325867).


Processing URLs:  91%|█████████ | 907/1000 [50:19<03:16,  2.11s/it]

URL filtered: https://www.reuters.com/technology/facebook-backed-crypto-project-diem-launch-us-stablecoin-major-shift-2021-05-12/


Processing URLs:  91%|█████████ | 910/1000 [50:21<01:48,  1.21s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-un-russia-idUSKBN16324L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-un-russia-idUSKBN16324L


Processing URLs:  92%|█████████▏| 916/1000 [51:12<11:26,  8.17s/it]

Error extracting text from http://www.wantchinatimes.com/news-subclass-cnt.aspx?id=20150725000101&amp;cid=1101: 522 Server Error:  for url: http://www.wantchinatimes.com/news-subclass-cnt.aspx?id=20150725000101&amp;cid=1101
Error extracting text from http://www.reuters.com/article/us-northkorea-missile-usa-idUSKCN0V61TL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-usa-idUSKCN0V61TL


Processing URLs:  92%|█████████▏| 917/1000 [51:13<08:05,  5.85s/it]

Error extracting text from http://www.wsj.com/articles/suspect-charged-with-murder-of-jo-cox-appears-in-court-1466243768: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/suspect-charged-with-murder-of-jo-cox-appears-in-court-1466243768


Processing URLs:  92%|█████████▏| 919/1000 [51:14<04:26,  3.28s/it]

Error extracting text from http://www.nytimes.com/2016/03/15/us/politics/allies-say-obamas-court-pick-is-near-and-will-be-hard-for-republicans-to-ignore.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/15/us/politics/allies-say-obamas-court-pick-is-near-and-will-be-hard-for-republicans-to-ignore.html


Processing URLs:  92%|█████████▏| 920/1000 [51:16<03:41,  2.77s/it]

URL filtered: https://www.justsecurity.org/45135/facebook-crack-trump-russia-case/


Processing URLs:  92%|█████████▏| 923/1000 [51:36<07:28,  5.83s/it]

Error extracting text from http://www.investopedia.com/terms/a/alpha.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/terms/a/alpha.asp


Processing URLs:  93%|█████████▎| 928/1000 [51:50<04:21,  3.63s/it]

Error extracting text from http://www.nytimes.com/2015/10/26/opinion/jimmy-carter-a-five-nation-plan-to-end-the-syrian-crisis.html?action=click&amp;pgtype=Homepage&amp;module=opinion-c-col-right-region&amp;region=opinion-c-col-right-region&amp;WT.nav=opinion-c-col-right-region: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/26/opinion/jimmy-carter-a-five-nation-plan-to-end-the-syrian-crisis.html?action=click&amp;pgtype=Homepage&amp;module=opinion-c-col-right-region&amp;region=opinion-c-col-right-region&amp;WT.nav=opinion-c-col-right-region


Processing URLs:  93%|█████████▎| 929/1000 [51:52<03:31,  2.98s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0ZX0XZ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0ZX0XZ?il=0


Processing URLs:  93%|█████████▎| 931/1000 [51:54<02:17,  2.00s/it]

Error extracting text from https://www.aljazeera.com/news/2021/11/23/israel-signals-readiness-to-escalate-iran-confrontation;: 404 Client Error: Not Found for url: https://www.aljazeera.com/news/2021/11/23/israel-signals-readiness-to-escalate-iran-confrontation;


Processing URLs:  94%|█████████▎| 935/1000 [52:11<04:10,  3.86s/it]

Error extracting text from http://atimes.com/2016/02/obama-wont-take-no-for-an-answer-from-putin/: 404 Client Error: Not Found for url: https://atimes.com/2016/02/obama-wont-take-no-for-an-answer-from-putin/


Processing URLs:  94%|█████████▍| 938/1000 [52:17<02:38,  2.55s/it]

Error extracting text from http://predictwise.com/politics/2016-president-primaries: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016-president-primaries
Error extracting text from https://www.reuters.com/article/russia-nato-drills/russia-says-it-will-join-drills-with-nato-member-ships-off-pakistan-idUSKBN28K1K7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/russia-nato-drills/russia-says-it-will-join-drills-with-nato-member-ships-off-pakistan-idUSKBN28K1K7


Processing URLs:  94%|█████████▍| 943/1000 [52:28<02:04,  2.18s/it]

Error extracting text from http://www.reuters.com/article/us-britain-eu-juncker-idUSKBN16P0PF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-juncker-idUSKBN16P0PF
Error extracting text from https://www.reuters.com/business/energy/oil-extends-gains-us-crude-inventory-draw-points-strong-demand-2021-10-21/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/oil-extends-gains-us-crude-inventory-draw-points-strong-demand-2021-10-21/


Processing URLs:  94%|█████████▍| 945/1000 [52:32<01:44,  1.90s/it]

Error extracting text from https://beincrypto.com/ecb-seeks-veto-power-over-emerging-stablecoins-in-europe/: 403 Client Error: Forbidden for url: https://beincrypto.com/ecb-seeks-veto-power-over-emerging-stablecoins-in-europe/


Processing URLs:  95%|█████████▍| 949/1000 [52:39<01:36,  1.89s/it]

Error extracting text from http://www.amazon.com/Countdown-Zero-Day-Stuxnet-Digital/dp/0770436196/ref=sr_1_1: 500 Server Error:  for url: https://www.amazon.com/Countdown-Zero-Day-Stuxnet-Digital/dp/0770436196/ref=sr_1_1


Processing URLs:  96%|█████████▌| 956/1000 [52:47<00:32,  1.37it/s]

Error extracting text from http://www.straitstimes.com/world/united-states/cleveland-on-edge: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
Error extracting text from https://m.geo.tv/#category%7Clatest-news%7Cp117215-PM-says-CPEC-to-bring-economic-revolution-in-Pakistan: HTTPSConnectionPool(host='m.geo.tv', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fe74fa40>: Failed to resolve 'm.geo.tv' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from https://www.reuters.com/article/us-trade-nafta-canada/canada-pm-doesnt-think-trump-will-pull-u-s-out-of-nafta-idUSKBN1FK28E: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-trade-nafta-canada/canada-pm-doesnt-think-trump-will-pull-u-s-out-of-nafta-idUSKBN1FK28E


Processing URLs:  96%|█████████▌| 957/1000 [52:47<00:27,  1.59it/s]

Error extracting text from https://www.nytimes.com/2021/09/26/world/europe/keir-starmer.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/09/26/world/europe/keir-starmer.html


Processing URLs:  96%|█████████▌| 960/1000 [52:52<00:47,  1.19s/it]

Error extracting text from https://www.reuters.com/world/americas/brazils-bolsonaro-says-may-not-accept-2022-election-under-current-voting-system-2021-07-07/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/americas/brazils-bolsonaro-says-may-not-accept-2022-election-under-current-voting-system-2021-07-07/


Processing URLs:  96%|█████████▋| 964/1000 [52:59<00:47,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-iran-oil-exports-idUSKBN15B0QU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-oil-exports-idUSKBN15B0QU


Processing URLs:  96%|█████████▋| 965/1000 [53:00<00:43,  1.24s/it]

Error extracting text from http://iran-times.com/editor-says-leader-opposes-nuke-deal/: 406 Client Error: Not Acceptable for url: http://iran-times.com/editor-says-leader-opposes-nuke-deal/


Processing URLs:  97%|█████████▋| 966/1000 [53:00<00:32,  1.05it/s]

Error extracting text from http://www.kurdistan24.net/en/news/51448582-dd64-41bb-9f78-4e5b63acd978/Iraqi-official--Peshmerga-will-not-enter-Mosul: 403 Client Error: Forbidden for url: https://www.kurdistan24.net/en/news/51448582-dd64-41bb-9f78-4e5b63acd978/Iraqi-official--Peshmerga-will-not-enter-Mosul


Processing URLs:  97%|█████████▋| 968/1000 [53:02<00:30,  1.05it/s]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/09/29/0200000000AEN20150929004952315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
Error extracting text from http://xheimlichkeit.com/methods/2015/10/12/how-to-update-probabilities.html: HTTPConnectionPool(host='xheimlichkeit.com', port=80): Max retries exceeded with url: /methods/2015/10/12/how-to-update-probabilities.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307f7d340>: Failed to resolve 'xheimlichkeit.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  97%|█████████▋| 971/1000 [53:08<00:39,  1.37s/it]

Error extracting text from http://www.jamestown.org/: HTTPConnectionPool(host='www.jamestown.org', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x306e0e9c0>: Failed to resolve 'www.jamestown.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  97%|█████████▋| 974/1000 [53:11<00:29,  1.15s/it]

Error extracting text from http://www.cdm.me/english/decision-on-inviting-montenegro-on-december-1st: 403 Client Error: Forbidden for url: https://www.cdm.me/english/decision-on-inviting-montenegro-on-december-1st


Processing URLs:  98%|█████████▊| 981/1000 [53:22<00:24,  1.27s/it]

Error extracting text from http://www.challengergray.com/press/press-releases/2018-november-ceo-report-147-ceos-out-ytd-24-percent: 403 Client Error: Forbidden for url: http://www.challengergray.com/press/press-releases/2018-november-ceo-report-147-ceos-out-ytd-24-percent


Processing URLs:  98%|█████████▊| 985/1000 [53:24<00:08,  1.70it/s]

Error extracting text from http://www.todayonline.com/world/islamic-state-digs-behind-mosul-moat-battle-city-looms?cx_tag=similartd&amp;cid=tg:recos:similartd:standard#cxrecs_s: 403 Client Error: Forbidden for url: https://www.todayonline.com/world/islamic-state-digs-behind-mosul-moat-battle-city-looms?cx_tag=similartd&amp;cid=tg:recos:similartd:standard#cxrecs_s
Error extracting text from https://www.reuters.com/article/us-usa-stocks-weekahead/tax-loss-selling-to-pressure-2017s-losers-in-december-idUSKBN1E22AZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-stocks-weekahead/tax-loss-selling-to-pressure-2017s-losers-in-december-idUSKBN1E22AZ
URL filtered: https://www.youtube.com/watch?v=cXLUXs8-Uf8


Processing URLs:  99%|█████████▊| 987/1000 [53:25<00:04,  2.85it/s]

Error extracting text from https://www.geekwire.com/2021/amazon-web-services-posts-record-13-5b-profits-2020-andy-jassys-aws-swan-song/: 403 Client Error: Forbidden for url: https://www.geekwire.com/2021/amazon-web-services-posts-record-13-5b-profits-2020-andy-jassys-aws-swan-song/


Processing URLs:  99%|█████████▉| 991/1000 [53:34<00:11,  1.33s/it]

Error extracting text from https://www.france24.com/en/live-news/20210926-north-macedonia-holds-first-high-stakes-census-first-in-20-years: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210926-north-macedonia-holds-first-high-stakes-census-first-in-20-years
Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/reinaldo/geral/impeachment-de-dilma-por-que-acho-que-um-ministro-do-supremo-tambem-tem-de-ser-impichado/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://veja.abril.com.br/blog/reinaldo/geral/impeachment-de-dilma-por-que-acho-que-um-ministro-do-supremo-tambem-tem-de-ser-impichado/&amp;prev=search


Processing URLs: 100%|██████████| 1000/1000 [53:46<00:00,  3.23s/it]
Processing URLs:   0%|          | 0/1000 [00:00<?, ?it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-france-idUSKCN11C1HR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-france-idUSKCN11C1HR


Processing URLs:   0%|          | 5/1000 [00:03<10:18,  1.61it/s]

Error extracting text from https://www.nytimes.com/2018/02/20/us/politics/trump-bump-stocks.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/20/us/politics/trump-bump-stocks.html


Processing URLs:   1%|          | 11/1000 [00:13<19:14,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-brazil-politics-idUSKCN0YW1EM: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-brazil-politics-idUSKCN0YW1EM
Error extracting text from https://www.reuters.com/world/europe/biden-expressed-concerns-about-nord-stream-2-merkel-2021-07-15/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/biden-expressed-concerns-about-nord-stream-2-merkel-2021-07-15/


Processing URLs:   2%|▏         | 15/1000 [00:17<15:37,  1.05it/s]

Error extracting text from http://www.foxnews.com/us/2015/12/15/no-regime-change-in-syria-after-talks-in-moscow-kerry-accepts-russian-stance-on/: 404 Client Error: Not Found for url: https://www.foxnews.com/us/2015/12/15/no-regime-change-in-syria-after-talks-in-moscow-kerry-accepts-russian-stance-on/
URL filtered: http://www.bloomberg.com/news/articles/2015-11-05/brazil-government-said-to-see-fragile-anti-impeachment-majority
Error extracting text from http://www.reuters.com/article/us-usa-trump-pipeline-legal-idUSKBN15901J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-pipeline-legal-idUSKBN15901J


Processing URLs:   2%|▏         | 18/1000 [00:36<1:02:21,  3.81s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/02/18/world/asia/ap-as-australia-new-zealand-asylum-seekers.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/02/18/world/asia/ap-as-australia-new-zealand-asylum-seekers.html


Processing URLs:   2%|▏         | 19/1000 [00:37<48:50,  2.99s/it]

URL filtered: http://www.theverge.com/2017/1/19/14314680/germany-fake-news-facebook-russia-election-merkel


Processing URLs:   2%|▏         | 22/1000 [00:37<23:02,  1.41s/it]

Error extracting text from https://www.wsj.com/articles/gops-proposed-tax-changes-are-no-match-for-status-quo-1496055605: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/gops-proposed-tax-changes-are-no-match-for-status-quo-1496055605


Processing URLs:   2%|▏         | 23/1000 [00:38<20:53,  1.28s/it]

Error extracting text from http://thehill.com/policy/finance/255043-exiting-boehner-could-bode-well-for-ex-im: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/255043-exiting-boehner-could-bode-well-for-ex-im/


Processing URLs:   2%|▏         | 24/1000 [00:38<16:56,  1.04s/it]

Error extracting text from http://www.wsj.com/articles/volkswagen-may-not-face-environmental-criminal-charges-1443567204: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/volkswagen-may-not-face-environmental-criminal-charges-1443567204


Processing URLs:   2%|▎         | 25/1000 [00:38<13:39,  1.19it/s]

Error extracting text from https://www.france24.com/en/live-news/20210806-india-china-pull-back-from-part-of-contested-himalayan-border: 403 Client Error: Forbidden for url: https://www.france24.com/en/live-news/20210806-india-china-pull-back-from-part-of-contested-himalayan-border


Processing URLs:   3%|▎         | 29/1000 [00:50<48:55,  3.02s/it]

Error extracting text from http://www.polioeradication.org/Portals/0/Wild_poliovirus_list_2010-2015_22SEP.pdf: 404 Client Error: Not Found for url: https://polioeradication.org/Portals/0/Wild_poliovirus_list_2010-2015_22SEP.pdf


Processing URLs:   3%|▎         | 30/1000 [00:50<35:52,  2.22s/it]

Error extracting text from https://www.nytimes.com/2017/07/14/world/asia/back-in-afghan-hot-spot-us-marines-chase-diminished-goals.html?smid=fb-nytimes&amp;smtyp=cur: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/14/world/asia/back-in-afghan-hot-spot-us-marines-chase-diminished-goals.html?smid=fb-nytimes&amp;smtyp=cur


Processing URLs:   3%|▎         | 32/1000 [00:52<22:44,  1.41s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-oil-idUSKCN11Y1UA: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-oil-idUSKCN11Y1UA


Processing URLs:   4%|▍         | 41/1000 [01:01<13:03,  1.22it/s]

Error extracting text from https://medium.com/waymo/a-first-look-at-our-waymo-fully-self-driving-chrysler-pacifica-hybrid-minivans-5677e5e67750#.ut3pgxaj5: 403 Client Error: Forbidden for url: https://medium.com/waymo/a-first-look-at-our-waymo-fully-self-driving-chrysler-pacifica-hybrid-minivans-5677e5e67750#.ut3pgxaj5
URL filtered: https://www.bloomberg.com/news/articles/2017-02-22/exxon-takes-historic-cut-to-oil-reserves-amid-crude-market-rout
Error extracting text from http://bigstory.ap.org/urn:publicid:ap.org:c33cfe0452e44ba4806a0e5dd5781933: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /urn:publicid:ap.org:c33cfe0452e44ba4806a0e5dd5781933 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304dd75f0>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 50/1000 [01:26<44:09,  2.79s/it]

URL filtered: https://www.youtube.com/watch?v=TeU6cp-irbY


Processing URLs:   6%|▌         | 56/1000 [01:31<18:50,  1.20s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/07/13/asia-pacific/china-blames-philippines-stirring-trouble-inherent-territory/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/07/13/asia-pacific/china-blames-philippines-stirring-trouble-inherent-territory/


Processing URLs:   6%|▌         | 57/1000 [01:31<16:41,  1.06s/it]

Error extracting text from http://africanarguments.org/2017/07/12/kenyatta-vs-odinga-a-study-in-contrasts/: 403 Client Error: Forbidden for url: http://africanarguments.org/2017/07/12/kenyatta-vs-odinga-a-study-in-contrasts/


Processing URLs:   6%|▋         | 64/1000 [01:42<22:32,  1.44s/it]

Error extracting text from http://www.debatingeurope.eu/2016/07/11/best-alternative-eu-membership/: 404 Client Error: Not Found for url: https://debatingeurope.eu/2016/07/11/best-alternative-eu-membership/
Error extracting text from http://www.reuters.com/article/us-britain-eu-business-idUSKBN1AQ1TT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-business-idUSKBN1AQ1TT


Processing URLs:   6%|▋         | 65/1000 [01:44<20:32,  1.32s/it]

URL filtered: https://twitter.com/ashishkjha/status/1343768397084053505


Processing URLs:   7%|▋         | 68/1000 [02:46<4:06:04, 15.84s/it]

Error extracting text from http://aa.com.tr/en/asia-pacific/myanmar-troops-jailed-for-civilian-deaths-in-rare-move/646922: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:   7%|▋         | 69/1000 [02:48<3:10:13, 12.26s/it]

Error extracting text from http://atimes.com/2016/02/china-plans-s-china-sea-military-buildup-after-us-warship-makes-second-pass-near-island/: 404 Client Error: Not Found for url: https://atimes.com/2016/02/china-plans-s-china-sea-military-buildup-after-us-warship-makes-second-pass-near-island/


Processing URLs:   7%|▋         | 73/1000 [02:57<1:17:31,  5.02s/it]

Error extracting text from http://www.elmundo.com.ve/noticias/economia/politicas-publicas/sidor-continua-sin-producir-acero-liquido-desde-en.aspx#ixzz47m2NFhKK: HTTPConnectionPool(host='www.elmundo.com.ve', port=80): Max retries exceeded with url: /noticias/economia/politicas-publicas/sidor-continua-sin-producir-acero-liquido-desde-en.aspx (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307f7d3a0>: Failed to resolve 'www.elmundo.com.ve' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   7%|▋         | 74/1000 [03:00<1:07:09,  4.35s/it]

Error extracting text from http://iran.usembassy.gov/: HTTPConnectionPool(host='iran.usembassy.gov', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307c815e0>: Failed to resolve 'iran.usembassy.gov' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   8%|▊         | 79/1000 [03:16<41:17,  2.69s/it]

Error extracting text from http://uk.reuters.com/article/uk-usa-somalia-military/u-s-carries-out-first-strikes-against-islamic-state-in-somalia-idUKKBN1D323D: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk
Error extracting text from http://www.nytimes.com/aponline/2016/02/20/world/europe/ap-eu-britain-europe.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/02/20/world/europe/ap-eu-britain-europe.html


Processing URLs:   8%|▊         | 80/1000 [03:17<34:06,  2.22s/it]

Error extracting text from http://www.fao.org/news/story/en/item/1041473/icode/: 404 Client Error: Not Found for url: https://www.fao.org/news/story/en/item/1041473/icode/


Processing URLs:   8%|▊         | 82/1000 [03:22<35:05,  2.29s/it]

URL filtered: https://www.youtube.com/watch?v=CclLCEHW4ZE


Processing URLs:   9%|▊         | 87/1000 [03:26<16:56,  1.11s/it]

Error extracting text from http://thehill.com/homenews/campaign/261252-gop-in-panic-over-trump: 403 Client Error: Forbidden for url: https://thehill.com/homenews/campaign/261252-gop-in-panic-over-trump/


Processing URLs:   9%|▉         | 88/1000 [03:28<20:45,  1.37s/it]

Error extracting text from https://www.dom.com/nuclear: 403 Client Error: Forbidden for url: https://www.dominionenergy.com/nuclear


Processing URLs:   9%|▉         | 90/1000 [03:29<17:08,  1.13s/it]

Error extracting text from https://cleantechnica.com/2016/07/14/nextev-co-president-electric-supercar-revealed-later-2016/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2016/07/14/nextev-co-president-electric-supercar-revealed-later-2016/


Processing URLs:   9%|▉         | 91/1000 [03:30<17:06,  1.13s/it]

Error extracting text from https://www.legislation.gov.uk/ukpga/2020/7/schedule/22.: 404 Client Error: Not Found for url: https://www.legislation.gov.uk/ukpga/2020/7/schedule/22.


Processing URLs:   9%|▉         | 92/1000 [03:32<18:26,  1.22s/it]

Error extracting text from http://atimes.com/2015/09/japan-accuses-china-over-gas-flares-on-border/: 404 Client Error: Not Found for url: https://atimes.com/2015/09/japan-accuses-china-over-gas-flares-on-border/
Error extracting text from https://www.reuters.com/article/us-russia-nato/russia-gears-up-for-major-war-games-neighbors-watch-with-unease-idUSKCN1BB1KI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-nato/russia-gears-up-for-major-war-games-neighbors-watch-with-unease-idUSKCN1BB1KI


Processing URLs:  10%|▉         | 95/1000 [03:32<08:39,  1.74it/s]

Error extracting text from http://www.reuters.com/article/us-timewarner-att-fcc-idUSKBN166297: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-timewarner-att-fcc-idUSKBN166297
Error extracting text from http://www.washingtontimes.com/news/2016/jan/23/chuck-grassley-attends-donald-trump-rally-iowa/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2016/jan/23/chuck-grassley-attends-donald-trump-rally-iowa/


Processing URLs:  10%|▉         | 97/1000 [03:36<16:12,  1.08s/it]

URL filtered: https://www.prnewswire.com/news-releases/twitter-confirms-receipt-of-unsolicited-non-binding-proposal-from-elon-musk-301525749.html


Processing URLs:  10%|▉         | 99/1000 [03:37<15:12,  1.01s/it]

Error extracting text from http://en.abna24.com/service/pictorial/archive/2016/05/16/754470/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/pictorial/archive/2016/05/16/754470/story.html


Processing URLs:  10%|█         | 102/1000 [03:46<28:52,  1.93s/it]

Error extracting text from http://www.debka.com/article/25749/Iran%E2%80%99s-Tel-Afar-op-is-in-sync-with-Russia-in-Syria: HTTPSConnectionPool(host='www.debka.com', port=443): Max retries exceeded with url: /article/25749/Iran%E2%80%99s-Tel-Afar-op-is-in-sync-with-Russia-in-Syria (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))


Processing URLs:  10%|█         | 103/1000 [03:46<21:50,  1.46s/it]

Error extracting text from http://www.wsj.com/articles/indonesian-warship-fires-on-foreign-fishing-boats-in-south-china-sea-1466350384: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/indonesian-warship-fires-on-foreign-fishing-boats-in-south-china-sea-1466350384


Processing URLs:  10%|█         | 104/1000 [03:48<22:01,  1.48s/it]

URL filtered: https://fullfact.org/blog/2016/dec/facebook-announces-plan-fight-fake-news-factcheckers/


Processing URLs:  11%|█         | 106/1000 [03:51<22:40,  1.52s/it]

Error extracting text from http://english.chinamil.com.cn/news-channels/china-military-news/2015-09/17/content_6686306.htm: 404 Client Error: Not Found for url: http://eng.chinamil.com.cn/news-channels/china-military-news/2015-09/17/content_6686306.htm


Processing URLs:  12%|█▏        | 117/1000 [04:10<21:46,  1.48s/it]

Error extracting text from http://www.nytimes.com/2015/08/27/world/middleeast/tehran-asks-us-to-release-19-iranian-citizens.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/08/27/world/middleeast/tehran-asks-us-to-release-19-iranian-citizens.html


Processing URLs:  12%|█▏        | 119/1000 [04:13<20:34,  1.40s/it]

Error extracting text from http://www.crisis.acleddata.com/local-violence-monitoring/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /local-violence-monitoring/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb7320>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://en.farsnews.com/newstext.aspx?nn=13950307000475: HTTPConnectionPool(host='en.farsnews.com', port=80): Max retries exceeded with url: /newstext.aspx?nn=13950307000475 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x301865fa0>: Failed to resolve 'en.farsnews.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  12%|█▏        | 123/1000 [04:14<10:02,  1.46it/s]

Error extracting text from https://www.nytimes.com/2017/08/07/world/asia/north-korea-responds-sanctions-united-states.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/07/world/asia/north-korea-responds-sanctions-united-states.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news&amp;_r=0


Processing URLs:  12%|█▎        | 125/1000 [04:17<13:33,  1.08it/s]

Error extracting text from https://allafrica.com/stories/202107130131.html: HTTPSConnectionPool(host='allafrica.com', port=443): Max retries exceeded with url: /stories/202107130131.html (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2fedb7a40>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  13%|█▎        | 127/1000 [04:18<10:36,  1.37it/s]

Error extracting text from https://www.us-cert.gov/ncas/alerts/TA17-318B: 403 Client Error: Forbidden for url: https://www.us-cert.gov/ncas/alerts/TA17-318B


Processing URLs:  13%|█▎        | 133/1000 [04:26<15:58,  1.11s/it]

URL filtered: https://www.youtube.com/watch?v=aNUr__-VZeQ


Processing URLs:  14%|█▎        | 137/1000 [04:30<16:54,  1.18s/it]

Error extracting text from https://www.sec.gov/news/public-statement/statement-clayton-2017-12-11: 403 Client Error: Forbidden for url: https://www.sec.gov/news/public-statement/statement-clayton-2017-12-11


Processing URLs:  14%|█▍        | 139/1000 [04:33<15:43,  1.10s/it]

Error extracting text from http://www.businessinsider.com/russia-is-seizing-the-initiative-in-syria-as-the-us-scrambles-for-answers-2015-9: 404 Client Error: Not Found for url: https://www.businessinsider.com/russia-is-seizing-the-initiative-in-syria-as-the-us-scrambles-for-answers-2015-9


Processing URLs:  14%|█▍        | 141/1000 [04:37<21:47,  1.52s/it]

Error extracting text from https://goo.gl/A1I3dw: HTTPSConnectionPool(host='unmannedcargoaircraftconference.com', port=443): Max retries exceeded with url: /speaker/uas-program-transport-unicef-innovation-presented-sara-de-la-rosa-unicef-innovation/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3051d88c0>: Failed to resolve 'unmannedcargoaircraftconference.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  14%|█▍        | 145/1000 [04:43<20:34,  1.44s/it]

Error extracting text from http://www.sowetanlive.co.za/news/2016/06/01/zuma-alone-must-pay---those-who-raise-funds-will-be-on-the-wrong-side-of-the-law: 404 Client Error: Not Found for url: https://www.sowetanlive.co.za/news/2016/06/01/zuma-alone-must-pay---those-who-raise-funds-will-be-on-the-wrong-side-of-the-law


Processing URLs:  15%|█▍        | 147/1000 [04:44<13:16,  1.07it/s]

Error extracting text from https://www.axios.com/steel-prices-hit-lowest-level-year-922ee741-78fd-4b5f-9c7c-89ecbdf2b05b.html: 403 Client Error: Forbidden for url: https://www.axios.com/steel-prices-hit-lowest-level-year-922ee741-78fd-4b5f-9c7c-89ecbdf2b05b.html
Error extracting text from http://www.reuters.com/article/2015/11/30/us-eurozone-greece-debt-idUSKBN0TJ1K720151130#eumIewRXJtAVJGbi.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-eurozone-greece-debt-idUSKBN0TJ1K720151130#eumIewRXJtAVJGbi.97


Processing URLs:  15%|█▌        | 152/1000 [04:56<32:43,  2.32s/it]

Error extracting text from https://www.watson.ch/!688925062?utm_source=whatsapp&amp;utm_medium=social-user&amp;utm_campaign=watson-app-ios: 404 Client Error: Not Found for url: https://www.watson.ch/!688925062?utm_source=whatsapp&amp;utm_medium=social-user&amp;utm_campaign=watson-app-ios
URL filtered: https://www.facebook.com/BlazingFastIO/posts/553624194844018


Processing URLs:  16%|█▌        | 156/1000 [05:00<21:17,  1.51s/it]

URL filtered: https://www.youtube.com/watch?v=Vb8jGHhchnE


Processing URLs:  16%|█▌        | 159/1000 [05:02<14:39,  1.05s/it]

Error extracting text from http://www.nytimes.com/2015/11/18/world/europe/john-kerry-france-isis.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/11/18/world/europe/john-kerry-france-isis.html


Processing URLs:  16%|█▌        | 162/1000 [05:07<18:00,  1.29s/it]

Error extracting text from https://www.google.ca/amp/www.iraqinews.com/iraq-war/bomb-blast-in-mosul-kills-six-isis-members/amp/?client=ms-android-rogers-ca: 404 Client Error: Not Found for url: http://www.iraqinews.com/iraq-war/bomb-blast-in-mosul-kills-six-isis-members/amp/


Processing URLs:  17%|█▋        | 168/1000 [06:19<4:25:36, 19.15s/it]

Error extracting text from http://www.miamiherald.com/news/business/international-business/article48346215.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  17%|█▋        | 170/1000 [06:21<2:15:47,  9.82s/it]

Error extracting text from https://www.reuters.com/article/us-nigeria-security-disease/cholera-hits-camp-for-displaced-in-northeast-nigeria-idUSKCN1BB254: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-security-disease/cholera-hits-camp-for-displaced-in-northeast-nigeria-idUSKCN1BB254


Processing URLs:  17%|█▋        | 173/1000 [06:35<1:27:23,  6.34s/it]

Error extracting text from http://survincity.com/2013/09/first-stinger/: 404 Client Error: Not Found for url: https://survincity.com/2013/09/first-stinger/


Processing URLs:  18%|█▊        | 175/1000 [06:38<53:34,  3.90s/it]

Error extracting text from http://abcnews.go.com/US/wireStory/water-officials-vote-extend-california-drought-emergency-36682284: 404 Client Error: Not Found for url: https://abcnews.go.com/US/wireStory/water-officials-vote-extend-california-drought-emergency-36682284


Processing URLs:  18%|█▊        | 177/1000 [06:39<30:45,  2.24s/it]

Error extracting text from http://www.nytimes.com/2016/02/06/world/middleeast/syria-aleppo.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/06/world/middleeast/syria-aleppo.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news
Error extracting text from https://www.hindustantimes.com/india-news/csiriict-ties-up-with-suven-pharma-for-new-anti-covid-drug-molnupiravir-101623385235875.html: 401 Client Error: Unauthorized for url: https://www.hindustantimes.com/india-news/csiriict-ties-up-with-suven-pharma-for-new-anti-covid-drug-molnupiravir-101623385235875.html


Processing URLs:  18%|█▊        | 180/1000 [06:41<15:56,  1.17s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-nuclear/south-korea-eyes-bigger-warheads-north-korean-icbm-reportedly-on-the-move-idUSKCN1BD0VW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-nuclear/south-korea-eyes-bigger-warheads-north-korean-icbm-reportedly-on-the-move-idUSKCN1BD0VW


Processing URLs:  18%|█▊        | 182/1000 [06:46<22:10,  1.63s/it]

Error extracting text from https://www.reuters.com/article/us-palestinians-reconciliation/palestinian-prime-minister-visits-gaza-in-move-to-reconcile-with-hamas-idUSKCN1C710M: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-palestinians-reconciliation/palestinian-prime-minister-visits-gaza-in-move-to-reconcile-with-hamas-idUSKCN1C710M


Processing URLs:  19%|█▉        | 189/1000 [06:54<15:12,  1.12s/it]

Error extracting text from http://pressroom.toyota.com/releases/april-2016-sales-chart.htm: 403 Client Error: Forbidden for url: http://pressroom.toyota.com/april-2016-sales-chart/


Processing URLs:  19%|█▉        | 190/1000 [06:55<14:06,  1.04s/it]

Error extracting text from http://uk.reuters.com/article/2015/10/29/uk-mideast-crisis-syria-talks-idUKKCN0SN27S20151029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  19%|█▉        | 193/1000 [07:00<17:10,  1.28s/it]

Error extracting text from https://www.us-cert.gov/sites/default/files/publications/JAR_16-20296A_GRIZZLY%20STEPPE-2016-1229.pdf: 403 Client Error: Forbidden for url: https://www.us-cert.gov/sites/default/files/publications/JAR_16-20296A_GRIZZLY%20STEPPE-2016-1229.pdf


Processing URLs:  20%|██        | 205/1000 [07:16<16:22,  1.24s/it]

Error extracting text from http://www.latimes.com/world/la-fg-colombia-peace-accord-20160902-snap-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/la-fg-colombia-peace-accord-20160902-snap-story.html


Processing URLs:  21%|██        | 212/1000 [07:23<11:02,  1.19it/s]

Error extracting text from https://www.unicef.org/about/execboard/index_48248.html: 403 Client Error: Forbidden for url: https://www.unicef.org/about/execboard/index_48248.html


Processing URLs:  21%|██▏       | 214/1000 [08:25<4:07:23, 18.88s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-11-04/how-russian-hackers-pried-into-clinton-campaign-emails: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  22%|██▏       | 217/1000 [08:28<1:31:35,  7.02s/it]

Error extracting text from https://thehill.com/homenews/senate/585192-senate-passes-bill-to-avoid-filibuster-on-debt-ceiling-hike: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/585192-senate-passes-bill-to-avoid-filibuster-on-debt-ceiling-hike/


Processing URLs:  22%|██▏       | 219/1000 [08:32<58:17,  4.48s/it]  

Error extracting text from http://www.rusaviainsider.com/turkey-urges-open-skies-deal-russia/: 403 Client Error: Forbidden for url: https://www.rusaviainsider.com/turkey-urges-open-skies-deal-russia/


Processing URLs:  22%|██▏       | 220/1000 [08:34<50:56,  3.92s/it]

Error extracting text from http://www.dailystar.com.lb/News/Middle-East/2012/Jul-01/178928-britain-considered-knighthood-for-syrias-assad-in-2002-report.ashx: 404 Client Error: Not Found for url: https://dailystar.com.lb/News/Middle-East/2012/Jul-01/178928-britain-considered-knighthood-for-syrias-assad-in-2002-report.ashx


Processing URLs:  22%|██▏       | 223/1000 [08:36<22:03,  1.70s/it]

Error extracting text from http://www.historyofvaccines.org/kennedy-vaccine-panel-trump: 404 Client Error: Not Found for url: https://historyofvaccines.org/kennedy-vaccine-panel-trump
Error extracting text from http://www.reuters.com/article/us-britain-election-scotland-idUSKBN18I2SD: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-election-scotland-idUSKBN18I2SD


Processing URLs:  22%|██▎       | 225/1000 [08:38<17:57,  1.39s/it]

Error extracting text from http://uk.mobile.reuters.com/article/idUKKBN0TJ27X20151130?irpc=932: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUKKBN0TJ27X20151130?irpc=932


Processing URLs:  23%|██▎       | 231/1000 [08:52<27:47,  2.17s/it]

Error extracting text from http://laautoshow.com/debut-vehicles/: 404 Client Error: Not Found for url: https://laautoshow.com/debut-vehicles/


Processing URLs:  23%|██▎       | 234/1000 [08:56<20:05,  1.57s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0RB0W120150911: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0RB0W120150911


Processing URLs:  24%|██▍       | 238/1000 [09:01<14:52,  1.17s/it]

Error extracting text from http://www.nytimes.com/2016/03/25/world/americas/dilma-rousseff-president-of-brazil-resists-calls-for-her-resignation.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/25/world/americas/dilma-rousseff-president-of-brazil-resists-calls-for-her-resignation.html
Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/nh/new_hampshire_democratic_presidential_primary-3351.html#polls: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/nh/new_hampshire_democratic_presidential_primary-3351.html#polls


Processing URLs:  24%|██▍       | 242/1000 [09:09<18:36,  1.47s/it]

Error extracting text from http://www.israelhayom.com/site/newsletter_article.php?id=33745: 403 Client Error: Forbidden for url: https://www.israelhayom.com/site/newsletter_article.php?id=33745


Processing URLs:  25%|██▍       | 246/1000 [09:22<23:17,  1.85s/it]

Error extracting text from http://www.biznews.com/leadership/2016/08/05/anc-set-lose-control-pretoria-johannesburg: 404 Client Error: Not Found for url: https://www.biznews.com/leadership/2016/08/05/anc-set-lose-control-pretoria-johannesburg
Error extracting text from http://www.reuters.com/article/us-usa-iran-navy-idUSKBN16D1X3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-iran-navy-idUSKBN16D1X3


Processing URLs:  25%|██▍       | 248/1000 [09:23<15:10,  1.21s/it]

URL filtered: https://www.youtube.com/watch?v=UTMxfAkxfQ0


Processing URLs:  25%|██▌       | 251/1000 [09:26<12:00,  1.04it/s]

Error extracting text from http://www.straitstimes.com/asia/se-asia/china-to-work-with-others-to-finalise-16-nation-trade-pact-early-xi: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  25%|██▌       | 252/1000 [09:26<11:12,  1.11it/s]

URL filtered: https://www.facebook.com/profile.php?id=100010204140464
URL filtered: https://www.youtube.com/watch?v=InJXAw-MKs4
URL filtered: https://www.bloomberg.com/news/articles/2017-02-15/aetna-ceo-says-obamacare-in-a-death-spiral-with-sick-customers


Processing URLs:  26%|██▌       | 257/1000 [09:28<06:24,  1.93it/s]

Error extracting text from https://www.atr.org/soda-tax-pops-around-world: 403 Client Error: Forbidden for url: https://www.atr.org/soda-tax-pops-around-world


Processing URLs:  26%|██▌       | 259/1000 [09:30<09:29,  1.30it/s]

Error extracting text from http://www.volkswagenag.com/content/vwcorp/content/de/investor_relations/annual_general_meeting.html: 404 Client Error: Not Found for url: https://www.volkswagen-group.com/content/vwcorp/content/de/investor_relations/annual_general_meeting.html


Processing URLs:  26%|██▌       | 260/1000 [09:32<13:35,  1.10s/it]

Error extracting text from http://elcomercio.pe/politica/elecciones/simulacro-ipsos-keiko-fujimori-441-y-ppk-438-segunda-vuelta-noticia-1901609: 404 Client Error: Not Found for url: https://elcomercio.pe/politica/elecciones/simulacro-ipsos-keiko-fujimori-441-y-ppk-438-segunda-vuelta-noticia-1901609/


Processing URLs:  26%|██▌       | 261/1000 [09:32<10:43,  1.15it/s]

Error extracting text from https://www.wsj.com/articles/china-asean-back-framework-for-south-china-sea-talks-1502023247: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-asean-back-framework-for-south-china-sea-talks-1502023247


Processing URLs:  26%|██▌       | 262/1000 [09:36<21:45,  1.77s/it]

Error extracting text from http://vestnikkavkaza.net/news/Jean-Claude-Juncker-calls-to-convene-summit-on-refugees-with-Turkey.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/news/Jean-Claude-Juncker-calls-to-convene-summit-on-refugees-with-Turkey.html


Processing URLs:  26%|██▋       | 263/1000 [09:37<18:38,  1.52s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-NW84YU6KLVS401-3I8MUNB66A0Q43DFAABCL0FTEG


Processing URLs:  27%|██▋       | 267/1000 [09:45<26:54,  2.20s/it]

Error extracting text from http://www.parl.gc.ca/About/House/Compendium/web-content/c_g_legislativeprocess-e.htm: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))


Processing URLs:  27%|██▋       | 271/1000 [10:01<35:38,  2.93s/it]

Error extracting text from https://www.nytimes.com/2017/04/04/world/middleeast/syria-gas-attack.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/04/world/middleeast/syria-gas-attack.html?_r=0


Processing URLs:  27%|██▋       | 272/1000 [10:02<28:02,  2.31s/it]

URL filtered: https://www.youtube.com/watch?v=Hl49w2Zz_6-2017


Processing URLs:  28%|██▊       | 275/1000 [10:06<19:21,  1.60s/it]

Error extracting text from https://www.reuters.com/article/us-germany-election-lower-saxony/state-vote-unlikely-to-give-merkel-boost-in-german-coalition-talks-idUSKBN1CJ0T5: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-lower-saxony/state-vote-unlikely-to-give-merkel-boost-in-german-coalition-talks-idUSKBN1CJ0T5


Processing URLs:  28%|██▊       | 278/1000 [10:11<21:25,  1.78s/it]

Error extracting text from http://www.cfr.org/immigration/us-supreme-court-obamas-immigration-actions/p37630: 404 Client Error: Not Found for url: https://www.cfr.org/immigration/us-supreme-court-obamas-immigration-actions/p37630


Processing URLs:  28%|██▊       | 280/1000 [10:16<23:49,  1.99s/it]

URL filtered: https://twitter.com/vonderleyen/status/1497695982020145156


Processing URLs:  28%|██▊       | 283/1000 [10:17<12:13,  1.02s/it]

Error extracting text from http://www.steptoeinternationalcomplianceblog.com/2017/04/trump-administration-certifies-irans-compliance-with-nuclear-deal-but-initiates-review-of-sanctions-relief/: 403 Client Error: Forbidden for url: https://www.steptoeinternationalcomplianceblog.com/2017/04/trump-administration-certifies-irans-compliance-with-nuclear-deal-but-initiates-review-of-sanctions-relief/


Processing URLs:  29%|██▊       | 286/1000 [10:23<17:28,  1.47s/it]

Error extracting text from http://www.cdm.me/english/azerbaijanis-want-to-build-a-luxury-ski-resort-on-zabljak: 403 Client Error: Forbidden for url: https://www.cdm.me/english/azerbaijanis-want-to-build-a-luxury-ski-resort-on-zabljak


Processing URLs:  29%|██▉       | 288/1000 [10:28<21:17,  1.79s/it]

Error extracting text from https://www.iata.org/en/pressroom/2021-releases/2021-09-01-01/: 404 Client Error: Not Found for url: https://www.iata.org/en/pressroom/2021-releases/2021-09-01-01/


Processing URLs:  29%|██▉       | 291/1000 [10:31<13:42,  1.16s/it]

Error extracting text from http://post.understandingwar.org/sites/default/files/Iraq%20Blobby%20map%2025%20AUG%202016.pdf: HTTPConnectionPool(host='post.understandingwar.org', port=80): Max retries exceeded with url: /sites/default/files/Iraq%20Blobby%20map%2025%20AUG%202016.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3065cdd00>: Failed to resolve 'post.understandingwar.org' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.nytimes.com/2016/06/15/business/dealbook/the-global-stakes-of-a-saudi-ipo.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/15/business/dealbook/the-global-stakes-of-a-saudi-ipo.html?_r=0


Processing URLs:  29%|██▉       | 292/1000 [10:31<10:09,  1.16it/s]

Error extracting text from http://www.cdm.me/english/north-atlantic-council-we-are-working-on-montenegros-invitation: 403 Client Error: Forbidden for url: https://www.cdm.me/english/north-atlantic-council-we-are-working-on-montenegros-invitation


Processing URLs:  29%|██▉       | 293/1000 [11:00<1:49:07,  9.26s/it]

Error extracting text from https://www.dailyfx.com/forex/market_alert/2016/10/04/IMF-Cuts-Global-Growth-Projections-Amid-Woes-in-Advanced-Economies.html: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-31-january-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-31-january-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051db650>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  30%|██▉       | 296/1000 [11:02<49:41,  4.24s/it]  

URL filtered: https://twitter.com/elonmusk/status/1064741356080222209?lang=en


Processing URLs:  30%|███       | 301/1000 [11:06<19:08,  1.64s/it]

Error extracting text from https://cameronwebb.files.wordpress.com/2013/05/gollybardbatandmosquito.jpg?w=584: 404 Client Error: Not Found for url: https://cameronwebb.files.wordpress.com/2013/05/gollybardbatandmosquito.jpg?w=584


Processing URLs:  30%|███       | 303/1000 [11:11<25:55,  2.23s/it]

Error extracting text from http://www.rollcall.com/news/politics/poll-russ-feingold-leads-ron-johnson-senate: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/poll-russ-feingold-leads-ron-johnson-senate


Processing URLs:  31%|███       | 306/1000 [11:18<23:15,  2.01s/it]

Error extracting text from https://www.newsweek.com/john-hopkins-doctor-thinks-covid-will-largely-gone-april-half-us-has-herd-immunity-1570615: 403 Client Error: Forbidden for url: https://www.newsweek.com/john-hopkins-doctor-thinks-covid-will-largely-gone-april-half-us-has-herd-immunity-1570615


Processing URLs:  31%|███       | 307/1000 [11:20<24:50,  2.15s/it]

URL filtered: https://www.bloomberg.com/politics/articles/2017-04-26/trump-aides-in-raging-debate-over-how-quickly-to-move-on-nafta


Processing URLs:  31%|███       | 310/1000 [11:21<12:10,  1.06s/it]

Error extracting text from http://www.wsj.com/articles/syrian-government-plans-to-retake-aleppo-with-russian-support-1460296021: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syrian-government-plans-to-retake-aleppo-with-russian-support-1460296021


Processing URLs:  31%|███       | 312/1000 [11:24<13:12,  1.15s/it]

Error extracting text from https://freedomhouse.org/report/freedom-world/2016/dominican-republic: 404 Client Error: Not Found for url: https://freedomhouse.org/report/freedom-world/2016/dominican-republic


Processing URLs:  31%|███▏      | 313/1000 [11:24<10:21,  1.11it/s]

Error extracting text from https://news.yahoo.com/ethiopia-army-says-4-key-120308133.html?guccounter=1&amp;guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&amp;guce_referrer_sig=AQAAAD4ogkpRjs0JSUmDZtuFBOBWcuSE9hs1wpQ60djQyswoqnZKtrwrheJNlTRr3d5ayMAegTKm85aKnFqyIarpmplylCzK4AciV_Gb-wsi_PpEAQssWK9YgmycqMYYa67pXP9LScEyfjsx3ULxRv2TT0cg06pC1NP-KXe-6KR9bubw: 404 Client Error: Not Found for url: https://news.yahoo.com/ethiopia-army-says-4-key-120308133.html?guccounter=1&amp;guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&amp;guce_referrer_sig=AQAAAD4ogkpRjs0JSUmDZtuFBOBWcuSE9hs1wpQ60djQyswoqnZKtrwrheJNlTRr3d5ayMAegTKm85aKnFqyIarpmplylCzK4AciV_Gb-wsi_PpEAQssWK9YgmycqMYYa67pXP9LScEyfjsx3ULxRv2TT0cg06pC1NP-KXe-6KR9bubw


Processing URLs:  32%|███▏      | 316/1000 [11:26<07:16,  1.57it/s]

Error extracting text from http://www.reuters.com/article/us-health-birdflu-china-idUSKBN19X0CL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-health-birdflu-china-idUSKBN19X0CL


Processing URLs:  32%|███▏      | 318/1000 [11:27<07:50,  1.45it/s]

Error extracting text from http://thehill.com/policy/finance/262171-obama-signs-305b-highway-bill: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/262171-obama-signs-305b-highway-bill/


Processing URLs:  32%|███▏      | 323/1000 [11:33<08:35,  1.31it/s]

Error extracting text from https://www.consilium.europa.eu/en/meetings/european-council/2020/12/10-11/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/meetings/european-council/2020/12/10-11/
Error extracting text from http://www.reuters.com/article/us-usa-trump-highlights-idUSKBN15721X?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-highlights-idUSKBN15721X?il=0


Processing URLs:  33%|███▎      | 326/1000 [11:33<04:54,  2.29it/s]

Error extracting text from http://www.reuters.com/article/us-afghanistan-taliban-idUSKCN0XM0TN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-afghanistan-taliban-idUSKCN0XM0TN


Processing URLs:  33%|███▎      | 329/1000 [11:39<15:17,  1.37s/it]

Error extracting text from http://uk.reuters.com/article/uk-iran-satellite-un-idUKKBN1AI1UJ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  33%|███▎      | 332/1000 [11:44<16:31,  1.48s/it]

Error extracting text from http://www.latimes.com/nation/nationnow/la-na-dakota-pipeline-judge-ruling-20170307-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/nation/nationnow/la-na-dakota-pipeline-judge-ruling-20170307-story.html


Processing URLs:  33%|███▎      | 334/1000 [12:46<3:31:27, 19.05s/it]

Error extracting text from http://aa.com.tr/en/politics/-myanmar-marchers-demand-suu-kyi-does-not-become-president/528678: HTTPConnectionPool(host='aa.com.tr', port=80): Read timed out. (read timeout=60)


Processing URLs:  34%|███▎      | 336/1000 [12:48<1:48:48,  9.83s/it]

Error extracting text from http://www.nytimes.com/.../how-long-does-it-take-to-confirm-a-supreme-court-nominee.html: 404 Client Error: Not Found for url: https://archive.nytimes.com/www.nytimes.com/.../how-long-does-it-take-to-confirm-a-supreme-court-nominee.html


Processing URLs:  34%|███▍      | 338/1000 [13:51<4:18:31, 23.43s/it]

Error extracting text from http://www.usnews.com/news/articles/2016-04-13/how-to-protect-nuclear-plants-from-terrorists: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  34%|███▍      | 342/1000 [14:03<1:25:45,  7.82s/it]

Error extracting text from http://advanced.jhu.edu/about-us/faculty/sarah-miller-beebe/: 404 Client Error: Not Found for url: https://advanced.jhu.edu/about-us/faculty/sarah-miller-beebe/


Processing URLs:  34%|███▍      | 345/1000 [14:08<39:39,  3.63s/it]  

Error extracting text from https://www.nytimes.com/2021/04/10/world/europe/covid-russia-death.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/04/10/world/europe/covid-russia-death.html


Processing URLs:  35%|███▌      | 350/1000 [14:21<18:49,  1.74s/it]  

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-mueller-subpoenas-lobbying-firms-russia-investigation-20170825-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/politics/ct-mueller-subpoenas-lobbying-firms-russia-investigation-20170825-story.html
URL filtered: http://fivethirtyeight.com/features/does-donald-trump-have-a-ceiling/?ex_cid=538twitter
Error extracting text from http://www.balkaninsight.com/en/article/montenegro-eyes-nato-membership-invitation-11-30-2015: 403 Client Error: Forbidden for url: http://www.balkaninsight.com/en/article/montenegro-eyes-nato-membership-invitation-11-30-2015


Processing URLs:  35%|███▌      | 352/1000 [14:24<18:26,  1.71s/it]

Error extracting text from http://beltroadresearch.com/: HTTPConnectionPool(host='beltroadresearch.com', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff7179e0>: Failed to resolve 'beltroadresearch.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  36%|███▌      | 356/1000 [14:26<08:16,  1.30it/s]

Error extracting text from http://www.reuters.com/article/us-israel-palestinians-un-trump-idUSKBN14C23I: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-israel-palestinians-un-trump-idUSKBN14C23I
URL filtered: https://twitter.com/navalny
Error extracting text from http://splash247.com/panama-canal-expansion-deadline-pushed-back/: 403 Client Error: Forbidden for url: https://splash247.com/panama-canal-expansion-deadline-pushed-back/


Processing URLs:  36%|███▌      | 357/1000 [14:28<11:30,  1.07s/it]

Error extracting text from https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&amp;s=wcrfpus2: 404 Client Error: Not Found for url: https://www.eia.gov/dnav/GenericErrorPage.aspx?aspxerrorpath=/dnav/pet/hist/LeafHandler.ashx


Processing URLs:  36%|███▌      | 359/1000 [14:30<10:44,  1.01s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Abe-looks-to-follow-up-Putin-summit-with-security-talks: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Abe-looks-to-follow-up-Putin-summit-with-security-talks


Processing URLs:  36%|███▌      | 360/1000 [14:31<10:57,  1.03s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/italian-church-groups-open-refugee-humanitarian-corridors-36419007: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/italian-church-groups-open-refugee-humanitarian-corridors-36419007


Processing URLs:  37%|███▋      | 367/1000 [14:40<12:07,  1.15s/it]

Error extracting text from https://www.axios.com/us-iran-nuclear-deal-talks-raisi-inauguration-ed00555e-db84-4d26-9e2b-1ad8b21f9fc5.html: 403 Client Error: Forbidden for url: https://www.axios.com/us-iran-nuclear-deal-talks-raisi-inauguration-ed00555e-db84-4d26-9e2b-1ad8b21f9fc5.html


Processing URLs:  37%|███▋      | 368/1000 [14:43<16:37,  1.58s/it]

URL filtered: http://www.bloomberg.com/news/articles/2015-10-19/venezuela-foreign-reserves-fall-to-12-year-low-as-payments-loom


Processing URLs:  37%|███▋      | 370/1000 [14:45<15:05,  1.44s/it]

Error extracting text from https://amti.csis.org/prepare-stormy-2017-south-china-sea/: 403 Client Error: Forbidden for url: https://amti.csis.org/prepare-stormy-2017-south-china-sea/


Processing URLs:  37%|███▋      | 372/1000 [14:45<09:49,  1.07it/s]

Error extracting text from http://www.wsj.com/articles/south-african-rand-plunges-to-record-low-1452165505: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/south-african-rand-plunges-to-record-low-1452165505


Processing URLs:  37%|███▋      | 374/1000 [14:47<09:27,  1.10it/s]

Error extracting text from http://cleantechnica.com/2016/03/18/china-electric-car-sales-may-not-be-as-rosy-as-thought-lux-research-writes/: 403 Client Error: Forbidden for url: http://cleantechnica.com/2016/03/18/china-electric-car-sales-may-not-be-as-rosy-as-thought-lux-research-writes/


Processing URLs:  38%|███▊      | 377/1000 [15:01<41:10,  3.97s/it]

Error extracting text from https://www.washingtonpost.com/world/africa/burundi-grenade-attacks-target-army-hospital-in-the-capital/2016/03/24/4363aa7a-f1c4-11e5-a2a3-d4e9697917d1_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/africa/burundi-grenade-attacks-target-army-hospital-in-the-capital/2016/03/24/4363aa7a-f1c4-11e5-a2a3-d4e9697917d1_story.html
URL filtered: https://www.politico.com/news/2020/12/26/anti-facebook-agitators-biden-era-450347


Processing URLs:  38%|███▊      | 383/1000 [15:21<34:46,  3.38s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/rcep-negotiations-still/2283112.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/rcep-negotiations-still/2283112.html


Processing URLs:  38%|███▊      | 384/1000 [15:22<26:56,  2.62s/it]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Putin-s-draw-not-so-even: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Putin-s-draw-not-so-even


Processing URLs:  39%|███▊      | 386/1000 [15:24<17:43,  1.73s/it]

Error extracting text from http://www.nytimes.com/aponline/2016/03/24/world/africa/ap-af-burundi-violence.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/aponline/2016/03/24/world/africa/ap-af-burundi-violence.html?_r=0


Processing URLs:  39%|███▊      | 387/1000 [15:25<14:14,  1.39s/it]

Error extracting text from http://www.laht.com/article.asp?ArticleId=2426922&amp;CategoryId=10717: 404 Client Error: Not Found for url: http://www.laht.com/article.asp?ArticleId=2426922&amp;CategoryId=10717


Processing URLs:  39%|███▉      | 390/1000 [15:28<13:01,  1.28s/it]

Error extracting text from https://www.expouav.com/news/latest/perfect-union-us-regulations-trend-toward-coexistence-uavs-aircraft/: 403 Client Error: Forbidden for url: https://www.expouav.com/news/latest/perfect-union-us-regulations-trend-toward-coexistence-uavs-aircraft/


Processing URLs:  40%|███▉      | 395/1000 [15:46<33:31,  3.33s/it]

Error extracting text from https://deepmind.com/research/alphago/: 404 Client Error: Not Found for url: https://deepmind.google/research/alphago/


Processing URLs:  40%|███▉      | 398/1000 [15:51<18:23,  1.83s/it]

Error extracting text from http://www.sec.gov/rules/other/2015/investors-exchange-form-1.htm: 403 Client Error: Forbidden for url: http://www.sec.gov/rules/other/2015/investors-exchange-form-1.htm
Error extracting text from http://www.nato.int/docu/review/2007/issue2/english/art3.html: 403 Client Error: Forbidden for url: http://www.nato.int/docu/review/2007/issue2/english/art3.html


Processing URLs:  40%|████      | 403/1000 [15:57<12:29,  1.26s/it]

Error extracting text from http://www.nytimes.com/2016/03/18/us/politics/merrick-garlands-record-and-style-hint-at-his-appeal.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/18/us/politics/merrick-garlands-record-and-style-hint-at-his-appeal.html


Processing URLs:  41%|████      | 406/1000 [15:59<09:36,  1.03it/s]

Error extracting text from https://seekingalpha.com/article/4092703-t-time-warner-done: 403 Client Error: Forbidden for url: https://seekingalpha.com/article/4092703-t-time-warner-done


Processing URLs:  41%|████      | 408/1000 [18:01<6:04:17, 36.92s/it]

Error extracting text from http://performance.morningstar.com/Performance/index-c/performance-return.action?t=SPX: HTTPConnectionPool(host='performance.morningstar.com', port=80): Max retries exceeded with url: /Performance/index-c/performance-return.action?t=SPX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3081a5370>, 'Connection to performance.morningstar.com timed out. (connect timeout=60)'))


Processing URLs:  41%|████      | 411/1000 [18:05<2:12:16, 13.47s/it]

Error extracting text from http://www.scout.com/military/warrior/story/1683721-us-attacks-on-isis-in-mosul-kill-leaders: 403 Client Error: Forbidden for url: https://247sports.com/


Processing URLs:  42%|████▏     | 419/1000 [18:23<24:24,  2.52s/it]  

Error extracting text from http://www.ibtimes.com/senior-russian-diplomat-tells-assad-step-back-line-2314473: 403 Client Error: Forbidden for url: https://www.ibtimes.com/senior-russian-diplomat-tells-assad-step-back-line-2314473
URL filtered: https://m.youtube.com/watch?v=DZGCJVnhMRc
Error extracting text from http://www.reuters.com/article/us-un-election-idUSKCN12D215?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-election-idUSKCN12D215?il=0


Processing URLs:  42%|████▏     | 421/1000 [18:25<17:35,  1.82s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-aramco-ipo-idUSKBN1710RS?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-aramco-ipo-idUSKBN1710RS?il=0


Processing URLs:  43%|████▎     | 428/1000 [19:37<2:13:49, 14.04s/it]

Error extracting text from http://www.nytimes.com/2016/08/17/us/pennsylvania-attorney-general-kathleen-kane-resigns.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/17/us/pennsylvania-attorney-general-kathleen-kane-resigns.html?_r=0


Processing URLs:  43%|████▎     | 429/1000 [19:39<1:38:19, 10.33s/it]

URL filtered: https://www.youtube.com/watch?v=u44UqeUBphY


Processing URLs:  43%|████▎     | 431/1000 [19:40<55:32,  5.86s/it]  

Error extracting text from http://www.reuters.com/article/us-iran-election-vote-idUSKCN0VZ0E7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-election-vote-idUSKCN0VZ0E7


Processing URLs:  43%|████▎     | 432/1000 [19:41<44:18,  4.68s/it]

Error extracting text from http://economictimes.indiatimes.com/articleshow/52300898.cms: 410 Client Error: Gone for url: https://economictimes.indiatimes.com/articleshow/52300898.cms


Processing URLs:  43%|████▎     | 434/1000 [19:44<29:08,  3.09s/it]

Error extracting text from https://www.nytimes.com/2017/12/06/us/politics/tax-bill-obamacare-mandate-collins.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/06/us/politics/tax-bill-obamacare-mandate-collins.html


Processing URLs:  44%|████▍     | 438/1000 [19:52<19:12,  2.05s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-aramco-idUSKCN0Y10XL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-aramco-idUSKCN0Y10XL


Processing URLs:  44%|████▍     | 441/1000 [20:00<22:26,  2.41s/it]

Error extracting text from http://www.insidesources.com/the-future-of-britain-is-on-the-ballot/: 403 Client Error: Forbidden for url: https://insidesources.com/the-future-of-britain-is-on-the-ballot/


Processing URLs:  44%|████▍     | 443/1000 [20:02<16:20,  1.76s/it]

Error extracting text from http://uk.reuters.com/article/uk-iran-missiles-un-idUKKCN0WG1NE: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  45%|████▍     | 446/1000 [20:05<10:43,  1.16s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2017/01/30/0200000000AEN20170130001100315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:  45%|████▍     | 449/1000 [20:07<07:57,  1.15it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-10-25/amazon-s-dream-of-drone-deliveries-get-closer-with-trump-order


Processing URLs:  45%|████▌     | 453/1000 [20:10<07:11,  1.27it/s]

URL filtered: https://www.bloomberg.com/news/articles/2017-01-03/tesla-falls-as-quarterly-deliveries-trail-analysts-estimates


Processing URLs:  46%|████▌     | 458/1000 [20:12<04:26,  2.03it/s]

Error extracting text from https://www.reuters.com/business/energy/whats-behind-wild-surges-global-lng-prices-risks-ahead-2021-10-01/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/whats-behind-wild-surges-global-lng-prices-risks-ahead-2021-10-01/
Error extracting text from http://www.investinganswers.com/financial-dictionary/economics/federal-funds-rate-62: 403 Client Error: Forbidden for url: http://www.investinganswers.com/financial-dictionary/economics/federal-funds-rate-62


Processing URLs:  46%|████▋     | 464/1000 [20:23<10:21,  1.16s/it]

Error extracting text from https://www.riverineherald.com.au/world/2021/06/29/4557836/new-zealands-vaccine-stocks-dwindling: 500 Server Error: Internal Server Error for url: https://www.riverineherald.com.au/world/2021/06/29/4557836/new-zealands-vaccine-stocks-dwindling


Processing URLs:  46%|████▋     | 465/1000 [20:23<09:27,  1.06s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-nireland-idUKKBN16Z1G0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  47%|████▋     | 467/1000 [20:28<13:30,  1.52s/it]

Error extracting text from https://cleantechnica.com/2013/10/06/an-island-tokelau-powered-100-by-solar-energy/: 403 Client Error: Forbidden for url: https://cleantechnica.com/2013/10/06/an-island-tokelau-powered-100-by-solar-energy/


Processing URLs:  47%|████▋     | 470/1000 [20:32<11:03,  1.25s/it]

Error extracting text from http://www.theepochtimes.com/n3/2186725-exclusive-chinas-anti-corruption-chief-will-be-exempt-from-forced-retirement/: 410 Client Error: Gone for url: https://www.theepochtimes.com/n3/2186725-exclusive-chinas-anti-corruption-chief-will-be-exempt-from-forced-retirement/
Error extracting text from http://www.reuters.com/article/us-southchinasea-malaysia-idUSKCN0YM2SV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-malaysia-idUSKCN0YM2SV


Processing URLs:  47%|████▋     | 472/1000 [20:35<10:45,  1.22s/it]

Error extracting text from https://www.wsj.com/articles/fed-tees-up-taper-and-signals-rate-rises-possible-next-year-11632333617?mod=hp_lead_pos3: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-tees-up-taper-and-signals-rate-rises-possible-next-year-11632333617?mod=hp_lead_pos3
URL filtered: https://www.linkedin.com/pulse/panamas-new-locks-aaron-ahlburn?trk=hb_ntf_MEGAPHONE_ARTICLE_POST


Processing URLs:  47%|████▋     | 474/1000 [20:35<06:16,  1.40it/s]

Error extracting text from https://www.nytimes.com/2020/11/17/world/as-brazils-virus-crisis-eases-jair-bolsonaros-popularity-rises.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/11/17/world/as-brazils-virus-crisis-eases-jair-bolsonaros-popularity-rises.html


Processing URLs:  48%|████▊     | 479/1000 [20:42<10:09,  1.17s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu-davis/uk-brexit-minister-says-eu-agreement-likely-but-uk-ready-for-no-deal-idUSKBN1DL19Q: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-davis/uk-brexit-minister-says-eu-agreement-likely-but-uk-ready-for-no-deal-idUSKBN1DL19Q


Processing URLs:  48%|████▊     | 481/1000 [20:47<14:16,  1.65s/it]

Error extracting text from http://38north.org/2017/04/ychang042517/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  48%|████▊     | 484/1000 [20:49<08:31,  1.01it/s]

Error extracting text from https://www.nytimes.com/2017/07/18/us/politics/trump-meeting-russia.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/18/us/politics/trump-meeting-russia.html
Error extracting text from https://balkaninsight.com/2021/05/12/bulgarias-caretaker-govt-to-maintain-north-macedonia-blockade/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2021/05/12/bulgarias-caretaker-govt-to-maintain-north-macedonia-blockade/


Processing URLs:  48%|████▊     | 485/1000 [20:49<06:23,  1.34it/s]

Error extracting text from http://www.nytimes.com/2016/03/01/world/middleeast/after-gains-against-isis-american-focus-is-turning-to-mosul.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/01/world/middleeast/after-gains-against-isis-american-focus-is-turning-to-mosul.html?_r=0


Processing URLs:  49%|████▉     | 488/1000 [20:53<10:37,  1.24s/it]

Error extracting text from http://tass.ru/opinions/interviews/3670759: 404 Client Error: Not Found for url: https://tass.ru/opinions/interviews/3670759


Processing URLs:  49%|████▉     | 490/1000 [20:55<09:35,  1.13s/it]

Error extracting text from http://www.yenisafak.com/en/news/cias-clandestine-meeting-in-istanbul-on-coup-night-2499850: 422 Client Error:  for url: http://www.yenisafak.com/en/news/cias-clandestine-meeting-in-istanbul-on-coup-night-2499850


Processing URLs:  49%|████▉     | 492/1000 [21:02<20:52,  2.47s/it]

Error extracting text from http://www.osp.od.nih.gov/office-biotechnology-activities/biomedical-technology-assessment/hgt/rac: 404 Client Error: Not Found for url: https://osp.od.nih.gov/office-biotechnology-activities/biomedical-technology-assessment/hgt/rac


Processing URLs:  49%|████▉     | 493/1000 [21:02<15:34,  1.84s/it]

Error extracting text from https://www.hklaw.com/en/insights/publications/2021/03/community-project-funding-117th-congress-revives-the-earmark-process: 403 Client Error: Forbidden for url: https://www.hklaw.com/en/insights/publications/2021/03/community-project-funding-117th-congress-revives-the-earmark-process


Processing URLs:  50%|████▉     | 497/1000 [21:11<17:28,  2.08s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53125#.VrO-sqVgmcw: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53125#.VrO-sqVgmcw


Processing URLs:  50%|████▉     | 499/1000 [21:14<16:21,  1.96s/it]

Error extracting text from http://www.crisis.acleddata.com/update-burundi-local-data-on-recent-unrest-26-apr-2015-13-march-2016/: HTTPConnectionPool(host='www.crisis.acleddata.com', port=80): Max retries exceeded with url: /update-burundi-local-data-on-recent-unrest-26-apr-2015-13-march-2016/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ff3d9460>: Failed to resolve 'www.crisis.acleddata.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|█████     | 502/1000 [21:15<08:23,  1.01s/it]

Error extracting text from http://afghanistantimes.af/uncertainty-in-electoral-calendar-challenges-nug-legitimacy-un-envoy/: 403 Client Error: Forbidden for url: https://afghanistantimes.af/uncertainty-in-electoral-calendar-challenges-nug-legitimacy-un-envoy/


Processing URLs:  50%|█████     | 503/1000 [21:17<10:15,  1.24s/it]

Error extracting text from http://www.ibtimes.com/south-china-sea-chinese-ships-have-left-philippines-may-still-protest-2329206: 403 Client Error: Forbidden for url: https://www.ibtimes.com/south-china-sea-chinese-ships-have-left-philippines-may-still-protest-2329206


Processing URLs:  51%|█████     | 510/1000 [23:30<1:19:58,  9.79s/it]

Error extracting text from https://www.africa.com/dine-by-night-free-entry-to-expo-2020-dubai/: 403 Client Error: Forbidden for url: https://www.africa.com/dine-by-night-free-entry-to-expo-2020-dubai/


Processing URLs:  51%|█████     | 512/1000 [23:34<45:29,  5.59s/it]

Error extracting text from http://www.reuters.com/article/us-china-wangyi-korea-usa-idUSKCN0VL15S: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-wangyi-korea-usa-idUSKCN0VL15S


Processing URLs:  51%|█████▏    | 513/1000 [23:34<33:00,  4.07s/it]

Error extracting text from http://www.raps.org/Regulatory-Focus/News/2017/01/20/26651/FDA-Begins-Accepting-Regenerative-Therapy-Applications-for-RAT-Designation/: 403 Client Error: Forbidden for url: https://www.raps.org/Regulatory-Focus/News/2017/01/20/26651/FDA-Begins-Accepting-Regenerative-Therapy-Applications-for-RAT-Designation/


Processing URLs:  52%|█████▏    | 516/1000 [23:41<24:52,  3.08s/it]

Error extracting text from https://ke.usembassy.gov/al-shabaab-threats-to-western-and-kenyan-targets/: 404 Client Error: Not Found for url: https://ke.usembassy.gov/al-shabaab-threats-to-western-and-kenyan-targets/


Processing URLs:  52%|█████▏    | 519/1000 [23:48<18:48,  2.35s/it]

Error extracting text from http://www.newsweek.com/china-military-base-live-fire-djibouti-africa-670588: 403 Client Error: Forbidden for url: https://www.newsweek.com/china-military-base-live-fire-djibouti-africa-670588


Processing URLs:  52%|█████▏    | 524/1000 [23:59<15:36,  1.97s/it]

Error extracting text from http://www.stripes.com/news/pentagon-hesitant-to-commit-to-no-fly-zone-given-challenges-1.380304: 404 Client Error: Not Found for url: https://www.stripes.com:443/news/pentagon-hesitant-to-commit-to-no-fly-zone-given-challenges-1.380304


Processing URLs:  53%|█████▎    | 528/1000 [24:03<10:00,  1.27s/it]

Error extracting text from http://www.wsj.com/articles/venezuelas-pdvsa-says-all-options-are-open-if-debt-exchange-fails-1476832992: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/venezuelas-pdvsa-says-all-options-are-open-if-debt-exchange-fails-1476832992


Processing URLs:  53%|█████▎    | 530/1000 [24:05<08:03,  1.03s/it]

Error extracting text from http://www.ibtimes.com/us-considering-intermediate-range-missiles-deployment-retaliation-russias-alleged-2360888: 403 Client Error: Forbidden for url: https://www.ibtimes.com/us-considering-intermediate-range-missiles-deployment-retaliation-russias-alleged-2360888
Error extracting text from http://www.macrotrends.net/2324/sp-500-historical-chart-data: 403 Client Error: Forbidden for url: http://www.macrotrends.net/2324/sp-500-historical-chart-data


Processing URLs:  53%|█████▎    | 533/1000 [24:09<10:14,  1.32s/it]

Error extracting text from http://data.unhcr.org/syrianrefugees/country.php?id=107: 404 Client Error: Not Found for url: https://data.unhcr.org:443/syrianrefugees/country.php?id=107


Processing URLs:  53%|█████▎    | 534/1000 [24:10<09:24,  1.21s/it]

Error extracting text from http://news.xinhuanet.com/english/2016-01/21/c_135032844.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/2016-01/21/c_135032844.htm


Processing URLs:  54%|█████▍    | 539/1000 [24:20<15:19,  1.99s/it]

Error extracting text from https://www.thecipherbrief.com/article/tech/new-technology-humanitarian-assistance-1092: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/article/tech/new-technology-humanitarian-assistance-1092


Processing URLs:  54%|█████▍    | 541/1000 [24:48<1:02:02,  8.11s/it]

Error extracting text from https://mobile.almasdarnews.com/article/turkish-regime-shifts-stance-assads-future-syria/: 522 Server Error:  for url: https://www.almasdarnews.com/article/turkish-regime-shifts-stance-assads-future-syria/


Processing URLs:  54%|█████▍    | 543/1000 [24:49<31:58,  4.20s/it]

Error extracting text from http://www.npr.org/news/specials/putin/biotimeline.html: 404 Client Error: Not Found for url: https://www.npr.org/news/specials/putin/biotimeline.html
Error extracting text from https://www.middleeastmonitor.com/20160721-turkey-syria-assad-and-the-deliberate-mystery/: 403 Client Error: Forbidden for url: https://www.middleeastmonitor.com/20160721-turkey-syria-assad-and-the-deliberate-mystery/
URL filtered: http://digiday.com/publishers/german-facebook-fake-news/


Processing URLs:  55%|█████▍    | 546/1000 [25:22<1:14:03,  9.79s/it]

Error extracting text from http://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18200-daw-aung-san-suu-kyi-meets-eight-ceasefire-signatories.html: 522 Server Error:  for url: https://www.mmtimes.com/index.php/national-news/nay-pyi-taw/18200-daw-aung-san-suu-kyi-meets-eight-ceasefire-signatories.html


Processing URLs:  55%|█████▍    | 548/1000 [25:25<45:07,  5.99s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538446/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538446/


Processing URLs:  55%|█████▌    | 550/1000 [25:27<26:38,  3.55s/it]

Error extracting text from http://www.opec.org/opec_web/en/press_room/2313.htm: 403 Client Error: Forbidden for url: http://www.opec.org/opec_web/en/press_room/2313.htm


Processing URLs:  55%|█████▌    | 551/1000 [25:28<20:33,  2.75s/it]

Error extracting text from http://www.foreignpolicyi.org/content/fpi-bulletin-venezuela%E2%80%99s-election-peril: 403 Client Error: Forbidden for url: http://www.foreignpolicyi.org/content/fpi-bulletin-venezuela%E2%80%99s-election-peril


Processing URLs:  55%|█████▌    | 553/1000 [25:30<14:50,  1.99s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/270207-kasich-touts-electability-against-clinton: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/270207-kasich-touts-electability-against-clinton/


Processing URLs:  56%|█████▌    | 558/1000 [25:36<07:12,  1.02it/s]

Error extracting text from http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347902A1.pdf: 403 Client Error: Forbidden for url: http://transition.fcc.gov/Daily_Releases/Daily_Business/2017/db1121/DOC-347902A1.pdf


Processing URLs:  56%|█████▌    | 561/1000 [25:38<06:13,  1.18it/s]

Error extracting text from http://www.wsj.com/articles/german-prosecutors-california-open-fresh-investigations-into-vw-cheating-1448488852: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/german-prosecutors-california-open-fresh-investigations-into-vw-cheating-1448488852


Processing URLs:  56%|█████▋    | 563/1000 [25:49<19:58,  2.74s/it]

Error extracting text from http://www.reuters.com/article/2015/05/07/us-southchinasea-: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/05/07/us-southchinasea-


Processing URLs:  56%|█████▋    | 564/1000 [25:50<15:19,  2.11s/it]

Error extracting text from http://www.reuters.com/article/us-turkey-politics-germany-idUSKBN17O0ON?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-politics-germany-idUSKBN17O0ON?il=0


Processing URLs:  57%|█████▋    | 569/1000 [25:54<07:16,  1.01s/it]

Error extracting text from https://www.centerforsecuritypolicy.org/2021/01/27/dhss-unintelligible-threat-bulletin/: 403 Client Error: Forbidden for url: https://www.centerforsecuritypolicy.org/2021/01/27/dhss-unintelligible-threat-bulletin/
Error extracting text from http://www.reuters.com/article/us-global-markets-idUSKBN14W01R: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-markets-idUSKBN14W01R


Processing URLs:  57%|█████▋    | 570/1000 [25:55<05:30,  1.30it/s]

Error extracting text from http://www.nytimes.com/2016/01/17/us/politics/hillary-clinton-regrets-not-attacking-bernie-sanders-earlier-her-allies-say.html?hp&amp;target=comments&amp;_r=0#commentsContainer: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/17/us/politics/hillary-clinton-regrets-not-attacking-bernie-sanders-earlier-her-allies-say.html?hp&amp;target=comments&amp;_r=0#commentsContainer


Processing URLs:  57%|█████▊    | 575/1000 [26:02<07:26,  1.05s/it]

Error extracting text from https://www.nytimes.com/2021/02/28/opinion/brazil-covid-vaccines.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/02/28/opinion/brazil-covid-vaccines.html


Processing URLs:  58%|█████▊    | 576/1000 [26:04<09:09,  1.30s/it]

Error extracting text from https://www.ecoi.net/file_upload/4765_1467293044_2016q1myanmar-en.pdf: 404 Client Error: Not Found for url: https://www.ecoi.net/file_upload/4765_1467293044_2016q1myanmar-en.pdf


Processing URLs:  58%|█████▊    | 578/1000 [26:06<07:59,  1.14s/it]

Error extracting text from https://www.google.com/amp/s/www.nytimes.com/2021/03/07/us/politics/joe-manchin-filibuster-stimulus.amp.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/07/us/politics/joe-manchin-filibuster-stimulus.html


Processing URLs:  58%|█████▊    | 581/1000 [26:13<11:46,  1.69s/it]

Error extracting text from http://www.politico.com/polls/president/2016-election/new-hampshire/2016-new-hampshire-democratic-primary-003196#.VromMlMrLow: 404 Client Error: Not Found for url: https://www.politico.com/polls/president/2016-election/new-hampshire/2016-new-hampshire-democratic-primary-003196#.VromMlMrLow


Processing URLs:  58%|█████▊    | 583/1000 [26:17<12:17,  1.77s/it]

Error extracting text from https://www.dailynews.co.zw/articles/2016/10/25/tsvangirai-will-crush-mugabe: HTTPSConnectionPool(host='www.dailynews.co.zw', port=443): Max retries exceeded with url: /articles/2016/10/25/tsvangirai-will-crush-mugabe (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.dailynews.co.zw'. (_ssl.c:1000)")))


Processing URLs:  58%|█████▊    | 585/1000 [26:19<08:33,  1.24s/it]

Error extracting text from http://www.reuters.com/article/2015/10/21/us-mideast-crisis-assad-putin-idUSKCN0SF0I020151021: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/21/us-mideast-crisis-assad-putin-idUSKCN0SF0I020151021


Processing URLs:  60%|█████▉    | 597/1000 [26:51<14:44,  2.20s/it]

Error extracting text from http://english.capital.gr/News.asp?id=2395533: 404 Client Error: Not Found for url: http://en.capital.gr/News.asp?id=2395533


Processing URLs:  60%|█████▉    | 599/1000 [26:52<08:23,  1.25s/it]

Error extracting text from http://www.nytimes.com/2015/06/07/magazine/the-agency.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/06/07/magazine/the-agency.html


Processing URLs:  61%|██████    | 607/1000 [27:14<17:38,  2.69s/it]

Error extracting text from https://www.icrc.org/en/war-and-law/treaties-customary-law/geneva-conventions: 403 Client Error: Forbidden for url: https://www.icrc.org/en/war-and-law/treaties-customary-law/geneva-conventions


Processing URLs:  61%|██████    | 608/1000 [27:14<13:03,  2.00s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-02-27/u-s-shale-surge-threatens-opec-strategy


Processing URLs:  61%|██████    | 610/1000 [27:16<09:36,  1.48s/it]

URL filtered: https://www.youtube.com/watch?v=6oK8mGH8IX4


Processing URLs:  62%|██████▏   | 615/1000 [27:23<08:14,  1.28s/it]

Error extracting text from http://www.predictwise.com/politics/2016RepNomination: 404 Client Error: Not Found for url: https://www.predictwise.com/politics/2016RepNomination
Error extracting text from https://www.wsj.com/articles/covid-19-aid-package-in-limbo-after-trumps-surprise-demand-to-boost-direct-payments-11608739678?mod=hp_lead_pos1: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/covid-19-aid-package-in-limbo-after-trumps-surprise-demand-to-boost-direct-payments-11608739678?mod=hp_lead_pos1


Processing URLs:  62%|██████▏   | 616/1000 [27:24<09:13,  1.44s/it]

URL filtered: https://www.bloomberg.com/news/articles/2017-12-20/trump-is-said-to-plan-tax-signing-jan-3-due-to-technical-issue


Processing URLs:  63%|██████▎   | 627/1000 [27:46<09:03,  1.46s/it]

Error extracting text from http://www.reuters.com/article/us-germany-election-poll-idUSKBN16I0R0?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-election-poll-idUSKBN16I0R0?il=0


Processing URLs:  63%|██████▎   | 628/1000 [27:47<08:30,  1.37s/it]

Error extracting text from https://www.reuters.com/world/europe/german-inflation-reach-over-7-march-state-data-suggest-2022-03-30/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/german-inflation-reach-over-7-march-state-data-suggest-2022-03-30/


Processing URLs:  63%|██████▎   | 631/1000 [27:57<14:10,  2.31s/it]

Error extracting text from http://www.wsj.com/articles/three-ukrainian-soldiers-killed-in-fighting-against-rebel-forces-1473784739: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/three-ukrainian-soldiers-killed-in-fighting-against-rebel-forces-1473784739


Processing URLs:  63%|██████▎   | 634/1000 [28:01<09:09,  1.50s/it]

Error extracting text from https://www.nytimes.com/2018/02/04/world/middleeast/isis-syria-al-qaeda.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/04/world/middleeast/isis-syria-al-qaeda.html


Processing URLs:  64%|██████▎   | 635/1000 [28:03<10:20,  1.70s/it]

URL filtered: http://www.bloomberglaw.com/product/blaw/exp_blp/ewogICAgImN0eHQiOiAiRE9DIiwKICAgICJpZCI6ICJPWVAyRFo2SzUwWFM/cmVzb3VyY2VfaWQ9NzA2YWM


Processing URLs:  64%|██████▍   | 638/1000 [28:04<05:35,  1.08it/s]

Error extracting text from http://english.alarabiya.net/en/News/middle-east/2016/08/29/Saudi-deputy-crown-prince-visits-China-for-economic-security-talks.html: 403 Client Error: Forbidden for url: https://english.alarabiya.net/en/News/middle-east/2016/08/29/Saudi-deputy-crown-prince-visits-China-for-economic-security-talks.html


Processing URLs:  64%|██████▍   | 644/1000 [28:28<16:50,  2.84s/it]

Error extracting text from http://www.startribune.com/johnson-feingold-to-debate-in-tightening-senate-race/397083991/: 404 Client Error: Not Found for url: https://www.startribune.com/johnson-feingold-to-debate-in-tightening-senate-race/397083991/


Processing URLs:  64%|██████▍   | 645/1000 [28:28<12:15,  2.07s/it]

Error extracting text from https://in.rbth.com/news/2016/06/17/brics-bank-may-issue-bonds-in-russian-ruble-before-end-of-2016_603903: HTTPSConnectionPool(host='in.rbth.com', port=443): Max retries exceeded with url: /news/2016/06/17/brics-bank-may-issue-bonds-in-russian-ruble-before-end-of-2016_603903 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2fdf5c6b0>: Failed to resolve 'in.rbth.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  65%|██████▌   | 651/1000 [28:41<12:55,  2.22s/it]

URL filtered: https://www.bloombergquint.com/business/asml-ceo-says-trying-to-control-chip-sales-to-china-won-t-work


Processing URLs:  65%|██████▌   | 653/1000 [28:42<07:44,  1.34s/it]

URL filtered: https://www.youtube.com/watch?v=SsmVgoXDq2w
URL filtered: http://www.bloomberg.com/news/videos/2016-02-09/david-stockman-markets-going-to-be-mauled-by-bear


Processing URLs:  66%|██████▌   | 658/1000 [28:43<03:46,  1.51it/s]

Error extracting text from https://www.reuters.com/world/us/us-summer-travelers-can-expect-long-lines-higher-prices-covid-restrictions-ease-2022-05-05/?utm_source=Sailthru&amp;utm_medium=newsletter&amp;utm_campaign=daily-briefing: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/us/us-summer-travelers-can-expect-long-lines-higher-prices-covid-restrictions-ease-2022-05-05/?utm_source=Sailthru&amp;utm_medium=newsletter&amp;utm_campaign=daily-briefing


Processing URLs:  66%|██████▋   | 664/1000 [29:02<13:33,  2.42s/it]

Error extracting text from https://www.researchgate.net/publication/281765164_Distilling_the_Wisdom_of_Crowds_Prediction_Markets_versus_Prediction_Polls: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/281765164_Distilling_the_Wisdom_of_Crowds_Prediction_Markets_versus_Prediction_Polls


Processing URLs:  67%|██████▋   | 668/1000 [29:05<05:32,  1.00s/it]

Error extracting text from http://www.reuters.com/article/us-saudi-oil-minister-idUSKCN0XY0E1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-minister-idUSKCN0XY0E1
Error extracting text from http://www.nytimes.com/1983/03/06/world/mugabe-s-fifth-brigade-grounded-in-loyalty.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/1983/03/06/world/mugabe-s-fifth-brigade-grounded-in-loyalty.html
URL filtered: https://www.youtube.com/watch?v=iCXerGxRfRc


Processing URLs:  68%|██████▊   | 675/1000 [30:14<1:35:25, 17.62s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/haiti/article102892432.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  68%|██████▊   | 680/1000 [30:22<21:17,  3.99s/it]

Error extracting text from http://study.com/academy/lesson/hereditary-diseases-definition-types-treatments.html: 403 Client Error: HTTP Forbidden for url: https://study.com/academy/lesson/hereditary-diseases-definition-types-treatments.html


Processing URLs:  68%|██████▊   | 681/1000 [30:22<15:52,  2.99s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/01/12/national/politics-diplomacy/former-foreign-minister-passes-abe-message-putin/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/01/12/national/politics-diplomacy/former-foreign-minister-passes-abe-message-putin/


Processing URLs:  68%|██████▊   | 685/1000 [30:26<06:39,  1.27s/it]

Error extracting text from https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://f7online.com.br/2016/03/29/lula-ve-democracia-em-risco-e-diz-dilma-resistira-sem-pmdb/&amp;prev=search: 400 Client Error: Bad Request for url: https://translate.google.com/translate?hl=en&amp;sl=pt&amp;u=http://f7online.com.br/2016/03/29/lula-ve-democracia-em-risco-e-diz-dilma-resistira-sem-pmdb/&amp;prev=search


Processing URLs:  69%|██████▉   | 689/1000 [30:32<06:00,  1.16s/it]

Error extracting text from http://www.chicagotribune.com/news/nationworld/politics/ct-roy-moore-alabama-election-20171210-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/nationworld/politics/ct-roy-moore-alabama-election-20171210-story.html


Processing URLs:  70%|██████▉   | 699/1000 [30:49<05:18,  1.06s/it]

Error extracting text from https://www.nytimes.com/2018/09/05/opinion/trump-white-house-anonymous-resistance.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/09/05/opinion/trump-white-house-anonymous-resistance.html


Processing URLs:  71%|███████   | 706/1000 [31:02<06:40,  1.36s/it]

Error extracting text from https://finance.yahoo.com/quote/%5Evix?ltr=1: 404 Client Error: Not Found for url: https://finance.yahoo.com/quote/%5Evix?ltr=1


Processing URLs:  72%|███████▏  | 715/1000 [31:19<07:45,  1.63s/it]

Error extracting text from http://www.reuters.com/article/us-yemen-security-talks-idUSKBN18Q26C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-talks-idUSKBN18Q26C


Processing URLs:  72%|███████▏  | 717/1000 [31:28<13:35,  2.88s/it]

Error extracting text from http://www.reuters.com/article/us-usa-russia-libya-exclusive-idUSKBN16K2RY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-russia-libya-exclusive-idUSKBN16K2RY


Processing URLs:  72%|███████▏  | 720/1000 [31:29<06:20,  1.36s/it]

Error extracting text from http://www.straitstimes.com/world/europe/nato-to-boost-use-of-cyber-weaponry-to-combat-russia: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  72%|███████▏  | 722/1000 [31:32<07:16,  1.57s/it]

Error extracting text from http://myanmareiti.org/download/file/fid/276: 404 Client Error: Not Found for url: https://myanmareiti.org/en/download/file/fid/276


Processing URLs:  73%|███████▎  | 726/1000 [31:39<05:57,  1.30s/it]

Error extracting text from http://cf.cdn.unwto.org/sites/all/files/pdf/unwto_barom16_04_july_excerpt_.pdf: HTTPSConnectionPool(host='cf.cdn.unwto.org', port=443): Max retries exceeded with url: /sites/all/files/pdf/unwto_barom16_04_july_excerpt_.pdf (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] ssl/tls alert handshake failure (_ssl.c:1000)')))


Processing URLs:  73%|███████▎  | 730/1000 [31:55<15:07,  3.36s/it]

Error extracting text from http://www.sfb-governance.de/en/publikationen/working_papers/wp56/index.html: 404 Client Error: Not Found for url: https://www.sfb-governance.de/en/publikationen/working_papers/wp56/index.html


Processing URLs:  73%|███████▎  | 731/1000 [31:56<12:22,  2.76s/it]

Error extracting text from http://uk.reuters.com/article/uk-britain-election-eu-idUKKBN1900I8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  73%|███████▎  | 732/1000 [31:57<09:45,  2.19s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-04-03/cameron-denies-brexit-referendum-is-splitting-u-k-government


Processing URLs:  74%|███████▎  | 735/1000 [31:58<04:22,  1.01it/s]

Error extracting text from http://www.japantimes.co.jp/news/2015/10/28/asia-pacific/politics-diplomacy-asia-pacific/seoul-tokyo-still-bickering-days-three-way-summit-china/: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2015/10/28/asia-pacific/politics-diplomacy-asia-pacific/seoul-tokyo-still-bickering-days-three-way-summit-china/
Error extracting text from http://www.nytimes.com/2015/10/14/business/economy/a-2nd-fed-governor-opposes-raising-rates-this-year-breaking-with-yellen.html?ref=business: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/14/business/economy/a-2nd-fed-governor-opposes-raising-rates-this-year-breaking-with-yellen.html?ref=business


Processing URLs:  74%|███████▍  | 740/1000 [32:06<06:17,  1.45s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/north-korea-threatens-fire-us-south-korean-troops-41688373?yptr=yahoo: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/north-korea-threatens-fire-us-south-korean-troops-41688373?yptr=yahoo


Processing URLs:  74%|███████▍  | 742/1000 [32:10<07:24,  1.72s/it]

Error extracting text from https://www.retail-week.com/topics/technology/amazons-innovation-boss-on-building-autonomous-drones/7025959.article: 403 Client Error: Forbidden for url: https://www.retail-week.com/topics/technology/amazons-innovation-boss-on-building-autonomous-drones/7025959.article


Processing URLs:  74%|███████▍  | 743/1000 [32:12<07:08,  1.67s/it]

Error extracting text from http://bigstory.ap.org/article/c0f10cb873be4a3fbdebe99f7dc1daa3/syrian-tv-president-assad-meets-putin-moscow: HTTPConnectionPool(host='bigstory.ap.org', port=80): Max retries exceeded with url: /article/c0f10cb873be4a3fbdebe99f7dc1daa3/syrian-tv-president-assad-meets-putin-moscow (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x30650e2d0>: Failed to resolve 'bigstory.ap.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  75%|███████▌  | 750/1000 [32:24<07:50,  1.88s/it]

Error extracting text from http://www.foxnews.com/world/2015/09/30/no-future-for-assad-regime-in-syria-saudi-arabia-foreign-minister-says/: 404 Client Error: Not Found for url: https://www.foxnews.com/world/2015/09/30/no-future-for-assad-regime-in-syria-saudi-arabia-foreign-minister-says/


Processing URLs:  77%|███████▋  | 767/1000 [32:57<06:56,  1.79s/it]

Error extracting text from http://www.polioeradication.org/mediaroom/newsstories/SAGE-confirms-global-polio-vaccine-switch-date-as-April-2016/tabid/526/news/1307/Default.aspx?popUp=true: 404 Client Error: Not Found for url: https://polioeradication.org/mediaroom/newsstories/SAGE-confirms-global-polio-vaccine-switch-date-as-April-2016/tabid/526/news/1307/Default.aspx?popUp=true


Processing URLs:  77%|███████▋  | 769/1000 [33:02<07:17,  1.89s/it]

Error extracting text from http://thehill.com/policy/cybersecurity/337538-russian-cyberattack-on-us-electoral-systems-more-widespread-than-reported: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/337538-russian-cyberattack-on-us-electoral-systems-more-widespread-than-reported/


Processing URLs:  77%|███████▋  | 772/1000 [33:09<08:06,  2.13s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-12-14/negotiators-push-for-deal-after-weekend-progress-brexit-update?srnd=brexit


Processing URLs:  78%|███████▊  | 780/1000 [33:17<04:18,  1.18s/it]

Error extracting text from http://www.reuters.com/article/us-china-politics-idUSKCN1130J4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-china-politics-idUSKCN1130J4


Processing URLs:  78%|███████▊  | 784/1000 [33:39<10:54,  3.03s/it]

Error extracting text from http://www.nytimes.com/2016/04/13/us/politics/donald-trump-losing-ground-tries-to-blame-the-system.html?emc=edit_th_20160413&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/13/us/politics/donald-trump-losing-ground-tries-to-blame-the-system.html?emc=edit_th_20160413&amp;nl=todaysheadlines&amp;nlid=28699183&amp;_r=0


Processing URLs:  79%|███████▉  | 788/1000 [33:43<05:01,  1.42s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/presidential-races/279977-trump-rubio-not-being-considered-for-vp: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/presidential-races/279977-trump-rubio-not-being-considered-for-vp/


Processing URLs:  79%|███████▉  | 790/1000 [33:44<03:05,  1.13it/s]

Error extracting text from https://www.reuters.com/article/us-germany-politics-coalition-factbox/factbox-german-coalition-watch-lets-not-be-perfectionists-in-coalition-talks-says-merkel-ally-idUSKBN1CJ0RR?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics-coalition-factbox/factbox-german-coalition-watch-lets-not-be-perfectionists-in-coalition-talks-says-merkel-ally-idUSKBN1CJ0RR?il=0


Processing URLs:  79%|███████▉  | 792/1000 [33:49<05:34,  1.61s/it]

Error extracting text from http://inserbia.info/today/2015/10/hahn-montenegro-nato-accession-important-for-regional-stability/: 404 Client Error: Not Found for url: https://inserbia.info/today/2015/10/hahn-montenegro-nato-accession-important-for-regional-stability/


Processing URLs:  79%|███████▉  | 793/1000 [33:52<07:24,  2.15s/it]

Error extracting text from http://www.greekcrisis.net/2016/02/schaeuble-hints-germany-may-be-ready-to.html: 404 Client Error: Not Found for url: http://greekcrisis.net/2016/02/schaeuble-hints-germany-may-be-ready-to.html


Processing URLs:  80%|███████▉  | 799/1000 [34:01<03:44,  1.12s/it]

Error extracting text from http://www.publications.parliament.uk/pa/bills/cbill/2015-2016/0002/cbill_2015-20160002_en_2.htm#pb1-l1g5: 403 Client Error: Forbidden for url: https://publications.parliament.uk/pa/bills/cbill/2015-2016/0002/cbill_2015-20160002_en_2.htm#pb1-l1g5
URL filtered: https://www.bloomberg.com/news/articles/2017-06-08/pound-drops-as-u-k-exit-poll-unnerves-investors-markets-wrap


Processing URLs:  81%|████████  | 806/1000 [34:09<04:13,  1.31s/it]

Error extracting text from http://www.amazon.com/Lights-Out-Cyberattack-Unprepared-Surviving/dp/055341996X/: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Lights-Out-Cyberattack-Unprepared-Surviving/dp/055341996X/


Processing URLs:  81%|████████  | 812/1000 [34:25<04:26,  1.42s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKCN0VR2KQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-usa-sanctions-idUSKCN0VR2KQ


Processing URLs:  82%|████████▏ | 815/1000 [34:28<03:51,  1.25s/it]

Error extracting text from https://global.handelsblatt.com/edition/387/ressort/politics/article/france-non-to-watered-down-trade-treaty: 403 Client Error: Forbidden for url: https://www.handelsblatt.com/edition/387/ressort/politics/article/france-non-to-watered-down-trade-treaty


Processing URLs:  82%|████████▏ | 816/1000 [34:30<04:01,  1.31s/it]

URL filtered: https://www.youtube.com/watch?v=kPqeV4RYS0M


Processing URLs:  82%|████████▏ | 818/1000 [34:33<04:16,  1.41s/it]

Error extracting text from https://www.reuters.com/article/us-britain-eu-tusk-idUSKCN1PN2TZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-tusk-idUSKCN1PN2TZ


Processing URLs:  82%|████████▏ | 822/1000 [34:37<03:23,  1.14s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/thailand-s-next-election-in-exactly-one-year-deputy-pm/3503030.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/thailand-s-next-election-in-exactly-one-year-deputy-pm/3503030.html


Processing URLs:  82%|████████▏ | 823/1000 [34:37<02:41,  1.10it/s]

Error extracting text from https://www.scientificamerican.com/article/drilling-resumes-on-the-dakota-access-pipeline/: 403 Client Error: Forbidden for url: https://www.scientificamerican.com/article/drilling-resumes-on-the-dakota-access-pipeline/


Processing URLs:  82%|████████▎ | 825/1000 [34:42<04:25,  1.52s/it]

Error extracting text from https://www.reuters.com/article/us-germany-politics/merkel-defends-painful-coalition-concessions-denies-authority-waning-idUSKBN1FV0TO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-germany-politics/merkel-defends-painful-coalition-concessions-denies-authority-waning-idUSKBN1FV0TO


Processing URLs:  83%|████████▎ | 827/1000 [34:44<03:56,  1.37s/it]

Error extracting text from http://www.notitarde.com/Economia/Proyectan-inflacion-de-3265-para-Venezuela-este-ano-2016-2642351/2016/03/27/924468/: 404 Client Error: Not Found for url: http://www.notitarde.com/Economia/Proyectan-inflacion-de-3265-para-Venezuela-este-ano-2016-2642351/2016/03/27/924468/


Processing URLs:  83%|████████▎ | 831/1000 [34:51<04:32,  1.61s/it]

Error extracting text from http://files.shareholder.com/downloads/ABEA-4CW8X0/1756805355x0x808854/BB31868E-588E-4E95-B72E-729E88E9E932/Q4_14_Shareholder_Letter_Final.pdf: 403 Client Error: Forbidden for url: http://files.shareholder.com/downloads/ABEA-4CW8X0/1756805355x0x808854/BB31868E-588E-4E95-B72E-729E88E9E932/Q4_14_Shareholder_Letter_Final.pdf


Processing URLs:  83%|████████▎ | 832/1000 [34:52<03:42,  1.32s/it]

Error extracting text from http://bearingdrift.com/2015/10/01/senate-to-consider-lifting-ban-on-crude-oil-exports-warner-reportedly-on-the-fence/: 403 Client Error: Forbidden for url: http://bearingdrift.com/2015/10/01/senate-to-consider-lifting-ban-on-crude-oil-exports-warner-reportedly-on-the-fence/


Processing URLs:  83%|████████▎ | 833/1000 [34:53<03:29,  1.25s/it]

Error extracting text from http://phys.org/news/2016-02-human-champion-hell-ai-ancient.html: 400 Client Error: Bad request for url: https://phys.org/news/2016-02-human-champion-hell-ai-ancient.html


Processing URLs:  84%|████████▎ | 835/1000 [34:55<03:11,  1.16s/it]

Error extracting text from http://www.newsweek.com/putin-calls-erdogan-voice-support-order-coup-turkey-481307: 403 Client Error: Forbidden for url: https://www.newsweek.com/putin-calls-erdogan-voice-support-order-coup-turkey-481307


Processing URLs:  84%|████████▎ | 837/1000 [34:58<03:26,  1.27s/it]

Error extracting text from https://larswericson.wordpress.com/2016/05/10/gitrep-9may16pm/: 410 Client Error: Gone for url: https://larswericson.wordpress.com/2016/05/10/gitrep-9may16pm/


Processing URLs:  84%|████████▍ | 841/1000 [35:13<09:53,  3.73s/it]

Error extracting text from http://ewn.co.za/2016/08/08/EFF-will-only-enter-into-coalition-with-ANC-is-it-removes-Zuma: 404 Client Error: Not Found for url: https://www.ewn.co.za/2016/08/08/EFF-will-only-enter-into-coalition-with-ANC-is-it-removes-Zuma


Processing URLs:  84%|████████▍ | 842/1000 [35:14<07:33,  2.87s/it]

Error extracting text from http://news.xinhuanet.com/english/china/2012-03/15/c_131468566.htm: 404 Client Error: Not Found for url: http://www.xinhuanet.com//english/china/2012-03/15/c_131468566.htm


Processing URLs:  84%|████████▍ | 845/1000 [35:15<03:09,  1.22s/it]

Error extracting text from https://news.usni.org/2017/08/25/navy-orion-hammer-investigation-uss-john-mccain-collision-turned-no-evidence-cyber-attack: 403 Client Error: Forbidden for url: https://news.usni.org/2017/08/25/navy-orion-hammer-investigation-uss-john-mccain-collision-turned-no-evidence-cyber-attack


Processing URLs:  85%|████████▍ | 846/1000 [35:32<15:12,  5.92s/it]

Error extracting text from http://www.investopedia.com/news/saudi-aramco-ceo-expect-record-ipo-2018/: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/news/saudi-aramco-ceo-expect-record-ipo-2018/


Processing URLs:  85%|████████▍ | 849/1000 [35:38<08:25,  3.34s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=53686#.V17wsNQ8KrV: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=53686#.V17wsNQ8KrV


Processing URLs:  85%|████████▌ | 851/1000 [35:38<04:38,  1.87s/it]

Error extracting text from http://www.libdemvoice.org/liblink-tim-farron-cameron-and-corbyn-stance-on-brexit-downright-pathetic-47871.html: 403 Client Error: Forbidden for url: http://www.libdemvoice.org/liblink-tim-farron-cameron-and-corbyn-stance-on-brexit-downright-pathetic-47871.html


Processing URLs:  85%|████████▌ | 852/1000 [35:40<04:43,  1.91s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-02-21/iran-compromise-with-atomic-monitors-taps-brakes-on-escalation


Processing URLs:  86%|████████▌ | 855/1000 [35:42<03:12,  1.33s/it]

Error extracting text from http://www.dailyherald.com/article/20161217/business/161219129/: 404 Client Error: Not Found for url: https://www.dailyherald.com/article/20161217/business/161219129/


Processing URLs:  86%|████████▌ | 856/1000 [35:43<02:35,  1.08s/it]

Error extracting text from https://www.congress.gov/bill/116th-congress/house-bill/4: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/116th-congress/house-bill/4


Processing URLs:  86%|████████▌ | 857/1000 [35:43<02:04,  1.15it/s]

Error extracting text from http://www.wsj.com/articles/wsj-survey-most-economists-predict-fed-will-stay-on-hold-in-september-1441980000: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/wsj-survey-most-economists-predict-fed-will-stay-on-hold-in-september-1441980000


Processing URLs:  86%|████████▌ | 859/1000 [35:45<02:07,  1.11it/s]

URL filtered: http://www.bloomberg.com/news/articles/2016-03-04/oil-market-storm-clears-as-output-deal-seen-stabilizing-prices


Processing URLs:  86%|████████▌ | 861/1000 [35:45<01:16,  1.82it/s]

Error extracting text from http://www.wsj.com/articles/vice-president-joe-biden-follows-a-candidates-schedule-1443553494: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/vice-president-joe-biden-follows-a-candidates-schedule-1443553494


Processing URLs:  87%|████████▋ | 869/1000 [35:56<02:49,  1.30s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=55283#.V__74TJh1gg: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=55283#.V__74TJh1gg


Processing URLs:  87%|████████▋ | 871/1000 [35:57<01:34,  1.37it/s]

Error extracting text from https://www.nytimes.com/2017/08/06/world/middleeast/yemen-qaeda-shabwa-province.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/08/06/world/middleeast/yemen-qaeda-shabwa-province.html?_r=0
Error extracting text from http://www.france24.com/en/20160226-iran-2016-elections-majlis-parliament-assembly-experts-legislative: 403 Client Error: Forbidden for url: http://www.france24.com/en/20160226-iran-2016-elections-majlis-parliament-assembly-experts-legislative


Processing URLs:  88%|████████▊ | 875/1000 [36:01<01:43,  1.21it/s]

Error extracting text from http://www.reuters.com/article/us-asean-summit-trade-idUSKBN17S0T2: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-asean-summit-trade-idUSKBN17S0T2


Processing URLs:  88%|████████▊ | 878/1000 [36:04<01:47,  1.13it/s]

Error extracting text from http://www.aina.org/news/20160309230225.htm: 404 Client Error:  for url: http://www.aina.org/news/20160309230225.htm
Error extracting text from http://www.reuters.com/article/us-tesla-gigafactory-idUSKCN10G2E2?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-tesla-gigafactory-idUSKCN10G2E2?il=0
Error extracting text from https://www.reuters.com/article/us-cyber-summit-ukraine-police-exclusive/exclusive-ukraine-hit-by-stealthier-phishing-attacks-during-badrabbit-strike-idUSKBN1D2263: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-cyber-summit-ukraine-police-exclusive/exclusive-ukraine-hit-by-stealthier-phishing-attacks-during-badrabbit-strike-idUSKBN1D2263


Processing URLs:  88%|████████▊ | 881/1000 [36:08<02:08,  1.08s/it]

Error extracting text from http://www.mckinsey.com/about_us/what_we_do: 404 Client Error: Not Found for url: https://www.mckinsey.com/NotFound.aspx


Processing URLs:  88%|████████▊ | 884/1000 [36:16<03:42,  1.91s/it]

Error extracting text from http://www.reuters.com/article/us-usa-mattis-israel-idUSKBN17N0SN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-mattis-israel-idUSKBN17N0SN


Processing URLs:  89%|████████▉ | 888/1000 [36:20<02:37,  1.41s/it]

Error extracting text from https://www.reuters.com/article/uk-britain-eu-may/irish-border-row-thwarts-may-bid-to-clinch-brexit-trade-deal-idUSKBN1DX0XR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/uk-britain-eu-may/irish-border-row-thwarts-may-bid-to-clinch-brexit-trade-deal-idUSKBN1DX0XR


Processing URLs:  89%|████████▉ | 891/1000 [36:21<01:29,  1.22it/s]

Error extracting text from http://national.suntimes.com/national-world-news/7/72/2400387/new-trump-ad-features-clintons-weiner-cosby: HTTPConnectionPool(host='national.suntimes.com', port=80): Max retries exceeded with url: /national-world-news/7/72/2400387/new-trump-ad-features-clintons-weiner-cosby (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x117cdb500>: Failed to resolve 'national.suntimes.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  89%|████████▉ | 892/1000 [36:22<01:24,  1.27it/s]

Error extracting text from https://www.reuters.com/article/us-southsudan-security-un/u-n-council-fails-to-impose-arms-embargo-on-south-sudan-idUSKBN14C1KY: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southsudan-security-un/u-n-council-fails-to-impose-arms-embargo-on-south-sudan-idUSKBN14C1KY


Processing URLs:  90%|████████▉ | 895/1000 [36:23<01:01,  1.71it/s]

Error extracting text from https://www.transportation.gov/briefing-room/drone-focus-conference?utm_source=Triggermail&amp;utm_medium=email&amp;utm_campaign=Post%20Blast%20%28bii-e-commerce%29:%20Privatizing%20air%20traffic%20control%20for%20drone%20delivery%20—%20Walmart%20tests%20grocery%20pickup%20automation%20—%20India%20is%20top%20country%20for%20retail%20investment&amp;utm_term=BII%20List%20E-Comm%20ALL: 403 Client Error: Forbidden for url: https://www.transportation.gov/briefing-room/drone-focus-conference?utm_source=Triggermail&amp;utm_medium=email&amp;utm_campaign=Post%20Blast%20%28bii-e-commerce%29:%20Privatizing%20air%20traffic%20control%20for%20drone%20delivery%20%E2%80%94%20Walmart%20tests%20grocery%20pickup%20automation%20%E2%80%94%20India%20is%20top%20country%20for%20retail%20investment&amp;utm_term=BII%20List%20E-Comm%20ALL


Processing URLs:  90%|████████▉ | 896/1000 [36:25<01:22,  1.26it/s]

Error extracting text from http://www.reuters.com/article/us-russia-usa-nuclear-idUSKCN1230YN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-usa-nuclear-idUSKCN1230YN


Processing URLs:  90%|█████████ | 900/1000 [36:27<01:13,  1.36it/s]

Error extracting text from http://orientalreview.org/2014/11/21/why-is-russia-going-to-skip-the-nuclear-security-summit-in-the-us/: HTTPConnectionPool(host='orientalreview.org', port=80): Max retries exceeded with url: /2014/11/21/why-is-russia-going-to-skip-the-nuclear-security-summit-in-the-us/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304720aa0>: Failed to resolve 'orientalreview.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  90%|█████████ | 901/1000 [36:28<01:06,  1.48it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/290531-obama-in-tough-spot-with-russia: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/290531-obama-in-tough-spot-with-russia/


Processing URLs:  90%|█████████ | 902/1000 [36:29<01:21,  1.20it/s]

Error extracting text from https://carnegieendowment.org/2020/09/30/rough-waters-ahead-for-vietnam-china-relations-pub-82826: 403 Client Error: Forbidden for url: https://carnegieendowment.org/2020/09/30/rough-waters-ahead-for-vietnam-china-relations-pub-82826


Processing URLs:  90%|█████████ | 904/1000 [36:30<01:03,  1.52it/s]

Error extracting text from http://www.reuters.com/article/us-russia-turkey-putin-idUSKBN17Z1O8: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-russia-turkey-putin-idUSKBN17Z1O8
Error extracting text from http://www.reuters.com/article/us-britain-eu-idUSKBN1A10AGWorld: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-idUSKBN1A10AGWorld


Processing URLs:  91%|█████████ | 907/1000 [36:32<00:58,  1.58it/s]

Error extracting text from http://allafrica.com/stories/201802180153.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201802180153.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3047236e0>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  91%|█████████ | 909/1000 [36:39<03:00,  1.98s/it]

Error extracting text from http://ewn.co.za/2016/09/03/Occupy-Luthuli-House-organisers-remain-resolute-despite-ANCYL-condemnation: 404 Client Error: Not Found for url: https://www.ewn.co.za/2016/09/03/Occupy-Luthuli-House-organisers-remain-resolute-despite-ANCYL-condemnation


Processing URLs:  91%|█████████ | 912/1000 [36:54<05:24,  3.68s/it]

Error extracting text from http://www.asiafoundation.org/resources/pdfs/ConflictTerritorialAdministrationfullreportENG.pdf: 403 Client Error: Forbidden for url: https://asiafoundation.org/publications/all


Processing URLs:  92%|█████████▏| 919/1000 [37:04<02:01,  1.50s/it]

Error extracting text from http://in.reuters.com/article/2015/11/29/volkswagen-emissions-polo-idINKBN0TI0NO20151129: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  92%|█████████▏| 920/1000 [37:05<01:46,  1.33s/it]

Error extracting text from http://vneconomictimes.com/article/vietnam-today/workable-alternative: 403 Client Error: Forbidden for url: https://vneconomictimes.com/article/vietnam-today/workable-alternative


Processing URLs:  92%|█████████▏| 924/1000 [37:14<01:58,  1.55s/it]

Error extracting text from http://www.wsj.com/articles/china-loosens-debt-terms-for-venezuela-1416858616: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/china-loosens-debt-terms-for-venezuela-1416858616


Processing URLs:  93%|█████████▎| 928/1000 [38:19<22:57, 19.13s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2016-12-23/the-latest-putin-nothing-unusual-in-trump-nukes-comment: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 930/1000 [38:21<11:23,  9.77s/it]

Error extracting text from https://www.nationalworld.com/news/politics/boris-johnson-vs-keir-starmer-polls-latest-opinion-polls-on-party-leaders-ahead-of-2021-local-elections-3219135: 403 Client Error: Forbidden for url: https://www.nationalworld.com/news/politics/boris-johnson-vs-keir-starmer-polls-latest-opinion-polls-on-party-leaders-ahead-of-2021-local-elections-3219135


Processing URLs:  93%|█████████▎| 931/1000 [38:23<08:36,  7.49s/it]

Error extracting text from http://www.swissinfo.ch/eng/french-special-forces-waging--secret-war--in-libya---report/41980840: 404 Client Error: Not Found for url: https://www.swissinfo.ch/eng/french-special-forces-waging--secret-war--in-libya---report/41980840


Processing URLs:  93%|█████████▎| 933/1000 [38:26<04:49,  4.32s/it]

Error extracting text from https://www.wsj.com/articles/first-conspiracy-charges-filed-over-capitol-riot-11611080191?mod=politics_lead_pos3: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/first-conspiracy-charges-filed-over-capitol-riot-11611080191?mod=politics_lead_pos3


Processing URLs:  93%|█████████▎| 934/1000 [38:27<03:36,  3.29s/it]

URL filtered: http://www.bloomberg.com/view/articles/2016-04-11/opec-and-russia-s-fake-freeze-on-oil-is-good-enough


Processing URLs:  94%|█████████▎| 936/1000 [38:27<02:01,  1.89s/it]

Error extracting text from http://www.nytimes.com/2016/03/07/world/asia/philippines-north-korea-sanctions-united-nations-cargo-ship-seizure.html?utm_source=Paramount&amp;utm_medium=email&amp;utm_campaign=rundown030716&amp;ref=world&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/07/world/asia/philippines-north-korea-sanctions-united-nations-cargo-ship-seizure.html?utm_source=Paramount&amp;utm_medium=email&amp;utm_campaign=rundown030716&amp;ref=world&amp;_r=0


Processing URLs:  94%|█████████▍| 938/1000 [38:30<01:36,  1.55s/it]

Error extracting text from http://asmdc.org/members/a14/news-room/press-releases/bonilla-s-autonomous-vehicle-bill-drives-testing-forward-for-contra-costa-county: 404 Client Error: Not Found for url: https://asmdc.org:443/members/a14/news-room/press-releases/bonilla-s-autonomous-vehicle-bill-drives-testing-forward-for-contra-costa-county
Error extracting text from https://www.reuters.com/article/us-britain-eu/eus-barnier-says-just-hours-left-for-a-brexit-trade-deal-idUSKBN28S0RH?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu/eus-barnier-says-just-hours-left-for-a-brexit-trade-deal-idUSKBN28S0RH?il=0


Processing URLs:  94%|█████████▍| 944/1000 [38:41<01:46,  1.91s/it]

Error extracting text from http://www.comres.co.uk/eu-referendum-all-still-to-play-for-by-not-neck-and-neck/: 403 Client Error: Forbidden for url: http://comresglobal.com/eu-referendum-all-still-to-play-for-by-not-neck-and-neck/
Error extracting text from http://www.nato.int/cps/en/natohq/official_texts_125591.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/official_texts_125591.htm?selectedLocale=en


Processing URLs:  95%|█████████▌| 951/1000 [38:49<01:03,  1.29s/it]

Error extracting text from http://www.businessinsider.com/the-kurds-new-offensive-is-going-straight-for-isis-iraqi-capital-2016-8: 404 Client Error: Not Found for url: https://www.businessinsider.com/the-kurds-new-offensive-is-going-straight-for-isis-iraqi-capital-2016-8


Processing URLs:  95%|█████████▌| 952/1000 [38:52<01:21,  1.70s/it]

Error extracting text from http://aranews.net/2016/04/joint-us-kurdish-operation-kills-isis-leader-southern-mosul/: 404 Client Error: Not Found for url: http://aranews.net/2016/04/joint-us-kurdish-operation-kills-isis-leader-southern-mosul/


Processing URLs:  95%|█████████▌| 953/1000 [38:55<01:43,  2.21s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-06-01/venezuelan-credit-dashboard-short-term-default-concerns-ease


Processing URLs:  96%|█████████▌| 955/1000 [38:56<01:07,  1.49s/it]

Error extracting text from https://www.nytimes.com/2007/01/18/world/asia/18cnd-china.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2007/01/18/world/asia/18cnd-china.html


Processing URLs:  96%|█████████▌| 958/1000 [39:02<01:14,  1.77s/it]

Error extracting text from http://www.nato.int/cps/en/natohq/news_123958.htm?selectedLocale=en: 403 Client Error: Forbidden for url: http://www.nato.int/cps/en/natohq/news_123958.htm?selectedLocale=en


Processing URLs:  96%|█████████▋| 965/1000 [39:31<02:54,  4.97s/it]

Error extracting text from https://www.nytimes.com/2017/04/29/world/asia/marines-return-to-helmand-province-for-a-job-they-thought-was-done.html?_r=1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/04/29/world/asia/marines-return-to-helmand-province-for-a-job-they-thought-was-done.html?_r=1


Processing URLs:  97%|█████████▋| 969/1000 [39:38<01:07,  2.18s/it]

Error extracting text from https://www.yahoo.com/news/iraq-forces-fierce-mosul-fighting-jihadists-000756845.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/iraq-forces-fierce-mosul-fighting-jihadists-000756845.html
Error extracting text from http://www.reuters.com/article/us-usa-china-pentagon-idUSKCN0Y42J1: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-china-pentagon-idUSKCN0Y42J1


Processing URLs:  97%|█████████▋| 970/1000 [39:40<01:05,  2.19s/it]

Error extracting text from https://cleanvehiclerebate.org/eng/pfp: 404 Client Error: Not Found for url: https://cleanvehiclerebate.org/en/pfp


Processing URLs:  97%|█████████▋| 971/1000 [39:40<00:46,  1.60s/it]

Error extracting text from http://thenationonlineng.net/residents-panic-military-searches-militants-fighter-jets/: 403 Client Error: Forbidden for url: https://thenationonlineng.net/residents-panic-military-searches-militants-fighter-jets/


Processing URLs:  98%|█████████▊| 975/1000 [39:45<00:28,  1.14s/it]

Error extracting text from https://www.straitstimes.com/world/americas/el-salvador-ratifies-ties-with-china-after-taiwan-switch: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))


Processing URLs:  98%|█████████▊| 978/1000 [39:47<00:18,  1.17it/s]

Error extracting text from http://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#QfSaymwcdMOLtUGY.97: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/30/us-opec-meeting-indonesia-idUSKBN0TJ27U20151130#QfSaymwcdMOLtUGY.97


Processing URLs:  98%|█████████▊| 980/1000 [39:51<00:25,  1.29s/it]

Error extracting text from http://www.pcacases.com/web/sendAttach/1530: 406 Client Error: Not Acceptable for url: http://www.pcacases.com/web/sendAttach/1530


Processing URLs:  98%|█████████▊| 982/1000 [39:52<00:13,  1.37it/s]

Error extracting text from http://www.latimes.com/business/hollywood/la-fi-ct-stankey-att-time-warner-culture-20170907-htmlstory.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/hollywood/la-fi-ct-stankey-att-time-warner-culture-20170907-htmlstory.html


Processing URLs:  98%|█████████▊| 984/1000 [39:56<00:25,  1.57s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3627607/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3627607/


Processing URLs:  98%|█████████▊| 985/1000 [39:58<00:23,  1.54s/it]

Error extracting text from http://mobile.reuters.com/article/idUSKCN0XO29C: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/idUSKCN0XO29C


Processing URLs:  99%|█████████▉| 988/1000 [40:02<00:15,  1.26s/it]

Error extracting text from http://www.nytimes.com/2016/04/26/us/politics/ted-cruz-john-kasich-donald-trump.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/04/26/us/politics/ted-cruz-john-kasich-donald-trump.html


Processing URLs:  99%|█████████▉| 989/1000 [40:02<00:11,  1.04s/it]

Error extracting text from http://thehill.com/policy/finance/322971-flake-becomes-latest-gop-senator-to-raise-concerns-about-border-tax-proposal: 403 Client Error: Forbidden for url: https://thehill.com/policy/finance/322971-flake-becomes-latest-gop-senator-to-raise-concerns-about-border-tax-proposal/


Processing URLs:  99%|█████████▉| 990/1000 [40:03<00:09,  1.02it/s]

Error extracting text from http://www.ifrs.org/Alerts/Publication/Pages/Saudi-Arabia-to-require-use-of-IFRS-Standards-in-2017-and-IFRS-for-SMEs-in-2018.aspx: 404 Client Error: Not Found for url: https://www.ifrs.org/Alerts/Publication/Pages/Saudi-Arabia-to-require-use-of-IFRS-Standards-in-2017-and-IFRS-for-SMEs-in-2018.aspx


Processing URLs:  99%|█████████▉| 994/1000 [40:15<00:15,  2.55s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-11-11/u-s-warns-europe-that-russian-troops-may-plan-ukraine-invasion?utm_source=twitter&amp;utm_medium=social&amp;utm_campaign=socialflow-organic&amp;utm_content=politics&amp;cmpid%3D=socialflow-twitter-politics
URL filtered: https://www.youtube.com/watch?v=-IdwjJUtuJE
URL filtered: https://www.youtube.com/watch?v=8nTFjVm9sTQ


Processing URLs: 100%|██████████| 1000/1000 [40:19<00:00,  2.42s/it]
Processing URLs:   0%|          | 1/1000 [00:00<03:43,  4.47it/s]

Error extracting text from https://www.nytimes.com/2018/01/27/world/asia/afghanistan-kabul-attack.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/01/27/world/asia/afghanistan-kabul-attack.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region


Processing URLs:   0%|          | 3/1000 [00:18<1:37:12,  5.85s/it]

Error extracting text from http://people.com/movies/lina-esco-swat-harvey-weinstein-sexual-harassment/: 406 Client Error: Not Acceptable for url: https://people.com/movies/lina-esco-swat-harvey-weinstein-sexual-harassment/
Error extracting text from https://www.npd.com/wps/portal/npd/us/news/press-releases/2018/npd-bookscan-recaps-the-year-in-books-2017/: HTTPSConnectionPool(host='www.npd.com', port=443): Max retries exceeded with url: /wps/portal/npd/us/news/press-releases/2018/npd-bookscan-recaps-the-year-in-books-2017/ (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'www.npd.com'. (_ssl.c:1000)")))


Processing URLs:   1%|          | 7/1000 [00:30<1:03:12,  3.82s/it]

Error extracting text from http://www.quotenet.com/bond/search?borrower=50951: HTTPConnectionPool(host='www.quotenet.com', port=80): Max retries exceeded with url: /bond/search?borrower=50951 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307d79160>: Failed to resolve 'www.quotenet.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   1%|          | 8/1000 [00:30<45:51,  2.77s/it]  

Error extracting text from http://www.americanshipper.com/Main/News/How_big_are_the_big_ships_calling_US_ports_62250.aspx?taxonomy=Ports1#hide: 404 Client Error: Not Found for url: http://www.americanshipper.com/Main/News/How_big_are_the_big_ships_calling_US_ports_62250.aspx?taxonomy=Ports1#hide
URL filtered: http://www.bloomberg.com/politics/articles/2016-03-11/few-barriers-to-adoption-of-self-driving-vehicles-study-says


Processing URLs:   1%|          | 11/1000 [00:33<27:57,  1.70s/it]

Error extracting text from http://www.channelnewsasia.com/news/asiapacific/china-says-missile/2522662.html: 404 Client Error: Not Found for url: https://www.channelnewsasia.com/news/asiapacific/china-says-missile/2522662.html


Processing URLs:   1%|          | 12/1000 [00:34<24:43,  1.50s/it]

Error extracting text from https://www.csoonline.com/article/3238472/phishing/the-internets-dark-triad-that-we-need-to-protect-ourselves-against.html: 404 Client Error: Not Found for url: https://www.csoonline.com/article/3238472/phishing/the-internets-dark-triad-that-we-need-to-protect-ourselves-against.html


Processing URLs:   2%|▏         | 15/1000 [01:36<4:56:12, 18.04s/it]

Error extracting text from https://www.itv.com/news/2020-12-24/a-brexit-deal-has-been-done-this-is-what-happens-next: HTTPSConnectionPool(host='www.itv.com', port=443): Read timed out. (read timeout=60)


Processing URLs:   2%|▏         | 17/1000 [01:36<2:30:22,  9.18s/it]

Error extracting text from https://www.reuters.com/article/us-safrica-politics/south-africas-zuma-faces-new-no-confidence-vote-this-month-idUSKBN1FM1B5?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/south-africas-zuma-faces-new-no-confidence-vote-this-month-idUSKBN1FM1B5?il=0


Processing URLs:   2%|▏         | 22/1000 [01:43<40:13,  2.47s/it]  

URL filtered: https://www.youtube.com/watch?v=pBWcRqPesws


Processing URLs:   3%|▎         | 26/1000 [01:49<31:52,  1.96s/it]

URL filtered: https://www.youtube.com/watch?v=vBDup86RyUg
URL filtered: http://www.bloomberg.com/news/articles/2015-10-22/venezuela-s-pdvsa-has-8-billion-of-u-s-assets-at-risk-in-probe


Processing URLs:   3%|▎         | 29/1000 [01:50<16:23,  1.01s/it]

Error extracting text from https://thehill.com/homenews/senate/546215-senate-parliamentarian-to-let-democrats-bypass-filibuster-with-third-bill: 403 Client Error: Forbidden for url: https://thehill.com/homenews/senate/546215-senate-parliamentarian-to-let-democrats-bypass-filibuster-with-third-bill/


Processing URLs:   3%|▎         | 30/1000 [01:50<14:38,  1.10it/s]

Error extracting text from https://www.sciencedirect.com/science/article/abs/pii/S1058330003000387: 403 Client Error: Forbidden for url: https://www.sciencedirect.com/science/article/abs/pii/S1058330003000387


Processing URLs:   3%|▎         | 31/1000 [01:52<18:12,  1.13s/it]

Error extracting text from http://www.theglobeandmail.com/news/world/rousseff-impeachment-trial-to-hear-closing-arguments/article31602783/?service=mobile: 404 Client Error: Not Found for url: https://www.theglobeandmail.com/news/world/rousseff-impeachment-trial-to-hear-closing-arguments/article31602783/?service=mobile
URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O4GFRU6JIJVD01-7CBR7VD7GOCV6UCFOOCE9SAL0A


Processing URLs:   3%|▎         | 33/1000 [01:53<13:23,  1.20it/s]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/11/15/0200000000AEN20151115000700315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))


Processing URLs:   4%|▎         | 35/1000 [01:56<17:46,  1.10s/it]

Error extracting text from http://www.breakbulk.com/panama-canal-expansion-passes-first-floodgate-test/: 404 Client Error: Not Found for url: https://breakbulk.com/panama-canal-expansion-passes-first-floodgate-test/


Processing URLs:   4%|▎         | 37/1000 [01:57<13:28,  1.19it/s]

Error extracting text from http://www.reuters.com/article/2015/11/06/us-southchinasea-usa-warship-idUSKCN0SV05420151106#QRqQqiRIiLJWuxHk.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/06/us-southchinasea-usa-warship-idUSKCN0SV05420151106#QRqQqiRIiLJWuxHk.99


Processing URLs:   4%|▍         | 38/1000 [01:58<10:43,  1.50it/s]

Error extracting text from https://news.yahoo.com/israel-arrests-top-hamas-leader-032618231.html: 404 Client Error: Not Found for url: https://news.yahoo.com/israel-arrests-top-hamas-leader-032618231.html


Processing URLs:   4%|▍         | 40/1000 [01:59<12:02,  1.33it/s]

Error extracting text from http://asia.nikkei.com/Politics-Economy/International-Relations/Important-shipping-lane-could-become-China-s-Caribbean: 404 Client Error: Not Found for url: https://asia.nikkei.com/Politics-Economy/International-Relations/Important-shipping-lane-could-become-China-s-Caribbean


Processing URLs:   4%|▍         | 42/1000 [02:01<12:46,  1.25it/s]

Error extracting text from http://amzn.to/1PKffFp: 500 Server Error: Internal Server Error for url: https://www.amazon.com/Superforecasting-The-Art-Science-Prediction/dp/0804136696/ref=as_li_ss_tl?ie=UTF8&linkCode=sl1&tag=junoandherpea2-20&linkId=755265eafe584c89b7ed53f7edd2ee9c


Processing URLs:   4%|▍         | 43/1000 [02:02<14:28,  1.10it/s]

Error extracting text from http://www.cncda.org/CMS/Pubs/CA%20Auto%20Outlook%201Q%202016.pdf: 403 Client Error: Forbidden for url: http://www.cncda.org/CMS/Pubs/CA%20Auto%20Outlook%201Q%202016.pdf


Processing URLs:   4%|▍         | 45/1000 [02:03<11:38,  1.37it/s]

Error extracting text from http://www.reuters.com/article/us-europe-migrants-schengen-idUSKCN0W613U: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-migrants-schengen-idUSKCN0W613U


Processing URLs:   5%|▍         | 47/1000 [02:07<17:46,  1.12s/it]

Error extracting text from http://www.securityweek.com/us-cyber-command-launched-ddos-attack-against-north-korea-report: 403 Client Error: Forbidden for url: https://www.securityweek.com/us-cyber-command-launched-ddos-attack-against-north-korea-report


Processing URLs:   5%|▍         | 48/1000 [02:14<45:52,  2.89s/it]

Error extracting text from https://www.investopedia.com/terms/c/consumerpriceindex.asp: 405 Client Error: Signal - Not Acceptable for url: https://www.investopedia.com/terms/c/consumerpriceindex.asp


Processing URLs:   5%|▌         | 50/1000 [02:14<23:58,  1.51s/it]

Error extracting text from http://www.wsj.com/articles/some-hillary-clinton-donors-defect-to-movement-to-draft-joe-biden-1443816232: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/some-hillary-clinton-donors-defect-to-movement-to-draft-joe-biden-1443816232
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-ceasefire-idUSKCN11J0JH: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-ceasefire-idUSKCN11J0JH


Processing URLs:   5%|▌         | 51/1000 [02:16<22:46,  1.44s/it]

Error extracting text from http://www.ayyaantuu.net/ethiopia-the-tplf-hidden-agenda-of-reducing-the-oromo-population-must-be-stopped/: HTTPConnectionPool(host='www.ayyaantuu.net', port=80): Max retries exceeded with url: /ethiopia-the-tplf-hidden-agenda-of-reducing-the-oromo-population-must-be-stopped/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x303fdcd70>: Failed to resolve 'www.ayyaantuu.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:   5%|▌         | 54/1000 [02:18<15:52,  1.01s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-amiri-idUSKCN0V41Z7: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-amiri-idUSKCN0V41Z7


Processing URLs:   6%|▌         | 56/1000 [02:20<12:51,  1.22it/s]

Error extracting text from https://thehill.com/opinion/national-security/535570-urgent-extend-new-start-treaty-with-russia-now: 403 Client Error: Forbidden for url: https://thehill.com/opinion/national-security/535570-urgent-extend-new-start-treaty-with-russia-now/


Processing URLs:   6%|▌         | 61/1000 [02:26<14:56,  1.05it/s]

Error extracting text from http://inhomelandsecurity.com/afghan-troop-surge-marines/?utm_source=IHS&amp;utm_medium=newsletter&amp;utm_content=afghan-troop-surge-marines&amp;utm_campaign=20170901IHS: 403 Client Error: Forbidden for url: https://amuedge.com/afghan-troop-surge-marines/?utm_source=IHS&amp;utm_medium=newsletter&amp;utm_content=afghan-troop-surge-marines&amp;utm_campaign=20170901IHS
Error extracting text from https://news.usni.org/2016/09/22/experts-advocate-harder-stance-against-illegal-claims-in-south-china-sea: 403 Client Error: Forbidden for url: https://news.usni.org/2016/09/22/experts-advocate-harder-stance-against-illegal-claims-in-south-china-sea


Processing URLs:   6%|▌         | 62/1000 [02:26<10:59,  1.42it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-idUSKCN12I17X: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-idUSKCN12I17X


Processing URLs:   6%|▋         | 64/1000 [02:29<20:07,  1.29s/it]

Error extracting text from https://www.google.com/amp/amp.slate.com/articles/news_and_politics/politics/2017/01/what_happens_to_the_obamacare_mandate_now.html?client=safari: 404 Client Error: Not Found for url: https://slate.com/news-and-politics/2017/01/what-happens-to-the-obamacare-mandate-now.amp


Processing URLs:   7%|▋         | 66/1000 [02:31<18:58,  1.22s/it]

Error extracting text from https://bit.ly/3pJ4X0L: 404 Client Error: Not Found for url: https://www.kob.com/business-news/italys-president-4-days-to-see-if-coalition-can-be-reborn-/5994694/


Processing URLs:   7%|▋         | 69/1000 [02:36<26:05,  1.68s/it]

Error extracting text from https://gcn.com/articles/2012/03/13/cybersecurity-vs-foia-protecting-sensitive-data.aspx: 404 Client Error: NOT FOUND for url: https://www.route-fifty.com/articles/2012/03/13/cybersecurity-vs-foia-protecting-sensitive-data.aspx/


Processing URLs:   8%|▊         | 76/1000 [02:50<20:09,  1.31s/it]

Error extracting text from http://www.reuters.com/article/us-nigeria-oil-security-idUSKCN0YP0XI: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-oil-security-idUSKCN0YP0XI


Processing URLs:   8%|▊         | 80/1000 [02:57<25:51,  1.69s/it]

Error extracting text from http://campaign.r20.constantcontact.com/render?ca=ae168016-99b3-4445-a00d-5c8a1e5684e6&amp;c=b713dc60-5ada-11e4-b4e7-d4ae529ce120&amp;ch=b7190c80-5ada-11e4-b4e7-d4ae529ce120: 500 Server Error: Internal Server Error for url: http://campaign.r20.constantcontact.com/render?ca=ae168016-99b3-4445-a00d-5c8a1e5684e6&amp;c=b713dc60-5ada-11e4-b4e7-d4ae529ce120&amp;ch=b7190c80-5ada-11e4-b4e7-d4ae529ce120


Processing URLs:   9%|▊         | 86/1000 [03:08<24:19,  1.60s/it]

Error extracting text from https://reut.rs/2OWw0Io: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/markets/bonds


Processing URLs:   9%|▉         | 89/1000 [03:14<35:26,  2.33s/it]

Error extracting text from http://www.themedialine.org/news/palestinian-prime-minister-says-security-cooperation-with-israel-to-end/: 403 Client Error: Forbidden for url: https://themedialine.org/news/palestinian-prime-minister-says-security-cooperation-with-israel-to-end/


Processing URLs:   9%|▉         | 90/1000 [03:20<50:30,  3.33s/it]

Error extracting text from http://news.trust.org/item/20160210224619-c2652: 404 Client Error:  for url: https://news.trust.org:443/item/20160210224619-c2652


Processing URLs:   9%|▉         | 93/1000 [03:22<23:01,  1.52s/it]

Error extracting text from http://www.nytimes.com/2015/10/22/us/politics/assad-finds-chilly-embrace-in-moscow-trip.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=first-column-region&amp;region=top-news&amp;WT: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/10/22/us/politics/assad-finds-chilly-embrace-in-moscow-trip.html?hp&amp;action=click&amp;pgtype=Homepage&amp;module=first-column-region&amp;region=top-news&amp;WT


Processing URLs:  10%|█         | 101/1000 [03:30<14:42,  1.02it/s]

Error extracting text from https://www.timesofisrael.com/how-do-you-solve-a-problem-like-palestinian-reconciliation-slowly-if-at-all/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/how-do-you-solve-a-problem-like-palestinian-reconciliation-slowly-if-at-all/


Processing URLs:  10%|█         | 102/1000 [03:32<15:44,  1.05s/it]

Error extracting text from https://www.wsj.com/articles/in-himalayas-a-new-power-rises-water-1495013404?mod=e2fb: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/in-himalayas-a-new-power-rises-water-1495013404?mod=e2fb


Processing URLs:  10%|█         | 104/1000 [03:37<25:55,  1.74s/it]

Error extracting text from https://scontent-grt2-1.xx.fbcdn.net/v/t1.0-9/13339517_1063272960433341_8165003861634655758_n.jpg?oh=3f5351c52a76a9b376b6a8f0d6cccbdc&amp;oe=57DCBF12: HTTPSConnectionPool(host='scontent-grt2-1.xx.fbcdn.net', port=443): Max retries exceeded with url: /v/t1.0-9/13339517_1063272960433341_8165003861634655758_n.jpg?oh=3f5351c52a76a9b376b6a8f0d6cccbdc&amp;oe=57DCBF12 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x305df2150>: Failed to resolve 'scontent-grt2-1.xx.fbcdn.net' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  11%|█▏        | 113/1000 [03:59<26:52,  1.82s/it]

Error extracting text from http://www.cnet.com/news/teslas-gigafactory-grand-opening-to-take-place-july-29/: 410 Client Error: Gone for url: https://www.cnet.com/tech/tech-industry/teslas-gigafactory-grand-opening-to-take-place-july-29/


Processing URLs:  11%|█▏        | 114/1000 [04:00<23:46,  1.61s/it]

Error extracting text from http://www.state.gov/t/vci/trty/126118.htm: 404 Client Error: Not Found for url: https://www.state.gov/t/vci/trty/126118.htm


Processing URLs:  12%|█▏        | 115/1000 [04:00<19:27,  1.32s/it]

Error extracting text from https://goodjudgment.io/economist/#1426: 404 Client Error: Not Found for url: https://goodjudgment.io/economist/#1426


Processing URLs:  12%|█▏        | 116/1000 [04:02<19:58,  1.36s/it]

Error extracting text from https://www.tennisworldusa.org/tennis/news/Rafael_Nadal/111300/rafael-nadal-withdraws-from-one-of-his-favorite-tournaments/: 403 Client Error: Forbidden for url: https://www.tennisworldusa.org/tennis/news/Rafael_Nadal/111300/rafael-nadal-withdraws-from-one-of-his-favorite-tournaments/


Processing URLs:  12%|█▏        | 117/1000 [04:04<23:34,  1.60s/it]

Error extracting text from http://www.polioeradication.org/Mediaroom/NewsletterPolioNews.aspx: 404 Client Error: Not Found for url: https://polioeradication.org/Mediaroom/NewsletterPolioNews.aspx


Processing URLs:  12%|█▏        | 119/1000 [04:07<19:42,  1.34s/it]

Error extracting text from https://www.barrons.com/articles/how-the-stimulus-bill-could-come-undone-and-what-it-would-mean-for-the-stock-market-51608813011: 403 Client Error: Forbidden for url: https://www.barrons.com/articles/how-the-stimulus-bill-could-come-undone-and-what-it-would-mean-for-the-stock-market-51608813011


Processing URLs:  12%|█▏        | 122/1000 [04:11<20:44,  1.42s/it]

Error extracting text from http://www.un.org/apps/news/story.asp?NewsID=58214#.Wi2YfkysO8U: 403 Client Error: Forbidden for url: https://www.un.org/apps/news/story.asp?NewsID=58214#.Wi2YfkysO8U


Processing URLs:  12%|█▏        | 124/1000 [04:14<22:32,  1.54s/it]

Error extracting text from http://www.sfexaminer.com/vallejo-unveils-plans-electric-car-plant-ex-shipyard/: 404 Client Error: Not Found for url: https://www.sfexaminer.com/vallejo-unveils-plans-electric-car-plant-ex-shipyard/


Processing URLs:  12%|█▎        | 125/1000 [04:16<21:59,  1.51s/it]

Error extracting text from https://citizen.co.za/news/south-africa/1818561/step-down-or-face-vote-of-no-confidence-mantashe-warns-zuma/: 404 Client Error: Not Found for url: https://www.citizen.co.za/news/south-africa/step-down-or-face-vote-of-no-confidence-mantashe-warns-zuma/


Processing URLs:  13%|█▎        | 129/1000 [04:21<18:26,  1.27s/it]

Error extracting text from http://in.reuters.com/article/thailand-politics-idINKBN13J0BN: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in


Processing URLs:  13%|█▎        | 131/1000 [04:22<11:33,  1.25it/s]

Error extracting text from http://static.googleusercontent.com/media/www.google.com/en//selfdrivingcar/files/reports/report-0515.pdf: 404 Client Error: Not Found for url: http://static.googleusercontent.com/media/www.google.com/en//selfdrivingcar/files/reports/report-0515.pdf


Processing URLs:  13%|█▎        | 133/1000 [04:22<07:20,  1.97it/s]

Error extracting text from https://msf.exposure.co/trapped-at-europes-borders: 403 Client Error: Forbidden for url: https://msf.exposure.co/trapped-at-europes-borders


Processing URLs:  14%|█▎        | 136/1000 [04:25<10:53,  1.32it/s]

Error extracting text from http://www.boxofficemojo.com/movies/?page=daily&amp;id=jurassicpark4.htm: 404 Client Error: Not Found for url: https://www.boxofficemojo.com/movies/?page=daily&amp;id=jurassicpark4.htm
Error extracting text from http://www.worldbulletin.net/international-media/171795/turkish-press-review-on-apr-20: 403 Client Error: Forbidden for url: http://www.worldbulletin.net/international-media/171795/turkish-press-review-on-apr-20


Processing URLs:  14%|█▎        | 137/1000 [05:25<4:09:31, 17.35s/it]

Error extracting text from http://www.miamiherald.com/news/nation-world/world/americas/colombia/article114862333.html: HTTPConnectionPool(host='www.miamiherald.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  14%|█▍        | 138/1000 [05:27<3:04:27, 12.84s/it]

Error extracting text from https://macropolo.org/the-committee/: 403 Client Error: Forbidden for url: https://macropolo.org/the-committee/


Processing URLs:  14%|█▍        | 139/1000 [05:28<2:15:42,  9.46s/it]

Error extracting text from http://www.northjersey.com/news/stile-christie-lends-gentle-hand-to-trump-campaign-1.1621860: 404 Client Error: OK for url: https://www.northjersey.com/news/stile-christie-lends-gentle-hand-to-trump-campaign-1.1621860/


Processing URLs:  15%|█▍        | 147/1000 [06:40<4:43:35, 19.95s/it]

Error extracting text from http://ewp.dali.dartmouth.edu/questions/14: HTTPConnectionPool(host='ewp.dali.dartmouth.edu', port=80): Max retries exceeded with url: /questions/14 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fe8421e0>, 'Connection to ewp.dali.dartmouth.edu timed out. (connect timeout=60)'))


Processing URLs:  15%|█▍        | 149/1000 [06:42<2:26:48, 10.35s/it]

Error extracting text from http://evobsession.com/baic-takes-gold-bronze-october-china/: 403 Client Error: Forbidden for url: http://evobsession.com/baic-takes-gold-bronze-october-china/


Processing URLs:  15%|█▌        | 154/1000 [06:57<56:40,  4.02s/it]  

Error extracting text from http://www.nrttv.com/EN/Details.aspx?Jimare=9138: 403 Client Error: Forbidden for url: https://www.nrttv.com/EN/Details.aspx?Jimare=9138


Processing URLs:  16%|█▌        | 156/1000 [07:00<38:36,  2.75s/it]

Error extracting text from http://www.realclearpolitics.com/video/2015/11/13/obama_on_isis_we_have_contained_them.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/video/2015/11/13/obama_on_isis_we_have_contained_them.html


Processing URLs:  16%|█▌        | 157/1000 [07:01<29:23,  2.09s/it]

Error extracting text from http://warisboring.com/articles/the-u-s-embassy-in-baghdad-is-preparing-for-iraqs-mosul-dam-to-burst/?mc_cid=c3bda8b7db&amp;mc_eid=0467f21653: 403 Client Error: Forbidden for url: http://warisboring.com/articles/the-u-s-embassy-in-baghdad-is-preparing-for-iraqs-mosul-dam-to-burst/?mc_cid=c3bda8b7db&amp;mc_eid=0467f21653


Processing URLs:  16%|█▌        | 159/1000 [07:02<19:13,  1.37s/it]

Error extracting text from https://www.congress.gov/resources/display/content/The+Federalist+Papers#TheFederalistPapers-1: 403 Client Error: Forbidden for url: https://www.congress.gov/resources/display/content/The+Federalist+Papers#TheFederalistPapers-1


Processing URLs:  16%|█▌        | 160/1000 [07:05<23:07,  1.65s/it]

URL filtered: https://www.youtube.com/watch?v=rRcrjSB2xvs


Processing URLs:  16%|█▌        | 162/1000 [07:05<13:32,  1.03it/s]

Error extracting text from http://www.reuters.com/article/us-nigeria-security/suicide-bombers-kill-10-in-nigerias-maiduguri-emergency-official-idUSKBN1DF2UA?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-security/suicide-bombers-kill-10-in-nigerias-maiduguri-emergency-official-idUSKBN1DF2UA?il=0


Processing URLs:  16%|█▋        | 165/1000 [07:12<19:06,  1.37s/it]

Error extracting text from https://www.nytimes.com/2016/07/13/world/asia/south-china-sea-hague-ruling-philippines.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/07/13/world/asia/south-china-sea-hague-ruling-philippines.html


Processing URLs:  17%|█▋        | 169/1000 [07:31<1:04:03,  4.63s/it]

Error extracting text from http://www.neweasterneurope.eu/articles-and-commentary/2245-brexit-and-the-western-balkans-what-to-expect: 404 Client Error: Not Found for url: https://neweasterneurope.eu/articles-and-commentary/2245-brexit-and-the-western-balkans-what-to-expect


Processing URLs:  18%|█▊        | 176/1000 [07:36<12:58,  1.06it/s]  

Error extracting text from https://www.lavozdelsandinismo.com/nicaragua/2020-10-12/sistema-de-salud-de-nicaragua-recibe-nueva-donacion-de-taiwan/: HTTPSConnectionPool(host='www.lavozdelsandinismo.com', port=443): Max retries exceeded with url: /nicaragua/2020-10-12/sistema-de-salud-de-nicaragua-recibe-nueva-donacion-de-taiwan/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30708a540>: Failed to resolve 'www.lavozdelsandinismo.com' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/2015/11/16/us-g20-turkey-russia-japan-idUSKCN0T518F20151116#P7E614DF8845XVEt.99: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/11/16/us-g20-turkey-russia-japan-idUSKCN0T518F20151116#P7E614DF8845XVEt.99


Processing URLs:  18%|█▊        | 178/1000 [07:38<12:13,  1.12it/s]

Error extracting text from https://www.timesofisrael.com/top-iranian-general-forces-in-syria-lebanon-awaiting-orders-to-destroy-israel: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/top-iranian-general-forces-in-syria-lebanon-awaiting-orders-to-destroy-israel


Processing URLs:  18%|█▊        | 181/1000 [07:43<14:12,  1.04s/it]

Error extracting text from http://www.latimes.com/business/la-fi-net-neutrality-fcc-20170517-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/business/la-fi-net-neutrality-fcc-20170517-story.html
Error extracting text from http://www.reuters.com/article/us-turkey-europe-netherlands-idUSKBN16I07O?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-turkey-europe-netherlands-idUSKBN16I07O?il=0


Processing URLs:  18%|█▊        | 182/1000 [07:44<14:12,  1.04s/it]

Error extracting text from https://www.nord-stream2.com/de/media-info/neuigkeiten/letztes-rohr-der-nord-stream-2-pipeline-verschweit-151/: HTTPSConnectionPool(host='www.nord-stream2.com', port=443): Max retries exceeded with url: /de/media-info/neuigkeiten/letztes-rohr-der-nord-stream-2-pipeline-verschweit-151/ (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x30650da90>: Failed to resolve 'www.nord-stream2.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  18%|█▊        | 184/1000 [07:44<08:45,  1.55it/s]

Error extracting text from https://www.nytimes.com/2021/03/01/us/extremism-capitol-riot.html?campaign_id=2&amp;emc=edit_th_20210302&amp;instance_id=27628&amp;nl=todaysheadlines&amp;regi_id=77825025&amp;segment_id=52599&amp;user_id=dc6933c94fc82416d25dd297622374d1: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/03/01/us/extremism-capitol-riot.html?campaign_id=2&amp;emc=edit_th_20210302&amp;instance_id=27628&amp;nl=todaysheadlines&amp;regi_id=77825025&amp;segment_id=52599&amp;user_id=dc6933c94fc82416d25dd297622374d1


Processing URLs:  19%|█▊        | 187/1000 [07:47<10:08,  1.34it/s]

Error extracting text from http://allafrica.com/stories/201604151113.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201604151113.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x30650ea20>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  19%|█▉        | 192/1000 [08:00<35:56,  2.67s/it]

Error extracting text from https://english.aawsat.com/carlotta-gall/features/iran-gains-ground-afghanistan-us-presence-wanes: 403 Client Error: Forbidden for url: https://english.aawsat.com/carlotta-gall/features/iran-gains-ground-afghanistan-us-presence-wanes


Processing URLs:  19%|█▉        | 193/1000 [08:03<33:57,  2.52s/it]

URL filtered: https://twitter.com/khamenei_ir?lang=fa


Processing URLs:  20%|█▉        | 195/1000 [08:03<19:50,  1.48s/it]

Error extracting text from http://pakobserver.net/2016/07/15/no-plans-for-taliban-talks-ghani/: 403 Client Error: Forbidden for url: http://pakobserver.net/2016/07/15/no-plans-for-taliban-talks-ghani/


Processing URLs:  20%|█▉        | 196/1000 [08:05<22:06,  1.65s/it]

URL filtered: https://cleantechnica.com/2017/07/14/bloomberg-tesla-set-win/


Processing URLs:  20%|██        | 200/1000 [09:12<3:37:05, 16.28s/it]

Error extracting text from http://www.usnews.com/news/world/articles/2015/11/27/the-latest-france-focused-on-destroying-is-hq-in-raqqa: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  20%|██        | 203/1000 [09:14<1:28:18,  6.65s/it]

Error extracting text from http://www.reuters.com/article/2015/12/01/britain-eu-osborne-idUSU8N12L05820151201: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/12/01/britain-eu-osborne-idUSU8N12L05820151201


Processing URLs:  20%|██        | 204/1000 [09:15<1:08:21,  5.15s/it]

Error extracting text from http://dbiosla.org/publications/resources/dbio100.html: 406 Client Error: Not Acceptable for url: http://dbiosla.org/publications/resources/dbio100.html


Processing URLs:  21%|██        | 206/1000 [09:18<40:33,  3.06s/it]  

Error extracting text from https://www.nytimes.com/2021/08/25/world/europe/navalny-jail-prison.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/25/world/europe/navalny-jail-prison.html


Processing URLs:  21%|██        | 210/1000 [09:23<22:31,  1.71s/it]

Error extracting text from https://www.reuters.com/world/middle-east/nuclear-monitoring-deal-between-iran-iaea-has-expired-says-top-lawmaker-2021-05-23/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/nuclear-monitoring-deal-between-iran-iaea-has-expired-says-top-lawmaker-2021-05-23/


Processing URLs:  21%|██▏       | 214/1000 [09:36<47:02,  3.59s/it]

Error extracting text from https://www.washingtonpost.com/world/asia_pacific/recent-developments-surrounding-the-south-china-sea/2017/02/19/3476bddc-f71c-11e6-aa1e-5f735ee31334_story.html?utm_term=.2868bdd0baa8: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/asia_pacific/recent-developments-surrounding-the-south-china-sea/2017/02/19/3476bddc-f71c-11e6-aa1e-5f735ee31334_story.html?utm_term=.2868bdd0baa8


Processing URLs:  22%|██▎       | 225/1000 [10:00<22:18,  1.73s/it]

Error extracting text from http://www.wsj.com/articles/europes-banking-union-flunks-its-first-test-1453939388: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/europes-banking-union-flunks-its-first-test-1453939388


Processing URLs:  23%|██▎       | 228/1000 [10:03<16:27,  1.28s/it]

Error extracting text from http://www.reuters.com/article/us-haiti-election-idUSKCN0XC0D4: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-haiti-election-idUSKCN0XC0D4


Processing URLs:  23%|██▎       | 230/1000 [10:06<16:47,  1.31s/it]

Error extracting text from http://en.delfi.lt/eu/some-of-camerons-proposals-require-amending-eu-treaties-president-grybauskaite-says.d?id=69549354: 403 Client Error: Forbidden for url: https://www.delfi.lt/en/eu/some-of-camerons-proposals-require-amending-eu-treaties-president-grybauskaite-says.d?id=69549354


Processing URLs:  23%|██▎       | 233/1000 [10:13<21:08,  1.65s/it]

URL filtered: https://twitter.com/rosatom
URL filtered: https://www.youtube.com/watch?v=sM8ix0siRVQ


Processing URLs:  24%|██▍       | 238/1000 [10:17<14:42,  1.16s/it]

Error extracting text from http://thehill.com/policy/energy-environment/332810-dakota-access-pipeline-leaks-84-gallons-of-oil-in-sd: 403 Client Error: Forbidden for url: https://thehill.com/policy/energy-environment/332810-dakota-access-pipeline-leaks-84-gallons-of-oil-in-sd/


Processing URLs:  24%|██▍       | 240/1000 [10:18<09:56,  1.27it/s]

Error extracting text from http://www.nytimes.com/2016/02/13/world/americas/court-ruling-in-venezuela-prompts-accusation-of-an-illegal-power-grab.html?ref=americas: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/13/world/americas/court-ruling-in-venezuela-prompts-accusation-of-an-illegal-power-grab.html?ref=americas


Processing URLs:  24%|██▍       | 242/1000 [10:19<07:44,  1.63it/s]

Error extracting text from http://thehill.com/blogs/pundits-blog/defense/269464-in-afghanistan-things-arent-getting-better: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/defense/269464-in-afghanistan-things-arent-getting-better/


Processing URLs:  24%|██▍       | 245/1000 [10:21<06:59,  1.80it/s]

Error extracting text from http://www.reuters.com/article/us-burundi-security-idUSKCN0WO29G: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-burundi-security-idUSKCN0WO29G


Processing URLs:  25%|██▍       | 247/1000 [10:33<37:10,  2.96s/it]

URL filtered: https://www.youtube.com/watch?v=JrMiSQAGOS4


Processing URLs:  25%|██▍       | 249/1000 [10:34<22:37,  1.81s/it]

URL filtered: https://twitter.com/nytimes/status/1497066350715490317


Processing URLs:  26%|██▌       | 260/1000 [10:48<15:11,  1.23s/it]

Error extracting text from http://www.wsj.com/articles/brazil-president-dilma-rousseff-asks-hostile-congress-to-pass-austerity-bills-1454443939: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/brazil-president-dilma-rousseff-asks-hostile-congress-to-pass-austerity-bills-1454443939


Processing URLs:  26%|██▌       | 262/1000 [11:15<1:36:08,  7.82s/it]

Error extracting text from http://www.ew.com/article/2016/01/02/george-rr-martin-thrones-winds-of-winter: 406 Client Error: Not Acceptable for url: https://www.ew.com/article/2016/01/02/george-rr-martin-thrones-winds-of-winter


Processing URLs:  27%|██▋       | 268/1000 [11:38<42:12,  3.46s/it]  

Error extracting text from http://www.thepharmaletter.com/article/us-fda-novel-new-drug-approvals-in-2015-hit-a-19-year-record-high-of-45: 403 Client Error: Forbidden for url: http://www.thepharmaletter.com/article/us-fda-novel-new-drug-approvals-in-2015-hit-a-19-year-record-high-of-45


Processing URLs:  27%|██▋       | 271/1000 [11:41<20:56,  1.72s/it]

Error extracting text from https://www.consilium.europa.eu/en/press/press-releases/2020/12/29/eu-uk-trade-and-cooperation-agreement-council-adopts-decision-on-the-signing/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2020/12/29/eu-uk-trade-and-cooperation-agreement-council-adopts-decision-on-the-signing/


Processing URLs:  27%|██▋       | 274/1000 [11:54<36:30,  3.02s/it]

Error extracting text from http://learningandfinance.com/2017/03/12/russian-federation-and-turkey-hail-syria-cooperation-and/: 403 Client Error: Forbidden for url: https://www.hugedomains.com/domain_profile.cfm?d=learningandfinance.com


Processing URLs:  28%|██▊       | 275/1000 [11:55<26:29,  2.19s/it]

Error extracting text from https://www.nytimes.com/2017/11/03/business/venezuela-debt.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/03/business/venezuela-debt.html


Processing URLs:  28%|██▊       | 283/1000 [12:27<52:02,  4.35s/it]

Error extracting text from https://www.washingtonpost.com/business/austria-wants-a-full-stop-to-migrant-influx-along-balkans/2016/02/24/ab831f7a-daf2-11e5-8210-f0bd8de915f6_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/business/austria-wants-a-full-stop-to-migrant-influx-along-balkans/2016/02/24/ab831f7a-daf2-11e5-8210-f0bd8de915f6_story.html


Processing URLs:  29%|██▊       | 286/1000 [12:33<30:57,  2.60s/it]

Error extracting text from http://www.tradingeconomics.com/japan/government-debt-to-gdp: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/japan/government-debt-to-gdp


Processing URLs:  29%|██▉       | 289/1000 [12:49<1:00:13,  5.08s/it]

Error extracting text from https://www.washingtonpost.com/world/africa/un-chief-says-donors-kept-famine-at-bay--but-aid-needed/2017/10/12/cfc6d678-af9a-11e7-9b93-b97043e57a22_story.html?utm_term=.100fcc7fabf5: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/africa/un-chief-says-donors-kept-famine-at-bay--but-aid-needed/2017/10/12/cfc6d678-af9a-11e7-9b93-b97043e57a22_story.html?utm_term=.100fcc7fabf5


Processing URLs:  29%|██▉       | 294/1000 [12:58<27:21,  2.33s/it]  

URL filtered: https://www.bloomberg.com/news/articles/2021-02-19/u-s-iran-standoff-shows-difficulty-of-salvaging-nuclear-deal


Processing URLs:  30%|██▉       | 297/1000 [13:09<33:33,  2.86s/it]

Error extracting text from https://www.reuters.com/article/us-usa-cyber-russia-symantec/exclusive-symantec-ceo-says-source-code-reviews-pose-unacceptable-risk-idUSKBN1CF2SB: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-cyber-russia-symantec/exclusive-symantec-ceo-says-source-code-reviews-pose-unacceptable-risk-idUSKBN1CF2SB


Processing URLs:  30%|██▉       | 298/1000 [13:10<27:13,  2.33s/it]

URL filtered: https://www.bloomberg.com/news/articles/2020-12-16/leaders-cite-progress-in-talks-no-deal-yet-congress-update?srnd=premium&amp;sref=NP2QRiNN


Processing URLs:  30%|███       | 305/1000 [13:20<17:16,  1.49s/it]

Error extracting text from https://www.timesofisrael.com/netanyahu-launches-fresh-attack-on-cops-investigating-him/: 403 Client Error: Forbidden for url: https://www.timesofisrael.com/netanyahu-launches-fresh-attack-on-cops-investigating-him/


Processing URLs:  31%|███       | 307/1000 [13:23<18:08,  1.57s/it]

Error extracting text from http://uk.reuters.com/article/uk-poland-eu-kaczynski-idUKKCN12A0JL: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  31%|███       | 310/1000 [13:25<10:31,  1.09it/s]

Error extracting text from https://1tvnews.af/12/07/2021/8893/: 406 Client Error: Not Acceptable for url: https://1tvnews.af/12/07/2021/8893/
Error extracting text from http://www.reuters.com/article/us-nigeria-gas-idUSKCN0YB1YW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-nigeria-gas-idUSKCN0YB1YW


Processing URLs:  31%|███       | 311/1000 [13:26<07:45,  1.48it/s]

Error extracting text from http://www.reuters.com/article/us-australia-usa-japan-idUSKBN0OA1GE20150526: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-australia-usa-japan-idUSKBN0OA1GE20150526


Processing URLs:  31%|███▏      | 314/1000 [13:28<09:59,  1.14it/s]

Error extracting text from http://www.reuters.com/article/us-venezuela-politics-idUSKCN18G0WW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-politics-idUSKCN18G0WW


Processing URLs:  32%|███▏      | 316/1000 [13:31<12:22,  1.09s/it]

URL filtered: https://twitter.com/fchollet/status/820780027058847745


Processing URLs:  32%|███▏      | 319/1000 [13:34<11:23,  1.00s/it]

Error extracting text from https://investors.modernatx.com/news-releases/news-release-details/moderna-announces-first-participants-dosed-phase-23-study-0: 403 Client Error: Forbidden for url: https://investors.modernatx.com/news-releases/news-release-details/moderna-announces-first-participants-dosed-phase-23-study-0
Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN16R017: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN16R017


Processing URLs:  32%|███▏      | 322/1000 [13:35<06:23,  1.77it/s]

Error extracting text from http://allafrica.com/stories/201603280235.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201603280235.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3011f3050>: Failed to establish a new connection: [Errno 61] Connection refused'))
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-usa-iraq-idUSKCN0Z80CO: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-usa-iraq-idUSKCN0Z80CO


Processing URLs:  32%|███▏      | 323/1000 [13:35<06:23,  1.77it/s]

Error extracting text from http://thehill.com/policy/cybersecurity/361417-donna-brazile-says-russians-destroyed-critical-dnc-data: 403 Client Error: Forbidden for url: https://thehill.com/policy/cybersecurity/361417-donna-brazile-says-russians-destroyed-critical-dnc-data/


Processing URLs:  33%|███▎      | 329/1000 [14:10<31:41,  2.83s/it]  

Error extracting text from https://www.reuters.com/world/middle-east/iran-nuclear-deal-parties-meet-wrap-up-latest-round-talks-2021-06-02/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/middle-east/iran-nuclear-deal-parties-meet-wrap-up-latest-round-talks-2021-06-02/


Processing URLs:  33%|███▎      | 330/1000 [14:13<31:17,  2.80s/it]

URL filtered: https://twitter.com/katyafimava/status/1436384882100514819
URL filtered: https://www.bloomberg.com/view/articles/2017-05-08/the-buck-stops-with-mitch-mcconnell


Processing URLs:  34%|███▎      | 336/1000 [14:21<20:57,  1.89s/it]

Error extracting text from http://www.reuters.com/article/2015/10/29/uk-mideast-crisis-syria-talks-idUKKCN0SN27S20151029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/10/29/uk-mideast-crisis-syria-talks-idUKKCN0SN27S20151029


Processing URLs:  34%|███▍      | 339/1000 [14:23<11:03,  1.00s/it]

Error extracting text from https://www.predictit.org/Contract/1792/Will-a-federal-criminal-charge-be-filed-against-Hillary-Clinton-in-2016#openoffers: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/1792/Will-a-federal-criminal-charge-be-filed-against-Hillary-Clinton-in-2016#openoffers
Error extracting text from http://www.reuters.com/article/us-apec-summit-usa-russia/trump-says-he-trusts-putins-denials-of-election-meddling-idUSKBN1DB04N: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-apec-summit-usa-russia/trump-says-he-trusts-putins-denials-of-election-meddling-idUSKBN1DB04N


Processing URLs:  34%|███▍      | 343/1000 [14:30<15:38,  1.43s/it]

Error extracting text from https://www.wsj.com/articles/republican-senators-say-no-boycott-planned-for-jackson-committee-vote-11648148340: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/republican-senators-say-no-boycott-planned-for-jackson-committee-vote-11648148340


Processing URLs:  35%|███▍      | 349/1000 [14:38<12:00,  1.11s/it]

Error extracting text from http://www.cnas.org/flashpoints/timeline: 404 Client Error: Not Found for url: https://www.cnas.org:443/flashpoints/timeline
Error extracting text from https://medium.com/@fabrice_deprez/spiteful-tongues-how-telegram-became-the-go-to-place-for-russian-political-gossip-cc50d9182ddf: 403 Client Error: Forbidden for url: https://medium.com/@fabrice_deprez/spiteful-tongues-how-telegram-became-the-go-to-place-for-russian-political-gossip-cc50d9182ddf


Processing URLs:  35%|███▌      | 351/1000 [14:38<07:07,  1.52it/s]

Error extracting text from https://www.reuters.com/article/us-usa-biden-yellen-trade/u-s-treasury-pick-yellen-says-domestic-investment-needed-before-new-trade-deals-idUSKBN29Q2RZ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-biden-yellen-trade/u-s-treasury-pick-yellen-says-domestic-investment-needed-before-new-trade-deals-idUSKBN29Q2RZ
Error extracting text from http://www.rand.org/blog/2015/04/south-koreas-missile-defense-system-decision-qa-with.html: 403 Client Error: Forbidden for url: https://www.rand.org/blog/2015/04/south-koreas-missile-defense-system-decision-qa-with.html


Processing URLs:  36%|███▌      | 356/1000 [14:50<19:32,  1.82s/it]

Error extracting text from https://www.cmu.edu/dietrich/sds/docs/fischhoff/AF-GPH.pdf: 404 Client Error: Not Found for url: https://www.cmu.edu/dietrich/sds/docs/fischhoff/AF-GPH.pdf


Processing URLs:  36%|███▌      | 360/1000 [14:57<17:27,  1.64s/it]

Error extracting text from http://english.yonhapnews.co.kr/news/2015/12/04/0200000000AEN20151204003700315.html: HTTPSConnectionPool(host='en.yna.co.kr', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1000)')))
URL filtered: http://www.bloomberg.com/news/articles/2016-03-29/iran-s-rouhani-suddenly-cancels-trip-to-vienna-citing-security
URL filtered: http://www.bloomberg.com/news/articles/2016-03-16/burundi-shootings-kill-three-target-ruling-party-officials
Error extracting text from https://www.reuters.com/article/us-safrica-politics/zumas-exit-not-on-south-african-ruling-partys-meeting-agenda-idUSKBN1EZ12K?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-safrica-politics/zumas-exit-not-on-south-african-ruling-partys-meeting-agenda-idUSKBN1EZ12K?il=0


Processing URLs:  37%|███▋      | 370/1000 [15:05<07:31,  1.40it/s]

Error extracting text from http://iranimex.ir/: HTTPConnectionPool(host='iranimex.ir', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebe360>: Failed to resolve 'iranimex.ir' ([Errno 8] nodename nor servname provided, or not known)"))
Error extracting text from http://www.reuters.com/article/us-usa-yemen-idUSKBN1691PV: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-yemen-idUSKBN1691PV


Processing URLs:  37%|███▋      | 373/1000 [15:08<09:58,  1.05it/s]

Error extracting text from http://infojustice.org/archives/35875: 406 Client Error: Not Acceptable for url: http://infojustice.org/archives/35875


Processing URLs:  38%|███▊      | 375/1000 [15:10<09:18,  1.12it/s]

Error extracting text from http://www.space-library.com/0905ChinaSecurity_Issue14_extract.pdf: 404 Client Error: Not Found for url: http://www.space-library.com/0905ChinaSecurity_Issue14_extract.pdf


Processing URLs:  38%|███▊      | 376/1000 [15:11<10:21,  1.00it/s]

Error extracting text from https://www.state.gov/t/avc/newstart/274550.htm: 404 Client Error: Not Found for url: https://www.state.gov/t/avc/newstart/274550.htm


Processing URLs:  38%|███▊      | 377/1000 [15:15<18:41,  1.80s/it]

URL filtered: http://washpost.bloomberg.com/Story?docId=1376-O98PC6SYF01X01-3F3SQMQMS2G0FVP8RBD4H33BIM


Processing URLs:  38%|███▊      | 380/1000 [15:18<14:42,  1.42s/it]

Error extracting text from http://carnegie-mec.org/2015/12/06/isil-sells-its-oil-but-who-is-buying-it/imro: 403 Client Error: Forbidden for url: http://carnegie-mec.org/2015/12/06/isil-sells-its-oil-but-who-is-buying-it/imro


Processing URLs:  39%|███▊      | 387/1000 [15:28<09:42,  1.05it/s]

URL filtered: https://www.bloomberg.com/news/articles/2016-10-18/china-holdings-of-u-s-treasuries-drop-to-almost-four-year-low
Error extracting text from https://www.arabnews.com/node/1857101/world: 403 Client Error: Forbidden for url: https://www.arabnews.com/node/1857101/world


Processing URLs:  39%|███▉      | 390/1000 [15:35<17:32,  1.73s/it]

Error extracting text from http://www.wsj.com/articles/chinas-xi-affirms-commitment-to-sustainable-economic-growth-1472898634: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chinas-xi-affirms-commitment-to-sustainable-economic-growth-1472898634


Processing URLs:  39%|███▉      | 392/1000 [15:38<14:57,  1.48s/it]

Error extracting text from http://www.nytimes.com/2016/01/05/business/vw-sued-justice-department-emissions-scandal.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/05/business/vw-sued-justice-department-emissions-scandal.html?_r=0


Processing URLs:  39%|███▉      | 394/1000 [15:41<13:22,  1.32s/it]

Error extracting text from http://www.chicagotribune.com/news/sns-wp-trade-india-analysis-42c18c34-1ba5-11e7-855e-4824bbb5d748-20170407-story.html: 404 Client Error: Not Found for url: https://www.chicagotribune.com/news/sns-wp-trade-india-analysis-42c18c34-1ba5-11e7-855e-4824bbb5d748-20170407-story.html


Processing URLs:  40%|███▉      | 395/1000 [15:42<12:48,  1.27s/it]

Error extracting text from http://www.metronews.ca/news/toronto/2017/08/22/canadians-donate-21-million-in-funds-for-famine.html: 503 Server Error: Service Unavailable for url: http://www.metronews.ca/news/toronto/2017/08/22/canadians-donate-21-million-in-funds-for-famine.html


Processing URLs:  40%|███▉      | 396/1000 [15:43<13:23,  1.33s/it]

Error extracting text from https://www.idga.org/events-hypersonicweapons/speakers/dr-gillian-bussey: 403 Client Error: Forbidden for url: https://www.idga.org/events-hypersonicweapons/speakers/dr-gillian-bussey


Processing URLs:  40%|███▉      | 399/1000 [15:48<14:01,  1.40s/it]

Error extracting text from https://uk.reuters.com/article/uk-yemen-security/southern-yemen-leader-sees-independence-referendum-parliament-body-idUKKBN1CJ06T: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  41%|████      | 406/1000 [15:55<09:04,  1.09it/s]

Error extracting text from http://www.comres.co.uk/polls/daily-mail-political-poll-february-2016/: 403 Client Error: Forbidden for url: http://comresglobal.com/polls/daily-mail-political-poll-february-2016/
Error extracting text from http://www.khaama.com/ghani-assigns-stanikzai-as-acting-nds-chief-abdullah-khan-acting-defense-minister-0852: 403 Client Error: Forbidden for url: http://www.khaama.com/ghani-assigns-stanikzai-as-acting-nds-chief-abdullah-khan-acting-defense-minister-0852


Processing URLs:  41%|████      | 408/1000 [16:16<1:03:31,  6.44s/it]

Error extracting text from https://www.thebalance.com/fed-funds-rate-history-highs-lows-3306135: 406 Client Error: Not Acceptable for url: https://www.thebalancemoney.com:443/fed-funds-rate-history-highs-lows-3306135


Processing URLs:  41%|████      | 409/1000 [16:17<48:33,  4.93s/it]  

URL filtered: https://www.youtube.com/watch?v=kQAjScY4nMM


Processing URLs:  41%|████      | 412/1000 [16:20<25:52,  2.64s/it]

Error extracting text from http://www.globalresearch.ca/bashar-al-assad-has-more-popular-support-than-the-western-backed-opposition-poll/5495643: 404 Client Error: Not Found for url: https://www.globalresearch.ca/bashar-al-assad-has-more-popular-support-than-the-western-backed-opposition-poll/5495643


Processing URLs:  42%|████▏     | 415/1000 [16:25<18:50,  1.93s/it]

Error extracting text from http://insideevs.com/toyota-reveals-plans-electric-car-rollout-china/: 404 Client Error: Not Found for url: https://insideevs.com:443/toyota-reveals-plans-electric-car-rollout-china/
URL filtered: https://twitter.com/benbartenstein/status/930241058356826112


Processing URLs:  42%|████▏     | 418/1000 [16:28<13:02,  1.34s/it]

Error extracting text from http://www.economist.com/blogs/bagehot/2016/04/panama-brexit: 403 Client Error: Forbidden for url: https://www.economist.com/blogs/bagehot/2016/04/panama-brexit


Processing URLs:  42%|████▏     | 424/1000 [16:37<12:32,  1.31s/it]

Error extracting text from http://www.biznews.com/leadership/2016/09/22/head-to-head-with-zuma-anc-must-save-sa-from-becoming-mafia-state-urges-mantashe/: 404 Client Error: Not Found for url: https://www.biznews.com/leadership/2016/09/22/head-to-head-with-zuma-anc-must-save-sa-from-becoming-mafia-state-urges-mantashe/
URL filtered: https://www.brookings.edu/blog/fixgov/2016/09/15/what-do-the-models-say-about-who-will-win-in-november/?utm_medium=social&amp;utm_source=twitter&amp;utm_campaign=gs


Processing URLs:  43%|████▎     | 426/1000 [16:38<07:44,  1.24it/s]

Error extracting text from http://blogs.reuters.com/great-debate/2016/02/19/what-makes-just-16-missiles-such-a-deadly-threat-in-the-south-china-sea/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2016/02/19/what-makes-just-16-missiles-such-a-deadly-threat-in-the-south-china-sea/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2feaf3d40>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  43%|████▎     | 433/1000 [17:19<35:45,  3.78s/it]  

Error extracting text from https://www.yahoo.com/news/threatened-mosul-islamic-state-uses-061211726.html?ref=gs: 404 Client Error: Not Found for url: https://www.yahoo.com/news/threatened-mosul-islamic-state-uses-061211726.html?ref=gs


Processing URLs:  44%|████▎     | 435/1000 [17:22<24:24,  2.59s/it]

Error extracting text from http://blogs.spectator.co.uk/2017/03/might-nicola-sturgeons-sinking-approval-ratings-explain-appetite-referendum/: 404 Client Error: Not Found for url: https://www.spectator.co.uk/2017/03/might-nicola-sturgeons-sinking-approval-ratings-explain-appetite-referendum/


Processing URLs:  44%|████▎     | 437/1000 [18:23<3:01:51, 19.38s/it]

Error extracting text from http://www.edmunds.com/toyota/mirai/2016/long-term-road-test/2016-toyota-mirai-easy-to-refuel.html: HTTPConnectionPool(host='www.edmunds.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  44%|████▍     | 439/1000 [18:24<1:32:09,  9.86s/it]

Error extracting text from http://www.hybridcars.com/china-trims-electrified-car-incentives-as-scoundrels-grab-what-they-can/: 406 Client Error: Not Acceptable for url: https://www.hybridcars.com/china-trims-electrified-car-incentives-as-scoundrels-grab-what-they-can/


Processing URLs:  44%|████▍     | 441/1000 [18:26<49:38,  5.33s/it]  

Error extracting text from http://nationalinterest.org/feature/five-things-watch-irans-election-15319?page=show: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/five-things-watch-irans-election-15319?page=show


Processing URLs:  44%|████▍     | 442/1000 [18:29<41:01,  4.41s/it]

Error extracting text from https://fr.reuters.com/article/idUSKBN28K2NU: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=fr


Processing URLs:  44%|████▍     | 445/1000 [18:30<16:49,  1.82s/it]

Error extracting text from https://www.nytimes.com/2018/02/02/world/africa/us-arms-ban-on-south-sudan-.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2018/02/02/world/africa/us-arms-ban-on-south-sudan-.html


Processing URLs:  45%|████▍     | 448/1000 [18:36<16:40,  1.81s/it]

Error extracting text from http://cs.ne/: HTTPConnectionPool(host='cs.ne', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb4770>: Failed to resolve 'cs.ne' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  45%|████▌     | 452/1000 [18:44<19:02,  2.08s/it]

Error extracting text from https://vaultanalytics.com/podcast/predicting-the-unpredictable/: HTTPSConnectionPool(host='vaultanalytics.com', port=443): Max retries exceeded with url: /podcast/predicting-the-unpredictable/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))


Processing URLs:  46%|████▌     | 455/1000 [18:48<13:34,  1.49s/it]

Error extracting text from http://nationalinterest.org/feature/china-prepares-ramp-its-shipbuilding-process-19980?page=2: 403 Client Error: Forbidden for url: https://nationalinterest.org/feature/china-prepares-ramp-its-shipbuilding-process-19980?page=2


Processing URLs:  46%|████▌     | 457/1000 [18:51<12:44,  1.41s/it]

Error extracting text from https://www.predictit.org/Contract/1161/Will-a-federal-criminal-charge-be-filed-against-Hillary-Clinton-in-2015#data: 403 Client Error: Forbidden for url: https://www.predictit.org/Contract/1161/Will-a-federal-criminal-charge-be-filed-against-Hillary-Clinton-in-2015#data


Processing URLs:  46%|████▌     | 459/1000 [18:52<10:11,  1.13s/it]

Error extracting text from https://mobile.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html?referer=https://www.google.ca/: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/11/12/us/nsa-shadow-brokers.html?referer=https://www.google.ca/


Processing URLs:  46%|████▌     | 462/1000 [18:54<07:16,  1.23it/s]

Error extracting text from http://www.cell.com/abstract/S0092-8674(13)00267-5: 403 Client Error: Forbidden for url: https://www.cell.com/abstract/S0092-8674(13)00267-5


Processing URLs:  46%|████▋     | 464/1000 [18:56<07:17,  1.23it/s]

Error extracting text from http://news.yahoo.com/brazil-court-authorizes-trial-congress-speaker-212411918.html: 404 Client Error: Not Found for url: http://news.yahoo.com/brazil-court-authorizes-trial-congress-speaker-212411918.html


Processing URLs:  47%|████▋     | 466/1000 [18:59<08:02,  1.11it/s]

Error extracting text from https://www.rbb24.de/kultur/beitrag/2021/05/humboldt-forum-baumaengel-eroeffnung-juli-.html: 404 Client Error: Not Found for url: https://www.rbb24.de/kultur/beitrag/2021/05/humboldt-forum-baumaengel-eroeffnung-juli-.html
Error extracting text from http://www.nytimes.com/2016/07/02/us/politics/donald-trump-republican-convention.html?smid=tw-share&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/07/02/us/politics/donald-trump-republican-convention.html?smid=tw-share&amp;_r=0


Processing URLs:  47%|████▋     | 469/1000 [19:01<07:30,  1.18it/s]

Error extracting text from http://www.reuters.com/article/us-iran-nuclear-kerry-idUSKBN0UL2KO20160107: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-iran-nuclear-kerry-idUSKBN0UL2KO20160107


Processing URLs:  47%|████▋     | 474/1000 [19:14<21:21,  2.44s/it]

Error extracting text from http://www.quinnipiac.edu/images/polling/ia/ia01112016_trends_Ikm63gb.pdf: 404 Client Error: Not Found for url: https://www.qu.edu/images/polling/ia/ia01112016_trends_Ikm63gb.pdf


Processing URLs:  48%|████▊     | 475/1000 [19:15<16:34,  1.89s/it]

Error extracting text from https://bit.ly/3txaA3O: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/only-quarter-of-scots-want-independence-referendum-in-next-year-says-poll-3165345


Processing URLs:  48%|████▊     | 478/1000 [19:16<08:18,  1.05it/s]

Error extracting text from http://www.wsj.com/articles/inflation-expectations-weaken-at-japan-inc-1443751468: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/inflation-expectations-weaken-at-japan-inc-1443751468


Processing URLs:  48%|████▊     | 479/1000 [19:17<06:42,  1.29it/s]

Error extracting text from http://www.wsj.com/articles/fed-plans-to-signal-gradual-cautious-path-on-rate-hikes-1449682591: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/fed-plans-to-signal-gradual-cautious-path-on-rate-hikes-1449682591


Processing URLs:  48%|████▊     | 481/1000 [27:22<20:59:59, 145.66s/it]

Error extracting text from https://www.thespainreport.com/articles/748-160601200037-danger-close-for-sanchez-and-the-psoe: HTTPSConnectionPool(host='www.thespainreport.com', port=443): Max retries exceeded with url: /articles/748-160601200037-danger-close-for-sanchez-and-the-psoe (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x2fedb7290>, 'Connection to www.thespainreport.com timed out. (connect timeout=60)'))


Processing URLs:  48%|████▊     | 482/1000 [27:25<14:45:46, 102.60s/it]

Error extracting text from https://uk.reuters.com/article/us-venezuela-bonds/venezuela-calls-creditors-to-debt-talks-restructuring-plans-hammer-bonds-idUKKBN1D322L: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  48%|████▊     | 483/1000 [27:25<10:19:27, 71.89s/it] 

Error extracting text from http://news.yahoo.com/us-iran-missile-test-breach-un-resolution-204535403.html: 404 Client Error: Not Found for url: http://news.yahoo.com/us-iran-missile-test-breach-un-resolution-204535403.html
URL filtered: http://www.bloomberg.com/news/articles/2016-10-16/saudi-bank-stress-builds-as-kingdom-s-cash-injection-falls-short
Error extracting text from https://www.reuters.com/article/us-venezuela-india-ongc-idUSKBN19S2OT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-venezuela-india-ongc-idUSKBN19S2OT


Processing URLs:  49%|████▉     | 488/1000 [27:29<3:10:14, 22.29s/it] 

Error extracting text from http://www.financialexpress.com/world-news/uks-boris-johnson-says-russia-must-join-push-oust-syrias-bashar-al-assad/321242/: 403 Client Error: Forbidden for url: http://www.financialexpress.com/world-news/uks-boris-johnson-says-russia-must-join-push-oust-syrias-bashar-al-assad/321242/


Processing URLs:  49%|████▉     | 492/1000 [27:46<1:43:21, 12.21s/it]

Error extracting text from http://www.fxnewscall.com/us-treasury-sanctions-against-north-korea-to-hit-china-banks/1940997/: HTTPConnectionPool(host='www.fxnewscall.com', port=80): Max retries exceeded with url: /us-treasury-sanctions-against-north-korea-to-hit-china-banks/1940997/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2fedb7f20>: Failed to resolve 'www.fxnewscall.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  49%|████▉     | 493/1000 [27:46<1:19:18,  9.38s/it]

Error extracting text from http://blogs.reuters.com/great-debate/2015/10/20/i-was-held-in-iran-for-13-months-this-is-why-i-think-jason-rezaian-may-be-freed/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2015/10/20/i-was-held-in-iran-for-13-months-this-is-why-i-think-jason-rezaian-may-be-freed/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3051d9a00>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|████▉     | 496/1000 [27:47<33:09,  3.95s/it]  

Error extracting text from http://www.rigzone.com/news/oil_gas/a/143079/Oil_Loses_Nearly_4_As_Hopes_Over_Saudi_Russia_Deal_Fade: 403 Client Error: Forbidden for url: http://www.rigzone.com/news/oil_gas/a/143079/Oil_Loses_Nearly_4_As_Hopes_Over_Saudi_Russia_Deal_Fade
Error extracting text from http://www.nytimes.com/2016/08/25/world/middleeast/yemen-saudi-arabia-hospital-bombing.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/08/25/world/middleeast/yemen-saudi-arabia-hospital-bombing.html?_r=0


Processing URLs:  50%|█████     | 500/1000 [28:55<2:45:17, 19.84s/it]

Error extracting text from http://www.sacbee.com/opinion/op-ed/soapbox/article181068366.html: HTTPConnectionPool(host='www.sacbee.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  50%|█████     | 502/1000 [29:57<3:53:33, 28.14s/it]

Error extracting text from http://origin.www.uscc.gov/sites/default/files/transcripts/3.4.09HearingTranscript.pdf: HTTPConnectionPool(host='origin.www.uscc.gov', port=80): Max retries exceeded with url: /sites/default/files/transcripts/3.4.09HearingTranscript.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fedb5be0>, 'Connection to origin.www.uscc.gov timed out. (connect timeout=60)'))
Error extracting text from http://mirror.no-ip.org/news/28423.html: HTTPConnectionPool(host='mirror.no-ip.org', port=80): Max retries exceeded with url: /news/28423.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x300936cc0>: Failed to resolve 'mirror.no-ip.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  50%|█████     | 505/1000 [30:01<1:40:00, 12.12s/it]

Error extracting text from http://www.nytimes.com/2016/06/12/world/asia/bangladesh-arrests-over-3000-to-halt-attacks.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/12/world/asia/bangladesh-arrests-over-3000-to-halt-attacks.html?_r=0


Processing URLs:  51%|█████     | 506/1000 [30:02<1:15:05,  9.12s/it]

Error extracting text from http://www.tradingeconomics.com/germany/inflation-cpi/forecast: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/germany/inflation-cpi/forecast


Processing URLs:  51%|█████     | 509/1000 [30:05<34:10,  4.18s/it]  

Error extracting text from https://www.thecipherbrief.com/column/network-take/syria-no-end-sight-1091: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/column/network-take/syria-no-end-sight-1091


Processing URLs:  52%|█████▏    | 517/1000 [31:34<2:53:13, 21.52s/it]

Error extracting text from http://www.arkleg.state.ar.us/assembly/2017/2017R/Pages/BillInformation.aspx?measureno=SB120: HTTPConnectionPool(host='www.arkleg.state.ar.us', port=80): Max retries exceeded with url: /assembly/2017/2017R/Pages/BillInformation.aspx?measureno=SB120 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x305df2a80>, 'Connection to www.arkleg.state.ar.us timed out. (connect timeout=60)'))


Processing URLs:  52%|█████▏    | 521/1000 [31:37<47:56,  6.00s/it]  

Error extracting text from http://www.baltimoresun.com/news/maryland/bs-md-syed-state-response-20150923-story.html: 404 Client Error: Not Found for url: https://www.baltimoresun.com/news/maryland/bs-md-syed-state-response-20150923-story.html


Processing URLs:  52%|█████▏    | 523/1000 [31:38<24:17,  3.06s/it]

Error extracting text from https://blogs.scientificamerican.com/observations/attacks-on-media-like-roy-moore-rsquo-s-endanger-democracy/: 403 Client Error: Forbidden for url: https://blogs.scientificamerican.com/observations/attacks-on-media-like-roy-moore-rsquo-s-endanger-democracy/
Error extracting text from https://www.reuters.com/article/us-somalia-government/somalia-pm-sacks-three-ministers-as-country-battles-insurgency-idUSKBN1ET0JR: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-somalia-government/somalia-pm-sacks-three-ministers-as-country-battles-insurgency-idUSKBN1ET0JR


Processing URLs:  53%|█████▎    | 526/1000 [32:40<2:33:27, 19.43s/it]

Error extracting text from http://www.arkleg.state.ar.us/assembly/2017/2017R/Bills/SB120.pdf: HTTPConnectionPool(host='www.arkleg.state.ar.us', port=80): Max retries exceeded with url: /assembly/2017/2017R/Bills/SB120.pdf (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2fedb7140>, 'Connection to www.arkleg.state.ar.us timed out. (connect timeout=60)'))


Processing URLs:  53%|█████▎    | 530/1000 [32:46<43:49,  5.59s/it]  

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN0U80HL20151225: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN0U80HL20151225


Processing URLs:  53%|█████▎    | 531/1000 [32:47<32:40,  4.18s/it]

Error extracting text from http://warontherocks.com/2016/03/open-letter-on-donald-trump-from-gop-national-security-leaders/: 403 Client Error: Forbidden for url: http://warontherocks.com/2016/03/open-letter-on-donald-trump-from-gop-national-security-leaders/


Processing URLs:  54%|█████▎    | 535/1000 [32:53<14:53,  1.92s/it]

Error extracting text from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2954173: 403 Client Error: Forbidden for url: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2954173


Processing URLs:  54%|█████▎    | 536/1000 [32:56<15:52,  2.05s/it]

Error extracting text from https://www.reuters.com/article/us-northkorea-missiles-submarine/images-suggest-north-korea-aggressive-work-on-ballistic-missile-submarine-institute-idUSKBN1DH054: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missiles-submarine/images-suggest-north-korea-aggressive-work-on-ballistic-missile-submarine-institute-idUSKBN1DH054


Processing URLs:  54%|█████▍    | 541/1000 [32:59<08:23,  1.10s/it]

Error extracting text from https://www.wsj.com/articles/trial-of-former-officer-derek-chauvin-accused-of-killing-george-floyd-resumes-11615300939: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/trial-of-former-officer-derek-chauvin-accused-of-killing-george-floyd-resumes-11615300939


Processing URLs:  55%|█████▍    | 546/1000 [33:36<1:13:47,  9.75s/it]

Error extracting text from http://www.todayszaman.com/business_turkish-lira-slips-to-new-3-month-low-on-china-worries_409029.html: 522 Server Error:  for url: http://www.todayszaman.com/business_turkish-lira-slips-to-new-3-month-low-on-china-worries_409029.html


Processing URLs:  55%|█████▍    | 548/1000 [33:41<45:52,  6.09s/it]  

Error extracting text from http://www.peruthisweek.com/news-gregorio-santos-prison-debate-109168: HTTPConnectionPool(host='www.peruthisweek.com', port=80): Max retries exceeded with url: /news-gregorio-santos-prison-debate-109168 (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x304ebec00>: Failed to resolve 'www.peruthisweek.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  55%|█████▌    | 550/1000 [33:45<30:32,  4.07s/it]

Error extracting text from https://www.reuters.com/article/us-ethiopia-conflict/fugitive-ex-leader-of-ethiopias-tigray-region-vows-extended-resistance-idUSKBN2A00BW: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-ethiopia-conflict/fugitive-ex-leader-of-ethiopias-tigray-region-vows-extended-resistance-idUSKBN2A00BW


Processing URLs:  55%|█████▌    | 554/1000 [33:48<12:35,  1.69s/it]

Error extracting text from https://www.yahoo.com/news/us-pushes-un-again-consider-arms-embargo-south-174736413.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/us-pushes-un-again-consider-arms-embargo-south-174736413.html


Processing URLs:  56%|█████▌    | 556/1000 [33:58<28:20,  3.83s/it]

Error extracting text from https://www.washingtonpost.com/world/europe/the-latest-germany-oks-some-refugee-kids-bringing-families/2016/02/11/c4af4116-d0ae-11e5-90d3-34c2c42653ac_story.html: 404 Client Error: Not Found for url: https://www.washingtonpost.com/world/europe/the-latest-germany-oks-some-refugee-kids-bringing-families/2016/02/11/c4af4116-d0ae-11e5-90d3-34c2c42653ac_story.html


Processing URLs:  56%|█████▌    | 562/1000 [34:09<19:10,  2.63s/it]

Error extracting text from http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=32&f=G&l=50&co1=AND&d=PG01&s2=Doudna.IN.&OS=IN/Doudna&RS=IN/Doudna: HTTPConnectionPool(host='appft.uspto.gov', port=80): Max retries exceeded with url: /netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=32&f=G&l=50&co1=AND&d=PG01&s2=Doudna.IN.&OS=IN/Doudna&RS=IN/Doudna (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x3073cddc0>: Failed to establish a new connection: [Errno 51] Network is unreachable'))


Processing URLs:  56%|█████▋    | 565/1000 [40:17<7:14:20, 59.91s/it]

Error extracting text from http://missionessential.com/: HTTPConnectionPool(host='missionessential.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3073ce720>, 'Connection to missionessential.com timed out. (connect timeout=60)'))


Processing URLs:  57%|█████▋    | 566/1000 [40:17<5:10:34, 42.94s/it]

Error extracting text from https://news.yahoo.com/iraq-deploying-thousands-troops-retake-mosul-114704876.html: 404 Client Error: Not Found for url: https://news.yahoo.com/iraq-deploying-thousands-troops-retake-mosul-114704876.html


Processing URLs:  57%|█████▋    | 570/1000 [40:21<1:20:00, 11.16s/it]

Error extracting text from http://www.ohchr.org/FR/NewsEvents/Pages/DisplayNews.aspx?NewsID=20329&amp;LangID=E: 403 Client Error: Forbidden for url: https://www.ohchr.org/FR/NewsEvents/Pages/DisplayNews.aspx?NewsID=20329&amp;LangID=E
Error extracting text from http://www.reuters.com/article/us-usa-trump-immigration-exclusive-idUSKBN1582XQ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-immigration-exclusive-idUSKBN1582XQ?il=0


Processing URLs:  57%|█████▋    | 571/1000 [40:21<56:20,  7.88s/it]  

Error extracting text from http://www.independent.mk/articles/41512/EU+Membership+Support+for+Macedonia+falls+to++percent: 403 Client Error: Forbidden for url: http://www.independent.mk/articles/41512/EU+Membership+Support+for+Macedonia+falls+to++percent


Processing URLs:  57%|█████▋    | 572/1000 [40:23<44:43,  6.27s/it]

Error extracting text from https://www.reuters.com/world/asia-pacific/cooling-measures-doing-little-slow-new-zealands-housing-boom-2021-08-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/asia-pacific/cooling-measures-doing-little-slow-new-zealands-housing-boom-2021-08-06/


Processing URLs:  57%|█████▋    | 574/1000 [40:24<26:02,  3.67s/it]

Error extracting text from https://en.dailypakistan.com.pk/world/indian-research-centre-in-the-himalayan-foothills-under-scrutiny-after-revelation-that-it-trained-north-korean-scientists/: 503 Server Error: Backend fetch failed for url: https://en.dailypakistan.com.pk/world/indian-research-centre-in-the-himalayan-foothills-under-scrutiny-after-revelation-that-it-trained-north-korean-scientists/


Processing URLs:  57%|█████▊    | 575/1000 [40:27<23:48,  3.36s/it]

Error extracting text from http://longforecast.com/brent/crude-oil-forecast-for-2015-2016-and-2017.html: 403 Client Error: Forbidden for url: https://longforecast.com/brent/crude-oil-forecast-for-2015-2016-and-2017


Processing URLs:  58%|█████▊    | 576/1000 [40:29<22:04,  3.12s/it]

Error extracting text from http://politics.nytimes.com/congress/votes/113/senate/2/280: 404 Client Error: Not Found for url: https://www.nytimes.com/congress/votes/113/senate/2/280


Processing URLs:  58%|█████▊    | 577/1000 [40:30<16:36,  2.36s/it]

Error extracting text from http://www.realclearscience.com/2017/09/30/former_google_engineer_developing_an_ai_god_278332.html?utm_source=rcp-today&amp;utm_medium=email&amp;utm_campaign=mailchimp-newsletter&amp;utm_sou: 403 Client Error: HTTP Forbidden for url: https://www.realclearscience.com/2017/09/30/former_google_engineer_developing_an_ai_god_278332.html?utm_source=rcp-today&amp;utm_medium=email&amp;utm_campaign=mailchimp-newsletter&amp;utm_sou


Processing URLs:  58%|█████▊    | 581/1000 [40:34<10:58,  1.57s/it]

URL filtered: https://www.youtube.com/watch?v=i3BDlx_9Tz4


Processing URLs:  58%|█████▊    | 584/1000 [40:37<07:52,  1.14s/it]

Error extracting text from https://lgbc-scotland.gov.uk/sites/default/files/LGBCS_Register_Interests.pdf: HTTPSConnectionPool(host='lgbc-scotland.gov.uk', port=443): Max retries exceeded with url: /sites/default/files/LGBCS_Register_Interests.pdf (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x3022d4740>: Failed to resolve 'lgbc-scotland.gov.uk' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  59%|█████▉    | 590/1000 [40:50<14:09,  2.07s/it]

Error extracting text from http://vestnikkavkaza.net/news/Iraqi-forces-suffer-heavy-losses-in-Mosul-clashes.html: 404 Client Error: Not Found for url: https://vestikavkaza.ru/news/Iraqi-forces-suffer-heavy-losses-in-Mosul-clashes.html
Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-dam-insight-idUSKCN0WO0DS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-dam-insight-idUSKCN0WO0DS


Processing URLs:  59%|█████▉    | 593/1000 [40:53<09:07,  1.35s/it]

Error extracting text from https://www.nytimes.com/2020/10/25/us/politics/bidens-china.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2020/10/25/us/politics/bidens-china.html


Processing URLs:  60%|█████▉    | 595/1000 [40:56<09:06,  1.35s/it]

Error extracting text from http://uk.reuters.com/article/uk-eu-turkey-idUKKBN1391J3: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  60%|█████▉    | 598/1000 [41:05<17:24,  2.60s/it]

Error extracting text from http://news.trust.org/item/20161103194005-2lkhx: 404 Client Error:  for url: https://news.trust.org:443/item/20161103194005-2lkhx


Processing URLs:  60%|██████    | 600/1000 [42:06<1:32:49, 13.92s/it]

Error extracting text from https://www.usnews.com/news/politics/articles/2017-12-11/not-real-news-alabama-senate-race-spurs-false-reports: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)
Error extracting text from http://www.reuters.com/article/us-usa-congress-debt-idUSKBN18Y39H: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-congress-debt-idUSKBN18Y39H


Processing URLs:  60%|██████    | 601/1000 [42:06<1:05:27,  9.84s/it]

Error extracting text from https://www.nytimes.com/2017/12/10/world/africa/pentagon-somalia-combat-islamic-militants.html?_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/10/world/africa/pentagon-somalia-combat-islamic-militants.html?_r=0


Processing URLs:  60%|██████    | 603/1000 [42:07<33:42,  5.09s/it]  

Error extracting text from http://www.timesofisrael.com/does-trump-know-the-difference-between-hamas-hezbollah/: 403 Client Error: Forbidden for url: http://www.timesofisrael.com/does-trump-know-the-difference-between-hamas-hezbollah/


Processing URLs:  60%|██████    | 605/1000 [42:11<23:56,  3.64s/it]

Error extracting text from https://www.france24.com/en/20180724-nicaragua-ortega-refuses-step-down-despite-bloodshed-protests: 403 Client Error: Forbidden for url: https://www.france24.com/en/20180724-nicaragua-ortega-refuses-step-down-despite-bloodshed-protests


Processing URLs:  61%|██████    | 608/1000 [42:13<11:39,  1.79s/it]

Error extracting text from http://m.panorama.com.ve/movil/noticia.html?nota=/contenidos/2015/11/29/noticia_0012.html: HTTPConnectionPool(host='m.panorama.com.ve', port=80): Max retries exceeded with url: /movil/noticia.html?nota=/contenidos/2015/11/29/noticia_0012.html (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3050d5a00>: Failed to resolve 'm.panorama.com.ve' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  61%|██████    | 610/1000 [42:17<11:48,  1.82s/it]

Error extracting text from https://slatestarcodex.com/2020/06/22/nyt-is-threatening-my-safety-by-revealing-my-real-name-so-i-am-deleting-the-blog/: 403 Client Error: Forbidden for url: https://slatestarcodex.com/2020/06/22/nyt-is-threatening-my-safety-by-revealing-my-real-name-so-i-am-deleting-the-blog/
Error extracting text from http://www.reuters.com/article/us-global-m-a-firstquarter-idUSKBN1713AQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-m-a-firstquarter-idUSKBN1713AQ
URL filtered: https://www.youtube.com/watch?v=_ZKZ_lQ5FWQ


Processing URLs:  61%|██████▏   | 614/1000 [42:20<07:54,  1.23s/it]

Error extracting text from http://www.sciencemag.org/news/2016/09/united-states-will-miss-paris-climate-targets-without-further-action-study-finds: 403 Client Error: Forbidden for url: https://www.science.org/news/2016/09/united-states-will-miss-paris-climate-targets-without-further-action-study-finds


Processing URLs:  62%|██████▏   | 615/1000 [42:20<06:46,  1.06s/it]

Error extracting text from http://blogs.reuters.com/great-debate/2015/05/14/putin-ties-ukraines-government-to-neo-nazis-a-new-law-seems-to-back-him-up/: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /great-debate/2015/05/14/putin-ties-ukraines-government-to-neo-nazis-a-new-law-seems-to-back-him-up/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x3050d5a90>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))
URL filtered: https://www.bloomberg.com/view/articles/2018-01-12/those-who-wrote-off-merkel-were-wrong-again


Processing URLs:  62%|██████▏   | 618/1000 [42:23<06:44,  1.06s/it]

Error extracting text from http://www.gpanet.org/node/567: 503 Server Error: Service Unavailable for url: http://www.gpanet.org/node/567


Processing URLs:  62%|██████▏   | 621/1000 [42:32<10:34,  1.67s/it]

Error extracting text from http://postimg.org/image/evthz3mr3/: HTTPConnectionPool(host='postimg.org', port=80): Max retries exceeded with url: /image/evthz3mr3/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x2ffe68710>: Failed to resolve 'postimg.org' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  62%|██████▏   | 623/1000 [43:04<1:01:59,  9.87s/it]

Error extracting text from http://www.todayszaman.com/diplomacy_assad-visits-moscow-turkey-accepts-transition-period-with-him-in-syria_402177.html: 522 Server Error:  for url: http://www.todayszaman.com/diplomacy_assad-visits-moscow-turkey-accepts-transition-period-with-him-in-syria_402177.html


Processing URLs:  63%|██████▎   | 626/1000 [43:14<33:57,  5.45s/it]  

URL filtered: http://www.bloomberg.com/news/articles/2016-01-07/venezuela-s-maduro-replaces-economic-team-in-cabinet-reshuffle
Error extracting text from http://www.reuters.com/article/us-yemen-security-court-idUSKBN16W0UF: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-yemen-security-court-idUSKBN16W0UF


Processing URLs:  63%|██████▎   | 631/1000 [43:41<37:24,  6.08s/it]

Error extracting text from https://gcaptain.com/gazprom-plans-to-start-nord-stream-2-gas-pipeline-next-month/?subscriber=true&amp;goal=0_f50174ef03-ceab70088e-170102337&amp;mc_cid=ceab70088e&amp;mc_eid=c74873c672: 403 Client Error: Forbidden for url: https://gcaptain.com/gazprom-plans-to-start-nord-stream-2-gas-pipeline-next-month/?subscriber=true&amp;goal=0_f50174ef03-ceab70088e-170102337&amp;mc_cid=ceab70088e&amp;mc_eid=c74873c672


Processing URLs:  63%|██████▎   | 634/1000 [43:45<17:37,  2.89s/it]

Error extracting text from http://reneweconomy.com.au/2016/potentially-game-changing-saudi-arabian-government-restructuring-bolsters-9-5-gw-renewable-energy-target-by-2023: 403 Client Error: Forbidden for url: http://reneweconomy.com.au/2016/potentially-game-changing-saudi-arabian-government-restructuring-bolsters-9-5-gw-renewable-energy-target-by-2023


Processing URLs:  64%|██████▎   | 635/1000 [44:45<2:00:33, 19.82s/it]

Error extracting text from https://www.usnews.com/news/articles/2015/07/28/heres-why-the-14th-amendment-is-a-big-deal: HTTPSConnectionPool(host='www.usnews.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  64%|██████▍   | 638/1000 [44:50<47:44,  7.91s/it]  

Error extracting text from https://apple.news/Auqd4DWvBRCqOIdRuKnfIqA: 404 Client Error: Not Found for url: https://apple.news/Auqd4DWvBRCqOIdRuKnfIqA


Processing URLs:  65%|██████▍   | 646/1000 [45:09<14:24,  2.44s/it]

Error extracting text from http://www.wsj.com/articles/argentine-prosecutor-says-peer-was-murdered-1456445381: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/argentine-prosecutor-says-peer-was-murdered-1456445381


Processing URLs:  65%|██████▍   | 647/1000 [45:25<38:02,  6.47s/it]

Error extracting text from https://www.investopedia.com/articles/investing/033115/it-possible-trade-bitcoin-options.asp: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/investing/033115/it-possible-trade-bitcoin-options.asp


Processing URLs:  65%|██████▌   | 650/1000 [45:30<19:50,  3.40s/it]

Error extracting text from http://www.marketwatch.com/story/time-inc-elevates-the-power-of-its-digital-properties-with-the-appointment-of-jennifer-l-wong-to-president-of-time-inc-digital-2015-12-15: 404 Client Error: Not Found for url: https://www.marketwatch.com/story/time-inc-elevates-the-power-of-its-digital-properties-with-the-appointment-of-jennifer-l-wong-to-president-of-time-inc-digital-2015-12-15


Processing URLs:  65%|██████▌   | 653/1000 [45:35<11:05,  1.92s/it]

Error extracting text from http://www.washingtontimes.com/news/2017/may/11/the-latest-us-sees-iran-working-to-preserve-nuclea/: 403 Client Error: Forbidden for url: http://www.washingtontimes.com/news/2017/may/11/the-latest-us-sees-iran-working-to-preserve-nuclea/


Processing URLs:  66%|██████▌   | 655/1000 [45:37<08:40,  1.51s/it]

Error extracting text from http://www.ictsd.org/bridges-news/bridges-africa/news/us-announces-us300-million-in-payments-for-cotton-producers: 404 Client Error: Not Found for url: https://www.ictsd.org/bridges-news/bridges-africa/news/us-announces-us300-million-in-payments-for-cotton-producers
Error extracting text from http://www.reuters.com/article/us-usa-trump-turkey-idUSKBN15N02J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-trump-turkey-idUSKBN15N02J
URL filtered: https://www.youtube.com/watch?v=wmKHHdqwVes


Processing URLs:  66%|██████▌   | 658/1000 [45:41<07:15,  1.27s/it]

Error extracting text from http://www.pcacases.com/web/sendAttach/1506: 406 Client Error: Not Acceptable for url: http://www.pcacases.com/web/sendAttach/1506


Processing URLs:  66%|██████▋   | 663/1000 [46:02<16:41,  2.97s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN1782S0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-idUSKBN1782S0


Processing URLs:  67%|██████▋   | 668/1000 [46:15<13:46,  2.49s/it]

Error extracting text from http://www.eui.eu/SeminarsAndEvents/Events/2016/November/EUMechanismonDemocracytheRuleofLawandFundamentalRights.aspx: 404 Client Error: Not Found for url: https://www.eui.eu/SeminarsAndEvents/Events/2016/November/EUMechanismonDemocracytheRuleofLawandFundamentalRights.aspx


Processing URLs:  67%|██████▋   | 669/1000 [46:19<14:58,  2.71s/it]

Error extracting text from http://38north.org/2016/03/aabrahamian032116/: 404 Client Error: Not Found for url: https://www.38north.org/403.shtml


Processing URLs:  67%|██████▋   | 672/1000 [47:20<1:44:53, 19.19s/it]

Error extracting text from http://www.post-gazette.com/news/transportation/2016/06/01/Task-force-will-oversee-development-of-driverless-vehicles-in-Pennsylvania/stories/201606010166: HTTPConnectionPool(host='www.post-gazette.com', port=80): Max retries exceeded with url: /news/transportation/2016/06/01/Task-force-will-oversee-development-of-driverless-vehicles-in-Pennsylvania/stories/201606010166 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x2ffe681a0>, 'Connection to www.post-gazette.com timed out. (connect timeout=60)'))


Processing URLs:  67%|██████▋   | 673/1000 [47:20<1:13:48, 13.54s/it]

Error extracting text from https://www.wsj.com/articles/a-state-by-state-guide-to-coronavirus-lockdowns-11584749351: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/a-state-by-state-guide-to-coronavirus-lockdowns-11584749351


Processing URLs:  68%|██████▊   | 675/1000 [47:22<38:09,  7.04s/it]  

Error extracting text from http://warontherocks.com/2015/11/saving-afghanistan-more-than-just-troops/: 403 Client Error: Forbidden for url: http://warontherocks.com/2015/11/saving-afghanistan-more-than-just-troops/


Processing URLs:  68%|██████▊   | 679/1000 [47:31<17:10,  3.21s/it]

Error extracting text from http://colombiapeace.org/2016/02/25/peace-colombia-whats-new-about-it/: 403 Client Error: Forbidden for url: http://colombiapeace.org/2016/02/25/peace-colombia-whats-new-about-it/


Processing URLs:  68%|██████▊   | 681/1000 [47:31<09:29,  1.78s/it]

Error extracting text from http://www.wsj.com/articles/syrian-negotiator-quits-peace-talks-1464597824: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/syrian-negotiator-quits-peace-talks-1464597824


Processing URLs:  69%|██████▊   | 686/1000 [47:38<07:28,  1.43s/it]

Error extracting text from http://insideevs.com/lg-chem-build-battery-plant-europe/: 410 Client Error: Gone for url: https://insideevs.com:443/news/327042/lg-chem-to-build-a-battery-plant-in-europe/


Processing URLs:  69%|██████▉   | 692/1000 [47:58<11:05,  2.16s/it]

Error extracting text from https://balkaneu.com/north-macedonia-pendarovski-signed-the-decree-for-the-census/: 404 Client Error: Not Found for url: https://balkaneu.com/north-macedonia-pendarovski-signed-the-decree-for-the-census/


Processing URLs:  70%|██████▉   | 695/1000 [48:05<09:44,  1.91s/it]

Error extracting text from http://www.realclearpolitics.com/epolls/2016/president/us/2016_republican_presidential_nomination-3823.html: 403 Client Error: HTTP Forbidden for url: https://www.realclearpolitics.com/epolls/2016/president/us/2016_republican_presidential_nomination-3823.html


Processing URLs:  70%|██████▉   | 697/1000 [48:07<07:28,  1.48s/it]

Error extracting text from http://uk.reuters.com/article/uk-mideast-crisis-syria-conference-ahrar-idUKKBN0TS0LI20151209: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=uk


Processing URLs:  70%|██████▉   | 698/1000 [48:07<05:34,  1.11s/it]

Error extracting text from https://www.yahoo.com/news/tactics-battle-iraqs-mosul-113324915.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/tactics-battle-iraqs-mosul-113324915.html


Processing URLs:  70%|███████   | 701/1000 [48:08<03:22,  1.47it/s]

Error extracting text from http://www.consilium.europa.eu/en/press/press-releases/2016/02/02-letter-tusk-proposal-new-settlement-uk/: 403 Client Error: Forbidden for url: https://www.consilium.europa.eu/en/press/press-releases/2016/02/02-letter-tusk-proposal-new-settlement-uk/


Processing URLs:  70%|███████   | 704/1000 [48:16<08:28,  1.72s/it]

Error extracting text from https://www.nato.int/cps/en/natohq/official_texts_17120.htm: 403 Client Error: Forbidden for url: https://www.nato.int/cps/en/natohq/official_texts_17120.htm
URL filtered: https://www.facebook.com/nntaleb/posts/10153933520338375


Processing URLs:  71%|███████   | 707/1000 [48:21<08:10,  1.67s/it]

Error extracting text from https://www.faa.gov/uas/request_waiver/waivers_granted/media/107W-2016-00003_BNSF_CoW.pdf: 404 Client Error: Not Found for url: https://www.faa.gov/uas/request_waiver/waivers_granted/media/107W-2016-00003_BNSF_CoW.pdf


Processing URLs:  71%|███████   | 711/1000 [48:26<05:24,  1.12s/it]

Error extracting text from http://www.startup-buzz.com/nextev-chinese-company-set-take-tesla-near-future/: 403 Client Error: Forbidden for url: http://www.startup-buzz.com/nextev-chinese-company-set-take-tesla-near-future/
Error extracting text from http://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN16Z2UG?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-nato-montenegro-idUSKBN16Z2UG?il=0


Processing URLs:  71%|███████▏  | 714/1000 [48:29<05:53,  1.23s/it]

Error extracting text from http://en.abna24.com/service/middle-east-west-asia/archive/2016/04/17/747952/story.html: 404 Client Error: Not Found for url: https://en.abna24.com/service/middle-east-west-asia/archive/2016/04/17/747952/story.html


Processing URLs:  72%|███████▏  | 715/1000 [48:30<05:03,  1.07s/it]

URL filtered: https://www.bloomberg.com/news/articles/2021-01-19/yellen-says-u-s-prepared-to-take-on-china-s-abusive-practices


Processing URLs:  72%|███████▏  | 721/1000 [48:38<06:55,  1.49s/it]

Error extracting text from http://economictimes.indiatimes.com/news/international/world-news/trump-set-to-order-ban-on-immigration-from-muslim-countries: 404 Client Error: Not Found for url: https://economictimes.indiatimes.com/news/international/world-news/trump-set-to-order-ban-on-immigration-from-muslim-countries


Processing URLs:  72%|███████▏  | 722/1000 [48:39<06:12,  1.34s/it]

Error extracting text from https://bit.ly/3tkWsvi: 403 Client Error: Forbidden for url: https://europeelects.eu/italy/


Processing URLs:  72%|███████▎  | 725/1000 [48:45<08:50,  1.93s/it]

URL filtered: https://twitter.com/bgreene/status/913105393009725440


Processing URLs:  73%|███████▎  | 729/1000 [48:48<05:24,  1.20s/it]

Error extracting text from https://www.nytimes.com/2021/06/18/world/asia/un-myanmar-coup-condemned.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/18/world/asia/un-myanmar-coup-condemned.html


Processing URLs:  73%|███████▎  | 730/1000 [48:49<04:42,  1.05s/it]

Error extracting text from http://thehill.com/blogs/pundits-blog/the-judiciary/321567-judge-neil-gorsuch-likable-but-dangerous: 403 Client Error: Forbidden for url: https://thehill.com/blogs/pundits-blog/the-judiciary/321567-judge-neil-gorsuch-likable-but-dangerous/


Processing URLs:  73%|███████▎  | 731/1000 [48:49<03:56,  1.14it/s]

Error extracting text from https://www.whitehouse.gov/sites/default/files/microsites/ostp/final_nationalspaceweatheractionplan_20151028.pdf: 404 Client Error: Not Found for url: https://www.whitehouse.gov/sites/default/files/microsites/ostp/final_nationalspaceweatheractionplan_20151028.pdf


Processing URLs:  73%|███████▎  | 733/1000 [48:59<11:04,  2.49s/it]

Error extracting text from http://www.reuters.com/article/us-un-syria-securitycouncil-idUSKBN1631XP: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-un-syria-securitycouncil-idUSKBN1631XP


Processing URLs:  74%|███████▎  | 735/1000 [49:03<08:38,  1.96s/it]

Error extracting text from https://www.nytimes.com/2021/06/02/world/middleeast/netanyahu-bennett-israel-coalition.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/06/02/world/middleeast/netanyahu-bennett-israel-coalition.html


Processing URLs:  74%|███████▎  | 736/1000 [50:03<1:24:22, 19.17s/it]

Error extracting text from http://www.usnews.com/news/the-report/articles/2016-03-25/the-first-rule-of-the-republican-national-convention-there-arent-any-rules-yet: HTTPConnectionPool(host='www.usnews.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  74%|███████▍  | 743/1000 [50:17<12:59,  3.03s/it]  

Error extracting text from https://www.wsj.com/articles/chinas-economic-recovery-is-looking-gloomier-11631683862?mod=article_inline: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/chinas-economic-recovery-is-looking-gloomier-11631683862?mod=article_inline


Processing URLs:  74%|███████▍  | 745/1000 [50:20<09:22,  2.21s/it]

URL filtered: https://codastory.com/disinformation-crisis/information-war/welcome-to-north-korea-the-world-s-safest-fashion-hotspot?utm_content=buffer08a92&amp;utm_medium=social&amp;utm_source=twitter.com&amp;utm_campaign=buffer


Processing URLs:  75%|███████▍  | 749/1000 [50:22<04:01,  1.04it/s]

Error extracting text from http://www.oddschecker.com/cricket/t20-world-cup/twenty20-world-cup/winner: 403 Client Error: Forbidden for url: http://www.oddschecker.com/cricket/t20-world-cup/twenty20-world-cup/winner
URL filtered: https://www.youtube.com/watch?v=hhvBTy28VJM
URL filtered: http://www.bloomberg.com/news/articles/2015-12-06/oil-extends-losses-below-40-as-opec-abandons-production-target


Processing URLs:  75%|███████▌  | 753/1000 [50:24<02:41,  1.53it/s]

Error extracting text from http://www.reuters.com/article/us-saudi-oil-aramco-idUSKCN0YI0YT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-saudi-oil-aramco-idUSKCN0YI0YT


Processing URLs:  76%|███████▌  | 756/1000 [50:28<03:55,  1.03it/s]

Error extracting text from http://www.rand.org/paf/projects/us-china-scorecard.html: 403 Client Error: Forbidden for url: https://www.rand.org/paf/projects/us-china-scorecard.html


Processing URLs:  76%|███████▌  | 760/1000 [50:33<04:11,  1.05s/it]

Error extracting text from https://www.rigzone.com/news/usa_eia_reveals_new_oil_price_forecast-12-aug-2021-166164-article/?amp: 403 Client Error: Forbidden for url: https://www.rigzone.com/news/usa_eia_reveals_new_oil_price_forecast-12-aug-2021-166164-article/?amp
Error extracting text from http://www.nytimes.com/2016/03/05/world/europe/for-greece-migrant-crisis-alters-eu-alliances.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/03/05/world/europe/for-greece-migrant-crisis-alters-eu-alliances.html?_r=0


Processing URLs:  76%|███████▌  | 762/1000 [50:36<04:35,  1.16s/it]

Error extracting text from http://tinyurl.com/njyy2cf: HTTPConnectionPool(host='blogs.reuters.com', port=80): Max retries exceeded with url: /talesfromthetrail/2015/09/28/donald-trump-is-the-only-man-to-save-the-world-says-carl-icahn/ (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x307b2c2f0>: Failed to resolve 'blogs.reuters.com' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  76%|███████▋  | 765/1000 [50:37<02:41,  1.46it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0ZL1FS: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKCN0ZL1FS


Processing URLs:  77%|███████▋  | 767/1000 [50:40<04:02,  1.04s/it]

URL filtered: https://www.bloomberg.com/gadfly/articles/2017-07-21/venezuela-oil-storm-may-be-about-to-hit-the-market


Processing URLs:  77%|███████▋  | 769/1000 [50:40<02:25,  1.59it/s]

Error extracting text from https://www.wsj.com/articles/ex-cia-director-mike-flynn-and-turkish-officials-discussed-removal-of-erdogan-foe-from-u-s-1490380426: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/ex-cia-director-mike-flynn-and-turkish-officials-discussed-removal-of-erdogan-foe-from-u-s-1490380426


Processing URLs:  77%|███████▋  | 771/1000 [50:42<02:36,  1.47it/s]

Error extracting text from https://www.congress.gov/bill/107th-congress/house-bill/2458/text: 403 Client Error: Forbidden for url: https://www.congress.gov/bill/107th-congress/house-bill/2458/text


Processing URLs:  78%|███████▊  | 777/1000 [50:54<06:13,  1.68s/it]

URL filtered: https://twitter.com/wikileaks/status/765342384821534722


Processing URLs:  78%|███████▊  | 780/1000 [50:55<03:03,  1.20it/s]

Error extracting text from http://www.nti.org/learn/countries/south-korea/: 403 Client Error: Forbidden for url: https://www.nti.org/learn/countries/south-korea/
Error extracting text from http://www.hindustantimes.com/india-news/beheading-of-soldiers-army-will-hit-back-says-chief-rawat/story-3jgNa9vXUfDneaPCGcqTBL.html: 401 Client Error: Unauthorized for url: http://www.hindustantimes.com/india-news/beheading-of-soldiers-army-will-hit-back-says-chief-rawat/story-3jgNa9vXUfDneaPCGcqTBL.html


Processing URLs:  78%|███████▊  | 783/1000 [51:00<04:15,  1.18s/it]

Error extracting text from https://www.nytimes.com/2017/12/17/us/politics/john-mccain-cancer-tax-bill.html?rref=collection%2Fsectioncollection%2Fpolitics: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/12/17/us/politics/john-mccain-cancer-tax-bill.html?rref=collection%2Fsectioncollection%2Fpolitics


Processing URLs:  79%|███████▊  | 786/1000 [51:04<05:12,  1.46s/it]

Error extracting text from http://www.iiss.org/en/events/events/archive/2015-f463/december-52fa/view-from-paris-097c: 404 Client Error: Not Found for url: https://www.iiss.org/en/events/events/archive/2015-f463/december-52fa/view-from-paris-097c


Processing URLs:  79%|███████▉  | 789/1000 [51:07<03:36,  1.03s/it]

Error extracting text from http://www.moneyweb.co.za/news/economy/the-big-business-opportunity-of-2016-iran/: 403 Client Error: Forbidden for url: http://www.moneyweb.co.za/news/economy/the-big-business-opportunity-of-2016-iran/


Processing URLs:  79%|███████▉  | 790/1000 [51:07<02:45,  1.27it/s]

Error extracting text from https://www.nytimes.com/2021/08/07/world/boris-johnson-uk-covid-self-isolation.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/08/07/world/boris-johnson-uk-covid-self-isolation.html


Processing URLs:  79%|███████▉  | 791/1000 [51:08<02:31,  1.38it/s]

Error extracting text from http://blog.lesoir.be/colette-braeckman/2016/04/19/fausse-accalmie-au-burundi/: HTTPSConnectionPool(host='colette-braeckman.lesoir.be', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x307c83890>: Failed to resolve 'colette-braeckman.lesoir.be' ([Errno 8] nodename nor servname provided, or not known)"))


Processing URLs:  79%|███████▉  | 792/1000 [51:09<02:34,  1.35it/s]

Error extracting text from http://www.politicususa.com/2016/03/20/fox-news-stabs-mitch-mcconnell-gop-destroys-scotus-nominee.html: 403 Client Error: Forbidden for url: http://www.politicususa.com/2016/03/20/fox-news-stabs-mitch-mcconnell-gop-destroys-scotus-nominee.html


Processing URLs:  79%|███████▉  | 794/1000 [51:10<02:42,  1.26it/s]

Error extracting text from http://allafrica.com/stories/201710240782.html: HTTPConnectionPool(host='allafrica.com', port=80): Max retries exceeded with url: /stories/201710240782.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2fe74e690>: Failed to establish a new connection: [Errno 61] Connection refused'))


Processing URLs:  80%|████████  | 801/1000 [51:20<03:27,  1.04s/it]

URL filtered: https://twitter.com/russian_market/status/916313167688552449
Error extracting text from http://www.reuters.com/article/us-britain-eu-germany-idUSKBN1820CQ: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-germany-idUSKBN1820CQ


Processing URLs:  80%|████████  | 802/1000 [51:21<03:29,  1.06s/it]

URL filtered: https://www.youtube.com/watch?v=vNZ63iwnn5M


Processing URLs:  80%|████████  | 804/1000 [51:23<02:53,  1.13it/s]

Error extracting text from http://in.reuters.com/article/britain-eu-cameron-idINKCN0UO0IL20160110: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com?edition-redirect=in
URL filtered: https://twitter.com/DiemDevelopers


Processing URLs:  81%|████████  | 807/1000 [51:24<02:03,  1.56it/s]

Error extracting text from http://www.greencarcongress.com/2016/04/20160415-meti.html: 403 Client Error: Forbidden for url: https://www.greencarcongress.com/2016/04/20160415-meti.html


Processing URLs:  81%|████████  | 808/1000 [51:24<02:00,  1.60it/s]

Error extracting text from https://thehill.com/policy/international/552160-us-says-swift-return-to-iran-deal-possible-ahead-of-vienna-talks: 403 Client Error: Forbidden for url: https://thehill.com/policy/international/552160-us-says-swift-return-to-iran-deal-possible-ahead-of-vienna-talks/


Processing URLs:  81%|████████  | 810/1000 [51:27<02:41,  1.18it/s]

Error extracting text from https://superforecasting.squarespace.com/blog/will-the-us-federal-funds-rate-be-increased-before-the-end-of-the-year-2015: 404 Client Error: Not Found for url: https://superforecasting.squarespace.com/blog/will-the-us-federal-funds-rate-be-increased-before-the-end-of-the-year-2015


Processing URLs:  81%|████████  | 812/1000 [51:28<02:11,  1.43it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14A197: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14A197


Processing URLs:  81%|████████▏ | 814/1000 [51:29<01:54,  1.63it/s]

Error extracting text from http://www.bnonews.com/files/images/IranNuclearDeal.PDF: 500 Server Error: Internal Server Error for url: https://www.bnonews.com/files/images/IranNuclearDeal.PDF
Error extracting text from http://www.nytimes.com/2016/10/08/us/politics/isis-mosul-iraq-us.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/10/08/us/politics/isis-mosul-iraq-us.html?_r=0


Processing URLs:  82%|████████▏ | 815/1000 [51:29<01:27,  2.12it/s]

Error extracting text from https://www.predictit.org/Market/1327/Who-will-win-the-2016-Iowa-Republican-caucus: 403 Client Error: Forbidden for url: https://www.predictit.org/Market/1327/Who-will-win-the-2016-Iowa-Republican-caucus


Processing URLs:  82%|████████▏ | 817/1000 [51:35<05:48,  1.91s/it]

error getting summary: 
Traceback (most recent call last):
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 213, in summary
    self._html(True)
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 148, in _html
    self.html = self._parse(self.input)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/readability.py", line 157, in _parse
    doc, self.encoding = build_doc(input)
                         ^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/readability/htmls.py", line 21, in build_doc
    doc = lxml.html.document_fromstring(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arshath/miniforge3/envs/valory/lib/python3.12/site-packages/lxml/html/__init__.py", line 738, in doc

Error extracting text from http://www.onlinenews.com.pk/index.php?page=newsdetail&amp;news_id=4982: Document is empty


Processing URLs:  83%|████████▎ | 826/1000 [51:49<02:39,  1.09it/s]

Error extracting text from https://www.reuters.com/world/europe/final-piece-nord-stream-2-place-operator-says-2021-09-06/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/world/europe/final-piece-nord-stream-2-place-operator-says-2021-09-06/
Error extracting text from http://www.nytimes.com/2016/12/13/us/politics/russia-hack-election-dnc.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/12/13/us/politics/russia-hack-election-dnc.html


Processing URLs:  83%|████████▎ | 827/1000 [51:51<03:19,  1.16s/it]

Error extracting text from https://www.dlapiper.com/en/hongkong/insights/publications/2017/03/appetite-for-masala-bonds-grows/: 403 Client Error: Forbidden for url: https://www.dlapiper.com/en-hk/insights/publications/2017/03/appetite-for-masala-bonds-grows


Processing URLs:  83%|████████▎ | 833/1000 [52:05<05:21,  1.93s/it]

Error extracting text from http://www.latimes.com/world/middleeast/la-fg-isis-iraq-20151228-story.html: 403 Client Error: Forbidden for url: https://www.latimes.com/world/middleeast/la-fg-isis-iraq-20151228-story.html
Error extracting text from http://www.reuters.com/article/us-usa-northkorea-missiles-engine-idUSKBN19D2RC?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-usa-northkorea-missiles-engine-idUSKBN19D2RC?il=0


Processing URLs:  84%|████████▎ | 835/1000 [52:09<05:41,  2.07s/it]

Error extracting text from http://www.reuters.com/article/us-southchinasea-china-idUSKCN0YS09J: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-southchinasea-china-idUSKCN0YS09J


Processing URLs:  84%|████████▍ | 841/1000 [52:16<02:51,  1.08s/it]

Error extracting text from http://movieweb.com/movies/2016/april/: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Error extracting text from http://www.reuters.com/article/mideast-crisis-syria-idUSKCN0VN03B: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/mideast-crisis-syria-idUSKCN0VN03B


Processing URLs:  84%|████████▍ | 843/1000 [53:18<47:59, 18.34s/it]

Error extracting text from http://tech.firstpost.com/news-analysis/iphone-6s-6s-plus-to-rollout-in-india-on-16-october-will-be-available-in-over-2500-retail-stores-282857.html: HTTPConnectionPool(host='tech.firstpost.com', port=80): Max retries exceeded with url: /news-analysis/iphone-6s-6s-plus-to-rollout-in-india-on-16-october-will-be-available-in-over-2500-retail-stores-282857.html (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x3073ccb00>, 'Connection to tech.firstpost.com timed out. (connect timeout=60)'))


Processing URLs:  84%|████████▍ | 845/1000 [53:21<25:50, 10.00s/it]

Error extracting text from http://www.business-standard.com/article/news-ians/venezuelan-opposition-calls-for-more-protests-117052900089_1.html: 403 Client Error: Forbidden for url: http://www.business-standard.com/article/news-ians/venezuelan-opposition-calls-for-more-protests-117052900089_1.html


Processing URLs:  85%|████████▍ | 847/1000 [53:23<13:33,  5.32s/it]

Error extracting text from http://www.reuters.com/article/us-northkorea-missile-idUSKCN0XC010: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-northkorea-missile-idUSKCN0XC010


Processing URLs:  85%|████████▌ | 850/1000 [53:38<12:05,  4.83s/it]

Error extracting text from http://www.jonesday.com/force_majeure/: 403 Client Error: Forbidden for url: http://www.jonesday.com/force_majeure/


Processing URLs:  86%|████████▌ | 860/1000 [53:55<04:39,  1.99s/it]

Error extracting text from http://www.innocenceproject.org/free-innocent/improve-the-law/fact-sheets/dna-exonerations-nationwide: 403 Client Error: Forbidden for url: https://www.innocenceproject.org/free-innocent/improve-the-law/fact-sheets/dna-exonerations-nationwide


Processing URLs:  86%|████████▋ | 863/1000 [53:58<03:29,  1.53s/it]

Error extracting text from http://m.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=11707122: 404 Client Error: Not Found for url: https://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&amp;objectid=11707122


Processing URLs:  87%|████████▋ | 867/1000 [54:01<01:48,  1.23it/s]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14F0OJ?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-iraq-mosul-idUSKBN14F0OJ?il=0


Processing URLs:  87%|████████▋ | 870/1000 [55:13<40:35, 18.73s/it]

Error extracting text from http://www.itv.com/news/2017-10-18/government-and-eu-in-touching-distance-of-deal-on-eu-citizens-rights/: HTTPConnectionPool(host='www.itv.com', port=80): Read timed out. (read timeout=60)


Processing URLs:  88%|████████▊ | 875/1000 [55:22<09:50,  4.72s/it]

Error extracting text from http://www.newsletter.co.uk/news/northern-ireland-news/stormont-deal-held-up-by-stand-off-on-national-security-1-7065739: 403 Client Error: Forbidden for url: https://www.newsletter.co.uk/news/northern-ireland-news/stormont-deal-held-up-by-stand-off-on-national-security-1-7065739


Processing URLs:  88%|████████▊ | 880/1000 [55:28<04:09,  2.08s/it]

URL filtered: http://www.businessinsider.com/r-putin-plans-air-strikes-in-syria-if-no-us-deal-reached-bloomberg-2015-9


Processing URLs:  88%|████████▊ | 883/1000 [55:31<02:52,  1.47s/it]

Error extracting text from http://www.tradingeconomics.com/venezuela/gold-reserves: 405 Client Error: Not Allowed for url: http://www.tradingeconomics.com/venezuela/gold-reserves


Processing URLs:  89%|████████▊ | 886/1000 [55:33<01:42,  1.11it/s]

Error extracting text from https://www.nytimes.com/2017/02/05/us/politics/donald-trump-health-care-law-repeal-replace-plan.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/02/05/us/politics/donald-trump-health-care-law-repeal-replace-plan.html


Processing URLs:  89%|████████▊ | 887/1000 [55:34<01:47,  1.05it/s]

Error extracting text from https://www.whitehouse.gov/sites/default/files/microsites/ostp/spaceweather_2013_report.pdf: 404 Client Error: Not Found for url: https://www.whitehouse.gov/sites/default/files/microsites/ostp/spaceweather_2013_report.pdf


Processing URLs:  89%|████████▉ | 890/1000 [55:39<02:31,  1.37s/it]

Error extracting text from http://www.ibtimes.co.uk/civilians-mosul-face-hunger-basic-service-shortages-even-liberated-areas-1597273: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/civilians-mosul-face-hunger-basic-service-shortages-even-liberated-areas-1597273


Processing URLs:  89%|████████▉ | 892/1000 [55:42<02:27,  1.37s/it]

Error extracting text from http://www.nytimes.com/2015/06/21/opinion/sunday/whats-the-matter-with-polling.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/06/21/opinion/sunday/whats-the-matter-with-polling.html
URL filtered: https://twitter.com/nypost/status/1387830352719253507?s=20


Processing URLs:  90%|████████▉ | 895/1000 [55:48<02:42,  1.55s/it]

Error extracting text from http://www.fxstreet.com/news/forex-news/article.aspx?storyid=8e98ea1c-3c39-4040-a06b-49cc4cc469a6: 410 Client Error: Gone for url: http://www.fxstreet.com/news/8e98ea1c-3c39-4040-a06b-49cc4cc469a6


Processing URLs:  90%|████████▉ | 896/1000 [55:48<02:05,  1.20s/it]

Error extracting text from https://www.yahoo.com/news/casualties-mount-iraqis-press-deeper-124152870.html: 404 Client Error: Not Found for url: https://www.yahoo.com/news/casualties-mount-iraqis-press-deeper-124152870.html


Processing URLs:  90%|████████▉ | 897/1000 [55:51<02:58,  1.74s/it]

Error extracting text from http://www.diplomatie.gouv.fr/en/country-files/djibouti/france-and-djibouti/: 404 Client Error: Not Found for url: https://www.diplomatie.gouv.fr/en/country-files/djibouti/france-and-djibouti/


Processing URLs:  90%|█████████ | 900/1000 [55:53<01:42,  1.03s/it]

Error extracting text from https://joebiden.com/healthcare/: 404 Client Error: Not Found for url: https://joebiden.com/healthcare/


Processing URLs:  90%|█████████ | 903/1000 [55:57<02:10,  1.34s/it]

URL filtered: http://www.bbc.co.uk/news/world-middle-east-35333656?ns_mchannel=social&amp;ns_campaign=bbc_breaking&amp;ns_source=twitter&amp;ns_linkname=news_central


Processing URLs:  91%|█████████ | 908/1000 [56:06<02:08,  1.39s/it]

Error extracting text from https://www.eureporter.co/world/russia/2021/07/28/kremlin-critic-alexei-navalnys-website-blocked-by-regulator-before-election/: 403 Client Error: Forbidden for url: https://www.eureporter.co/world/russia/2021/07/28/kremlin-critic-alexei-navalnys-website-blocked-by-regulator-before-election/


Processing URLs:  91%|█████████ | 909/1000 [56:07<01:55,  1.27s/it]

URL filtered: https://twitter.com/TerrorEvents/status/787382154229612544


Processing URLs:  91%|█████████▏| 913/1000 [56:19<03:07,  2.16s/it]

URL filtered: https://twitter.com/hashtag/caucusforbernie


Processing URLs:  92%|█████████▏| 916/1000 [56:23<02:20,  1.68s/it]

Error extracting text from https://www.wsj.com/articles/pence-on-latin-america-visit-plays-down-trump-talk-of-military-approach-to-venezuela-1502682207: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/pence-on-latin-america-visit-plays-down-trump-talk-of-military-approach-to-venezuela-1502682207


Processing URLs:  92%|█████████▏| 919/1000 [56:26<01:38,  1.21s/it]

Error extracting text from https://www.reuters.com/article/us-mideast-crisis-syria-assad/syrian-government-denies-rumors-assad-in-poor-health-idUSKBN15E1JK: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-assad/syrian-government-denies-rumors-assad-in-poor-health-idUSKBN15E1JK


Processing URLs:  92%|█████████▏| 922/1000 [56:30<01:26,  1.11s/it]

Error extracting text from http://www.nytimes.com/2016/06/08/world/americas/brazil-dilma-rousseff-impeachment.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/06/08/world/americas/brazil-dilma-rousseff-impeachment.html?_r=0


Processing URLs:  92%|█████████▏| 923/1000 [56:31<01:27,  1.14s/it]

Error extracting text from http://abcnews.go.com/International/wireStory/putin-visit-egypt-week-51646133: 404 Client Error: Not Found for url: https://abcnews.go.com/International/wireStory/putin-visit-egypt-week-51646133


Processing URLs:  93%|█████████▎| 926/1000 [56:35<01:28,  1.19s/it]

Error extracting text from https://balkaninsight.com/2020/10/09/north-macedonia-makes-fresh-push-for-long-overdue-census/: 403 Client Error: Forbidden for url: https://balkaninsight.com/2020/10/09/north-macedonia-makes-fresh-push-for-long-overdue-census/


Processing URLs:  93%|█████████▎| 927/1000 [57:35<22:04, 18.14s/it]

Error extracting text from https://www.cmegroup.com/trading/energy/crude-oil/brent-crude-oil.html#: HTTPSConnectionPool(host='www.cmegroup.com', port=443): Read timed out. (read timeout=60)


Processing URLs:  93%|█████████▎| 928/1000 [57:36<15:38, 13.03s/it]

Error extracting text from http://capitolweekly.net/brokered-convention-republican-presidential-nomination-cleveland/: 403 Client Error: Forbidden for url: http://capitolweekly.net/brokered-convention-republican-presidential-nomination-cleveland/


Processing URLs:  93%|█████████▎| 929/1000 [57:37<11:09,  9.43s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/introducing-the-iskander-the-russian-missile-nato-fears-15653: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/introducing-the-iskander-the-russian-missile-nato-fears-15653


Processing URLs:  93%|█████████▎| 933/1000 [57:39<03:29,  3.13s/it]

Error extracting text from https://www.nytimes.com/2021/07/24/world/europe/uk-conservatives-blue-wall.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2021/07/24/world/europe/uk-conservatives-blue-wall.html


Processing URLs:  94%|█████████▍| 940/1000 [57:53<01:44,  1.74s/it]

Error extracting text from http://inserbia.info/today/2016/01/montenegro-government-wins-confidence-vote/: 404 Client Error: Not Found for url: https://inserbia.info/today/2016/01/montenegro-government-wins-confidence-vote/
Error extracting text from http://www.nytimes.com/2016/01/06/world/europe/coordinated-attacks-on-women-in-cologne-were-unprecedented-germany-says.html?hp&amp;target=comments&amp;_r=0#commentsContainer: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/01/06/world/europe/coordinated-attacks-on-women-in-cologne-were-unprecedented-germany-says.html?hp&amp;target=comments&amp;_r=0#commentsContainer


Processing URLs:  94%|█████████▍| 941/1000 [57:55<01:40,  1.71s/it]

Error extracting text from https://bit.ly/3e3EGHl: 403 Client Error: Forbidden for url: https://capx.co/scotland-has-a-deficit-of-leadership-on-its-economy/


Processing URLs:  94%|█████████▍| 944/1000 [57:59<01:15,  1.35s/it]

Error extracting text from http://www.saarc-sec.org/Cooperation-with-Inter-Governmental-Organisations/16/: 404 Client Error: Not Found for url: https://www.saarc-sec.org/Cooperation-with-Inter-Governmental-Organisations/16/


Processing URLs:  95%|█████████▍| 946/1000 [58:03<01:28,  1.64s/it]

Error extracting text from http://celebcafe.org/venezuela-says-china-to-give-5-billion-oil-loan-2552/: 500 Server Error: Internal Server Error for url: https://celebcafe.org/venezuela-says-china-to-give-5-billion-oil-loan-2552/


Processing URLs:  95%|█████████▍| 949/1000 [58:07<01:06,  1.29s/it]

Error extracting text from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7457603/: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7457603/
Error extracting text from https://www.reuters.com/business/energy/german-gas-power-prices-households-new-highs-2022-03-16/: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/business/energy/german-gas-power-prices-households-new-highs-2022-03-16/


Processing URLs:  96%|█████████▌| 955/1000 [58:17<01:03,  1.40s/it]

Error extracting text from http://www.nytimes.com/2016/11/16/world/middleeast/iran-sanctions-extended.html?_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/11/16/world/middleeast/iran-sanctions-extended.html?_r=0
URL filtered: http://www.theguardian.com/news/2016/apr/03/the-panama-papers-how-the-worlds-rich-and-famous-hide-their-money-offshore?CMP=Share_AndroidApp_Facebook


Processing URLs:  96%|█████████▌| 957/1000 [58:19<00:59,  1.38s/it]

Error extracting text from http://ffm-online.org/2016/04/10/libya-faces-influx-of-migrants-seeking-new-routes-to-europe/: 404 Client Error: Not Found for url: https://ffm-online.org/2016/04/10/libya-faces-influx-of-migrants-seeking-new-routes-to-europe/


Processing URLs:  97%|█████████▋| 968/1000 [58:34<00:44,  1.39s/it]

Error extracting text from http://www.newsweek.com/robert-reich-gorsuch-must-wait-until-trump-legit-551370: 403 Client Error: Forbidden for url: https://www.newsweek.com/robert-reich-gorsuch-must-wait-until-trump-legit-551370


Processing URLs:  97%|█████████▋| 970/1000 [58:36<00:34,  1.14s/it]

Error extracting text from https://www.nytimes.com/2017/10/08/technology/russian-election-hacking-silicon-valley.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/10/08/technology/russian-election-hacking-silicon-valley.html


Processing URLs:  97%|█████████▋| 972/1000 [58:38<00:29,  1.05s/it]

Error extracting text from http://www.grandforksherald.com/opinion/op-ed-columns/3842340-heidi-heitkamp-senators-gain-ground-finding-votes-repeal: 404 Client Error: Not Found for url: https://www.grandforksherald.com/opinion/op-ed-columns/3842340-heidi-heitkamp-senators-gain-ground-finding-votes-repeal


Processing URLs:  97%|█████████▋| 974/1000 [58:40<00:25,  1.03it/s]

Error extracting text from https://www.amnesty.org/en/latest/news/2016/08/ethiopia-dozens-killed-as-police-use-excessive-force-against-peaceful-protesters/: 403 Client Error: Forbidden for url: https://www.amnesty.org/en/latest/news/2016/08/ethiopia-dozens-killed-as-police-use-excessive-force-against-peaceful-protesters/


Processing URLs:  98%|█████████▊| 977/1000 [58:42<00:16,  1.44it/s]

Error extracting text from https://www.ccn.com/25000-in-2018-bitcoin-bull-tom-lee-sticks-to-strong-forecast-despite-failed-prediction/: 403 Client Error: Forbidden for url: https://www.ccn.com/25000-in-2018-bitcoin-bull-tom-lee-sticks-to-strong-forecast-despite-failed-prediction/


Processing URLs:  98%|█████████▊| 981/1000 [58:47<00:24,  1.31s/it]

URL filtered: https://ktla.com/news/nationworld/in-separate-cases-u-s-government-and-48-states-file-antitrust-lawsuits-against-facebook/


Processing URLs:  98%|█████████▊| 984/1000 [58:50<00:14,  1.09it/s]

Error extracting text from http://www.nytimes.com/2015/12/03/world/americas/brazil-president-faces-prospect-of-impeachment.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/03/world/americas/brazil-president-faces-prospect-of-impeachment.html


Processing URLs:  98%|█████████▊| 985/1000 [58:51<00:15,  1.01s/it]

Error extracting text from http://www.navytimes.com/story/military/2016/04/06/4-star-admiral-wants-confront-china-white-house-says-not-so-fast/82472290/: 404 Client Error: Not Found for url: https://www.navytimes.com/story/military/2016/04/06/4-star-admiral-wants-confront-china-white-house-says-not-so-fast/82472290/


Processing URLs:  99%|█████████▊| 986/1000 [58:53<00:18,  1.32s/it]

URL filtered: http://www.bloomberg.com/news/articles/2016-05-06/no-sign-of-brexit-revolution-as-u-k-voters-opt-for-no-change


Processing URLs:  99%|█████████▉| 992/1000 [59:01<00:09,  1.16s/it]

Error extracting text from http://www.scotsman.com/news/politics/sturgeon-refuses-to-rule-out-working-with-tories-on-councils-1-4425502: 403 Client Error: Forbidden for url: https://www.scotsman.com/news/politics/sturgeon-refuses-to-rule-out-working-with-tories-on-councils-1-4425502


Processing URLs: 100%|█████████▉| 995/1000 [59:14<00:12,  2.47s/it]

Error extracting text from http://www.rand.org/pubs/research_reports/RR770.html: 403 Client Error: Forbidden for url: https://www.rand.org/pubs/research_reports/RR770.html


Processing URLs: 100%|█████████▉| 996/1000 [1:00:15<01:18, 19.62s/it]

Error extracting text from http://cdmrp.army.mil/pubs/press/2016/16prmrppreann.shtml: HTTPConnectionPool(host='cdmrp.army.mil', port=80): Max retries exceeded with url: /pubs/press/2016/16prmrppreann.shtml (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x30650d3d0>, 'Connection to cdmrp.army.mil timed out. (connect timeout=60)'))


Processing URLs: 100%|█████████▉| 998/1000 [1:00:17<00:20, 10.22s/it]

Error extracting text from http://www.reuters.com/article/us-europe-markets-m-a-idUSKBN161118: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-europe-markets-m-a-idUSKBN161118


Processing URLs: 100%|██████████| 1000/1000 [1:00:18<00:00,  3.62s/it]


Error extracting text from http://www.wsj.com/articles/u-s-warship-sails-near-island-claimed-by-china-1454131157: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/u-s-warship-sails-near-island-claimed-by-china-1454131157


Processing URLs:   2%|▏         | 3/130 [00:05<03:06,  1.47s/it]

Error extracting text from https://newsinfo.inquirer.net/1392129/explainer-the-3-variants-of-sars2: 403 Client Error: Forbidden for url: https://newsinfo.inquirer.net/1392129/explainer-the-3-variants-of-sars2


Processing URLs:   5%|▌         | 7/130 [00:11<02:39,  1.30s/it]

Error extracting text from http://www.reuters.com/article/us-mideast-crisis-syria-russia-astana-idUSKBN14C13I?il=0: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-mideast-crisis-syria-russia-astana-idUSKBN14C13I?il=0


Processing URLs:   6%|▌         | 8/130 [05:17<3:19:32, 98.13s/it]

Error extracting text from http://www.japantoday.com/smartphone/view/politics/abe-putin-agree-to-advance-japan-russia-territorial-talks: 404 Client Error: Not Found for url: https://japantoday.com/category/politics/abe-putin-agree-to-advance-japan-russia-territorial-talks


Processing URLs:   7%|▋         | 9/130 [05:17<2:16:29, 67.68s/it]

Error extracting text from https://www.rfi.fr/en/international/20210725-uk-hosts-51-countries-for-climate-talks-ahead-of-cop26: 403 Client Error: Forbidden for url: https://www.rfi.fr/en/international/20210725-uk-hosts-51-countries-for-climate-talks-ahead-of-cop26


Processing URLs:   9%|▉         | 12/130 [05:22<54:23, 27.66s/it]  

Error extracting text from http://www.southcom.mil/newsroom/Pages/Southern-Seas-UNITAS-2015.aspx: 404 Client Error: Not Found for url: https://www.southcom.mil/newsroom/Pages/Southern-Seas-UNITAS-2015.aspx


Processing URLs:  10%|█         | 13/130 [05:23<40:16, 20.66s/it]

Error extracting text from https://www.fusion-festival.de/de/x/home: 404 Client Error: Not Found for url: https://www.fusion-festival.de/de/x/home


Processing URLs:  11%|█         | 14/130 [05:26<30:26, 15.74s/it]

Error extracting text from http://journaltimes.com/news/national/latin-america/peruvians-take-to-streets-to-protest-fujimori-candidacy/article_38e49a0c-e176-5cea-8332-ad46d9a08102.html: 404 Client Error: Not Found for url: https://journaltimes.com/news/national/latin-america/peruvians-take-to-streets-to-protest-fujimori-candidacy/article_38e49a0c-e176-5cea-8332-ad46d9a08102.html


Processing URLs:  14%|█▍        | 18/130 [05:31<09:26,  5.06s/it]

Error extracting text from http://mobile.nytimes.com/2016/06/12/world/europe/while-young-britons-favor-staying-in-eu-they-arent-big-on-voting.html: 403 Client Error: Forbidden for url: https://www.nytimes.com/2016/06/12/world/europe/while-young-britons-favor-staying-in-eu-they-arent-big-on-voting.html


Processing URLs:  15%|█▍        | 19/130 [05:32<06:56,  3.75s/it]

Error extracting text from http://warisboring.com/articles/the-iraqi-army-can-win-ground-battles-after-all/?mc_cid=3a17bb2bbf&amp;mc_eid=0467f21653: 403 Client Error: Forbidden for url: http://warisboring.com/articles/the-iraqi-army-can-win-ground-battles-after-all/?mc_cid=3a17bb2bbf&amp;mc_eid=0467f21653


Processing URLs:  16%|█▌        | 21/130 [05:38<06:11,  3.41s/it]

Error extracting text from http://ec.europa.eu/justice/effective-justice/rule-of-law/index_en.htm: 404 Client Error: Not Found for url: https://commission.europa.eu/strategy/justice-and-fundamental-rights/effective-justice_en


Processing URLs:  18%|█▊        | 23/130 [05:42<04:42,  2.64s/it]

Error extracting text from http://europe.newsweek.com/dutch-election-2017-geert-wilders-mark-rutte-coalition-deal-pvv-543284: 403 Client Error: Forbidden for url: https://www.newsweek.com/dutch-election-2017-geert-wilders-mark-rutte-coalition-deal-pvv-543284


Processing URLs:  18%|█▊        | 24/130 [05:43<03:59,  2.26s/it]

Error extracting text from http://www.vcstar.com/news/national/supporters-of-embattled-exim-bank-move-to-bypass-gop-foes_29144955: 404 Client Error: OK for url: https://www.vcstar.com/news/national/supporters-of-embattled-exim-bank-move-to-bypass-gop-foes_29144955/


Processing URLs:  25%|██▍       | 32/130 [06:00<01:58,  1.21s/it]

Error extracting text from http://www.japantimes.co.jp/news/2016/07/24/asia-pacific/south-china-sea-ruling-far-fueled-tensions-analysts/#.V5Vmeldh36k: 404 Client Error: Not Found for url: https://www.japantimes.co.jp/news/2016/07/24/asia-pacific/south-china-sea-ruling-far-fueled-tensions-analysts/#.V5Vmeldh36k
Error extracting text from http://www.nytimes.com/2016/02/20/world/europe/eu-deal-clears-path-for-british-referendum-on-membership.html?emc=edit_ee_20160220&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/02/20/world/europe/eu-deal-clears-path-for-british-referendum-on-membership.html?emc=edit_ee_20160220&amp;nl=todaysheadlines-europe&amp;nlid=70183565&amp;_r=0


Processing URLs:  26%|██▌       | 34/130 [06:19<07:30,  4.69s/it]

Error extracting text from http://nationalinterest.org/blog/the-buzz/how-china-plans-dominate-the-south-china-sea-copy-great-20080: 403 Client Error: Forbidden for url: https://nationalinterest.org/blog/the-buzz/how-china-plans-dominate-the-south-china-sea-copy-great-20080


Processing URLs:  28%|██▊       | 37/130 [06:21<02:52,  1.85s/it]

Error extracting text from https://constitutioncenter.org/interactive-constitution/amendment/amendment-xxv: 403 Client Error: Forbidden for url: https://constitutioncenter.org/interactive-constitution/amendment/amendment-xxv
Error extracting text from http://www.reuters.com/article/us-britain-eu-polls-idUSKBN17O0PT: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-britain-eu-polls-idUSKBN17O0PT


Processing URLs:  37%|███▋      | 48/130 [06:45<03:37,  2.65s/it]

Error extracting text from http://www.buenosairesherald.com/article/205475/brazil-lawmaker%E2%80%99s-report-could-save-rousseff: 404 Client Error: Not Found for url: https://buenosairesherald.com/article/205475/brazil-lawmaker%E2%80%99s-report-could-save-rousseff


Processing URLs:  39%|███▉      | 51/130 [07:03<06:12,  4.71s/it]

Error extracting text from http://www.wsj.com/articles/baidu-joins-race-to-build-autonomous-cars-1449714601: 403 Client Error: Forbidden for url: https://www.wsj.com/articles/baidu-joins-race-to-build-autonomous-cars-1449714601
URL filtered: https://www.bloomberg.com/news/articles/2016-07-04/brexit-accelerates-the-british-pound-s-100-years-of-debasement
URL filtered: http://www.wsj.com/articles/zuckerberg-defends-facebook-against-charges-it-harmed-political-discourse-1478833876


Processing URLs:  45%|████▍     | 58/130 [07:12<02:33,  2.13s/it]

Error extracting text from https://www.legislationline.org/documents/action/popup/id/8917: 404 Client Error: Not Found for url: https://legislationline.org/documents/action/popup/id/8917


Processing URLs:  48%|████▊     | 62/130 [07:15<01:10,  1.03s/it]

Error extracting text from http://www.reuters.com/article/us-pdvsa-output-idUSKCN10Q0D6: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-pdvsa-output-idUSKCN10Q0D6


Processing URLs:  53%|█████▎    | 69/130 [07:26<01:39,  1.64s/it]

Error extracting text from http://www.defense.gov/Video?videoid=444590&amp;videotag=Latest%20Videos&amp;videopage=1&amp;ccenabled=false&amp;videopage=1&amp;ccenabled=false: 404 Client Error: Not Found for url: https://www.defense.gov/Video?videoid=444590&amp;videotag=Latest%20Videos&amp;videopage=1&amp;ccenabled=false&amp;videopage=1&amp;ccenabled=false


Processing URLs:  54%|█████▍    | 70/130 [07:26<01:13,  1.23s/it]

Error extracting text from https://www.nytimes.com/2017/07/12/health/fda-novartis-leukemia-gene-medicine.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/07/12/health/fda-novartis-leukemia-gene-medicine.html?hp&amp;action=click&amp;pgtype=Homepage&amp;clickSource=story-heading&amp;module=first-column-region&amp;region=top-news&amp;WT.nav=top-news


Processing URLs:  61%|██████    | 79/130 [07:49<01:45,  2.07s/it]

Error extracting text from https://www.weforum.org/press/2016/12/a-convenient-truth-fighting-climate-change-turned-into-a-profitable-business/: 403 Client Error: Forbidden for url: https://www.weforum.org/press/2016/12/a-convenient-truth-fighting-climate-change-turned-into-a-profitable-business/


Processing URLs:  63%|██████▎   | 82/130 [07:56<02:09,  2.69s/it]

Error extracting text from http://www.rollcall.com/news/politics/grassley-outlines-timeline-getting-trumps-scotus-nominee-confirmed: 404 Client Error: Not Found for url: https://rollcall.com/news/politics/grassley-outlines-timeline-getting-trumps-scotus-nominee-confirmed


Processing URLs:  65%|██████▍   | 84/130 [07:58<01:19,  1.73s/it]

Error extracting text from http://news.yahoo.com/freed-british-journalist-decries-burundi-media-intimidation-104835925.html: 404 Client Error: Not Found for url: http://news.yahoo.com/freed-british-journalist-decries-burundi-media-intimidation-104835925.html


Processing URLs:  71%|███████   | 92/130 [09:09<12:11, 19.25s/it]

Error extracting text from https://sports.ladbrokes.com/en-gb/betting/politics/german-federal-election/2017-german-federal-election/223001346/: HTTPSConnectionPool(host='sports.ladbrokes.com', port=443): Max retries exceeded with url: /en-gb/betting/politics/german-federal-election/2017-german-federal-election/223001346/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x3049202c0>, 'Connection to sports.ladbrokes.com timed out. (connect timeout=60)'))


Processing URLs:  73%|███████▎  | 95/130 [09:17<04:32,  7.79s/it]

Error extracting text from http://www.reuters.com/article/2015/09/18/us-usa-election-biden-exclusive-idUSKCN0RI25F20150918: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/2015/09/18/us-usa-election-biden-exclusive-idUSKCN0RI25F20150918


Processing URLs:  76%|███████▌  | 99/130 [09:22<01:24,  2.74s/it]

Error extracting text from http://thehill.com/blogs/ballot-box/polls/266640-poll-sanders-overtakes-clinton-in-iowa: 403 Client Error: Forbidden for url: https://thehill.com/blogs/ballot-box/polls/266640-poll-sanders-overtakes-clinton-in-iowa/


Processing URLs:  78%|███████▊  | 101/130 [09:24<00:50,  1.73s/it]

Error extracting text from https://www.nytimes.com/2017/03/28/world/europe/scotland-britain-brexit-european-union.html?emc=edit_th_20170329&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0: 403 Client Error: Forbidden for url: https://www.nytimes.com/2017/03/28/world/europe/scotland-britain-brexit-european-union.html?emc=edit_th_20170329&amp;nl=todaysheadlines&amp;nlid=77825025&amp;_r=0


Processing URLs:  82%|████████▏ | 106/130 [09:49<02:28,  6.20s/it]

Error extracting text from https://www.investopedia.com/articles/03/122203.asp.: 406 Client Error: Not Acceptable for url: https://www.investopedia.com/articles/03/122203.asp.


Processing URLs:  82%|████████▏ | 107/130 [09:52<01:55,  5.02s/it]

Error extracting text from https://www.eurogroupforanimals.org/sites/eurogroup/files/2020-12/2020_12_joint_position_paper_fur_farms_FINAL.pdf: 404 Client Error: Not Found for url: https://www.eurogroupforanimals.org/sites/eurogroup/files/2020-12/2020_12_joint_position_paper_fur_farms_FINAL.pdf


Processing URLs:  84%|████████▍ | 109/130 [09:53<01:02,  2.95s/it]

Error extracting text from https://www.theneweuropean.co.uk/brexit-news/westminster-news/boris-johnson-laughs-at-robert-peston-question-6863812: 404 Client Error: Not Found for url: https://www.theneweuropean.co.uk/brexit-news/westminster-news/boris-johnson-laughs-at-robert-peston-question-6863812


Processing URLs:  85%|████████▌ | 111/130 [10:54<04:27, 14.10s/it]

Error extracting text from http://blogs.rollcall.com/218/pelosi-touts-export-import-rescue-chastises-abortion-reporter-video/: HTTPConnectionPool(host='blogs.rollcall.com', port=80): Max retries exceeded with url: /218/pelosi-touts-export-import-rescue-chastises-abortion-reporter-video/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x307570f20>, 'Connection to blogs.rollcall.com timed out. (connect timeout=60)'))
Error extracting text from http://www.reuters.com/article/us-global-oil-idUSKBN1AO029: 401 Client Error: HTTP Forbidden for url: https://www.reuters.com/article/us-global-oil-idUSKBN1AO029


Processing URLs:  88%|████████▊ | 115/130 [10:57<01:01,  4.13s/it]

Error extracting text from http://insideevs.com/nearly-three-quarters-of-q2-2016-tesla-sales-were-domestic/: 404 Client Error: Not Found for url: https://insideevs.com:443/nearly-three-quarters-of-q2-2016-tesla-sales-were-domestic/


Processing URLs:  91%|█████████ | 118/130 [11:01<00:24,  2.07s/it]

Error extracting text from http://www.nytimes.com/2016/05/21/health/pregnant-women-zika-virus-cdc.html: 403 Client Error: Forbidden for url: http://www.nytimes.com/2016/05/21/health/pregnant-women-zika-virus-cdc.html


Processing URLs:  92%|█████████▏| 120/130 [11:03<00:17,  1.71s/it]

Error extracting text from https://www.thecipherbrief.com/column/expert-view/north-koreas-threat-us-homeland-correcting-misperception-1090?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=4802c06770-EM: 404 Client Error: Not Found for url: https://www.thecipherbrief.com/column/expert-view/north-koreas-threat-us-homeland-correcting-misperception-1090?utm_source=Join+the+Community+Subscribers&amp;utm_campaign=4802c06770-EM


Processing URLs:  93%|█████████▎| 121/130 [11:05<00:16,  1.83s/it]

Error extracting text from http://thesmartset.com/dead-philosopher-walking/: 503 Server Error: Service Unavailable for url: https://thesmartset.com/dead-philosopher-walking/


Processing URLs:  95%|█████████▌| 124/130 [11:10<00:08,  1.45s/it]

Error extracting text from https://lagunita.stanford.edu/courses/course-v1:Engineering+NuclearBrink+Fall2016/about: 403 Client Error: Forbidden for url: https://online.stanford.edu/lagunita-learning-platform
Error extracting text from http://www.nytimes.com/2015/12/30/world/middleeast/haider-al-abadi-iraq-ramadi-isis.html?emc=edit_th_20151230&amp;nl=todaysheadlines&amp;nlid=28699183: 403 Client Error: Forbidden for url: http://www.nytimes.com/2015/12/30/world/middleeast/haider-al-abadi-iraq-ramadi-isis.html?emc=edit_th_20151230&amp;nl=todaysheadlines&amp;nlid=28699183


Processing URLs:  97%|█████████▋| 126/130 [11:14<00:06,  1.69s/it]

Error extracting text from http://inhomelandsecurity.com/zika-virus-map-shows-america-next/?utm_source=IHS: 403 Client Error: Forbidden for url: https://amuedge.com/zika-virus-map-shows-america-next/?utm_source=IHS


Processing URLs:  98%|█████████▊| 128/130 [11:28<00:07,  3.92s/it]

Error extracting text from http://globalriskinsights.com/2016/04/south-korea-parliamentary-elections/: 403 Client Error: Forbidden for url: http://globalriskinsights.com/2016/04/south-korea-parliamentary-elections/


Processing URLs: 100%|██████████| 130/130 [11:32<00:00,  5.32s/it]


In [42]:
# load all processed data
processed_data = []
for i in range(0, len(all_questions), 1000):
    with open(f'processed_urls_{i // 1000}.pickle', 'rb') as file:
        processed_data.extend(pickle.load(file))
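
The loading loop above relies on the scraping step having written one pickle per 1,000 URLs. A minimal sketch of that chunked save/load pattern on toy data follows. The save side is an assumption, since it is not shown in this part of the notebook, and `save_in_chunks`, `load_chunks` and `CHUNK_SIZE` are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path

CHUNK_SIZE = 3  # the notebook uses 1000; kept small here for illustration

def save_in_chunks(items, directory, chunk_size=CHUNK_SIZE):
    """Write items to processed_urls_<k>.pickle files, one per chunk (assumed save step)."""
    for start in range(0, len(items), chunk_size):
        path = Path(directory) / f"processed_urls_{start // chunk_size}.pickle"
        with open(path, "wb") as file:
            pickle.dump(items[start:start + chunk_size], file)

def load_chunks(n_items, directory, chunk_size=CHUNK_SIZE):
    """Read the chunk files back in order and concatenate them."""
    loaded = []
    for start in range(0, n_items, chunk_size):
        path = Path(directory) / f"processed_urls_{start // chunk_size}.pickle"
        with open(path, "rb") as file:
            loaded.extend(pickle.load(file))
    return loaded

# Toy records standing in for the scraped documents
items = [{"url": f"http://example.com/{i}", "text": "..."} for i in range(7)]
with tempfile.TemporaryDirectory() as tmp:
    save_in_chunks(items, tmp)
    n_files = len(list(Path(tmp).glob("processed_urls_*.pickle")))
    restored = load_chunks(len(items), tmp)
```

Because the file index is `start // chunk_size`, the loader only needs the total item count to know which files to open.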

In [43]:
# save it as a pickle file
with open("retrieved_docs_v3.pickle", "wb") as file:
    pickle.dump(processed_data, file)

In [44]:
# remove non dictionary items
processed_data = [d for d in processed_data if isinstance(d, dict)]

In [45]:
docs_df = pd.DataFrame(processed_data)
print(docs_df.shape)

# remove duplicates
docs_df = docs_df.drop_duplicates(subset=["url"]).reset_index(drop=True)
print(docs_df.shape)

(46083, 4)
(46073, 4)


In [46]:
docs_df_error = docs_df[docs_df["error"] == True].reset_index(drop=True)
docs_df_error.shape

(18255, 4)
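
With 18,255 URLs failing, it can help to see which failure modes dominate. The sketch below assumes that `error_message` holds strings shaped like the scraper log lines (for example `403 Client Error: Forbidden for url: ...`). The helper `categorize_error` and the sample messages are illustrative and not part of the original notebook.

```python
import re
from collections import Counter

def categorize_error(message: str) -> str:
    """Map a scraping error message to a coarse category (hypothetical helper)."""
    # HTTP errors start with "<status> Client Error" or "<status> Server Error"
    match = re.match(r"(\d{3}) (?:Client|Server) Error", message)
    if match:
        return f"HTTP {match.group(1)}"
    if "timed out" in message:
        return "timeout"
    return "other"

# Sample messages in the same shape as the scraper log (URLs shortened for illustration)
sample_messages = [
    "403 Client Error: Forbidden for url: https://example.com/a",
    "404 Client Error: Not Found for url: https://example.com/b",
    "403 Client Error: Forbidden for url: https://example.com/c",
    "HTTPSConnectionPool(host='example.com', port=443): Max retries exceeded "
    "(Caused by ConnectTimeoutError('Connection to example.com timed out.'))",
]

error_counts = Counter(categorize_error(m) for m in sample_messages)
print(error_counts.most_common())
```

On the real data the same helper could be applied with `docs_df_error["error_message"].map(categorize_error).value_counts()`, provided the column really contains such strings.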

In [47]:
docs_df_no_error = docs_df[docs_df["error"] == False].reset_index(drop=True)
docs_df_no_error.shape

(27818, 4)

In [48]:
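# URLs that were scraped successfully; a set makes each membership test O(1)
no_error_urls = set(docs_df_no_error["url"])

def count_no_error_links(row):
    """Count the source links of a question that were scraped without error."""
    return sum(link in no_error_urls for link in row["source_links"])

df["num_no_error_links"] = df.apply(count_no_error_links, axis=1)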

In [49]:
# number of questions with at least 5 links that were scraped without error
min_no_error_links = 5
# compare values directly; slicing the sorted value counts is positional and miscounts if a count is absent
(df["num_no_error_links"] >= min_no_error_links).sum()

602

In [50]:
docs_df_no_error['num_words'] = docs_df_no_error['text'].apply(lambda x: len(x.split()))

In [51]:
docs_df_no_error['num_words'].quantile([0.25, 0.50, .75, .9, .95, .99])

0.25      140.00
0.50      511.00
0.75     1013.00
0.90     1915.00
0.95     3860.00
0.99    13079.94
Name: num_words, dtype: float64

In [52]:
# count the source links whose scraped text has more than n words
n_words = 1000
# url -> word count for successfully scraped documents (URLs are unique after deduplication)
num_words_by_url = dict(zip(docs_df_no_error["url"], docs_df_no_error["num_words"]))

def count_num_words(row, n):
    """Count the links in row["source_links"] whose document has more than n words."""
    return sum(num_words_by_url.get(link, 0) > n for link in row["source_links"])

df["num_words_1000"] = df.apply(lambda x: count_num_words(x, n_words), axis=1)

In [53]:
# number of questions with at least 5 links of more than 1000 words
(df["num_words_1000"] >= min_no_error_links).sum()

340

### Making final dataset

The final dataset is built as follows:
- drop URLs that did not return a 200 status code
- drop URLs containing certain keywords that were identified during manual checking
- convert HTML pages to text with html2text
- use a PDF extractor for PDFs
- use the Wikipedia API for Wikipedia pages
- finally, drop links whose text has 1000 words or fewer
- the final dataset has a minimum of 5 and a maximum of 20 source_links per question
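
The word-count and minimum-link steps can be sketched on toy data as follows. This is only an illustration in plain Python. `select_questions`, the toy `word_counts` and `questions`, and the small thresholds are hypothetical stand-ins for what the notebook does with pandas on `df` and `docs_df_no_error`. The 20-link maximum is not modelled here.

```python
def select_questions(questions, word_counts, min_words=1000, min_links=5):
    """Keep links with more than min_words words, then questions with at least min_links such links."""
    selected = []
    for question in questions:
        kept = [url for url in question["source_links"] if word_counts.get(url, 0) > min_words]
        if len(kept) >= min_links:
            selected.append({**question, "source_links": kept})
    return selected

# Toy data: word counts for successfully scraped URLs only (failed URLs are simply absent)
word_counts = {"u1": 1500, "u2": 200, "u3": 1001, "u4": 5000}
questions = [
    {"id": "Q1", "source_links": ["u1", "u2", "u3", "u4"]},
    {"id": "Q2", "source_links": ["u2", "missing"]},
]

# A smaller min_links than the notebook's 5 so the toy example is easy to follow
result = select_questions(questions, word_counts, min_words=1000, min_links=3)
print(result)
```

Looking word counts up in a dictionary keeps each link check constant-time, which matters once there are tens of thousands of scraped URLs.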

In [54]:
final_df = df.copy()

In [55]:
# keep only questions with at least 5 links that each have more than 1000 words
final_df = final_df[final_df["num_words_1000"] >= min_no_error_links].reset_index(drop=True)

In [56]:
# keep only source links that were scraped without error and whose text has more than 1000 words
num_words_by_url = dict(zip(docs_df_no_error["url"], docs_df_no_error["num_words"]))

def filter_source_links(row):
    """Return the row's source links whose document has more than n_words words."""
    return [link for link in row["source_links"] if num_words_by_url.get(link, 0) > n_words]

final_df["source_links"] = final_df.apply(filter_source_links, axis=1)

In [57]:
# verify that all links are in docs_df_no_error and have more than 1000 words
for links in final_df["source_links"]:
    for link in links:
        assert link in docs_df_no_error["url"].values
        assert docs_df_no_error[docs_df_no_error["url"] == link]["num_words"].values[0] > n_words

In [58]:
# drop the helper columns that were added for filtering
final_df = final_df.drop(columns=["num_links", "num_no_error_links", "num_words_1000"])

# save as JSON, one record per question
final_df.to_json("autocast_questions_filtered_v3.json", orient="records")

In [59]:
# from docs_df_no_error keep only urls that are in final_df
docs_df_no_error = docs_df_no_error[docs_df_no_error["url"].isin(final_df["source_links"].explode().unique())].reset_index(drop=True)

# drop error, error_message, num_words
docs_df_no_error = docs_df_no_error.drop(columns=["error", "error_message", "num_words"])

In [60]:
# save as pickle
with open("autocast_questions_filtered_v3.pkl", "wb") as f:
    pickle.dump(docs_df_no_error, f)

### Verify saved files

In [61]:
import json

with open("autocast_questions_filtered_v3.json", "r") as f:
    data = json.load(f)

In [66]:
len(data), data[0]

(340,
 {'question': 'Will the Export-Import Bank of the United States be re-authorized before 1 January 2016?',
  'id': 'G5',
  'background': "The Export-Import Bank's authorization expired on 1 July, but proponents of the bank are working to get it re-authorized (http://www.nytimes.com/2015/07/01/business/international/though-charter-is-expiring-export-import-bank-will-keep-its-doors-open.html , http://www.nytimes.com/2015/07/06/us/politics/us-export-import-bank-teetering-on-edge.html , http://thehill.com/policy/finance/247953-house-gop-draws-first-in-ex-im-showdown ). Legislation re-authorizing the bank must be signed into law by the President before taking effect.",
  'publish_time': 1441116141242,
  'close_time': '2015-12-04 14:00:25+00:00',
  'tags': ['Economic Policy', 'US Politics', 'US Policy'],
  'source_links': ['http://www.hartfordbusiness.com/article/20151005/NEWS01/310029963',
   'http://breakingdefense.com/2015/09/theyre-back-congress-likely-to-pass-short-term-budget-deal

In [63]:
with open("autocast_questions_filtered_v3.pkl", "rb") as f:
    docs = pickle.load(f)

In [64]:
docs

Unnamed: 0,url,text
0,https://www.chinafile.com/viewpoint/future-of-...,"In early August, Beijing held show trials of f..."
1,http://www.businessinsider.com/donald-trump-jr...,"This March 16, 2016 file photo shows Trump Tow..."
2,http://www.pewresearch.org/fact-tank/2016/05/0...,For decades the gold standard for public opini...
3,https://en.wikipedia.org/wiki/U.S._sanctions_a...,The United States has since 1979 applied vario...
4,http://www.cbc.ca/news/politics/premiers-court...,A potential showdown is looming over the feder...
...,...,...
6100,https://www.theguardian.com/commentisfree/2017...,Home from chasing lost sheep on a beautiful su...
6101,https://en.wikipedia.org/wiki/Geryon#The_Tenth...,"In Greek mythology, Geryon ( or ; also Geryone..."
6102,http://www.bbc.com/news/world-europe-35118927,This video can not be played\n\n## To play thi...
6103,https://www.bbc.com/news/world-middle-east-560...,"Image source, Reuters\n\nImage caption,\n\nThe..."
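
A further check one might run on the two saved files is that every link kept in the JSON has a matching document in the pickle. The sketch below uses toy stand-ins: `questions` for the loaded JSON and `doc_urls` for `docs["url"]`. `find_missing_links` is a hypothetical helper, not part of the original notebook.

```python
def find_missing_links(questions, doc_urls):
    """Return {question id: [links without a scraped document]} for questions with gaps."""
    known = set(doc_urls)
    missing = {}
    for question in questions:
        gaps = [url for url in question["source_links"] if url not in known]
        if gaps:
            missing[question["id"]] = gaps
    return missing

# Toy stand-ins for the loaded JSON records and the document URLs
questions = [
    {"id": "G5", "source_links": ["http://a.example/1", "http://b.example/2"]},
    {"id": "G6", "source_links": ["http://c.example/3"]},
]
doc_urls = ["http://a.example/1", "http://b.example/2"]

missing = find_missing_links(questions, doc_urls)
print(missing)  # an empty dict would mean the two files are consistent
```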


In [83]:
display(Markdown(docs.loc[876, "text"]))

Why is Britain eurosceptic? 
By Charles Grant
The British have never been terribly popular members of the European Union. Long before they joined,
many continentals thought them too different to be constructive members of what was then the European
Economic Community (EEC). In January 1963 General de Gaulle held a press conference to set out his reasons for vetoing Harold Macmillan’s application for membership. Some, though not all of his arguments,
still resonate today.
Britain is insular, maritime, bound up by its trade, its markets, its food supplies, with the most varied and
often the most distant countries. Her activity is essentially industrial and commercial, not agricultural. She has, in all her work, very special, very original, habits and traditions. In short, the nature, structure,
circumstances peculiar to England are very different from those of other continentals. How can Britain, in the way that she lives, produces, trades, be incorporated into the Common Market as it has been
conceived and functions?… It is predictable that the cohesion of all its members, which would soon be very large, very diverse, would not last for very long and that, in fact, it would seem like a colossal Atlantic community under American dependence and direction, and that is not at all what France wanted
to do and is doing, which is a strictly European construction.
Exactly ten years later Britain joined the EEC. But the British have never been at ease in what has become
the EU. They are more hostile to the EU than any other European people. British governments, too, have
often used their influence to slow down European integration. Thus Britain has opted out of the euro and
the Schengen agreement, and prevented the extension of qualified majority voting into areas such as tax,
foreign policy and defence. There is no reason to think that this attitude will change. Gordon Brown’s
government is less enthusiastic about the EU than that of his predecessor, Tony Blair. And if the
Conservative Party wins the next general election, as seems plausible at the time of writing, a government led by David Cameron will be markedly more eurosceptic than that led by Brown.
Some of the British people’s disdain towards the EU and things European is reciprocated. Many Britons
would be surprised to know just how fed up many other Europeans are with their attitude to the EU. Years of British leaders preaching – sometimes arrogantly – about the success of their economic model, a foreign policy that often appears subservient to that of the US, a penny-pinching approach to the EU budget and a consistently negative attitude to treaty change have left their mark. The kinds of argument that de Gaulle made in the 1960s can still be heard.
People on the continent tend to overlook the positive impact of Britain on the EU. I would argue that Britain
is far from being the most eurosceptic member-state, defined as the one that causes the most damage to the EU. The British have a good record of implementing EU directives and of respecting the decisions of the
European Court of Justice, while a supposedly pro-EU country such as France has a poor record on those
counts. At the level of EU policy-making, British influence has been considerable and often positive. The ‘1992 programme’ that led to the single market was drawn up by a British Conservative commissioner, Lord
Cockfield. Tony Blair, together with the then French president Jacques Chirac, wrote the Saint Malo
declaration of 1998, which led to the EU developing military capabilities. The ‘Lisbon agenda’ of economic reform, established in 2000, had considerable British input. Britain has championed the enlargement of the
Union and the concept of economic openness (though not everyone shares my view that those objectives are desirable). It has made a big contribution to the EU’s regulatory agenda, for example through the idea of ‘unbundling’ (the separation of retail networks from the supply of a public service such as energy).
essays
Centre for European Reform, 14 Great College Street, London SW1P 3RX UK
T: 00 44 20 7233 1199 / F: 00 44 20 7233 1117 / info@cer.org.uk / www.cer.org.uk
Britain takes the ‘four freedoms’ (the free movement of capital, goods, labour and services) more seriously
than many countries that regard themselves as fully committed to the EU. Thus Britain has encouraged
French and German companies to buy up most of its utilities, though the favour has not been returned; it
is the only large EU country that has allowed other European firms to purchase big chunks of its defence industry; and when eight Central and East European countries joined the EU in 2004, initially only Britain,
Ireland and Sweden opened their labour markets to workers from the new members. 
On balance I would argue that British influence on the EU has been more beneficial than harmful.
Nevertheless I have no doubt that the euroscepticism of the British is a serious problem, not only for any
UK government that tries to engage with the EU, but also for other European governments. British ministers often oppose measures coming out of Brussels or other capitals because they fear the reaction of the British media or public.
This essay analyses the reasons for British people disliking the EU, looking at
geography, history, economics and, especially, the media. It asks why Britain’s
ruling classes have been unwilling to try and shift opinion in a more EU-friendly direction. And it concludes with the prediction that, in the very long run, Britain
will take a more positive line on the EU.[1]
Geography, history and economics
The regular Eurobarometer surveys of public opinion, carried out by the European Commission, suggest
that the British are the most eurosceptic people in Europe. A Eurobarometer survey published by the Commission in June 2008 asked whether membership of the EU was a good thing. The average response
of citizens across the EU was that 52 per cent thought it a good thing.[2] But in the UK the answer was 30
per cent, with only Latvia on 29 per cent scoring lower. Then Eurobarometer
asked respondents if their country benefited from membership. The average
positive response was 54 per cent, but only 36 per cent of Britons (and Austrians and Hungarians) thought
their country had benefited.
Eurobarometer also asked whether people trusted EU institutions. For the Commission, the average answer
across the EU was 47 per cent. In the UK it was 24 per cent, a much lower percentage than in the next most
Commission-phobic country, Latvia, where it was 37 per cent. As for the European Parliament, 52 per cent
of EU citizens trust it, but only 27 per cent of Britons. The British results in this Eurobarometer survey were
a little more positive than in the previous poll, published in December 2007, perhaps because in the second
half of 2007 arguments about treaty change and referendums fuelled eurosceptic sentiment in the UK.
When I travel around Europe, and people ask me why the British are eurosceptic, I offer four explanations
– three of which are easily understood. The first of these is geography and the effect it has had on British history. The British people live on an island on the edge of the continent and have always been inspired by the oceans. The British talk of Europe as another place (as the Finns, Irish and Portuguese sometimes do). Britain’s history has been very different to that of most continental powers. Its colonies, trade, investments and patterns of emigration and immigration have been focused on North and South America, Africa and Asia as much as on Europe. To some extent France, Holland, Spain and Portugal shared this maritime experience, while the other European states sought to build empires or wield influence mainly in their neighbourhoods. Although Britain has been involved in countless European wars, its history has been more orientated to other continents than that of any continental power. Even France, which had colonies all over the world, has focused its ambitions on Europe for much of its history.
Today, London is by far the most cosmopolitan city in Europe; more than 30 per cent of its population was
born outside the UK. Although a little more than half of Britain’s merchandise trade is now with the rest of
the EU, many Britons believe that their country would flourish as a global hub for trade and investment, outside the EU, unencumbered by the rules and regulations of Brussels. Churchill famously told de Gaulle
that, faced with a choice between the continent and
le grand large, the British would always choose the wide
open seas.
Britain’s relatively glorious role in the Second World War plays a potent role in nourishing euroscepticism.
Virtually every other major European nation has something to be ashamed of in that war. A lot of countries were on the wrong side. Others were conquered. And others stayed neutral. British popular culture is still heavily focused on the Second World War as the country’s ‘finest hour’. Other countries have moved on, and supported the EU as means of ensuring that the horrors of the Nazi period can never be repeated. But
the British do not want to forget the history of which they feel proud. Memories of the war give them a
smug sense of moral superiority vis-à-vis most of the other peoples of Europe. Margaret Thatcher often said
that the continent of Europe has been the source of most of Britain’s ills, and that the Anglo-Saxon nations,
and in particular the Americans, have repeatedly rescued Britain from those ills. Many Britons, especially
the older generation, would agree with her.

[1] An earlier version of this essay appeared in French in ‘Notre Europe’, a book edited by Michel Rocard and Nicole Gnesotto and published by Robert Laffont in 2008. This revised version has benefited from the comments of Katinka Barysch, Hugo Brady, Clara Marina O’Donnell, Simon Tilford and Philip Whyte.
[2] Eurobarometer 69, June 2008.

In addition to geography and history, economics helps to explain British euroscepticism. Since the
mid-1990s, the UK economy has out-performed the leading economies of Western Europe – France, Germany and Italy – by most measures. Britain has had relatively high growth and low unemployment. Its economy has some evident weaknesses, such as quite poor productivity. But Britain has appeared to
benefit from the structural reforms of the Thatcher period, such as the liberalisation of labour markets, the openness to foreign investment and – though this is now open to some challenge – the fostering of vibrant financial markets.
In the 1970s, Britain was regularly described as the sick man of Europe; nowadays that epithet is most often
applied to Italy. In the late 1990s and early 2000s the unemployment rate in Spain, Italy, France and
Germany was around twice that of the UK. The contrast in economic fortunes between Britain and the eurozone is the biggest reason why Tony Blair’s government never found the courage to fight a referendum
on joining the euro. So long as Euroland seemed beset with economic problems, and Britain was booming, it was extremely hard to make a convincing case that joining the euro would benefit the UK. In the 1980s,
many British politicians had argued that Britain had a lot to learn from the way that France and Germany
ran their economies. By the late 1990s the predominant view in the UK was that the rest of Europe had a lot to learn from the British.
Of course, the financial crash of 2008, and the fact that Britain is entering a particularly severe recession,
is likely to diminish the hubris of Britain’s political class about their economic model. It is no longer self-evident that an economy benefits from being as orientated to services and financial markets as is Britain.
However, even if the British economy performs less well than its continental peers for a few years – which
is likely – I doubt that many Britons will yearn to adopt the euro or demand that their economy be run like
those on the continent. And despite the recent financial and economic turmoil, political leaders in countries
such as France, Germany and Italy know that their economies suffer from serious structural problems and
that they need to copy many of the reforms that the British (and the Nordic countries) have implemented
in recent decades.
However, if the British economy underperformed, compared with its peers, for a prolonged period, one
cause of British euroscepticism would be removed.
Britain’s unique media
The fourth explanation for Britain’s hostility to the EU, which is not easily understood outside the UK, is
that Britain has a uniquely powerful and eurosceptic popular press. Ironically, some of the best media organisations that cover the EU, such as the
Financial Times, The Economist and Reuters, are UK-based.
But of the roughly 30 million people who read a daily newspaper in Britain, three-quarters read papers that are determined to make people dislike the EU. The remaining quarter read papers which, though broadly pro-European, still print much that criticises the EU. In the eurosceptic newspaper groups, journalists are expected to write stories that knock the Union. Articles which attempt to present a balanced account of an EU issue are unlikely to be published. The
Times and the Daily Telegraph, two serious newspapers, almost
never print an opinion piece that is supportive of the EU or what it is trying to do.
The national written press is particularly influential in Britain, compared with other EU states, and the
internet has not yet changed this. The total circulation of national titles is much larger than in France: 11
million against 2.5 million (in France many more people buy regional papers and news magazines). The ownership of the UK written press is also very concentrated. Four newspaper groups – the
Daily Mail and
those controlled by Rupert Murdoch (the Sun and the Times), Richard Desmond (the Express and the Star)
and the Barclay brothers (the Daily Telegraph) – account for about 75 per cent of daily newspapers sold,
and generally impose a rigidly eurosceptic line on their journalists. The competition among national titles
is also very strong. This encourages bold, striking and often inaccurate front pages.
To be fair to the British tabloids, the EU sometimes does its best to help them portray the Union in a sinister
light. The annual refusal of the Court of Auditors to give an unqualified approval of the EU accounts presented the anti-EU press with a field day (though in November 2008 the Court gave the accounts an
unqualified approval). So does the fact that the Common Agricultural Policy still accounts for nearly half the EU budget, still causes great damage to farming in developing countries, and is still spent mainly on rich
farmers rather than poor ones. The inability of the EU institutions to explain simply and clearly why they
do what they do, and how EU policies and programmes help ordinary citizens, is legendary.
Nevertheless, in no other European country is it acceptable for leading journalists to report tendentiously
on, or even lie about, the EU. I have twice, in 2004 (at the time of the agreement on the constitutional treaty)
and in 2007 (at the time of the agreement on the Lisbon treaty), spent a couple of weeks analysing newspaper coverage of the EU. I shall start with some examples from 2004. Edward Heathcoat-Amory wrote in the
Daily Mail that the constitutional treaty meant the British would “have to give up our vital
seat on the UN Security Council if the EU Foreign Minister asked for it”. In the same paper Melanie Phillips said of the European Court of Justice that its “overt purpose is to bring about a super-state”. And in the
Times, Irwin Stelzer claimed the constitution would force Britain to give up the pound, even if there was no
UK referendum on joining the euro. 
Now for some examples from 2007. A Daily Telegraph leader said it was “an atrocity” that the royal
family’s coat of arms would be banned from the cover of British passports. A Sunday Express piece on the
European gendarme force, which involves five EU member-states (but not Britain), said that “Brussels has
set up a new EU police force that could patrol the streets of Britain”. A Sun article on the treaty’s passerelle
clause said that “further vetoes could be given up by EU leaders without the permission of our Parliament.”
Of course, all these claims are entirely false. In virtually any week of the year, there are similarly bogus stories
about the EU. In the words of David Rennie, the current Economist correspondent in Brussels: “The EU has
become the equivalent of the fat boy with glasses who is bullied each break time: it is just what happens, it is cost free.” That is true. So is this comment from Brian Cathcart, professor of journalism at Kingston
University. “HL Menken once said that no one ever went broke underestimating the taste of the American public; by the same token, no newspaper publisher will go broke overestimating the euroscepticism of the
British public.”
Journalists get away with writing factual inaccuracies because they are accountable to no one but their
bosses and they face no sanction. The British system of press regulation – run by the Press Complaints
Commission – is a voluntary body that has no teeth and does virtually nothing to encourage truth-telling
or balance. Politicians are too scared to ask for a more rigorous system of press regulation, for they know
that anyone who did so would become a
bête noire for the tabloids. Journalists also get away with writing
lies about the EU because it does not, as a policy, sue in the courts.
Unfortunately, some of the ludicrous stories that appear in the British press really influence what politicians
say and do. For example, just before the June 2007 European Council, which approved what became the
Lisbon treaty, the Sun was particularly vocal about the treaty’s new foreign policy provisions. A piece
headlined “Britain surrendering its seat at the UN” said that the new treaty would mean “a new international affairs minister dictating UK policy at the UN.” Gordon Brown, then in his last days as Chancellor of the Exchequer, intervened in the Whitehall machinery to harden the UK line on the new foreign policy institutions. As a result, Foreign Secretary Margaret Beckett went to Brussels to announce a tougher UK line, including a veto on the creation of the ‘external action service’, the new body designed to bring together diplomats from the member-states with those in the Council and the Commission who work on foreign policy. Then at the summit itself Prime Minister Tony Blair changed British policy back to where it had been and accepted the external action service.
Such examples of the popular press shifting the EU policy of the Labour government are, thankfully, rare.
But the press does have a big influence on the way ministers present policy. They regularly brief the tabloids that they are fighting nefarious schemes dreamed up by the Commission or other countries. They do so in
the hope of seeing articles that portray them as fighting bloody but unbowed for the sake of British interests.
Often, however, such stories bear very little relationship to what the minister concerned has in fact said in the Council of Ministers. The British media – and not just the tabloids, but also the BBC – like to portray
Brussels as a story about epic battles, victories and defeats. The truth is that almost everything the EU does
helps to bring about compromises that benefit all or nearly all member-states. But that – often dull – truth does not make for thrilling journalism.
Strangely, the most eurosceptic newspapers – the ones which claim that Brussels bureaucrats exercise
increasing power over Britain – do not bother to have full-time correspondents in Brussels. They prefer to write EU stories out of London, where of course they are less likely to get their facts right. Currently, of the national dailies, only the
Financial Times, Guardian and Times (whose editorial line is anti-EU, but whose
reporting is sometimes good) have staff correspondents in Brussels.
The broadcast media, and notably the BBC, tend to report more fairly than the tabloids. But BBC journalists
are prone to follow an agenda set by tabloid stories. The BBC is often accused of elitism, especially by
eurosceptic lobbies. So it tends to bend over backwards to avoid the charge by making extra efforts to
accommodate populist and eurosceptic viewpoints. 
Of course, not everyone believes everything they read in the press. But the steady drip, drip, of anti-EU
propaganda over many years, having permeated deep into Britain’s political culture, has made a major contribution to the shift in British public opinion since the late 1980s: the country has become more eurosceptic. 
Politicians in the rest of Europe quite rightly ask why Britain’s leaders have to accept the tyranny of the
popular press. Why can they not take on the 
Daily Mail and the Sun, make speeches about error-prone
tabloid reporting, and explain all the good things the EU does? Part of the answer is that very few politicians
would see it as being in their interest to do so. I know pro-EU cabinet ministers in the Blair and Brown
governments who believe that if they spoke openly about their support for the EU, their careers would be seriously damaged. They are rather like homosexual ministers who, until the 1990s, had to keep quiet about
their sexual orientation for fear of the media reaction.
But part of the answer is more complicated. In most European countries, those who dislike the EU tend
to be the poor and the less educated, who fear for their future and travel little. The politicians who speak
for such people tend to come from the far left or far right. Those who are well-educated, travel a lot and
lead comfortable lives usually support the EU. The mainstream political parties in most member-states are broadly ‘pro-European’. Britain, however, is different. A significant section of its ruling class is anti-French,
anti-German and, especially, anti-Brussels. Even amongst politically moderate and highly intelligent people, one sometimes hears disparaging comments about the French and the Germans that, if said about other ethnic groups, would be
socially unacceptable.[3]
One of the defining characteristics of the modern
Conservative Party is its hostility to the EU and its institutions.
Britain’s parochial ruling classes
Thus a fifth reason for the euroscepticism of the British is cultural and social. The ruling classes – because
of the four reasons already mentioned – hold attitudes to the EU that are not common in other member-states.
The result is that few political, media or business leaders have sought to lead and educate the British
people on how they benefit from the European Union. 
Consider the cabinet of the current Labour government. Only four or five of its 23 members could be
described as pro-European, with an interest in and some knowledge of the EU (though some others know
about particular areas of EU policy). The Conservative shadow cabinet is worse. I do not believe that any of its members has devoted much time or attention to learning about the EU. The Liberal Democrat party is something of an exception. Both the current leader, Nick Clegg, and the rival he defeated to get the job, Chris Huhne, are former MEPs with a profound knowledge of the EU. The previous leader, Menzies Campbell, is also a pro-European. 
Consider the political editors of the national daily newspapers. Only one of them could be described as
knowledgeable about Europe – and his previous job was Brussels correspondent. Of the two-dozen most influential political commentators who write in the British press, perhaps only three know much about the EU.
If you want to succeed in politics or the media in Britain, make sure you do not know too much about
Europe. If you know too much you risk being branded as a nerd who is out of touch with what most British people think. I once asked a Labour MP who is now a senior cabinet minister if he wanted to be appointed a UK representative to the Convention on the Future of Europe (which helped to draw up the constitutional treaty). He told me no: although a pro-European, he thought that taking part in the Convention would be the death of his political career. Ignorance of or hostility to Europe is
certainly no handicap in the world of British journalism or politics. I have heard BBC current affairs
journalists confess that they cover up their true pro-European feelings for the sake of their careers: one of the best ways to advance is to make eurosceptic comments that get you noticed; then senior editors
will praise you for being ‘in touch’.
But what about the worlds of business and finance? Surely top businessmen and bankers understand how
the UK gains from being part of the world’s largest single market? It is true that many business leaders are broadly pro-European. But few of them are prepared to speak out on the EU. Some of them campaigned
for the euro, but became bitter when the Labour government was too cowardly to ever make a forceful case
for the single currency. Now many of these business leaders say that it is up to politicians to give a lead on
Europe. In any case, in business circles, it is increasingly fashionable to argue that the EU is an out-of-date,
failing project that will not survive the era of globalisation. Small businesses have always tended to be
eurosceptic. Many of them see the EU not as an instrument for making it easier to trade and invest abroad, or to import cheaper workers, but rather as a source of red tape. 
[3] To be fair to the British, plenty of French and German people speak about ‘Anglo-Saxons’ in an equally derogatory manner.
The City of London, which has thrived outside the euro – though it makes lots of money from trading paper
denominated in euros – has become markedly more anti-EU in the last ten years. Many financiers have come to regard the EU principally as the source of every regulation they dislike. In fact the British government has
usually fought off European challenges to its own ‘light touch’ approach to financial regulation.
The EU is the source of some bothersome financial and business regulations, but the British government has
often voted in favour and then implemented them in an over-detailed manner. Bankers and business people
sometimes forget that if the EU did not set common standards on, say, the use of industrial chemicals or bank capital requirements, they could not benefit from the EU’s single market.
Business people are particularly hostile to attempts to set social or labour market norms in Brussels. That
is why the Confederation of British Industry (CBI) campaigned so vehemently against the charter of fundamental rights, a non-binding set of principles that was included in both the constitutional treaty and
the Lisbon treaty. That charter contains aspirations such as the right to strike and the right to join a trade
union. Because of the CBI, the Labour government insisted on amendments to the Lisbon treaty to clarify that the charter will have no effect in Britain. But that in turn upset Britain’s trade unions. They had become
broadly pro-European in the 1990s, thanks to Margaret Thatcher’s hostility to the EU and Jacques Delors’s advocacy of social Europe. Now many British trade unions are fairly eurosceptic, viewing the EU as an
organisation designed to promote the interests of business.
Reasons for hope
So is Britain doomed to become ever more eurosceptic, and perhaps one day to leave the EU? Despite
everything I have written in this essay, I do not believe this to be the case. But I am not optimistic in the
short term. The Conservatives appear to be a profoundly eurosceptic party, divided between those who
dislike the EU but accept that Britain is better off in than out, and those who want to leave. Most of the
party’s leaders are ignorant about the EU and many of them have remarkably few contacts with other
European politicians. In power, the Conservatives are likely to provoke crises in the UK-EU relationship, for
example by trying to pull out of the social chapter of the treaties or EU defence policy. However, I would
predict that after a few years the hard facts of European and global power politics – namely that Britain on
its own cannot achieve many of its objectives – will make the Conservatives more responsible. 
In the long run, I am more optimistic. Indeed, I would argue that the UK is in many ways a pro-European
country, even if its people do not understand why. To use a Marxist concept, the ‘base’ of Britain’s economy and society is profoundly European, and sooner or later the ‘superstructure’ of the political and media elite will have to reflect that reality.
On the continent, and also in Britain, many people believe that the British economy is closer to that of the
US, with its low taxation and minimal welfare state, than those of mainland Europe. Wrong. The British, like many other Europeans, like to live in a country with quite high levels of taxation and welfare spending. For example, spending by the state as a percentage of GDP was 44.6 per cent in the UK in 2007, compared with a eurozone average of 46.4 per cent and the US figure of 37.4 per cent. In 2007 the German state spent less than the British state: 44.3 per cent of GDP.
Britain has a higher proportion of its workers in a trade union than most EU member-states, including
France. Over the past ten years the Labour government has roughly doubled spending on health and education, bringing the level in Britain to EU norms. Measures of inequality remain quite high in Britain,
mainly due to the consequences of a lot of children leaving school at 16, without useful skills. However,
Italy, Greece and Portugal have higher levels of poverty and inequality – and much more labour market regulation – than Britain. What does differentiate the UK economy from many others in the EU is not
only the size of its financial sector but also its openness to foreign investment, including takeovers by foreign firms. 
The values held by British people tend to line up with those of other Europeans, rather than Americans,
though of course there is much variation among the different European peoples. Britain is a secular society
in which less than 20 per cent of the population worship regularly, while in the US about half the population attends church once a week. Ronald Inglehart has researched values across the world’s major countries,
interviewing 120,000 people in 1999-2001. Based on those interviews, he has drawn up an index of
secular/rational values against traditional ones: on Inglehart’s measure, a society that scores 2.0 is very
secular and rational, while –2.0 indicates a very traditional society. The United States scores –0.5 on this index. The EU states are spread out between 0.35 and 1.5, Sweden and Germany having the highest figures,
with the UK, Italy, Belgium, Austria and Spain scoring about 0.4.
On international relations, too, the British are broadly European in their world view. They may not love the
EU but they do believe in multilateral systems of governance, international law and a strong UN. Many
Americans differ on those points. It is true that the British, like the French and the Americans, are more likely to countenance the use of force to solve an international problem than are many other Europeans. But the British people have been almost as hostile to the US-led invasion of Iraq as most continental Europeans.
Therefore there is nothing in the structure of the British economy, or the nature of British values, that should
prevent the UK from making a positive contribution to the EU. Indeed, the way the EU is evolving should
make it easier for the British to engage. The enlargements of the EU in 2004 and 2007 have changed its nature profoundly. With the arrival of ten Central and East European countries, there never will be a
majority of member-states in favour of the kind of highly-integrated system of European government that seemed plausible 20 years ago, nor for a Union that seeks to act as a counterweight to the US. Most of the
new members are more or less Atlanticist, opposed to a lot of centrally-set economic and social rules, and
in favour of free trade. 
One reason why the British dislike the EU is that they have seen it as dominated by France and Germany,
promoting Franco-German interests. There was sometimes some truth in that caricature, but no longer.
France and Germany cannot dominate an EU of 27 countries. Furthermore, some of the countries that used to be most supportive of European integration have recently become more sceptical of it – for example,
France, Germany, Ireland and the Netherlands. So even if the British people tend to be the most
eurosceptical, as the Eurobarometer surveys show, their views on the EU are less divergent from the rest of
Europe than they used to be.
In this wider, looser Europe, there are bound to be groups of countries that form avant-gardes to hasten
integration in particular areas, as many countries have done with the euro. But the idea that the EU as a
whole could become some sort of super-state – an idea that nourishes much of the passion in British
euroscepticism – is now laughable. Only in Belgium and Luxembourg, and to some degree in Italy and
Germany, can one find influential politicians who are ‘federalist’ in the sense that Helmut Kohl and Jacques
Delors were in the early 1990s.
The British are the last people in Europe to understand how the EU is changing. The French understand
very well, which is why more than half of them voted against the constitutional treaty in 2005. To many of the French, that treaty symbolised the increasingly Anglo-Saxon and economically liberal nature of the EU. Ironically, British eurosceptics opposed the treaty because they saw it as a vehicle for continental socialism and federalism.
The main reason why I am optimistic about Britain’s role in Europe is that British euroscepticism has always
been largely about institutions. It thrives when the papers are full of stories about new treaties, the loss of national vetoes, institutional change and the erosion of sovereignty. And ever since the Amsterdam treaty, agreed in 1997, there has been almost constant discussion of these dull and complicated but – to some – scary subjects. The British perception of the EU is that it is all about process rather than about doing useful things.
Now, however, there is a real prospect that treaty change will be off the agenda for a long period. Either the
Irish will ratify the Lisbon treaty, allowing it to enter into force across the EU, or they will not. If the Lisbon
treaty is adopted, there will be no demand for another new treaty any time soon. The difficulties of getting any treaty ratified in so many member-states are too great. If the Lisbon treaty is not adopted, some
governments will huff and puff and talk about treaty change and leadership groups, but in practice the EU
will have to learn to live with the existing treaties and all their imperfections. Either way, the EU is unlikely to spend the next decade talking about treaty change.
That is good news for the British debate on Europe. The issues that will shape the way the EU develops in
the coming years – the need to tackle climate change, enhance energy security, co-operate in the fight against crime and terrorism, manage migration, respond to the rise of China’s economic power and stand up to Russia – are of huge interest to British people. Only the most crazed of eurosceptics could argue that Britain
can deal effectively with these challenges on its own. If, as I expect, the EU focuses increasingly on substance rather than process, the eurosceptics will be deprived of their most powerful arguments. An EU that delivers
real benefits to the British people will become more popular, despite the best efforts of some newspapers to
tarnish its image.
Charles Grant is director of the Centre for European Reform. 
December 2008
For further information, visit our website
www.cer.org.uk

In [85]:
# verify min number of words
docs["text"].apply(lambda x: len(x.split())).min()

1001

In [86]:
# verify min 5 links
final_df["source_links"].apply(len).min()

5

In [87]:
# verify that all links are in docs
final_df["source_links"].explode().isin(docs["url"]).all()

True
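
The three checks above can also be bundled into one helper that fails loudly instead of relying on someone reading the printed values. The following is only a minimal sketch: the name `validate_dataset` and the default thresholds are hypothetical (the values 1001 and 5 printed above are observed minima, not documented requirements), and it assumes `docs` has `url` and `text` columns and `final_df` has a list-valued `source_links` column, as in the cells above.

```python
import pandas as pd


def validate_dataset(final_df: pd.DataFrame, docs: pd.DataFrame,
                     min_words: int = 1000, min_links: int = 5) -> dict:
    """Hypothetical helper bundling the three sanity checks above.

    The thresholds are assumptions; adjust them to the filtering rules
    actually used earlier in the notebook.
    """
    # Whitespace-delimited word count of the shortest document.
    shortest_doc = int(docs["text"].str.split().str.len().min())
    # Fewest source links attached to any question.
    fewest_links = int(final_df["source_links"].apply(len).min())
    # Every link referenced by a question must have a scraped document.
    all_links_scraped = bool(
        final_df["source_links"].explode().isin(docs["url"]).all()
    )

    if shortest_doc < min_words:
        raise ValueError(f"shortest document has {shortest_doc} words")
    if fewest_links < min_links:
        raise ValueError(f"a question has only {fewest_links} links")
    if not all_links_scraped:
        raise ValueError("some source links have no scraped document")

    return {
        "min_words": shortest_doc,
        "min_links": fewest_links,
        "all_links_scraped": all_links_scraped,
    }
```

Calling `validate_dataset(final_df, docs)` at the end of the notebook would then either return the three summary values or raise a `ValueError` naming the first check that failed.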