# DefaultTokenizer
<div style="position: absolute; right:0;top:0"><a href="./tokenizer.ipynb" style="text-decoration: none"> <font size="5">←</font></a>
<a href="../evaluation.py.ipynb" style="text-decoration: none"> <font size="5">↑</font></a></div>

This module provides the `DefaultTokenizer` class that transforms the `text` of a document into `tokens`.
It executes various functions that are controlled by the `token_info` settings.
You can chose dataset, token version and a document to see the effect of various settings.
The array `fixed_tokens` is used for all tokens that are no further processed.

---
## Setup and Settings
---

In [1]:
from __init__ import init_vars
init_vars(vars(), ('info', {}), ('runvars', {}))

import re
import emoji
    
import data
import config
from base import nbprint
from util import ProgressIterator, add_method

import tokenizer.common
import tokenizer.emoticons
from tokenizer.token_util import iterate_tokens, TokenizerBase
from tokenizer.widgets import token_picker, run_and_compare, show_comparison

if RUN_SCRIPT: token_picker(info, runvars)

VBox(children=(Dropdown(description='Dataset', layout=Layout(width='400px'), options=(('ACM', 'acm'), ('ATD', …

0,1
Data Name,Reuters (exists)
Token,exists
Token Version,B0
Class,default_tokenizer.DefaultTokenizer
Settings,"numbers: skip, numbers_split: True, urls: domain, ascii_emotes: skip, unicode_emotes: skip, lowercase: True, alnum_only: weak, ascii_only: False"
Document,
Id,0
,"ASIAN EXPORTERS FEAR DAMAGE FROM U. S.- JAPAN RIFT Mounting trade friction between the U. S. And Japan has raised fears among many of Asia's exporting nations that the row could inflict far - reaching economic damage, businessmen and officials said. They told Reuter correspondents in Asian capitals a U. S. Move against Japan might boost protectionist sentiment in the U. S. And lead to curbs on American imports of their products. But some exporters said that while the conflict would hurt them in the long - run, in the short - term Tokyo's loss might be their gain. The U. S. Has said it will impose 300 mln dlrs of tariffs on imports of Japanese electronics goods on April 17, in retaliation for Japan's alleged failure to stick to a pact not to sell semiconductors on world markets at below cost. Unofficial Japanese estimates put the impact of the tariffs at 10 billion dlrs and spokesmen for major electronics firms said they would virtually halt exports of products hit by the new taxes. "" We wouldn't be able to do business,"" said a spokesman for leading Japanese electronics firm Matsushita Electric Industrial Co Ltd & lt ; MC. T >. "" If the tariffs remain in place for any length of time beyond a few months it will mean the complete erosion of exports (of goods subject to tariffs) to the U. S.,"" said Tom Murtha, a stock analyst at the Tokyo office of broker & lt ; James Capel and Co >. In Taiwan, businessmen and officials are also worried. "" We are aware of the seriousness of the U. S. Threat against Japan because it serves as a warning to us,"" said a senior Taiwanese trade official who asked not to be named. Taiwan had a trade trade surplus of 15.6 billion dlrs last year, 95 pct of it with the U. S. The surplus helped swell Taiwan's foreign exchange reserves to 53 billion dlrs, among the world's largest. "" We must quickly open our markets, remove trade barriers and cut import tariffs to allow imports of U. S. Products, if we want to defuse problems from possible U. S. Retaliation,"" said Paul Sheen, chairman of textile exporters & lt ; Taiwan Safe Group >. A senior official of South Korea's trade promotion association said the trade dispute between the U. S. And Japan might also lead to pressure on South Korea, whose chief exports are similar to those of Japan. Last year South Korea had a trade surplus of 7.1 billion dlrs with the U. S., Up from 4.9 billion dlrs in 1985. In Malaysia, trade officers and businessmen said tough curbs against Japan might allow hard - hit producers of semiconductors in third countries to expand their sales to the U. S. In Hong Kong, where newspapers have alleged Japan has been selling below - cost semiconductors, some electronics manufacturers share that view. But other businessmen said such a short - term commercial advantage would be outweighed by further U. S. Pressure to block imports. "" That is a very short - term view,"" said Lawrence Mills, director - general of the Federation of Hong Kong Industry. "" If the whole purpose is to prevent imports, one day it will be extended to other sources. Much more serious for Hong Kong is the disadvantage of action restraining trade,"" he said. The U. S. Last year was Hong Kong's biggest export market, accounting for over 30 pct of domestically produced exports. The Australian government is awaiting the outcome of trade talks between the U. S. And Japan with interest and concern, Industry Minister John Button said in Canberra last Friday. "" This kind of deterioration in trade relations between two countries which are major trading partners of ours is a very serious matter,"" Button said. He said Australia's concerns centred on coal and beef, Australia's two largest exports to Japan and also significant U. S. Exports to that country. Meanwhile U. S.- Japanese diplomatic manoeuvres to solve the trade stand - off continue. Japan's ruling Liberal Democratic Party yesterday outlined a package of economic measures to boost the Japanese economy. The measures proposed include a large supplementary budget and record public works spending in the first half of the financial year. They also call for stepped - up spending as an emergency measure to stimulate the economy despite Prime Minister Yasuhiro Nakasone's avowed fiscal reform program. Deputy U. S. Trade Representative Michael Smith and Makoto Kuroda, Japan's deputy minister of International Trade and Industry (MITI), are due to meet in Washington this week in an effort to end the dispute."




FloatProgress(value=0.0, bar_style='info', layout=Layout(visibility='hidden'), max=1.0, style=ProgressStyle(de…

---
## Tokenize Document
---
The following functions consitute the `DefaultTokenizer` class that transforms the raw text of a document into tokens.

In [2]:
URLS_OPTIONS = ['skip', 'keep', 'domain', 'replace', 'drop']
EMOTES_OPTIONS = ['skip', 'keep', 'replace', 'drop']
ALNUM_OPTIONS = ['skip', 'weak', 'apostrophe', 'strict']
NUMBERS_OPTIONS = ['skip', 'decimal-on-one', 'all-on-one', 'drop']

class DefaultTokenizer(TokenizerBase):
    
    def __init__(self, info):
        super().__init__(info)
        token_info = info.get('token_info', {})
        
        self.urls = token_info.get('urls', 'skip')
        try:
            self.urls_idx = URLS_OPTIONS.index(self.urls)
        except ValueError:
            raise config.ConfigException(('Invalid DefaultTokenizer configuration option urls "{}". '
                                          'Valid options are "{}".').format(self.urls, '", "'.join(URLS_OPTIONS)))
        
        self.ascii_emotes   = token_info.get('ascii_emotes', 'skip')
        try:
            self.ascii_emotes_idx = EMOTES_OPTIONS.index(self.ascii_emotes)
        except ValueError:
            raise config.ConfigException(('Invalid DefaultTokenizer configuration option ascii_emotes "{}". '
                                          'Valid options are "{}".').format(self.ascii_emotes, '", "'.join(EMOTES_OPTIONS)))
        
        self.unicode_emotes = token_info.get('unicode_emotes', 'skip')
        try:
            self.unicode_emotes_idx = EMOTES_OPTIONS.index(self.unicode_emotes)
        except ValueError:
            raise config.ConfigException(('Invalid DefaultTokenizer configuration option unicode_emotes "{}". '
                                          'Valid options are "{}".').format(self.unicode_emotes, '", "'.join(EMOTES_OPTIONS)))
        
        self.alnum_only     = token_info.get('alnum_only', True)
        try:
            self.alnum_only_idx = ALNUM_OPTIONS.index(self.alnum_only)
        except ValueError:
            raise config.ConfigException(('Invalid DefaultTokenizer configuration option alnum_only "{}". '
                                          'Valid options are "{}".').format(self.alnum_only, '", "'.join(ALNUM_OPTIONS)))
        
        self.numbers        = token_info.get('numbers', 'skip')
        try:
            self.numbers_idx = NUMBERS_OPTIONS.index(self.numbers)
        except ValueError:
            raise config.ConfigException(('Invalid DefaultTokenizer configuration option numbers "{}". '
                                          'Valid options are "{}".').format(self.numbers, '", "'.join(NUMBERS_OPTIONS)))
        
        
        self.lowercase      = token_info.get('lowercase', True)
        self.numbers_split  = token_info.get('numbers_split', False)
        self.ascii_only     = token_info.get('ascii_only', True)
        
if RUN_SCRIPT:
    default_tokenizer = DefaultTokenizer(info)

### Prepare

Splits the text at whitespace and initializes an empty list of fixed tokens. The `separator_token` is replaced with the `separator_token_replacement`, so that it can be used for saving the tokens as a string, i.e.
```
This:is:a:token:list
```

In [3]:
@add_method(DefaultTokenizer)
def init_tokenization(self, text):
    self.fixed_tokens = []
    self.text = text.replace(tokenizer.common.separator_token,tokenizer.common.separator_token_replacement)
    self.tokens = text.split()

if RUN_SCRIPT:
    default_tokenizer.init_tokenization(runvars['document']['text'])
    show_comparison(default_tokenizer.text, default_tokenizer.tokens, 'Text', 'Tokens')

0,1
Text,Tokens
"ASIAN EXPORTERS FEAR DAMAGE FROM U. S.- JAPAN RIFT Mounting trade friction between the U. S. And Japan has raised fears among many of Asia's exporting nations that the row could inflict far - reaching economic damage, businessmen and officials said. They told Reuter correspondents in Asian capitals a U. S. Move against Japan might boost protectionist sentiment in the U. S. And lead to curbs on American imports of their products. But some exporters said that while the conflict would hurt them in the long - run, in the short - term Tokyo's loss might be their gain. The U. S. Has said it will impose 300 mln dlrs of tariffs on imports of Japanese electronics goods on April 17, in retaliation for Japan's alleged failure to stick to a pact not to sell semiconductors on world markets at below cost. Unofficial Japanese estimates put the impact of the tariffs at 10 billion dlrs and spokesmen for major electronics firms said they would virtually halt exports of products hit by the new taxes. "" We wouldn't be able to do business,"" said a spokesman for leading Japanese electronics firm Matsushita Electric Industrial Co Ltd & lt , MC. T >. "" If the tariffs remain in place for any length of time beyond a few months it will mean the complete erosion of exports (of goods subject to tariffs) to the U. S.,"" said Tom Murtha, a stock analyst at the Tokyo office of broker & lt , James Capel and Co >. In Taiwan, businessmen and officials are also worried. "" We are aware of the seriousness of the U. S. Threat against Japan because it serves as a warning to us,"" said a senior Taiwanese trade official who asked not to be named. Taiwan had a trade trade surplus of 15.6 billion dlrs last year, 95 pct of it with the U. S. The surplus helped swell Taiwan's foreign exchange reserves to 53 billion dlrs, among the world's largest. "" We must quickly open our markets, remove trade barriers and cut import tariffs to allow imports of U. S. Products, if we want to defuse problems from possible U. S. Retaliation,"" said Paul Sheen, chairman of textile exporters & lt , Taiwan Safe Group >. A senior official of South Korea's trade promotion association said the trade dispute between the U. S. And Japan might also lead to pressure on South Korea, whose chief exports are similar to those of Japan. Last year South Korea had a trade surplus of 7.1 billion dlrs with the U. S., Up from 4.9 billion dlrs in 1985. In Malaysia, trade officers and businessmen said tough curbs against Japan might allow hard - hit producers of semiconductors in third countries to expand their sales to the U. S. In Hong Kong, where newspapers have alleged Japan has been selling below - cost semiconductors, some electronics manufacturers share that view. But other businessmen said such a short - term commercial advantage would be outweighed by further U. S. Pressure to block imports. "" That is a very short - term view,"" said Lawrence Mills, director - general of the Federation of Hong Kong Industry. "" If the whole purpose is to prevent imports, one day it will be extended to other sources. Much more serious for Hong Kong is the disadvantage of action restraining trade,"" he said. The U. S. Last year was Hong Kong's biggest export market, accounting for over 30 pct of domestically produced exports. The Australian government is awaiting the outcome of trade talks between the U. S. And Japan with interest and concern, Industry Minister John Button said in Canberra last Friday. "" This kind of deterioration in trade relations between two countries which are major trading partners of ours is a very serious matter,"" Button said. He said Australia's concerns centred on coal and beef, Australia's two largest exports to Japan and also significant U. S. Exports to that country. Meanwhile U. S.- Japanese diplomatic manoeuvres to solve the trade stand - off continue. Japan's ruling Liberal Democratic Party yesterday outlined a package of economic measures to boost the Japanese economy. The measures proposed include a large supplementary budget and record public works spending in the first half of the financial year. They also call for stepped - up spending as an emergency measure to stimulate the economy despite Prime Minister Yasuhiro Nakasone's avowed fiscal reform program. Deputy U. S. Trade Representative Michael Smith and Makoto Kuroda, Japan's deputy minister of International Trade and Industry (MITI), are due to meet in Washington this week in an effort to end the dispute.","ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985.; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute."


### URLs
Supports the following options for `urls`:
- `skip`: this step will be skipped
- `keep`: keeps every URL as it is, no further processing
- `domain`: replaces every URL by its top and second level domain
- `drop`: completely removes every URL from the text
- `replace`: replaces every URL with the URL Token

In [4]:
def _process_urls_keep_fct(url_str):
    return url_str
def _process_urls_domain_fct(url_str):
    for prefix in ['http://', 'https://']:
        url_str = url_str.replace(prefix,'')
    slash_index = url_str.find('/')
    if slash_index > 0:
        url_str = url_str[:slash_index]
    if url_str.count('.') > 1:
        url_str = url_str[url_str.rfind('.',0,url_str.rfind('.'))+1:]
    return url_str
def _process_urls_replace_fct(url_str):
    return tokenizer.common.url_token
def _process_urls_drop_fct(url_str):
    pass
_process_urls_fct_selector = [None,
                              _process_urls_keep_fct, 
                              _process_urls_domain_fct, 
                              _process_urls_replace_fct, 
                              _process_urls_drop_fct]
    
@add_method(DefaultTokenizer)
def process_urls_token(self, token):
    if (token.startswith("http://") or
        token.startswith("https://") or
        token.startswith("www.")):
        url_str = self._process_urls_fct(token)
        if url_str is not None:
            self.fixed_tokens.append(url_str)
        return None
    return token

@add_method(DefaultTokenizer)
def process_urls(self):
    self._process_urls_fct = _process_urls_fct_selector[self.urls_idx]
    if self._process_urls_fct is not None:
        iterate_tokens(self.tokens, self.process_urls_token)         
    
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.process_urls, 'tokens', 'fixed_tokens', 'Input', 'Fixed Tokens')

0,1
Input,Fixed Tokens
"ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985.; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute.",


### ASCII Emoticons

Supports the following options for `ascii_emotes`:
- `skip`: this step will be skipped
- `keep`: keeps every ASCII emoticon as it is, no further processing
- `drop`: ompletely removes every ASCII emoticon from the text
- `replace`: replaces with the emote token

In [5]:
def _process_emotes_keep_fct(emoticon):
    return emoticon
def _process_emotes_replace_fct(emoticon):
    return tokenizer.common.emote_token
def _process_emotes_drop_fct(emoticon):
    pass
_process_emotes_fct_selector = [None,
                                _process_emotes_keep_fct,
                                _process_emotes_replace_fct,
                                _process_emotes_drop_fct]

@add_method(DefaultTokenizer)
def process_ascii_emotes_token(self, token):
    for e, remainder in tokenizer.emoticons.western_dict.items():
        parts = token.split(e, 1)
        if len(parts) == 1:
            continue
        else:
            # Test if it is preceded by alphanumeric characters
            pre = parts[0][-1:]
            if pre.isalnum():
                continue
                
            for r in remainder:
                if not parts[1].startswith(r):
                    continue
                post = parts[1][len(r):]
                
                # Test if it is followed by alphanumeric characters
                if post[:1].isalnum():
                    continue
                
                emoticon = self._process_ascii_emotes_fct(e + r)
                if emoticon is not None:
                    self.fixed_tokens.append(emoticon)
                remaining = []
                if len(pre) >= 2:
                    remaining += self.process_ascii_emotes_token(pre)
                else:
                    remaining += pre
                if len(post) >= 2:
                    remaining += self.process_ascii_emotes_token(post)
                else:
                    remaining += post
                return [s for s in parts if len(s) > 0]
    return token

@add_method(DefaultTokenizer)
def process_ascii_emotes(self):
    self._process_ascii_emotes_fct = _process_emotes_fct_selector[self.ascii_emotes_idx]
    if self._process_ascii_emotes_fct is not None:
        iterate_tokens(self.tokens, self.process_ascii_emotes_token)    
            
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.process_ascii_emotes, 'tokens', 'fixed_tokens', 'Input', 'Fixed Tokens')

0,1
Input,Fixed Tokens
"ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985.; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute.",


### Unicode Emoticons

Supports the following options for `unicode_emotes`:
- `skip`: this step will be skipped
- `keep`: keeps every ASCII emoticon as it is, no further processing
- `drop`: ompletely removes every ASCII emoticon from the text
- `replace`: replaces with the emote token

In [6]:
@add_method(DefaultTokenizer)
def process_unicode_emotes_token(self, token):
    tokens = []
    new_token = ''
    for c in token:
        if c in emoji.UNICODE_EMOJI:
            emoticon = self._process_ascii_emotes_fct(c)
            if emoticon is not None:
                self.fixed_tokens.append(emoticon)
            if len(new_token) > 0:
                tokens.append(new_token)
                new_token = ''
        else:
            new_token += c
    if len(tokens) == 0:
        return new_token
    return tokens + [new_token]
    
@add_method(DefaultTokenizer)
def process_unicode_emoticons(self):
    self._process_ascii_emotes_fct = _process_emotes_fct_selector[self.ascii_emotes_idx]
    if self._process_ascii_emotes_fct is not None:
        iterate_tokens(self.tokens, self.process_unicode_emotes_token) 
        
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.process_unicode_emoticons, 'tokens', 'fixed_tokens', 'Input', 'Fixed Tokens')

0,1
Input,Fixed Tokens
"ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985.; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute.",


### Split Numbers

This step splits words that consist of letters, special characters and numbers into distinct words.
`,` and `.` are allowed to occur within numbers and do not lead to splitting up the string.

In [7]:
dezimal_re = re.compile('([0-9]+(?:[,.]+[0-9,.]+)*)')

@add_method(DefaultTokenizer)
def split_numbers_token(self, token):
    return [s for s in dezimal_re.split(token) if len(s) > 0]

@add_method(DefaultTokenizer)
def split_numbers(self):
    if self.numbers_split:
        iterate_tokens(self.tokens, self.split_numbers_token) 
        
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.split_numbers, 'tokens')

0,1
Before,After
"ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985.; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute.","ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17; ,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985; .; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute."


### Lowercase

All letters are lowercased.

In [8]:
@add_method(DefaultTokenizer)
def process_lowercase_token(self, token):
    return token.lower()

@add_method(DefaultTokenizer)
def process_lowercase(self):
    if self.lowercase:
        iterate_tokens(self.tokens, self.process_lowercase_token)
        
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.process_lowercase, 'tokens')

0,1
Before,After
"ASIAN; EXPORTERS; FEAR; DAMAGE; FROM; U.; S.-; JAPAN; RIFT; Mounting; trade; friction; between; the; U.; S.; And; Japan; has; raised; fears; among; many; of; Asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; They; told; Reuter; correspondents; in; Asian; capitals; a; U.; S.; Move; against; Japan; might; boost; protectionist; sentiment; in; the; U.; S.; And; lead; to; curbs; on; American; imports; of; their; products.; But; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; Tokyo's; loss; might; be; their; gain.; The; U.; S.; Has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; Japanese; electronics; goods; on; April; 17; ,; in; retaliation; for; Japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; Unofficial; Japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; We; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; Japanese; electronics; firm; Matsushita; Electric; Industrial; Co; Ltd; &; lt; ;; MC.; T; >.; ""; If; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; U.; S.,""; said; Tom; Murtha,; a; stock; analyst; at; the; Tokyo; office; of; broker; &; lt; ;; James; Capel; and; Co; >.; In; Taiwan,; businessmen; and; officials; are; also; worried.; ""; We; are; aware; of; the; seriousness; of; the; U.; S.; Threat; against; Japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; Taiwanese; trade; official; who; asked; not; to; be; named.; Taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; U.; S.; The; surplus; helped; swell; Taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; We; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; U.; S.; Products,; if; we; want; to; defuse; problems; from; possible; U.; S.; Retaliation,""; said; Paul; Sheen,; chairman; of; textile; exporters; &; lt; ;; Taiwan; Safe; Group; >.; A; senior; official; of; South; Korea's; trade; promotion; association; said; the; trade; dispute; between; the; U.; S.; And; Japan; might; also; lead; to; pressure; on; South; Korea,; whose; chief; exports; are; similar; to; those; of; Japan.; Last; year; South; Korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; U.; S.,; Up; from; 4.9; billion; dlrs; in; 1985; .; In; Malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; Japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; U.; S.; In; Hong; Kong,; where; newspapers; have; alleged; Japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; But; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; U.; S.; Pressure; to; block; imports.; ""; That; is; a; very; short; -; term; view,""; said; Lawrence; Mills,; director; -; general; of; the; Federation; of; Hong; Kong; Industry.; ""; If; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; Much; more; serious; for; Hong; Kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; The; U.; S.; Last; year; was; Hong; Kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; The; Australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; U.; S.; And; Japan; with; interest; and; concern,; Industry; Minister; John; Button; said; in; Canberra; last; Friday.; ""; This; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; Button; said.; He; said; Australia's; concerns; centred; on; coal; and; beef,; Australia's; two; largest; exports; to; Japan; and; also; significant; U.; S.; Exports; to; that; country.; Meanwhile; U.; S.-; Japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; Japan's; ruling; Liberal; Democratic; Party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; Japanese; economy.; The; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; They; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; Prime; Minister; Yasuhiro; Nakasone's; avowed; fiscal; reform; program.; Deputy; U.; S.; Trade; Representative; Michael; Smith; and; Makoto; Kuroda,; Japan's; deputy; minister; of; International; Trade; and; Industry; (MITI),; are; due; to; meet; in; Washington; this; week; in; an; effort; to; end; the; dispute.","asian; exporters; fear; damage; from; u.; s.-; japan; rift; mounting; trade; friction; between; the; u.; s.; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; they; told; reuter; correspondents; in; asian; capitals; a; u.; s.; move; against; japan; might; boost; protectionist; sentiment; in; the; u.; s.; and; lead; to; curbs; on; american; imports; of; their; products.; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; tokyo's; loss; might; be; their; gain.; the; u.; s.; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; ,; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; we; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; &; lt; ;; mc.; t; >.; ""; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; u.; s.,""; said; tom; murtha,; a; stock; analyst; at; the; tokyo; office; of; broker; &; lt; ;; james; capel; and; co; >.; in; taiwan,; businessmen; and; officials; are; also; worried.; ""; we; are; aware; of; the; seriousness; of; the; u.; s.; threat; against; japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named.; taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; u.; s.; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; we; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u.; s.; products,; if; we; want; to; defuse; problems; from; possible; u.; s.; retaliation,""; said; paul; sheen,; chairman; of; textile; exporters; &; lt; ;; taiwan; safe; group; >.; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u.; s.; and; japan; might; also; lead; to; pressure; on; south; korea,; whose; chief; exports; are; similar; to; those; of; japan.; last; year; south; korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; u.; s.,; up; from; 4.9; billion; dlrs; in; 1985; .; in; malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u.; s.; in; hong; kong,; where; newspapers; have; alleged; japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; but; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; u.; s.; pressure; to; block; imports.; ""; that; is; a; very; short; -; term; view,""; said; lawrence; mills,; director; -; general; of; the; federation; of; hong; kong; industry.; ""; if; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; the; u.; s.; last; year; was; hong; kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u.; s.; and; japan; with; interest; and; concern,; industry; minister; john; button; said; in; canberra; last; friday.; ""; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; button; said.; he; said; australia's; concerns; centred; on; coal; and; beef,; australia's; two; largest; exports; to; japan; and; also; significant; u.; s.; exports; to; that; country.; meanwhile; u.; s.-; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy.; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; they; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program.; deputy; u.; s.; trade; representative; michael; smith; and; makoto; kuroda,; japan's; deputy; minister; of; international; trade; and; industry; (miti),; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute."


### Remove non-ascii characters

In [9]:
@add_method(DefaultTokenizer)
def remove_nonascii_token(self, token):
    return token.encode('ascii',errors='ignore').decode()

@add_method(DefaultTokenizer)
def remove_nonascii(self):
    if self.ascii_only:
        iterate_tokens(self.tokens, self.remove_nonascii_token)
        
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.remove_nonascii, 'tokens')

0,1
Before,After
"asian; exporters; fear; damage; from; u.; s.-; japan; rift; mounting; trade; friction; between; the; u.; s.; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; they; told; reuter; correspondents; in; asian; capitals; a; u.; s.; move; against; japan; might; boost; protectionist; sentiment; in; the; u.; s.; and; lead; to; curbs; on; american; imports; of; their; products.; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; tokyo's; loss; might; be; their; gain.; the; u.; s.; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; ,; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; we; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; &; lt; ;; mc.; t; >.; ""; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; u.; s.,""; said; tom; murtha,; a; stock; analyst; at; the; tokyo; office; of; broker; &; lt; ;; james; capel; and; co; >.; in; taiwan,; businessmen; and; officials; are; also; worried.; ""; we; are; aware; of; the; seriousness; of; the; u.; s.; threat; against; japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named.; taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; u.; s.; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; we; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u.; s.; products,; if; we; want; to; defuse; problems; from; possible; u.; s.; retaliation,""; said; paul; sheen,; chairman; of; textile; exporters; &; lt; ;; taiwan; safe; group; >.; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u.; s.; and; japan; might; also; lead; to; pressure; on; south; korea,; whose; chief; exports; are; similar; to; those; of; japan.; last; year; south; korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; u.; s.,; up; from; 4.9; billion; dlrs; in; 1985; .; in; malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u.; s.; in; hong; kong,; where; newspapers; have; alleged; japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; but; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; u.; s.; pressure; to; block; imports.; ""; that; is; a; very; short; -; term; view,""; said; lawrence; mills,; director; -; general; of; the; federation; of; hong; kong; industry.; ""; if; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; the; u.; s.; last; year; was; hong; kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u.; s.; and; japan; with; interest; and; concern,; industry; minister; john; button; said; in; canberra; last; friday.; ""; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; button; said.; he; said; australia's; concerns; centred; on; coal; and; beef,; australia's; two; largest; exports; to; japan; and; also; significant; u.; s.; exports; to; that; country.; meanwhile; u.; s.-; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy.; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; they; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program.; deputy; u.; s.; trade; representative; michael; smith; and; makoto; kuroda,; japan's; deputy; minister; of; international; trade; and; industry; (miti),; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute.","asian; exporters; fear; damage; from; u.; s.-; japan; rift; mounting; trade; friction; between; the; u.; s.; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; they; told; reuter; correspondents; in; asian; capitals; a; u.; s.; move; against; japan; might; boost; protectionist; sentiment; in; the; u.; s.; and; lead; to; curbs; on; american; imports; of; their; products.; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; tokyo's; loss; might; be; their; gain.; the; u.; s.; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; ,; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; we; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; &; lt; ;; mc.; t; >.; ""; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; u.; s.,""; said; tom; murtha,; a; stock; analyst; at; the; tokyo; office; of; broker; &; lt; ;; james; capel; and; co; >.; in; taiwan,; businessmen; and; officials; are; also; worried.; ""; we; are; aware; of; the; seriousness; of; the; u.; s.; threat; against; japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named.; taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; u.; s.; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; we; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u.; s.; products,; if; we; want; to; defuse; problems; from; possible; u.; s.; retaliation,""; said; paul; sheen,; chairman; of; textile; exporters; &; lt; ;; taiwan; safe; group; >.; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u.; s.; and; japan; might; also; lead; to; pressure; on; south; korea,; whose; chief; exports; are; similar; to; those; of; japan.; last; year; south; korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; u.; s.,; up; from; 4.9; billion; dlrs; in; 1985; .; in; malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u.; s.; in; hong; kong,; where; newspapers; have; alleged; japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; but; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; u.; s.; pressure; to; block; imports.; ""; that; is; a; very; short; -; term; view,""; said; lawrence; mills,; director; -; general; of; the; federation; of; hong; kong; industry.; ""; if; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; the; u.; s.; last; year; was; hong; kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u.; s.; and; japan; with; interest; and; concern,; industry; minister; john; button; said; in; canberra; last; friday.; ""; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; button; said.; he; said; australia's; concerns; centred; on; coal; and; beef,; australia's; two; largest; exports; to; japan; and; also; significant; u.; s.; exports; to; that; country.; meanwhile; u.; s.-; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy.; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; they; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program.; deputy; u.; s.; trade; representative; michael; smith; and; makoto; kuroda,; japan's; deputy; minister; of; international; trade; and; industry; (miti),; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute."


### Remove Non-alphanumeric characters

Removes non alphanumeric characters. Possible options for `alnum_only` are:
- `skip`: Retains all characters
- `weak`: Retains `'`, `@`, `#`, and `_`
- `apostrophe`: Retains `'`
- `strict`: Removed all characters except `a-z`, `A-Z` and `0-9`

In [10]:
_remove_nonalnum_re_selector = [
    None,
    re.compile('[^a-zA-Z0-9\'@#_]'),
    re.compile('[^a-zA-Z0-9\']'),
    re.compile('[^a-zA-Z0-9]')]

@add_method(DefaultTokenizer)
def remove_nonalnum_token(self, token):
    return [s 
            for s in _remove_nonalnum_re_selector[self.alnum_only_idx].split(token) 
            if len(s) > 0]

@add_method(DefaultTokenizer)
def remove_nonalnum(self):
    if self.alnum_only:
        iterate_tokens(self.tokens, self.remove_nonalnum_token)
    
if RUN_SCRIPT:
    run_and_compare(default_tokenizer, default_tokenizer.remove_nonalnum, 'tokens')

0,1
Before,After
"asian; exporters; fear; damage; from; u.; s.-; japan; rift; mounting; trade; friction; between; the; u.; s.; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; -; reaching; economic; damage,; businessmen; and; officials; said.; they; told; reuter; correspondents; in; asian; capitals; a; u.; s.; move; against; japan; might; boost; protectionist; sentiment; in; the; u.; s.; and; lead; to; curbs; on; american; imports; of; their; products.; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; -; run,; in; the; short; -; term; tokyo's; loss; might; be; their; gain.; the; u.; s.; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; ,; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost.; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes.; ""; we; wouldn't; be; able; to; do; business,""; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; &; lt; ;; mc.; t; >.; ""; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; (of; goods; subject; to; tariffs); to; the; u.; s.,""; said; tom; murtha,; a; stock; analyst; at; the; tokyo; office; of; broker; &; lt; ;; james; capel; and; co; >.; in; taiwan,; businessmen; and; officials; are; also; worried.; ""; we; are; aware; of; the; seriousness; of; the; u.; s.; threat; against; japan; because; it; serves; as; a; warning; to; us,""; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named.; taiwan; had; a; trade; trade; surplus; of; 15.6; billion; dlrs; last; year,; 95; pct; of; it; with; the; u.; s.; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs,; among; the; world's; largest.; ""; we; must; quickly; open; our; markets,; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u.; s.; products,; if; we; want; to; defuse; problems; from; possible; u.; s.; retaliation,""; said; paul; sheen,; chairman; of; textile; exporters; &; lt; ;; taiwan; safe; group; >.; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u.; s.; and; japan; might; also; lead; to; pressure; on; south; korea,; whose; chief; exports; are; similar; to; those; of; japan.; last; year; south; korea; had; a; trade; surplus; of; 7.1; billion; dlrs; with; the; u.; s.,; up; from; 4.9; billion; dlrs; in; 1985; .; in; malaysia,; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; -; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u.; s.; in; hong; kong,; where; newspapers; have; alleged; japan; has; been; selling; below; -; cost; semiconductors,; some; electronics; manufacturers; share; that; view.; but; other; businessmen; said; such; a; short; -; term; commercial; advantage; would; be; outweighed; by; further; u.; s.; pressure; to; block; imports.; ""; that; is; a; very; short; -; term; view,""; said; lawrence; mills,; director; -; general; of; the; federation; of; hong; kong; industry.; ""; if; the; whole; purpose; is; to; prevent; imports,; one; day; it; will; be; extended; to; other; sources.; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade,""; he; said.; the; u.; s.; last; year; was; hong; kong's; biggest; export; market,; accounting; for; over; 30; pct; of; domestically; produced; exports.; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u.; s.; and; japan; with; interest; and; concern,; industry; minister; john; button; said; in; canberra; last; friday.; ""; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter,""; button; said.; he; said; australia's; concerns; centred; on; coal; and; beef,; australia's; two; largest; exports; to; japan; and; also; significant; u.; s.; exports; to; that; country.; meanwhile; u.; s.-; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; -; off; continue.; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy.; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year.; they; also; call; for; stepped; -; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program.; deputy; u.; s.; trade; representative; michael; smith; and; makoto; kuroda,; japan's; deputy; minister; of; international; trade; and; industry; (miti),; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute.",asian; exporters; fear; damage; from; u; s; japan; rift; mounting; trade; friction; between; the; u; s; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; reaching; economic; damage; businessmen; and; officials; said; they; told; reuter; correspondents; in; asian; capitals; a; u; s; move; against; japan; might; boost; protectionist; sentiment; in; the; u; s; and; lead; to; curbs; on; american; imports; of; their; products; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; run; in; the; short; term; tokyo's; loss; might; be; their; gain; the; u; s; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes; we; wouldn't; be; able; to; do; business; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; lt; mc; t; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; of; goods; subject; to; tariffs; to; the; u; s; said; tom; murtha; a; stock; analyst; at; the; tokyo; office; of; broker; lt; james; capel; and; co; in; taiwan; businessmen; and; officials; are; also; worried; we; are; aware; of; the; seriousness; of; the; u; s; threat; against; japan; because; it; serves; as; a; warning; to; us; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named; taiwan; had; a; trade; trade; surplus; of; 15; 6; billion; dlrs; last; year; 95; pct; of; it; with; the; u; s; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs; among; the; world's; largest; we; must; quickly; open; our; markets; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u; s; products; if; we; want; to; defuse; problems; from; possible; u; s; retaliation; said; paul; sheen; chairman; of; textile; exporters; lt; taiwan; safe; group; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u; s; and; japan; might; also; lead; to; pressure; on; south; korea; whose; chief; exports; are; similar; to; those; of; japan; last; year; south; korea; had; a; trade; surplus; of; 7; 1; billion; dlrs; with; the; u; s; up; from; 4; 9; billion; dlrs; in; 1985; in; malaysia; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u; s; in; hong; kong; where; newspapers; have; alleged; japan; has; been; selling; below; cost; semiconductors; some; electronics; manufacturers; share; that; view; but; other; businessmen; said; such; a; short; term; commercial; advantage; would; be; outweighed; by; further; u; s; pressure; to; block; imports; that; is; a; very; short; term; view; said; lawrence; mills; director; general; of; the; federation; of; hong; kong; industry; if; the; whole; purpose; is; to; prevent; imports; one; day; it; will; be; extended; to; other; sources; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade; he; said; the; u; s; last; year; was; hong; kong's; biggest; export; market; accounting; for; over; 30; pct; of; domestically; produced; exports; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u; s; and; japan; with; interest; and; concern; industry; minister; john; button; said; in; canberra; last; friday; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter; button; said; he; said; australia's; concerns; centred; on; coal; and; beef; australia's; two; largest; exports; to; japan; and; also; significant; u; s; exports; to; that; country; meanwhile; u; s; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; off; continue; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year; they; also; call; for; stepped; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program; deputy; u; s; trade; representative; michael; smith; and; makoto; kuroda; japan's; deputy; minister; of; international; trade; and; industry; miti; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute


### Replace Numbers

Replace numbers by number tokens. Supports the following options for `numbers`:
- `skip`: 
Either replace each single digit by a token or replace the whole number (possibly including `.` and `,`) by a single token.

In [11]:
_re_numbers_decimal  = re.compile("[0-9]")
_re_numbers_complete = re.compile("([0-9][0-9\.,]*)|([0-9\.,]*[0-9])")
_replace_numbers_re_selector = [None,
    _re_numbers_decimal,
    _re_numbers_complete,
    _re_numbers_complete]
_replace_numbers_sub_selector = [None,
    tokenizer.common.number_token,
    tokenizer.common.number_token,
    '']

@add_method(DefaultTokenizer)
def replace_numbers_token(self, token):
    token, count = self._replace_numbers_re.subn(self._replace_numbers_sub, token)
    if count > 0:
        self.fixed_tokens.append(token)
        return None
    return token

@add_method(DefaultTokenizer)
def replace_numbers(self):
    if self.numbers_idx > 0:
        self._replace_numbers_re= _replace_numbers_re_selector[self.numbers_idx]
        self._replace_numbers_sub = _replace_numbers_sub_selector[self.numbers_idx]
        iterate_tokens(self.tokens, self.replace_numbers_token)
        
if RUN_SCRIPT: 
    run_and_compare(default_tokenizer, default_tokenizer.replace_numbers, 'tokens', 'fixed_tokens')

0,1
Before,After
asian; exporters; fear; damage; from; u; s; japan; rift; mounting; trade; friction; between; the; u; s; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; reaching; economic; damage; businessmen; and; officials; said; they; told; reuter; correspondents; in; asian; capitals; a; u; s; move; against; japan; might; boost; protectionist; sentiment; in; the; u; s; and; lead; to; curbs; on; american; imports; of; their; products; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; run; in; the; short; term; tokyo's; loss; might; be; their; gain; the; u; s; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes; we; wouldn't; be; able; to; do; business; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; lt; mc; t; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; of; goods; subject; to; tariffs; to; the; u; s; said; tom; murtha; a; stock; analyst; at; the; tokyo; office; of; broker; lt; james; capel; and; co; in; taiwan; businessmen; and; officials; are; also; worried; we; are; aware; of; the; seriousness; of; the; u; s; threat; against; japan; because; it; serves; as; a; warning; to; us; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named; taiwan; had; a; trade; trade; surplus; of; 15; 6; billion; dlrs; last; year; 95; pct; of; it; with; the; u; s; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs; among; the; world's; largest; we; must; quickly; open; our; markets; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u; s; products; if; we; want; to; defuse; problems; from; possible; u; s; retaliation; said; paul; sheen; chairman; of; textile; exporters; lt; taiwan; safe; group; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u; s; and; japan; might; also; lead; to; pressure; on; south; korea; whose; chief; exports; are; similar; to; those; of; japan; last; year; south; korea; had; a; trade; surplus; of; 7; 1; billion; dlrs; with; the; u; s; up; from; 4; 9; billion; dlrs; in; 1985; in; malaysia; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u; s; in; hong; kong; where; newspapers; have; alleged; japan; has; been; selling; below; cost; semiconductors; some; electronics; manufacturers; share; that; view; but; other; businessmen; said; such; a; short; term; commercial; advantage; would; be; outweighed; by; further; u; s; pressure; to; block; imports; that; is; a; very; short; term; view; said; lawrence; mills; director; general; of; the; federation; of; hong; kong; industry; if; the; whole; purpose; is; to; prevent; imports; one; day; it; will; be; extended; to; other; sources; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade; he; said; the; u; s; last; year; was; hong; kong's; biggest; export; market; accounting; for; over; 30; pct; of; domestically; produced; exports; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u; s; and; japan; with; interest; and; concern; industry; minister; john; button; said; in; canberra; last; friday; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter; button; said; he; said; australia's; concerns; centred; on; coal; and; beef; australia's; two; largest; exports; to; japan; and; also; significant; u; s; exports; to; that; country; meanwhile; u; s; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; off; continue; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year; they; also; call; for; stepped; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program; deputy; u; s; trade; representative; michael; smith; and; makoto; kuroda; japan's; deputy; minister; of; international; trade; and; industry; miti; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute,


---
## Complete function
---

In [12]:
@add_method(DefaultTokenizer)
def tokenize(self, text, *args):
    self.init_tokenization(text)
    self.process_urls()
    self.process_ascii_emotes()
    self.process_unicode_emoticons()
    self.split_numbers()
    self.process_lowercase()
    self.remove_nonascii()
    self.remove_nonalnum()
    self.replace_numbers()
    self.tokens = self.tokens + self.fixed_tokens
    return self.tokens

## Test tokenizer

In [13]:
if RUN_SCRIPT:
    default_tokenizer = DefaultTokenizer(info)
    default_tokenizer.tokenize(runvars['document']['text'])
    show_comparison(default_tokenizer.text, default_tokenizer.tokens, 'Text', 'Tokens')

0,1
Text,Tokens
"ASIAN EXPORTERS FEAR DAMAGE FROM U. S.- JAPAN RIFT Mounting trade friction between the U. S. And Japan has raised fears among many of Asia's exporting nations that the row could inflict far - reaching economic damage, businessmen and officials said. They told Reuter correspondents in Asian capitals a U. S. Move against Japan might boost protectionist sentiment in the U. S. And lead to curbs on American imports of their products. But some exporters said that while the conflict would hurt them in the long - run, in the short - term Tokyo's loss might be their gain. The U. S. Has said it will impose 300 mln dlrs of tariffs on imports of Japanese electronics goods on April 17, in retaliation for Japan's alleged failure to stick to a pact not to sell semiconductors on world markets at below cost. Unofficial Japanese estimates put the impact of the tariffs at 10 billion dlrs and spokesmen for major electronics firms said they would virtually halt exports of products hit by the new taxes. "" We wouldn't be able to do business,"" said a spokesman for leading Japanese electronics firm Matsushita Electric Industrial Co Ltd & lt , MC. T >. "" If the tariffs remain in place for any length of time beyond a few months it will mean the complete erosion of exports (of goods subject to tariffs) to the U. S.,"" said Tom Murtha, a stock analyst at the Tokyo office of broker & lt , James Capel and Co >. In Taiwan, businessmen and officials are also worried. "" We are aware of the seriousness of the U. S. Threat against Japan because it serves as a warning to us,"" said a senior Taiwanese trade official who asked not to be named. Taiwan had a trade trade surplus of 15.6 billion dlrs last year, 95 pct of it with the U. S. The surplus helped swell Taiwan's foreign exchange reserves to 53 billion dlrs, among the world's largest. "" We must quickly open our markets, remove trade barriers and cut import tariffs to allow imports of U. S. Products, if we want to defuse problems from possible U. S. Retaliation,"" said Paul Sheen, chairman of textile exporters & lt , Taiwan Safe Group >. A senior official of South Korea's trade promotion association said the trade dispute between the U. S. And Japan might also lead to pressure on South Korea, whose chief exports are similar to those of Japan. Last year South Korea had a trade surplus of 7.1 billion dlrs with the U. S., Up from 4.9 billion dlrs in 1985. In Malaysia, trade officers and businessmen said tough curbs against Japan might allow hard - hit producers of semiconductors in third countries to expand their sales to the U. S. In Hong Kong, where newspapers have alleged Japan has been selling below - cost semiconductors, some electronics manufacturers share that view. But other businessmen said such a short - term commercial advantage would be outweighed by further U. S. Pressure to block imports. "" That is a very short - term view,"" said Lawrence Mills, director - general of the Federation of Hong Kong Industry. "" If the whole purpose is to prevent imports, one day it will be extended to other sources. Much more serious for Hong Kong is the disadvantage of action restraining trade,"" he said. The U. S. Last year was Hong Kong's biggest export market, accounting for over 30 pct of domestically produced exports. The Australian government is awaiting the outcome of trade talks between the U. S. And Japan with interest and concern, Industry Minister John Button said in Canberra last Friday. "" This kind of deterioration in trade relations between two countries which are major trading partners of ours is a very serious matter,"" Button said. He said Australia's concerns centred on coal and beef, Australia's two largest exports to Japan and also significant U. S. Exports to that country. Meanwhile U. S.- Japanese diplomatic manoeuvres to solve the trade stand - off continue. Japan's ruling Liberal Democratic Party yesterday outlined a package of economic measures to boost the Japanese economy. The measures proposed include a large supplementary budget and record public works spending in the first half of the financial year. They also call for stepped - up spending as an emergency measure to stimulate the economy despite Prime Minister Yasuhiro Nakasone's avowed fiscal reform program. Deputy U. S. Trade Representative Michael Smith and Makoto Kuroda, Japan's deputy minister of International Trade and Industry (MITI), are due to meet in Washington this week in an effort to end the dispute.",asian; exporters; fear; damage; from; u; s; japan; rift; mounting; trade; friction; between; the; u; s; and; japan; has; raised; fears; among; many; of; asia's; exporting; nations; that; the; row; could; inflict; far; reaching; economic; damage; businessmen; and; officials; said; they; told; reuter; correspondents; in; asian; capitals; a; u; s; move; against; japan; might; boost; protectionist; sentiment; in; the; u; s; and; lead; to; curbs; on; american; imports; of; their; products; but; some; exporters; said; that; while; the; conflict; would; hurt; them; in; the; long; run; in; the; short; term; tokyo's; loss; might; be; their; gain; the; u; s; has; said; it; will; impose; 300; mln; dlrs; of; tariffs; on; imports; of; japanese; electronics; goods; on; april; 17; in; retaliation; for; japan's; alleged; failure; to; stick; to; a; pact; not; to; sell; semiconductors; on; world; markets; at; below; cost; unofficial; japanese; estimates; put; the; impact; of; the; tariffs; at; 10; billion; dlrs; and; spokesmen; for; major; electronics; firms; said; they; would; virtually; halt; exports; of; products; hit; by; the; new; taxes; we; wouldn't; be; able; to; do; business; said; a; spokesman; for; leading; japanese; electronics; firm; matsushita; electric; industrial; co; ltd; lt; mc; t; if; the; tariffs; remain; in; place; for; any; length; of; time; beyond; a; few; months; it; will; mean; the; complete; erosion; of; exports; of; goods; subject; to; tariffs; to; the; u; s; said; tom; murtha; a; stock; analyst; at; the; tokyo; office; of; broker; lt; james; capel; and; co; in; taiwan; businessmen; and; officials; are; also; worried; we; are; aware; of; the; seriousness; of; the; u; s; threat; against; japan; because; it; serves; as; a; warning; to; us; said; a; senior; taiwanese; trade; official; who; asked; not; to; be; named; taiwan; had; a; trade; trade; surplus; of; 15; 6; billion; dlrs; last; year; 95; pct; of; it; with; the; u; s; the; surplus; helped; swell; taiwan's; foreign; exchange; reserves; to; 53; billion; dlrs; among; the; world's; largest; we; must; quickly; open; our; markets; remove; trade; barriers; and; cut; import; tariffs; to; allow; imports; of; u; s; products; if; we; want; to; defuse; problems; from; possible; u; s; retaliation; said; paul; sheen; chairman; of; textile; exporters; lt; taiwan; safe; group; a; senior; official; of; south; korea's; trade; promotion; association; said; the; trade; dispute; between; the; u; s; and; japan; might; also; lead; to; pressure; on; south; korea; whose; chief; exports; are; similar; to; those; of; japan; last; year; south; korea; had; a; trade; surplus; of; 7; 1; billion; dlrs; with; the; u; s; up; from; 4; 9; billion; dlrs; in; 1985; in; malaysia; trade; officers; and; businessmen; said; tough; curbs; against; japan; might; allow; hard; hit; producers; of; semiconductors; in; third; countries; to; expand; their; sales; to; the; u; s; in; hong; kong; where; newspapers; have; alleged; japan; has; been; selling; below; cost; semiconductors; some; electronics; manufacturers; share; that; view; but; other; businessmen; said; such; a; short; term; commercial; advantage; would; be; outweighed; by; further; u; s; pressure; to; block; imports; that; is; a; very; short; term; view; said; lawrence; mills; director; general; of; the; federation; of; hong; kong; industry; if; the; whole; purpose; is; to; prevent; imports; one; day; it; will; be; extended; to; other; sources; much; more; serious; for; hong; kong; is; the; disadvantage; of; action; restraining; trade; he; said; the; u; s; last; year; was; hong; kong's; biggest; export; market; accounting; for; over; 30; pct; of; domestically; produced; exports; the; australian; government; is; awaiting; the; outcome; of; trade; talks; between; the; u; s; and; japan; with; interest; and; concern; industry; minister; john; button; said; in; canberra; last; friday; this; kind; of; deterioration; in; trade; relations; between; two; countries; which; are; major; trading; partners; of; ours; is; a; very; serious; matter; button; said; he; said; australia's; concerns; centred; on; coal; and; beef; australia's; two; largest; exports; to; japan; and; also; significant; u; s; exports; to; that; country; meanwhile; u; s; japanese; diplomatic; manoeuvres; to; solve; the; trade; stand; off; continue; japan's; ruling; liberal; democratic; party; yesterday; outlined; a; package; of; economic; measures; to; boost; the; japanese; economy; the; measures; proposed; include; a; large; supplementary; budget; and; record; public; works; spending; in; the; first; half; of; the; financial; year; they; also; call; for; stepped; up; spending; as; an; emergency; measure; to; stimulate; the; economy; despite; prime; minister; yasuhiro; nakasone's; avowed; fiscal; reform; program; deputy; u; s; trade; representative; michael; smith; and; makoto; kuroda; japan's; deputy; minister; of; international; trade; and; industry; miti; are; due; to; meet; in; washington; this; week; in; an; effort; to; end; the; dispute
