# Excel Extractor

ETK's Excel Extractor is a cell-based extractor for extracting data from compatible spreadsheets.

## Source spreadsheet

The example spreadsheet file is named `alabama.xls` and has a sheet named `16tbl08al`. In this sheet, rows 1 to 5 and rows 60 to 62 are metadata, and A6 to M59 is a table with row and column headers. For this example, I'm going to extract data from C7 to M33 (see the screenshot below).

![screenshot.png](screenshot.png)

## Define where and how to extract data

Excel Extractor scans cell by cell within the region you specify and fills the variables you define. In this example, I want to extract the value of every cell in the region (C7, M33), so I define a variable called `value`. Its value is extracted from the cell located at `$col,$row`, where `$col` and `$row` are the column id and row id of the cell the scanner is currently at. The return value is a list with one dict per scanned cell, each holding the single variable `value`.
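To make the scanning model concrete, here is a minimal pure-Python sketch of how a region such as `['C,7', 'M,33']` could be turned into a sequence of cell positions. It is not ETK's implementation: the helper names `col_to_index`, `parse_corner` and `scan_region` are hypothetical, and the sketch assumes zero-based indices and an exclusive end corner. The exclusive end column is a guess based on the output in this document, where each row yields ten values (columns C to L); the exclusive end row is assumed by analogy.

```python
from typing import Iterator, List, Tuple


def col_to_index(letters: str) -> int:
    """Convert a column label ('A', 'M', 'AA') to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


def parse_corner(corner: str) -> Tuple[int, int]:
    """Turn a corner such as 'C,7' into zero-based (col, row) indices."""
    col_label, row_label = corner.split(',')
    return col_to_index(col_label.strip()), int(row_label) - 1


def scan_region(region: List[str]) -> Iterator[Tuple[int, int]]:
    """Yield (col, row) for each cell, row by row.

    Assumption: the second corner is exclusive in both directions.
    """
    (col_start, row_start), (col_end, row_end) = map(parse_corner, region)
    for row in range(row_start, row_end):
        for col in range(col_start, col_end):
            yield col, row


cells = list(scan_region(['C,7', 'M,33']))
print(cells[0])    # (2, 6), which is cell C7 in zero-based form
print(len(cells))  # 260 = 10 columns x 26 rows under these assumptions
```

If the real extractor treats the end corner as inclusive, only the two `range` bounds in `scan_region` would change.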

In [1]:
import pprint
from etk.extractors.excel_extractor import ExcelExtractor
ee = ExcelExtractor()
variables = {
    'value': '$col,$row'
}
raw_extractions = ee.extract('alabama.xls', '16tbl08al', ['C,7', 'M,33'], variables)
pprint.pprint(raw_extractions)

[{'value': 73},
 {'value': 1},
 {'value': 12},
 {'value': ''},
 {'value': 8},
 {'value': 52},
 {'value': 429},
 {'value': 146},
 {'value': 233},
 {'value': 50},
 {'value': 127},
 {'value': 1},
 {'value': 5},
 {'value': ''},
 {'value': 23},
 {'value': 98},
 {'value': 613},
 {'value': 229},
 {'value': 342},
 {'value': 42},
 {'value': 0},
 {'value': 0},
 {'value': 0},
 {'value': ''},
 {'value': 0},
 {'value': 0},
 {'value': 37},
 {'value': 20},
 {'value': 14},
 {'value': 3},
 {'value': 394},
 {'value': 1},
 {'value': 17},
 {'value': ''},
 {'value': 9},
 {'value': 367},
 {'value': 867},
 {'value': 261},
 {'value': 501},
 {'value': 105},
 {'value': 23},
 {'value': 0},
 {'value': 7},
 {'value': ''},
 {'value': 5},
 {'value': 11},
 {'value': 319},
 {'value': 137},
 {'value': 181},
 {'value': 1},
 {'value': 151},
 {'value': 0},
 {'value': 10},
 {'value': ''},
 {'value': 3},
 {'value': 138},
 {'value': 592},
 {'value': 247},
 {'value': 295},
 {'value': 50},
 {'value': 88},
 {'value': 4},
 {'val

Excel Extractor allows you to define multiple variables. This is useful if you want to extract data from other cells that are associated with the current cell. In this example, I also need the column header (category) and the county name (stored under the key `country`) for every cell in the region. A coordinate can be fixed, like `($B,$1)` (the cell at column B, row 1), or relative, using `+` and `-` to calculate it, like `($B-1,$row+1)` (the cell at column A in the row after the current one).
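The coordinate rules above can be illustrated with a small stand-alone resolver. This is only a sketch of the syntax as described here, not ETK's parser: `resolve` and `resolve_term` are hypothetical names, the regular expression is my own reading of the examples, and indices are assumed to be zero-based.

```python
import re
from typing import Tuple

# One half of a coordinate: '$' + (col | row | column letters | row number),
# optionally followed by a signed integer offset such as '+1' or '-1'.
_TERM = re.compile(r'^\$(col|row|[A-Z]+|\d+)([+-]\d+)?$')


def _letters_to_index(letters: str) -> int:
    """Convert a column label ('A', 'B', 'AA') to a zero-based index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


def resolve_term(term: str, current_col: int, current_row: int) -> int:
    """Resolve one half of a coordinate to a zero-based index."""
    match = _TERM.match(term.strip())
    if match is None:
        raise ValueError(f'unsupported coordinate term: {term!r}')
    base, offset = match.group(1), int(match.group(2) or 0)
    if base == 'col':
        value = current_col              # column the scanner is at
    elif base == 'row':
        value = current_row              # row the scanner is at
    elif base.isdigit():
        value = int(base) - 1            # '$6' is spreadsheet row 6
    else:
        value = _letters_to_index(base)  # '$B' is column B
    return value + offset


def resolve(coordinate: str, current_col: int, current_row: int) -> Tuple[int, int]:
    """Resolve 'col_term,row_term' to zero-based (col, row)."""
    col_term, row_term = coordinate.split(',')
    return (resolve_term(col_term, current_col, current_row),
            resolve_term(row_term, current_col, current_row))


# Scanner at C7, i.e. zero-based column 2 and row 6.
print(resolve('$col,$row', 2, 6))    # (2, 6): the current cell
print(resolve('$B,$row', 2, 6))      # (1, 6): column B, same row
print(resolve('$col,$6', 2, 6))      # (2, 5): same column, header row 6
print(resolve('$B-1,$row+1', 2, 6))  # (0, 7): column A, next row
```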

In [2]:
variables = {
    'value': '$col,$row',
    'country': '$B,$row',
    'category': '$col,$6'
}
raw_extractions = ee.extract('alabama.xls', '16tbl08al', ['C,7', 'M,33'], variables)
pprint.pprint(raw_extractions)

[{'category': 'Violent\ncrime', 'country': 'Autauga', 'value': 73},
 {'category': 'Violent\ncrime', 'country': 'Autauga', 'value': 73},
 {'category': 'Violent\ncrime', 'country': 'Autauga', 'value': 73},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  'value': 1},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  'value': 1},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  'value': 1},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Autauga',
  'value': 12},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Autauga',
  'value': 12},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Autauga',
  'value': 12},
 {'category': 'Rape\n(legacy\ndefinition)2', 'country': 'Autauga', 'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2', 'country': 'Autauga', 'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2', 'country': 'Autauga', 'value': ''},
 {'categor

  'country': 'Montgomery',
  'value': 4},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Montgomery',
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Montgomery',
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Montgomery',
  'value': ''},
 {'category': 'Robbery', 'country': 'Montgomery', 'value': 8},
 {'category': 'Robbery', 'country': 'Montgomery', 'value': 8},
 {'category': 'Robbery', 'country': 'Montgomery', 'value': 8},
 {'category': 'Aggravated\nassault', 'country': 'Montgomery', 'value': 104},
 {'category': 'Aggravated\nassault', 'country': 'Montgomery', 'value': 104},
 {'category': 'Aggravated\nassault', 'country': 'Montgomery', 'value': 104},
 {'category': 'Property\ncrime', 'country': 'Montgomery', 'value': 360},
 {'category': 'Property\ncrime', 'country': 'Montgomery', 'value': 360},
 {'category': 'Property\ncrime', 'country': 'Montgomery', 'value': 360},
 {'category': 'Burglary', 'country': 'Montgomery', 'val

Besides a coordinate, the value of a variable can also be a built-in variable (only `$row` and `$col` are supported right now). This is useful for recording the provenance of extractions. Both the row id and column id are returned in numeric form; in the output below they are zero-based, so cell C7 is reported as `from_col` 2 and `from_row` 6.
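Because the provenance is numeric, it can be handy to map it back to the familiar spreadsheet notation when inspecting results. The sketch below shows one way to do that; `index_to_col` and `to_a1` are hypothetical helpers, and they assume the ids are zero-based, as the extractor output in this document suggests.

```python
def index_to_col(index: int) -> str:
    """Convert a zero-based column index to a label (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError('column index must be non-negative')
    label = ''
    n = index + 1  # switch to one-based for the base-26 conversion
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        label = chr(ord('A') + remainder) + label
    return label


def to_a1(from_col: int, from_row: int) -> str:
    """Build an A1-style reference from zero-based column and row ids."""
    return f'{index_to_col(from_col)}{from_row + 1}'


print(to_a1(2, 6))    # C7
print(to_a1(11, 31))  # L32
```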

In [3]:
variables = {
    'value': '$col,$row',
    'country': '$B,$row',
    'category': '$col,$6',
    'from_row': '$row',
    'from_col': '$col'
}
raw_extractions = ee.extract('alabama.xls', '16tbl08al', ['C,7', 'M,33'], variables)
pprint.pprint(raw_extractions)

[{'category': 'Violent\ncrime',
  'country': 'Autauga',
  'from_col': 2,
  'from_row': 6,
  'value': 73},
 {'category': 'Violent\ncrime',
  'country': 'Autauga',
  'from_col': 2,
  'from_row': 6,
  'value': 73},
 {'category': 'Violent\ncrime',
  'country': 'Autauga',
  'from_col': 2,
  'from_row': 6,
  'value': 73},
 {'category': 'Violent\ncrime',
  'country': 'Autauga',
  'from_col': 2,
  'from_row': 6,
  'value': 73},
 {'category': 'Violent\ncrime',
  'country': 'Autauga',
  'from_col': 2,
  'from_row': 6,
  'value': 73},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  'from_col': 3,
  'from_row': 6,
  'value': 1},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  'from_col': 3,
  'from_row': 6,
  'value': 1},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  'from_col': 3,
  'from_row': 6,
  'value': 1},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Autauga',
  '

  'value': 0},
 {'category': 'Aggravated\nassault',
  'country': 'Bibb',
  'from_col': 7,
  'from_row': 8,
  'value': 0},
 {'category': 'Property\ncrime',
  'country': 'Bibb',
  'from_col': 8,
  'from_row': 8,
  'value': 37},
 {'category': 'Property\ncrime',
  'country': 'Bibb',
  'from_col': 8,
  'from_row': 8,
  'value': 37},
 {'category': 'Property\ncrime',
  'country': 'Bibb',
  'from_col': 8,
  'from_row': 8,
  'value': 37},
 {'category': 'Property\ncrime',
  'country': 'Bibb',
  'from_col': 8,
  'from_row': 8,
  'value': 37},
 {'category': 'Property\ncrime',
  'country': 'Bibb',
  'from_col': 8,
  'from_row': 8,
  'value': 37},
 {'category': 'Burglary',
  'country': 'Bibb',
  'from_col': 9,
  'from_row': 8,
  'value': 20},
 {'category': 'Burglary',
  'country': 'Bibb',
  'from_col': 9,
  'from_row': 8,
  'value': 20},
 {'category': 'Burglary',
  'country': 'Bibb',
  'from_col': 9,
  'from_row': 8,
  'value': 20},
 {'category': 'Burglary',
  'country': 'Bibb',
  'from_col': 9,
  '

  'value': 0},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Calhoun',
  'from_col': 4,
  'from_row': 10,
  'value': 7},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Calhoun',
  'from_col': 4,
  'from_row': 10,
  'value': 7},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Calhoun',
  'from_col': 4,
  'from_row': 10,
  'value': 7},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Calhoun',
  'from_col': 4,
  'from_row': 10,
  'value': 7},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Calhoun',
  'from_col': 4,
  'from_row': 10,
  'value': 7},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Calhoun',
  'from_col': 5,
  'from_row': 10,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Calhoun',
  'from_col': 5,
  'from_row': 10,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Calhoun',
  'from_col': 5,
  'from_row': 10,
  'value': ''},
 {'category': 'Rape\n(leg

  'from_col': 8,
  'from_row': 11,
  'value': 592},
 {'category': 'Property\ncrime',
  'country': 'Chilton',
  'from_col': 8,
  'from_row': 11,
  'value': 592},
 {'category': 'Burglary',
  'country': 'Chilton',
  'from_col': 9,
  'from_row': 11,
  'value': 247},
 {'category': 'Burglary',
  'country': 'Chilton',
  'from_col': 9,
  'from_row': 11,
  'value': 247},
 {'category': 'Burglary',
  'country': 'Chilton',
  'from_col': 9,
  'from_row': 11,
  'value': 247},
 {'category': 'Burglary',
  'country': 'Chilton',
  'from_col': 9,
  'from_row': 11,
  'value': 247},
 {'category': 'Burglary',
  'country': 'Chilton',
  'from_col': 9,
  'from_row': 11,
  'value': 247},
 {'category': 'Larceny-\ntheft',
  'country': 'Chilton',
  'from_col': 10,
  'from_row': 11,
  'value': 295},
 {'category': 'Larceny-\ntheft',
  'country': 'Chilton',
  'from_col': 10,
  'from_row': 11,
  'value': 295},
 {'category': 'Larceny-\ntheft',
  'country': 'Chilton',
  'from_col': 10,
  'from_row': 11,
  'value': 295},

  'country': 'Etowah',
  'from_col': 6,
  'from_row': 13,
  'value': 7},
 {'category': 'Aggravated\nassault',
  'country': 'Etowah',
  'from_col': 7,
  'from_row': 13,
  'value': 107},
 {'category': 'Aggravated\nassault',
  'country': 'Etowah',
  'from_col': 7,
  'from_row': 13,
  'value': 107},
 {'category': 'Aggravated\nassault',
  'country': 'Etowah',
  'from_col': 7,
  'from_row': 13,
  'value': 107},
 {'category': 'Aggravated\nassault',
  'country': 'Etowah',
  'from_col': 7,
  'from_row': 13,
  'value': 107},
 {'category': 'Aggravated\nassault',
  'country': 'Etowah',
  'from_col': 7,
  'from_row': 13,
  'value': 107},
 {'category': 'Property\ncrime',
  'country': 'Etowah',
  'from_col': 8,
  'from_row': 13,
  'value': 480},
 {'category': 'Property\ncrime',
  'country': 'Etowah',
  'from_col': 8,
  'from_row': 13,
  'value': 480},
 {'category': 'Property\ncrime',
  'country': 'Etowah',
  'from_col': 8,
  'from_row': 13,
  'value': 480},
 {'category': 'Property\ncrime',
  'country

 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Hale',
  'from_col': 5,
  'from_row': 15,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Hale',
  'from_col': 5,
  'from_row': 15,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Hale',
  'from_col': 5,
  'from_row': 15,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Hale',
  'from_col': 5,
  'from_row': 15,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Hale',
  'from_col': 5,
  'from_row': 15,
  'value': ''},
 {'category': 'Robbery',
  'country': 'Hale',
  'from_col': 6,
  'from_row': 15,
  'value': 2},
 {'category': 'Robbery',
  'country': 'Hale',
  'from_col': 6,
  'from_row': 15,
  'value': 2},
 {'category': 'Robbery',
  'country': 'Hale',
  'from_col': 6,
  'from_row': 15,
  'value': 2},
 {'category': 'Robbery',
  'country': 'Hale',
  'from_col': 6,
  'from_row': 15,
  'value': 2},
 {'category': 'Robbery',
  'cou

 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Houston',
  'from_col': 3,
  'from_row': 17,
  'value': 4},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Houston',
  'from_col': 3,
  'from_row': 17,
  'value': 4},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Houston',
  'from_col': 3,
  'from_row': 17,
  'value': 4},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Houston',
  'from_col': 3,
  'from_row': 17,
  'value': 4},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Houston',
  'from_col': 4,
  'from_row': 17,
  'value': 18},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Houston',
  'from_col': 4,
  'from_row': 17,
  'value': 18},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Houston',
  'from_col': 4,
  'from_row': 17,
  'value': 18},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Houston',
  'from_col': 4,
  'from_row': 17,
  'value': 1

  'value': 151},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Jefferson',
  'from_col': 11,
  'from_row': 18,
  'value': 151},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Jefferson',
  'from_col': 11,
  'from_row': 18,
  'value': 151},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Jefferson',
  'from_col': 11,
  'from_row': 18,
  'value': 151},
 {'category': 'Violent\ncrime',
  'country': 'Lauderdale',
  'from_col': 2,
  'from_row': 19,
  'value': 30},
 {'category': 'Violent\ncrime',
  'country': 'Lauderdale',
  'from_col': 2,
  'from_row': 19,
  'value': 30},
 {'category': 'Violent\ncrime',
  'country': 'Lauderdale',
  'from_col': 2,
  'from_row': 19,
  'value': 30},
 {'category': 'Violent\ncrime',
  'country': 'Lauderdale',
  'from_col': 2,
  'from_row': 19,
  'value': 30},
 {'category': 'Violent\ncrime',
  'country': 'Lauderdale',
  'from_col': 2,
  'from_row': 19,
  'value': 30},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Lauderdal

  'from_row': 20,
  'value': 105},
 {'category': 'Burglary',
  'country': 'Lawrence',
  'from_col': 9,
  'from_row': 20,
  'value': 105},
 {'category': 'Burglary',
  'country': 'Lawrence',
  'from_col': 9,
  'from_row': 20,
  'value': 105},
 {'category': 'Larceny-\ntheft',
  'country': 'Lawrence',
  'from_col': 10,
  'from_row': 20,
  'value': 189},
 {'category': 'Larceny-\ntheft',
  'country': 'Lawrence',
  'from_col': 10,
  'from_row': 20,
  'value': 189},
 {'category': 'Larceny-\ntheft',
  'country': 'Lawrence',
  'from_col': 10,
  'from_row': 20,
  'value': 189},
 {'category': 'Larceny-\ntheft',
  'country': 'Lawrence',
  'from_col': 10,
  'from_row': 20,
  'value': 189},
 {'category': 'Larceny-\ntheft',
  'country': 'Lawrence',
  'from_col': 10,
  'from_row': 20,
  'value': 189},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Lawrence',
  'from_col': 11,
  'from_row': 20,
  'value': 32},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Lawrence',
  'from_col': 11,
  'fro

  'from_row': 22,
  'value': 82},
 {'category': 'Aggravated\nassault',
  'country': 'Limestone',
  'from_col': 7,
  'from_row': 22,
  'value': 82},
 {'category': 'Property\ncrime',
  'country': 'Limestone',
  'from_col': 8,
  'from_row': 22,
  'value': 829},
 {'category': 'Property\ncrime',
  'country': 'Limestone',
  'from_col': 8,
  'from_row': 22,
  'value': 829},
 {'category': 'Property\ncrime',
  'country': 'Limestone',
  'from_col': 8,
  'from_row': 22,
  'value': 829},
 {'category': 'Property\ncrime',
  'country': 'Limestone',
  'from_col': 8,
  'from_row': 22,
  'value': 829},
 {'category': 'Property\ncrime',
  'country': 'Limestone',
  'from_col': 8,
  'from_row': 22,
  'value': 829},
 {'category': 'Burglary',
  'country': 'Limestone',
  'from_col': 9,
  'from_row': 22,
  'value': 202},
 {'category': 'Burglary',
  'country': 'Limestone',
  'from_col': 9,
  'from_row': 22,
  'value': 202},
 {'category': 'Burglary',
  'country': 'Limestone',
  'from_col': 9,
  'from_row': 22,
  

  'from_col': 5,
  'from_row': 24,
  'value': ''},
 {'category': 'Robbery',
  'country': 'Madison',
  'from_col': 6,
  'from_row': 24,
  'value': 39},
 {'category': 'Robbery',
  'country': 'Madison',
  'from_col': 6,
  'from_row': 24,
  'value': 39},
 {'category': 'Robbery',
  'country': 'Madison',
  'from_col': 6,
  'from_row': 24,
  'value': 39},
 {'category': 'Robbery',
  'country': 'Madison',
  'from_col': 6,
  'from_row': 24,
  'value': 39},
 {'category': 'Robbery',
  'country': 'Madison',
  'from_col': 6,
  'from_row': 24,
  'value': 39},
 {'category': 'Aggravated\nassault',
  'country': 'Madison',
  'from_col': 7,
  'from_row': 24,
  'value': 256},
 {'category': 'Aggravated\nassault',
  'country': 'Madison',
  'from_col': 7,
  'from_row': 24,
  'value': 256},
 {'category': 'Aggravated\nassault',
  'country': 'Madison',
  'from_col': 7,
  'from_row': 24,
  'value': 256},
 {'category': 'Aggravated\nassault',
  'country': 'Madison',
  'from_col': 7,
  'from_row': 24,
  'value': 256

  'country': 'Montgomery',
  'from_col': 4,
  'from_row': 26,
  'value': 4},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Montgomery',
  'from_col': 4,
  'from_row': 26,
  'value': 4},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Montgomery',
  'from_col': 4,
  'from_row': 26,
  'value': 4},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Montgomery',
  'from_col': 4,
  'from_row': 26,
  'value': 4},
 {'category': 'Rape\n(revised\ndefinition)1',
  'country': 'Montgomery',
  'from_col': 4,
  'from_row': 26,
  'value': 4},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Montgomery',
  'from_col': 5,
  'from_row': 26,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Montgomery',
  'from_col': 5,
  'from_row': 26,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Montgomery',
  'from_col': 5,
  'from_row': 26,
  'value': ''},
 {'category': 'Rape\n(legacy\ndefinition)2',
  'country': 'Mon

 {'category': 'Violent\ncrime',
  'country': 'Russell',
  'from_col': 2,
  'from_row': 28,
  'value': 66},
 {'category': 'Violent\ncrime',
  'country': 'Russell',
  'from_col': 2,
  'from_row': 28,
  'value': 66},
 {'category': 'Violent\ncrime',
  'country': 'Russell',
  'from_col': 2,
  'from_row': 28,
  'value': 66},
 {'category': 'Violent\ncrime',
  'country': 'Russell',
  'from_col': 2,
  'from_row': 28,
  'value': 66},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Russell',
  'from_col': 3,
  'from_row': 28,
  'value': 2},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Russell',
  'from_col': 3,
  'from_row': 28,
  'value': 2},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Russell',
  'from_col': 3,
  'from_row': 28,
  'value': 2},
 {'category': 'Murder and\nnonnegligent\nmanslaughter',
  'country': 'Russell',
  'from_col': 3,
  'from_row': 28,
  'value': 2},
 {'category': 'Murder and\nnonnegligent\nmanslaughte

 {'category': 'Larceny-\ntheft',
  'country': 'Shelby',
  'from_col': 10,
  'from_row': 29,
  'value': 537},
 {'category': 'Larceny-\ntheft',
  'country': 'Shelby',
  'from_col': 10,
  'from_row': 29,
  'value': 537},
 {'category': 'Larceny-\ntheft',
  'country': 'Shelby',
  'from_col': 10,
  'from_row': 29,
  'value': 537},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Shelby',
  'from_col': 11,
  'from_row': 29,
  'value': 98},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Shelby',
  'from_col': 11,
  'from_row': 29,
  'value': 98},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Shelby',
  'from_col': 11,
  'from_row': 29,
  'value': 98},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Shelby',
  'from_col': 11,
  'from_row': 29,
  'value': 98},
 {'category': 'Motor\nvehicle\ntheft',
  'country': 'Shelby',
  'from_col': 11,
  'from_row': 29,
  'value': 98},
 {'category': 'Violent\ncrime',
  'country': 'St. Clair',
  'from_col': 2,
  'from_row': 30,
  'value':

  'value': 1234},
 {'category': 'Property\ncrime',
  'country': 'Tuscaloosa',
  'from_col': 8,
  'from_row': 31,
  'value': 1234},
 {'category': 'Property\ncrime',
  'country': 'Tuscaloosa',
  'from_col': 8,
  'from_row': 31,
  'value': 1234},
 {'category': 'Burglary',
  'country': 'Tuscaloosa',
  'from_col': 9,
  'from_row': 31,
  'value': 387},
 {'category': 'Burglary',
  'country': 'Tuscaloosa',
  'from_col': 9,
  'from_row': 31,
  'value': 387},
 {'category': 'Burglary',
  'country': 'Tuscaloosa',
  'from_col': 9,
  'from_row': 31,
  'value': 387},
 {'category': 'Burglary',
  'country': 'Tuscaloosa',
  'from_col': 9,
  'from_row': 31,
  'value': 387},
 {'category': 'Burglary',
  'country': 'Tuscaloosa',
  'from_col': 9,
  'from_row': 31,
  'value': 387},
 {'category': 'Larceny-\ntheft',
  'country': 'Tuscaloosa',
  'from_col': 10,
  'from_row': 31,
  'value': 726},
 {'category': 'Larceny-\ntheft',
  'country': 'Tuscaloosa',
  'from_col': 10,
  'from_row': 31,
  'value': 726},
 {'ca

## Wrap it up in an ETK module and post-process

The example below shows how to use this extractor in an ETK module. The extractor's variable syntax only supports a single built-in variable or a coordinate, so all post-processing needs to be done after extraction.
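As an illustration of the kind of post-processing that has to happen outside the extractor, the following sketch pivots raw records into a nested mapping of county to category to value. It works on plain dicts shaped like the extractor output shown earlier and does not depend on ETK; `clean_text` and `pivot` are hypothetical helpers, and treating an empty string as a missing value is my assumption.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional


def clean_text(text: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return ' '.join(text.split())


def pivot(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Optional[int]]]:
    """Group records as {county: {category: value}}.

    Assumption: an empty-string value means the cell was blank, so it
    is stored as None.
    """
    table: Dict[str, Dict[str, Optional[int]]] = defaultdict(dict)
    for record in records:
        county = clean_text(record['country'])  # the key is spelled 'country'
        category = clean_text(record['category'])
        value = record['value']
        table[county][category] = None if value == '' else value
    return dict(table)


# A few records copied from the extractor output in this document.
sample = [
    {'category': 'Violent\ncrime', 'country': 'Autauga', 'value': 73},
    {'category': 'Rape\n(legacy\ndefinition)2', 'country': 'Autauga', 'value': ''},
    {'category': 'Violent\ncrime', 'country': 'Bibb', 'value': 0},
]

table = pivot(sample)
print(table['Autauga']['Violent crime'])  # 73
print(table['Bibb']['Violent crime'])     # 0
```

Repeated records for the same cell simply overwrite each other here, which is harmless when the duplicates are identical.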

In [4]:
import os, sys
import pprint
from etk.etk import ETK
from etk.etk_module import ETKModule
from etk.extractors.excel_extractor import ExcelExtractor
from etk.utilities import Utility


class ExampleETKModule(ETKModule):
    """
    Example module that extracts table cells from an Excel sheet
    """
    def __init__(self, etk):
        ETKModule.__init__(self, etk)
        self.ee = ExcelExtractor()

    def document_selector(self, doc):
        return 'file_path' in doc.cdr_document

    def process_document(self, doc):
        """
        Extract the table region and return one sub-document per record
        """

        variables = {
            'value': '$col,$row',
            'country': '$B,$row',
            'category': '$col,$6',
            'from_row': '$row',
            'from_col': '$col'
        }

        raw_extractions = self.ee.extract(doc.cdr_document['file_path'], '16tbl08al', ['C,7', 'M,33'], variables)

        extracted_docs = []
        for d in raw_extractions:
            # post processing
            d['category'] = d['category'].replace('\n', ' ').strip()
            d['country'] = d['country'].replace('\n', ' ').strip()
            d['from_row'] = int(d['from_row'])
            d['from_col'] = int(d['from_col'])
            
            # create sub document
            d['doc_id'] = Utility.create_doc_id_from_json(d)
            extracted_docs.append(self.etk.create_document(d))

        return extracted_docs


# if __name__ == "__main__":
etk = ETK(modules=ExampleETKModule)
doc = etk.create_document({'file_path': 'alabama.xls'})
docs = etk.process_ems(doc)

for d in docs[1:11]:
    print(d.value)

{'value': 73, 'country': 'Autauga', 'category': 'Violent crime', 'from_row': 6, 'from_col': 2, 'doc_id': '6ee778244362c177d0b365fe8ffcde3684e01fe956af73e923ea0d0c7fcad039'}
{'value': 73, 'country': 'Autauga', 'category': 'Violent crime', 'from_row': 6, 'from_col': 2, 'doc_id': '6ee778244362c177d0b365fe8ffcde3684e01fe956af73e923ea0d0c7fcad039'}
{'value': 73, 'country': 'Autauga', 'category': 'Violent crime', 'from_row': 6, 'from_col': 2, 'doc_id': '6ee778244362c177d0b365fe8ffcde3684e01fe956af73e923ea0d0c7fcad039'}
{'value': 73, 'country': 'Autauga', 'category': 'Violent crime', 'from_row': 6, 'from_col': 2, 'doc_id': '6ee778244362c177d0b365fe8ffcde3684e01fe956af73e923ea0d0c7fcad039'}
{'value': 73, 'country': 'Autauga', 'category': 'Violent crime', 'from_row': 6, 'from_col': 2, 'doc_id': '6ee778244362c177d0b365fe8ffcde3684e01fe956af73e923ea0d0c7fcad039'}
{'value': 1, 'country': 'Autauga', 'category': 'Murder and nonnegligent manslaughter', 'from_row': 6, 'from_col': 3, 'doc_id': '289627d