# Table of contents

## Modules
- [crop](#crop)
- [detect_table_class](#detect_table_class)
- [detection](#detection)
- [fuzzydict]()
- [nutrient_list]()
- [process](#process)
- [regex]()
- [resize]()
- [spacial_map]()
- [symspell]()
- [text_detection_class](#text_detection_class)
- [text_detection](#text_detection)

IMPORTANT notes:
- OpenCV (cv2) loads and stores images in BGR channel order by convention, so the code works in BGR color space
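
As a minimal sketch of what the BGR convention means in practice: an image read with cv2 has its channels in blue, green, red order, so it needs its last axis reversed before being handed to an RGB-based library such as PIL. The helper name `bgr_to_rgb` is hypothetical and uses only numpy; `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does the same with OpenCV.

```python
import numpy as np

def bgr_to_rgb(img):
    """Reverse the channel axis of an (H, W, 3) array: BGR -> RGB (hypothetical helper)."""
    return img[..., ::-1]

# A single pure-blue pixel is (255, 0, 0) in BGR order.
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
blue_rgb = bgr_to_rgb(blue_bgr)
print(blue_rgb[0, 0].tolist())  # [0, 0, 255]
```

Applying the helper twice returns the original array, since the operation is just a channel reversal.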

<a id='detection' />

# detection.py

Uses cv2

Imports: [detect_table_class](#detect_table_class), [crop](#crop), [text_detection](#text_detection), [process](#process), regex, nutrient_list, spacial_map

## Functions:

```python
load_model():
```
Creates object of class NutritionTableDetector


```python
detect(img_path, debug):
```
The debug param enables printing the time taken at the different steps of the function and writes the cropped bounding rect to disk.

Reads the image, gets the results from NutritionTableDetector, and crops from the image the bounding rect with the highest confidence.

Preprocesses the cropped image.

Passes it to the networks which detect bounding boxes for text regions.

The bounding boxes are then used for OCR. Constructs a dict with the bounding box, the detected text, the centre and the text type (words, numbers or both).
Then it tries to concatenate boxes on the same line, so that words and their corresponding numbers end up in the same rectangle.
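
The last step, joining boxes that sit on the same line, could look roughly like the sketch below. This is not the repository's implementation: the function name `merge_same_line`, the `y_tol` tolerance and the dict keys `'box'`, `'text'` and `'centre'` are assumptions chosen to match the description above.

```python
def merge_same_line(boxes, y_tol=10):
    """Group boxes whose centres are vertically close, then merge each group.

    Each box is a dict with 'box' (xmin, ymin, xmax, ymax), 'text' and
    'centre' (x, y). The key names and y_tol are assumptions.
    """
    lines = []
    for b in sorted(boxes, key=lambda b: b["centre"][1]):
        # Compare with the most recently added box of the current line.
        if lines and abs(lines[-1][-1]["centre"][1] - b["centre"][1]) <= y_tol:
            lines[-1].append(b)
        else:
            lines.append([b])

    merged = []
    for line in lines:
        line.sort(key=lambda b: b["box"][0])  # read left to right
        merged.append({
            "box": (min(b["box"][0] for b in line),
                    min(b["box"][1] for b in line),
                    max(b["box"][2] for b in line),
                    max(b["box"][3] for b in line)),
            "text": " ".join(b["text"] for b in line),
        })
    return merged

boxes = [
    {"box": (200, 12, 260, 30), "text": "12 g", "centre": (230, 21)},
    {"box": (10, 10, 90, 30), "text": "Protein", "centre": (50, 20)},
    {"box": (10, 50, 70, 70), "text": "Fat", "centre": (40, 60)},
]
result = merge_same_line(boxes)
print(result[0])  # {'box': (10, 10, 260, 30), 'text': 'Protein 12 g'}
```

A fixed pixel tolerance is the simplest choice; a tolerance relative to the box height would probably be more robust across image sizes.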



<a id='detect_table_class' />

# detect_table_class.py

Uses tensorflow 1.x and numpy.

```python
class NutritionTableDetector
```
- hardcoded path to inference graph

## Members:
- `self.detection_graph`
- `self.image_tensor` - input tensor
- `self.d_boxes` - detection_boxes; output tensor
- `self.d_scores` - detection_scores; output tensor
- `self.d_classes` - detection_classes; output tensor
- `self.num_d` - num_detections; output tensor
- `self.sess` - tensorflow session

## Methods:

```python
get_classification(self, img):
```
Given an image (with shape (height, width, 3)), it returns the tuple (boxes, scores, classes, number_detections).

Open question: are the results ordered by confidence?

boxes seems to be an array of shape (-1, -1, 4) (to be confirmed). Each 4-tuple is (ymin, xmin, ymax, xmax), with float values in \[0, 1\], which makes the boxes independent of the original shape of the image.
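
A minimal sketch of how the normalized boxes could be turned into pixel coordinates for cropping. It assumes `boxes` has shape (1, N, 4) and `scores` has shape (1, N), which is not confirmed above, and it picks the best detection with `argmax` so that it does not depend on the results being sorted. The function name `best_box_in_pixels` is hypothetical.

```python
import numpy as np

def best_box_in_pixels(boxes, scores, img_height, img_width):
    """Return (xmin, ymin, xmax, ymax) in pixels for the highest-scoring box.

    Assumes boxes is (1, N, 4) with normalized (ymin, xmin, ymax, xmax) rows
    and scores is (1, N).
    """
    best = int(np.argmax(scores[0]))  # does not rely on sorted outputs
    ymin, xmin, ymax, xmax = boxes[0, best]
    return (int(xmin * img_width), int(ymin * img_height),
            int(xmax * img_width), int(ymax * img_height))

boxes = np.array([[[0.1, 0.2, 0.5, 0.6], [0.25, 0.5, 0.75, 1.0]]])
scores = np.array([[0.3, 0.9]])
pixel_box = best_box_in_pixels(boxes, scores, img_height=400, img_width=200)
print(pixel_box)  # (100, 100, 200, 300)
```

Note the reordering: the detector emits (ymin, xmin, ymax, xmax), while the result here is in (x1, y1, x2, y2) order.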

<a id='crop' />

# crop.py

Uses cv2

## Functions

```python
crop(image_obj, coords, saved_location, extend_ratio=0, SAVE=False)
```
    """
    @param image_obj: The image object to be cropped
    @param coords: A tuple of x/y coordinates (x1, y1, x2, y2)
    @param saved_location: Path to save the cropped image
    @param extend_ratio: The value by which the bounding boxes are extended to accommodate text that has been cut off
    @param SAVE: whether to save the cropped image or not
    """

Computes the extended coordinates (per the docstring, to recover text that a tight box would cut off) and crops the image. If SAVE is set, writes the result to saved_location. Returns the cropped image.
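
The sketch below shows one way the extension could work, using numpy slicing only. The padding rule (extend_ratio times the box width and height, added on every side and clamped to the image) is an assumption, not necessarily what crop.py does, and `crop_extended` is a hypothetical name.

```python
import numpy as np

def crop_extended(image, coords, extend_ratio=0.0):
    """Crop (x1, y1, x2, y2) from an (H, W, C) array, padding the box on every side.

    The padding rule (extend_ratio times the box width/height) is an assumption.
    """
    x1, y1, x2, y2 = coords
    h, w = image.shape[:2]
    dx = int((x2 - x1) * extend_ratio)
    dy = int((y2 - y1) * extend_ratio)
    # Clamp to the image so the slice never leaves the array.
    x1, y1 = max(0, x1 - dx), max(0, y1 - dy)
    x2, y2 = min(w, x2 + dx), min(h, y2 + dy)
    return image[y1:y2, x1:x2]

image = np.zeros((100, 200, 3), dtype=np.uint8)
tight = crop_extended(image, (50, 20, 150, 60))
padded = crop_extended(image, (50, 20, 150, 60), extend_ratio=0.25)
print(tight.shape, padded.shape)  # (40, 100, 3) (60, 150, 3)
```

numpy indexes rows first, so the slice is `image[y1:y2, x1:x2]` even though the coordinates arrive in x/y order.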

<a id='process' />

# process.py



Uses cv2, PIL, pytesseract, numpy

## Functions

```python
preprocess_for_ocr(img, enhance=1)
```
    """
    @param img: image to which the pre-processing steps being applied
    """
If enhance > 1, it enhances the contrast (with PIL) by a factor of enhance.
A Gaussian blur step is present but commented out.
Returns the image.
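
To illustrate what a contrast factor does, here is a numpy stand-in, not the PIL call the function uses. It scales each pixel's deviation from the mean brightness, which is roughly how PIL's `ImageEnhance.Contrast` behaves; the name `enhance_contrast` and the grayscale input are assumptions.

```python
import numpy as np

def enhance_contrast(gray, factor):
    """Scale pixel deviations from the mean by `factor` (numpy stand-in, assumed behaviour)."""
    mean = gray.mean()
    out = mean + factor * (gray.astype(np.float64) - mean)
    return np.clip(out, 0, 255).astype(np.uint8)

gray = np.array([[100, 150]], dtype=np.uint8)
enhanced = enhance_contrast(gray, 2)
print(enhanced.tolist())  # [[75, 175]]
```

A factor of 1 leaves the image unchanged, which is consistent with enhancement only being applied when enhance > 1.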

```python
ocr(img, oem=1, psm=3)
```
    """
    @param img: The image to be OCR'd
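    @param oem: for specifying the type of Tesseract engine (default=1 for LSTM OCR engine)
    """

psm is not covered by the docstring; it is Tesseract's page segmentation mode (3 is fully automatic page segmentation without orientation and script detection).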

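A hedged sketch of how `oem` and `psm` are typically handed to Tesseract through pytesseract. The helper names `tesseract_config` and `ocr_sketch` are hypothetical, and the real `ocr` function may build its configuration differently; `pytesseract.image_to_string` and the `--oem` / `--psm` flags are the standard interface.

```python
def tesseract_config(oem=1, psm=3):
    """Build the Tesseract flags for the given engine and page segmentation modes."""
    return f"--oem {oem} --psm {psm}"

def ocr_sketch(img, oem=1, psm=3):
    """Hypothetical wrapper; needs pytesseract and a Tesseract install to run."""
    import pytesseract  # imported lazily so tesseract_config works without it
    return pytesseract.image_to_string(img, config=tesseract_config(oem, psm))

print(tesseract_config())  # --oem 1 --psm 3
```

For single cropped text boxes, a mode such as `--psm 7` (treat the image as a single text line) may be worth trying, but that is a suggestion rather than something the code above is known to do.
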
<a id='text_detection' />

# text_detection.py

Uses cv2, numpy

Imports [text_detection_class](#text_detection_class) and things from lib

## Functions

```python
load_text_model()
```
Creates object of class NutritionTextDetector.

```python
resize_im(im, scale, max_scale=None)
```
Resizes an image while keeping its aspect ratio (does it only enlarge? to be checked)
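
A minimal sketch of how such an aspect-preserving scale factor can be computed, assuming `scale` is the target for the shorter side and `max_scale` caps the longer side. The name `resize_factor` is hypothetical; the factor would then be applied with something like `cv2.resize(im, None, fx=f, fy=f)`.

```python
def resize_factor(height, width, scale, max_scale=None):
    """Factor that brings the shorter side to `scale`, capped so the longer side
    does not exceed `max_scale` (assumed semantics)."""
    f = float(scale) / min(height, width)
    if max_scale is not None and f * max(height, width) > max_scale:
        f = float(max_scale) / max(height, width)
    return f

print(resize_factor(300, 400, scale=600, max_scale=1200))   # 2.0
print(resize_factor(300, 1200, scale=600, max_scale=1200))  # 1.0
```

Under these assumptions the factor can be above or below 1, so the image may be enlarged or shrunk. When the cap applies, as in the second call, the shorter side ends up below `scale`.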

```python
text_detection(img)
```
Resizes the image to have min 600 width or height and max 1200 width or height.\
Converts the image to an array suited as input to a network, of shape (1, max_height, max_width, 3), a layout that also works with multiple images. The array is stored in 'blob', a dict. (This step scales the image so that the minimum side has at least 600 pixels and the maximum side has at most 1000; I guess it is connected to the network I/O.)\
The 'blob' dict has keys 'data' and 'rois'.\
After all of this preprocessing, the blob is passed to the text detection (region proposal) network, which returns 'cls_prob' and 'box_pred'.\
From what I can tell, box_pred contains predictions of bounding boxes where text might be (the shape is a bit weird: I don't know why it is a matrix, but the box predictions explain the last axis).\
cls_prob contains the confidence scores for those bounding boxes, separated into foreground and background probabilities (a matrix of the same kind, where this split explains the last axis).\
These outputs are passed to a function which converts them to bounding boxes in image space.\
After that they are passed to a TextDetector class (in lib.text_connector), which uses graphs to connect the boxes and detect text lines. It is set by default to detect horizontal text (TO BE TESTED). It returns text boxes in an array of shape (N, 9), where the 9 values for each box are the four corners (top-left, top-right, bottom-left, bottom-right) and the score. (Implementation detail: the boxes are also filtered for relevance and shape.)\
Finally, the boxes are converted back to the compact (xmin, ymin, xmax, ymax) form and the score is dropped.
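
The final conversion from (N, 9) corner boxes to the compact form can be sketched as follows. It assumes each row is laid out as x1, y1, x2, y2, x3, y3, x4, y4, score, which is not confirmed above; `corners_to_xyxy` is a hypothetical name.

```python
import numpy as np

def corners_to_xyxy(text_boxes):
    """Convert (N, 9) rows of four (x, y) corners plus a score to (N, 4) boxes.

    Assumes the row layout x1, y1, x2, y2, x3, y3, x4, y4, score.
    """
    xs = text_boxes[:, 0:8:2]  # the four x coordinates
    ys = text_boxes[:, 1:8:2]  # the four y coordinates
    return np.stack([xs.min(axis=1), ys.min(axis=1),
                     xs.max(axis=1), ys.max(axis=1)], axis=1).astype(int)

text_boxes = np.array([[10, 20, 110, 22, 10, 50, 110, 52, 0.97]])
compact = corners_to_xyxy(text_boxes)
print(compact.tolist())  # [[10, 20, 110, 52]]
```

Taking the min and max over all four corners gives the axis-aligned box that encloses a slightly tilted text line, and it works regardless of the corner order.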

<a id='text_detection_class' />

# text_detection_class.py

Uses cv2, numpy and tensorflow

```python
class NutritionTextDetection:
```

This is a Region Proposal network

## Members
- `self.detection_graph`
- `self.sess` - tensorflow session
- `self.input_img` - 'Placeholder' tensor
- `self.output_cls_prob` - 'Reshape_2' tensor; output shape: (1, H, W, 2xA)
- `self.output_box_pred` - 'rpn_bbox_pred/Reshape_1' tensor; output shape: (1, H, W, 4xA)

## Methods

```python
get_text_classification(self, blobs):
```

blobs is a dict with a 'data' key

Returns (cls_prob, box_pred)

TODO: read about region proposal networks. Do they just give bounding boxes where text might be?