Text Guide

Module Overview

This module aims to support the consistent extraction of key features in operations and maintenance (O&M) data:

  • timestamp information
  • characteristic categorical information
  • a concise synopsis of the issue for context

Implemented functions cover four areas: filling in data gaps (text.preprocess submodule), machine learning analyses that fill gaps in categorical information and generate concise summary strings (text.classify submodule), preparation of data for natural language processing (text.nlp_utils submodule), and a visualization suite (text.visualize submodule).

A detailed example implementation of all capabilities can be found in text_class_example.py; tutorial_textmodule.ipynb covers the basics.

Text pre-processing

:py:mod:`~pvops.text.preprocess`

These functions process the O&M data into concise, machine learning-ready documents. Additionally, there are options to extract dates from the text.

  • :py:func:`~pvops.text.preprocess.preprocessor` is a wrapper function that calls the other preprocessing functions to prepare the data for machine learning.
    • See text_class_example.prep_data_for_ML for an example.
  • :py:func:`~pvops.text.preprocess.preprocessor` should be used with the keyword argument extract_dates_only = True if the primary interest is date extraction rather than preparing the data for machine learning.
    • See text_class_example.extract_dates module for an example.
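
As a rough illustration of what this kind of pre-processing involves, the sketch below lowercases a ticket description, drops stopwords, and pulls out dates written as YYYY-MM-DD or MM/DD/YYYY. It uses only the Python standard library and does not call pvops. The helper names ``clean_document`` and ``extract_dates``, the stopword list, the two supported date layouts, and the sample ticket are all hypothetical, and the actual behavior of ``preprocessor`` may differ.

```python
import re
from datetime import datetime

# Hypothetical stopword list; a real workflow would use a fuller one.
STOPWORDS = {"the", "a", "an", "on", "at", "was", "is", "and", "to", "of"}

# Only two date layouts are recognized (an assumption for this sketch).
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
]


def extract_dates(text):
    """Return the datetimes found in text, ordered by position."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(), fmt)
            except ValueError:
                # Looks like a date but is not a valid one (e.g. 13/45/2020).
                continue
            found.append((match.start(), parsed))
    return [parsed for _, parsed in sorted(found)]


def clean_document(text):
    """Remove dates, lowercase, keep alphabetic tokens, drop stopwords."""
    for pattern, _ in DATE_PATTERNS:
        text = pattern.sub(" ", text)
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(tok for tok in tokens if tok not in STOPWORDS)


# Hypothetical O&M ticket text.
ticket = "Inverter 3 tripped on 2019-07-04; the tech was dispatched 7/5/2019."
print(extract_dates(ticket))
print(clean_document(ticket))
```

Separating ``extract_dates`` from ``clean_document`` mirrors the two uses described above: the first alone corresponds to a dates-only run, while the second produces the condensed document that a machine learning step would consume.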

Text classification

:py:mod:`~pvops.text.classify`

These functions apply machine learning to the O&M data to infer the specified event descriptor, for example to fill a gap in categorical information.
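
To make the idea of inferring a categorical descriptor from text concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It is a stand-in technique chosen for brevity, not the method used by ``pvops.text.classify``; the function names ``train_naive_bayes`` and ``predict``, the labels, and the training tickets are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for doc, label in zip(documents, labels):
        word_counts[label].update(doc.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def predict(model, document):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in document.split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one (Laplace) smoothing avoids log(0).
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already-cleaned tickets with a known asset label.
train_docs = [
    "inverter tripped fault code",
    "inverter offline restarted",
    "tracker motor stuck",
    "tracker stalled wind stow",
]
train_labels = ["inverter", "inverter", "tracker", "tracker"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "inverter fault restarted"))
print(predict(model, "tracker motor stalled"))
```

In practice the labeled tickets would serve as training data and the tickets with a missing label would be passed to the prediction step.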

Utils

:py:mod:`~pvops.text.utils`

These helper functions focus on performing exploratory or secondary processing activities for the O&M data.

NLP Utils

:py:mod:`~pvops.text.nlp_utils`

These helper functions focus on processing in preparation for NLP activities.
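
One common preparation step for NLP is turning cleaned documents into count vectors over a shared vocabulary. The sketch below shows that step with the standard library only; it is an illustration under assumed inputs, not the pvops implementation, and ``build_vocabulary`` and ``vectorize`` are hypothetical names.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return a count vector for one document; unknown tokens are skipped."""
    counts = Counter(document.split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical cleaned documents.
docs = ["inverter tripped inverter reset", "tracker stuck"]
vocab = build_vocabulary(docs)
matrix = [vectorize(doc, vocab) for doc in docs]
print(vocab)
print(matrix)
```

Each row of ``matrix`` corresponds to one document and each column to one vocabulary token, which is the shape most text classifiers expect as input.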

Visualizations

:py:mod:`~pvops.text.visualize`

These functions create visualizations that give a better understanding of your documents.