ScriptSage is a tool designed to parse and evaluate movie scripts. It provides functionalities to scrape screenplay data, structure it into a JSON format, and visualize character interactions and dialogue distributions.
- Screenplay Scraping: Extract screenplay content from the web.
- Data Structuring: Convert screenplay content into structured JSON format.
- Visualization: Generate visualizations for dialogue distribution and character interactions.
Ensure you have Python 3.11 installed. You can install the required dependencies using Poetry:
poetry install
ScriptSage provides a command line interface to scrape, parse, and visualize movie scripts.
To use the CLI, run the following command:
python scriptsage_cli.py <script_url> [--metric]
The --metric flag is optional. When included, it displays additional metrics about the screenplay.
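A minimal sketch of how a command line with one positional URL and one optional boolean flag can be declared with the standard library's argparse. The helper name `build_parser` and the help strings are hypothetical; the actual contents of scriptsage_cli.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented usage (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="scriptsage_cli.py",
        description="Scrape, parse, and visualize a movie script.",
    )
    # Required positional argument: the page to scrape.
    parser.add_argument("script_url", help="URL of the screenplay page")
    # Optional boolean flag: False unless --metric is passed.
    parser.add_argument(
        "--metric",
        action="store_true",
        help="display additional metrics about the screenplay",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(
    ["https://imsdb.com/scripts/Reservoir-Dogs.html", "--metric"]
)
print(args.script_url, args.metric)
```

With `action="store_true"`, omitting the flag leaves `args.metric` set to `False`, which matches the optional behavior described above.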
For example, to scrape the screenplay of "Reservoir Dogs" and save it as a structured JSON file:
python scriptsage_cli.py https://imsdb.com/scripts/Reservoir-Dogs.html
To scrape the screenplay and display additional metrics:
python scriptsage_cli.py https://imsdb.com/scripts/Reservoir-Dogs.html --metric
This will perform the following actions:
- Scrape the screenplay content from the provided URL.
- Parse the screenplay content to extract scenes, characters, and dialogue interactions.
- Save the structured data as a JSON file in ~/.scriptsage/screenplays/.
- Generate and save visualizations for dialogue distribution and character interactions in ~/.scriptsage/viz/.
- If the --metric flag is used, display additional metrics such as total word count, scene count, character count, and top words used in the screenplay.
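The metrics named in the last step could be computed along these lines. This is a sketch under stated assumptions: the layout of the structured data (a list of scenes, each with a `heading` and a list of `dialogues`) is hypothetical, the function name `compute_metrics` is illustrative, the sample lines are invented placeholders, and only dialogue words are counted here.

```python
import re
from collections import Counter


def compute_metrics(scenes, top_n=3):
    """Summarize a screenplay held as a list of scene dicts (assumed layout)."""
    words = []
    characters = set()
    for scene in scenes:
        for line in scene["dialogues"]:
            characters.add(line["character"])
            # Lowercase, then keep runs of letters and apostrophes as words.
            words.extend(re.findall(r"[a-z']+", line["text"].lower()))
    return {
        "total_words": len(words),
        "scene_count": len(scenes),
        "character_count": len(characters),
        "top_words": Counter(words).most_common(top_n),
    }


# Invented placeholder data, not taken from any real screenplay.
sample_scenes = [
    {
        "heading": "INT. DINER - DAY",
        "dialogues": [
            {"character": "ALICE", "text": "Pass the coffee."},
            {"character": "BOB", "text": "The coffee is cold."},
        ],
    },
    {
        "heading": "EXT. STREET - NIGHT",
        "dialogues": [{"character": "ALICE", "text": "The car is gone."}],
    },
]

metrics = compute_metrics(sample_scenes)
print(metrics)
```

A real tool would likely filter common stop words before ranking top words; that step is left out to keep the sketch short.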
To scrape the screenplay of "Reservoir Dogs" and save it as a structured JSON file, see lines 1-14 of the scraping code.
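The scraping step can be sketched as follows. To stay runnable without extra packages, this sketch swaps in the standard library (`urllib.request` and `html.parser`) for the project's requests and beautifulsoup4 dependencies. It assumes the screenplay text sits inside `<pre>` elements on the page, which is not confirmed by the project; the names `PreTextExtractor`, `extract_script_text`, and `fetch_script_text` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # number of <pre> elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Keep text only while inside at least one <pre> element.
        if self._depth > 0:
            self.chunks.append(data)


def extract_script_text(html: str) -> str:
    """Return the concatenated <pre> text of an HTML document."""
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks).strip()


def fetch_script_text(url: str) -> str:
    """Download a page and return its screenplay text (needs network access)."""
    with urlopen(url, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_script_text(html)


# Offline demonstration on an invented page fragment.
sample_html = (
    "<html><body><p>menu</p>"
    "<pre><b>INT. DINER - DAY</b>\n\nALICE\nHello.</pre>"
    "</body></html>"
)
script_text = extract_script_text(sample_html)
print(script_text)
```

Separating `extract_script_text` from `fetch_script_text` keeps the extraction logic testable on saved HTML, such as the example file listed under scriptsage/helpers/.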
To parse the screenplay content and save it to a JSON file, see lines 76-83 of the parsing code.
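One way to approach the parsing step is a line-based heuristic, sketched below. It is not the project's parser: the rules (scene headings start with INT. or EXT., an all-uppercase line opens a dialogue block, a blank line closes it) and the output layout are assumptions, and the names `parse_screenplay` and `save_structured` are hypothetical. The sample text is an invented placeholder.

```python
import json
import re
from pathlib import Path

SCENE_HEADING = re.compile(r"^(INT\.|EXT\.)")


def parse_screenplay(text: str) -> list:
    """Split screenplay text into scenes with per-character dialogue."""
    scenes = []
    current_scene = None
    speaker = None  # character whose dialogue block is being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            speaker = None  # a blank line ends the current dialogue block
            continue
        if SCENE_HEADING.match(line):
            current_scene = {"heading": line, "dialogues": []}
            scenes.append(current_scene)
            speaker = None
        elif current_scene is None:
            continue  # skip any text before the first scene heading
        elif speaker is None and line.isupper():
            # An all-uppercase line is treated as a character cue.
            speaker = line
            current_scene["dialogues"].append({"character": line, "text": ""})
        elif speaker is not None:
            entry = current_scene["dialogues"][-1]
            entry["text"] = (entry["text"] + " " + line).strip()
        # Any other line is treated as action text and ignored.
    return scenes


def save_structured(scenes: list, path) -> None:
    """Write the parsed scenes to a JSON file, creating parent folders."""
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(scenes, indent=2), encoding="utf-8")


sample_text = (
    "INT. DINER - DAY\n\n"
    "Alice sits at the counter.\n\n"
    "ALICE\nPass the coffee.\nPlease.\n\n"
    "BOB\nIt is cold.\n\n"
    "EXT. STREET - NIGHT\n\n"
    "ALICE\nThe car is gone.\n"
)
parsed = parse_screenplay(sample_text)
print(json.dumps(parsed, indent=2))
```

The heuristic misreads all-uppercase action lines as character cues, so a production parser would need extra checks such as indentation or a known character list.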
To generate visualizations for dialogue distribution and character interactions, see lines 1-52 of the visualization code.
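The data behind both visualizations can be derived with simple counting, as in this sketch. The scene layout, the definition of an interaction (two characters speaking in the same scene), and the function names are all assumptions rather than the project's actual logic. The plotting helper imports matplotlib lazily and is not called here, so the counting part runs with the standard library alone.

```python
from collections import Counter
from itertools import combinations


def dialogue_distribution(scenes):
    """Count dialogue lines per character."""
    return Counter(d["character"] for s in scenes for d in s["dialogues"])


def interaction_counts(scenes):
    """Count, per unordered pair, the scenes in which both characters speak."""
    pairs = Counter()
    for scene in scenes:
        speakers = sorted({d["character"] for d in scene["dialogues"]})
        pairs.update(combinations(speakers, 2))
    return pairs


def plot_dialogue_distribution(counts, out_path):
    """Save a bar chart of dialogue lines per character (needs matplotlib)."""
    if not counts:
        return  # nothing to plot
    import matplotlib

    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    names, values = zip(*counts.most_common())
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(names, values)
    ax.set_xlabel("Character")
    ax.set_ylabel("Dialogue lines")
    ax.set_title("Dialogue distribution")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


# Invented placeholder data, not taken from any real screenplay.
sample_scenes = [
    {
        "heading": "INT. DINER - DAY",
        "dialogues": [
            {"character": "ALICE", "text": "Pass the coffee."},
            {"character": "BOB", "text": "The coffee is cold."},
        ],
    },
    {
        "heading": "EXT. STREET - NIGHT",
        "dialogues": [{"character": "ALICE", "text": "The car is gone."}],
    },
]

distribution = dialogue_distribution(sample_scenes)
interactions = interaction_counts(sample_scenes)
print(distribution, interactions)
```

The pair counts could feed a seaborn heatmap or a network graph; which chart type the project uses is not stated here.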
- scriptsage/helpers/scraper.py: Contains the code to scrape screenplay content from the web.
- scriptsage/helpers/parse-dialogues.py: Contains the code to parse the screenplay content and save it as a structured JSON file.
- scriptsage/helpers/generate-viz.py: Contains the code to generate visualizations for dialogue distribution and character interactions.
- scriptsage/helpers/Reservoir-Dogs-structured.json: Example of a structured JSON file generated from the screenplay.
- scriptsage/helpers/Reservoir-Dogs.html: Example of the raw HTML content of the screenplay.
The project uses the following dependencies:
- requests: For making HTTP requests to fetch screenplay content.
- beautifulsoup4: For parsing HTML content.
- pandas: For data manipulation and analysis.
- matplotlib: For creating visualizations.
- seaborn: For creating statistical visualizations.
- numpy: For numerical operations.
- json: For handling JSON data (part of the Python standard library, so no separate installation is needed).
- poetry: For dependency management and packaging.
This project is licensed under the MIT License.