A Python toolkit for reading a hand-drawn (or otherwise rendered) plot image, extracting its pixel coordinates, clustering and removing outliers, and performing symbolic regression to recover an analytic function that fits the drawn curve. It can also load and save pre-trained models, convert between GP expressions and LaTeX, and plot the results.
- Robust file lookup (file_checker)
- Greyscale image ingestion with automatic binarization and inversion (image_reader)
- Outlier removal via DBSCAN, picking the largest cluster
- Symbolic regression using gplearn to evolve analytic expressions
- Model persistence (save/load with joblib)
- GP expression ↔ LaTeX conversion via sympy
- LaTeX ↔ numeric function conversion and evaluation
- Publication-quality plotting (custom fonts, aspect ratio, annotations)
- Python 3.8+
- numpy
- Pillow
- matplotlib
- gplearn
- joblib
- scikit-learn
- sympy
- antlr4-python3-runtime==4.11.0
⚠️ This script requires exactly antlr4-python3-runtime==4.11.0. Other versions are not supported.
- Clone this repository (or copy Image2Function.py into your project)
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # on Windows use `venv\Scripts\activate`
- Install dependencies
pip install numpy pillow matplotlib gplearn joblib scikit-learn sympy antlr4-python3-runtime==4.11.0
from Image2Function import image_fitter
image_fitter(
input_image_location='function_x.png',
model_path='function_x_model.joblib',
use_saved=True,
intd_expr='x',
x_domain=[0, 1],
y_range=[0, 1],
show_cluster_outliers=True,
show_clusters=False,
show_image=True
)
image_fitter(
'my_plot.png',
x_domain=[-5, 5],
y_range=[0, 100]
)
- First run (no saved model): evolves and saves to saved_model.joblib.
- Subsequent runs: set use_saved=True to load the existing model instead of re-training.
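The load-or-train behaviour described above can be sketched as follows. This is a minimal illustration, not the toolkit's actual code: `load_or_train` and `train_fn` are hypothetical names, and only `joblib.dump` and `joblib.load` are real library calls.

```python
import os

import joblib


def load_or_train(model_path, train_fn, use_saved=True):
    """Return a saved model if one exists, otherwise train and save a new one.

    `train_fn` is a hypothetical zero-argument callable standing in for the
    (potentially slow) symbolic-regression step.
    """
    if use_saved and os.path.exists(model_path):
        # Reuse the persisted model instead of re-training.
        return joblib.load(model_path)

    model = train_fn()
    joblib.dump(model, model_path)  # persist for subsequent runs
    return model
```

With this pattern, the expensive evolution step runs only when no file exists at `model_path` or when `use_saved=False` forces a re-train.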
x_pred, y_pred, gp_program = genetic_fitter(x_vals, y_vals)
latex_str = SymbRegg_to_latex(gp_program)
from Image2Function import Latex_to_function
x_vals, y_vals = Latex_to_function(r"\frac{\sin(x)}{x}", x_domain=[0, 10])
This section demonstrates the toolkit on two sample plots: a straight line (y = x) and a square-root curve (y = √x). In each case, the goal is to recover the underlying function from the plot image.
A hand-drawn plot of the linear function y = x is provided as the input. The image_fitter processes this image to extract its pixel coordinates and perform symbolic regression to fit a function.
The fitted result (red dashed line) overlaps closely with the original drawn line (blue points/line). The symbolic regression returns the formula x, which matches the intended linear function.
This example uses a plot of the square root function y = √x. The curved plot is fed into the image_fitter to test the algorithm’s ability to handle a non-linear shape.
The fitted curve (red dashed line) closely follows the drawn points for y = √x (blue). The symbolic regression yields the formula \sqrt{x-0.082}, which is very near the intended function.
Verifies that file_name exists (absolute path or project-wide search), filters by file_type (extension or list), and suggests close matches if not found.
- Inputs
  - file_name – filename or full path (string)
  - file_type – extension (e.g. '.png') or list of extensions
- Returns
  - Full path to the matched file (string)
- Raises
  - FileNotFoundError with suggestions or message
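A lookup of this kind could be written roughly as below. This is a sketch under stated assumptions rather than the toolkit's implementation: the name `find_file`, the `search_root` parameter, and the use of `difflib.get_close_matches` for suggestions are all illustrative choices.

```python
import difflib
import os


def find_file(file_name, file_type, search_root="."):
    """Return the full path to file_name or raise FileNotFoundError with hints."""
    # Accept a single extension or a list of extensions.
    extensions = [file_type] if isinstance(file_type, str) else list(file_type)
    extensions = tuple(ext.lower() for ext in extensions)

    # An absolute path is accepted directly if it exists with an allowed extension.
    if os.path.isabs(file_name):
        if os.path.isfile(file_name) and file_name.lower().endswith(extensions):
            return file_name
        raise FileNotFoundError(f"No matching file at {file_name!r}")

    # Otherwise walk the search root and index every file with an allowed extension.
    candidates = {}
    for folder, _dirs, files in os.walk(search_root):
        for name in files:
            if name.lower().endswith(extensions):
                full_path = os.path.abspath(os.path.join(folder, name))
                candidates.setdefault(name, full_path)

    if file_name in candidates:
        return candidates[file_name]

    # Suggest similarly named files to help with typos.
    close = difflib.get_close_matches(file_name, list(candidates), n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise FileNotFoundError(f"{file_name!r} not found under {search_root!r}.{hint}")
```

For example, asking for `functon_x.png` when only `function_x.png` exists would raise an error whose message names the near match.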
Loads a grayscale image, enforces squareness, auto-inverts dark backgrounds, thresholds, and extracts non-white pixel coordinates.
- Inputs
  - input_image_location – path to image (string)
- Returns
  - [x_vals, y_vals, width, height, img_for_plot]
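The inversion, thresholding, and coordinate-extraction steps can be illustrated on a plain NumPy array. The sketch below is an assumption-laden simplification: `extract_curve_pixels`, the fixed threshold of 128, and the "mean brightness" test for dark backgrounds are hypothetical, and the squareness check is omitted.

```python
import numpy as np


def extract_curve_pixels(gray, threshold=128):
    """Return (cols, rows, mask) for the dark pixels of a greyscale image array.

    `gray` is assumed to be a 2D uint8 array, e.g. obtained with Pillow via
    np.asarray(Image.open(path).convert("L")).
    """
    gray = np.asarray(gray, dtype=np.uint8)

    # Auto-invert: a mostly dark image is assumed to have a dark background.
    if gray.mean() < 128:
        gray = 255 - gray

    # Binarize: True wherever a pixel is darker than the threshold.
    mask = gray < threshold

    # np.nonzero returns (row indices, column indices) of the True entries.
    rows, cols = np.nonzero(mask)
    return cols, rows, mask
```

Note that `rows` counts downward from the top of the image, so a later step has to flip the vertical axis before the points can be treated as plot coordinates.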
image_fitter(input_image_location, x_domain=[0,1], y_range=[0,1], model_path=None, use_saved=True, intd_expr=None, show_clusters=False, show_cluster_outliers=False, show_image=True)
High-level pipeline combining image_reader, DBSCAN outlier filtering, genetic_fitter, LaTeX conversion, and plotting.
- Key parameters
  - input_image_location – path to square image (string)
  - x_domain, y_range – numeric scaling bounds (lists)
  - model_path – where to save/load the GP model (string or None)
  - use_saved – whether to load existing model (bool)
  - intd_expr – LaTeX string for intended function (string or None)
  - show_clusters, show_cluster_outliers, show_image – booleans
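Two of the intermediate steps in this pipeline, keeping only the largest cluster and rescaling pixel coordinates to x_domain and y_range, can be sketched in plain Python. Both helper names are hypothetical. The labels are assumed to follow the convention of scikit-learn's DBSCAN, where -1 marks noise, and the vertical flip assumes pixel row 0 is the top of the image.

```python
from collections import Counter


def largest_cluster_mask(labels):
    """Return a boolean list selecting the points in the most populous cluster.

    `labels` are DBSCAN-style cluster labels; -1 (noise) is never selected.
    """
    counts = Counter(label for label in labels if label != -1)
    if not counts:
        return [False] * len(labels)
    biggest = counts.most_common(1)[0][0]
    return [label == biggest for label in labels]


def pixels_to_data(cols, rows, width, height, x_domain, y_range):
    """Linearly map pixel coordinates onto the given data bounds."""
    x_min, x_max = x_domain
    y_min, y_max = y_range
    xs = [x_min + (c / (width - 1)) * (x_max - x_min) for c in cols]
    # Image rows grow downward, so the top row maps to y_max.
    ys = [y_max - (r / (height - 1)) * (y_max - y_min) for r in rows]
    return xs, ys
```

Filtering to a single cluster before regression keeps stray marks, axis ticks, or scanning specks from pulling the fitted expression away from the drawn curve.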
Runs gplearn.SymbolicRegressor on provided data or loads a saved model.
- Returns – [x_pred, y_pred, gp_program]
Converts a gplearn program into a LaTeX string via sympy.
- Inputs
  - gp_prog – a gplearn._program._Program object
- Returns
  - LaTeX representation (string)
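One common way to do such a conversion is to parse the program's string form, which looks like `sqrt(sub(X0, 0.082))`, with sympy and a mapping from gplearn's function names to sympy operations. The sketch below assumes that approach and works on a string rather than a program object. `gp_string_to_latex` is a hypothetical name, only a handful of operators are mapped, and gplearn's protected operators (for example its square root of the absolute value) are approximated by the plain ones.

```python
import sympy as sp


def gp_string_to_latex(expr_str):
    """Convert a gplearn-style prefix expression string into a LaTeX string."""
    x = sp.Symbol("x")

    # Map gplearn function names and the input variable onto sympy objects.
    converters = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
        "neg": lambda a: -a,
        "sqrt": sp.sqrt,
        "sin": sp.sin,
        "cos": sp.cos,
        "X0": x,  # gplearn names the first input feature X0
    }

    expr = sp.sympify(expr_str, locals=converters)
    return sp.latex(expr)
```

Because sympy simplifies as it builds the expression, an input such as `add(X0, X0)` comes out as the LaTeX for 2x rather than a literal sum.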
Parses a LaTeX expression into a numeric function and evaluates on a linspace over x_domain.
- Returns – [x_values, y_values]
- proj_dir detection: The script determines proj_dir by stripping the script path. It expects your main module to be named Image2Function.py.
- ANTLR requirement: Only version 4.11.0 is supported (pip install antlr4-python3-runtime==4.11.0).
- Fork the repository
- Create a feature branch (git checkout -b feature/foo)
- Commit your changes (git commit -am 'Add feature')
- Push to the branch (git push origin feature/foo)
- Open a Pull Request
This project is released under the MIT License.
Owen Root
Physics PhD Candidate, Quantum Information & Bio-Photonics
Email: owenbroughallroot@gmail.com
GitHub: github.com/yagoiroot



