πŸ” TrueGL: A Truthful, Reliable, and Unified Engine for Grounded Learning in Full-Stack Search

TrueGL's intuitive search interface with integrated truth scoring

💡 Overview

TrueGL is a comprehensive full-stack search engine designed to assess the truthfulness and reliability of textual information. Built on fine-tuned large language models (LLMs), TrueGL provides users with reliable truth scores for search results, helping combat misinformation and promote information literacy.


TrueGL's comprehensive workflow from data ingestion to truth assessment


🎯 What's Included

This repository contains the complete implementation of TrueGL, including:

  • Fine-tuned Granite-3.1-1B model for truth assessment
  • Full-stack web application with React frontend and Python backend
  • Comprehensive evaluation frameworks (LLM-based and rule-based)
  • Data generation and processing pipelines
  • Training and inference infrastructure

📋 Table of Contents

  • 🚀 Features
  • 🏗️ Architecture
  • 🛠️ Installation
  • ⚡ Quick Start
  • 📁 Project Structure
  • 💻 Usage
  • 🎯 Model Training
  • 📊 Evaluation
  • 🗃️ Data Generation
  • 📖 API Documentation
  • 🤝 Contributing
  • 🎯 Demo & Screenshots
  • 📄 Citation
  • 📄 License

🚀 Features

| Truth Assessment | Real-time Search | Multi-modal Evaluation |
| --- | --- | --- |
| Advanced LLM-based scoring on a 0.0-1.0 reliability scale | Fast, accurate search with integrated truth scoring | Neural & rule-based assessment methods |

| Source Verification | User-friendly Interface | Performance Optimized |
| --- | --- | --- |
| Transparent source tracking & reliability metrics | Modern React-based web application | Efficient inference & caching mechanisms |

Core Capabilities

  • Truth Assessment: Advanced LLM-based scoring of statement reliability (0.0-1.0 scale)
  • Real-time Search: Fast, accurate search with integrated truth scoring
  • Multi-modal Evaluation: Both neural and rule-based assessment methods
  • Source Verification: Transparent source tracking and reliability metrics
  • User-friendly Interface: Modern React-based web application
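
The 0.0-1.0 truth score above is the engine's core output. Below is a minimal sketch of how such a score might be bucketed into display labels; the thresholds are purely illustrative, not values used by TrueGL.

def reliability_label(score: float) -> str:
    """Map a 0.0-1.0 truth score to a coarse reliability label (illustrative thresholds)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("truth scores are expected to lie in [0.0, 1.0]")
    if score >= 0.75:
        return "likely reliable"
    if score >= 0.4:
        return "uncertain"
    return "likely unreliable"

print(reliability_label(0.892))  # likely reliable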

Technical Highlights

  • Fine-tuned Model: Granite-3.1-1B optimized for truth assessment
  • Scalable Architecture: Modular design supporting multiple evaluation backends
  • Comprehensive Datasets: Combines FEVER, SciFact, and custom fake news datasets
  • Advanced Filtering: Multi-dimensional search with truth score filtering
  • Performance Optimized: Efficient inference and caching mechanisms

πŸ—οΈ Architecture

TrueGL Architecture
β”œβ”€β”€ Frontend (React + TypeScript)
β”‚   β”œβ”€β”€ Search Interface
β”‚   β”œβ”€β”€ Results Display
β”‚   └── Truth Score Visualization
β”œβ”€β”€ Backend (Python Flask/FastAPI)
β”‚   β”œβ”€β”€ Search API
β”‚   β”œβ”€β”€ Truth Assessment Service
β”‚   └── Database Management
β”œβ”€β”€ ML Pipeline
β”‚   β”œβ”€β”€ Fine-tuned Granite Model
β”‚   β”œβ”€β”€ Rule-based Scorer
β”‚   └── Ensemble Methods
└── Data Processing
    β”œβ”€β”€ Web Crawling
    β”œβ”€β”€ Data Cleaning
    └── Truth Labeling
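
The ML pipeline pairs the fine-tuned Granite model with a rule-based scorer and combines them through ensemble methods. The sketch below shows one way such an ensemble could be wired; neural_score and rule_based_score are hypothetical callables and the 70/30 weighting is illustrative, so the repository's actual combination logic may differ.

def ensemble_truth_score(statement, neural_score, rule_based_score, neural_weight=0.7):
    """Blend a neural score and a rule-based score into one 0.0-1.0 reliability score."""
    neural = neural_score(statement)    # hypothetical: statement -> float in [0.0, 1.0]
    rule = rule_based_score(statement)  # hypothetical: statement -> float in [0.0, 1.0]
    return neural_weight * neural + (1.0 - neural_weight) * rule

# Example with trivial stand-in scorers:
score = ensemble_truth_score(
    "The Earth orbits the Sun.",
    neural_score=lambda s: 0.9,
    rule_based_score=lambda s: 0.8,
)
print(f"Ensemble truth score: {score:.2f}")  # 0.87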

πŸ› οΈ Installation

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • CUDA-compatible GPU (recommended for training)
  • 16GB+ RAM

Environment Setup

  1. Clone the repository
git clone https://github.com/AlgazinovAleksandr/TrueGL.git
cd TrueGL
  2. Set up Python environment
# Create virtual environment
python -m venv truegl_env
source truegl_env/bin/activate  # On Windows: truegl_env\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt
  3. Set up Node.js environment
cd application
npm install
  4. Download pre-trained models
# Download fine-tuned Granite model
python scripts/download_models.py

⚡ Quick Start

Running the Application

  1. Start the Backend
cd application/search_api
python main.py
  2. Start the Frontend
cd application
npm run dev
  3. Access the Application: Open your browser and navigate to http://localhost:5173

Basic Usage Example

from truegl import TruthAssessor

# Initialize the truth assessor
assessor = TruthAssessor(model_path="path/to/granite-model")

# Assess a statement
statement = "The Earth is approximately 4.5 billion years old."
score = assessor.assess(statement)
print(f"Truth Score: {score:.3f}")  # Output: Truth Score: 0.892

πŸ“ Project Structure

TrueGL/
├── application/                     # Web application
│   ├── src/                        # React frontend source
│   │   ├── components/             # UI components
│   │   ├── pages/                  # Page components
│   │   ├── services/               # API services
│   │   └── types/                  # TypeScript definitions
│   ├── search_api/                 # Python backend
│   │   ├── main.py                # Main API server
│   │   ├── search_service.py      # Search functionality
│   │   └── functions/             # Additional API functions
│   └── public/                     # Static assets
├── Data-Generation/                 # Data processing scripts
│   ├── Existing_Articles_Modification.py
│   └── New_Articles_Generation.py
├── evaluation/                      # Evaluation frameworks
│   ├── LLM_evaluation/             # LLM-based evaluation
│   │   ├── EVALUATION_Prompt_1.ipynb
│   │   ├── EVALUATION_Prompt_2.ipynb
│   │   └── EVALUATION_Prompt_3.ipynb
│   └── rule-based/                 # Rule-based evaluation
│       └── rule-based.py
├── Fine-Tuning Data/               # Training datasets
│   ├── fake_news_data/            # Fake news datasets
│   └── scientific_facts_data/     # Scientific fact datasets
├── TrueGL_training_and_inference/  # Model training
│   ├── finetune_granite_V2.py     # Training script
│   └── TrueGL_Granite_model_inference.ipynb
├── 2506.12072v2.pdf               # Research paper
└── README.md                       # This file

💻 Usage

Web Interface

  1. Search for Information: Enter your query in the search box
  2. Review Results: Browse search results with integrated truth scores
  3. Filter by Reliability: Use truth score filters to find reliable sources
  4. Analyze Sources: View detailed source analysis and reliability metrics

Python API

# Basic truth assessment
from truegl import TruthAssessor

assessor = TruthAssessor()
score = assessor.assess("Climate change is caused by human activities.")
print(f"Truth Score: {score}")

# Batch processing
statements = [
    "Water boils at 100Β°C at sea level.",
    "The moon is made of green cheese.",
    "Python is a programming language."
]

scores = assessor.assess_batch(statements)
for stmt, score in zip(statements, scores):
    print(f"{stmt}: {score:.3f}")

Command Line Interface

# Assess a single statement
python -m truegl.cli assess "The sun is a star."

# Process a file of statements
python -m truegl.cli batch_assess statements.txt --output results.csv

# Start the web server
python -m truegl.cli serve --port 8000

🎯 Model Training

Fine-tuning the Granite Model

cd TrueGL_training_and_inference

# Prepare training data
python prepare_data.py --input_dir "../Fine-Tuning Data" --output_file "training_data.csv"

# Start fine-tuning
python finetune_granite_V2.py \
    --model_path "granite-3.1-1b-base" \
    --data_path "training_data.csv" \
    --output_dir "granite-truegl" \
    --epochs 5 \
    --batch_size 8

Training Configuration

Key parameters for model training:

  • Model: Granite-3.1-1B base model
  • Learning Rate: 3e-5
  • Batch Size: 6-8 (depending on GPU memory)
  • Max Length: 2240 tokens
  • Epochs: 5
  • Validation Split: 2%
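
Expressed with the Hugging Face transformers Trainer API, these settings could look roughly like the sketch below. This is a hedged outline rather than the exact code in finetune_granite_V2.py; the base-model identifier and the "text" column name are assumptions.

import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "granite-3.1-1b-base"  # placeholder identifier for the base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load training_data.csv and hold out 2% for validation.
splits = Dataset.from_pandas(pd.read_csv("training_data.csv")).train_test_split(test_size=0.02)

# Tokenize to the configured maximum length; "text" is an assumed column name.
tokenized = splits.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2240),
    batched=True,
)

args = TrainingArguments(
    output_dir="granite-truegl",
    learning_rate=3e-5,
    per_device_train_batch_size=8,  # 6-8 depending on GPU memory
    num_train_epochs=5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()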

📊 Evaluation

Running Evaluations

  1. LLM-based Evaluation
cd evaluation/LLM_evaluation
jupyter notebook EVALUATION_Prompt_1.ipynb
  2. Rule-based Evaluation
cd evaluation/rule-based
python rule-based.py --input_file "test_data.csv" --output_dir "results/"

Evaluation Metrics

  • Mean Absolute Error (MAE)
  • Root Mean Square Error (RMSE)
  • R² Score
  • Pearson Correlation
  • Precision/Recall for binary classification
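
A minimal sketch of computing these metrics with NumPy, SciPy, and scikit-learn, assuming gold and predicted truth scores in [0.0, 1.0] and an illustrative 0.5 threshold for the binary precision/recall:

import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             precision_score, r2_score, recall_score)

y_true = np.array([0.9, 0.1, 0.8, 0.3])   # gold reliability scores (illustrative)
y_pred = np.array([0.85, 0.2, 0.7, 0.4])  # model scores (illustrative)

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
pearson, _ = pearsonr(y_true, y_pred)

# Binarize at an illustrative 0.5 threshold for precision/recall.
true_bin = (y_true >= 0.5).astype(int)
pred_bin = (y_pred >= 0.5).astype(int)
precision = precision_score(true_bin, pred_bin)
recall = recall_score(true_bin, pred_bin)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f} Pearson={pearson:.3f} "
      f"Precision={precision:.3f} Recall={recall:.3f}")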

πŸ—ƒοΈ Data Generation

Creating Training Data

cd Data-Generation

# Modify existing articles
python Existing_Articles_Modification.py \
    --input_file "articles.csv" \
    --output_file "modified_articles.csv" \
    --modification_rate 0.3

# Generate new articles
python New_Articles_Generation.py \
    --num_articles 1000 \
    --output_file "generated_articles.csv"

Data Sources

  • FEVER Dataset: Fact verification dataset
  • SciFact: Scientific claim verification
  • Custom Fake News: Curated misinformation examples
  • News Articles: Real news with reliability annotations
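
When these sources are merged into training data, categorical verdicts have to be placed on the 0.0-1.0 truth scale. The sketch below illustrates this for FEVER-style labels; the numeric targets are illustrative, not the mapping used by the repository's pipelines.

# FEVER labels each claim as SUPPORTS / REFUTES / NOT ENOUGH INFO.
# The numeric targets below are illustrative placeholders.
FEVER_LABEL_TO_SCORE = {
    "SUPPORTS": 1.0,
    "NOT ENOUGH INFO": 0.5,
    "REFUTES": 0.0,
}

def fever_example_to_training_row(claim: str, label: str) -> dict:
    """Convert one FEVER claim into a (statement, truth_score) training row."""
    return {"statement": claim, "truth_score": FEVER_LABEL_TO_SCORE[label]}

print(fever_example_to_training_row("The Earth orbits the Sun.", "SUPPORTS"))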

📖 API Documentation

REST API Endpoints

Search Endpoint

POST /api/search
{
    "query": "climate change effects",
    "filters": {
        "min_truth_score": 0.7,
        "max_results": 20
    }
}

Truth Assessment Endpoint

POST /api/assess
{
    "statement": "The Earth is flat.",
    "include_explanation": true
}

Batch Assessment Endpoint

POST /api/assess/batch
{
    "statements": ["Statement 1", "Statement 2"],
    "return_details": false
}

Response Format

{
    "truth_score": 0.123,
    "confidence": 0.89,
    "explanation": "Low reliability due to contradicting scientific evidence...",
    "sources": [
        {
            "url": "https://example.com",
            "title": "Source Title",
            "reliability_score": 0.85
        }
    ]
}
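
A minimal client-side sketch of calling the assessment endpoint with Python's requests library, assuming the API is served locally on port 8000 as in the CLI example above:

import requests

BASE_URL = "http://localhost:8000"  # assuming the server was started as in the CLI example

payload = {"statement": "The Earth is flat.", "include_explanation": True}
resp = requests.post(f"{BASE_URL}/api/assess", json=payload, timeout=30)
resp.raise_for_status()

result = resp.json()
print(f"Truth score: {result['truth_score']:.3f} (confidence {result['confidence']:.2f})")
print(result.get("explanation", ""))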

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Code Style

  • Python: Follow PEP 8, use black for formatting
  • TypeScript/React: Follow Airbnb style guide, use prettier
  • Documentation: Use clear docstrings and comments

🎯 Demo & Screenshots

Search Interface

TrueGL Search Interface

Clean, intuitive search interface with truth score integration

Search Results with Truth Scores

TrueGL Search Results

Search results displaying reliability scores and source verification

Detailed Analysis View

TrueGL Analysis View

Comprehensive analysis with detailed truth assessment metrics

Key Features in Action

  • 📊 Real-time Truth Scoring: See reliability scores instantly with your search results
  • 🔍 Advanced Filtering: Filter content by truth score, source reliability, and topic
  • 📈 Detailed Analytics: Get comprehensive breakdowns of assessment reasoning
  • 🌐 Multi-source Verification: Cross-reference information across multiple reliable sources

📄 Citation

If you use TrueGL in your research, please cite our paper:

@misc{chandra2025truegltruthfulreliableunified,
      title={TrueGL: A Truthful, Reliable, and Unified Engine for Grounded Learning in Full-Stack Search}, 
      author={Joydeep Chandra and Aleksandr Algazinov and Satyam Kumar Navneet and Rim El Filali and Matt Laing and Andrew Hanna},
      year={2025},
      eprint={2506.12072},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2506.12072}, 
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
