Feature/enhanced processing and storage #138

crystalnet · 2025-06-23T05:59:00Z

Hey,

I've been working quite extensively on Fredy because I think it's a great idea!

Here is what I added:

🚀 Enhanced Real Estate Listing System with AI-Powered Processing

📋 Summary

This PR introduces a comprehensive enhancement to the real estate listing system, adding AI-powered listing processing, waypoint calculations, improved error handling, and a new dashboard interface. The changes transform the basic listing scraper into a sophisticated real estate analysis platform.

🎯 Key Features

🤖 GenAI-Enhanced Listing Processing

ChatGPT Integration: Added intelligent extraction of custom fields from listing content
Custom Fields System: Configurable field extraction with natural language prompts
Enhanced Storage: New enhancedListingsStorage with schema validation and enforcement
Robust Error Handling: Graceful fallbacks when AI processing fails

🗺️ Waypoint Calculator

Travel Time Analysis: Calculate travel times to important locations (work, gym, etc.)
Multiple Transport Modes: Support for transit, driving, walking, and cycling
Google Maps Integration: Real-time travel data via Google Maps API

📊 Dashboard & UI Enhancements

New Dashboard: Comprehensive listing overview with filtering and sorting
Enhanced Job Management: Custom fields and waypoints configuration in job creation
Improved Navigation: Updated menu structure and routing

🔧 Technical Improvements

Error Handling & Logging

Centralized Logging: New logger.js utility with Winston integration
Defensive Programming: Robust error handling throughout the extraction pipeline
Schema Validation: Ensures database consistency for enhancedListings
Graceful Degradation: System continues working even when external APIs fail

Code Quality

Type Consistency: Converted all IDs to strings for consistency
Modular Architecture: Separated concerns with dedicated extractors and storage
Comprehensive Testing: Added extensive test coverage for new features
VSCode Configuration: Improved development experience with launch configurations

Performance & Reliability

Sequential Processing: Prevents overwhelming target servers
Persistent Storage: Enhanced listings stored with proper schema validation
Memory Management: Proper cleanup and resource management

📁 Files Changed

Core System (59 files changed, +1053/-325 lines)

lib/FredyRuntime.js - Enhanced with AI processing and waypoint calculation
lib/services/extractor/ - New extraction pipeline with ChatGPT integration
lib/services/storage/enhancedListingsStorage.js - New storage system with schema validation
lib/services/waypoint-calculator/ - New travel time calculation service

UI Components

ui/src/views/dashboard/ - New dashboard interface
ui/src/views/jobs/mutation/ - Enhanced job configuration with custom fields and waypoints
ui/src/services/rematch/models/ - Updated state management

Testing & Configuration

Comprehensive test suite for new features
VSCode launch configurations for debugging
Updated package dependencies

🧪 Testing

✅ Unit Tests: Enhanced listings storage, waypoint calculator, ChatGPT integration
✅ Integration Tests: Full end-to-end workflow testing
✅ Error Handling: Tested fallback scenarios and error recovery
✅ UI Testing: Dashboard and job configuration interfaces
✅ Performance Testing: Sequential processing and delay mechanisms

Breaking Changes

None - This is a feature addition that maintains backward compatibility with existing jobs and configurations.

Migration Notes

Existing jobs will continue to work without modification
New features (custom fields, waypoints) are opt-in
Enhanced listings are stored separately from basic listings
No database migration required

📈 Impact

Enhanced Data Quality: AI-powered extraction provides richer listing information
Better User Experience: Dashboard provides comprehensive overview of listings
Improved Reliability: Robust error handling and bot detection prevention
Scalability: Modular architecture supports future enhancements

🔮 Future Considerations

Consider rate limiting for ChatGPT API usage
Monitor Google Maps API usage and costs
Evaluate performance impact of sequential processing
Consider caching mechanisms for waypoint calculations

Total Changes: 74 files, +3,436 insertions, -7,285 deletions (net -3,849 lines, mostly due to yarn.lock removal)

This PR represents a significant evolution of the real estate listing system, transforming it from a basic scraper into a comprehensive analysis platform with AI capabilities and travel insights.

🏠 Enhanced Real Estate Listing System - Detailed Feature Architecture

1. Custom Fields System

Overview

The custom fields system allows users to define specific attributes they want to extract from real estate listings using natural language processing. This transforms basic listing data into rich, structured information tailored to individual preferences.

Architecture

User Input → Job Configuration → ChatGPT Prompt → AI Extraction → Structured Data

How It Works

User Configuration: Users define custom fields in the job creation interface with:
- Field Name: Human-readable identifier (e.g., "Price per Square Meter")
- Question Prompt: Natural language question for ChatGPT (e.g., "What is the price per square meter?")
- Answer Length: Expected response format (one_word, one_statement, several_sentences)

AI Processing Pipeline:

// Example custom field configuration
const customField = {
  id: "price_per_sqm",
  name: "Price per Square Meter",
  questionPrompt: "What is the price per square meter?",
  answerLength: "one_word"
};

ChatGPT Integration:
- System generates structured prompts from user questions
- ChatGPT analyzes listing content and extracts specific values
- Responses are validated and mapped to field IDs
- Fallback to empty strings if extraction fails
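A minimal sketch of the prompt generation and response mapping described above, assuming the model is asked to reply with a JSON object keyed by field ID. The helper names (buildPrompt, mapResponse), the prompt wording, and the length hints are illustrative assumptions, not code from this PR; the actual API call is left out.

```javascript
// Hypothetical hints per answerLength value; the wording is an assumption.
const LENGTH_HINTS = {
  one_word: "Answer with a single word or number.",
  one_statement: "Answer with one short statement.",
  several_sentences: "Answer in a few sentences.",
};

// Build one prompt that asks for a JSON object keyed by field ID.
function buildPrompt(customFields, listingText) {
  const questions = customFields
    .map((f) => `- "${f.id}": ${f.questionPrompt} ${LENGTH_HINTS[f.answerLength] ?? ""}`.trim())
    .join("\n");
  return [
    "Answer the questions below about the listing.",
    "Reply with a JSON object whose keys are the quoted IDs.",
    questions,
    "Listing:",
    listingText,
  ].join("\n");
}

// Map the raw model reply to field IDs; a missing or invalid answer becomes "".
function mapResponse(customFields, rawReply) {
  let parsed = {};
  try {
    const value = JSON.parse(rawReply);
    if (value !== null && typeof value === "object" && !Array.isArray(value)) parsed = value;
  } catch {
    // Unparseable reply: keep the empty object so every field falls back to "".
  }
  const result = {};
  for (const field of customFields) {
    const answer = parsed[field.id];
    result[field.id] =
      typeof answer === "string" || typeof answer === "number" ? String(answer).trim() : "";
  }
  return result;
}
```

Because mapResponse always returns one key per configured field, a failed extraction still yields a listing that conforms to the schema.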

Functionality

Dynamic Field Creation: No code changes needed for new field types
Natural Language Processing: Uses AI to understand context and extract precise values
Validation & Fallbacks: Graceful handling of extraction failures
Schema Enforcement: Ensures all listings have consistent field structure

Security

API Key Management: ChatGPT API keys are provided by users and stored securely in backend configuration
No Data Persistence: API keys are not stored in listings or transmitted to frontend
Rate Limiting: Built-in delays prevent API abuse

2. Waypoints System

Overview

The waypoints system calculates travel times and distances from listings to user-defined important locations, providing crucial insights for location-based decision making.

Architecture

User Waypoints → Google Maps API → Travel Calculations → Enhanced Listings

How It Works

Waypoint Configuration: Users define locations with:
- Name: Human-readable identifier (e.g., "Work", "Gym")
- Address: Physical location for geocoding
- Transport Mode: Preferred travel method (transit, driving, walking, bicycling)

Google Maps Integration:

// Example waypoint configuration
const waypoint = {
  id: "work",
  name: "Work",
  location: "Alexanderplatz 1, Berlin",
  transportMode: "transit"
};

Calculation Process:
- Geocodes listing address and waypoint addresses
- Queries Google Maps Distance Matrix API
- Calculates travel time and distance for each waypoint
- Stores results as travelTime_work, travelDistance_work fields
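The last step can be sketched as follows. This assumes each waypoint has one matching Distance Matrix element with the fields status, duration.value (seconds) and distance.value (meters). The function name toWaypointFields, the stored units (minutes and kilometers) and the use of null for failed lookups are assumptions for illustration, not taken from this PR.

```javascript
// Convert Distance Matrix elements into flat listing fields.
// `elements[i]` is assumed to be the result for `waypoints[i]`.
function toWaypointFields(waypoints, elements) {
  const fields = {};
  waypoints.forEach((waypoint, i) => {
    const element = elements[i];
    const ok = Boolean(element) && element.status === "OK";
    // Minutes and kilometers are an assumed unit choice; null marks a failed lookup.
    fields[`travelTime_${waypoint.id}`] = ok ? Math.round(element.duration.value / 60) : null;
    fields[`travelDistance_${waypoint.id}`] = ok
      ? Math.round(element.distance.value / 100) / 10
      : null;
  });
  return fields;
}
```

Since a Distance Matrix request takes a single travel mode, waypoints with different transport modes would likely need separate requests before their elements are combined this way.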

Functionality

Multi-Modal Transport: Support for public transit, car, walking, cycling
Real-Time Data: Live travel information from Google Maps
Batch Processing: Efficient calculation for multiple waypoints
Error Handling: Graceful fallbacks when API calls fail

Security

API Key Management: Google Maps API keys provided by users and stored securely
No Key Exposure: API keys never transmitted to frontend or stored in listings
Usage Monitoring: Built-in logging for API usage tracking

3. Enhanced Listings Processing

Overview

Enhanced listings represent a complete transformation of basic search results into rich, AI-analyzed data with travel insights, providing comprehensive information for informed decision making.

Architecture

Search Results → Expose Fetching → Content Extraction → AI Processing → Travel Calculation → Storage

How It Works

Initial Search: Standard listing discovery via search pages
Expose Fetching:
- Navigates to individual listing pages
- Extracts full listing content (not just search snippets)
- Handles different provider formats (HTML, JSON APIs)

Content Processing:

// Sequential processing with bot detection prevention
for (const listing of listings) {
  // Random delay of 2000-7000 ms
  await delay(2000 + Math.random() * 5000);
  const exposeContent = await fetchExpose(listing.url);
  const enhancedData = await processWithAI(exposeContent);
  const waypointData = await calculateWaypoints(listing.address);
  await storeEnhancedListing({ ...listing, ...enhancedData, ...waypointData });
}

AI Enhancement:
- Extracts custom fields using ChatGPT
- Processes natural language content
- Validates and structures responses
Travel Calculation:
- Calculates distances to all configured waypoints
- Provides travel times for different transport modes
- Handles API failures gracefully

Functionality

Comprehensive Data: Combines basic listing info with AI insights and travel data
Sequential Processing: Prevents bot detection with intelligent delays
Error Resilience: Continues processing even when individual listings fail
Real-Time Updates: Fresh data on each processing run
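A sketch of how sequential processing and error resilience could fit together, under the assumption that one failing listing should be recorded and skipped rather than abort the run. processSequentially, randomBetween and the enhance callback are hypothetical names, not functions from this PR.

```javascript
// Resolve after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Uniform random integer in [min, max].
const randomBetween = (min, max) => min + Math.floor(Math.random() * (max - min + 1));

// Process listings one at a time; a failure is recorded and the loop continues.
// `enhance` is a hypothetical async function returning extra fields for one listing.
async function processSequentially(
  listings,
  enhance,
  { minDelayMs = 2000, maxDelayMs = 7000 } = {}
) {
  const enhanced = [];
  const failed = [];
  for (const listing of listings) {
    await delay(randomBetween(minDelayMs, maxDelayMs));
    try {
      enhanced.push({ ...listing, ...(await enhance(listing)) });
    } catch (error) {
      failed.push({ id: listing.id, message: error.message });
    }
  }
  return { enhanced, failed };
}
```

With the default range the average delay is 4.5 seconds, so for 100 listings the delays alone add roughly 100 × 4.5 s = 7.5 minutes on top of the fetch and API time.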

4. Enhanced Listings Storage

Overview

A sophisticated storage system designed to handle the complex, schema-enforced data structure of enhanced listings with proper validation and efficient retrieval.

Architecture

Enhanced Listings → Schema Validation → Object Storage → JSON Files → UI Retrieval

How It Works

Schema Management:

// Dynamic schema generation from job configuration
const schema = [
  ...basicFields,           // id, title, price, size, link, date_found, details
  ...customFieldColumns,    // User-defined custom fields
  ...waypointColumns        // travelTime_*, travelDistance_* fields
];
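One way the three column groups could be derived from a job is sketched below. It assumes a job object with optional customFields and waypoints arrays, and a column shape of { id, type }. The id property matches what validateWithSchema reads, while buildSchema and the type values are illustrative assumptions.

```javascript
// Basic fields as listed in the schema comment of the PR description.
const BASIC_FIELDS = ["id", "title", "price", "size", "link", "date_found", "details"];

// Derive the column list for a job: basic fields, custom fields, then
// one travel time and one travel distance column per waypoint.
function buildSchema(job) {
  const basic = BASIC_FIELDS.map((id) => ({ id, type: "basic" }));
  const custom = (job.customFields ?? []).map((field) => ({ id: field.id, type: "custom" }));
  const waypoint = (job.waypoints ?? []).flatMap((wp) => [
    { id: `travelTime_${wp.id}`, type: "waypoint" },
    { id: `travelDistance_${wp.id}`, type: "waypoint" },
  ]);
  return [...basic, ...custom, ...waypoint];
}
```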

Storage Structure:
- Object-based Storage: Listings stored as objects keyed by ID for fast access
- Schema Enforcement: All listings must conform to defined schema
- Automatic Backfilling: New fields added to existing listings with defaults
- Deduplication: Overwrites existing listings with same ID
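The deduplication and backfilling rules above might look like this in code. The store shape { schema, listings }, the function name upsertListing and the empty-string default are assumptions made for the sketch.

```javascript
// Insert or overwrite a listing by ID, then backfill every stored listing
// so each one has a key for every schema column.
function upsertListing(store, listing) {
  // Same ID overwrites the previous entry (deduplication).
  const listings = { ...store.listings, [String(listing.id)]: listing };
  for (const [id, stored] of Object.entries(listings)) {
    const filled = { ...stored };
    for (const column of store.schema) {
      if (!Object.prototype.hasOwnProperty.call(filled, column.id)) {
        filled[column.id] = ""; // assumed default for newly added fields
      }
    }
    listings[id] = filled;
  }
  // Return a new store object instead of mutating the input.
  return { schema: store.schema, listings };
}
```

After an upsert, every stored listing has a key for each schema column, so the hasOwnProperty check used by validateWithSchema passes for all of them.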

Data Validation:

// Ensures all required fields exist
function validateWithSchema(listing, schema) {
  return schema.every(col => Object.prototype.hasOwnProperty.call(listing, col.id));
}

File Organization:
- One JSON file per job: db/enhanced-listings/{jobId}.json
- Contains both listings object and schema definition
- Automatic directory creation and file management

Functionality

Schema Evolution: Supports adding/removing fields without data loss
Efficient Retrieval: Object storage enables fast lookups by ID
Data Integrity: Validation ensures consistent data structure
Scalability: File-per-job organization supports large datasets

5. Dashboard Interface

Overview

A comprehensive table-based interface that transforms enhanced listing data into actionable insights, enabling users to compare properties and make informed decisions.

Architecture

Enhanced Listings → Dashboard API → React Table → Filtered/Sorted View

How It Works

Data Retrieval:

// Fetches enhanced listings for specific job
const enhancedListings = await getEnhancedListings(jobId);
const schema = await getSchema(jobId);

Dynamic Column Generation:
- Generates table columns from schema definition
- Supports different column types (basic, custom, waypoint)
- Configurable visibility and sorting
Advanced Filtering:
- Text search across all fields
- Numeric range filters for prices, sizes, travel times
- Multi-select filters for categorical data
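The text search and numeric range filters can be sketched as a single pure function over the listing rows. filterAndSort and its option names are assumptions, and the sort is numeric only, so it assumes the sort column holds numbers.

```javascript
// Keep rows that match a free-text query and every numeric range, then sort.
function filterAndSort(rows, { text = "", ranges = {}, sortBy = null, descending = false } = {}) {
  const needle = text.trim().toLowerCase();
  const matches = rows.filter((row) => {
    // Text search across all fields of the row.
    const textOk =
      needle === "" ||
      Object.values(row).some((value) => String(value ?? "").toLowerCase().includes(needle));
    // Every configured range must hold; empty values never match a range.
    const rangesOk = Object.entries(ranges).every(
      ([column, { min = -Infinity, max = Infinity }]) => {
        const raw = row[column];
        if (raw === null || raw === undefined || raw === "") return false;
        const value = Number(raw);
        return Number.isFinite(value) && value >= min && value <= max;
      }
    );
    return textOk && rangesOk;
  });
  if (sortBy === null) return matches;
  const direction = descending ? -1 : 1;
  // Copy before sorting so the filtered array is not reordered in place.
  return [...matches].sort((a, b) => direction * (Number(a[sortBy]) - Number(b[sortBy])));
}
```

For example, filterAndSort(rows, { ranges: { travelTime_work: { max: 30 } }, sortBy: "price" }) would keep listings within 30 travel-time units of the "work" waypoint and order them by ascending price.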

Interactive Features:

// Example table features
- Sortable columns (price, size, travel time)
- Filterable data (price range, location)
- Export functionality
- Direct links to original listings

Functionality

Comprehensive Overview: All listing data in one view
Comparison Tools: Side-by-side property comparison
Travel Insights: Visual representation of travel times
Custom Field Display: Shows AI-extracted custom information
Responsive Design: Works on desktop and mobile devices

Security Features

User Authentication: Dashboard access requires login
Job Isolation: Users only see listings from their own jobs
API Key Protection: No sensitive configuration data exposed
Input Validation: All user inputs sanitized and validated

Security & Privacy

API Key Management

User-Provided Keys: All API keys (ChatGPT, Google Maps) are provided by users
Secure Storage: Keys stored in backend configuration files, not in database
No Frontend Exposure: Keys never transmitted to browser or stored in listings
Environment-Based: Support for environment variables for production deployments
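A hedged sketch of environment-based key resolution, assuming an environment variable takes precedence over the backend configuration file. resolveApiKey, the variable name OPENAI_API_KEY and the config key openAiApiKey are hypothetical names chosen for illustration.

```javascript
// Resolve an API key from the environment first, then from the config object.
// Returns null when neither source provides a non-empty string.
function resolveApiKey(envName, configKey, config, env = process.env) {
  const fromEnv = env[envName];
  if (typeof fromEnv === "string" && fromEnv.trim() !== "") return fromEnv.trim();
  const fromConfig = config?.[configKey];
  return typeof fromConfig === "string" && fromConfig.trim() !== "" ? fromConfig.trim() : null;
}

// Usage with hypothetical names:
// const openAiKey = resolveApiKey("OPENAI_API_KEY", "openAiApiKey", config);
```

Returning null instead of throwing lets the caller decide whether a missing key disables a feature or is an error.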

Data Protection

Local Storage: All data is stored locally; listing content and addresses are sent only to the configured ChatGPT and Google Maps APIs
User Isolation: Jobs and listings are user-specific
No PII Storage: No personal information stored in listings
Configurable Retention: Users control data retention policies

Access Control

Authentication Required: All enhanced features require user login
Job-Level Permissions: Users can only access their own job data
Audit Logging: Comprehensive logging for debugging and monitoring

This architecture provides a robust, scalable, and secure foundation for advanced real estate analysis while maintaining user privacy and data security.


orangecoding · 2025-06-23T16:18:49Z

Dude, that's quite a PR. First and foremost, thanks for the work. I need some time to check it out and go through all of this, which might take two weeks since I'm on a business trip starting at the end of this week, so there might be some delays. I'll check it out as soon as possible.

crystalnet · 2025-06-24T01:46:21Z

Yeah, sorry that it became so big... Take your time to review it and feel free to ask me any questions :)

orangecoding · 2025-06-24T11:00:08Z

hey. I just did a quick check (I need to dig way deeper), just a couple of questions for now.

1. You removed config.json, which breaks Fredy. Can you add it again?
2. You introduced some big new concepts into Fredy. Some of them are extremely powerful (though probably relevant only to a smaller group of people, as I assume the regular user won't use AI), but it would still be beneficial to describe all of this in the Readme.
3. Why is there a random delay, await delay(2000-7000ms);, and such a big one?
4. As for the new settings "Google Maps Api Key" & "Open Ai Key", would it make sense to add a little bit of documentation on how to obtain the keys (for less technical users)?
5. Can we somehow check if the API keys are valid upon saving?
6. As for the custom fields in the jobs, I think it makes sense not to let the user add any fields if there is no OpenAI key.
7. I do not understand why (in the sequence of steps) you send out the notifications first and only enhance the listings afterwards. This way the user will never actually get all the info. If I put the notification last, I can actually see the enhancements, but due to the delay, it takes super long to process. Depending on the number of found listings (could be north of 100 on the first run), this can theoretically take longer than the interval at which Fredy runs again.
8. You have a function called _enhanceListings. It might be good to rename it to _enhanceListingsWithAi.
9. For the enhancement step, you extract all the data again. I think this is overkill; why don't you use the already extracted data instead?

Lastly, if I change the order as mentioned in (7), and use the ai enhancement, I do get tons of these errors:

error: processExpose failed for listing 1c5733e31026f43622146abbf05f4e87b46de0630c4afa0b662b436169dcaeac: No response received from https://www.immonet.de/classified-search?distributionTypes=Buy,Buy_Auction,Compulsory_Auction&estateTypes=House,Apartment&locations=AD08DE2112&order=Default&m=homepage_new_search_classified_search_result {"timestamp":"2025-06-24T10:59:01.028Z"}

orangecoding · 2025-06-24T11:01:44Z

.gitignore

+package-lock.json
+
+# Config files
+config.json


this must be included, otherwise Fredy will break for everybody when cloning freshly

crystalnet added 9 commits June 16, 2025 14:47

added customs fields in UI + db

90e2c06

adding genAI enhanced listing processing and storage

e270a3f

Fix: minor bug fixes and added testing

fa5afbe

Fix other providers updated

9643b3a

adding dashboard page for listings

48ef648

refactoring customFields

4299871

Remove config.json and yarn.lock from repo, keep them locally (now in .gitignore)

cccb891

adding waypoints to jobs

e43716d

adding schema & enforcement for enhancedListingStorage

f95c10a

orangecoding reviewed Jun 24, 2025
