A custom search engine that indexes and provides searchable access to all my portfolio content, personal websites, and professional profiles.
- 🔍 Real-time search functionality
- 🎯 TF-IDF-based search ranking
- 🕷️ Automated web crawler for content indexing
- 🎨 Clean, modern UI inspired by popular search engines
- 📱 Responsive design for all devices
- ⚡ Fast and efficient search results
- React 19 with TypeScript
- Vite for build tooling
- React Router for navigation
- Custom CSS with CSS Variables for theming
- FastAPI for the REST API
- PostgreSQL for data storage
- psycopg for database connectivity
- BeautifulSoup4 for HTML parsing in the web crawler
- Node.js (v18 or higher)
- Python 3.8+
- PostgreSQL database
- Create a virtual environment and install the backend dependencies:
```bash
cd server
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate
pip install -r requirements.txt
```
- Set up your environment variables in a `.env` file:
```env
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_db_name
DB_USER=your_db_user
DB_PASSWORD=your_db_password
```
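As a rough sketch of how these variables might be consumed, the snippet below parses simple `KEY=VALUE` lines and assembles a libpq-style connection string of the kind psycopg accepts. The helpers `parse_env`, `quote_conninfo_value`, and `build_conninfo` are hypothetical and not part of this repository; the actual server may load its settings differently (for example, through a dotenv library).

```python
from __future__ import annotations


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def quote_conninfo_value(value: str) -> str:
    """Single-quote a libpq conninfo value if it is empty or has spaces, quotes, or backslashes."""
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    # libpq requires ' and \ inside a value to be backslash-escaped.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(env: dict[str, str]) -> str:
    """Map the DB_* variables onto libpq keywords; host and port fall back to the sample values."""
    params = {
        "host": env.get("DB_HOST", "localhost"),
        "port": env.get("DB_PORT", "5432"),
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in params.items())
```

With the sample `.env` above, `build_conninfo(parse_env(text))` yields `host=localhost port=5432 dbname=your_db_name user=your_db_user password=your_db_password`, a string that `psycopg.connect()` accepts as its connection-info argument.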
- Install the frontend dependencies:
```bash
cd client
npm install
```
- Start the frontend development server:
```bash
npm run dev
```
The search functionality uses TF-IDF (Term Frequency-Inverse Document Frequency) scoring for ranking results. The crawler processes content and stores word frequencies in the database, which are then used to calculate relevance scores during search.
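The sketch below illustrates the general idea on an in-memory corpus: it counts word frequencies per page (roughly what the crawler stores) and scores a query by summing the TF-IDF weights of its terms. The tokenizer, the length-normalized TF, and the smoothed IDF (the same form scikit-learn uses by default) are assumptions for illustration; the project's exact formula is not documented here.

```python
from __future__ import annotations

import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def index_pages(pages: dict[str, str]) -> dict[str, Counter]:
    """Word frequencies per page, analogous to per-page keyword frequency counts."""
    return {url: Counter(tokenize(text)) for url, text in pages.items()}


def tf_idf_search(query: str, index: dict[str, Counter]) -> list[tuple[str, float]]:
    """Rank pages by the sum of TF-IDF weights of the distinct query terms."""
    n_docs = len(index)
    scores: dict[str, float] = {}
    for term in set(tokenize(query)):
        # Document frequency: number of pages containing the term at least once.
        df = sum(1 for counts in index.values() if term in counts)
        if df == 0:
            continue
        # Smoothed IDF, so terms present on every page still contribute.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        for url, counts in index.items():
            if term in counts:
                tf = counts[term] / sum(counts.values())  # term frequency / page length
                scores[url] = scores.get(url, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Dividing by a page's total word count keeps long pages from winning simply because they contain more words; raw counts or log-scaled counts are equally common TF variants.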
The database uses three tables:
- `pages`: Stores crawled web pages
- `keywords`: Stores unique keywords
- `keyword_pages`: Maps keywords to pages with frequency counts
The crawler automatically indexes content from specified start URLs, following links to related pages while respecting robots.txt rules and implementing polite crawling practices.
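The crawler's code is not shown here, so the following is only a sketch of a breadth-first crawler with the behavior described: it checks robots.txt through the standard library's `urllib.robotparser`, stays on the start URLs' hosts, and pauses between requests (honoring `Crawl-delay` when one is set). It uses `html.parser` in place of BeautifulSoup4 so it runs on the standard library alone, and it takes a `fetch` callable so it can be exercised without network access. `USER_AGENT`, `crawl`, `extract_links`, and `robots_from_text` are hypothetical names.

```python
from __future__ import annotations

import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "PortfolioSearchBot"  # hypothetical crawler name


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags (stand-in for BeautifulSoup4)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> list[str]:
    """Resolve links against base_url and drop #fragments."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def robots_from_text(text: str) -> RobotFileParser:
    """Build robots.txt rules from already-downloaded text."""
    rules = RobotFileParser()
    rules.parse(text.splitlines())
    return rules


def crawl(
    start_urls: list[str],
    fetch: Callable[[str], str],
    robots: dict[str, RobotFileParser],
    max_pages: int = 50,
    delay: float = 1.0,
) -> dict[str, str]:
    """Breadth-first crawl limited to the start hosts that obeys robots.txt rules."""
    allowed_hosts = {urlparse(url).netloc for url in start_urls}
    queue = deque(start_urls)
    seen = set(start_urls)
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        rules = robots.get(urlparse(url).netloc)
        if rules is not None and not rules.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt, so never fetched
        html = fetch(url)
        pages[url] = html
        for link in extract_links(url, html):
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
        # Politeness: wait between requests, longer if robots.txt asks for it.
        wait = delay
        if rules is not None:
            wait = max(delay, rules.crawl_delay(USER_AGENT) or 0)
        time.sleep(wait)
    return pages
```

For a live crawl, a host's rules could be loaded with `RobotFileParser(f"https://{host}/robots.txt")` followed by `.read()`, and `fetch` could wrap `urllib.request.urlopen`.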
- `GET /search?q={query}`: Search for pages matching the query
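A minimal client sketch for this endpoint follows. The base URL and port are assumptions (8000 is uvicorn's default, but the server's actual address is not stated here), and because the JSON response shape is not documented, `search` simply returns the decoded body. `API_BASE`, `search_url`, and `search` are hypothetical names.

```python
from __future__ import annotations

import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://localhost:8000"  # assumed address; uvicorn's default port


def search_url(query: str, base: str = API_BASE) -> str:
    """Build GET /search?q=... with the query percent-encoded (spaces become '+')."""
    return f"{base.rstrip('/')}/search?{urlencode({'q': query})}"


def search(query: str, base: str = API_BASE) -> Any:
    """Call the endpoint and decode the JSON body; its structure is not documented."""
    with urlopen(search_url(query, base), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Encoding the query with `urlencode` matters for inputs such as `c++ & rust`, where an unescaped `&` or `+` would otherwise be read as a parameter separator or a space.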
Feel free to submit issues and enhancement requests!
- LinkedIn: linkedin.com/in/rhamzthev
- GitHub: github.com/rhamzthev