A Ruby command-line application for searching and analyzing client data from JSON datasets. This project demonstrates clean code architecture, comprehensive testing, and modern Ruby development practices with a professional gem packaging approach.
- Name Search: Search through all clients and return those with names matching a given query (case-insensitive, supports regex patterns)
  - TTY output uses color to highlight the matched text
- Duplicate Email Detection: Find clients with duplicate email addresses in the dataset
- Rating Filter: Filter clients by minimum rating threshold
- Dataset Generation: Generate realistic test datasets with customizable size, guaranteed duplicates, and optional rating/feedback data
- Multiple Output Formats: Support for TTY, CSV, JSON, XML, and YAML output formats
  - All formats support the optional `result` fields (rating and feedback comments)
- Flexible Dataset Support: Specify custom dataset files via command-line options
- Robust Error Handling: Validates file existence and JSON format before processing
- Graceful Data Handling: Safely processes datasets with missing or invalid fields
- Gem Distribution: Packaged as a proper Ruby gem for easy installation and distribution
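As a rough illustration of the name search described above, the sketch below matches a query as a case-insensitive regular expression and wraps matches in ANSI color codes. The helper names `search_by_name` and `highlight` and the chosen color are assumptions for illustration only; the project's actual implementation is not shown in this README and may differ.

```ruby
# Hypothetical helpers; not the project's actual API.
# ANSI escape codes for bold yellow text and for resetting the style.
HIGHLIGHT_ON  = "\e[1;33m"
HIGHLIGHT_OFF = "\e[0m"

# Returns the clients whose "full_name" matches the query, treated as a
# case-insensitive regular expression. Records without a string name are
# skipped instead of raising.
def search_by_name(clients, query)
  pattern = Regexp.new(query, Regexp::IGNORECASE)
  clients.select do |client|
    name = client["full_name"]
    name.is_a?(String) && name.match?(pattern)
  end
end

# Wraps every match in ANSI color codes for terminal display.
def highlight(name, query)
  pattern = Regexp.new(query, Regexp::IGNORECASE)
  name.gsub(pattern) { |match| "#{HIGHLIGHT_ON}#{match}#{HIGHLIGHT_OFF}" }
end

clients = [
  { "id" => 1, "full_name" => "John Doe" },
  { "id" => 2, "full_name" => "Jane Smith" },
  { "id" => 3, "full_name" => nil } # missing name: ignored by the search
]

search_by_name(clients, "^jo").each do |client|
  puts highlight(client["full_name"], "^jo")
end
```

Because the query is compiled as a regular expression, an invalid pattern such as `"("` would raise `RegexpError` in this sketch; a real CLI would probably want to rescue that and report it to the user.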
Download the latest .gem file from Releases and install locally:
gem install challenge-1.5.gem

Alternatively, clone the repository and install the dependencies:
git clone https://github.com/tob1k/challenge.git
cd challenge
bundle install

# 1. Generate test data
challenge generate -f data.json --size 1000
# 2. Search for clients
challenge search "John" -f data.json
# 3. Find duplicate emails
challenge duplicates -f data.json

| Command | Alias | Description | Example |
|---|---|---|---|
| `generate` | `g` | Generate test dataset | `challenge generate -f data.json --size 1000` |
| `search` | `s` | Find clients by name, using regex | `challenge search "John" -f data.json` |
| `duplicates` | `d` | Find duplicate emails | `challenge duplicates -f data.json` |
| `filter_by_rating` | - | Filter clients by minimum rating | `challenge filter_by_rating 4.0 -f data.json` |
| `version` | - | Show version number | `challenge version` |

Add --output FORMAT to any command:
challenge search "John" -f data.json --output json
challenge duplicates -f data.json --output csv

Available formats: tty (default), csv, json, xml, yaml
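
One plausible way to drive format selection from a single lookup table is sketched below. This README mentions a `FORMATTERS` constant, but its real shape is not shown, so the hash of lambdas, the `render` helper, and the `DEFAULT_FORMAT` constant are all assumptions. Only three formats are included to keep the sketch dependent on the `json` and `yaml` libraries alone.

```ruby
require "json"
require "yaml"

# Hypothetical registry mapping a format name to a callable that turns an
# array of client hashes into a string. The project's real FORMATTERS
# constant may be structured differently.
FORMATTERS = {
  "tty"  => ->(clients) { clients.map { |c| "#{c['full_name']} <#{c['email']}>" }.join("\n") },
  "json" => ->(clients) { JSON.pretty_generate(clients) },
  "yaml" => ->(clients) { clients.to_yaml }
}.freeze

DEFAULT_FORMAT = "tty"

# Looks up the formatter by name and applies it; unknown names raise an
# ArgumentError that lists the available formats.
def render(clients, format = DEFAULT_FORMAT)
  formatter = FORMATTERS.fetch(format) do
    raise ArgumentError, "Unknown format '#{format}'. Available: #{FORMATTERS.keys.join(', ')}"
  end
  formatter.call(clients)
end

clients = [{ "id" => 1, "full_name" => "John Doe", "email" => "john.doe@gmail.com" }]
puts render(clients)         # default TTY-style line
puts render(clients, "json") # pretty-printed JSON
```

With this layout, adding a format means adding one entry to the hash; the lookup code and the list of valid option values both follow from the same constant.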
# Regex search patterns
challenge search "^John" -f data.json # Names starting with "John"
challenge search "Miller$" -f data.json # Names ending with "Miller"
challenge search "J.*n" -f data.json # Names containing "J" followed somewhere later by "n"
# Filter by rating
challenge filter_by_rating 3.5 -f data.json # Clients with rating >= 3.5
challenge filter_by_rating 4.0 -f data.json --output json
# Short aliases
challenge s "John" -f data.json # search
challenge d -f data.json # duplicates
challenge g -f data.json --size 500 # generate
# Large datasets
challenge generate -f big_data.json --size 50000 --force

challenge help
challenge --version

The application expects JSON files containing an array of client objects with the following structure:
[
{
"id": 1,
"full_name": "John Doe",
"email": "john.doe@gmail.com",
"result": {
"rating": 4.5,
"feedback": [
{
"comment": "Great job on the project!",
"date": "2023-10-01"
},
{
"comment": "Excellent communication skills.",
"date": "2023-10-15"
}
]
}
},
{
"id": 2,
"full_name": "Jane Smith",
"email": "jane.smith@yahoo.com",
"result": {
"rating": 3.8,
"feedback": []
}
}
]

Required fields:
- `id`: Unique identifier
- `full_name`: Client's full name
- `email`: Client's email address

Optional fields:
- `result`: Object containing performance data (used by the `filter_by_rating` command)
  - `rating`: Numeric rating value
  - `feedback`: Array of feedback objects with `comment` and `date` fields (`date` is optional)

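
Since `result` and `rating` are optional, a rating filter has to tolerate records that lack them. The following is a minimal sketch of one way to do that, using the field names from the structure shown above; the method name and the decision to silently skip records with a missing or non-numeric rating are assumptions, not a description of the project's code.

```ruby
# Hypothetical helper: keeps clients whose rating is a number greater than
# or equal to the minimum. Records with no "result" object, no "rating",
# or a non-numeric rating are excluded rather than raising an error.
def filter_by_rating(clients, minimum)
  clients.select do |client|
    result = client["result"]
    rating = result.is_a?(Hash) ? result["rating"] : nil
    rating.is_a?(Numeric) && rating >= minimum
  end
end

clients = [
  { "id" => 1, "full_name" => "John Doe",   "result" => { "rating" => 4.5, "feedback" => [] } },
  { "id" => 2, "full_name" => "Jane Smith", "result" => { "rating" => 3.8, "feedback" => [] } },
  { "id" => 3, "full_name" => "No Result" },
  { "id" => 4, "full_name" => "Bad Rating", "result" => { "rating" => "high" } }
]

filter_by_rating(clients, 4.0).each { |c| puts c["full_name"] }
```

The explicit `is_a?(Hash)` check is used instead of `Hash#dig` because `dig` raises a `TypeError` when an intermediate value is, for example, a string.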
Run the test suite using RSpec:
# Run all tests
bundle exec rspec
# Run with verbose output
bundle exec rspec --format documentation
# Run specific test file
bundle exec rspec spec/challenge/dataset_spec.rb

The test suite includes:
- Happy path scenarios: Valid searches, duplicate detection
- Edge cases: Empty queries, whitespace handling, empty datasets
- Error scenarios: Missing files, invalid JSON, malformed data
- Multiple duplicate scenarios: Complex email duplication patterns
.
├── .github/
│   └── workflows/
│       ├── test.yml                 # RSpec tests pipeline
│       ├── rubocop.yml              # RuboCop linting pipeline
│       └── release.yml              # Automated gem publishing
├── bin/
│   └── challenge                    # Executable script
├── example/
│   └── clients.json                 # Sample dataset
├── lib/
│   ├── challenge.rb                 # Main module loader
│   ├── challenge/
│   │   ├── cli.rb                   # Thor CLI interface
│   │   ├── dataset.rb               # Core dataset operations
│   │   ├── dataset_generator.rb     # Test dataset generation
│   │   ├── version.rb               # Version constant
│   │   └── formatters/              # Output formatters
│   │       ├── tty_formatter.rb     # Terminal output
│   │       ├── csv_formatter.rb     # CSV output
│   │       ├── json_formatter.rb    # JSON output
│   │       ├── xml_formatter.rb     # XML output
│   │       └── yaml_formatter.rb    # YAML output
├── spec/
│   ├── spec_helper.rb               # RSpec configuration
│   └── challenge/
│       ├── dataset_spec.rb          # Dataset class tests
│       └── cli_spec.rb              # CLI integration tests
├── Gemfile                          # Dependencies
├── challenge.gemspec                # Gem specification
├── CHANGELOG.md                     # Release history and changes
├── .rubocop.yml                     # Code style configuration
└── README.md                        # This file

- Thor: Chosen for its robust CLI framework with built-in help, option parsing, and command structure
- Modular Design: Separate CLI from business logic for better testability and maintainability
- Gem Packaging: Professional gem distribution with proper gemspec and executable
- Formatter Pattern: Clean separation of output logic from business logic
- Multiple Formats: Support for TTY, CSV, JSON, XML, and YAML outputs
- DRY Configuration: Format options driven by a single FORMATTERS constant
- Extensible Design: Easy to add new output formats without modifying core logic
- Colored Output: Syntax highlighting for search results in TTY format
- Automated Testing: Multi-version Ruby testing (3.1-3.4) on every push and PR
- Code Quality: RuboCop linting with zero violations tolerance
- Automated Releases: Tag-triggered publishing to GitHub Packages
- Reusable Workflows: DRY principle applied to GitHub Actions workflows
- Zero-Config Publishing: No external secrets or manual configuration required
- Dataset Class: Encapsulates all dataset operations with clear separation of concerns
- Eager Loading: Loads entire dataset into memory for fast repeated operations
- Validation: File existence and JSON format validation at initialization
- Early Validation: Dataset file validation occurs at object creation
- Specific Errors: Clear error messages for different failure scenarios
- Graceful Degradation: Empty results rather than crashes for edge cases
- Memory Usage: Current implementation loads entire dataset into memory, which may not scale for very large files
- Search Functionality: Only supports name-based searching; field selection is not dynamic
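- Case Sensitivity: Email comparison is case-sensitive, so addresses that differ only in letter case are not reported as duplicates (RFC 5321 allows case-sensitive local parts, though domain names are case-insensitive)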
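
The eager-loading, early-validation, and case-sensitive comparison points above can be sketched together in a small class. This is only an illustration under stated assumptions: `SketchDataset`, its nested `Error` class, and the error messages are hypothetical names, and the real `Challenge::Dataset` class may be organized differently.

```ruby
require "json"
require "tempfile"

# Hypothetical stand-in for a dataset class; deliberately not named
# Challenge::Dataset because the project's real class is not shown here.
class SketchDataset
  class Error < StandardError; end

  attr_reader :clients

  # Validates the file and loads every record into memory up front.
  def initialize(path)
    raise Error, "Dataset file not found: #{path}" unless File.file?(path)

    data = JSON.parse(File.read(path))
    raise Error, "Dataset must be a JSON array: #{path}" unless data.is_a?(Array)

    # Keep only well-formed records (JSON objects).
    @clients = data.select { |client| client.is_a?(Hash) }
  rescue JSON::ParserError => e
    raise Error, "Invalid JSON in #{path}: #{e.message}"
  end

  # Groups clients by exact (case-sensitive) email and keeps only the
  # groups with more than one member.
  def duplicate_emails
    clients
      .select { |client| client["email"].is_a?(String) }
      .group_by { |client| client["email"] }
      .select { |_email, group| group.size > 1 }
  end
end

Tempfile.create(["clients", ".json"]) do |file|
  file.write(JSON.generate([
    { "id" => 1, "email" => "a@example.com" },
    { "id" => 2, "email" => "A@example.com" }, # differs only in case
    { "id" => 3, "email" => "a@example.com" }
  ]))
  file.flush

  dataset = SketchDataset.new(file.path)
  p dataset.duplicate_emails.keys # => ["a@example.com"]
end
```

Grouping on `client["email"].downcase` instead would make the comparison case-insensitive, at the cost of treating technically distinct local parts as the same address.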
Given more time, the following enhancements would be prioritized:
- Streaming JSON Parser: Use streaming parser for large datasets to reduce memory footprint
- Database Backend: Add optional database storage for better performance with large datasets
- Configuration System: External configuration files for default settings
- Dynamic Field Search: Allow users to specify which field to search (name, email, id, etc.)
- Advanced Search: Multiple field search and complex queries (regex patterns already supported)
- REST API: Web service interface for remote access
- Caching Layer: Cache search results for improved performance
- Pagination: Support for paginated results in large datasets
- Indexing: Add search indexing for faster query performance
- Concurrent Processing: Parallel processing for large dataset operations
- Cloud Storage: Support for datasets stored in cloud storage (S3, etc.)
- Interactive Mode: REPL-style interface for multiple queries
- Search Suggestions: Auto-complete and suggestion features
- Progress Indicators: Progress bars for long-running operations
- Add a method to the `Challenge::CLI` class in `lib/challenge/cli.rb`
- Add the corresponding functionality to the `Challenge::Dataset` class
- Write comprehensive tests in `spec/challenge/`

# Load the application in IRB for testing
bundle exec irb -r ./lib/challenge
# Run linting
bundle exec rubocop
# Run both tests and linting (CI simulation)
bundle exec rspec && bundle exec rubocop

The project includes automated release workflows with a full CI/CD pipeline:
- Create a release tag:

      # Update version in lib/challenge/version.rb first
      git add lib/challenge/version.rb
      git commit -m "Bump version to 1.5"
      git tag v1.5
      git push origin main
      git push origin v1.5

- Automated CI/CD process:
  - ✅ Quality Gates: Runs RSpec tests across Ruby versions 3.1-3.4 and RuboCop linting
  - ✅ Build: Compiles the gem from source
  - ✅ Publish: Publishes to GitHub Packages registry
  - ✅ Release: Creates GitHub release with changelog and gem attachment
  - ✅ Zero-config: Uses built-in `GITHUB_TOKEN` with appropriate permissions

Release Features:
- 🔄 Reusable workflows: Leverages existing test and lint workflows to avoid duplication
- 🛡️ Quality assurance: Only publishes if all tests and linting pass
- 📦 Multiple distribution: Available via GitHub Packages and direct download
- 🏷️ Semantic versioning: Tag-based releases with automatic version detection
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is for demonstration purposes.