Client Data Analysis CLI

A Ruby command-line application for searching and analyzing client data in JSON datasets. The project is packaged as a Ruby gem and demonstrates a modular architecture, an RSpec test suite, and an automated CI/CD pipeline.

Features

Name Search: Search through all clients and return those with names matching a given query (case-insensitive, supports regex patterns)
- TTY output highlights the matched text in color
Duplicate Email Detection: Find clients with duplicate email addresses in the dataset
Rating Filter: Filter clients by minimum rating threshold
Dataset Generation: Generate realistic test datasets with customizable size, guaranteed duplicates, and optional rating/feedback data
Multiple Output Formats: Support for TTY, CSV, JSON, XML, and YAML output formats
- All formats support optional result fields (rating and feedback comments)
Flexible Dataset Support: Specify custom dataset files via command-line options
Robust Error Handling: Validates file existence and JSON format before processing
Graceful Data Handling: Safely processes datasets with missing or invalid fields
Gem Distribution: Packaged as a proper Ruby gem for easy installation and distribution

Installation

Quick Install (Recommended)

Download the latest .gem file from Releases and install locally:

gem install challenge-1.5.gem

For Development

git clone https://github.com/tob1k/challenge.git
cd challenge
bundle install

Usage

Quick Start

# 1. Generate test data
challenge generate -f data.json --size 1000

# 2. Search for clients
challenge search "John" -f data.json

# 3. Find duplicate emails
challenge duplicates -f data.json

Commands

Command	Alias	Description	Example
`generate`	`g`	Generate test dataset	`challenge generate -f data.json --size 1000`
`search`	`s`	Find clients by name, using regex	`challenge search "John" -f data.json`
`duplicates`	`d`	Find duplicate emails	`challenge duplicates -f data.json`
`filter_by_rating`	-	Filter clients by minimum rating	`challenge filter_by_rating 4.0 -f data.json`
`version`	-	Show version number	`challenge version`

Output Formats

Add --output FORMAT to any command:

challenge search "John" -f data.json --output json
challenge duplicates -f data.json --output csv

Available formats: tty (default), csv, json, xml, yaml

Advanced Usage

# Regex search patterns
challenge search "^John" -f data.json          # Names starting with "John"
challenge search "Miller$" -f data.json        # Names ending with "Miller"
challenge search "J.*n" -f data.json           # Names containing "J" followed later by "n"

# Filter by rating
challenge filter_by_rating 3.5 -f data.json    # Clients with rating >= 3.5
challenge filter_by_rating 4.0 -f data.json --output json

# Short aliases
challenge s "John" -f data.json                # search
challenge d -f data.json                       # duplicates
challenge g -f data.json --size 500            # generate

# Large datasets
challenge generate -f big_data.json --size 50000 --force
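
The search patterns above behave like ordinary Ruby regular expressions matched case-insensitively. The following is a minimal sketch of that behaviour, not the gem's actual implementation: `search_by_name` and the sample records are hypothetical, and only the `full_name` field name comes from the dataset format.

```ruby
# Hypothetical helper (not the gem's API): case-insensitive regex search
# over the full_name field. A missing name is treated as an empty string.
def search_by_name(clients, query)
  pattern = Regexp.new(query, Regexp::IGNORECASE)
  clients.select { |client| client['full_name'].to_s.match?(pattern) }
end

clients = [
  { 'id' => 1, 'full_name' => 'John Doe' },
  { 'id' => 2, 'full_name' => 'Jane Miller' },
  { 'id' => 3, 'full_name' => 'Alex Johnson' },
  { 'id' => 4 } # no full_name at all
]

search_by_name(clients, 'John').map { |c| c['id'] }    # => [1, 3]
search_by_name(clients, '^John').map { |c| c['id'] }   # => [1]
search_by_name(clients, 'Miller$').map { |c| c['id'] } # => [2]
search_by_name(clients, 'j.*n').map { |c| c['id'] }    # => [1, 2, 3]
```

Note that an unanchored pattern matches anywhere in the name, so add `^` and `$` when the whole name must fit the pattern.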

Help

challenge help
challenge --version

Dataset Format

The application expects JSON files containing an array of client objects with the following structure:

[
  {
    "id": 1,
    "full_name": "John Doe",
    "email": "john.doe@gmail.com",
    "result": {
      "rating": 4.5,
      "feedback": [
        {
          "comment": "Great job on the project!",
          "date": "2023-10-01"
        },
        {
          "comment": "Excellent communication skills.",
          "date": "2023-10-15"
        }
      ]
    }
  },
  {
    "id": 2,
    "full_name": "Jane Smith",
    "email": "jane.smith@yahoo.com",
    "result": {
      "rating": 3.8,
      "feedback": []
    }
  }
]

Required fields:

id: Unique identifier
full_name: Client's full name
email: Client's email address

Optional fields:

result: Object containing performance data (used by filter_by_rating command)
- rating: Numeric rating value
- feedback: Array of feedback objects with comment and date fields (date is optional)
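
Because `result` is optional, any rating filter has to tolerate records that lack it. The sketch below shows one way to do that with the structure described above. `filter_by_min_rating` is a hypothetical helper, and skipping records without a numeric rating is an assumption about the behaviour, not something the gem is confirmed to do.

```ruby
require 'json'

# Hypothetical helper: keep clients whose result.rating is numeric and >= minimum.
# Assumption: records without a usable rating are skipped instead of raising.
def filter_by_min_rating(clients, minimum)
  clients.select do |client|
    result = client['result']
    rating = result.is_a?(Hash) ? result['rating'] : nil
    rating.is_a?(Numeric) && rating >= minimum
  end
end

clients = JSON.parse(<<~DATA)
  [
    { "id": 1, "full_name": "John Doe", "email": "john.doe@gmail.com",
      "result": { "rating": 4.5, "feedback": [] } },
    { "id": 2, "full_name": "Jane Smith", "email": "jane.smith@yahoo.com",
      "result": { "rating": 3.8, "feedback": [] } },
    { "id": 3, "full_name": "No Result", "email": "none@example.com" },
    { "id": 4, "full_name": "Bad Rating", "email": "bad@example.com",
      "result": { "rating": "n/a" } }
  ]
DATA

filter_by_min_rating(clients, 4.0).map { |c| c['id'] } # => [1]
filter_by_min_rating(clients, 3.5).map { |c| c['id'] } # => [1, 2]
```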

Testing

Run the test suite using RSpec:

# Run all tests
bundle exec rspec

# Run with verbose output
bundle exec rspec --format documentation

# Run specific test file
bundle exec rspec spec/challenge/dataset_spec.rb

Test Coverage

The test suite includes:

Happy path scenarios: Valid searches, duplicate detection
Edge cases: Empty queries, whitespace handling, empty datasets
Error scenarios: Missing files, invalid JSON, malformed data
Multiple duplicate scenarios: Complex email duplication patterns

Project Structure

.
├── .github/
│   └── workflows/
│       ├── test.yml          # RSpec tests pipeline
│       ├── rubocop.yml       # RuboCop linting pipeline
│       └── release.yml       # Automated gem publishing
├── bin/
│   └── challenge             # Executable script
├── example/
│   └── clients.json          # Sample dataset
├── lib/
│   ├── challenge.rb          # Main module loader
│   ├── challenge/
│   │   ├── cli.rb            # Thor CLI interface
│   │   ├── dataset.rb        # Core dataset operations
│   │   ├── dataset_generator.rb # Test dataset generation
│   │   ├── version.rb        # Version constant
│   │   └── formatters/       # Output formatters
│   │       ├── tty_formatter.rb    # Terminal output
│   │       ├── csv_formatter.rb    # CSV output
│   │       ├── json_formatter.rb   # JSON output
│   │       ├── xml_formatter.rb    # XML output
│   │       └── yaml_formatter.rb   # YAML output
├── spec/
│   ├── spec_helper.rb        # RSpec configuration
│   └── challenge/
│       ├── dataset_spec.rb   # Dataset class tests
│       └── cli_spec.rb       # CLI integration tests
├── Gemfile                   # Dependencies
├── challenge.gemspec         # Gem specification
├── CHANGELOG.md              # Release history and changes
├── .rubocop.yml              # Code style configuration
└── README.md                 # This file

Architecture Decisions

Command-Line Interface

Thor: Chosen as the CLI framework for its built-in help, option parsing, and command structure
Modular Design: Separate CLI from business logic for better testability and maintainability
Gem Packaging: Professional gem distribution with proper gemspec and executable

Output Formatting

Formatter Pattern: Clean separation of output logic from business logic
Multiple Formats: Support for TTY, CSV, JSON, XML, and YAML outputs
DRY Configuration: Format options driven by a single FORMATTERS constant
Extensible Design: Easy to add new output formats without modifying core logic
Colored Output: Matched text is highlighted in color in TTY search results
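
The formatter pattern described above can be sketched as a registry that maps format names to formatter objects. This is an illustration only: the `Output` module, the formatter class names, and the `render` helper are hypothetical, and just three formats are shown to keep the example short. Only the idea of a single `FORMATTERS` constant comes from this README.

```ruby
require 'json'
require 'yaml'

module Output
  # Hypothetical formatters: each responds to .call(clients) and returns a String.
  class TtyFormatter
    def self.call(clients)
      clients.map { |c| "#{c['id']}: #{c['full_name']} <#{c['email']}>" }.join("\n")
    end
  end

  class JsonFormatter
    def self.call(clients)
      JSON.pretty_generate(clients)
    end
  end

  class YamlFormatter
    def self.call(clients)
      clients.to_yaml
    end
  end

  # Single registry: supporting a new format means adding one entry here.
  FORMATTERS = {
    'tty' => TtyFormatter,
    'json' => JsonFormatter,
    'yaml' => YamlFormatter
  }.freeze
end

# Look up the formatter by name and fail with a clear message if it is unknown.
def render(clients, format = 'tty')
  formatter = Output::FORMATTERS.fetch(format) do
    raise ArgumentError, "Unknown format '#{format}'. Available: #{Output::FORMATTERS.keys.join(', ')}"
  end
  formatter.call(clients)
end

clients = [{ 'id' => 1, 'full_name' => 'John Doe', 'email' => 'john.doe@gmail.com' }]
puts render(clients, 'tty') # prints: 1: John Doe <john.doe@gmail.com>
puts render(clients, 'json')
```

With this shape, the list of valid `--output` values can be derived from `FORMATTERS.keys`, so the option help and the dispatch logic cannot drift apart.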

CI/CD Pipeline

Automated Testing: Multi-version Ruby testing (3.1-3.4) on every push and PR
Code Quality: RuboCop linting with zero tolerance for violations
Automated Releases: Tag-triggered publishing to GitHub Packages
Reusable Workflows: DRY principle applied to GitHub Actions workflows
Zero-Config Publishing: No external secrets or manual configuration required

Data Processing

Dataset Class: Encapsulates all dataset operations with clear separation of concerns
Eager Loading: Loads entire dataset into memory for fast repeated operations
Validation: File existence and JSON format validation at initialization

Error Handling

Early Validation: Dataset file validation occurs at object creation
Specific Errors: Clear error messages for different failure scenarios
Graceful Degradation: Empty results rather than crashes for edge cases
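
A minimal sketch of the early-validation approach described above, assuming hypothetical names throughout: `SketchDataset`, `DatasetNotFoundError`, and `InvalidDatasetError` are illustrative and may not match the classes in `lib/challenge/dataset.rb`. Dropping non-object entries is likewise an assumption about what graceful degradation could look like.

```ruby
require 'json'
require 'tempfile'

# Hypothetical error classes; the gem's real class names may differ.
class DatasetNotFoundError < StandardError; end
class InvalidDatasetError < StandardError; end

# Sketch of a dataset that validates its file as soon as the object is created.
class SketchDataset
  attr_reader :clients

  def initialize(path)
    raise DatasetNotFoundError, "Dataset file not found: #{path}" unless File.file?(path)

    data = JSON.parse(File.read(path))
    raise InvalidDatasetError, "Expected a JSON array in #{path}" unless data.is_a?(Array)

    # Assumption: entries that are not JSON objects are dropped, not fatal.
    @clients = data.select { |entry| entry.is_a?(Hash) }
  rescue JSON::ParserError => e
    raise InvalidDatasetError, "Invalid JSON in #{path}: #{e.message}"
  end
end

# A valid file loads; the stray 42 is ignored.
Tempfile.create(['clients', '.json']) do |file|
  file.write('[{"id": 1, "full_name": "John Doe", "email": "john.doe@gmail.com"}, 42]')
  file.flush
  puts SketchDataset.new(file.path).clients.length # prints 1
end

# A missing file fails immediately with a specific error.
begin
  SketchDataset.new('no_such_dataset.json')
rescue DatasetNotFoundError => e
  warn e.message
end
```

Separate error classes let the CLI layer print a targeted message for each failure without inspecting message strings.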

Known Limitations

Memory Usage: Current implementation loads entire dataset into memory, which may not scale for very large files
Search Functionality: Only supports name-based searching; field selection is not dynamic
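Case Sensitivity: Email comparison is case-sensitive, so addresses that differ only in letter case are not reported as duplicates. RFC 5321 allows the local part to be case-sensitive, but domain names are case-insensitive.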
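
To make the case-sensitivity limitation concrete, here is a small sketch of duplicate detection by grouping on the `email` field. `duplicate_emails` is a hypothetical helper, and the `normalize:` option is purely illustrative; it is not a feature of the gem.

```ruby
# Hypothetical helper: group clients by email and keep groups with 2+ members.
# normalize: true lowercases addresses first (illustrative option only).
def duplicate_emails(clients, normalize: false)
  groups = clients.group_by do |client|
    email = client['email'].to_s
    normalize ? email.downcase : email
  end
  # Drop records without an email and addresses that occur only once.
  groups.reject { |email, members| email.empty? || members.size < 2 }
end

clients = [
  { 'id' => 1, 'email' => 'john.doe@gmail.com' },
  { 'id' => 2, 'email' => 'john.doe@gmail.com' },
  { 'id' => 3, 'email' => 'Jane.Smith@Yahoo.com' },
  { 'id' => 4, 'email' => 'jane.smith@yahoo.com' },
  { 'id' => 5 } # no email: ignored
]

duplicate_emails(clients).keys
# => ["john.doe@gmail.com"]
duplicate_emails(clients, normalize: true).keys
# => ["john.doe@gmail.com", "jane.smith@yahoo.com"]
```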

Future Improvements

Given more time, the following enhancements would be prioritized:

Architecture Enhancements

Streaming JSON Parser: Use streaming parser for large datasets to reduce memory footprint
Database Backend: Add optional database storage for better performance with large datasets
Configuration System: External configuration files for default settings

Feature Extensions

Dynamic Field Search: Allow users to specify which field to search (name, email, id, etc.)
Advanced Search: Multiple field search and complex queries (regex patterns already supported)
REST API: Web service interface for remote access
Caching Layer: Cache search results for improved performance

Scalability Considerations

Pagination: Support for paginated results in large datasets
Indexing: Add search indexing for faster query performance
Concurrent Processing: Parallel processing for large dataset operations
Cloud Storage: Support for datasets stored in cloud storage (S3, etc.)

User Experience

Interactive Mode: REPL-style interface for multiple queries
Search Suggestions: Auto-complete and suggestion features
Progress Indicators: Progress bars for long-running operations

Development

Adding New Commands

Add method to Challenge::CLI class in lib/challenge/cli.rb
Add corresponding functionality to Challenge::Dataset class
Write comprehensive tests in spec/challenge/

Running Development Commands

# Load the application in IRB for testing
bundle exec irb -r ./lib/challenge

# Run linting
bundle exec rubocop

# Run both tests and linting (CI simulation)
bundle exec rspec && bundle exec rubocop

Releasing

The project includes automated release workflows with a full CI/CD pipeline:

Create a release tag:

# Update version in lib/challenge/version.rb first
git add lib/challenge/version.rb
git commit -m "Bump version to 1.5"
git tag v1.5
git push origin main
git push origin v1.5

Automated CI/CD process:
- ✅ Quality Gates: Runs RSpec tests across Ruby versions 3.1-3.4 and RuboCop linting
- ✅ Build: Builds the gem package from source
- ✅ Publish: Publishes to GitHub Packages registry
- ✅ Release: Creates GitHub release with changelog and gem attachment
- ✅ Zero-config: Uses built-in GITHUB_TOKEN with appropriate permissions

Release Features:

🔄 Reusable workflows: Leverages existing test and lint workflows to avoid duplication
🛡️ Quality assurance: Only publishes if all tests and linting pass
📦 Multiple distribution: Available via GitHub Packages and direct download
🏷️ Semantic versioning: Tag-based releases with automatic version detection

Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

This project is for demonstration purposes.
