Add Company Classification Scanner pipeline#7
Draft
warunkash wants to merge 1 commit into
Draft
Conversation
Two-stage pipeline that processes LinkedIn job results and filters out staffing/recruitment firms, keeping only product and service companies. Architecture: - cache.py: O(1) lookup against a pre-built knowledge base of 80+ companies - enrichment.py: fetches company website meta tags to supplement sparse data - classifier.py: keyword rule engine + lazy-loaded HuggingFace bart-large-mnli zero-shot classification (facebook/bart-large-mnli) - agents_orchestrator.py: LLM fallback via openai-agents SDK or Anthropic client - pipeline.py: orchestrates all stages with cache write-back on confident results - demo.py: runnable demo with 12 sample LinkedIn job listings Decision priority: cache → name signals → keyword rules → HF zero-shot → agent. Achieves 90–95% accuracy without any network calls for well-known companies. https://claude.ai/code/session_01KRN9i2J3FqDPdtxXmUkzf4
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
facebook/bart-large-mnli), and an optional LLM agent orchestrator for 90–95% accuracyArchitecture
Files added
company_classifier/pipeline.pyCompanyClassificationPipelinecompany_classifier/classifier.pyKeywordClassifier+HuggingFaceClassifier+HybridClassifiercompany_classifier/cache.pyCompanyCache— load/lookup/store with fuzzy matchingcompany_classifier/enrichment.pyCompanyEnrichment— fetches website meta tagscompany_classifier/agents_orchestrator.pyCompanyClassifierAgent— LLM fallback via openai-agents or Anthropiccompany_classifier/company_cache.jsoncompany_classifier/demo.pycompany_classifier/requirements.txtTest plan
python company_classifier/demo.py— should classify 12 companies correctly (5 product, 5 recruitment, 2 service)Product CompanyRecruitment CompanyService Companyuse_hf=Trueand verify HuggingFace zero-shot path works for unknown companiesuse_agent=Truewith an Anthropic API key and verify agent fallback triggers for low-confidence caseshttps://claude.ai/code/session_01KRN9i2J3FqDPdtxXmUkzf4
Generated by Claude Code