Tools to construct and process webgraphs from Common Crawl data
-
Updated
Oct 22, 2024 - Java
Tools to construct and process webgraphs from Common Crawl data
Simple Java PageRank implementation.
An implementation of the PageRank algorithm used by Google to rank web pages in search engine results.
Link ranking with Apache Giraph for Apache Nutch
💻🔎Search engine implemented with Java including: web crawling, indexing and ranking and the interaction between them.
Search heuristic used by Google coded in Java
Java application that ranks the importance of subreddit pages based off of link analysis
Cloud Computing programming assignments at UARK
Final project for the Cloud Computing course at University of Pisa.
Use PageRank algorithm and InversePageRank to get the PageRank value and InversePageRank value of each website, and sort them from largest to smallest. Then select the number of normal websites and spam websites in the first N websites, and display them visually
Implementation of the MapReduce PageRank algorithm using the Hadoop framework in Java (developed for Cloud Computing course)
Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)
Mock search engine created for COMP 250 final in my first semester at McGill. Creates a directed graph of the "internet" with pages as vertices and links between pages as edges. Implements useful functions that assign ranks to pages, index information in those pages, and therefore enable ranked keyword search. Also includes both an implementatio…
Mechanism for a search engine backed by inverted indexes (from using Lucene and Hadoop). Word embeddings and Snippet generation were used.
Command line tool to compute PageRank scores over RDF graphs
Add a description, image, and links to the pagerank topic page so that developers can more easily learn about it.
To associate your repository with the pagerank topic, visit your repo's landing page and select "manage topics."