🗻 Big Data 📈 Stream Analytics

Data Availability, Data Accuracy, Data Quality

Data profiling examines the data that is available. It provides statistics such as:

  1. Column stats: type, unique values, missing values
  2. Potential keys and foreign keys
  3. Data quality at column level: missing values, distinct values, ...
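The statistics listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference to any particular profiling tool: the function name `profile_columns`, the sample `rows`, and the rule used to flag a potential key (no missing values and no repeated values) are all assumptions made for the example. Foreign-key detection is left out.

```python
def profile_columns(rows):
    """Return simple per-column statistics for a list of dict records.

    Hypothetical helper for illustration; values are assumed to be hashable.
    """
    columns = sorted({name for row in rows for name in row})
    profile = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        # Treat None and the empty string as missing values.
        present = [v for v in values if v is not None and v != ""]
        distinct = set(present)
        profile[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(distinct),
            # Assumed rule: a column with no gaps and no repeats may be a key.
            "candidate_key": len(present) == len(values)
            and len(distinct) == len(values),
        }
    return profile


# Made-up sample records, used only to exercise the function.
rows = [
    {"id": 1, "city": "Oslo", "age": 31},
    {"id": 2, "city": "Oslo", "age": None},
    {"id": 3, "city": "Lima", "age": 45},
]
report = profile_columns(rows)
for column, stats in report.items():
    print(column, stats)
```

On real datasets a dataframe library would normally do this work; the point here is only to show what each statistic means.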

Analytics

| Rating | Type | Topic |
| --- | --- | --- |
| | 📰 | NextRoll - Making 1M Click Predictions per Second using AWS |

Probabilistic Algorithms

Realtime Streaming Data Pipelines

| Rating | Type | Topic |
| --- | --- | --- |
| ⭐⭐⭐ | 📰 | Yelp: Realtime data pipelines |
| | 📰 | A hashtag recommendation system for twitter data streams |
| | 📰 | Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More |
| | 📰 | Real-time Twitter data analysis using Hadoop ecosystem |

Probabilistic Data Structures

  • Probabilistic data structures and algorithms (PDSA) are a family of advanced approaches that are optimized to use fixed or sublinear memory and constant execution time.
  • They are often based on hashing and have many other useful features.
  • However, they also have disadvantages: they cannot provide exact answers and have some probability of error (which can, in fact, be controlled).
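A Bloom filter is one well-known member of this family and shows the trade-off described above: it answers "have I seen this item?" using a fixed number of bits, never misses an item that was added, but can occasionally report an item that was never added. The sketch below is a minimal illustration, not a production implementation. The class name `BloomFilter`, its parameters, the use of SHA-256 with double hashing, and the sample keys are all assumptions made for the example; the sizing uses the standard textbook formulas for bit count and number of hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch for string items (illustrative only)."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        ))
        self.hash_count = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Size the filter for 1,000 items and a target error rate of about 1%.
seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for i in range(1000):
    seen.add(f"user-{i}")

print("user-42" in seen)  # an added item is always reported as present
false_positives = sum(f"other-{i}" in seen for i in range(10_000))
print(f"observed false positive rate: {false_positives / 10_000:.4f}")
```

The error probability is controlled through `false_positive_rate`: asking for a lower rate makes the filter allocate more bits and use more hash positions per item, while memory stays fixed once the filter is built.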