
Wanna know which languages and execution engines are the quickest or the slowest at processing files? Well, here's your answer. 📊 Data analysis and comparison of the time taken ⌚ to compute word counts in various languages and execution engines for files of different sizes.


Thomas-George-T/File-Processing-Comparative-Analytics


Aim

To find out which programming languages and execution engines take the most and the least time to process files.

Methodology

This 📗 project analyzes 📊 and compares the execution times ⌚ taken to compute the word count of input text files, ranging from very small to very large, in various programming languages and execution engines. It includes sample findings, observations, comparisons and sample word count programs. We measured the time taken to process each file individually and gathered the results. The findings from the individual analyses were then combined in a Google Colab notebook, where we plotted graphs using matplotlib and drew conclusions from them.
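As a rough illustration of the measurement described above, here is a minimal Python sketch that times a word count over a single file. The function names `count_words` and `timed_word_count` are hypothetical, and splitting on whitespace after lowercasing is an assumed definition of a "word"; the project's own sample programs may tokenize and time things differently.

```python
import time
from collections import Counter
from pathlib import Path


def count_words(text: str) -> Counter:
    """Count whitespace-separated tokens, ignoring case (assumed tokenization)."""
    return Counter(text.lower().split())


def timed_word_count(path: str) -> tuple[Counter, float]:
    """Read a file, count its words, and return (counts, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution timer
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    counts = count_words(text)
    elapsed = time.perf_counter() - start  # includes file reading time
    return counts, elapsed
```

Calling `timed_word_count("big.txt")` would return the counts together with the wall-clock time for that one run. Repeating the call several times and averaging is one way to reduce noise, particularly for the smaller file.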

File Sizes

File Name Size
apache-hadoop-wiki.txt 46.5 kB
big.txt 6.5 MB

File Sources

Programming Languages

Word counts computed in each individual language. Click the images to go to the respective data analysis results.


Python Java Scala

Execution engines

Word counts computed on each individual execution engine. Click the images to go to the respective data analysis results.


Hadoop Spark

Visualizing Results

Comparing Programming Languages

Languages findings

Comparing Execution Engines

Execution engines findings

Conclusions

The graphs show that, among the programming languages, Python had the shortest execution time for both the small and the large file, while Scala had the longest.

Among the execution engines, Spark had the shortest execution time, while Hadoop had the longest.
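Conclusions of this kind come down to picking the minimum and maximum from a set of measured times. A small sketch of that step follows; the helper `fastest_and_slowest` and the shape of its input (a mapping from a language or engine name to its measured seconds) are assumptions made for illustration, not code taken from the notebook.

```python
def fastest_and_slowest(timings: dict[str, float]) -> tuple[str, str]:
    """Return the names with the smallest and the largest measured time.

    `timings` maps a language or engine name to its execution time in seconds.
    """
    if not timings:
        raise ValueError("timings must contain at least one measurement")
    fastest = min(timings, key=timings.get)  # name with the lowest time
    slowest = max(timings, key=timings.get)  # name with the highest time
    return fastest, slowest
```

Running it once per input file, with the times measured for that file, gives the per-file fastest and slowest entries that the graphs display.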

Notebook

The complete analysis, with graphs, is in the Google Colab notebook: Notebook
