In this file you can write whatever you see fit. We recommend detailing your solution and all the assumptions you are making. Here you can run the functions you defined in the other files of the src folder, measure time, memory, etc.

In [2]:
file_path = "../resources/farmers-protest-tweets-2021-2-4.json"
import matplotlib.pyplot as plt
import memory_profiler
from cProfile import Profile
from pstats import Stats, SortKey

def plot(x_values, y_values, x_label, y_label, title=''):
    plt.plot(x_values, y_values, marker='o')
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title(title)
    plt.grid(True)
    plt.show()

For the first exercise, q1_memory, I've implemented two approaches. Let's run them in separate cells and decide which one is better.
1. First approach:
- Build a list with all (date, username) tuples.
- Get the 10 dates with the most posts.
- For each of those dates, get the user with the most posts.
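
A minimal sketch of the three steps above, not the actual q1_memory code. It assumes each input line is a JSON object with an ISO timestamp in "date" and the author under "user" -> "username"; the function name and the sample data are made up for illustration.

```python
import json
from collections import Counter


def top_dates_with_top_user(lines, n=10):
    """Keep every (date, username) pair, then aggregate."""
    pairs = []
    for line in lines:
        tweet = json.loads(line)
        # Assumed schema: "date" is an ISO timestamp, "user" holds "username".
        pairs.append((tweet["date"][:10], tweet["user"]["username"]))

    # Top n dates by number of posts.
    date_counts = Counter(day for day, _ in pairs)

    result = []
    for day, _ in date_counts.most_common(n):
        # Most active user on that date.
        users = Counter(user for d, user in pairs if d == day)
        result.append((day, users.most_common(1)[0][0]))
    return result


sample = [
    '{"date": "2021-02-12T10:00:00+00:00", "user": {"username": "alice"}}',
    '{"date": "2021-02-12T11:00:00+00:00", "user": {"username": "alice"}}',
    '{"date": "2021-02-12T12:00:00+00:00", "user": {"username": "bob"}}',
    '{"date": "2021-02-13T09:00:00+00:00", "user": {"username": "bob"}}',
]
print(top_dates_with_top_user(sample))
```

The `pairs` list grows with the number of tweets, which is the main memory cost of this approach.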

In [None]:
from q1_memory import q1_memory
%load_ext memory_profiler
%mprun -f q1_memory q1_memory(file_path)

In [None]:
from q1_memory import q1_memory
memory_usage = memory_profiler.memory_usage((q1_memory, (), {'file_path': file_path}))
x_values = list(range(1, len(memory_usage) + 1))
# Plot memory usage per sample (memory_usage polls the process every 0.1 s by default)
plot(x_values, memory_usage, 'Sample number', 'Memory usage (MiB)', 'Memory Usage Over Time')

2. Second approach:
- Retrieve the top 10 dates.
- Keep only the users who posted on those dates.
- For each date, identify the user who posted the most.
Based on the results, this approach is the better one: it keeps user data only for the required dates and discards everything else.
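
A rough two-pass sketch of this second approach, again not the real q1_memory_v2. The "date" / "user" -> "username" schema, the function name, and the `open_lines` callable (which returns a fresh iterator over the input lines for each pass) are assumptions for illustration.

```python
import json
from collections import Counter, defaultdict


def top_dates_two_pass(open_lines, n=10):
    """Count dates first, then count users only for the top dates."""
    # Pass 1: one small counter keyed by day is all that stays in memory.
    date_counts = Counter(json.loads(line)["date"][:10] for line in open_lines())
    top_days = [day for day, _ in date_counts.most_common(n)]

    # Pass 2: tweets from other days are skipped without being stored.
    wanted = set(top_days)
    users_by_day = defaultdict(Counter)
    for line in open_lines():
        tweet = json.loads(line)
        day = tweet["date"][:10]
        if day in wanted:
            users_by_day[day][tweet["user"]["username"]] += 1

    return [(day, users_by_day[day].most_common(1)[0][0]) for day in top_days]


sample = [
    '{"date": "2021-02-12T10:00:00+00:00", "user": {"username": "alice"}}',
    '{"date": "2021-02-12T11:00:00+00:00", "user": {"username": "alice"}}',
    '{"date": "2021-02-12T12:00:00+00:00", "user": {"username": "bob"}}',
    '{"date": "2021-02-13T09:00:00+00:00", "user": {"username": "bob"}}',
]
print(top_dates_two_pass(lambda: iter(sample), n=1))
```

The trade-off is reading the input twice in exchange for never holding the full list of (date, username) pairs.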

In [None]:
from q1_memory import q1_memory_v2
%load_ext memory_profiler
%mprun -f q1_memory_v2 q1_memory_v2(file_path)

In [None]:
from q1_memory import q1_memory_v2
memory_usage = memory_profiler.memory_usage((q1_memory_v2, (), {'file_path': file_path}))
x_values = list(range(1, len(memory_usage) + 1))
# Plot memory usage per sample (memory_usage polls the process every 0.1 s by default)
plot(x_values, memory_usage, 'Sample number', 'Memory usage (MiB)', 'Memory Usage Over Time')

Exercise 1 q1_time: This time I've opted to use Pandas, which is built on top of NumPy. Much of NumPy is written in C, so vectorized operations are usually much faster than equivalent pure-Python loops, often by an order of magnitude or more.
The following cell uses cProfile and pstats to run q1_time and measure its execution time. Only the 15 functions with the highest internal time are displayed.
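
As a small aid for reading the profiler tables, here is a self-contained sketch that profiles a toy function (`busy` is a made-up name) and captures the report as a string. In the report, `tottime` is the time spent inside a function excluding the functions it calls, while `cumtime` includes them; `SortKey.TIME` sorts by `tottime`.

```python
import io
from cProfile import Profile
from pstats import Stats, SortKey


def busy(n):
    # Toy workload standing in for q1_time.
    return sum(i * i for i in range(n))


profile = Profile()
profile.enable()
busy(100_000)
profile.disable()

# Send the report to a buffer instead of stdout so it can be inspected or saved.
buffer = io.StringIO()
Stats(profile, stream=buffer).strip_dirs().sort_stats(SortKey.TIME).print_stats(5)
report = buffer.getvalue()
print(report)
```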

In [None]:
from q1_time import q1_time

with Profile() as profile:
    q1_time(file_path)
    (
     Stats(profile)
     .strip_dirs()
     .sort_stats(SortKey.TIME)
     .print_stats(15)
    )

Exercise 2:
Memory problem: To solve this exercise I've decided to use two external libraries, emoji and regex. The latter makes it possible to find grapheme clusters, so an emoji made of several code points is matched as a single unit.
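
To show why grapheme clusters matter, here is a stdlib-only sketch (it deliberately does not use emoji or regex, so it is not the q2_memory implementation). A family emoji is a sequence of three emoji joined by zero-width joiners, so counting per code point splits it into parts.

```python
from collections import Counter

ZWJ = "\u200D"  # zero-width joiner
# man + ZWJ + woman + ZWJ + girl renders as one family emoji
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"
text = family + " " + family

# Naive counting: every code point is treated as its own symbol.
per_code_point = Counter(ch for ch in text if ch not in (" ", ZWJ))

# Counting whole sequences (here simply split on spaces) keeps the emoji intact.
per_sequence = Counter(text.split(" "))

print(len(family))          # number of code points in one visible emoji
print(len(per_code_point))  # distinct symbols seen by the naive count
print(per_sequence[family])
```

With the regex library, the `\X` pattern matches one extended grapheme cluster at a time, which removes the need for the space-splitting shortcut used here.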

In [None]:
from q2_memory import q2_memory
%reload_ext memory_profiler
%mprun -f q2_memory q2_memory(file_path)

In [None]:
from q2_memory import q2_memory
memory_usage = memory_profiler.memory_usage((q2_memory, (), {'file_path': file_path}))
x_values = list(range(1, len(memory_usage) + 1))
# Plot memory usage per sample (memory_usage polls the process every 0.1 s by default)
plot(x_values, memory_usage, 'Sample number', 'Memory usage (MiB)', 'Memory Usage Over Time')

Exercise 2: Time problem.

My first attempt used Pandas (q2_time_v2, profiled in the next cell). It turned out to be slower than the implementation in q2_time, which is similar to q2_memory but loads the complete file instead of using a generator.
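
The difference between the generator style and loading the complete file can be sketched as follows. This is an illustration only: the "content" field name, both function names, and the temporary sample file are assumptions, not the code in q2_memory or q2_time.

```python
import json
import os
import tempfile


def iter_contents(path):
    """Lazy: yield one tweet text at a time (lower peak memory)."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)["content"]


def load_contents(path):
    """Eager: read the whole file at once, then parse every line."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    return [json.loads(line)["content"] for line in lines if line.strip()]


# Tiny sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"content": "hello"}\n{"content": "world"}\n')
    sample_path = tmp.name

lazy = list(iter_contents(sample_path))
eager = load_contents(sample_path)
os.remove(sample_path)
print(lazy == eager)
```

Both functions return the same texts; the eager version trades higher peak memory for a single large read.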

In [3]:
from q2_time import q2_time_v2

with Profile() as profile:
    q2_time_v2(file_path)
    (
     Stats(profile)
     .strip_dirs()
     .sort_stats(SortKey.TIME)
     .print_stats(15)
    )


         19600715 function calls (19599781 primitive calls) in 19.788 seconds

   Ordered by: internal time
   List reduced from 1030 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.638    4.638    4.673    4.673 {built-in method pandas._libs.json.ujson_loads}
   117407    4.257    0.000    4.257    0.000 {method 'findall' of '_regex.Pattern' objects}
   117407    2.760    0.000    5.088    0.000 q2_time.py:32(<listcomp>)
 16364431    2.329    0.000    2.329    0.000 core.py:316(is_emoji)
   117407    0.599    0.000    0.603    0.000 regex.py:476(<setcomp>)
        1    0.488    0.488    0.732    0.732 {method 'read' of '_io.TextIOWrapper' objects}
       22    0.396    0.018    0.396    0.018 {method 'split' of 'str' objects}
   117407    0.347    0.000    1.785    0.000 regex.py:449(_compile)
   234830    0.268    0.000    0.624    0.000 enum.py:986(__and__)
        1    0.247    0.247    0.663    0.663 _json.py:97

The q2_time solution profiled in the next cell improves on the execution time of the Pandas version.

In [None]:
from q2_time import q2_time

with Profile() as profile:
    q2_time(file_path)
    (
     Stats(profile)
     .strip_dirs()
     .sort_stats(SortKey.TIME)
     .print_stats(15)
    )