# Exploring Hacker News Posts: Which Hacker News post types and posting times generate the most interaction?


[Hacker News](https://news.ycombinator.com/) is a community-driven tech forum. For this project, I analyzed 20,000 submissions from 2015–2017 to see whether Ask HN or Show HN posts spark more discussion, and to discover the best time of day to publish a question.


## Dataset


The dataset comes from Kaggle and is titled [Hacker News Posts](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts).


The version used in this project has been reduced from almost 300,000 rows to approximately 20,000: submissions that received no comments were removed, and the remaining submissions were randomly sampled.


Below are descriptions of the columns:

-   `id`: the unique identifier from Hacker News for the post
-   `title`: the title of the post
-   `url`: the URL that the post links to, if the post has a URL
-   `num_points`: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
-   `num_comments`: the number of comments on the post
-   `author`: the username of the person who submitted the post
-   `created_at`: the date and time of the post's submission (the time zone is Eastern Time in the US)
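
The analysis cells later in this notebook index rows by position (`row[1]` for the title, `row[4]` for the comment count, `row[6]` for the creation time). As a minimal sketch, those positions could be given names instead. The `COLUMNS` list, the `COL` mapping, and `sample_row` are hypothetical names introduced for illustration, and the column order is assumed to match the list above:

```python
# Column names in the order listed above (assumed to match the CSV header).
COLUMNS = ["id", "title", "url", "num_points", "num_comments", "author", "created_at"]

# Map each column name to its position in a row.
COL = {name: index for index, name in enumerate(COLUMNS)}

# Hypothetical row, for illustration only (not taken from the dataset).
sample_row = ["1", "Ask HN: Example question?", "", "3", "7", "example_user", "1/2/2016 15:04"]

# Look values up by column name rather than by a bare integer.
title = sample_row[COL["title"]]
num_comments = int(sample_row[COL["num_comments"]])
print(title, num_comments)
```

Named lookups like `COL["num_comments"]` make it easier to see which column a calculation depends on, at the cost of one extra dictionary.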


## Data Analysis


On Hacker News, to submit a post with a specific question, users add `Ask HN` to their post titles.

Likewise, users add `Show HN` to their post titles to show the Hacker News community a project, product, or something interesting.


For this project, I will be comparing `Ask HN` and `Show HN` posts to determine:

1. Do `Ask HN` or `Show HN` posts receive more comments on average?
2. Do posts created at a certain time receive more comments on average?


## Setup


The code block below imports all the necessary functions, classes and modules for this analysis:


In [None]:
# Import Path class from pathlib
from pathlib import Path

# Import the reader function from the csv module
from csv import reader

# Import the pp pretty-print function from the pprint module (Python 3.8+)
from pprint import pp

# Import the datetime module
import datetime as dt

### Load the Hacker News Dataset


Using the `reader()` function, read the `hacker_news.csv` file:


In [None]:
BASE_DIR = Path.cwd().parent  # Repo root
DATA_DIR = BASE_DIR / "data"  # Data directory
CSV_PATH = DATA_DIR / "hacker_news.csv"  # Path to csv file

# Open the csv file and save the column headers and data to variables
# newline="" is the mode the csv module documentation recommends for file objects
with CSV_PATH.open(newline="") as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)
    hn_header = hn[0]
    hn = hn[1:]
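
The header/data split can also be written with `next()`, which avoids building the full list before slicing it. This is only a sketch: `SAMPLE_CSV` is a hypothetical in-memory stand-in for `hacker_news.csv`, and its two rows are made up for illustration.

```python
from csv import reader
from io import StringIO

# Hypothetical two-row sample standing in for hacker_news.csv (not real data).
SAMPLE_CSV = StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    '1,"Ask HN: Example, with a comma?",,3,7,example_user,1/2/2016 15:04\n'
    "2,Show HN: Example project,http://example.com,5,2,another_user,1/3/2016 9:30\n"
)

rows = reader(SAMPLE_CSV)
header = next(rows)  # First row: the column names
data = list(rows)    # Remaining rows: one list of strings per post

print(header)
print(len(data))
```

Note that every field comes back as a string, including numeric columns such as `num_comments`, which is why later cells convert values with `int()`.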

The following prints the column headers of the dataset, along with the first five rows:


In [None]:
# Print column headers and first five rows of the dataset
print(hn_header)
pp(hn[:5])

### Separate “Ask HN” and “Show HN” Submissions


All the `Ask HN` and `Show HN` posts need to be extracted from the dataset in order to analyze them.

The code below iterates through each row of the data, looking for posts that start with `Ask HN` or `Show HN`, and placing those posts into their respective lists:


In [None]:
ask_posts = [
    row for row in hn if row[1].lower().startswith("ask hn")
]  # Contains all 'Ask HN' posts
show_posts = [
    row for row in hn if row[1].lower().startswith("show hn")
]  # Contains all 'Show HN' posts

The `ask_posts` list should now contain all of the posts starting with `Ask HN`. Here are the first five entries of the list:


In [None]:
pp(ask_posts[:5])

The same should be true for the `show_posts` list - it should only contain posts starting with `Show HN`. Below are the first five entries of the list:


In [None]:
pp(show_posts[:5])

We can now see how many threads are `Ask HN` posts or `Show HN` posts:


In [None]:
print(f"Ask HN Posts: {len(ask_posts)}")
print(f"Show HN Posts: {len(show_posts)}")

The above output shows there are 1,744 `Ask HN` posts and 1,162 `Show HN` posts in the dataset.
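
The same prefix test can also be applied in a single pass that counts every category, including posts that are neither `Ask HN` nor `Show HN`. The sketch below is illustrative only: `classify` and `sample_titles` are hypothetical names, and the titles are made up rather than taken from the dataset.

```python
from collections import Counter


def classify(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles for illustration only (not taken from the dataset).
sample_titles = [
    "Ask HN: How do you organize your notes?",
    "Show HN: A tiny static site generator",
    "ask hn: Best resources for learning SQL?",
    "An article about compilers",
]

# Count each category in one pass over the titles.
counts = Counter(classify(title) for title in sample_titles)
print(counts["ask"], counts["show"], counts["other"])
```

Lowercasing the title before the comparison is what lets `ask hn:` and `Ask HN:` land in the same category, matching the behavior of the list comprehensions above.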


### Compare Average Comment Counts (Ask HN vs Show HN)


The next goal is to determine if `Ask HN` posts or `Show HN` posts receive more comments on average.

The code block below calculates the average number of comments for both `Ask HN` and `Show HN` posts, then prints the results:


In [None]:
total_ask_comments = 0  # This will store the number of comments in Ask HN posts
total_show_comments = 0  # This will store the number of comments in Show HN posts

# Iterate through each row in ask_posts
for row in ask_posts:
    # Convert the number of comments from a string value into an integer, and save in a variable
    num_comments = int(row[4])
    # Add num_comments to total_ask_comments
    total_ask_comments += num_comments

# Iterate through each row in show_posts
for row in show_posts:
    # Convert the number of comments from a string value into an integer, and save in a variable
    num_comments = int(row[4])
    # Add num_comments to total_show_comments
    total_show_comments += num_comments

# Calculate the average number of comments for Ask HN and Show HN posts
avg_ask_comments = total_ask_comments / len(ask_posts)
avg_show_comments = total_show_comments / len(show_posts)
print(f"Ask HN (n={len(ask_posts)}) averaged {avg_ask_comments:.2f}")
print(f"Show HN (n={len(show_posts)}) averaged {avg_show_comments:.2f}")
print(f"Difference: ≈{(avg_ask_comments - avg_show_comments):.2f} comments")

The analysis shows that `Ask HN` threads average 14.04 comments per post, while `Show HN` threads average 10.32. A plausible explanation is that questions invite advice and back-and-forth discussion, whereas project showcases tend to prompt shorter reactions. An Ask post can grow into a layered conversation where commenters offer suggestions, the original poster returns with updates or follow-up questions, and participants debate ideas among themselves, all of which raise the comment count.


Since `Ask HN` posts receive more comments on average, the remaining analysis focuses on these posts.
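
The averages in this section can also be computed with `statistics.mean` from the standard library, which removes the need for a running total. This is a sketch under stated assumptions: `average_comments`, `COMMENTS_INDEX`, and `sample_posts` are hypothetical names, and the rows are made up for illustration.

```python
from statistics import mean

COMMENTS_INDEX = 4  # Position of num_comments, as used in the cells above


def average_comments(posts):
    """Return the mean number of comments across a list of rows."""
    return mean(int(row[COMMENTS_INDEX]) for row in posts)


# Hypothetical rows for illustration only; just the comment counts matter here.
sample_posts = [
    ["1", "Ask HN: First example?", "", "2", "10", "user_a", "1/2/2016 15:04"],
    ["2", "Ask HN: Second example?", "", "5", "20", "user_b", "1/3/2016 9:30"],
]

print(f"{average_comments(sample_posts):.2f}")
```

One difference from the manual division: on an empty list, `mean` raises `statistics.StatisticsError` rather than `ZeroDivisionError`.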


### Find the best posting hours for Ask HN Posts


This next part of the analysis will seek to determine if `Ask HN` posts created at a certain time attract more comments on average.


The code block below extracts the hour at which each `Ask HN` post was created and updates two dictionaries: one counts the posts created during each hour, and the other tallies the comments received by the posts created during that hour:


In [None]:
counts_by_hour = {}  # This will store the number of ask posts created during each hour of the day
comments_by_hour = {}  # This will store the number of comments ask posts created at each hour received
result_list = [
    [dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M"), int(row[4])] for row in ask_posts
]  # Contains the date, time, and number of comments for each post

# Loop through each row of result_list
for row in result_list:
    # Extract the hour from the created_at date
    hour = row[0].strftime("%H")
    # If hour is not a key in counts_by_hour
    if hour not in counts_by_hour:
        # Add hour as a key and set the value to 1
        counts_by_hour[hour] = 1
        # Add hour as a key and set the number of comments as the value
        comments_by_hour[hour] = row[1]
    # If hour is already a key in counts_by_hour
    else:
        # Increment the value in counts_by_hour by 1
        counts_by_hour[hour] += 1
        # Increment the value in comments_by_hour by the number of comments
        comments_by_hour[hour] += row[1]

# Print dictionaries
print("Counts by Hour:", counts_by_hour, "\n")
print("Comments by Hour:", comments_by_hour)
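
The if/else bookkeeping above can be shortened with `collections.defaultdict`, which supplies a starting value of `0` the first time an hour is seen. This is a sketch, not a replacement for the cell above: `sample`, `counts`, and `comments` are hypothetical names, and the timestamp and comment values are made up for illustration.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical (created_at, num_comments) pairs, for illustration only.
sample = [
    ("8/16/2016 9:55", "6"),
    ("11/22/2015 13:43", "29"),
    ("5/2/2016 9:14", "1"),
]

counts = defaultdict(int)    # Posts created during each hour
comments = defaultdict(int)  # Comments received by posts created during each hour

for created_at, num_comments in sample:
    # Parse the timestamp, then keep only the zero-padded hour (e.g. "09").
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts[hour] += 1
    comments[hour] += int(num_comments)

print(dict(counts))
print(dict(comments))
```

Because missing keys default to `0`, both dictionaries can be updated with `+=` without checking whether the hour already exists.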

With the `counts_by_hour` and `comments_by_hour` dictionaries, the average number of comments for `Ask HN` posts by hour can be calculated:


In [None]:
avg_by_hour = [
    [hour, round(comments_by_hour[hour] / counts_by_hour[hour], 2)]
    for hour in counts_by_hour
]  # This will store the final results

pp(avg_by_hour)

The code block below sorts the `avg_by_hour` list by average comments per post, then prints the five hours with the highest averages for `Ask HN` posts:


In [None]:
swap_avg_by_hour = [
    [row[1], row[0]] for row in avg_by_hour
]  # Contains the values from avg_by_hour but swapped for sorting

# Sort swap_avg_by_hour by highest average comments per post
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

# Print results for the top five hours
print("Top 5 Hours for Ask Posts Comments")
for avg, hour in sorted_swap[:5]:
    print(
        f"{dt.datetime.strptime(hour, '%H').strftime('%H:%M')}: {avg:.2f} average comments per post"
    )
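
Swapping the columns is one way to sort by the average. Another is to pass a `key` function to `sorted()`, which leaves each `[hour, average]` pair in its original order. The sketch below uses made-up values; `sample_avg_by_hour` and `top_hours` are hypothetical names.

```python
from operator import itemgetter

# Hypothetical [hour, average] pairs, for illustration only (not real results).
sample_avg_by_hour = [["09", 5.0], ["13", 12.5], ["15", 30.0], ["02", 20.0]]

# Sort by the second element (the average), highest first, and keep the top three.
top_hours = sorted(sample_avg_by_hour, key=itemgetter(1), reverse=True)[:3]

for hour, avg in top_hours:
    print(f"{hour}:00: {avg:.2f} average comments per post")
```

One behavioral difference: when two hours have the same average, the swapped-list approach breaks the tie by comparing the hour strings, while the `key` version keeps tied hours in their original order.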

Based on the above results, `Ask HN` posts created at 15:00 (3 p.m.) Eastern Time received the most comments on average, so posting around that hour may improve the chances of an active discussion.


## Conclusion


This project set out to discover which kinds of Hacker News posts spark the most discussion and when authors should post to maximize engagement.


`Ask HN` vs `Show HN`: Questions (“Ask HN”) generate more conversation than showcases (“Show HN”), averaging roughly 14 comments per post compared with about 10. The interactive nature of a question (community members giving advice, the poster replying with updates, and responders debating among themselves) likely drives the higher totals.


Timing matters: Within `Ask HN` posts, comment activity is not evenly distributed across the day. The analysis showed a pronounced peak in the afternoon Eastern Time (around 3 p.m., with several neighboring hours also performing well). Publishing an `Ask HN` thread during this window may increase the likelihood of a lively discussion.


### Takeaway


If your goal is to gather feedback or start a conversation on Hacker News, frame your submission as an `Ask HN` question and post it around 3 p.m. Eastern Time. In this sample, posts of that type and hour drew the most comments on average, which suggests better odds that the HN community will notice and respond to your post.
