# Web Scraping (iPhone Subreddit)

This notebook follows the same steps as the previous notebook, which scraped the Android subreddit. Refer to that notebook for the explanations and annotations.

In [1]:
import pandas as pd
import requests
import time
import random
from bs4 import BeautifulSoup

In [2]:
url = 'https://www.reddit.com/r/iphone/new.json'
res = requests.get(url, headers={'User-agent': 'Zaini Inc'})

In [3]:
res.status_code

200

In [4]:
reddit_dict = res.json()

In [5]:
print(reddit_dict)

{'kind': 'Listing', 'data': {'modhash': '', 'dist': 25, 'children': [{'kind': 't3', 'data': {'approved_at_utc': None, 'subreddit': 'iphone', 'selftext': 'Welcome to the Daily Tech Support thread for /r/iphone. \n\nHave a question you need answered? Ask away! Please remember to adhere to our rules, which can be found in the sidebar. As usual, if you have a serious issue with the subreddit, please contact [the moderators directly.](https://www.reddit.com/message/compose?to=%2Fr%2Fiphone)\n\nPlease be informed that any questions about bypassing iCloud lock,  or anything similar that may infer that you are trying to get access to a locked iPhone, are no longer allowed and will be removed. Thank you.\n\nCheck our [Tech Support FAQ page](https://www.reddit.com/r/iphone/wiki/support-faq)\n\nJoin our Discord room for support:\n\n[Discord](https://discord.gg/iphone)\n\n**Note: Comments are sorted by /new for your convenience.**\n\nThis is the previous [archive](https://www.reddit.com/r/iphone/s

In [6]:
reddit_dict.keys()

dict_keys(['kind', 'data'])

In [7]:
reddit_dict['data']

{'modhash': '',
 'dist': 25,
 'children': [{'kind': 't3',
   'data': {'approved_at_utc': None,
    'subreddit': 'iphone',
    'selftext': 'Welcome to the Daily Tech Support thread for /r/iphone. \n\nHave a question you need answered? Ask away! Please remember to adhere to our rules, which can be found in the sidebar. As usual, if you have a serious issue with the subreddit, please contact [the moderators directly.](https://www.reddit.com/message/compose?to=%2Fr%2Fiphone)\n\nPlease be informed that any questions about bypassing iCloud lock,  or anything similar that may infer that you are trying to get access to a locked iPhone, are no longer allowed and will be removed. Thank you.\n\nCheck our [Tech Support FAQ page](https://www.reddit.com/r/iphone/wiki/support-faq)\n\nJoin our Discord room for support:\n\n[Discord](https://discord.gg/iphone)\n\n**Note: Comments are sorted by /new for your convenience.**\n\nThis is the previous [archive](https://www.reddit.com/r/iphone/search?q=title%3

In [8]:
reddit_dict['data'].keys()

dict_keys(['modhash', 'dist', 'children', 'after', 'before'])

In [9]:
reddit_dict['data']['children']

[{'kind': 't3',
  'data': {'approved_at_utc': None,
   'subreddit': 'iphone',
   'selftext': 'Welcome to the Daily Tech Support thread for /r/iphone. \n\nHave a question you need answered? Ask away! Please remember to adhere to our rules, which can be found in the sidebar. As usual, if you have a serious issue with the subreddit, please contact [the moderators directly.](https://www.reddit.com/message/compose?to=%2Fr%2Fiphone)\n\nPlease be informed that any questions about bypassing iCloud lock,  or anything similar that may infer that you are trying to get access to a locked iPhone, are no longer allowed and will be removed. Thank you.\n\nCheck our [Tech Support FAQ page](https://www.reddit.com/r/iphone/wiki/support-faq)\n\nJoin our Discord room for support:\n\n[Discord](https://discord.gg/iphone)\n\n**Note: Comments are sorted by /new for your convenience.**\n\nThis is the previous [archive](https://www.reddit.com/r/iphone/search?q=title%3A%22Daily+Tech+Support+Thread%22+author%3A%22

In [10]:
reddit_dict['data']['children'][1]['data']   #reddit_dict['data']['children'] is a list of dicts

{'approved_at_utc': None,
 'subreddit': 'iphone',
 'selftext': 'I have hooked up a MagSafe charger in my car and connected it to a third party USB-C cigarette lighter plug\n\nEven with the engine turned off the cigarette lighter continues to deliver power to the MagSafe puck. \n\nI sometimes go a week or two without using my car. Would it be possible that the passive power draw of the MagSafe puck could run my car engine down? Or is the power draw so minimal that I have nothing g to worry about?',
 'author_fullname': 't2_frdyc',
 'saved': False,
 'mod_reason_title': None,
 'gilded': 0,
 'clicked': False,
 'title': 'How much power does MagSafe draw when there is no iPhone attached? Enough to flatten my car battery?',
 'link_flair_richtext': [],
 'subreddit_name_prefixed': 'r/iphone',
 'hidden': False,
 'pwls': 6,
 'link_flair_css_class': 'grey',
 'downs': 0,
 'thumbnail_height': None,
 'top_awarded_type': None,
 'hide_score': False,
 'name': 't3_k15fbj',
 'quarantine': False,
 'link_fla

In [11]:
posts = [p['data'] for p in reddit_dict['data']['children']]   #creating a list of dicts

In [12]:
initialdf = pd.DataFrame(posts)   #change the list of dicts to a dataframe, based on keys of all the dicts

In [13]:
initialdf.shape

(25, 110)
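
`pd.DataFrame` builds its columns from the union of keys across all the dicts, so a key that appears in only some posts becomes a column that holds `NaN` for the rest. This is why the column count can grow as more posts are collected. A small sketch with made-up records (the field names and values are illustrative, not scraped data):

```python
import pandas as pd

# Two hypothetical posts with different sets of keys
records = [
    {'title': 'first', 'selftext': 'body'},
    {'title': 'second', 'poll_data': {'options': 2}},
]

toy = pd.DataFrame(records)

# Three columns in total: the union of the keys of both dicts
print(toy.shape)
# Keys missing from a record are filled with NaN
print(toy.isna())
```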

In [14]:
initialdf.head()

Unnamed: 0,approved_at_utc,subreddit,selftext,author_fullname,saved,mod_reason_title,gilded,clicked,title,link_flair_richtext,...,url,subreddit_subscribers,created_utc,num_crossposts,media,is_video,link_flair_template_id,url_overridden_by_dest,crosspost_parent_list,crosspost_parent
0,,iphone,Welcome to the Daily Tech Support thread for /...,t2_6l4z3,False,,0,False,Daily Tech Support Thread - [November 26],[],...,https://www.reddit.com/r/iphone/comments/k17z2...,2799950,1606363000.0,0,,False,,,,
1,,iphone,I have hooked up a MagSafe charger in my car a...,t2_frdyc,False,,0,False,How much power does MagSafe draw when there is...,[],...,https://www.reddit.com/r/iphone/comments/k15fb...,2799950,1606353000.0,0,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,
2,,iphone,"Hi guys, if I was to buy a iphone second hand ...",t2_4r7upir5,False,,0,False,Advice needed,[],...,https://www.reddit.com/r/iphone/comments/k14vb...,2799950,1606351000.0,0,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,
3,,iphone,"Hello everyone, I have the iPhone 11 and until...",t2_1qd43scz,False,,0,False,iPhone 11 iOS14,[],...,https://www.reddit.com/r/iphone/comments/k13vk...,2799950,1606348000.0,0,,False,,,,
4,,iphone,Proud owner of a new iPhone 11 Pro. First tim...,t2_4ikub,False,,0,False,Super dumb question: iPhone 11 Pro owners. Do ...,[],...,https://www.reddit.com/r/iphone/comments/k13xi...,2799950,1606348000.0,0,,False,,,,


In [15]:
columns = initialdf.columns

In [16]:
columns[20]

'link_flair_text_color'

In [17]:
text_cols = []

for column in columns:
    if 'text' in column:
        text_cols.append(column)

In [18]:
text_cols

['selftext',
 'link_flair_richtext',
 'link_flair_text_color',
 'link_flair_text',
 'author_flair_richtext',
 'selftext_html',
 'author_flair_text',
 'author_flair_text_color']
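
The same substring filter can be written as a list comprehension. A minimal sketch, using a short hypothetical list of names in place of `initialdf.columns`:

```python
# Hypothetical column names standing in for initialdf.columns
columns = ['selftext', 'title', 'link_flair_text', 'author_fullname', 'selftext_html']

# Keep every column whose name contains the substring 'text'
text_cols = [column for column in columns if 'text' in column]
print(text_cols)
```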

In [19]:
# Sanity check of the substring test: 'text' is not in 'self-', so nothing is printed
if 'text' in 'self-':
    print('substring found')

In [20]:
initialdf[['author_fullname','selftext']]

Unnamed: 0,author_fullname,selftext
0,t2_6l4z3,Welcome to the Daily Tech Support thread for /...
1,t2_frdyc,I have hooked up a MagSafe charger in my car a...
2,t2_4r7upir5,"Hi guys, if I was to buy a iphone second hand ..."
3,t2_1qd43scz,"Hello everyone, I have the iPhone 11 and until..."
4,t2_4ikub,Proud owner of a new iPhone 11 Pro. First tim...
5,t2_4yrsbgdq,Is there a way to lock the screen so that it s...
6,t2_4urhkkj1,"My current iphone works fine, but I was thinki..."
7,t2_4a5l8jkf,
8,t2_7wdg9wl0,I am really excited with iOS 14's new Back Tap...
9,t2_8dxjwtqt,


In [21]:
url = 'https://www.reddit.com/r/iphone/new.json'

In [23]:
posts = []
after = None

for a in range(50):
    if after is None:
        current_url = url
    else:
        current_url = url + '?after=' + after
    print(current_url)
    res = requests.get(current_url, headers={'User-agent': 'Zaini Inc 1.0'})
    
    if res.status_code != 200:
        print('Status error', res.status_code)
        break
    
    current_dict = res.json()
    current_posts = [p['data'] for p in current_dict['data']['children']]
    posts.extend(current_posts)
    after = current_dict['data']['after']
    
    # pause for a random 6 to 10 seconds so the requests are spaced out irregularly
    sleep_duration = random.randint(6, 10)
    print(sleep_duration)
    time.sleep(sleep_duration)

https://www.reddit.com/r/iphone/new.json
7
https://www.reddit.com/r/iphone/new.json?after=t3_jznwal
7
https://www.reddit.com/r/iphone/new.json?after=t3_jyggjr
9
https://www.reddit.com/r/iphone/new.json?after=t3_jxt25m
8
https://www.reddit.com/r/iphone/new.json?after=t3_jwt1qq
7
https://www.reddit.com/r/iphone/new.json?after=t3_jvj50u
6
https://www.reddit.com/r/iphone/new.json?after=t3_juc02p
6
https://www.reddit.com/r/iphone/new.json?after=t3_jtzgtn
6
https://www.reddit.com/r/iphone/new.json?after=t3_jsl1w0
9
https://www.reddit.com/r/iphone/new.json?after=t3_jrvucx
7
https://www.reddit.com/r/iphone/new.json?after=t3_jr1rl2
10
https://www.reddit.com/r/iphone/new.json?after=t3_jqm32y
6
https://www.reddit.com/r/iphone/new.json?after=t3_jpzyfo
6
https://www.reddit.com/r/iphone/new.json?after=t3_jpegro
6
https://www.reddit.com/r/iphone/new.json?after=t3_jomyna
9
https://www.reddit.com/r/iphone/new.json?after=t3_jnu5p1
7
https://www.reddit.com/r/iphone/new.json?after=t3_jmyd9l
7
https://www.
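
The paging logic can be wrapped in a function so it can be exercised without network access. The sketch below is an illustration under assumptions: `collect_posts`, `fetch_page`, and the fake pages are hypothetical names and data, not part of the original notebook. Unlike the loop above, it stops when `after` comes back as `None`. The loop above would request the first page again in that case, which is one way repeated posts can end up in `posts`.

```python
import random
import time


def collect_posts(fetch_page, max_pages=50, min_sleep=6, max_sleep=10, sleep=time.sleep):
    """Follow 'after' tokens until they run out or max_pages is reached.

    fetch_page(after) must return a parsed listing shaped like
    {'data': {'children': [{'data': {...}}, ...], 'after': <token or None>}}.
    """
    posts = []
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after)
        posts.extend(child['data'] for child in listing['data']['children'])
        after = listing['data']['after']
        if after is None:
            # No further pages: stop instead of fetching the first page again
            break
        sleep(random.randint(min_sleep, max_sleep))
    return posts


# Offline stand-in for the HTTP call, using made-up post names
fake_pages = {
    None: {'data': {'children': [{'data': {'name': 't3_a'}}], 'after': 't3_a'}},
    't3_a': {'data': {'children': [{'data': {'name': 't3_b'}}], 'after': None}},
}

posts = collect_posts(fake_pages.get, sleep=lambda seconds: None)
print([p['name'] for p in posts])
```

In the notebook, `fetch_page` would wrap the `requests.get` call and return `res.json()`.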

In [46]:
len(posts)

1243

In [68]:
df = pd.DataFrame(posts)

In [69]:
# df.to_csv('datasets/iPhone-posts.csv')

In [70]:
df[['author_fullname','title','selftext', 'subreddit']]

Unnamed: 0,author_fullname,title,selftext,subreddit
0,t2_6l4z3,Daily Tech Support Thread - [November 26],Welcome to the Daily Tech Support thread for /...,iphone
1,t2_frdyc,How much power does MagSafe draw when there is...,I have hooked up a MagSafe charger in my car a...,iphone
2,t2_4r7upir5,Advice needed,"Hi guys, if I was to buy a iphone second hand ...",iphone
3,t2_1qd43scz,iPhone 11 iOS14,"Hello everyone, I have the iPhone 11 and until...",iphone
4,t2_4ikub,Super dumb question: iPhone 11 Pro owners. Do ...,Proud owner of a new iPhone 11 Pro. First tim...,iphone
...,...,...,...,...
1238,t2_14zgqe,Will a regular magnetic car vent mount work wi...,,iphone
1239,t2_jaz2j,Has anyone compared the night mode in the 11 p...,Just wondering!,iphone
1240,t2_n88ch,Question about the Magsafe wallet,I live in a country that doesn’t support Apple...,iphone
1241,t2_cw7n8,3D Printed the MagSafe charger flush mount ada...,,iphone


In [51]:
df['selftext'][1241]

''

In [38]:
df['selftext'][2]

'Hi guys, if I was to buy a iphone second hand but brand new in sealed condition would the person selling me the phone be able to claim insurance and block the phone that was sold to me, essentially leaving me with a useless phone? Please help.'

In [71]:
# replace empty or whitespace-only post bodies with the placeholder 'NA'
df['selftext'] = df['selftext'].replace(r'^\s*$', 'NA', regex=True)
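
The pattern `^\s*$` matches a string that is empty or contains only whitespace. A small sketch with the standard `re` module on made-up strings (plain Python rather than pandas, to show only what the pattern matches):

```python
import re

# Matches strings that are empty or whitespace-only
blank = re.compile(r'^\s*$')

samples = ['', '   ', '\n', 'Just wondering!']

# Swap blank strings for the placeholder, keep everything else unchanged
cleaned = ['NA' if blank.match(s) else s for s in samples]
print(cleaned)
```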

In [72]:
# concatenate body and title into one field (no separator is inserted between them)
df['title/text'] = df['selftext'] + df['title']

In [73]:
df[['author_fullname','title','selftext', 'title/text', 'subreddit']]

Unnamed: 0,author_fullname,title,selftext,title/text,subreddit
0,t2_6l4z3,Daily Tech Support Thread - [November 26],Welcome to the Daily Tech Support thread for /...,Welcome to the Daily Tech Support thread for /...,iphone
1,t2_frdyc,How much power does MagSafe draw when there is...,I have hooked up a MagSafe charger in my car a...,I have hooked up a MagSafe charger in my car a...,iphone
2,t2_4r7upir5,Advice needed,"Hi guys, if I was to buy a iphone second hand ...","Hi guys, if I was to buy a iphone second hand ...",iphone
3,t2_1qd43scz,iPhone 11 iOS14,"Hello everyone, I have the iPhone 11 and until...","Hello everyone, I have the iPhone 11 and until...",iphone
4,t2_4ikub,Super dumb question: iPhone 11 Pro owners. Do ...,Proud owner of a new iPhone 11 Pro. First tim...,Proud owner of a new iPhone 11 Pro. First tim...,iphone
...,...,...,...,...,...
1238,t2_14zgqe,Will a regular magnetic car vent mount work wi...,,NAWill a regular magnetic car vent mount work ...,iphone
1239,t2_jaz2j,Has anyone compared the night mode in the 11 p...,Just wondering!,Just wondering!Has anyone compared the night m...,iphone
1240,t2_n88ch,Question about the Magsafe wallet,I live in a country that doesn’t support Apple...,I live in a country that doesn’t support Apple...,iphone
1241,t2_cw7n8,3D Printed the MagSafe charger flush mount ada...,,NA3D Printed the MagSafe charger flush mount a...,iphone
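
Because the two columns are joined directly, the body and title run together in `title/text` (for example `NAWill a regular...`). One possible variant, sketched on a hypothetical `toy` frame with made-up posts, joins them with a space and trims the result:

```python
import pandas as pd

# Made-up posts: one with a body, one without
toy = pd.DataFrame({
    'title': ['Question about a charger', 'Photo of my setup'],
    'selftext': ['Does it draw power when idle?', ''],
})

# Join body and title with a single space, then strip so an empty body
# does not leave a leading space
toy['title/text'] = (toy['selftext'] + ' ' + toy['title']).str.strip()
print(toy['title/text'].tolist())
```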


In [88]:
df_filtered = df.drop_duplicates(subset = ['title/text'], keep='first')

In [89]:
df_filtered.shape

(943, 114)

In [90]:
df_filtered[['author_fullname','title',
             'selftext', 'title/text', 'subreddit']].to_csv('iPhone-filtered-posts.csv')

In [91]:
# assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
df_filtered = df_filtered.assign(title=df_filtered['title'].str.lower())


In [93]:
df_filtered['title'][:5]

0            daily tech support thread - [november 26]
1    how much power does magsafe draw when there is...
2                                        advice needed
3                                      iphone 11 ios14
4    super dumb question: iphone 11 pro owners. do ...
Name: title, dtype: object

In [94]:
df_filtered['title'].str.contains('daily tech support thread').value_counts()

False    907
True      36
Name: title, dtype: int64

In [95]:
cond_support_thread = df_filtered['title'].str.contains('daily tech support thread')

In [96]:
df_filtered = df_filtered[~cond_support_thread]   # ~ inverts the boolean mask
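
In pandas, a boolean Series is inverted with the `~` operator. A minimal sketch of dropping rows whose title matches a phrase, on a hypothetical `toy` frame with made-up titles:

```python
import pandas as pd

toy = pd.DataFrame({
    'title': ['daily tech support thread - [date]', 'advice needed', 'iphone 11 ios14'],
})

# True for rows whose title contains the phrase (plain substring match)
is_support = toy['title'].str.contains('daily tech support thread', regex=False)

# ~ flips the mask, so only the non-matching rows are kept
kept = toy[~is_support]
print(kept['title'].tolist())
```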

In [97]:
df_filtered

Unnamed: 0,approved_at_utc,subreddit,selftext,author_fullname,saved,mod_reason_title,gilded,clicked,title,link_flair_richtext,...,media,is_video,link_flair_template_id,url_overridden_by_dest,crosspost_parent_list,crosspost_parent,media_metadata,author_cakeday,poll_data,title/text
1,,iphone,I have hooked up a MagSafe charger in my car a...,t2_frdyc,False,,0,False,how much power does magsafe draw when there is...,[],...,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,,,,,I have hooked up a MagSafe charger in my car a...
2,,iphone,"Hi guys, if I was to buy a iphone second hand ...",t2_4r7upir5,False,,0,False,advice needed,[],...,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,,,,,"Hi guys, if I was to buy a iphone second hand ..."
3,,iphone,"Hello everyone, I have the iPhone 11 and until...",t2_1qd43scz,False,,0,False,iphone 11 ios14,[],...,,False,,,,,,,,"Hello everyone, I have the iPhone 11 and until..."
4,,iphone,Proud owner of a new iPhone 11 Pro. First tim...,t2_4ikub,False,,0,False,super dumb question: iphone 11 pro owners. do ...,[],...,,False,,,,,,,,Proud owner of a new iPhone 11 Pro. First tim...
5,,iphone,Is there a way to lock the screen so that it s...,t2_4yrsbgdq,False,,0,False,is there a setting on the iphone that locks th...,[],...,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,,,,,Is there a way to lock the screen so that it s...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
938,,iphone,I saw one review where they did a teardown of ...,t2_12f4gk,False,,0,False,any info yet on pwm flicker frequency for ipho...,[],...,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,,,,,I saw one review where they did a teardown of ...
939,,iphone,I don't know if many people remember but when ...,t2_ai5j3,False,,0,False,"with the iphone 12's design, will we see the r...",[],...,,False,,,,,,,,I don't know if many people remember but when ...
940,,iphone,I already preordered online iPhone 12pro thru ...,t2_4lz8iti4,False,,0,False,what is the point of preorder when they will h...,[],...,,False,5664e798-6985-11e8-b750-0eaf69e27a44,,,,,,,I already preordered online iPhone 12pro thru ...
941,,iphone,"If this isn't allowed, please delete.\n\nI bou...",t2_15ufds,False,,0,False,iphone 12 generic case warning,[],...,,False,,,,,"{'kd89jdfekhu51': {'status': 'valid', 'e': 'Im...",,,"If this isn't allowed, please delete.\n\nI bou..."
