In [1]:
import xml.etree.ElementTree as ET
import pandas as pd
import re

# read in the data
platforms_xml = ET.parse('Platforms.xml')
root_xml = platforms_xml.getroot()

# Attributes to extract from each platform tag
platform_attrs = ["Name", "Emulated", "ReleaseDate", "Developer",
                  "Manufacturer", "Cpu", "Memory", "Graphics", "Sound",
                  "Display", "Media", "MaxControllers", "Notes", "Category"]

# Empty list that will hold one dict per platform
rows = []

Extract the data from the Platforms.xml file.

For each platform tag, look up every attribute in platform_attrs and store the results in the rows list.

In [2]:
for platform in root_xml:
    row = {}
    for field in platform_attrs:
        # Look the tag up once; find() returns None when it is missing
        element = platform.find(field)
        # A missing tag is stored as None
        row[field] = element.text if element is not None else None
    # Append the observation to the rows list
    rows.append(row)
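
The same lookup can also be written with `Element.findtext`, which returns `None` when the child tag is absent. The following is only a minimal sketch: the inline XML, the `Root` tag and the shortened attribute list are hypothetical stand-ins, not the real Platforms.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for Platforms.xml
sample_xml = """
<Root>
  <Platform>
    <Name>Atari 2600</Name>
    <Media>Cartridge</Media>
  </Platform>
  <Platform>
    <Name>Arcade</Name>
  </Platform>
</Root>
"""

sample_attrs = ["Name", "Media"]  # shortened attribute list for illustration
sample_root = ET.fromstring(sample_xml)

# One dict per child element; findtext() gives None for a missing tag
sample_rows = [
    {field: element.findtext(field) for field in sample_attrs}
    for element in sample_root
]
print(sample_rows)
```

One difference to keep in mind: for a tag that is present but empty, `findtext` returns an empty string, whereas `.text` returns `None`.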

Convert the list of data to a DataFrame

In [3]:
platforms = pd.DataFrame(rows, columns = platform_attrs)

List the first 20 consoles in the dataset

In [4]:
platforms.head(n=20)

Unnamed: 0,Name,Emulated,ReleaseDate,Developer,Manufacturer,Cpu,Memory,Graphics,Sound,Display,Media,MaxControllers,Notes,Category
0,3DO Interactive Multiplayer,True,1993-10-04T00:00:00-07:00,The 3DO Company,"Panasonic Corporation, Sanyo Electric Co., Ltd...",32-bit RISC ARM60 @ 12.5 MHz,2 MB main RAM and 1 MB video RAM,2x Yamaha V9990 (CLIO and MADAM) @ 25 MHz,16-bit stereo,320x240 and 640x480,CD-ROM,1 (up to 8 if daisy-chained),The 3DO Interactive Multiplayer (often called ...,Consoles
1,Commodore Amiga,True,1985-07-23T00:00:00-07:00,Commodore,Commodore,Motorola 680x0 (68000-68060),512kb,OCS / ECS / AGA chipset,4-channel,320 x 200 / 640 x 400 / NTSC or 320 x 256 / 64...,"3.5"" disk",1,The Amiga is a family of personal computers ma...,Computers
2,Amstrad CPC,True,1984-01-01T00:00:00-08:00,Amstrad,Amstrad,Zilog Z80A,"64 or 128 KB, expandable to 576 KB",CRTC Motorola 6845,AY-3-8912,"160x200, 320x200, 640x200","Cassette tape, 3 inch Hitachi/Panasonic Floppy...",,"The Amstrad Colour Personal Computer, better k...",Computers
3,Android,True,2008-09-23T00:00:00-07:00,Google,Various,ARM architecture (ARMv7 and ARMv8-A architectu...,,,,Touchscreen,,,Android is a mobile operating system (OS) base...,Mobile
4,Arcade,True,,,,,,,,,,,An arcade game or coin-op is a coin-operated e...,Arcade
5,Atari 2600,True,1977-09-11T00:00:00-07:00,"Atari, Inc.","Atari, Inc.",8-bit MOS Technology 6507 @ 1.19 MHz,128 bytes,Television Interface Adaptor (TIA),2 channel handled by the TIA,160x192,Cartridge,2,"The Atari Video Computer System (VCS), later n...",Consoles
6,Atari 5200,True,1982-11-01T00:00:00-08:00,"Atari, Inc.","Atari, Inc.",8-bit MOS Technology 6502C @ 1.79 MHz,16 KB,Atari Alphanumeric Television Interface Contro...,"Atari Pot Keyboard Integrated Circuit (POKEY),...",320x192,Cartridge,2 or 4 (depending on model),"The Atari 5200 SuperSystem, commonly known as ...",Consoles
7,Atari 7800,True,1986-05-01T00:00:00-07:00,Atari Corporation,Atari Corporation,"8-bit Atari SALLY 6502 (""6502C"") @ 1.79 MHz",4 KB,MARIA custom graphics controller @ 7.16 MHz,"Television Interface Adaptor (TIA), 2 channels",320x192,Cartridge,2,"The Atari 7800 Pro System, commonly known as t...",Consoles
8,Atari Jaguar,True,1993-11-23T00:00:00-08:00,Atari Corporation,IBM,Motorola 68000 @ 13.29 MHz,2 MB DRAM,"Atari custom ""Tom"" chip @ 26.59 MHz, GPU 32-bi...","Atari custom ""Jerry"" chip @ 26.59 MHz, 16-bit ...","320x224p, 360x224p, 640x224p, 720x224p, 320x44...",ROM cartridge,2,The Atari Jaguar is a fifth generation (1993–2...,Consoles
9,Atari Jaguar CD,True,1995-09-21T00:00:00-07:00,Atari Corporation,Atari Corporation,,,,,,CD-ROM,,The Atari Jaguar CD is a fifth generation (199...,Consoles


List the last 20

In [5]:
platforms.tail(n=20)

Unnamed: 0,Name,Emulated,ReleaseDate,Developer,Manufacturer,Cpu,Memory,Graphics,Sound,Display,Media,MaxControllers,Notes,Category
587,SNK Neo Geo CD,,,,,,,,,,,,,
588,SNK Neo Geo CD,,,,,,,,,,,,,
589,SNK Neo Geo CD,,,,,,,,,,,,,
590,SNK Neo Geo CD,,,,,,,,,,,,,
591,Nintendo Satellaview,,,,,,,,,,,,,
592,Taito Type X,,,,,,,,,,,,,
593,Taito Type X,,,,,,,,,,,,,
594,Mattel HyperScan,,,,,,,,,,,,,
595,Sega CD 32X,,,,,,,,,,,,,
596,Sega CD 32X,,,,,,,,,,,,,


Looking at the Platforms.xml file, we can see that it contains other tags besides the platform tags.

The last few rows also indicate this.

There is a "PlatformAlternative" tag that only holds alternative names for each console.

As a result, the rows after the real platforms are empty apart from the name, and we can remove them.

Here is a screenshot of the file:

![Screenshot%20from%202020-11-12%2009-12-01.png](attachment:Screenshot%20from%202020-11-12%2009-12-01.png)

Since the last platform tag contains the name "Linux", we can search for the index of the row whose name is "Linux".

Then keep only the rows up to and including that index.

In [6]:
linux_platform = platforms.index[platforms["Name"] == "Linux"]

linux_platform_index = linux_platform.tolist()[0]  # first occurrence
# Copy the slice so that later column assignments do not act on a view of the original
platforms = platforms[0:linux_platform_index + 1].copy()
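
Truncating at "Linux" depends on the order of the tags in the file. A possible alternative is to select elements by tag name when reading the XML. This is only a sketch: the inline XML is hypothetical, and the tag names `Root` and `Platform` are assumptions (only "PlatformAlternative" is named above).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; "Root" and "Platform" are assumed tag names
mixed_xml = """
<Root>
  <Platform><Name>Linux</Name></Platform>
  <PlatformAlternative><Name>SNK Neo Geo CD</Name></PlatformAlternative>
  <Platform><Name>Arcade</Name></Platform>
</Root>
"""
mixed_root = ET.fromstring(mixed_xml)

# findall("Platform") matches only direct children with exactly that tag,
# so PlatformAlternative elements are skipped
platform_names = [p.findtext("Name") for p in mixed_root.findall("Platform")]
print(platform_names)
```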

The "ReleaseDate" column holds both the date and a time with a UTC offset.

For this analysis the date alone will do, so the time and offset will be removed.

In [7]:
# Drop everything from the "T" separator onward (the time and UTC offset)
dates = [re.sub(r"T.*", "", date) if date is not None else None for date in platforms["ReleaseDate"]]
platforms["ReleaseDate"] = dates
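
As a cross-check on the regular expression, the timestamps can also be parsed rather than trimmed. The sketch below uses the standard library's `datetime.fromisoformat` (Python 3.7+); the helper name `date_only` is hypothetical and not part of the notebook.

```python
from datetime import datetime

def date_only(timestamp):
    """Return the YYYY-MM-DD part of an ISO 8601 timestamp, or None if missing."""
    if timestamp is None:
        return None
    # .date() keeps the calendar date as written; it does not convert to UTC
    return datetime.fromisoformat(timestamp).date().isoformat()

print(date_only("1993-10-04T00:00:00-07:00"))
print(date_only(None))
```

Unlike the regex, parsing raises a `ValueError` on a malformed timestamp, which may or may not be desirable for this dataset.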

The time and offset have been removed. Let's look at the dates now (some systems in the XML have no release date).

In [8]:
platforms["ReleaseDate"]

0      1993-10-04
1      1985-07-23
2      1984-01-01
3      2008-09-23
4            None
          ...    
180          None
181    2006-10-23
182    2005-10-01
183          None
184    1991-09-17
Name: ReleaseDate, Length: 185, dtype: object

In [9]:
# Write to a CSV file
platforms.to_csv("platforms.csv",index=False)

Let's do some queries:

        How many Nintendo consoles have been made and when were they released?
        What are the specs of each Sony system?
        Which consoles were using floppy disks and when did they start fading out?

In [10]:
nintendo_console_query = platforms[platforms["Name"].str.contains("Nintendo", case=False)]
nintendo_console_query[["Name", "ReleaseDate"]].sort_values("ReleaseDate")

Unnamed: 0,Name,ReleaseDate
140,Nintendo Game & Watch,1980-04-28
26,Nintendo Entertainment System,1985-10-18
135,Nintendo Famicom Disk System,1986-02-21
27,Nintendo Game Boy,1989-04-21
50,Super Nintendo Entertainment System,1990-11-21
142,Nintendo Satellaview,1995-04-23
31,Nintendo Virtual Boy,1995-07-21
24,Nintendo 64,1996-06-23
29,Nintendo Game Boy Color,1998-10-21
162,Nintendo 64DD,1999-12-01


In [11]:
playstation_console_query = platforms[platforms["Name"].str.contains("Sony", case=False)]
playstation_console_query[["Name","Graphics", "Memory", "Cpu", "Display"]]

Unnamed: 0,Name,Graphics,Memory,Cpu,Display
44,Sony Playstation,32-bit Sony GPU,"2 MB RAM, 1 MB video RAM",MIPS R3000A compatible 32-bit RISC CPU,256x224 to 640x480
45,Sony Playstation 2,Graphics Synthesizer GPU,32 MG RDRAM and 4 MB video DRAM,Emotion Engine,256x224 up to 1280x1024
46,Sony Playstation 3,"550 MHz NVIDIA/SCEI RSX ""Reality Synthesizer""",256 MB system and 256 MB video,3.2 GHz Cell Broadband Engine with 1 PPE & 7 SPEs,"Composite, S-Video, RGB, Component, D-Terminal..."
47,Sony Playstation 4,Semi-custom AMD GCN Radeon (integrated into APU),"8 GB GDDR5 (unified), 256 MB DDR3 RAM (for bac...",Semi-custom 8-core AMD x86-64 Jaguar CPU (inte...,HDMI up to 4K
48,Sony Playstation Vita,4 core SGX543MP4+,"512 MB RAM, 128 MB VRAM",4 core ARM Cortex-A9 MPCore,960x544
49,Sony PSP,Sony CXD187,32 MB,Sony CXD2962GG (based on MIPS R4000 core),480x272
169,Sony PSP Minis,,,,
170,Sony PocketStation,,2 KB,ARM7T,32×32 dot monochrome LCD


In [12]:
floppy_disks_query = platforms[platforms["Media"].str.contains("Floppy", case=False, na=False)]
floppy_disks_query[["Name", "ReleaseDate", "Media"]].sort_values("ReleaseDate")

Unnamed: 0,Name,ReleaseDate,Media
95,Apple II,1977-06-01,"Audio Cassette, 5.25"" Floppy"
110,Tandy TRS-80,1977-08-03,"Cassette tape, Floppy disk"
151,Commodore PET,1977-10-01,"Cassette Tape, 5.25"" Floppy, 8"" Floppy"
93,Acorn Atom,1980-01-01,"100kb 5 1/4 inch Floppy Disk, Cassette Tapes"
56,BBC Microcomputer System,1981-12-01,"Cassette tape, Floppy disc, Winchester HD"
13,Commodore 64,1982-01-01,"Cartridge, Tape, Floppy optional"
69,EACA EG2000 Colour Genie,1982-08-01,"Tape, ROM cartridge, Floppy disc"
161,NEC PC-9801,1982-10-01,"8"" floppy diskette"
154,Fujitsu FM-7,1982-11-01,"Cassette Tape, 5.25"" Floppy"
58,Camputers Lynx,1983-01-01,"Cassette tape, Floppy disc (optional)"
