Berkshire Hathaway 13F: https://www.sec.gov/Archives/edgar/data/1067983/000095012322012275/18337.xml

In [209]:
# import libraries
import pandas as pd
import xml.etree.ElementTree as et

In [207]:
file_name = 'xml_files/BRK_Q322.xml'
tree = et.parse(file_name)
root = tree.getroot()

In [203]:
ns = {'info': 'http://www.sec.gov/edgar/document/thirteenf/informationtable'}

full_list = []

for stocks in root.findall('info:infoTable', ns):
    issuer = stocks.find('info:nameOfIssuer', ns).text
    sec_type = stocks.find('info:titleOfClass', ns).text
    cusip = stocks.find('info:cusip', ns).text
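    # the * 1000 assumes this filing reports value in thousands of dollars
    val = int(stocks.find('info:value', ns).text) * 1000
    # shrsOrPrnAmt is in the same namespace; its first child holds the share/principal amount
    shares = stocks.find('info:shrsOrPrnAmt', ns)
    amt = int(shares[0].text)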
    full_list.append([issuer,sec_type,cusip,amt,val])
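The loop above reaches the share count by position (`shares[0]`). A possible alternative, sketched here, is to name the child element in the path and read text with `findtext()`. The inline `sample` document is hand-written for illustration: its values mirror the first row of the output, and the `sshPrnamt` / `sshPrnamtType` child names and the `SH` type are assumptions about the filing's layout rather than something shown in this notebook.

```python
import xml.etree.ElementTree as et

ns = {'info': 'http://www.sec.gov/edgar/document/thirteenf/informationtable'}

# Hand-written sample shaped like a single 13F row (assumed layout, illustration only).
sample = """<informationTable xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
  <infoTable>
    <nameOfIssuer>ACTIVISION BLIZZARD INC</nameOfIssuer>
    <titleOfClass>COM</titleOfClass>
    <cusip>00507V109</cusip>
    <value>1906458</value>
    <shrsOrPrnAmt>
      <sshPrnamt>25645116</sshPrnamt>
      <sshPrnamtType>SH</sshPrnamtType>
    </shrsOrPrnAmt>
  </infoTable>
</informationTable>"""

sample_root = et.fromstring(sample)

rows = []
for holding in sample_root.findall('info:infoTable', ns):
    rows.append([
        holding.findtext('info:nameOfIssuer', namespaces=ns),
        holding.findtext('info:titleOfClass', namespaces=ns),
        holding.findtext('info:cusip', namespaces=ns),
        # Name the nested child explicitly instead of relying on its position.
        int(holding.findtext('info:shrsOrPrnAmt/info:sshPrnamt', namespaces=ns)),
        # Same thousands-to-dollars scaling as the loop above.
        int(holding.findtext('info:value', namespaces=ns)) * 1000,
    ])

print(rows)
```

Naming the child makes the intent readable, and the prefix has to be repeated on each step of the path because every element sits in the same namespace.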

In [205]:
df = pd.DataFrame(full_list,columns=['issuer','security_type','cusip','amount','value'])

In [206]:
df

,issuer,security_type,cusip,amount,value
0,ACTIVISION BLIZZARD INC,COM,00507V109,25645116,1906458000
1,ACTIVISION BLIZZARD INC,COM,00507V109,1144672,85095000
2,ACTIVISION BLIZZARD INC,COM,00507V109,33352078,2479393000
3,ALLY FINL INC,COM,02005N100,13719675,381819000
4,ALLY FINL INC,COM,02005N100,2803875,78032000
...,...,...,...,...,...
174,NU HLDGS LTD,ORD SHS CL A,G6693N103,107118784,471323000
175,STONECO LTD,COM CL A,G85158106,10695448,101928000
176,LIBERTY LATIN AMERICA LTD,COM CL A,G9001E102,1005607,6225000
177,LIBERTY LATIN AMERICA LTD,COM CL A,G9001E102,1625185,10060000


# This part of the docs was helpful in refactoring my XML parsing code
## Parsing XML with Namespaces  

https://docs.python.org/3/library/xml.etree.elementtree.html  

If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.  

Here is an XML example that incorporates two namespaces, one with the prefix “fictional” and the other serving as the default namespace:  

<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
    <actor>
        <name>Eric Idle</name>
        <fictional:character>Sir Robin</fictional:character>
        <fictional:character>Gunther</fictional:character>
        <fictional:character>Commander Clement</fictional:character>
    </actor>
</actors>  
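To see the `{uri}sometag` expansion described above, a small sketch can parse a trimmed copy of this document and print each element's `tag`. The variable names (`xml_text`, `expanded_tags`) are chosen for illustration only.

```python
import xml.etree.ElementTree as et

# Trimmed copy of the example: one default namespace, one prefixed namespace.
xml_text = """<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

root = et.fromstring(xml_text)

# ElementTree stores every tag in its expanded {uri}localname form.
expanded_tags = [elem.tag for elem in root.iter()]
for tag in expanded_tags:
    print(tag)
```

The unprefixed tags pick up the default namespace URI, while `fictional:character` picks up the URI bound to its prefix. This is why a plain `root.findall('actor')` finds nothing in this document.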

One way to search and explore this XML example is to manually add the URI to every tag or attribute in the xpath of a find() or findall():  

root = fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print(name.text)
    for char in actor.findall('{http://characters.example.com}character'):
        print(' |-->', char.text)  
        
A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:  

ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)