
Commit

adding string stripping for p- properties
I'm not really happy with this code: this sort of thing should be
abstracted by the HTML/XML library.

Kartik has suggested we switch to BeautifulSoup. It was a bit hacky last
time I looked, but if it reduces this kind of thing, I'd be in favour.
tommorris committed Feb 21, 2014
1 parent 7b8de5a commit f31e3d9
Showing 3 changed files with 20 additions and 1 deletion.
5 changes: 4 additions & 1 deletion mf2py/parser.py
@@ -139,7 +139,10 @@ def parse_props(el, is_root_element=False):
 # TODO: parse for value-class here
 prop_name = prop[2:]
 prop_value = props.get(prop_name, [])
-prop_value.append(el.firstChild.nodeValue)
+# TODO: this is a goddamn horror show right here
+text_value = " ".join(t.nodeValue for t in el.childNodes if t.nodeType == t.TEXT_NODE)
+text_value = text_value.strip()
+prop_value.append(text_value)
 
 if prop_value is not []:
     props[prop_name] = prop_value
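
The effect of the change in the parse_props hunk above can be tried in isolation. The following is a minimal sketch, not the parser's actual code path: it assumes a DOM-style tree from the standard library's xml.dom.minidom (the hunk only shows DOM attributes such as childNodes and nodeType, not which library builds the tree), and the names text_of, first_span, and the nested markup variant are hypothetical. It also illustrates that the context line `if prop_value is not []:` compares identity rather than emptiness, so it is true even for an empty list.

```python
from xml.dom import minidom


def text_of(el):
    """Join the direct text-node children of el and strip the result."""
    # Same expression as in the hunk: only direct TEXT_NODE children
    # contribute, so text inside nested elements is skipped.
    text_value = " ".join(
        t.nodeValue for t in el.childNodes if t.nodeType == t.TEXT_NODE
    )
    return text_value.strip()


def first_span(markup):
    """Hypothetical helper: parse markup and return its first <span>."""
    return minidom.parseString(markup).getElementsByTagName("span")[0]


# Markup taken from the string_stripping.html fixture.
plain = first_span(
    '<div class="h-card"><span class="p-name"> Tom Morris </span></div>'
)
# Hypothetical variant where the first child is an element, not text.
nested = first_span(
    '<div class="h-card"><span class="p-name"><b>Tom</b> Morris </span></div>'
)

old_plain = plain.firstChild.nodeValue    # unstripped text of the first child
old_nested = nested.firstChild.nodeValue  # first child is <b>: no nodeValue
new_plain = text_of(plain)
new_nested = text_of(nested)

# `is not` tests identity, and each [] literal is a new list object,
# so the first check is True even though prop_value is empty.
prop_value = []
identity_check = prop_value is not []
truthiness_check = bool(prop_value)

print(repr(old_plain), repr(new_plain))
print(repr(old_nested), repr(new_nested))
print(identity_check, truthiness_check)
```

In this sketch the old expression yields the unstripped string for the fixture markup and None when the first child is an element, while the joined version yields a stripped string in both cases. Text inside nested elements is still dropped by the joined version.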
11 changes: 11 additions & 0 deletions test/examples/string_stripping.html
@@ -0,0 +1,11 @@
+<!DOCTYPE html>
+<html>
+<head>
+<title>String Stripping example</title>
+</head>
+<body>
+<div class="h-card">
+<span class="p-name"> Tom Morris </span>
+</div>
+</body>
+</html>
5 changes: 5 additions & 0 deletions test/test_parser.py
@@ -115,6 +115,11 @@ def test_backcompat():
     result = parse_fixture("backcompat.html")
     assert set(result["items"][0]["type"]) == set(["h-card"])
 
+def test_string_strip():
+    result = parse_fixture("string_stripping.html")
+    print result
+    assert result["items"][0]["properties"]["name"][0] == "Tom Morris"
+
 if __name__ == '__main__':
     result = parse_fixture("nested_multiple_classnames.html")
     pprint(result)
