diff --git a/html_tree.png b/html_tree.png
new file mode 100644
index 0000000..0cf759e
Binary files /dev/null and b/html_tree.png differ
diff --git a/nokogiri.markdown b/nokogiri.markdown
index 8a2e07b..bf54b3e 100644
--- a/nokogiri.markdown
+++ b/nokogiri.markdown
@@ -97,6 +97,25 @@ but you should choose the one that is most convenient.
 
 ### Data structures
 
+To become data extraction Zen Masters, we first need to understand the data
+structure returned by Nokogiri.  Nokogiri converts HTML and XML documents
+into a tree data structure.
+
+For example, an HTML document that looks like this:
+
+    <html>
+      <head>
+        <title>Hello!</title>
+      </head>
+      <body>
+        <h1>Hello World!</h1>
+      </body>
+    </html>
+
+will be represented in memory with a tree that looks like this:
+
+![HTML Tree](html_tree.png)
+
 ## Data Extraction
 
 ### Basic XPath