finish up markdown formatting for great victory

whymirror · May 17, 2010 · c078acc
1 parent d8f22aa
Showing 1 changed file with 84 additions and 83 deletions.
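
Most of this diff is mechanical: `=`-style headings become `#` headings, and column-zero `#` list markers become `*` bullets. The sketch below shows that rewrite rule under stated assumptions. `rdoc_to_markdown` is a hypothetical helper written for illustration; it is not part of Hpricot and was not necessarily how this commit was produced. It also leaves code-block indentation alone, since the diff re-indents those lines by hand.

```ruby
# Hypothetical helper (an assumption for illustration, not Hpricot API):
# rewrites the heading and list markers seen on the removed lines of this
# diff into the Markdown forms seen on the added lines.
def rdoc_to_markdown(text)
  converted = text.lines.map do |line|
    body = line.chomp
    case body
    when /\A(=+) (.+)\z/
      # "== An Overview" -> "## An Overview" (one '#' per '=')
      "#{'#' * $1.length} #{$2}"
    when /\A# (.+)\z/
      # A '#' in column zero is a list marker; it becomes a '*' bullet.
      "* #{$1}"
    else
      # Indented lines (code samples, list continuations) pass through as-is.
      body
    end
  end
  converted.join("\n")
end

sample = <<~RDOC
  == An Overview

  # Hpricot is *a standalone library*.
    It requires no other libraries.
RDOC

puts rdoc_to_markdown(sample)
```

Anchoring both patterns at the start of the line is what keeps indented Ruby comments such as ` # find all paragraphs.` from being mistaken for list items.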
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-= Hpricot, Read Any HTML
+# Hpricot, Read Any HTML
 
 Hpricot is a fast, flexible HTML parser written in C.  It's designed to be very
 accommodating (like Tanaka Akira's HTree) and to have a very helpful library
@@ -13,21 +13,21 @@ thing.
 *Please read this entire document* before making assumptions about how this
 software works.
 
-== An Overview
+## An Overview
 
 Let's clear up what Hpricot is.
 
-# Hpricot is *a standalone library*.  It requires no other libraries.  Just Ruby!
-# While priding itself on speed, Hpricot *works hard to sort out bad HTML* and
+* Hpricot is *a standalone library*.  It requires no other libraries.  Just Ruby!
+* While priding itself on speed, Hpricot *works hard to sort out bad HTML* and
   pays a small penalty in order to get that right.  So that's slightly more important
   to me than speed.
-# *If you can see it in Firefox, then Hpricot should parse it.*  That's
+* *If you can see it in Firefox, then Hpricot should parse it.*  That's
   how it should be!  Let me know the minute it's otherwise.
-# Primarily, Hpricot is used for reading HTML and tries to sort out troubled
+* Primarily, Hpricot is used for reading HTML and tries to sort out troubled
   HTML by having some idea of what good HTML is.  Some people still like to use
   Hpricot for XML reading, but *remember to use the Hpricot::XML() method* for that!
 
-== The Hpricot Kingdom
+## The Hpricot Kingdom
 
 First, here are all the links you need to know:
 
@@ -43,184 +43,185 @@ not going to say "Use at your own risk" because I don't want this library to be
 risky.  If you trip on something, I'll share the liability by repairing things
 as quickly as I can.  Your responsibility is to report the inadequacies.
 
-== Installing Hpricot
+## Installing Hpricot
 
 You may get the latest stable version from Rubyforge. Win32 binaries,
 Java binaries (for JRuby), and source gems are available.
 
-  $ gem install hpricot
+    $ gem install hpricot
 
-== An Hpricot Showcase
+## An Hpricot Showcase
 
 We're going to run through a big pile of examples to get you jump-started.
 Many of these examples are also found at
 http://wiki.github.com/hpricot/hpricot/hpricot-basics, in case you
 want to add some of your own.
 
-=== Loading Hpricot Itself
+### Loading Hpricot Itself
 
 You have probably got the gem, right?  To load Hpricot:
 
- require 'rubygems'
- require 'hpricot'
+    require 'rubygems'
+    require 'hpricot'
 
 If you've installed the plain source distribution, go ahead and just:
 
- require 'hpricot'
+    require 'hpricot'
 
-=== Load an HTML Page
+### Load an HTML Page
 
 The <tt>Hpricot()</tt> method takes a string or any IO object and loads the
 contents into a document object.
 
- doc = Hpricot("<p>A simple <b>test</b> string.</p>")
+    doc = Hpricot("<p>A simple <b>test</b> string.</p>")
 
 To load from a file, just get the stream open:
 
- doc = open("index.html") { |f| Hpricot(f) }
+    doc = open("index.html") { |f| Hpricot(f) }
 
 To load from a web URL, use <tt>open-uri</tt>, which comes with Ruby:
 
- require 'open-uri'
- doc = open("http://qwantz.com/") { |f| Hpricot(f) }
+    require 'open-uri'
+    doc = open("http://qwantz.com/") { |f| Hpricot(f) }
 
 Hpricot uses an internal buffer to parse the file, so the IO will stream
 properly and large documents won't be loaded into memory all at once.  However,
 the parsed document object will be present in memory, in its entirety.
 
-=== Search for Elements
+### Search for Elements
 
 Use <tt>Doc.search</tt>:
 
- doc.search("//p[@class='posted']")
- #=> #<Hpricot:Elements[{p ...}, {p ...}]>
+    doc.search("//p[@class='posted']")
+    #=> #<Hpricot:Elements[{p ...}, {p ...}]>
 
 <tt>Doc.search</tt> can take an XPath or CSS expression.  In the above example,
 all paragraph <tt><p></tt> elements are grabbed which have a <tt>class</tt>
 attribute of <tt>"posted"</tt>.
 
 A shortcut is to use the divisor:
 
- (doc/"p.posted")
- #=> #<Hpricot:Elements[{p ...}, {p ...}]>
+    (doc/"p.posted")
+    #=> #<Hpricot:Elements[{p ...}, {p ...}]>
 
-=== Finding Just One Element
+### Finding Just One Element
 
 If you're looking for a single element, the <tt>at</tt> method will return the
 first element matched by the expression.  In this case, you'll get back the
 element itself rather than the <tt>Hpricot::Elements</tt> array.
 
- doc.at("body")['onload']
+    doc.at("body")['onload']
 
 The above code will find the body tag and give you back the <tt>onload</tt>
 attribute.  This is the most common reason to use the element directly: when
 reading and writing HTML attributes.
 
-=== Fetching the Contents of an Element
+### Fetching the Contents of an Element
 
 Just as with browser scripting, the <tt>inner_html</tt> property can be used to
 get the inner contents of an element.
 
- (doc/"#elementID").inner_html
- #=> "..<b>contents</b>.."
+    (doc/"#elementID").inner_html
+    #=> "..<b>contents</b>.."
 
 If your expression matches more than one element, you'll get back the contents
 of ''all the matched elements''.  So you may want to use <tt>first</tt> to be
 sure you get back only one.
 
- (doc/"#elementID").first.inner_html
- #=> "..<b>contents</b>.."
+    (doc/"#elementID").first.inner_html
+    #=> "..<b>contents</b>.."
 
-=== Fetching the HTML for an Element
+### Fetching the HTML for an Element
 
 If you want the HTML for the whole element (not just the contents), use
 <tt>to_html</tt>:
 
- (doc/"#elementID").to_html
- #=> "<div id='elementID'>...</div>"
+    (doc/"#elementID").to_html
+    #=> "<div id='elementID'>...</div>"
 
-=== Looping
+### Looping
 
 All searches return a set of <tt>Hpricot::Elements</tt>.  Go ahead and loop
 through them like you would an array.
 
- (doc/"p/a/img").each do |img|
-   puts img.attributes['class']
- end
+    (doc/"p/a/img").each do |img|
+      puts img.attributes['class']
+    end
 
-=== Continuing Searches
+### Continuing Searches
 
 Searches can be continued from a collection of elements, in order to search deeper.
 
- # find all paragraphs.
- elements = doc.search("/html/body//p")
- # continue the search by finding any images within those paragraphs.
- (elements/"img")
- #=> #<Hpricot::Elements[{img ...}, {img ...}]>
+    # find all paragraphs.
+    elements = doc.search("/html/body//p")
+    # continue the search by finding any images within those paragraphs.
+    (elements/"img")
+    #=> #<Hpricot::Elements[{img ...}, {img ...}]>
 
 Searches can also be continued by searching within container elements.
 
- # find all images within paragraphs.
- doc.search("/html/body//p").each do |para|
-   puts "== Found a paragraph =="
-   pp para
+    # find all images within paragraphs.
+    doc.search("/html/body//p").each do |para|
+      puts "== Found a paragraph =="
+      pp para
 
-   imgs = para.search("img")
-   if imgs.any?
-     puts "== Found #{imgs.length} images inside =="
-   end
- end
+      imgs = para.search("img")
+      if imgs.any?
+        puts "== Found #{imgs.length} images inside =="
+      end
+    end
 
 Of course, the most succinct ways to do the above are using CSS or XPath.
 
- # the xpath version
- (doc/"/html/body//p//img")
- # the css version
- (doc/"html > body > p img")
- # ..or symbols work, too!
- (doc/:html/:body/:p/:img)
+    # the xpath version
+    (doc/"/html/body//p//img")
+    # the css version
+    (doc/"html > body > p img")
+    # ..or symbols work, too!
+    (doc/:html/:body/:p/:img)
 
-=== Looping Edits
+### Looping Edits
 
 You may certainly edit objects from within your search loops.  Then, when you
 spit out the HTML, the altered elements will show.
 
- (doc/"span.entryPermalink").each do |span|
-   span.attributes['class'] = 'newLinks'
- end
- puts doc
+
+    (doc/"span.entryPermalink").each do |span|
+      span.attributes['class'] = 'newLinks'
+    end
+    puts doc
 
 This changes all <tt>span.entryPermalink</tt> elements to
 <tt>span.newLinks</tt>.  Keep in mind that there are often more convenient ways
 of doing this.  Such as the <tt>set</tt> method:
 
- (doc/"span.entryPermalink").set(:class => 'newLinks')
+    (doc/"span.entryPermalink").set(:class => 'newLinks')
 
-=== Figuring Out Paths
+### Figuring Out Paths
 
 Every element can tell you its unique path (either XPath or CSS) to get to the
 element from the root tag.
 
 The <tt>css_path</tt> method:
 
- doc.at("div > div:nth(1)").css_path
-   #=> "div > div:nth(1)" 
- doc.at("#header").css_path
-   #=> "#header" 
+    doc.at("div > div:nth(1)").css_path
+      #=> "div > div:nth(1)" 
+    doc.at("#header").css_path
+      #=> "#header" 
 
 Or, the <tt>xpath</tt> method:
 
- doc.at("div > div:nth(1)").xpath
-   #=> "/div/div:eq(1)" 
- doc.at("#header").xpath
-   #=> "//div[@id='header']" 
+    doc.at("div > div:nth(1)").xpath
+      #=> "/div/div:eq(1)" 
+    doc.at("#header").xpath
+      #=> "//div[@id='header']"
 
-== Hpricot Fixups
+## Hpricot Fixups
 
 When loading HTML documents, you have a few settings that can make Hpricot more
 or less intense about how it gets involved.
 
-== :fixup_tags
+## :fixup_tags
 
 Really, there are so many ways to clean up HTML and your intentions may be to
 keep the HTML as-is.  So Hpricot's default behavior is to keep things flexible.
@@ -229,7 +230,7 @@ Making sure to open and close all the tags, but ignore any validation problems.
 As of Hpricot 0.4, there's a new <tt>:fixup_tags</tt> option which will attempt
 to shift the document's tags to meet XHTML 1.0 Strict.
 
- doc = open("index.html") { |f| Hpricot f, :fixup_tags => true }
+    doc = open("index.html") { |f| Hpricot f, :fixup_tags => true }
 
 This doesn't quite meet the XHTML 1.0 Strict standard, it just tries to follow
 the rules a bit better.  Like: say Hpricot finds a paragraph in a link, it's
@@ -238,13 +239,13 @@ where paragraphs don't belong.
 
 If an unknown element is found, it is ignored.  Again, <tt>:fixup_tags</tt>.
 
-== :xhtml_strict
+## :xhtml_strict
 
 So, let's go beyond just trying to fix the hierarchy.  The
 <tt>:xhtml_strict</tt> option really tries to force the document to be an XHTML
 1.0 Strict document.  Even at the cost of removing elements that get in the way.
 
- doc = open("index.html") { |f| Hpricot f, :xhtml_strict => true }
+    doc = open("index.html") { |f| Hpricot f, :xhtml_strict => true }
 
 What measures does <tt>:xhtml_strict</tt> take?
 
@@ -254,7 +255,7 @@ What measures does <tt>:xhtml_strict</tt> take?
  4. Remove illegal content.
  5. Alter the doctype to XHTML 1.0 Strict.
 
-== Hpricot.XML()
+## Hpricot.XML()
 
 The last option is the <tt>:xml</tt> option, which makes some slight variations
 on the standard mode.  The main difference is that :xml mode won't try to output
@@ -266,9 +267,9 @@ to case, friends.
 
 The primary way to use Hpricot's XML mode is to call the Hpricot.XML method:
 
- doc = open("http://redhanded.hobix.com/index.xml") do |f|
-   Hpricot.XML(f)
- end
+    doc = open("http://redhanded.hobix.com/index.xml") do |f|
+      Hpricot.XML(f)
+    end
 
 *Also, :fixup_tags is canceled out by the :xml option.*  This is because
 :fixup_tags makes assumptions based on how HTML is structured.  Specifically, how