

minor changes to feeds and readability

commit 10da731028d7b9b65faf957087b0ec6812f60734 1 parent ef712a6
aria42 authored
Showing 2 changed files with 10 additions and 0 deletions.
  1. +1 −0 src/webmine/feeds.clj
  2. +9 −0 src/webmine/readability.clj
1 src/webmine/feeds.clj
@@ -306,6 +306,7 @@ May not be a good idea for blogs that have many useful feeds, for example, for a
(canonical-feed "http://techcrunch.com/2010/11/02/andreessen-horowitz-650m-fund/")
; This one requires fix-link, otherwise doesn't work
(canonical-feed "http://npr.org")
+ (entries "http://www.nytimes.com/services/xml/rss/nyt/HomePage.xml")
(canonical-feed "http://io9.com/")
(canonical-feed "http://www.huffingtonpost.com/")
)
9 src/webmine/readability.clj
@@ -152,6 +152,15 @@
strip-bad-divs!
find-best-content-div))
+(defn extract-content [raw-html]
+  (-> raw-html
+      parser/dom
+      parser/strip-non-content
+      strip-bad-divs!
+      find-best-content-div
+      .getTextContent))
+
+
(comment
;; THESE WORK
"http://gigaom.com/2010/10/22/whos-driving-mobile-payments-hint-some-are-barely-old-enough-to-drive/"