change RESPONSE_REGEX to BE
yama-natuki committed Jun 26, 2015
1 parent b14d158 commit 14c9ea4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion 2chproxy.pl
@@ -92,7 +92,7 @@
 # Regular expressions used for web scraping follow
 TITLE_REGEX => '^<title>(.*)</title>$', # extract title
 # 1. post number 2. mail field 3. name/hash 4. date|ID 5. BE1 6. BE2 7. body
-RESPONSE_REGEX => '^<dt>(\d+)\s[^<]*<(?:a href="mailto:([^"]+)"|font[^>]*)><b>(.*?)</b></(?:a|font)>.([^<]+?)\s?(?:(?:<a .+?/a>\s)?<a [^>]*be\((\d+)\)[^>]*>\?([^<]+)</a>)?<dd>(.+)'
+RESPONSE_REGEX => '^<dt>(\d+)\s[^<]*<(?:a href="mailto:([^"]+)"|font[^>]*)><b>(.*?)</b></(?:a|font)>.([^<]+?)\s?(?:(?:<a .+?/a>\s)?<a [^>]*be\(([^\)]+)\)[^>]*>\?([^<]+)</a>)?<dd>(.+)'
 # Regular expressions for the finer details of web scraping are further down
 };
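
To make the numbered capture groups in the comment above concrete, here is a minimal sketch that applies the new RESPONSE_REGEX to one line of thread HTML. The helper `parse_response`, the field names, and the sample line are assumptions made for illustration and are not taken from 2chproxy.pl; the sample only imitates the markup the pattern expects.

```perl
use strict;
use warnings;

# New RESPONSE_REGEX, copied from the added line of the diff.
my $RESPONSE_REGEX = '^<dt>(\d+)\s[^<]*<(?:a href="mailto:([^"]+)"|font[^>]*)><b>(.*?)</b></(?:a|font)>.([^<]+?)\s?(?:(?:<a .+?/a>\s)?<a [^>]*be\(([^\)]+)\)[^>]*>\?([^<]+)</a>)?<dd>(.+)';

# Hypothetical helper: map the seven captures to named fields.
# Returns a hash reference, or nothing when the line does not match.
sub parse_response {
    my ($line) = @_;
    my @f = $line =~ /$RESPONSE_REGEX/ or return;
    my %post;
    @post{qw(number mail name date_id be1 be2 body)} = @f;
    return \%post;
}

# Made-up sample line using the non-numeric BE form from the report.
my $line = q{<dt>124 :<a href="mailto:sage"><b>login:Penguin</b></a>:2015/06/26(Fri) 19:25:02.73 ID:QVLhJ1gj <a href="javascript:be(Can't);">?##</a><dd> example body};

if (my $post = parse_response($line)) {
    for my $key (qw(number mail name date_id be1 be2 body)) {
        # Optional groups (mail, be1, be2) are undef when absent.
        printf "%-8s %s\n", $key, $post->{$key} // '(none)';
    }
}
```

Because the BE part is an optional group followed directly by `<dd>`, a BE link that the group cannot match makes the whole pattern fail, since the preceding `[^<]+?` cannot skip over the `<a ...>` tag. That is presumably why such posts were dropped entirely rather than parsed without BE fields.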


2 comments on commit 14c9ea4

@yama-natuki

http://hayabusa6.2ch.net/test/read.cgi/linux/1429072845/124

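124 Name: login:Penguin [sage]: 2015/06/26 (Fri) 19:25:02.73 ID:QVLhJ1gj
Even with the modified 2chproxy.pl from post 92, there were posts that could not be retrieved.
They are the ones where the BE part looks like <a href="javascript:be(Can't);">?##</a>
(normally it looks like <a href="javascript:be(digits);">?2BP(2000)</a>).

Changing be\((\d+)\)[^>] in the regular expression to be\(([^\)]+)\)[^>] made it work.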
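
The difference described in the post can be checked in isolation. The sketch below takes only the BE-link fragment of the pattern, before and after the change, and runs both against the two anchor forms quoted above. The helper `be_fields` and the numeric ID 12345 are made up for illustration.

```perl
use strict;
use warnings;

# Only the BE-link fragment of RESPONSE_REGEX, before and after the commit.
my $be_old = qr/be\((\d+)\)[^>]*>\?([^<]+)<\/a>/;
my $be_new = qr/be\(([^\)]+)\)[^>]*>\?([^<]+)<\/a>/;

# Hypothetical helper: returns the two BE captures, or an empty list on no match.
sub be_fields {
    my ($re, $html) = @_;
    return $html =~ $re ? ($1, $2) : ();
}

# Anchors modeled on the two forms quoted in the post; 12345 is a made-up ID.
my $numeric = q{<a href="javascript:be(12345);">?2BP(2000)</a>};
my $odd     = q{<a href="javascript:be(Can't);">?##</a>};

for my $sample ($numeric, $odd) {
    print "$sample\n";
    for my $pair (['old', $be_old], ['new', $be_new]) {
        my ($label, $re) = @$pair;
        my @got = be_fields($re, $sample);
        printf "  %s pattern: %s\n", $label, @got ? "matched (@got)" : "no match";
    }
}
```

The old fragment accepts only digits inside `be(...)`, so it rejects `be(Can't)`. The new one accepts any run of characters up to the closing parenthesis, which still covers the numeric case.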

@yama-natuki