Import very old content from www.wincent.com #82

Closed
wincent opened this Issue Oct 27, 2016 · 2 comments

Owner

wincent commented Oct 27, 2016

May be able to write some hacky script to get the HTML out of articles like this one. A lot of that old content is garbage but it does have some historical interest. I have articles spanning from around 2005 to 2008. (Actually, just found one as old as 2004.)

Possibly use Pandoc or something to convert to Markdown.
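A minimal sketch of what the Pandoc step could look like when driven from a script. The function names and file paths are hypothetical; the flags (`--from`, `--to`, `--wrap`, `--output`) are standard Pandoc options, and `gfm` is assumed as the target Markdown flavor.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path, dest: Path, source_format: str = "html") -> list[str]:
    """Build a Pandoc invocation that converts `source` to GitHub-flavored Markdown."""
    return [
        "pandoc",
        "--from", source_format,  # Pandoc reader name; "html" for scraped pages
        "--to", "gfm",            # assumed target flavor
        "--wrap=none",            # do not hard-wrap paragraphs in the output
        "--output", str(dest),
        str(source),
    ]


def convert(source: Path, dest: Path, source_format: str = "html") -> None:
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, dest, source_format), check=True)
```

Keeping the command construction separate from the call makes it easy to inspect or log the exact invocation before running it over hundreds of old articles.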

Will need an import script that can put these on a branch somewhere, then rewrite the content branch to rebase the new content on top of the old content, while preserving all the dates correctly.
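For the date-preservation part, one possible approach is to set `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` when committing each imported article. This is only a sketch: `dated_env` and `commit_article` are hypothetical helpers, the publication date is assumed to have been recovered from the old page, and the later branch rewrite is not covered here.

```python
import os
import subprocess
from datetime import datetime, timezone


def dated_env(published: datetime) -> dict[str, str]:
    """Return a copy of the environment that pins both Git dates to `published`.

    `published` must be timezone-aware; it is normalized to UTC and written in
    Git's raw "<unix seconds> <offset>" date format.
    """
    if published.tzinfo is None:
        raise ValueError("published must be timezone-aware")
    seconds = int(published.astimezone(timezone.utc).timestamp())
    stamp = f"{seconds} +0000"
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


def commit_article(repo: str, path: str, published: datetime, message: str) -> None:
    """Stage one imported file and commit it with its original date."""
    env = dated_env(published)
    subprocess.run(["git", "-C", repo, "add", path], check=True, env=env)
    subprocess.run(["git", "-C", repo, "commit", "-m", message], check=True, env=env)
```

Committing the articles in chronological order on the import branch would keep the history in order as well, although a later rebase will reset committer dates unless they are pinned again.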

wincent added the chore label Oct 27, 2016

Owner

wincent commented Apr 20, 2017

Copying in some older notes I have:

Shut down dat PHP stuff.

The svn/git log stuff could become snippets if I wanted, but I think the main thing of interest is the blog.

Unfortunately, would need some kind of markup converter. I am not even sure what language the blog source is in. It might be easiest to go from the HTML output back to wikitext...

Not sure if I still have this in a DB dump somewhere, or if I have to scrape the HTML.
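If it does come down to scraping, the article body could be pulled out of each page with the standard library alone. The sketch below assumes the body sits inside a single `<div>` with a known `id`; the default `"content"` is a placeholder, since the real markup of the old pages is not known here.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the raw HTML inside the first <div> whose id matches `target_id`."""

    def __init__(self, target_id: str) -> None:
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # <div> nesting depth inside the target; 0 means outside
        self.done = False   # set once the target <div> has been closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
            self.parts.append(self.get_starttag_text())
        elif not self.done and tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_body(page: str, target_id: str = "content") -> str:
    """Return the inner HTML of the target <div>, or "" if it is not found."""
    parser = BodyExtractor(target_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.parts).strip()
```

The extracted fragment is still HTML, so it can be handed straight to whatever converter ends up being used.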

Eventually want to shut down the kbase subdomain as well: content is still there at: http://kbase.wincent.com/old/knowledge-base/Main_Page.html [dead link]

Also cool to import: I have some very old PHP files archived under ~/web/archive

See also the task I have to make a wikitext to markdown converter: I may end up using Pandoc for both.

Owner

wincent commented May 15, 2017

Many URLs are obviously going to break. For example a blog post like:

https://www.wincent.com/a/about/wincent/weblog/archives/2008/02/ragel_wins_fata.php

Will get moved to a new home at a URL like:

https://wincent.com/blog/ragel-wins-fatality

The old page should become a 301 (permanent) redirect.
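A rough sketch of the redirect lookup, assuming the new post lives on `wincent.com`. The old slug in the example (`ragel_wins_fata`) is not a simple transform of the new one, so this uses an explicit table rather than a rewrite rule. `REDIRECTS` and `resolve` are hypothetical names, and the only entry is the example from this comment.

```python
from urllib.parse import urlsplit

# Legacy path -> new path. Populated by the import script in practice;
# the single entry here is the example URL pair from this thread.
REDIRECTS = {
    "/a/about/wincent/weblog/archives/2008/02/ragel_wins_fata.php": "/blog/ragel-wins-fatality",
}


def resolve(url: str, new_origin: str = "https://wincent.com"):
    """Return (301, new_url) for a known legacy URL, or None if no redirect applies."""
    target = REDIRECTS.get(urlsplit(url).path)
    if target is None:
        return None
    return 301, new_origin + target
```

The same table could equally be emitted as web server configuration; the point is that the import script records each old path alongside the new slug as it goes.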
