Podcast Wiki hack for MusicHackDay

podiki

There are hundreds of thousands of podcasts out there, each with hundreds of episodes chock full of really useful information and awesome songs. The problem is that it's all locked up behind a lifetime's worth of audio, unindexed and unsearchable.

Podiki detects songs and transcribes speech in podcasts, so that their content can be searched, linked up, indexed and updated.

Podiki has two parts: the processing of podcasts and a wiki.

Podcast Processing

http://github.com/thesmith/podiki

New episodes of submitted podcasts are crawled and all the speech and song data is extracted. As users correct the text, their corrections create a feedback loop that updates the linguistic model used to transcribe future episodes.
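How corrections reach the linguistic model isn't described here, so the following is only a minimal sketch of one possible first step: turning user-corrected text into bigram counts that a statistical language model could later be rebuilt from. `Correction`, `CorrectionCorpus` and all field names are hypothetical and are not taken from the podiki source.

```scala
// Hypothetical record of a user edit made in the wiki (not from the podiki source).
final case class Correction(episodeId: String, startSec: Double, original: String, corrected: String)

object CorrectionCorpus {
  // Normalise a corrected transcript line into lowercase word tokens.
  def tokens(text: String): Vector[String] =
    text.toLowerCase.split("[^\\p{L}\\p{Nd}']+").filter(_.nonEmpty).toVector

  // Count adjacent word pairs (bigrams) across all corrected lines.
  def bigramCounts(corrections: Seq[Correction]): Map[(String, String), Int] =
    corrections
      .flatMap { c =>
        val t = tokens(c.corrected)
        t.zip(t.drop(1))
      }
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }
}
```

Only the corrected text is counted; the original machine transcription is kept on the record so the two can be compared later.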

Songs are identified using EchoPrint, and speech detection and transcription use the Sphinx4 library.

The background processing is written in Scala and is backed by Redis (at the moment).
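To make the output of these two steps concrete, here is a rough sketch of how detected songs and transcribed speech could be represented and searched together. It does not call EchoPrint or Sphinx4, and the types `Segment`, `Speech`, `Song`, `Episode` and `EpisodeSearch` are assumptions made for illustration, not the project's actual model.

```scala
// Hypothetical data model: none of these names come from the podiki source.
sealed trait Segment {
  def startSec: Double
  def endSec: Double
}
final case class Speech(startSec: Double, endSec: Double, text: String) extends Segment
final case class Song(startSec: Double, endSec: Double, artist: String, title: String) extends Segment

final case class Episode(podcastUrl: String, title: String, segments: Vector[Segment])

object EpisodeSearch {
  // True if the lowercase query occurs in the segment's text or song metadata.
  private def matches(seg: Segment, q: String): Boolean = seg match {
    case Speech(_, _, text)        => text.toLowerCase.contains(q)
    case Song(_, _, artist, title) => artist.toLowerCase.contains(q) || title.toLowerCase.contains(q)
  }

  // Case-insensitive substring search across every segment of every episode.
  def search(episodes: Seq[Episode], query: String): Seq[(Episode, Segment)] = {
    val q = query.toLowerCase
    for {
      ep  <- episodes
      seg <- ep.segments
      if matches(seg, q)
    } yield (ep, seg)
  }
}
```

Each hit carries its start and end time, so a result can point at the exact position in the audio rather than at the episode as a whole.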

Wiki

http://github.com/thesmith/podiki-web

The wiki lets users submit podcasts for processing, edit the songs and text, and add links to things that are being talked about in the podcast.

The wiki web-app is also built in Scala, using Play, and tracks are linked to using the Spotify API.
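As an illustration of track linking, the sketch below builds a track search URL for the current Spotify Web API and the two common link forms for a known track ID. Which Spotify endpoint the wiki actually calls isn't stated here, so the endpoint choice and the `SpotifyLinks` helper are assumptions; the Web API also needs an access token, which is left out.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of podiki-web.
object SpotifyLinks {
  private def enc(s: String): String =
    URLEncoder.encode(s, StandardCharsets.UTF_8.name())

  // URL for a track search, narrowed with the artist: and track: field filters.
  def searchUrl(artist: String, title: String): String = {
    val q = s"artist:$artist track:$title"
    s"https://api.spotify.com/v1/search?q=${enc(q)}&type=track&limit=1"
  }

  // Link forms for a track whose Spotify ID is already known.
  def trackUri(id: String): String = s"spotify:track:$id"
  def openUrl(id: String): String  = s"https://open.spotify.com/track/$id"
}
```

The search URL only describes the request; sending it and reading the track ID out of the JSON response is left to whichever HTTP client the app already uses.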

TODO

It is better to be done than anywhere near perfect. This is only just done.

Currently the wiki only allows the text and a few other bits to be edited, and the feedback loop to the linguistic model isn't working yet.
