Tamil Wiktionary Offline Reader.

This reader displays the text-only archives of Wiktionary, which can be
downloaded from:
http://download.wikimedia.org/backup-index.html
and are usually named like:
pages-articles.xml.bz2

It requires Python, Qt and PyQt. Although only Qt4/PyQt4 is supported now, the
old Qt3/PyQt3 code is still included and should still work.
It also assumes you have basic tools like gzip, zcat, zgrep, tail, head...

The wiki markup parsing of this reader is not yet complete, although it is
fairly usable in its current form.

Usage
-----

1. Make sure you have the following folders:
   a) The "chunks" folder, which contains the downloaded bz2 dump file broken
      up into smaller files.
   b) The "indexdir" folder, which contains the index files.
   These two folders can be created by downloading the latest Wiktionary dump
   file and running the "gui.py" file of the wxPython application in the
   "tawiktionary-offline" folder.

2. On the command line, run:
   python Karthika.py
   or just click on it from your favorite file manager.

3. The main window contains the article title area (top), the list of articles
   found (left) and the main text area (right). You can go to an article
   directly by typing its name and then clicking the "Go" button, or by
   clicking a link in the main text area. By default, clicking a link loads the
   article in the background. If you type a Tamil or English word and then
   click the "Search" button, a list of related articles will load in the list
   on the left.

4. The "Karthika.exe" file and the "chunks" and "indexdir" folders are all you
   need to run the application on any Windows PC.

How to set up for development in Windows XP (should work on Vista and Win 7 as well)
------------------------------------------------------------------------------------

1. Install Python 2.7.3
2. Install PyQt4 for Python 2.7 (the GUI)
3. Install Whoosh 2.3.2 (for indexing and searching)
4. Install Beautiful Soup 3.2.1 (for XML parsing)
5. Copy the PYQT4 folder under the tawiktionary-offline project from this
   Github site
6. Launch a command window
7. Change directory to the PYQT4 folder you copied in step 5 above
8. Run <pythondir>\python Karthika.py. For example, I have Python installed in
   the C:\Python27 folder, so I run "C:\Python27\python Karthika.py". This
   should launch the GUI. (You may see an error message "Error while loading
   math parser" in the console. This is harmless and can be safely ignored for
   now.) You can then enter a Tamil or English word and look it up or search
   for it.
9. You can use any text editor for editing the Python code files (.py). I am
   using Notepad++ on Windows.

Additional notes for developers
-------------------------------

1. I am able to run this on Ubuntu as well. You may already have Python
   installed. Check the version and update it, if needed.
2. For steps 2 to 4 above, you can use easy_install, which allows you to
   "Download, build, install, upgrade, and uninstall Python packages --
   easily!". I downloaded setuptools-0.6c11.win32-py2.7.exe from here:
   http://pypi.python.org/pypi/setuptools
3. For building the Windows exe, you have to install Py2exe on Windows. I was
   able to build the exe successfully using the command
   "python setup.py py2exe --includes sip"

FAQ
---

Q. Can I disable the graphical rendering of the maths ("latex rendering")?
A. Yes, but you will have to manually modify the program:
   Edit the "Karthika.py" file, go to the line which says
   "self.latexRendering = True" and change "True" to "False".

Q. Can I change the text size?
A. The font size can now be changed, although you will have to manually modify
   the program:
   Edit the "Karthika.py" file, go to the line which says "fontSize = 9" and
   change "9" to whatever point size fits you best. This will only change the
   font size of the text area.

Q. Can I edit the user interface to change more settings?
A. If you have the Qt4 "designer" program, shipped with Qt-tools, you can edit
   "form3.ui" to fit your needs.

Appreciation
------------

While searching for a way to build on Arunmozhi's code base, I found the
following project on Launchpad, by Benjamin Thyreau - an application to easily
read Wikipedia's downloaded dump files:
https://launchpad.net/wikipediadumpreader

As you can see from the above link, it is open source and dual licensed under
the Simplified BSD Licence and GNU GPL v2. By combining features from
Benjamin's code and Arunmozhi's code, I have built an application that seems to
do the basic functions OK.

Benjamin's code base had two things that we were looking for: a PyQt4 user
interface and more usable (though not complete) parsing of the wiki markup. On
top of that, he had built the ability to follow links: a user can click on
hyperlinked words in the results to look up those words further.

However, on additional testing I discovered that his code base had one big
limitation for our use. It can only be used as an English to Tamil dictionary;
Tamil words cannot be looked up. This is because his indexing was not in
Unicode, as he himself noted in comments in his code.

Arunmozhi uses the Python Whoosh module for indexing and searching. Whoosh is
natively built to handle Unicode. He also split the large Wiktionary dump file
into smaller chunks for faster look-ups, and he went a step further and built a
Windows exe as well.

I managed to combine Benjamin's PyQt4 user interface and wiki parsing with
Arunmozhi's indexing/searching, and then built a Windows exe following
Arunmozhi's steps.

I would like to record my appreciation of both Benjamin Thyreau and Arunmozhi
for their valuable work with Python code as well as for making their code
available as open source.

-- Ashok Ramachandran - 6/2012
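
As a rough aid for the setup and usage steps above, the following sketch checks that the Python packages can be imported and that the "chunks" and "indexdir" folders exist. It is not part of the project: the helper names (missing_modules, missing_folders, check_setup) are hypothetical, and the import names "PyQt4", "whoosh" and "BeautifulSoup" are assumed to be those used by the package versions listed in the setup steps.

```python
import importlib
import os

# Assumed import names for PyQt4, Whoosh and Beautiful Soup 3.x.
REQUIRED_MODULES = ["PyQt4", "whoosh", "BeautifulSoup"]
# Data folders named in the Usage section.
REQUIRED_FOLDERS = ["chunks", "indexdir"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def missing_folders(base_dir, names):
    """Return the names from `names` that are not folders under base_dir."""
    return [n for n in names if not os.path.isdir(os.path.join(base_dir, n))]


def check_setup(base_dir="."):
    """Hypothetical helper: collect everything that appears to be missing."""
    return {
        "modules": missing_modules(REQUIRED_MODULES),
        "folders": missing_folders(base_dir, REQUIRED_FOLDERS),
    }


if __name__ == "__main__":
    report = check_setup()
    for name in report["modules"]:
        print("Missing module: %s" % name)
    for name in report["folders"]:
        print("Missing folder: %s" % name)
    if not report["modules"] and not report["folders"]:
        print("Setup looks complete.")
```

Saved under any file name next to "Karthika.py" and run with the same Python interpreter, it prints one line per missing module or folder. It only tests that imports succeed and folders exist; it does not check package versions or the contents of the index.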