A simple command-line, Java-based Wikipedia table extractor.
This introduction will tell you more about the objectives of this software and the results you will get by using it.
The software aims to extract as many tables as possible, in CSV format, from a Wikipedia page that you provide. It will exclude the tables that cannot be parsed into CSV and inform you of how many tables it could parse.
If you give it a link to a Wikipedia page that contains tables, the software will generate a CSV file for every valid table on the page and exclude the ones that cannot be converted into CSV.
If you give it a link to a Wikipedia page that contains no valid tables, or no tables at all, the software will inform you of this and won't generate any file.
If you give it any link other than a Wikipedia page, the software will tell you that the page you provided is not compatible and won't generate any file.
At every step you can check the list of links that you provided to the software and see whether, and how many, tables it can extract.
These instructions will show you how to get a working copy of the project so that you can run and test the software.
You will need a Java compiler to compile and run the program properly. Alternatively, you can use one of the IDEs below to make it easier:
- IntelliJ IDEA
- Eclipse
- NetBeans
Depending on the IDE you chose, you will have to import the pom.xml file into your project. This allows Maven to automatically download the libraries required to run the project.
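If you prefer working without an IDE, the same result can be achieved from the command line; this assumes Maven is installed and available on your PATH:

```sh
# Resolve the dependencies declared in pom.xml and compile the sources
mvn clean compile

# Package the project into a jar (the artifact name depends on the pom.xml configuration)
mvn clean package
```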
Everything is now set and you are ready to run the program.
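Since the project is used via console commands (see the deployment note below), a typical invocation might look like the following; the jar name and the argument format here are assumptions and depend on your build:

```sh
# Hypothetical example: pass the Wikipedia page URL to extract its tables as CSV
java -jar wiki-table-extractor.jar "https://en.wikipedia.org/wiki/Comparison_of_operating_systems"
```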
The tests are located under:
src/test/java/pdl/wiki
You can simply run them with your favorite IDE.
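They can also be run from the command line with Maven, assuming the standard Surefire setup:

```sh
# Run the whole test suite
mvn test
```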
src/test/java/pdl/wiki/BenchTest.java
This is the class that the client asked us to add to the project. You can use it to benchmark the software against 300 Wikipedia links.
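To run only the benchmark, you can use Maven's Surefire test filter:

```sh
# Run only the BenchTest class
mvn -Dtest=BenchTest test
```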
No deployment on client servers or computers is planned, as the client did not specifically ask for it. The project can only be used via console commands.
- IntelliJ IDEA - The IDE mainly used by our team.
- Maven - Dependency management.
- Sweble-Wikitext - The Java-based wikitext parser.
- Astah - The UML editor.
- Word - The document editor used to write the specifications.
Every member of our team had the opportunity to contribute as much as they wanted to their tasks, as long as they were assigned to them in the Roadmap.
We aimed to release version 1.0, where everything is supposed to work perfectly (the project's main tasks done and tested). We started at version 0.0.1 with the following versioning scheme:
X.Y.Z, where X, Y, and Z are numbers, e.g. 0.2.5.
Every small update increments Z by one, and a major update (with tested and working main functionality) increments Y by one. X is incremented only once every feature is done, and it can only be incremented again if the client asks for new features and we implement them. For example, a small fix takes 0.2.5 to 0.2.6, while a major update takes it to 0.3.0. You can find the changelog of every version in the CHANGELOG.md file.
- Kupratsevitch Vadim - Team leader - Java/UML developer - vad101010
- Wacquet Adrien - Java/UML developer - External components integrator - awacquet
- Legroux Thibaut - Java developer - Specification writer - thigroux
- Mande Guillaume - Java developer - Specification writer - guiMande
- Scrimali Gaetan - Java developer - Specification writer - Gueguette
This project is licensed under the MIT License - see the LICENSE.md file for details.