UniOpen2 is a reboot of the initial Uniopen idea.
The aim of this project is to collect public data related to university life (canteens, libraries, places, etc.) and make it available in a clean JSON format.
In this repo you can find a base architecture that handles the flow of starting web grabbers and storing their data in a MongoDB collection.
By following these instructions you will be able to run a copy of the project on your local machine and to create and start your first data grabber.
We set up a docker-compose configuration that builds a network with Redis, MongoDB and Node.js, so you can focus only on grabber creation.
If you prefer to use the project without Docker, you need to have the following software up and running:
- Node.js - https://nodejs.org/
- MongoDB - https://www.mongodb.com
- Redis - https://redis.io/
We will guide you through both use cases.
First of all, we assume that you have docker and docker-compose installed. If not, follow the installation guide at https://docs.docker.com/compose/install/
Now clone or download this repo and run
docker-compose up
You can now create your own grabber simply by working in the grabber folder.
To stop the project, open another terminal and run the docker-compose down command.
Notes:
- the first run of docker-compose up may take some time because it has to download the node, mongo and redis images
- for more info on what Docker is and how it works, refer to https://www.docker.com/what-docker
Without Docker on Windows, follow these steps:
- Download MongoDB Community Server from here and install it (you only need the basics; extras like MongoDB Compass are not necessary). Next, configure it by following the guide Install MongoDB on Windows.
- Download Redis for Windows from here (if you download the .zip file instead of the .msi, no installation is required).
- Download and install the latest version of Node.js from here.
- Clone or download this repo and initialize it by running the following command in a terminal inside the project folder
npm install
- Rename the .env.dev file to .env and set up your configuration parameters
To launch the project, first start MongoDB and Redis, then start Uniopen:
- MongoDB: open a terminal and run
"C:\Program Files\MongoDB\Server\3.6\bin\mongod.exe"
- Redis: simply double-click
redis-server.exe
- Uniopen: open a terminal in Uniopen's folder and run
node .\build\application.js
On macOS, the simplest way to install all the necessary software is through Homebrew. First of all, paste
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
into a terminal to install it. Then run
brew update
to make sure Homebrew is up to date.
Now you can install:
- Node.js with
brew install node
- Redis with
brew install redis
- MongoDB with
brew install mongodb
Then you also need to run
mkdir -p /data/db
to create the default /data/db directory.
- Clone or download this repo and initialize it by running the following command in a terminal inside the project folder
npm install
- Rename the .env.dev file to .env and set up your configuration parameters
To launch the project, first start MongoDB and Redis, then start Uniopen:
- MongoDB - open a terminal and run
mongod
and keep it running
- Redis - open another terminal and run
redis-server
- Uniopen - open a third terminal, go into the cloned project directory and run
npm run dev
On Linux, follow these steps:
- Node.js can be installed via a package manager.
You can find the right way to do that for your distribution on the official Node.js page https://nodejs.org/en/download/package-manager/
- The suggested way of installing Redis is to compile it from source. Follow these simple steps:
wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make
After the compilation, the src directory inside the Redis distribution is populated with the different executables. It is a good idea to copy both the Redis server and the command line interface to proper places, either manually using the following commands:
sudo cp src/redis-server /usr/local/bin/
sudo cp src/redis-cli /usr/local/bin/
Or just using
sudo make install
- For the best installation experience, MongoDB provides packages for popular Linux distributions. You can find the installation guides here
- Clone or download this repo and initialize it by running the following command in a terminal inside the project folder
npm install
- Rename the .env.dev file to .env and set up your configuration parameters
To launch the project, first start MongoDB and Redis, then start Uniopen:
- MongoDB - open a terminal and run
sudo service mongod start
and keep it running
- Redis - open another terminal and run
redis-server
- Uniopen - open a third terminal, go into the cloned project directory and run
npm run dev
Now you are ready to use Uniopen!
Open your web browser and go to 127.0.0.1:5000/api/[service]
Available services are:
Returns the currently implemented grabbers.
Starts all currently implemented grabbers.
Starts only a specific grabber, following the request pattern. If no type is provided, it starts all the grabbers associated with the uni. If called without uni and type, it works like full-scan.
Data consulting service.
get
Without params, it returns an array of objects {uni, data[]} with every uni and its associated data types in the current mongo collection; if there is no data, it returns an empty array.
{
"statusCode":200,
"message":
[
{"data":["mensa"],"uni":"unive"},
{"data":["biblio"],"uni":"unipd"}
]
}
get/uni
Returns an object {uni, data[]} for the requested uni.
{
"statusCode":200,
"message": {"data":["biblio"],"uni":"unipd"}
}
get/uni/type
Returns an array of all the objects of the requested uni/type.
{
"statusCode":200,
"message":
[
{
"id": "419ae894-5e31-462b-aa81-b71ecba80f68",
"obj": {
"nome": "Biblioteca Slavistica e Ungherese",
"indirizzo": "Via Prosdocimo Beldomandi, 1 - 35137 Padova",
"posti": 24
}
}
// ...
]
}
get/uni/type/id
Returns the data of the specified object.
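As a rough illustration of how the four get patterns above fit together, the sketch below builds the request URL from however many segments are supplied. The buildGetUrl helper is hypothetical (it is not part of the project), and the http:// scheme is an assumption added to the address given in this README.

```javascript
// Assumption: the API answers over plain http at the address used in this README.
const BASE_URL = "http://127.0.0.1:5000/api";

// Hypothetical helper: builds get, get/uni, get/uni/type or get/uni/type/id
// depending on how many arguments are supplied.
function buildGetUrl(uni, type, id) {
  const parts = ["get"];
  for (const part of [uni, type, id]) {
    if (part === undefined) break; // stop at the first missing segment
    parts.push(encodeURIComponent(part));
  }
  return `${BASE_URL}/${parts.join("/")}`;
}

console.log(buildGetUrl());                  // http://127.0.0.1:5000/api/get
console.log(buildGetUrl("unipd", "biblio")); // http://127.0.0.1:5000/api/get/unipd/biblio
```

The resulting string can be opened in a browser or passed to any HTTP client; the payload is found in the message field of the JSON response, as in the examples above.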
You can find some examples in the grabber folder.
You can think of this project as something like a crawler: for each university it has a set of urls and grabbers. Usually these urls are the main pages of different categories (e.g. libraries or canteens). For each page you need to create a grabber that picks up all the information it can find (e.g. a list of libraries, each with a link to its details page), commits the partial information, then follows the relative links to open other pages, and so on until it has all the data needed to fill the DB.
Let's get started with the actual instructions:
- Open grabber/config.json. As you can see, this file contains an array with a list of universities, each of which contains an array of objects representing different categories. To add a grabber, add a new object to the relevant university's array (if your university is not present, simply add it with the same structure as the others) with:
  - type: the name of the chosen category, which is also the name of the folder where you have to put your grabber.
  - code: the name of the main grabber for the category.
  - urls: the list of urls your grabber needs.
- Now create your grabber file in grabber/[university code]/[category code]/ and place your code in a function that will receive the urls as a parameter. For example:
(function (args) {
  return httpGet(args.url).then((res) => {
    return res.text();
  }).then((source) => {
    let $ = parseHtml(source);
    // [use jQuery-like code to get your data from $]
    let url = '[object_url]'; // e.g. the library's details url
    let key = '[object_key]'; // e.g. the library's short name
    let grabberCode = '[grabber_code]'; // file name of the grabber to call next
    // use partialData() if you don't have all the information yet, then call a specific grabber with callGrabber()
    partialData(args.uni, args.type, args.code, url, key, { /* put partial data here */ });
    callGrabber(args.uni, args.type, grabberCode, url, key);
    // otherwise, if you have everything you need, use commitData() instead
    commitData(args.uni, args.type, args.code, args.url, args.key, { /* put data here */ });
  }).catch((err) => {
    console.error(err.message, err.stack);
  });
});
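To make the config.json step more concrete, here is a minimal sketch that checks whether a category object carries the three fields listed in the first step (type, code, urls). The checkCategoryEntry function is hypothetical and not part of the project; requiring non-empty values is an assumption, and the URL in the sample entry is a placeholder.

```javascript
// Hypothetical checker for one category object of grabber/config.json.
// Returns a list of problems; an empty list means the entry looks complete.
function checkCategoryEntry(entry) {
  if (entry === null || typeof entry !== "object") {
    return ["entry must be an object"];
  }
  const problems = [];
  if (typeof entry.type !== "string" || entry.type === "") {
    problems.push("type must be a non-empty string");
  }
  if (typeof entry.code !== "string" || entry.code === "") {
    problems.push("code must be a non-empty string");
  }
  // Assumption: at least one url is needed for the grabber to start from.
  const urlsOk = Array.isArray(entry.urls) &&
    entry.urls.length > 0 &&
    entry.urls.every((u) => typeof u === "string");
  if (!urlsOk) {
    problems.push("urls must be a non-empty array of strings");
  }
  return problems;
}

// Sample entry with placeholder values.
const sample = { type: "biblio", code: "default", urls: ["https://example.org/biblioteche"] };
console.log(checkCategoryEntry(sample)); // []
```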
As you may have noticed in the grabber code example, there are some functions that you can think of as API helpers; we provide them to integrate your grabber into the proposed uniopen2 flow. These are:
callGrabber
Calls another grabber. Useful if your information is split across multiple pages, or in other cases where the information is partitioned.
Function parameters are:
- uni - string that identifies the university (e.g. unipd, unive, ...)
- type - string that identifies the type of information parsed (see types)
- code - reference to the grabber file name (e.g. default, iniziative-venete, ...)
- url - url to be parsed by the called grabber
- key - optional string identifying the store key of the parsed object (usually generated through the helper function rawkey)
- raw - optional data object, useful if you need to share something with the new grabber
partialData
Used when you can submit only partial data, for example because you must call another grabber to complete the information retrieval.
Function parameters are:
- uni - current university code, accessible through args.uni
- type - parsed information type (see types)
- code - current grabber file name, accessible through args.code
- url - url associated with the parsed data
- key - string identifying the store key of the parsed object (usually generated through the helper function rawkey or, if already set by previous calls in the flow, accessible through args.key)
- raw - object containing the raw parsed information that will be saved
commitData
Commits the data passed in the raw param. It stops the current grabber flow, tries to validate the raw object according to the selected type, and then stores it.
Function parameters are:
- uni - current university code, accessible through args.uni
- type - parsed information type (see types); the current one is accessible through args.type
- code - current grabber file name, accessible through args.code
- url - url associated with the parsed data
- key - string identifying the store key of the parsed object (usually generated through the helper function rawkey or, if already set by previous calls in the flow, accessible through args.key)
- raw - object containing the raw parsed information that will be saved
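The sketch below illustrates the order in which partialData, callGrabber and commitData would typically be called across a list page and a details page. The three helpers are replaced here by stand-in stubs that only record their arguments, so the snippet runs on its own; in the project the real helpers are provided for you. The onListItem and onDetails functions and the "details" grabber name are hypothetical.

```javascript
// Stand-in stubs: they only record each call so the flow can be inspected.
const calls = [];
function partialData(uni, type, code, url, key, raw) {
  calls.push({ fn: "partialData", uni, type, code, url, key, raw });
}
function callGrabber(uni, type, code, url, key, raw) {
  calls.push({ fn: "callGrabber", uni, type, code, url, key, raw });
}
function commitData(uni, type, code, url, key, raw) {
  calls.push({ fn: "commitData", uni, type, code, url, key, raw });
}

// Hypothetical list-page step: save what the list page offers, then hand
// the details url over to another grabber, reusing the same key.
function onListItem(args, item) {
  partialData(args.uni, args.type, args.code, item.url, item.key, { nome: item.nome });
  callGrabber(args.uni, args.type, "details", item.url, item.key);
}

// Hypothetical details-page step: everything is known now, so commit.
function onDetails(args, details) {
  commitData(args.uni, args.type, args.code, args.url, args.key, details);
}

// Demo run with placeholder values.
onListItem(
  { uni: "unipd", type: "biblio", code: "default" },
  { url: "https://example.org/biblio/1", key: "b1", nome: "Biblioteca" }
);
onDetails(
  { uni: "unipd", type: "biblio", code: "details", url: "https://example.org/biblio/1", key: "b1" },
  { nome: "Biblioteca", indirizzo: "Via Esempio, 1", posti: 24 }
);
console.log(calls.map((c) => c.fn).join(" -> "));
```

The point of the design is that the key travels unchanged from the first grabber to the second, so the partial and the committed data refer to the same object.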
So far we provide basic support for grabbers based on 3 types:
Object representing canteens.
To be valid, it needs to have the following data:
- [required string] nome - canteen's name
- [required string] indirizzo - canteen's address
- [string] note - optional notes
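As an illustration of the canteen rules just listed, the following sketch checks an object before it is handed to commitData. It is not the project's own validator: validateCanteen is a hypothetical name, and treating blank strings as missing is an assumption.

```javascript
// Hypothetical pre-check for a canteen object.
// Returns a list of error messages; an empty list means the object passes.
function validateCanteen(obj) {
  const errors = [];
  for (const field of ["nome", "indirizzo"]) {
    // Assumption: a blank string counts as missing.
    if (typeof obj[field] !== "string" || obj[field].trim() === "") {
      errors.push(`${field} is required and must be a string`);
    }
  }
  if (obj.note !== undefined && typeof obj.note !== "string") {
    errors.push("note must be a string when present");
  }
  return errors;
}

console.log(validateCanteen({ nome: "Mensa Centrale", indirizzo: "Via Esempio, 3" })); // []
console.log(validateCanteen({ nome: "Mensa Centrale" }).length);                       // 1
```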
Object representing places dedicated to personal study.
To be valid, it needs to have the following data:
- [required string] nome - place's name
- [required string] indirizzo - place's address
- [number] posti - room capacity
- [timetable] orari - timetable, which follows a specific format
- [string] note - optional notes
Object representing libraries.
To be valid, it needs to have the following data:
- [required string] nome - library's name
- [required string] indirizzo - library's address
- [number] posti - room capacity
- [timetable] orari - timetable, which follows a specific format
- [string] note - optional notes
A timetable is an array containing from 1 to 7 elements, each formatted in one of the following ways:
// days interval
lun - ven 10:30 - 11:20
// single day
gio 10:20 - 11:20
// multiple days
mer, sab, dom 17:00 - 14:00
Valid day names are lun, mar, mer, gio, ven, sab, dom.
We provide a helper function that you can use to translate a string like lunedì - giovedì 11 - 21 to an accepted format (see normalizeTimetable).
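The three accepted shapes can be expressed as one regular expression. The sketch below is only an illustration of the format, not the project's validator: isValidTimetable is a hypothetical name, and the strict HH:MM times and exact single-space separators are assumptions drawn from the examples above.

```javascript
// Building blocks for the pattern.
const DAY = "(?:lun|mar|mer|gio|ven|sab|dom)";
const TIME = "(?:[01]\\d|2[0-3]):[0-5]\\d"; // assumption: 24h HH:MM
// One day, optionally followed by " - day" (interval) or ", day" repeated (list),
// then "HH:MM - HH:MM".
const ENTRY = new RegExp(`^${DAY}(?: - ${DAY}|(?:, ${DAY})+)? ${TIME} - ${TIME}$`);

// Hypothetical check: an array of 1 to 7 strings, each matching ENTRY.
function isValidTimetable(orari) {
  return Array.isArray(orari) &&
    orari.length >= 1 &&
    orari.length <= 7 &&
    orari.every((entry) => typeof entry === "string" && ENTRY.test(entry));
}

console.log(isValidTimetable(["lun - ven 10:30 - 11:20", "sab 09:00 - 12:00"])); // true
console.log(isValidTimetable(["lunedì - giovedì 11 - 21"]));                     // false
```

A string that fails this check, like the second one, is the kind of input the normalizeTimetable helper mentioned above is meant to translate.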