5GNF Normalization – Cypher Scripts & Experimental Evaluation

This repository contains all Cypher scripts, normalization procedures, and performance experiments used in the research study:

“Trait-Based Fifth Graph Normal Form (5GNF) for Property Graphs.”

The repository provides:

- A complete running example demonstrating 0GNF → 5GNF
- Scripts to load a real dataset (Northwind) and run pre-5GNF baseline experiments
- A full Cypher implementation of the TraitExtraction5GNF algorithm
- Post-normalization analytical queries
- Screenshots of execution plans (before and after 5GNF) for reproducibility
- Graph diagrams used in the manuscript

The code follows the VLDB Reproducibility Guidelines and is fully runnable under Neo4j 5.x.

Repository Structure


    cypher/
        0GNF.cypher
        1GNF.cypher
        2GNF.cypher
        3GNF.cypher
        4GNF.cypher
        5GNF.cypher

    dataset/
        northwind_load.cypher
        northwind_constraints.cypher

    normalization/
        trait_extraction_5gnf.cypher
        delete_properties.cypher

    experiments/
        pre_5gnf/
            test01_customer_city.cypher
            test02_supplier_country.cypher
            test03_orders_by_country.cypher
            test04_customers_city_country.cypher
            test05_supplier_customer_match.cypher

        post_5gnf/
            test01_customer_city_trait.cypher
            test02_supplier_country_trait.cypher
            test03_orders_by_country_trait.cypher
            test04_customers_city_country_trait.cypher
            test05_supplier_customer_match_trait.cypher

    results/
        before_5gnf/
            (screenshots)
        after_5gnf/
            (screenshots)

    graph/
        (figures used in manuscript)

    README.md

1. Running Example (0GNF → 5GNF)

The folder /cypher/ contains a simple, pedagogical example demonstrating how a redundant property graph evolves through each normalization stage.

Files

| File | Description |
| --- | --- |
| `0GNF.cypher` | Original graph with redundant metadata |
| `1GNF.cypher` | Extract domain values |
| `2GNF.cypher` | Separate producers from products |
| `3GNF.cypher` | Extract shared location |
| `4GNF.cypher` | Decompose technical metadata |
| `5GNF.cypher` | Apply Trait Extraction → Final 5GNF |

This example is used in the paper to illustrate the conceptual progression toward 5GNF.
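
As a rough illustration of the two endpoints of this progression, the sketch below contrasts an embedded-metadata shape with a trait-based shape. The `Item` label, the `name` and `city` properties, and the `kind`/`value` properties on `Trait` are assumptions made for illustration only; the scripts in `cypher/` may model the example differently.

```cypher
// 0GNF-style shape (hypothetical data): the same city value is
// stored redundantly on every node that uses it.
CREATE (:Item {name: 'A', city: 'Berlin'}),
       (:Item {name: 'B', city: 'Berlin'});

// 5GNF-style shape (hypothetical data): the value is stored once on a
// Trait node, and each entity references it through HAS_TRAIT.
CREATE (t:Trait {kind: 'city', value: 'Berlin'})
CREATE (:Item {name: 'A'})-[:HAS_TRAIT]->(t)
CREATE (:Item {name: 'B'})-[:HAS_TRAIT]->(t);
```

The two statements are alternatives, not a sequence: running both in the same database would create both shapes side by side.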

2. Real Dataset (Northwind)

A real, widely used dataset (Northwind) was loaded into Neo4j using the scripts in:


dataset/

2.1 Dataset Loading

Run in cypher-shell:

:source dataset/northwind_constraints.cypher
:source dataset/northwind_load.cypher

This loads:

- 91 Customers
- 29 Suppliers
- 77 Products
- 8 Categories
- 830 Orders
- Thousands of metadata attributes (country, city, phone, region, etc.)
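
One way to check the load against the counts listed above is to count nodes per label. This is a generic sanity-check sketch, not one of the repository's scripts; it assumes only that the load script assigns a label to each entity type.

```cypher
// Count nodes per label after loading.
// A node with several labels is counted once under each of them.
MATCH (n)
UNWIND labels(n) AS label
RETURN label, count(*) AS nodes
ORDER BY nodes DESC;
```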

Northwind replaces the synthetic dataset used in the earlier version of the paper and exhibits more realistic patterns of metadata redundancy and reuse.

3. TraitExtraction5GNF – Normalization Procedure

The full Cypher implementation of the 5GNF transformation is located in:

normalization/trait_extraction_5gnf.cypher

It performs the following steps:

1. Detect redundant metadata properties
2. Create Trait nodes for each distinct metadata value
3. Create HAS_TRAIT edges linking entities to traits
4. Ensure schema consistency (unique constraints)
5. Remove embedded metadata (optional cleanup)

After execution, all redundant properties (city, country, region, engine types, contact titles, etc.) become reusable traits.
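
The steps above can be sketched for a single property as follows. This is a minimal illustration under stated assumptions, not the contents of `trait_extraction_5gnf.cypher`: the `Customer` label, the `city` property, the `kind`/`value` properties on `Trait`, and the constraint name `trait_kind_value` are all hypothetical choices.

```cypher
// Step 1 (detect): see how often each city value is repeated.
MATCH (c:Customer)
WHERE c.city IS NOT NULL
RETURN c.city AS value, count(*) AS occurrences
ORDER BY occurrences DESC;

// Step 4 (schema consistency), created first so the MERGE below is
// index-backed. Constraint name and property names are assumptions.
CREATE CONSTRAINT trait_kind_value IF NOT EXISTS
FOR (t:Trait) REQUIRE (t.kind, t.value) IS UNIQUE;

// Steps 2 and 3: one Trait node per distinct value, plus a HAS_TRAIT
// edge from every entity that carried that value.
MATCH (c:Customer)
WHERE c.city IS NOT NULL
MERGE (t:Trait {kind: 'city', value: c.city})
MERGE (c)-[:HAS_TRAIT]->(t);

// Step 5 (optional cleanup): drop the embedded copy only from nodes
// that are already linked to a city trait.
MATCH (c:Customer)-[:HAS_TRAIT]->(:Trait {kind: 'city'})
REMOVE c.city;
```

Using `MERGE` for both the node and the edge keeps the sketch idempotent, so re-running it should not create duplicate traits or relationships.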

4. Experimental Evaluation

Experiments were executed before and after applying 5GNF.

4.1 Pre-5GNF (Baseline)

Queries inside:

experiments/pre_5gnf/

measure:

- DB hits
- runtime (ms)
- memory allocation
- execution plan behavior

Screenshots of Neo4j PROFILE results are stored under:

results/before_5gnf/

Example (Customer–City scan):

| Metric | Value |
| --- | --- |
| DB Hits | 274 |
| Time | 175 ms |
| Memory | 12,544 bytes |
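
For orientation, a baseline query of this kind might look like the sketch below. It is not the contents of `test01_customer_city.cypher`; the `companyName` and `city` property names and the literal `'London'` are assumptions. Prefixing a query with `PROFILE` makes Neo4j execute it and report DB hits and rows per operator.

```cypher
// Hypothetical baseline: filter customers on an embedded city property.
PROFILE
MATCH (c:Customer)
WHERE c.city = 'London'
RETURN c.companyName AS customer, c.city AS city;
```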

4.2 Post-5GNF (Normalized Schema)

Queries inside:

experiments/post_5gnf/

perform the same analytics but using Trait nodes instead of embedded metadata.

Screenshots are stored in:

results/after_5gnf/
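
A trait-based counterpart of a customer-by-city query might look like the sketch below. It is not the contents of `test01_customer_city_trait.cypher`; the `kind`/`value` properties on `Trait`, the `companyName` property, and the literal `'London'` are assumptions.

```cypher
// Hypothetical post-5GNF query: start from the shared Trait node and
// follow HAS_TRAIT back to the customers that reference it.
PROFILE
MATCH (c:Customer)-[:HAS_TRAIT]->(t:Trait {kind: 'city', value: 'London'})
RETURN c.companyName AS customer, t.value AS city;
```

If an index or uniqueness constraint covers the trait's lookup properties, the planner may be able to locate the trait node first and expand from it instead of scanning every customer; the actual plan should be confirmed with `PROFILE`.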

Example improvement:

| Query | Before | After | Improvement |
| --- | --- | --- | --- |
| Customer by City | 175 ms | 82 ms | 53% lower runtime |

Even when performance does not improve, schema clarity, metadata reuse, and complexity reduction remain strong benefits.

5. Reproducing the Experiments

Step 1 — Load Northwind

:source dataset/northwind_constraints.cypher
:source dataset/northwind_load.cypher

Step 2 — Run baseline experiments

:source experiments/pre_5gnf/test01_customer_city.cypher
...

Step 3 — Apply 5GNF

:source normalization/trait_extraction_5gnf.cypher

Step 4 — Run post-5GNF experiments

:source experiments/post_5gnf/test01_customer_city_trait.cypher
...

Step 5 — Compare execution plans

Results are provided in /results/.

6. Academic Use

This repository is referenced in the manuscript submitted to ENASE/VLDB-style venues. It provides:

- Reproducible Cypher code
- Clear normalization stages
- Real-dataset evaluation
- Before/after screenshots for verification

Feel free to cite or reuse the scripts in academic work.

Acknowledgements

This work builds upon:

- Neo4j 5.x documentation
- Property-graph normalization literature
- Trait dependency research foundations

Contact

For questions about the scripts or the 5GNF framework, contact:

Yahya Sa'd (GitHub: yahyazuh)
