WebEvo

Implementation of the paper WEBEVO: Taming Web Application Evolution via Detecting Semantic Structure Changes.

Introduction

WebEvo combines two main modules to find semantic structure changes occurring between different versions of a web page. The first is the Semantic Structure Change Detection module. It first performs DOM-tree based change detection, comparing the DOM trees of two pages to find content-based changes and structural changes. The detected changes are then pruned by our History-based semantic structure change detection technique so that only semantic structure changes remain. Finally, these changes are passed to the second module, the Semantics-based Visual Search module, which outputs the semantic structure changes with their mappings using content similarity analysis.

Overview of Workflow of WebEvo

Requirements

  • Java version: 1.8

Usage

The major modules of WebEvo are listed below:

Semantic structure change detection module.

1). DOM-tree based change detection.

This module detects whether a part of the web page has changed, using Levenshtein edit distance to compare the attributes and the structure of the corresponding DOM trees. It takes the target page and the evolved page as input and outputs the changes in the DOM tree structures.

  • Input:

The target page and the evolved page.

  • Output:

Changes in the DOM tree structures.
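
As a rough illustration of the comparison described above, the sketch below computes the Levenshtein edit distance between two strings, for example two serialized attribute values of corresponding DOM nodes. It is a minimal, generic sketch and is not taken from the WebEvo source: the class name `EditDistanceSketch`, the method names, and the normalized `similarity` score are assumptions made for this example.

```java
// Illustrative sketch only; names and the normalization are assumptions,
// not the implementation shipped in api-monitor.
final class EditDistanceSketch {

    /** Classic dynamic-programming Levenshtein distance, using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means the two strings are identical. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }
}
```

A caller could, for instance, treat two nodes as changed when `similarity` of their attribute strings falls below some cutoff; the cutoff value would have to be chosen empirically.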

To run the jar file:

java -jar api-monitor-0.0.1-SNAPSHOT-jar-with-dependencies.jar -oldpage:/apple/2018.html -newpage:/apple/2020.html -outputpath:/apple

api-monitor-0.0.1-SNAPSHOT-jar-with-dependencies.jar is in DOM-tree-based-change-detection.

The step above writes its results to output.txt. To make them more readable, we parse the content of output.txt to generate domdiff.xlsx.

  • Input:

output.txt - Generated by the above step.

  • Output:

domdiff.xlsx - Changes in the DOM tree structures.

To run the jar file:

java -jar domdiff-writer.jar -outputtxt:/apple/output.txt -domdiffpath:/apple

domdiff-writer.jar is in DOM-tree-based-change-detection.

2). History-based semantic structure change detection.

The goal of this module is to prune the content-based changes from the previous step so that only semantic structure changes remain. We define content-based changes as web content that is constantly updated based on what the web server delivers to the client browser. These changes usually do not cause RPA or test scripts to fail, so they need to be identified and filtered out.

We compare the target page with its historical pages to identify the content-based changes.

  • Input:

The target webpage and three historical pages.

  • Output:

"dynamic.txt" contains the XPaths associated with the labels, which indicate whether the XPaths are dynamic or static.
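
The idea of comparing the target page with its historical pages can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of the shipped jar: it assumes each page has already been reduced to a map from XPath to text content, and it labels an XPath "dynamic" as soon as its content differs in any historical page. The class `HistoryLabelSketch` and its method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch only; the labeling rule is an assumption.
final class HistoryLabelSketch {

    /**
     * Labels every XPath of the target page as "dynamic" or "static".
     * An XPath is dynamic if its content differs from (or is missing in)
     * at least one historical page.
     */
    static Map<String, String> label(Map<String, String> target,
                                     List<Map<String, String>> history) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : target.entrySet()) {
            boolean changed = false;
            for (Map<String, String> past : history) {
                if (!Objects.equals(entry.getValue(), past.get(entry.getKey()))) {
                    changed = true;
                    break;
                }
            }
            labels.put(entry.getKey(), changed ? "dynamic" : "static");
        }
        return labels;
    }
}
```

With three historical pages, as in the input described above, `history` would simply hold three maps.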

To run the jar file:

java -jar api-monitor-0.0.1-SNAPSHOT-jar-with-dependencies.jar -oldpage:/apple/2018.html -historypage1:/apple/2018-02-08_history1/index.html -historypage2:/apple/2018-02-07_history2/index.html -historypage3:/apple/2018-02-06_history3/index.html -dynamicpath:/apple

api-monitor-0.0.1-SNAPSHOT-jar-with-dependencies.jar is in History-based-change-detection.

  • Example:

Below is an example of the content-based change.

Apple website - In the output (dynamic.txt) of the History-based semantic structure change detection module, the promotion section in the target page is identified as a content-based change (dynamic /body/main[1]/section[2]) because it is updated constantly within a very short period of time. It is therefore not passed to the Semantics-based Visual Search module for further analysis.

The promotion section in the target page:
The promotion section in the historical page:
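
To show how such labels could be used for pruning, the sketch below reads lines of the form shown in the example (`dynamic /body/main[1]/section[2]`) and checks whether a changed XPath falls under a dynamic one. The line format is inferred from that single example, and the rule that nested XPaths are also pruned is an assumption; `DynamicFilterSketch` and its methods are hypothetical names, not part of WebEvo.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the file format and pruning rule are assumptions.
final class DynamicFilterSketch {

    /** Collects the XPaths from lines assumed to look like "dynamic <xpath>". */
    static List<String> dynamicXPaths(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Split into label and XPath at the first run of whitespace.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2 && parts[0].equals("dynamic")) {
                result.add(parts[1]);
            }
        }
        return result;
    }

    /** True if xpath equals a dynamic XPath or lies beneath one. */
    static boolean isPruned(String xpath, List<String> dynamic) {
        for (String d : dynamic) {
            if (xpath.equals(d) || xpath.startsWith(d + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

Changes for which `isPruned` returns false would be the candidates handed to the next module.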

The Semantics-based Visual Search module focuses on detecting elements whose locations have changed in the web pages. Rather than analyzing screenshots of whole web pages, WebEvo obtains screenshots of the candidate changes and combines text and image similarities to identify mappings between the original elements in the old web page and the changed elements in the new web page. The source code is in graphic-image-analysis. Please check out README.md for the usage.
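
One simple way to combine a text similarity and an image similarity into a single mapping decision is a weighted sum, sketched below. This is only an illustration of the general idea: the weighted-sum formula, the weight, the threshold, and the names `MappingSketch`, `combined`, and `bestMatch` are all assumptions, and the actual scoring in graphic-image-analysis may differ.

```java
// Illustrative sketch only; the scoring formula is an assumption.
final class MappingSketch {

    /** Weighted sum of two similarity scores, each expected in [0, 1]. */
    static double combined(double textSim, double imageSim, double textWeight) {
        return textWeight * textSim + (1.0 - textWeight) * imageSim;
    }

    /**
     * Returns the index of the candidate with the highest combined score,
     * or -1 if no candidate scores strictly above the threshold.
     */
    static int bestMatch(double[] textSims, double[] imageSims,
                         double textWeight, double threshold) {
        int best = -1;
        double bestScore = threshold;
        for (int i = 0; i < textSims.length; i++) {
            double score = combined(textSims[i], imageSims[i], textWeight);
            if (score > bestScore) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Here each index stands for one candidate element in the new page, scored against a single element from the old page.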

Experiment Steps and Results

The experiment steps and the results are in Results. Please check out README.md for more details.

  • Example:

The location of the link "EXAMPLES" on w3schools is different between the target page and the evolved page.

The location on the target page:

The location on the evolved page:

The DOM-tree based change detection module incorrectly marks the "EXAMPLES" link as "NODE_REMOVED". Semantics-based Visual Search fixes this error by correctly identifying the link on the evolved page. Refer to target screenshot and evolved screenshot.

Acknowledgement

Vista
