Broken pairs distance is not symmetric #40

N-Wouda · 2023-01-04T09:06:00Z

I am bringing our VRPTW solver under test, by writing unit tests for most code points. It's an immense amount of work, but it's going well. This testing effort forces me to look at every bit of code in considerable detail, to determine how it works and how to encode that in meaningful test cases.

I am currently testing our implementation of the broken pairs distance (BPD) measure. HGS-CVRP also uses this, and ours is a direct descendant of your efforts. There are two issues that I noticed with our relatively unmodified broken pairs distance:

The code (in various places) assumes it is symmetric, that is, BPD(X, Y) == BPD(Y, X). For example, here, where it is computed only once but used in both ways:

HGS-CVRP/Program/Population.cpp

Lines 35 to 40 in 387b5c5

    
           for (Individual * myIndividual2 : subpop) 
        
           { 
        
           	double myDistance = brokenPairsDistance(*myIndividual,*myIndividual2); 
        
           	myIndividual2->indivsPerProximity.insert({ myDistance, myIndividual }); 
        
           	myIndividual->indivsPerProximity.insert({ myDistance, myIndividual2 }); 
        
           }

The BPD distance as currently implemented is not symmetric. A fairly minimal example are individuals with the following two route sets: X = {{1, 2, 3, 4}, {}, {}} and Y = {{1, 2}, {3}, {4}}. Now, BPD(X, Y) = 2, but BPD(Y, X) = 3. The issue is a double increment for a single client when both conditions here are true:

HGS-CVRP/Program/Population.cpp

Lines 225 to 234 in 387b5c5

    
           double Population::brokenPairsDistance(const Individual & indiv1, const Individual & indiv2) 
        
           { 
        
           	int differences = 0; 
        
           	for (int j = 1; j <= params.nbClients; j++) 
        
           	{ 
        
           		if (indiv1.successors[j] != indiv2.successors[j] && indiv1.successors[j] != indiv2.predecessors[j]) differences++; 
        
           		if (indiv1.predecessors[j] == 0 && indiv2.predecessors[j] != 0 && indiv2.successors[j] != 0) differences++; 
        
           	} 
        
           	return (double)differences / (double)params.nbClients; 
        
           }

My fix for this has been to merge the two conditions into a single update, as follows:

if ((indiv1.successors[j] != indiv2.successors[j] && indiv1.successors[j] != indiv2.predecessors[j])
        || (indiv1.predecessors[j] == 0 && indiv2.predecessors[j] != 0 && indiv2.successors[j] != 0))
    differences++;

This passes all the symmetry tests I have so far thrown at it.

I am not sure if this is a "bug", but is definitely something I struggled understanding for a while. Am I misunderstanding BPD as it is implemented now?

The text was updated successfully, but these errors were encountered:

vidalt · 2023-01-05T17:08:40Z

Hello @N-Wouda, thanks for your very detailed analysis of the code. This is, in fact, not a bug but a characteristic of the broken-pairs distance calculated between two solutions, X and Y, which counts "how many edges of X do not appear in Y".

Consider, for example, X = {{1, 2, 3, 4}, {}} and Y = {{1, 2}, {3,4}}.

Solution X contains five edges : (0,1), (1,2), (2,3), (3,4), and (4,0).
Among these, only edge (2,3) is missing in Y, so BPD(X,Y) = 1.

Solution Y contains six edges: (0,1), (1,2), (2,0), (0,3), (3,4), (4,0).
Among these edges, (2,0) and (0,3) are missing in X, so BPD(Y,X) = 2.

This means that the broken-pairs calculation as defined here is asymmetric when the solutions have different numbers of vehicles. Formally, it should be qualified as a quasimetric (does not satisfy symmetry).

Finally, regarding the use of this distance in the following code snippet:

HGS-CVRP/Program/Population.cpp

Lines 35 to 40 in 387b5c5

    
           for (Individual * myIndividual2 : subpop) 
        
           { 
        
           	double myDistance = brokenPairsDistance(*myIndividual,*myIndividual2); 
        
           	myIndividual2->indivsPerProximity.insert({ myDistance, myIndividual }); 
        
           	myIndividual->indivsPerProximity.insert({ myDistance, myIndividual2 }); 
        
           }

In earlier versions of the code, I was using two distance calculations... but the differences in distances due to asymmetry are so small that it was better to save one distance calculation at this place (saving around 5-10% CPU time overall on some instances) and just use one of the two arbitrarily.

N-Wouda · 2023-01-05T19:15:39Z

@vidalt thank you for your quick response; I understand the BPD measure a lot better now! Based on your description above, I have implemented the following (our notation is a bit different, but hopefully not confusing):

double brokenPairsDistance(ProblemData const &data,
                           Individual const &first,
                           Individual const &second)
{
    auto const &fNeighbours = first.getNeighbours();
    auto const &sNeighbours = second.getNeighbours();

    int numBrokenPairs = 0;

    for (int j = 1; j <= data.nbClients; j++)
    {
        auto const [fPred, fSucc] = fNeighbours[j];
        auto const [sPred, sSucc] = sNeighbours[j];

        // An edge pair (fPred, j) or (j, fSucc) from the first solution is
        // broken if it is not in the second solution. Note that we double count
        // in this loop: we count each edge twice, for both j and for j +- 1.
        numBrokenPairs += fSucc != sSucc;
        numBrokenPairs += fPred != sPred;
    }

    // Average broken pairs distance, adjusted for double counting.
    return numBrokenPairs / (2. * data.nbClients);
}

As an example, let's again take X = {{1, 2, 3, 4}, {}} and Y = {{1, 2}, {3, 4}}:

X contains (0,1), (1,2), (2,3), (3,4), and (4,0).
Y contains (0,1), (1,2), (2,0), (0,3), (3,4), (4,0).

Let us first compute BPD(X, Y). The code tests for each client:

(0, 1) is in Y. Same for (1, 2). No differences.
(1, 2) is in Y. (2, 3) is not in Y. One difference.
(2, 3) is not in Y. (3, 4) is in Y. One difference.
(3, 4) is in Y. (4, 0) is in Y. No differences.

We count each edge twice, which we adjust for before returning. The resulting distance is BPD(X, Y) = 2 / (2 * 4) = 0.25.

Now let's look at BPD(Y, X):

(0, 1) is in X. Same for (1, 2). No differences.
(1, 2) is in X. (2, 0) is not in X. One difference.
(0, 3) is not in X. (3, 4) is in X. One difference.
(3, 4) is in X. (4, 0) is in X. No differences.

So we obtain BPD(Y, X) = 2 / (2 * 4) = 0.25. This is the same as BPD(X, Y)! That is not a coincidence - I have something resembling a proof by considering all four cases of same/same, different/same, same/different, different/different for each client.

The main takeaway here is, I think, that we can symmetrise BPD. Would you be interested in a PR that adapts the above to HGS-CVRP?

vidalt · 2023-01-05T23:40:13Z

Humm... a quick question though, are you sure this code snippet considers the distance between (undirected) edges? What is the result when you measure the distance between X = {{1, 2, 3, 4}} and Y = {{4, 3, 2, 1}} ?

Also, I have the impression that only internal edges that are not containing the depot are "counted double" in your calculation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broken pairs distance is not symmetric #40

Broken pairs distance is not symmetric #40

N-Wouda commented Jan 4, 2023 •

edited

vidalt commented Jan 5, 2023

N-Wouda commented Jan 5, 2023 •

edited

vidalt commented Jan 5, 2023

Broken pairs distance is not symmetric #40

Broken pairs distance is not symmetric #40

Comments

N-Wouda commented Jan 4, 2023 • edited

vidalt commented Jan 5, 2023

N-Wouda commented Jan 5, 2023 • edited

vidalt commented Jan 5, 2023

N-Wouda commented Jan 4, 2023 •

edited

N-Wouda commented Jan 5, 2023 •

edited