Distance and similarity measures in fuzzy sets

Distance and similarity measures are widely used in pattern recognition, machine learning, image processing, mathematics, statistics and many other fields.

Distance measures:

Distance is the dissimilarity between two patterns. The pattern could be a scalar number, vector, matrix or any numeric data. Distance measures are quite useful to find the similarity or the difference in the patterns. If two patterns are identical the dissimilarity /distance would be zero. The difference between patterns increases and the dissimilarity/distance grows up.

Different mathematicians and researchers have proposed different distance measures. A few popular distance measures are Euclidian distance, Manhattan distance, and hamming distance.

Hamming distance:

Hamming distance is one of the simplest and computationally cheaper distance measures. It is named after Richard Hamming, who was a popular American mathematician.

It is typically used with binary strings. It finds the number of bits which are different in both strings for the corresponding positions.

In other words, we can say that the hamming distance is the number of edits required two make two strings identical

Example:

S1 = 1 0 1 1 1 0

S2 = 0 0 1 1 1 1

In above both the strings, if we scan from left to right, bits on the first and last positions are different, so the hamming distance between these two strings would be 2.

The concept of hamming distance can be extended to other data types also

“PYTHON” and “PARROT” = 4

HELLOO and HEIGHT = 4

WELL and FALL = 2

CODECRUCKS and CODEWORDSS = 5

Hamming distances are widely used in coding theory to check the quality of the sent signal.

The hamming distance between two fuzzy sets A and B is given as,

formula for hamming distance
The formula for hamming distance

Fuzzy hamming distance is simply the summation of element-wise absolute difference.

Example:

Let us compute the hamming distance between given two fuzzy sets:

A = { (x1, 0.4), (x2, 0.8), (x3, 1.0), (x4, 0.0)}

B = { (x1, 0.4), (x2, 0.3), (x3, 0.0), (x4, 0.0) }

h(A, B) = | 0.4 – 0.4 | + | 0.8 – 0.3| + | 1.0 – 0.0 | + | 0.0 – 0.0 | = 1.5

Relative Hamming distance:

Relative hamming distance is the average distance between elements. which is computed as h(A, B) / n, where n denotes the number of elements in the fuzzy set.

For above data, relative hamming distance = 1.5 / 4 = 0.375

Manhattan Distance:

Manhattan distance is also popularly known as city block distance, L1 norm or rectilinear distance. It is computed by taking the sum of the absolute difference of Cartesian coordinates.

Euclidean distance between points (x1, y1) and (x2, y2) is computed as,

d = |x1 – x2| + |y1 – y2|

For fuzzy sets, hamming distance and manhattans distance are identical.

In chess, the way an elephant moves from one board position to other is measured using Manhattan distance.  The distance between two points is measured along axes at right angles.

Euclidean distance:

Euclidean distance is one of the most popular distance measures. It is also known as Pythagorean distance or L2 norm. Euclidean distance between two points in Euclidean space is simply the length of the line joining those two points.

In the simplest form, Euclidean distance is the distance between two points on a 2D plane measured using a scale/ruler. It is the minimum physical distance between two points. This can be visualized as,

distance and similarity measures
Visualization of Euclidean distance

Euclidean distance between points (x1, y1) and (x2, y2) is computed as,

distance and similarity measures
Euclidean distance between two points

We can generalize this equation to find the Euclidean distance between vectors or fuzzy sets of length n.

Generalized Euclidean distance formula
Generalized Euclidean distance formula

Example:

Let us compute the Euclidean distance between given two fuzzy sets:

A = { (x1, 0.4), (x2, 0.8), (x3, 1.0), (x4, 0.0)}

B = { (x1, 0.4), (x2, 0.3), (x3, 0.0), (x4, 0.0) }

d(A, B) = ( (0.4 – 0.47)2 + (0.8 – 0.3)2 + (1.0 – 0.0)2 + (0.0 – 0.0)2 )1/2 = 1.12

Minkowski Distance:

Minkowski distance is a generalization of both – the Manhattan distance and Euclidean distance.

Minkowski distance formula
Minkowski distance formula

By changing the value of w, we can derive w-th norm distance between vectors/sets.

  • w = 1 → Hamming / Manhattan Distance
  • w = 2 → Euclidean Distance

Properties of Distance:

Any distance measure satisfies the following properties:

1. d( A, B ) ≥ 0

2. d( A, B ) = d( B, A )

3. d( A, C ) ≤ d( A, B ) + d( B, C )

4. d( A, A )= 0

Watch on YouTube: Distance and similarity measures

distance and similarity measures

Similarity Measure:

It is an important method for determining the similarities between the elements of two vectors in a set of vectors.

Let X={x1, x2, …, xn} be the set of vectors, where each element xi represents a vector of length m

xi={xi1, xi2,…, xim}

The similarity between two vectors xi and xj is given as,

distance and similarity measures

Like dissimilarity measures, there are plenty of similarity measures around. We will discuss cosine amplitude similarity measures and max-min similarity measures in the context of fuzzy sets.

Cosine similarity measure:

cosine similarity measure

Max-min similarity measure:

Max-min similarity measure

Here, m indicates the length of the vector.

Cosine Amplitude Similarity Measure:

We will see how to compute cosine amplitude similarity between any pair of fuzzy sets/vectors:

Consider xi represents vectors stating the fuzzy value corresponding to no damage, medium damage and serious damage in a flood situation. The vector may represent the colony or area. Using the cosine amplitude similarity measure, we can find out what is the similarity of damage between two colonies/areas.

distance and similarity measures

rij=1, for i = j

So, r11 = r22 = r33 = r44 = r55 = 1

There are 5 vectors and each has a size of 3, so n = 5 and m = 3

Lets take i = 1, j = 2.

cosine similarity - step 1
cosine similarity - step 2
cosine similarity - step 3

The cosine similarity between vectors x1 and x2 is 0.836, which represents a very high similarity between the vectors. In a similar way, we can compute the cosine similarity between every pair of vectors as,

Cosine amplitude similarity measure matrix
Cosine amplitude similarity matrix

Max-Min Similarity Measure:

We will consider the same data used for the cosine amplitude similarity measure to demonstrate the max-min similarity method.

distance and similarity measures

Like the cosine similarity measure,

rij = 1, for i = j

So, r11 = r22 = r33 = r44 = r55 = 1

There are 5 vectors and each has a size of 3, so n = 5 and m = 3

Lets take i = 1, j = 2.

max-min similarity - step 1
max-min similarity - step 2
max-min similarity - step 3

The max-min similarity between vectors x1 and x2 is 0.538. In a similar way, we can compute the max-min similarity between every pair of vectors as,

Max min similarity measure matrix

Test Your Knowledge:

A = { (x1, 0.4), (x2, 0.5), (x3, 0.0), (x4, 0.8) , (x5, 0.6) }

B = { (x1, 1.0), (x2, 0.5), (x3, 0.1), (x4, 0.4) , (x5, 0.8) }

For the fuzzy sets given above, find the following distance and similarity measures:

  • d1 = Hamming distance
  • d2 = Relative Hamming distance
  • d3 = Euclidean distance
  • s1 = Cosine amplitude similarity
  • s2 = max-min similarity

Please post your answer / query / feedback in comment section below !

6 Responses

  1. Smita Mahajani says:

    Hamming distance
    d1 = |0.4 – 1.0| + |0.5 – 0.5| + |0.0 – 0.1|+ |0.8-0.4 | + | 0.6 – 0.8 | = 0.6 + 0 +0.1 + 0.4 + 0.2 = 1.3

  2. Smita Mahajani says:

    Relative hamming distance
    d2 = 1.3 / 5 = 0.26

  3. Smita Mahajani says:

    Euclidean distance
    d3 = (0.36 +0 + 0.01 + 0.16 + 0.04 ) ^ 1/2
    = 0.75 (Round off up to 2 decimal places)

Leave a Reply

Your email address will not be published. Required fields are marked *