Distance and similarity measures in fuzzy sets

by codecrucks · Published 02/08/2021 · Updated 08/03/2023

Distance and similarity measures are widely used in pattern recognition, machine learning, image processing, mathematics, statistics and many other fields.

Distance measures:

Distance is the dissimilarity between two patterns. The pattern could be a scalar number, vector, matrix or any numeric data. Distance measures are quite useful for finding the similarity or the difference between patterns. If two patterns are identical, the dissimilarity/distance is zero. As the difference between the patterns increases, the dissimilarity/distance grows.

Different mathematicians and researchers have proposed different distance measures. A few popular distance measures are Euclidean distance, Manhattan distance, and Hamming distance.

Hamming distance:

Hamming distance is one of the simplest and computationally cheapest distance measures. It is named after Richard Hamming, an American mathematician.

It is typically used with binary strings of equal length. It counts the number of corresponding positions at which the bits of the two strings differ.

In other words, the Hamming distance is the number of substitutions required to make two strings identical.

Example:

S1 = 1 0 1 1 1 0

S2 = 0 0 1 1 1 1

If we scan both strings from left to right, the bits in the first and last positions are different, so the Hamming distance between these two strings is 2.

The concept of Hamming distance can be extended to other data types also, as long as both sequences have the same length:

“PYTHON” and “PARROT” = 4

HELLOO and HEIGHT = 4

WELL and FALL = 2

CODECRUCKS and CODEWORDSS = 5
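
The counting rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `hamming_distance` is our own choice, and it assumes both inputs have the same length.

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs sequences of equal length")
    # True counts as 1, so summing the comparisons counts the mismatches.
    return sum(a != b for a, b in zip(s1, s2))


print(hamming_distance("101110", "001111"))          # 2
print(hamming_distance("PYTHON", "PARROT"))          # 4
print(hamming_distance("WELL", "FALL"))              # 2
print(hamming_distance("CODECRUCKS", "CODEWORDSS"))  # 5
```

Because the function only compares items position by position, it works for strings, lists of bits, or any other pair of equal-length sequences.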

Hamming distance is widely used in coding theory, for example to detect and correct errors in a transmitted signal.

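The Hamming distance between two fuzzy sets A and B, defined over the same n elements, is given as,

h(A, B) = Σ | μ_A(x_i) – μ_B(x_i) |, summed over i = 1, 2, …, n

Here, μ_A(x_i) and μ_B(x_i) are the membership values of element x_i in A and B. The fuzzy Hamming distance is simply the sum of the element-wise absolute differences.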

Example:

Let us compute the Hamming distance between the following two fuzzy sets:

A = { (x₁, 0.4), (x₂, 0.8), (x₃, 1.0), (x₄, 0.0)}

B = { (x₁, 0.4), (x₂, 0.3), (x₃, 0.0), (x₄, 0.0) }

h(A, B) = | 0.4 – 0.4 | + | 0.8 – 0.3| + | 1.0 – 0.0 | + | 0.0 – 0.0 | = 1.5

Relative Hamming distance:

Relative Hamming distance is the average distance between elements. It is computed as h(A, B) / n, where n denotes the number of elements in the fuzzy set.

For the above data, relative Hamming distance = 1.5 / 4 = 0.375
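
The two computations above can be reproduced with a short Python sketch. Representing a fuzzy set as a dictionary from element name to membership value is an assumption made here for illustration, and the function names are our own.

```python
def fuzzy_hamming(a, b):
    """Sum of absolute differences between membership values."""
    if a.keys() != b.keys():
        raise ValueError("both fuzzy sets must be defined over the same elements")
    return sum(abs(a[x] - b[x]) for x in a)


def relative_hamming(a, b):
    """Fuzzy Hamming distance divided by the number of elements."""
    return fuzzy_hamming(a, b) / len(a)


A = {"x1": 0.4, "x2": 0.8, "x3": 1.0, "x4": 0.0}
B = {"x1": 0.4, "x2": 0.3, "x3": 0.0, "x4": 0.0}

print(round(fuzzy_hamming(A, B), 3))     # 1.5
print(round(relative_hamming(A, B), 3))  # 0.375
```

The results are rounded before printing because membership values such as 0.8 and 0.3 are not stored exactly as binary floating-point numbers.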

Manhattan Distance:

Manhattan distance is also popularly known as city block distance, L1 norm or rectilinear distance. It is computed by taking the sum of the absolute differences of the Cartesian coordinates.

Manhattan distance between points (x₁, y₁) and (x₂, y₂) is computed as,

d = |x₁ – x₂| + |y₁ – y₂|

For fuzzy sets, Hamming distance and Manhattan distance are identical.
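
As a rough sketch, the Manhattan distance can be written for points of any dimension. The 2D points used below are made-up values for illustration; the two lists are the membership values of the fuzzy sets A and B from the Hamming distance example, so the second result matches h(A, B).

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical grid points: 3 steps along x plus 4 steps along y.
print(manhattan((1, 2), (4, 6)))  # 7

# Membership values of A and B from the Hamming distance example.
a = [0.4, 0.8, 1.0, 0.0]
b = [0.4, 0.3, 0.0, 0.0]
print(round(manhattan(a, b), 3))  # 1.5
```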

In chess, the way a rook (called the elephant in some regions) moves from one board position to another is measured using Manhattan distance, because the distance between two points is measured along axes at right angles.

Euclidean distance:

Euclidean distance is one of the most popular distance measures. It is also known as Pythagorean distance or L2 norm. Euclidean distance between two points in Euclidean space is simply the length of the line joining those two points.

In the simplest form, Euclidean distance is the distance between two points on a 2D plane measured using a scale/ruler. It is the shortest distance between the two points.

Figure: Visualization of Euclidean distance

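Euclidean distance between points (x₁, y₁) and (x₂, y₂) is computed as,

d = √( (x₁ – x₂)² + (y₁ – y₂)² )

We can generalize this equation to find the Euclidean distance between vectors or fuzzy sets of length n, where μ_A(x_i) and μ_B(x_i) are the membership values of element x_i in A and B:

d(A, B) = ( Σ ( μ_A(x_i) – μ_B(x_i) )² )^(1/2), summed over i = 1, 2, …, n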

Example:

Let us compute the Euclidean distance between the following two fuzzy sets:

A = { (x₁, 0.4), (x₂, 0.8), (x₃, 1.0), (x₄, 0.0)}

B = { (x₁, 0.4), (x₂, 0.3), (x₃, 0.0), (x₄, 0.0) }

d(A, B) = ( (0.4 – 0.4)² + (0.8 – 0.3)² + (1.0 – 0.0)² + (0.0 – 0.0)² )^(1/2) = (1.25)^(1/2) ≈ 1.12
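
The same calculation can be checked with a small Python sketch. The function name `euclidean` is our own, and the two lists hold the membership values of A and B in the order x₁ to x₄.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared element-wise differences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


a = [0.4, 0.8, 1.0, 0.0]
b = [0.4, 0.3, 0.0, 0.0]
print(round(euclidean(a, b), 2))  # 1.12
```

On Python 3.8 or later, the standard library function `math.dist(a, b)` should give the same value for these inputs.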

Minkowski Distance:

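Minkowski distance is a generalization of both the Manhattan distance and the Euclidean distance. For two vectors or fuzzy sets A and B of length n, with membership values μ_A(x_i) and μ_B(x_i), it is computed as,

d_w(A, B) = ( Σ | μ_A(x_i) – μ_B(x_i) |^w )^(1/w), summed over i = 1, 2, …, n

By changing the value of w, we can derive the w-th norm distance between vectors/sets.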

w = 1 → Hamming / Manhattan Distance
w = 2 → Euclidean Distance
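
A minimal sketch of the Minkowski distance shows how the two special cases fall out of a single function. The name `minkowski` and the check that w is at least 1 are our own choices; the data are the membership values of A and B used in the earlier examples.

```python
def minkowski(a, b, w):
    """w-th norm distance: (sum of |a_i - b_i|^w) raised to 1/w, for w >= 1."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if w < 1:
        raise ValueError("w must be at least 1")
    return sum(abs(ai - bi) ** w for ai, bi in zip(a, b)) ** (1 / w)


a = [0.4, 0.8, 1.0, 0.0]
b = [0.4, 0.3, 0.0, 0.0]
print(round(minkowski(a, b, 1), 2))  # 1.5  (Hamming / Manhattan)
print(round(minkowski(a, b, 2), 2))  # 1.12 (Euclidean)
```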

Properties of Distance:

A distance measure d satisfies the following properties for any fuzzy sets A, B and C:

1. d( A, B ) ≥ 0

2. d( A, B ) = d( B, A )

3. d( A, C ) ≤ d( A, B ) + d( B, C )

4. d( A, A )= 0
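
The four properties can be spot-checked numerically. The sketch below uses the Euclidean distance and the sets A and B from the earlier examples; the third set C is a hypothetical set added only to test the triangle inequality. Passing such a check on one example illustrates the properties but does not prove them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length lists of numbers."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


A = [0.4, 0.8, 1.0, 0.0]
B = [0.4, 0.3, 0.0, 0.0]
C = [0.9, 0.1, 0.5, 0.2]  # hypothetical third set, not from the article

checks = {
    "1. non-negativity": euclidean(A, B) >= 0,
    "2. symmetry": math.isclose(euclidean(A, B), euclidean(B, A)),
    "3. triangle inequality": euclidean(A, C) <= euclidean(A, B) + euclidean(B, C),
    "4. d(A, A) = 0": euclidean(A, A) == 0,
}
for name, holds in checks.items():
    print(name, holds)
```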

Similarity Measure:

A similarity measure determines how alike two vectors in a set of vectors are by comparing their elements.

Let X = {x₁, x₂, …, x_n} be the set of vectors, where each element x_i represents a vector of length m,

x_i = {x_i1, x_i2, …, x_im}

The similarity between two vectors x_i and x_j is denoted by r_ij, where i, j = 1, 2, …, n.

Like dissimilarity measures, there are plenty of similarity measures around. We will discuss cosine amplitude similarity measures and max-min similarity measures in the context of fuzzy sets.

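Cosine amplitude similarity measure:

r_ij = | Σ x_ik · x_jk | / √( ( Σ x_ik² ) · ( Σ x_jk² ) ), with each sum taken over k = 1, 2, …, m

Max-min similarity measure:

r_ij = Σ min( x_ik, x_jk ) / Σ max( x_ik, x_jk ), with each sum taken over k = 1, 2, …, m

Here, m indicates the length of the vector.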

Cosine Amplitude Similarity Measure:

We will see how to compute the cosine amplitude similarity between any pair of fuzzy sets/vectors.

Suppose each vector x_i holds the fuzzy values corresponding to no damage, medium damage and serious damage in a flood situation, and each vector represents one colony or area. Using the cosine amplitude similarity measure, we can find out how similar the damage is between two colonies/areas.

r_ij=1, for i = j

So, r₁₁ = r₂₂ = r₃₃ = r₄₄ = r₅₅ = 1

There are 5 vectors and each has a size of 3, so n = 5 and m = 3

Let's take i = 1, j = 2.

The cosine amplitude similarity between vectors x₁ and x₂ is 0.836, which represents a high similarity between the vectors. In a similar way, we can compute the similarity between every pair of vectors to obtain the similarity matrix.

Figure: Cosine amplitude similarity matrix
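
The pairwise computation can be sketched in Python. Note that the data table is not reproduced in the text, so the five vectors below are assumed values chosen for illustration. With these assumed values, r₁₂ comes out at about 0.836, which agrees with the value quoted above. The function name `cosine_amplitude` is our own.

```python
import math


def cosine_amplitude(xi, xj):
    """|sum of products| / sqrt(sum of squares of xi * sum of squares of xj)."""
    num = abs(sum(a * b for a, b in zip(xi, xj)))
    den = math.sqrt(sum(a * a for a in xi) * sum(b * b for b in xj))
    return num / den


# ASSUMED membership values (no, medium, serious damage) for five areas.
# These are illustrative values, not data taken from the article text.
X = [
    (0.3, 0.6, 0.1),
    (0.2, 0.4, 0.4),
    (0.1, 0.6, 0.3),
    (0.7, 0.2, 0.1),
    (0.4, 0.6, 0.0),
]

# n x n similarity matrix: symmetric, with ones on the diagonal.
R = [[round(cosine_amplitude(xi, xj), 3) for xj in X] for xi in X]
for row in R:
    print(row)

print(R[0][1])  # 0.836
```

The function divides by zero if either vector contains only zeros, so a real implementation would need to decide how to handle that case.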

Max-Min Similarity Measure:

We will consider the same data used for the cosine amplitude similarity measure to demonstrate the max-min similarity method.

Like the cosine similarity measure,

r_ij = 1, for i = j

So, r₁₁ = r₂₂ = r₃₃ = r₄₄ = r₅₅ = 1

There are 5 vectors and each has a size of 3, so n = 5 and m = 3

Let's take i = 1, j = 2.

The max-min similarity between vectors x₁ and x₂ is 0.538. In a similar way, we can compute the max-min similarity between every pair of vectors.
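
A comparable Python sketch for the max-min measure is given below. As the data table is not reproduced in the text, the five vectors are assumed values chosen for illustration. With these assumed values, r₁₂ comes out at about 0.538, which agrees with the value quoted above. The function name `max_min_similarity` is our own.

```python
def max_min_similarity(xi, xj):
    """Sum of element-wise minimums divided by sum of element-wise maximums."""
    num = sum(min(a, b) for a, b in zip(xi, xj))
    den = sum(max(a, b) for a, b in zip(xi, xj))
    return num / den


# ASSUMED membership values (no, medium, serious damage) for five areas.
# These are illustrative values, not data taken from the article text.
X = [
    (0.3, 0.6, 0.1),
    (0.2, 0.4, 0.4),
    (0.1, 0.6, 0.3),
    (0.7, 0.2, 0.1),
    (0.4, 0.6, 0.0),
]

# n x n similarity matrix: symmetric, with ones on the diagonal.
R = [[round(max_min_similarity(xi, xj), 3) for xj in X] for xi in X]
for row in R:
    print(row)

print(R[0][1])  # 0.538
```

Unlike the cosine amplitude measure, this one needs only comparisons, additions and one division per pair, which makes it cheaper to compute.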

Test Your Knowledge:

A = { (x₁, 0.4), (x₂, 0.5), (x₃, 0.0), (x₄, 0.8) , (x₅, 0.6) }

B = { (x₁, 1.0), (x₂, 0.5), (x₃, 0.1), (x₄, 0.4) , (x₅, 0.8) }

For the fuzzy sets given above, find the following distance and similarity measures:

d₁ = Hamming distance
d₂ = Relative Hamming distance
d₃ = Euclidean distance
s₁ = Cosine amplitude similarity
s₂ = max-min similarity

Please post your answer / query / feedback in the comment section below!

Smita Mahajani says:

03/08/2021 at 9:36 AM

Hamming distance
d1 = |0.4 – 1.0| + |0.5 – 0.5| + |0.0 – 0.1|+ |0.8-0.4 | + | 0.6 – 0.8 | = 0.6 + 0 +0.1 + 0.4 + 0.2 = 1.3

- codecrucks says:
  
  03/08/2021 at 5:37 PM
  
  You got that right.. Bingo !
  
Smita Mahajani says:

03/08/2021 at 9:38 AM

Relative hamming distance
d2 = 1.3 / 5 = 0.26

- codecrucks says:
  
  03/08/2021 at 5:37 PM
  
  thumbs up :-)
  
Smita Mahajani says:

03/08/2021 at 9:49 AM

Euclidean distance
d3 = (0.36 +0 + 0.01 + 0.16 + 0.04 ) ^ 1/2
= 0.75 (Round off up to 2 decimal places)

- codecrucks says:
  
  03/08/2021 at 5:38 PM
  
  Correct ! All good.
  
imran says:

14/01/2024 at 3:02 AM

what about A={x1,0.2,0.3},{x2,0.3,0.3},{x1,0.9,0.4}
B= A={x1,0.4,0.3}, {x2,0.3,0.7}, {x3,0.1,0.4}

Hamming and normalized distance