Distance and similarity measures are widely used in pattern recognition, machine learning, image processing, mathematics, statistics and many other fields.

Distance measures:

Distance is dissimilarity between two patterns. Pattern could be scalar number, vector, matrix or any numeric data. Distance measures are quite useful to find the similarity or the difference in the patterns. If two patterns are identical the dissimilarity /distance would be zero. As difference between patterns increases, the dissimilarity / distance grows up.

Different mathematicians and researchers have proposed different distance measures. Few popular distance measures are Euclidian distance, Manhattan distance, hamming distance.

Hamming distance:

Hamming distance is one of the simplest and computationally cheaper distance measure. It is named after Richard Hamming, who was popular American mathematician.

It is typically used with the binary strings. It finds the number of bits which are different in both strings for on the corresponding positions.

In other words, we can say that the hamming distance is the number of edits required two make two strings identical

Example:

S1 = 1 0 1 1 1 0

S2 = 0 0 1 1 1 1

In above both the strings, if we scan from left to right, bits on first and last positions are different, so the hamming distance between these two strings would be 2.

The concept of hamming distance can be extended to other data types also

“PYTHON” and “PARROT” = 4

HELLOO and HEIGHT = 4

WELL and FALL = 2

CODECRUCKS and CODEWORDSS = 5

Hamming distance are widely used in coding theory to check the quality of sent signal.

The hamming distance between two fuzzy sets A and B is given as,

formula for hamming distance
Formula for hamming distance

Fuzzy hamming distance is simply the summation of element wise absolute difference.

Example:

Let us compute the hamming distance between given two fuzzy sets:

A = { (x1, 0.4), (x2, 0.8), (x3, 1.0), (x4, 0.0)}

B = { (x1, 0.4), (x2, 0.3), (x3, 0.0), (x4, 0.0) }

h(A, B) = | 0.4 – 0.4 | + | 0.8 – 0.3| + | 1.0 – 0.0 | + | 0.0 – 0.0 | = 1.5

Relative Hamming distance:

Relative hamming distance is the average distance between elements. which is computed as h(A, B) / n, where n denotes the number of elements in the fuzzy set.

For above data, relative gamming distance = 1.5 / 4 = 0.375

Manhattan Distance:

Manhattan distance is also popularly known as city block distance, L1 norm or rectilinear distance. It is computed by taking the sum of absolute difference of Cartesian coordinates.

Euclidean distance between points (x1, y1) and (x2, y2) is computed as,

d = |x1 – x2| + |y1 – y2|

For fuzzy sets, hamming distance and manhattans distance are identical.

In chess, the way elephant moves from one board position to other, is measured using Manhattan distance.  The distance between two points measured along axes at right angles.

Euclidean distance:

Euclidean distance is one of the most popular distance measure. It is also known as Pythagorean distance or L2 norm. Euclidean distance between two points in Euclidean space is simply the length of the line joining those two points.

In simplest form, Euclidean distance is the distance between two points on 2D plane measure using scale/ruler. It is the minimum physical distance between two points. This can be visualized as,

Visualization of Euclidean distance
Visualization of Euclidean distance

Euclidean distance between points (x1, y1) and (x2, y2) is computed as,

Euclidean distance
Euclidean distance between two points

We can generalize this equation to find the Euclidean distance between vectors or fuzzy sets of length n.

Generalized Euclidean distance formula
Generalized Euclidean distance formula

Example:

Let us compute the Euclidean distance between given two fuzzy sets:

A = { (x1, 0.4), (x2, 0.8), (x3, 1.0), (x4, 0.0)}

B = { (x1, 0.4), (x2, 0.3), (x3, 0.0), (x4, 0.0) }

d(A, B) = ( (0.4 – 0.47)2 + (0.8 – 0.3)2 + (1.0 – 0.0)2 + (0.0 – 0.0)2 )1/2 = 1.12

Minkowski Distance:

Minkowksi distance is generalization of both – Manhattan distance and Euclidean distance.

Minkowski distance formula
Minkowski distance formula

By changing the value of w, we can derive w-th norm distance between vectos/sets.

  • w = 1 → Hamming / Manhattan Distance
  • w = 2 → Euclidean Distance

Properties of Distance:

Any distance measure satisfies the following properties:

1. d( A, B ) ≥ 0

2. d( A, B ) = d( B, A )

3. d( A, C ) ≤ d( A, B ) + d( B, C )

4. d( A, A )= 0

Watch on YouTube: Distance and similarity measures

similarity measure

Similarity Measure:

It is an important method for determining the similarities between the elements of two vectors in a set of vectors.

Let X={x1, x2, …, xn} be the set of vectors, where each element xi represents a vector of length m

xi={xi1, xi2,…, xim}

Similarity between two vectors xi and xj is give as,

similarity

Like dissimilarity measures, there are plenty of similarity measures around. We will discuss cosine amplitude similarity measure and max-min similarity measure in context of fuzzy sets.

Cosine similarity measure:

cosine similarity measure

Max-min similarity measure:

Max-min similarity measure

Here, m indicates length of vector.

Cosine Amplitude Similarity Measure:

We will see how to compute cosine amplitude similarity between any pair of fuzzy sets / vectors:

Consider xi represents vectors stating the fuzzy value corresponding to no damage, medium damage and serious damage in flood situation. Vector may represent the colony or area. Using cosine amplitude similarity measure, we can find out what is the similarity of damage between two colony/area.

rij=1, for i = j

So, r11 = r22 = r33 = r44 = r55 = 1

There are 5 vectors and each has size of 3, so n = 5 and m = 3

Lets take i = 1, j = 2.

cosine similarity - step 1
cosine similarity - step 2
cosine similarity - step 3

The cosine similarity between vector x1 and x2 is 0.836, which represents very high similarity between the vectors. In similar way, we can compute the cosine similarity between every pair of vector as,

Cosine amplitude similarity measure matrix
Cosine amplitude similarity matrix

Max-Min Similarity Measure:

We will consider the same data used for cosine amplitude similarity measure to demonstrate the max-min similarity method.

Like cosine similarity measure,

rij = 1, for i = j

So, r11 = r22 = r33 = r44 = r55 = 1

There are 5 vectors and each has size of 3, so n = 5 and m = 3

Lets take i = 1, j = 2.

max-min similarity - step 1
max-min similarity - step 2
max-min similarity - step 3

The max-min similarity between vector x1 and x2 is 0.538. In similar way, we can compute the max-min similarity between every pair of vector as,

Max min similarity measure matrix

Test Your Knowledge:

A = { (x1, 0.4), (x2, 0.5), (x3, 0.0), (x4, 0.8) , (x5, 0.6) }

B = { (x1, 1.0), (x2, 0.5), (x3, 0.1), (x4, 0.4) , (x5, 0.8) }

For the fuzzy sets given above, find following distance and similarity measures:

  • d1 = Hamming distance
  • d2 = Relative Hamming distance
  • d3 = Euclidean distance
  • s1 = Cosine amplitude similarity
  • s2 = max-min similarity

Please post your answer / query / feedback in comment section below !