# Distance and similarity measures in fuzzy sets

Distance and similarity measures are widely used in pattern recognition, machine learning, image processing, mathematics, statistics and many other fields.

## Distance measures:

Distance is a measure of the dissimilarity between two patterns. A pattern can be a scalar, a vector, a matrix or any other numeric data. Distance measures are useful for quantifying how similar or different two patterns are. If two patterns are identical, the dissimilarity/distance is zero; as the difference between the patterns increases, the distance grows.

Mathematicians and researchers have proposed many distance measures. A few popular ones are Euclidean distance, Manhattan distance and Hamming distance.

### Hamming distance:

Hamming distance is one of the simplest and computationally cheapest distance measures. It is named after Richard Hamming, a well-known American mathematician.

It is typically used with binary strings of equal length. It counts the positions at which the corresponding bits of the two strings differ.

In other words, the Hamming distance is the number of single-position substitutions required to make two strings identical.

Example:

S1 = 1 0 1 1 1 0

S2 = 0 0 1 1 1 1

Scanning the two strings above from left to right, the bits in the first and last positions differ, so the Hamming distance between them is 2.

The concept of Hamming distance can also be extended to other data types, such as character strings:

“P**YTH**O**N**” and “P**ARR**O**T**” = 4

HE**LLOO** and HE**IGHT** = 4

**WE**LL and **FA**LL = 2

CODE**CRUCK**S and CODE**WORDS**S = 5

Hamming distances are widely used in coding theory for error detection and correction, where they count how many symbols of a received signal differ from the transmitted one.
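The bit-string and word examples above can be checked with a few lines of Python. This is a minimal sketch; the function name `hamming_distance` is an illustrative choice, and it assumes both inputs have the same length.

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(s1, s2))


print(hamming_distance("101110", "001111"))  # 2
print(hamming_distance("PYTHON", "PARROT"))  # 4
print(hamming_distance("WELL", "FALL"))      # 2
```

Because the function only compares items position by position, it works for any pair of equal-length sequences, not just strings.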

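The Hamming distance between two fuzzy sets A and B, defined over the same n elements, is given as,

$$h(A, B) = \sum_{i=1}^{n} \left| \mu_A(x_i) - \mu_B(x_i) \right|$$

where \mu_A(x_i) denotes the membership value of x_{i} in A. Fuzzy Hamming distance is simply the summation of the element-wise absolute differences.

**Example:**

Let us compute the Hamming distance between the following two fuzzy sets: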

A = { (x_{1}, 0.4), (x_{2}, 0.8), (x_{3}, 1.0), (x_{4}, 0.0)}

B = { (x_{1}, 0.4), (x_{2}, 0.3), (x_{3}, 0.0), (x_{4}, 0.0) }

h(A, B) = | 0.4 – 0.4 | + | 0.8 – 0.3| + | 1.0 – 0.0 | + | 0.0 – 0.0 | = 1.5

### Relative Hamming distance:

Relative Hamming distance is the average distance per element, computed as h(A, B) / n, where n denotes the number of elements in the fuzzy set.

For the above data, relative Hamming distance = 1.5 / 4 = 0.375
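The two fuzzy Hamming computations above can be sketched in Python as follows. The representation is an assumption made for illustration: each fuzzy set is stored as a plain list of membership values in the order x_{1} to x_{4}, and the function names are hypothetical.

```python
def fuzzy_hamming(a, b):
    """Sum of element-wise absolute differences of membership values."""
    if len(a) != len(b):
        raise ValueError("both fuzzy sets must cover the same elements")
    return sum(abs(p - q) for p, q in zip(a, b))


def relative_hamming(a, b):
    """Hamming distance averaged over the number of elements."""
    return fuzzy_hamming(a, b) / len(a)


# Membership values of x1..x4 in A and B, taken from the example above.
A = [0.4, 0.8, 1.0, 0.0]
B = [0.4, 0.3, 0.0, 0.0]

print(round(fuzzy_hamming(A, B), 3))     # 1.5
print(round(relative_hamming(A, B), 3))  # 0.375
```

The results are rounded before printing only to hide floating-point noise.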

### Manhattan Distance:

Manhattan distance is also popularly known as city block distance, L1 norm or rectilinear distance. It is computed by taking the sum of the absolute difference of Cartesian coordinates.

Manhattan distance between points (x_{1}, y_{1}) and (x_{2}, y_{2}) is computed as,

d = |x_{1} – x_{2}| + |y_{1} – y_{2}|

For fuzzy sets, Hamming distance and Manhattan distance are identical.

In chess, the path a rook (sometimes called the elephant) travels from one board position to another is measured using Manhattan distance: the distance between two points is measured along axes at right angles.
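As a small illustration of the coordinate formula, the sketch below computes the Manhattan distance between two points. The points (1, 2) and (4, 6) are hypothetical values chosen for the example, and the function name is illustrative.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical points: |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance((1, 2), (4, 6)))  # 7
```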

### Euclidean distance:

Euclidean distance is one of the most popular distance measures. It is also known as **Pythagorean distance** or **L2 norm**. Euclidean distance between two points in Euclidean space is simply the length of the line joining those two points.

In the simplest form, Euclidean distance is the distance between two points on a 2D plane measured with a scale/ruler. It is the shortest physical distance between the two points.

Euclidean distance between points (x_{1}, y_{1}) and (x_{2}, y_{2}) is computed as,

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

We can generalize this equation to find the Euclidean distance between vectors or fuzzy sets of length n:

$$d(A, B) = \sqrt{\sum_{i=1}^{n} \left( \mu_A(x_i) - \mu_B(x_i) \right)^2}$$

where \mu_A(x_i) is the membership value of x_{i} in A.

**Example:**

Let us compute the Euclidean distance between the following two fuzzy sets:

A = { (x_{1}, 0.4), (x_{2}, 0.8), (x_{3}, 1.0), (x_{4}, 0.0)}

B = { (x_{1}, 0.4), (x_{2}, 0.3), (x_{3}, 0.0), (x_{4}, 0.0) }

d(A, B) = ( (0.4 – 0.4)^{2} + (0.8 – 0.3)^{2} + (1.0 – 0.0)^{2} + (0.0 – 0.0)^{2} )^{1/2} = (1.25)^{1/2} ≈ 1.12
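The same Euclidean computation can be sketched in Python, using the membership values listed in the definitions of A and B. Storing each fuzzy set as a list of memberships is an assumption made for illustration, and `fuzzy_euclidean` is a hypothetical name.

```python
import math


def fuzzy_euclidean(a, b):
    """Square root of the summed squared membership differences."""
    if len(a) != len(b):
        raise ValueError("both fuzzy sets must cover the same elements")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Membership values of x1..x4 in A and B.
A = [0.4, 0.8, 1.0, 0.0]
B = [0.4, 0.3, 0.0, 0.0]

print(round(fuzzy_euclidean(A, B), 2))  # 1.12
```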

### Minkowski Distance:

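Minkowski distance is a generalization of both the Manhattan distance and the Euclidean distance:

$$d_w(A, B) = \left( \sum_{i=1}^{n} \left| \mu_A(x_i) - \mu_B(x_i) \right|^{w} \right)^{1/w}$$

where \mu_A(x_i) is the membership value of x_{i} in A. By changing the value of w, we can derive the w-th norm distance between vectors/sets.

- w = 1 → Hamming / Manhattan distance
- w = 2 → Euclidean distance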
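To see how the two special cases fall out of the general form, here is a minimal Python sketch. It assumes the usual Minkowski definition (the w-th root of the summed w-th powers of the absolute differences), uses the fuzzy sets A and B from the earlier examples as lists of membership values, and restricts w to values of at least 1. The function name is illustrative.

```python
def minkowski_distance(a, b, w):
    """w-th root of the summed w-th powers of absolute differences."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same length")
    if w < 1:
        raise ValueError("w must be at least 1")
    return sum(abs(p - q) ** w for p, q in zip(a, b)) ** (1.0 / w)


A = [0.4, 0.8, 1.0, 0.0]
B = [0.4, 0.3, 0.0, 0.0]

print(round(minkowski_distance(A, B, 1), 3))  # 1.5   (Hamming / Manhattan)
print(round(minkowski_distance(A, B, 2), 3))  # 1.118 (Euclidean)
```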

**Properties of Distance**:

Any distance measure satisfies the following properties:

1. d( A, B ) ≥ 0

2. d( A, B ) = d( B, A )

3. d( A, C ) ≤ d( A, B ) + d( B, C )

4. d( A, A ) = 0
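These properties can be spot-checked numerically. The sketch below uses the fuzzy Hamming distance as the measure under test, with A and B from the earlier examples and a third set C that is a hypothetical addition, needed only for the triangle inequality. A passing check on one triple illustrates the properties but does not prove them.

```python
def d(a, b):
    """Fuzzy Hamming distance, used here as the distance under test."""
    return sum(abs(p - q) for p, q in zip(a, b))


A = [0.4, 0.8, 1.0, 0.0]
B = [0.4, 0.3, 0.0, 0.0]
C = [0.1, 0.9, 0.5, 0.2]  # hypothetical third fuzzy set

checks = {
    "non-negativity": d(A, B) >= 0,
    "symmetry": d(A, B) == d(B, A),
    "triangle inequality": d(A, C) <= d(A, B) + d(B, C),
    "identity": d(A, A) == 0,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```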


## Similarity Measure:

A similarity measure quantifies how alike two vectors from a set of vectors are, based on their elements.

Let X = {x_{1}, x_{2}, …, x_{n}} be the set of vectors, where each element x_{i} is a vector of length m:

x_{i} = {x_{i1}, x_{i2}, …, x_{im}}

The similarity between two vectors x_{i} and x_{j} is denoted r_{ij}.

As with dissimilarity measures, there are plenty of similarity measures around. We will discuss the cosine amplitude and max-min similarity measures in the context of fuzzy sets.

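**Cosine similarity measure:**

$$r_{ij} = \frac{\left| \sum_{k=1}^{m} x_{ik} \, x_{jk} \right|}{\sqrt{\left( \sum_{k=1}^{m} x_{ik}^{2} \right) \left( \sum_{k=1}^{m} x_{jk}^{2} \right)}}$$

**Max-min similarity measure:**

$$r_{ij} = \frac{\sum_{k=1}^{m} \min(x_{ik}, x_{jk})}{\sum_{k=1}^{m} \max(x_{ik}, x_{jk})}$$

Here, r_{ij} is the similarity between vectors x_{i} and x_{j}, and m indicates the length of each vector.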

### Cosine Amplitude Similarity Measure:

We will see how to compute cosine amplitude similarity between any pair of fuzzy sets/vectors:

Suppose each vector x_{i} holds the fuzzy membership values for no damage, medium damage and serious damage in a flood situation, with one vector per colony or area. Using the cosine amplitude similarity measure, we can find how similar the damage is between two colonies/areas.

r_{ij}=1, for i = j

So, r_{11} = r_{22} = r_{33} = r_{44} = r_{55} = 1

There are 5 vectors and each has a size of 3, so n = 5 and m = 3

Let's take i = 1, j = 2.

The cosine similarity between vectors x_{1} and x_{2} is 0.836, which represents a very high similarity between the vectors. In a similar way, we can compute the cosine similarity between every pair of vectors.
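A minimal Python sketch of the cosine amplitude computation for one pair of vectors is shown below. It assumes the standard cosine amplitude form: the absolute dot product divided by the square root of the product of the two sums of squares. The vectors `x1` and `x2` are hypothetical membership values for (no damage, medium damage, serious damage), chosen for illustration, and the function name is illustrative as well.

```python
import math


def cosine_amplitude(xi, xj):
    """|sum of products| / sqrt(sum of squares of xi * sum of squares of xj)."""
    if len(xi) != len(xj):
        raise ValueError("vectors must have the same length")
    num = abs(sum(p * q for p, q in zip(xi, xj)))
    den = math.sqrt(sum(p * p for p in xi) * sum(q * q for q in xj))
    if den == 0:
        raise ValueError("similarity is undefined for an all-zero vector")
    return num / den


# Hypothetical membership vectors: (no damage, medium damage, serious damage)
x1 = [0.3, 0.6, 0.1]
x2 = [0.2, 0.4, 0.4]

print(round(cosine_amplitude(x1, x2), 3))  # 0.836
```

Calling the function on every pair (i, j) fills an n × n similarity matrix that is symmetric and has ones on its diagonal.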

### Max-Min Similarity Measure:

We will consider the same data used for the cosine amplitude similarity measure to demonstrate the max-min similarity method.

Like the cosine similarity measure,

r_{ij} = 1, for i = j

So, r_{11} = r_{22} = r_{33} = r_{44} = r_{55} = 1

There are 5 vectors and each has a size of 3, so n = 5 and m = 3

Let's take i = 1, j = 2.

The max-min similarity between vectors x_{1} and x_{2} is 0.538. In a similar way, we can compute the max-min similarity between every pair of vectors.
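The max-min computation for one pair of vectors can be sketched the same way. It assumes the usual max-min form: the sum of element-wise minima divided by the sum of element-wise maxima. The vectors `x1` and `x2` are hypothetical membership values for (no damage, medium damage, serious damage), and the function name is illustrative.

```python
def max_min_similarity(xi, xj):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    if len(xi) != len(xj):
        raise ValueError("vectors must have the same length")
    num = sum(min(p, q) for p, q in zip(xi, xj))
    den = sum(max(p, q) for p, q in zip(xi, xj))
    if den == 0:
        raise ValueError("similarity is undefined when both vectors are all zero")
    return num / den


# Hypothetical membership vectors: (no damage, medium damage, serious damage)
x1 = [0.3, 0.6, 0.1]
x2 = [0.2, 0.4, 0.4]

print(round(max_min_similarity(x1, x2), 3))  # 0.538
```

Unlike the cosine amplitude, this measure needs only comparisons and additions, so it is cheaper to compute.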

## Test Your Knowledge:

A = { (x_{1}, 0.4), (x_{2}, 0.5), (x_{3}, 0.0), (x_{4}, 0.8) , (x_{5}, 0.6) }

B = { (x_{1}, 1.0), (x_{2}, 0.5), (x_{3}, 0.1), (x_{4}, 0.4) , (x_{5}, 0.8) }

For the fuzzy sets given above, find the following distance and similarity measures:

- d_{1} = Hamming distance
- d_{2} = Relative Hamming distance
- d_{3} = Euclidean distance
- s_{1} = Cosine amplitude similarity
- s_{2} = Max-min similarity

**Please post your answer / query / feedback in the comment section below!**

Hamming distance

d1 = |0.4 – 1.0| + |0.5 – 0.5| + |0.0 – 0.1|+ |0.8-0.4 | + | 0.6 – 0.8 | = 0.6 + 0 +0.1 + 0.4 + 0.2 = 1.3

You got that right.. Bingo !

Relative hamming distance

d2 = 1.3 / 5 = 0.26

thumbs up :-)

Euclidean distance

d3 = (0.36 + 0 + 0.01 + 0.16 + 0.04)^{1/2} = (0.57)^{1/2}

= 0.75 (rounded to 2 decimal places)

Correct ! All good.
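The three distance answers above can be double-checked with a short script. This is only a sketch for verification: the fuzzy sets are stored as lists of membership values for x_{1} to x_{5}, and the variable names are illustrative.

```python
import math

# Membership values of x1..x5 in A and B from the exercise.
A = [0.4, 0.5, 0.0, 0.8, 0.6]
B = [1.0, 0.5, 0.1, 0.4, 0.8]

diffs = [abs(p - q) for p, q in zip(A, B)]

d1 = sum(diffs)                             # Hamming distance
d2 = d1 / len(A)                            # relative Hamming distance
d3 = math.sqrt(sum(x * x for x in diffs))   # Euclidean distance

print(round(d1, 2), round(d2, 2), round(d3, 2))  # 1.3 0.26 0.75
```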