# Distance and similarity measures in fuzzy sets

Distance and similarity measures are widely used in pattern recognition, machine learning, image processing, mathematics, statistics and many other fields.

## Distance measures:

Distance is dissimilarity between two patterns. Pattern could be scalar number, vector, matrix or any numeric data. Distance measures are quite useful to find the similarity or the difference in the patterns. If two patterns are identical the dissimilarity /distance would be zero. As difference between patterns increases, the dissimilarity / distance grows up.

Different mathematicians and researchers have proposed different distance measures. Few popular distance measures are Euclidian distance, Manhattan distance, hamming distance.

### Hamming distance:

Hamming distance is one of the simplest and computationally cheaper distance measure. It is named after Richard Hamming, who was popular American mathematician.

It is typically used with the binary strings. It finds the number of bits which are different in both strings for on the corresponding positions.

In other words, we can say that the hamming distance is the number of edits required two make two strings identical

Example:

S1 = 1 0 1 1 1 0

S2 = 0 0 1 1 1 1

In above both the strings, if we scan from left to right, bits on first and last positions are different, so the hamming distance between these two strings would be 2.

The concept of hamming distance can be extended to other data types also

“P**YTH**O**N**” and “P**ARR**O**T**” = 4

HE**LLOO** and HE**IGHT** = 4

**WE**LL and **FA**LL = 2

CODE**CRUCK**S and CODE**WORDS**S = 5

Hamming distance are widely used in coding theory to check the quality of sent signal.

The hamming distance between two fuzzy sets A and B is given as,

Fuzzy hamming distance is simply the summation of element wise absolute difference.

**Example: **

Let us compute the hamming distance between given two fuzzy sets:

A = { (x_{1}, 0.4), (x_{2}, 0.8), (x_{3}, 1.0), (x_{4}, 0.0)}

B = { (x_{1}, 0.4), (x_{2}, 0.3), (x_{3}, 0.0), (x_{4}, 0.0) }

h(A, B) = | 0.4 – 0.4 | + | 0.8 – 0.3| + | 1.0 – 0.0 | + | 0.0 – 0.0 | = 1.5

### Relative Hamming distance:

Relative hamming distance is the average distance between elements. which is computed as h(A, B) / n, where n denotes the number of elements in the fuzzy set.

For above data, relative gamming distance = 1.5 / 4 = 0.375

### Manhattan Distance:

Manhattan distance is also popularly known as city block distance, L1 norm or rectilinear distance. It is computed by taking the sum of absolute difference of Cartesian coordinates.

Euclidean distance between points (x_{1}, y_{1}) and (x_{2}, y_{2}) is computed as,

d = |x_{1} – x_{2}| + |y_{1} – y_{2}|

For fuzzy sets, hamming distance and manhattans distance are identical.

In chess, the way elephant moves from one board position to other, is measured using Manhattan distance. The distance between two points measured along axes at right angles.

### Euclidean distance:

Euclidean distance is one of the most popular distance measure. It is also known as **Pythagorean distance **or **L2 norm**. Euclidean distance between two points in Euclidean space is simply the length of the line joining those two points.

In simplest form, Euclidean distance is the distance between two points on 2D plane measure using scale/ruler. It is the minimum physical distance between two points. This can be visualized as,

Euclidean distance between points (x_{1}, y_{1}) and (x_{2}, y_{2}) is computed as,

We can generalize this equation to find the Euclidean distance between vectors or fuzzy sets of length n.

**Example: **

Let us compute the Euclidean distance between given two fuzzy sets:

A = { (x_{1}, 0.4), (x_{2}, 0.8), (x_{3}, 1.0), (x_{4}, 0.0)}

B = { (x_{1}, 0.4), (x_{2}, 0.3), (x_{3}, 0.0), (x_{4}, 0.0) }

d(A, B) = ( (0.4 – 0.47)^{2} + (0.8 – 0.3)^{2} + (1.0 – 0.0)^{2} + (0.0 – 0.0)^{2} )^{1/2} = 1.12

### Minkowski Distance:

Minkowksi distance is generalization of both – Manhattan distance and Euclidean distance.

By changing the value of w, we can derive w-th norm distance between vectos/sets.

- w = 1 → Hamming / Manhattan Distance
- w = 2 → Euclidean Distance

**Properties of Distance**:

Any distance measure satisfies the following properties:

1. d( A, B ) ≥ 0

2. d( A, B ) = d( B, A )

3. d( A, C ) ≤ d( A, B ) + d( B, C )

4. d( A, A )= 0

## Watch on YouTube: Distance and similarity measures

## Similarity Measure:

It is an important method for determining the similarities between the elements of two vectors in a set of vectors.

Let X={x_{1}, x_{2}, …, x_{n}} be the set of vectors, where each element x_{i} represents a vector of length m

x_{i}={x_{i1}, x_{i2},…, x_{im}}

Similarity between two vectors x_{i} and x_{j} is give as,

Like dissimilarity measures, there are plenty of similarity measures around. We will discuss cosine amplitude similarity measure and max-min similarity measure in context of fuzzy sets.

**Cosine similarity measure:**

**Max-min similarity measure:**

Here, m indicates length of vector.

**Cosine Amplitude Similarity Measure**:

We will see how to compute cosine amplitude similarity between any pair of fuzzy sets / vectors:

Consider xi represents vectors stating the fuzzy value corresponding to no damage, medium damage and serious damage in flood situation. Vector may represent the colony or area. Using cosine amplitude similarity measure, we can find out what is the similarity of damage between two colony/area.

r_{ij}=1, for i = j

So, r_{11} = r_{22} = r_{33} = r_{44} = r_{55} = 1

There are 5 vectors and each has size of 3, so n = 5 and m = 3

Lets take i = 1, j = 2.

The cosine similarity between vector x_{1} and x_{2} is 0.836, which represents very high similarity between the vectors. In similar way, we can compute the cosine similarity between every pair of vector as,

### Max-Min Similarity Measure:

We will consider the same data used for cosine amplitude similarity measure to demonstrate the max-min similarity method.

Like cosine similarity measure,

r_{ij} = 1, for i = j

So, r_{11} = r_{22} = r_{33} = r_{44} = r_{55} = 1

There are 5 vectors and each has size of 3, so n = 5 and m = 3

Lets take i = 1, j = 2.

The max-min similarity between vector x_{1} and x_{2} is 0.538. In similar way, we can compute the max-min similarity between every pair of vector as,

## Test Your Knowledge:

A = { (x_{1}, 0.4), (x_{2}, 0.5), (x_{3}, 0.0), (x_{4}, 0.8) , (x_{5}, 0.6) }

B = { (x_{1}, 1.0), (x_{2}, 0.5), (x_{3}, 0.1), (x_{4}, 0.4) , (x_{5}, 0.8) }

For the fuzzy sets given above, find following distance and similarity measures:

- d1 = Hamming distance
- d2 = Relative Hamming distance
- d3 = Euclidean distance
- s1 = Cosine amplitude similarity
- s2 = max-min similarity

**Please post your answer / query / feedback in comment section below !**

Hamming distance

d1 = |0.4 – 1.0| + |0.5 – 0.5| + |0.0 – 0.1|+ |0.8-0.4 | + | 0.6 – 0.8 | = 0.6 + 0 +0.1 + 0.4 + 0.2 = 1.3

You got that right.. Bingo !

Relative hamming distance

d2 = 1.3 / 5 = 0.26

thumbs up :-)

Euclidean distance

d3 = (0.36 +0 + 0.01 + 0.16 + 0.04 ) ^ 1/2

= 0.75 (Round off up to 2 decimal places)

Correct ! All good.