Chem: Descriptor-Based Similarity

Jeheonpark
5 min read · Sep 30, 2020

This descriptor space is what we want to study, and the key question is how to define distance within it. From the distance between two compounds we can infer their similarity: if they are far from each other, they are not similar in this descriptor space, and if they are close, they are similar. With such a measure, we can search for potential drug candidates.

Distance and Metric

Distance and metric are different concepts, although people often treat them as the same. A metric is a special kind of distance. For a distance to be a metric, it must satisfy four conditions.

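  1. All distances are non-negative
  2. The distance between identical objects is 0. If the objects are different, it must be greater than 0
  3. The function must be symmetric: the distance from A to B equals the distance from B to A
  4. The function must obey the triangle inequality: the distance from A to C is never greater than the distance from A to B plus the distance from B to C.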
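To make the four conditions concrete, here is a minimal sketch that tests them empirically on a small sample of points. The helper name `check_metric_axioms`, the sample points, and the choice of squared Euclidean distance as a counterexample are all assumptions made for illustration; they are not from the original article. Passing the check on a sample is only evidence, not a proof, but a failure is a genuine counterexample.

```python
from itertools import product


def check_metric_axioms(dist, points, tol=1e-12):
    """Empirically test the four metric conditions on a finite sample.

    `dist` is any function taking two points and returning a number.
    `points` is a list of distinct tuples (hypothetical sample data).
    """
    results = {
        "non_negative": True,
        "identity": True,
        "symmetric": True,
        "triangle": True,
    }
    for a, b in product(points, repeat=2):
        d = dist(a, b)
        # Condition 1: no distance may be negative.
        if d < -tol:
            results["non_negative"] = False
        # Condition 2: the distance is 0 exactly when a and b are identical.
        if (abs(d) <= tol) != (a == b):
            results["identity"] = False
        # Condition 3: swapping the arguments must not change the distance.
        if abs(d - dist(b, a)) > tol:
            results["symmetric"] = False
    for a, b, c in product(points, repeat=3):
        # Condition 4: going directly is never longer than going via b.
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            results["triangle"] = False
    return results


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
print(check_metric_axioms(manhattan, points))
print(check_metric_axioms(squared_euclidean, points))
```

On this sample, the Manhattan distance passes all four checks, while the squared Euclidean distance fails the triangle inequality: the distance from (0, 0) to (2, 0) is 4, which is more than 1 + 1 via (1, 0). It is therefore a distance that is not a metric.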

If a distance does not meet all of these conditions, it is not a metric. The best-known distances are the Euclidean distance and the Manhattan distance, both of which are metrics.
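As a small sketch, the Euclidean and Manhattan distances can be computed for two compounds represented as descriptor vectors. The function names and the two example vectors are hypothetical and chosen only so the arithmetic is easy to follow; real descriptor values would come from whatever descriptors are being used.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """City-block distance: sum of the absolute differences per descriptor."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical two-descriptor vectors for two compounds.
compound_a = [0.0, 0.0]
compound_b = [3.0, 4.0]

print(euclidean_distance(compound_a, compound_b))  # 5.0
print(manhattan_distance(compound_a, compound_b))  # 7.0
```

The same pair of compounds gets a different distance under each definition, so the choice of distance is part of defining what "similar" means in the descriptor space.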

Descriptor-Based Similarity

Similarity is inversely related to the distance between two compounds: the larger the distance, the lower the similarity. We can calculate the similarity by the…


Jeheon Park, Software Engineer at Kakao in South Korea