Chem: Dimensionality Reduction & Clustering

Jeheonpark
3 min read · Oct 1, 2020

When a dataset is high dimensional, it is hard to visualize it or to recognize patterns in it. Low dimensionality is the rule of thumb in chemoinformatics, but it is not easy to maintain because we need many descriptors to capture the relevant information. It is also hard to build a low-dimensional representation from the beginning, since we cannot know in advance which descriptors are independent and how the others correlate. Therefore, dimensionality reduction is preferred.

Dimensionality Reduction

The advantages of Dimensionality Reduction are:

  1. Balanced compound distributions
  2. Orthogonal reference spaces
  3. Improved interpretability
  4. Possible visualization
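To make the "orthogonal reference spaces" point concrete, here is a minimal sketch that projects correlated descriptors onto two principal components with plain NumPy. The descriptor matrix is randomly generated (hypothetical data, not real compounds), and `pca_project` is a helper name chosen for this illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T, components

# Hypothetical descriptor matrix: 200 compounds, 5 correlated descriptors
# generated from only 2 underlying factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components = pca_project(X, n_components=2)

# The new axes are orthogonal, and the projected coordinates are uncorrelated.
print(np.round(components @ components.T, 6))
print(np.round(np.cov(scores, rowvar=False), 6))
```

The two columns of `scores` can be plotted directly as a 2-D scatter plot, which is where the visualization advantage comes from.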

Non-linear Mapping

MDS (Multidimensional Scaling)

You can check my post.

PCA

You can check my post.

Cell-Based Partitioning Methods

For a large dataset, calculating all pairwise distances is expensive, because the number of pairs grows quadratically with the number of molecules. Cell-based partitioning can be an alternative to distance-based methods. Each axis of the descriptor space is divided into bins, and if two molecules map to the same partition (cell), they are considered similar to each other.
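A minimal sketch of the idea, assuming equal-width bins on each axis of an already reduced space. The coordinates, the bin count, and the helper names `assign_cells` and `group_by_cell` are made up for this illustration; real workflows may choose bin boundaries differently.

```python
from collections import defaultdict
import numpy as np

def assign_cells(X, bins_per_axis=4):
    """Map each row of X to a cell: a tuple of equal-width bin indices, one per axis."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    width = (hi - lo) / bins_per_axis
    width[width == 0] = 1.0  # constant axis: everything goes to bin 0
    idx = np.floor((X - lo) / width).astype(int)
    idx = np.clip(idx, 0, bins_per_axis - 1)  # the maximum value belongs to the last bin
    return [tuple(row) for row in idx.tolist()]

def group_by_cell(cells):
    """Collect the row indices that share the same cell."""
    groups = defaultdict(list)
    for i, cell in enumerate(cells):
        groups[cell].append(i)
    return dict(groups)

# Hypothetical 2-D coordinates for five molecules.
X = np.array([
    [0.10, 0.20],
    [0.15, 0.22],
    [0.90, 0.80],
    [0.95, 0.85],
    [0.90, 0.10],
])

cells = assign_cells(X, bins_per_axis=2)
groups = group_by_cell(cells)
print(groups)
```

Each molecule is visited once, so no pairwise distance is ever computed. Molecules 0 and 1 share a cell, as do molecules 2 and 3, while molecule 4 sits alone in its own cell.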


Jeheon Park, Software Engineer at Kakao in South Korea