Are you ready to grasp the geometry of high-dimensional data? If not, you are cursed!

High-dimensional data describes a data set in which the number of features (columns) is large relative to the number of instances (rows). Most traditional statistical and machine learning methods struggle and lose effectiveness on high-dimensional tasks. We humans also have difficulty picturing high dimensions. Most of us would agree that high-dimensional data is hard to visualize, which hides the patterns among features. You can blame school geometry, which is built on the plane: it may limit our intuition for higher dimensions.

The sparsity of data increases exponentially as the dimension p (the number of features or columns, not rows) increases. Data sparseness hurts local methods in particular: as p grows, we need to consider larger and larger neighborhoods to find the closest points. In other words, the higher the number of features, the less meaningful the notion of locality becomes and the less accurate any locally based estimation procedure is. A hypercube is the p-dimensional analogue of a square (p = 2) and a cube (p = 3). To capture a fraction r of the volume of a unit hypercube, a sub-hypercube needs an edge length of r^(1/p). For example, with 50 features, we need an edge length of about 95% of the unit length just to cover 10% of the total volume.
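
The hypercube arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name edge_length and the list of dimensions are chosen here for illustration, and the data is assumed to be spread uniformly over a unit hypercube.

```python
def edge_length(volume_fraction: float, p: int) -> float:
    """Edge length of a sub-hypercube that covers `volume_fraction`
    of a unit hypercube in p dimensions (assumes uniform data)."""
    # A sub-hypercube with edge e has volume e**p, so e = fraction**(1/p).
    return volume_fraction ** (1.0 / p)


if __name__ == "__main__":
    # Illustrative dimensions only; p = 50 matches the example in the text.
    for p in (1, 2, 3, 10, 50, 100):
        print(f"p = {p:>3}: edge length = {edge_length(0.10, p):.3f}")
```

For p = 50 this gives roughly 0.955. A neighborhood that holds only 10% of the data already spans almost the full range of every feature, so it is hardly "local" anymore.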

What is the point here? I would like to stress the importance of training in the concepts and data modeling applications needed to deal with high-dimensional settings. Expect some resources from us on this topic.

DataCodix
