Understanding Data Representation, Approximation, and Operations in High-Dimensional Spaces

Introduction:

The concept of space plays a crucial role in representing data points, especially data points with underlying patterns. Let's start with an intuitive definition of space: it can be visualized as a virtual container comprising an infinite collection of tiny points, each precisely located using a coordinate system. For example, in our everyday world coordinate system, we use Cartesian coordinates (x, y, z).

The "span" or reach of the space refers to all the possible locations that can be reached from a point within that space, and this reach is enclosed or bounded under this space. Instead of describing the space as a collection of all individual points, we can efficiently represent the entire space using a smaller set of independent vectors and linear operators, such as vector addition and scaling. This means that any point within the space can be represented as a combination of these smaller set of vectors with various scaling factors.

In more complex scenarios, especially in higher-dimensional spaces, a larger set of vectors is necessary for accurate representation of the data's richness and complexity. This larger set allows us to encompass more variations and finer details in the data. By leveraging this idea of space and using a concise set of vectors, we can effectively represent data with fewer dimensions while preserving its essential characteristics and patterns.

Key Concepts

In data representation and processing, different types of spaces play crucial roles in understanding and analyzing datasets. These spaces provide a mathematical framework for organizing and visualizing data points with underlying patterns. Let's explore some key concepts:

Representation of Signals:

In digital signal processing (DSP), signals are often represented as sequences of numbers, which can be thought of as vectors in sequence spaces like ℓ2 (the space of square-summable sequences). ℓ2 is a Banach space, so many of its beneficial properties can be applied directly to the analysis of digital signals.
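To make this concrete, here is a minimal sketch in Python (assuming NumPy is available) that treats a short digital signal as a vector and computes its ℓ2 norm; the sample values are illustrative:

    import numpy as np

    # A finite-length digital signal viewed as a vector in R^8,
    # i.e. a truncated element of the sequence space l2.
    signal = np.array([0.5, 1.0, -0.3, 0.8, 0.0, -1.2, 0.4, 0.1])

    # The l2 norm (square root of the sum of squared samples) is the
    # signal's "length" in this space; for square-summable sequences
    # this quantity is finite by definition.
    energy = np.sum(signal**2)     # signal energy
    l2_norm = np.sqrt(energy)      # l2 norm = sqrt(energy)
    print(l2_norm)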


Linear System:

A linear system is a system that obeys the principle of superposition, meaning its output is a direct, linear combination of its input variables. In other words, the output (let's call it Z) can be expressed as Z = Ax + By, not as Z = Ax^2 + By^2. Linearity here means the system's response can be represented by a simple equation in which A and B are constant scaling factors that determine the influence of the inputs x and y on the output Z. The system can therefore be described by a linear equation.

In summary, a linear system exhibits a straightforward, proportional relationship between its inputs and outputs, governed solely by constant scaling factors. It obeys the principle of superposition, and its response can be accurately represented by a linear equation.
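A small sketch can make the superposition test concrete. The system and coefficients below are illustrative assumptions, not from any particular application:

    import numpy as np

    # Illustrative linear system: z = A*x + B*y with fixed scalars A, B.
    A, B = 2.0, -3.0
    def system(x, y):
        return A * x + B * y

    # Superposition check: scaling and adding the inputs should scale
    # and add the outputs in exactly the same way.
    x1, y1, x2, y2 = 1.0, 2.0, -0.5, 4.0
    a, b = 3.0, 0.5
    lhs = system(a * x1 + b * x2, a * y1 + b * y2)
    rhs = a * system(x1, y1) + b * system(x2, y2)
    print(np.isclose(lhs, rhs))   # True: the system is linear

    # The squared version fails the same check.
    def nonlinear(x, y):
        return A * x**2 + B * y**2
    print(np.isclose(nonlinear(a * x1 + b * x2, a * y1 + b * y2),
                     a * nonlinear(x1, y1) + b * nonlinear(x2, y2)))  # False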


Linear Space:

Let's extend this idea of linear system into space. In the context of linear spaces, the idea of linearity is expanded beyond just systems with input and output variables. A linear space, also known as a vector space, is a mathematical concept that encompasses a collection of elements, called vectors, where certain properties of addition and scalar multiplication are satisfied.

Let's consider a linear space defined over a field, typically the real numbers (R) or complex numbers (C). We'll denote this space as V. In this linear space V, we have two fundamental operations:

1. Vector addition: for any vectors u, v in V, the sum u + v is also in V.

2. Scalar multiplication: for any vector v in V and any scalar a in the field, the product av is also in V.

Now, we can connect this concept of linearity in vector spaces to the earlier idea of linear systems. In a linear space, a set of vectors is closed under linear operations (and thus behaves like a linear system) if the following condition is met:

For any vectors u, v in the set, and any scalars a, b in the field, the linear combination au + bv is also in the set.

In simpler terms, a linear system in a vector space V implies that the system's responses to linear combinations of vectors from the set will also lie within the same set.

Linear Operators:

When we talk about systems or filters in signal processing, especially linear ones, we can think of them as operators acting on signals. A signal, being a function or sequence, can be viewed as a vector in some function space, and the system or filter is an operator acting on this space.

In this context, the operation of the system on a signal is equivalent to the action of the operator on a vector in the space.
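As a sketch of this view (assuming NumPy), the snippet below writes a simple 3-tap moving-average filter as a matrix operator T, so that filtering a signal is just the matrix-vector product T @ x, and then verifies the operator's linearity:

    import numpy as np

    # A 3-tap moving-average filter written as a matrix operator T
    # (rows at the signal's edges become truncated averages), so that
    # filtering a signal x is the matrix-vector product T @ x.
    n = 6
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(max(0, i - 1), min(n, i + 2)):
            T[i, j] = 1.0 / 3.0

    x1 = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 0.0])
    x2 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
    y = T @ x1    # the system acting on the signal

    # Linearity of the operator: T(a*x1 + b*x2) == a*T(x1) + b*T(x2)
    print(np.allclose(T @ (2.0 * x1 + 3.0 * x2),
                      2.0 * (T @ x1) + 3.0 * (T @ x2)))   # True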

Vector:

A vector is an array of numerical values representing a point in a space of interest. In 2D Euclidean space, a vector is the (x, y) coordinate pair representing a point in that space; the point carries both length (from the origin to the point) and angle information. In 3D, a point is represented by an (x, y, z) coordinate triple. To visualize a vector, imagine an arrow originating from the origin of the coordinate system and pointing to the location described by the vector's coordinates. The length of the arrow indicates the magnitude of the vector, and its direction represents the angle or orientation of the vector in the space. The important thing to remember is that a vector captures the critical information of a data point, its angle and magnitude, in a high-dimensional space.


The following depiction shows a vector as an array with magnitude and angle for several linear vector spaces:

Euclidean Space:

Euclidean space refers to a specific type of linear space known as n-dimensional Euclidean space. It is characterized by its metric properties, where distances between points are defined using the Euclidean distance formula, and angles between vectors are defined using the dot product. In two-dimensional (2D) Euclidean space, a vector is represented by (x, y) coordinates, while in three-dimensional (3D) space, the coordinates expand to (x, y, z) to describe a point's position.

Vector Space:

A vector space is a more general term used to describe any linear space, regardless of its dimensionality. It encompasses both finite and infinite-dimensional spaces and includes Euclidean space as a special case when considering 2D or 3D space. In a vector space, vectors can be added and scaled, and they obey specific axioms that define the space's algebraic properties. In other words, it is a space where vectors live, subject to the algebraic rules of vector addition and scalar multiplication.

Here are some examples of vector spaces:

For example, P_2 is the set of all polynomials of degree less than or equal to 2. It includes functions such as p(x) = 1 + 2x + 3x^2, q(x) = -5 + x^2, and r(x) = 4x.

In this space, we can perform operations like addition, subtraction, and scalar multiplication, just as in other vector spaces. Here are a few examples: p(x) + q(x) = -4 + 2x + 4x^2, and 2·r(x) = 8x, both of which are again polynomials of degree at most 2.

Moreover, the zero vector in this space is the zero polynomial, the polynomial p(x) = 0 for all x.
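One way to see P_2 concretely as a vector space is to identify each polynomial c0 + c1·x + c2·x^2 with its coefficient vector [c0, c1, c2] in R^3. A minimal sketch (assuming NumPy):

    import numpy as np

    # A polynomial c0 + c1*x + c2*x^2 stored as the coefficient
    # vector [c0, c1, c2].
    p = np.array([1.0, 2.0, 3.0])    # 1 + 2x + 3x^2
    q = np.array([-5.0, 0.0, 1.0])   # -5 + x^2

    # Vector addition and scalar multiplication of coefficient
    # vectors match polynomial addition and scaling exactly.
    print(p + q)     # [-4.  2.  4.]  ->  -4 + 2x + 4x^2
    print(2.0 * p)   # [ 2.  4.  6.]  ->   2 + 4x + 6x^2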

For example, consider the set of all real-valued functions of a real variable, denoted by F(R, R). This means that each function in this set takes a real number as input and produces a real number as output.

The functions f(x) = x^2 and g(x) = sin(x) are both elements of this function space. We can perform operations such as addition and scalar multiplication in this function space. For instance, (f + g)(x) = x^2 + sin(x) and (2f)(x) = 2x^2 are again real-valued functions of a real variable, so they remain in the space.

Fourier Series

The Fourier basis is associated with a function space, specifically the space of functions that are square-integrable over a certain interval (often taken to be [-π, π] or [0, 2π]). The concept of a Fourier series tells us that any function in our function space can be written as a (possibly infinite) linear combination of sine and cosine functions, which serve as the Fourier basis functions. 

Another common example of a function space is the space of square-integrable functions, denoted by L^2, which is used frequently in quantum mechanics.
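As a hedged illustration of the Fourier-basis idea, the sketch below approximates a square wave on [-π, π] using its first few sine harmonics; the expansion sign(x) as (4/π) times the sum over odd k of sin(kx)/k is the standard one for this function:

    import numpy as np

    # Partial Fourier series of a square wave on [-pi, pi]: a few
    # sine basis functions already capture the overall shape.
    x = np.linspace(-np.pi, np.pi, 1000)
    target = np.sign(x)

    approx = np.zeros_like(x)
    for k in (1, 3, 5, 7):                       # first four odd harmonics
        approx += (4.0 / np.pi) * np.sin(k * x) / k

    # The mean squared error shrinks as more basis functions join in.
    print(np.mean((target - approx)**2))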

A matrix space is the set of all matrices of a particular dimension. Here's an example:

Let's consider the space of 2x2 real matrices. This space is denoted as M_2,2(R), and it is a vector space. A typical element of this space looks like:

[a, b]

[c, d]

where a, b, c, d are real numbers.

The vector addition and scalar multiplication in this space are defined entry-wise: for matrices A and B with entries a_ij and b_ij, the sum A + B has entries a_ij + b_ij, and for a scalar k, the product kA has entries k·a_ij.

So, the space M_2,2(R) is a vector space because it includes the zero vector (the 2x2 zero matrix), it is closed under vector addition and scalar multiplication, and these operations satisfy the required properties for a vector space.

These examples show the variety of sets that can form vector spaces, from the familiar real numbers and n-dimensional spaces to more abstract spaces of functions, polynomials, and matrices. Each of these vector spaces can be studied with the tools of linear algebra.

Subspace of a Real Linear Vector Space

A subspace is a subset of a vector space that forms a vector space itself. It retains all the properties of a vector space, such as closure under vector addition and scalar multiplication. Subspaces are essential in various mathematical applications, including linear transformations and eigenspaces, which help us understand data variance and patterns. Below is a depiction of projecting 3D points onto a 2D subspace (a plane within the 3D space) and a 1D subspace (a line within the 3D space).

Note: We want to emphasize that a subspace is represented by a hyperplane (a plane for a 2D subspace, a line for a 1D subspace) that passes through the origin. This is crucial for linear transformation properties, especially when working with different coordinate systems or vector spaces involving translation and rotation.
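The sketch below (assuming NumPy) projects a couple of illustrative 3D points onto a 2D subspace (the xy-plane) and a 1D subspace (the line spanned by a unit vector d), both passing through the origin:

    import numpy as np

    # Orthogonal projection of 3D points onto a 2D subspace (the
    # xy-plane through the origin) and a 1D subspace (a line through
    # the origin with unit direction d).
    points = np.array([[1.0, 2.0, 3.0],
                       [-1.0, 0.5, 2.0]])

    # 2D subspace: spanned by e1 and e2; projecting zeroes the z part.
    P_plane = np.diag([1.0, 1.0, 0.0])
    print(points @ P_plane.T)

    # 1D subspace: line spanned by d; the projection matrix is d d^T.
    d = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
    P_line = np.outer(d, d)
    print(points @ P_line.T)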

Hilbert Space: 

A Hilbert space is a particular type of vector space with additional mathematical structure. It is a space, finite- or infinite-dimensional, equipped with an inner product that allows for the definition of lengths and angles, and it is complete with respect to the norm induced by that inner product. Hilbert spaces are foundational in functional analysis, quantum mechanics, and signal processing, and they find applications in representing infinite-dimensional data, such as continuous signals and functions.

Normed Space: 

A vector space that is furnished with a norm is referred to as a normed space. A norm is a special function that, in a way coherent with the vector space's structure, assigns a strictly positive length or size to each vector in the space, the sole exception being the zero vector, to which it assigns a length of zero. When the space has an inner product, a natural way to calculate the norm of a vector is to take the square root of the inner product of the vector with itself, denoted sqrt(<v, v>), where v is the vector. The introduction of a norm endows a vector space with the valuable notions of size and distance, which are instrumental in many mathematical and real-world applications, such as defining metrics for error measurement or distances in machine learning algorithms.

Note: It's important to understand that any vector space can have a norm defined on it, but not every vector space comes with one already defined. Whether a vector space is a normed space depends on whether a norm has been explicitly defined for that space.

For example, consider the vector space of all polynomials with real coefficients, denoted as P(R). We can add and scale polynomials, so this set of all polynomials forms a vector space. But if we haven't defined a norm (a measure of "length") for these polynomials, then this vector space is not a normed space. 
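If we do choose a norm for polynomials, the space becomes normed. One natural choice, shown in this sketch (assuming NumPy), is the norm induced by the L^2 inner product on [0, 1], that is, ||p|| = sqrt(integral of p(x)^2 over [0, 1]):

    import numpy as np

    # Turning a polynomial space into a normed space by choosing the
    # norm ||p|| = sqrt(integral of p(x)^2 over [0, 1]).
    p = np.polynomial.Polynomial([1.0, 2.0])    # p(x) = 1 + 2x

    sq = p * p                                  # p(x)^2, still a polynomial
    integral = sq.integ()(1.0) - sq.integ()(0.0)
    print(np.sqrt(integral))                    # the chosen norm of p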

Metric Space: 

A metric space is another crucial concept in space representation. It is a set with a distance function (metric) defined on it, enabling the measurement of distances between elements. Metric spaces are fundamental in topology, analysis, and data science, as they provide a way to quantify and compare relationships between data points, aiding in clustering and similarity-based analysis.

Note: A normed space and a metric space are both mathematical constructs that, in essence, allow for a concept of "distance" between points. However, they differ in their structure and the level of strictness in their definitions. Here's a closer look:

Normed Space: A normed space is a vector space on which a norm is defined. A norm is a function that assigns a non-negative length or size to each vector in the vector space - with the zero vector being the only vector assigned a length of zero - in a way that aligns with the structure of the vector space. This norm function must satisfy certain properties, including definiteness, scalability or absolute homogeneity, and the triangle inequality.

Metric Space: A metric space is a set equipped with a function called a metric. This metric, denoted as d(x, y) where x and y are elements of the set, defines the "distance" between any two points in the set. Like a norm, a metric must satisfy certain properties: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.

The key differences between them are:

1. A normed space must be a vector space, since a norm measures the length of vectors; a metric space can be any set whatsoever, with no algebraic structure required.

2. Every normed space is automatically a metric space via the induced metric d(x, y) = ||x - y||, but not every metric arises from a norm in this way.

To summarize, while both normed spaces and metric spaces give us a way to measure distance, a normed space is a more specific structure that also allows us to perform vector operations and measure vector lengths.

Complete Space:

A metric space (or normed space) is said to be complete if every Cauchy sequence in the space converges to a point in the space. A Cauchy sequence is a sequence where the distance between its terms becomes arbitrarily small as the sequence progresses.

Banach Space:

Putting it all together, a Banach space is a normed vector space that is complete. This means not only can we talk about the size or length of vectors and the distance between them, but every sequence of vectors that "should" converge (i.e., is Cauchy) does converge to a vector in the space. In a Banach space, the idea of limits and convergence is well-defined for all sequences that intuitively should have limits. This property ensures that when we're discussing the action of operators (systems) on signals and the convergence behavior of sequences of signals, everything behaves as expected. 

In essence, the completeness property of Banach spaces provides a robust framework ensuring that sequences converge, operators behave predictably, and various mathematical constructs have the properties we expect. This assurance is not guaranteed in more general metric or normed spaces. Without this completeness, many of the powerful results and theorems in functional analysis wouldn't hold, or their proofs would be far more intricate. 

Kernel Space:

Kernel space is an important addition to the discussion of space representation, particularly in machine learning and data analysis. It extends the concept of Euclidean space to represent non-linear relationships between data points. By utilizing a kernel function, it computes the inner product between data points in a higher-dimensional feature space. This technique effectively transforms data points, making them linearly separable in higher dimensions, even when they are not separable in the original Euclidean space. Kernel methods, like the kernel trick in Support Vector Machines (SVMs), find applications in diverse areas, such as classification, regression, and dimensionality reduction tasks.
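A minimal sketch of the kernel idea (assuming NumPy; the data points and bandwidth are illustrative): an RBF kernel evaluates inner products in an implicit high-dimensional feature space without ever constructing that space:

    import numpy as np

    # RBF (Gaussian) kernel: an inner product in an implicit
    # high-dimensional feature space, computed directly from the
    # original coordinates.
    def rbf_kernel(x, y, sigma=1.0):
        return np.exp(-np.sum((x - y)**2) / (2.0 * sigma**2))

    # Gram matrix of pairwise kernel values for a small dataset.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    K = np.array([[rbf_kernel(a, b) for b in X] for a in X])
    print(K)   # symmetric, with ones on the diagonal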

Mathematical Spaces Summary

Linear Space Construction and Orthogonal Basis:

To construct a linear space, one can start with a set of vectors and extend it using linear combinations of these vectors. An orthogonal basis is a special type of basis that consists of linearly independent vectors that are mutually orthogonal (perpendicular). Having an orthogonal basis simplifies vector operations and allows for a more efficient representation of the space.
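The classical Gram-Schmidt procedure is one way to build such a basis. A minimal sketch (assuming NumPy; the input vectors are illustrative and assumed linearly independent):

    import numpy as np

    # Classical Gram-Schmidt: turn linearly independent vectors into
    # an orthonormal basis for the space they span.
    def gram_schmidt(vectors):
        basis = []
        for v in vectors:
            w = v.astype(float)
            for b in basis:
                w = w - (w @ b) * b       # remove components along earlier b
            basis.append(w / np.linalg.norm(w))
        return np.array(basis)

    V = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
    Q = gram_schmidt(V)
    print(np.round(Q @ Q.T, 6))   # identity matrix: rows are orthonormal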


Nonlinear Space and Manifold:

In contrast to linear spaces, nonlinear spaces lack the closure properties of vector addition and scalar multiplication. A manifold is a prime example of a nonlinear space: it resembles Euclidean space locally. Manifolds find applications in geometry, physics, and data representation. They are crucial because projecting or representing information from one space to a subspace might not be optimally captured by a linear entity (like a line or plane in lower-dimensional spaces); a more suitable representation could be a smooth curved surface, such as a sphere, which preserves most of the visual information. For instance, projecting a 3D world view into a panoramic representation, or mapping the same information onto a 2D nonlinear manifold such as a sphere, can be more effective in preserving essential details.


Important Concepts of High-Dimensional Information Construction and Properties:

Vector Norm:

A vector norm is a function that assigns a non-negative length or size to a vector. It satisfies certain properties, such as non-negativity, homogeneity, and the triangle inequality. Common examples of vector norms include the Euclidean norm (also known as the 2-norm) and the Manhattan norm (also known as the 1-norm).
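A two-line sketch (assuming NumPy) comparing the two norms on the same vector:

    import numpy as np

    # The same vector measured with two common norms.
    v = np.array([3.0, -4.0])
    print(np.linalg.norm(v, 2))   # Euclidean / 2-norm: 5.0
    print(np.linalg.norm(v, 1))   # Manhattan / 1-norm: 7.0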


Orthonormal:

A set of vectors is orthonormal if all the vectors are mutually orthogonal and have a unit norm (length equal to 1). Orthonormal vectors are often used in linear algebra and signal processing due to their mathematical simplicity and usefulness in computations.


Operations in Vector Space:

Just as addition and multiplication transform numbers in classical algebra, there are mathematical operations we can apply to vectors to transform data.


Inner Product Operator:

The inner product operator is a mathematical operation that takes two vectors from a vector space and produces a scalar value. It measures the "closeness" or alignment between the two vectors and is defined using the dot product in Euclidean space or the general inner product in a more abstract vector space.
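A short sketch (assuming NumPy) of the inner product as an alignment measure, via the cosine of the angle between two illustrative vectors:

    import numpy as np

    # Inner product as a measure of alignment: the cosine of the
    # angle between u and v is <u, v> / (||u|| * ||v||).
    u = np.array([1.0, 0.0])
    v = np.array([1.0, 1.0])
    cos_angle = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    print(cos_angle)              # ~0.707, i.e. a 45-degree angle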


Multiplication Operator:

The multiplication operator is a linear transformation that multiplies a vector by a fixed matrix or scalar. It is a fundamental operation in linear algebra, and its applications range from transformations in graphics to solving systems of linear equations.


Note. The equation Ax = b represents a system of linear equations that can be solved to find the vector x satisfying the equation for a given vector b. The matrix A is a linear transformation that maps x from the input vector space to the output vector space spanned by the column vectors of A: the product Ax is a linear combination of A's columns, with the entries of x as the weights.
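A minimal sketch of this note (assuming NumPy; the matrix and right-hand side are illustrative): solving Ax = b finds the combination x of A's columns that produces b:

    import numpy as np

    # Solve Ax = b: which combination of A's columns equals b?
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([5.0, 10.0])
    x = np.linalg.solve(A, b)
    print(x)                      # [1. 3.]
    print(np.allclose(A @ x, b))  # True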


Addition Operator:

The addition operator, simply put, is the operation of adding two vectors element-wise to create a new vector that represents their sum. It is a fundamental operation in linear spaces and plays a central role in vector calculations.


Adjoint Operator:

In the context of linear transformations, the adjoint operator, also known as the Hermitian adjoint or conjugate transpose, is the generalization of the transpose for complex vector spaces. It is used to represent the transpose of a matrix or the complex conjugate transpose.


Projection Operator:

A projection operator is a linear transformation that projects a vector onto a subspace along a specified direction. It is commonly used in signal processing, image compression, and data reduction techniques.
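A sketch of a projection operator (assuming NumPy): when A has full column rank, P = A (A^T A)^{-1} A^T projects any vector orthogonally onto the column space of A, and applying P twice changes nothing:

    import numpy as np

    # Projection onto the column space of A, assuming A has full
    # column rank: P = A (A^T A)^{-1} A^T.
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])    # columns span the xy-plane in R^3
    P = A @ np.linalg.inv(A.T @ A) @ A.T

    v = np.array([1.0, 2.0, 3.0])
    print(P @ v)                  # [1. 2. 0.]: the shadow of v on the plane
    print(np.allclose(P @ P, P))  # True: projecting twice changes nothing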


Subspace Methods for Approximating Higher-Dimensional Objects with Lower-Dimensional Representations:

Least Squares Approximation:

Least squares approximation is a method used to find the best-fitting linear representation of data when there is no exact solution. It minimizes the sum of squared errors between the data points and the linear model.
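A minimal sketch (assuming NumPy; the data are illustrative) fitting a line to noisy points by minimizing the sum of squared errors:

    import numpy as np

    # Least-squares line fit: no line passes through all the points,
    # so we minimize ||Ax - b||^2.
    t = np.array([0.0, 1.0, 2.0, 3.0])
    b = np.array([1.1, 1.9, 3.2, 3.8])           # noisy observations
    A = np.column_stack([np.ones_like(t), t])    # model: b ~ c0 + c1*t

    coeffs, residuals, rank, svals = np.linalg.lstsq(A, b, rcond=None)
    print(coeffs)                                # intercept and slope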


Orthogonal Projection:

Orthogonal projection is a technique used to project a vector onto a subspace along a direction that is perpendicular (orthogonal) to that subspace. It is particularly useful when dealing with orthogonal bases.


Pseudoinverse:

The pseudoinverse, also known as the Moore-Penrose inverse, is a generalization of the matrix inverse to non-square and singular matrices. It is used in cases where the matrix is not invertible and helps in solving linear systems with more equations than unknowns.
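A short sketch (assuming NumPy) of the pseudoinverse on an overdetermined system, where it returns the least-squares solution:

    import numpy as np

    # The Moore-Penrose pseudoinverse handles non-square systems:
    # x = pinv(A) @ b is the least-squares solution of Ax = b.
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])     # 3 equations, 2 unknowns
    b = np.array([1.0, 2.0, 2.9])
    x = np.linalg.pinv(A) @ b
    print(x)                       # matches what np.linalg.lstsq returns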


SVD (Singular Value Decomposition):

SVD is a powerful matrix factorization technique that represents a matrix as a product of three matrices: U, Σ, and V^T (transpose of V). It is widely used in data compression, image processing, and collaborative filtering algorithms.
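A minimal sketch (assuming NumPy; the matrix is random and illustrative) of SVD-based low-rank approximation, keeping only the top k singular values:

    import numpy as np

    # SVD factors M into U, the singular values, and V^T; truncating
    # to the top-k singular values gives the best rank-k approximation.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 4))

    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = 2
    M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-2 approximation
    print(np.linalg.norm(M - M_k))                # approximation error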


Practical Example 1 - Hand Gesture Recognition:


Let's consider a practical example to illustrate the concepts discussed above. Imagine you are working on a computer vision project where you need to recognize hand gestures to control a virtual game. To achieve this, you have a camera that captures images of the hand in real-time.


1. Space Representation: Each image can be thought of as a point in a high-dimensional space, where each pixel's intensity corresponds to specific coordinates in color space (i.e., the R, G, B coordinate system). Since the images are high-dimensional, directly processing them can be computationally expensive and challenging.


2. Dimensionality Reduction: To reduce the dimensionality and capture the essential features of the hand gestures, you can use the Singular Value Decomposition (SVD) method. SVD will decompose the high-dimensional image data into three matrices: U, Σ, and V^T. The matrix U represents the orthonormal basis of the image space, while Σ contains the singular values, and V^T captures the projection of images onto the new basis.


3. Approximation: By retaining only the most significant singular values and corresponding basis vectors, you can approximate each image using a lower-dimensional representation. This approximation will enable you to represent complex hand gestures with fewer parameters, making the recognition task more efficient.


4. Subspace Classification: After reducing the dimensionality, you can define a subspace for each specific hand gesture (e.g., open palm, fist, thumbs-up) based on the low-dimensional representations. During real-time recognition, you can project new incoming images onto each subspace using the projection operator.


5. Classification: To classify the hand gesture, you can use the inner product operator to calculate the similarity between the projected image and the subspace representation of each gesture. The gesture with the highest similarity score will be identified as the recognized hand gesture.
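The sketch below strings steps 1 through 5 together on synthetic stand-in data. Everything here, the image size, the gesture names, and the random "images", is an illustrative assumption rather than a real dataset or actual project code:

    import numpy as np

    # Synthetic stand-ins for flattened 8x8 gesture images.
    rng = np.random.default_rng(42)
    n_images, n_pixels, k = 30, 64, 5
    X = rng.standard_normal((n_images, n_pixels))
    mean = X.mean(axis=0)

    # Steps 2-3: SVD of the centered data, keep the top-k basis vectors.
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                       # k orthonormal basis vectors

    # Step 4: project a new image onto the k-dimensional subspace.
    new_image = rng.standard_normal(n_pixels)
    coords = basis @ (new_image - mean)

    # Step 5: classify by inner-product (cosine) similarity to stored
    # per-gesture templates in the reduced space (illustrative labels).
    templates = {"open_palm": basis @ (X[0] - mean),
                 "fist": basis @ (X[1] - mean)}
    scores = {name: (coords @ t) / (np.linalg.norm(coords) * np.linalg.norm(t))
              for name, t in templates.items()}
    print(max(scores, key=scores.get))   # most similar gesture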


Practical Example 2 - Flower Classification:

The visualization showcases a comparison between the original Iris dataset and its reduced version with K-means clustering. The left subplot represents the original dataset in a 3D space, where each data point corresponds to an Iris flower sample. The three axes represent the sepal length, sepal width, and petal length in centimeters, respectively. Different colors highlight the three distinct species of Iris flowers: setosa, versicolor, and virginica.

In contrast, the right subplot displays the reduced dataset after applying TruncatedSVD, representing the data in a 2D space. The two principal components, Component 1 and Component 2, capture the most significant patterns in the original dataset. K-means clustering has further segmented the data into three distinct clusters, indicated by varying colors.

The comparison allows us to observe the distribution of the Iris flower samples in the reduced 2D space compared to their original 3D representation. Although reduced to just two dimensions, the reduced dataset successfully retains essential patterns and species groupings, showcasing the power of dimensionality reduction techniques.
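A hedged sketch of a pipeline that could produce such a visualization (assuming scikit-learn and matplotlib 3.2 or later; the feature choices for the 3D plot follow the description above):

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.decomposition import TruncatedSVD

    iris = load_iris()
    X, y = iris.data, iris.target

    # Reduce the 4-feature dataset to 2 components, then cluster.
    X_2d = TruncatedSVD(n_components=2).fit_transform(X)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

    fig = plt.figure(figsize=(10, 4))
    ax1 = fig.add_subplot(121, projection="3d")
    ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=y)   # sepal len/width, petal len
    ax1.set_title("Original Iris data (species colors)")
    ax2 = fig.add_subplot(122)
    ax2.scatter(X_2d[:, 0], X_2d[:, 1], c=labels)
    ax2.set_title("TruncatedSVD (2D) + K-means clusters")
    plt.show()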

Practical Example 3 - 3D Visual Representation onto a 2D Manifold and a 2D Linear Vector Space

In this example, we explore the concept of projecting a panoramic image onto two different spaces: a spherical manifold and a linear vector space. A panoramic image is a wide-angle view that wraps around seamlessly, creating an immersive environment. To better understand this, we can think of each pixel in the panorama as capturing both the intensity and the angular data of a visual ray originating from the 3D world. Since there are infinitely many visual rays, only the ones directly in the camera's line of sight (or, in our case, our eyes') can be captured by the panoramic image.


The visualization presented below showcases the difference between two types of projections used for displaying panoramic images: the spherical manifold (specifically, a Riemannian manifold) and the linear vector space. A Riemannian manifold is a special type of manifold whose surface appears locally smooth and locally resembles a linear vector space.

The spherical manifold projection preserves the continuous panorama on the surface of the sphere, maintaining the immersive experience. This Riemannian manifold allows for a seamless representation of the panoramic image, providing a natural and immersive viewing experience.

On the other hand, the linear vector space, being a plane, can capture only a portion of the panoramic image without noticeable distortion. Since it lacks the curvature and global continuity of the Riemannian manifold, it cannot fully encompass the immersive qualities of the panorama.
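As a small illustration of the spherical side of this comparison (assuming NumPy; the image size is illustrative), the sketch below maps each pixel (u, v) of an equirectangular panorama to its viewing angles and then to a point on the unit sphere:

    import numpy as np

    # Each pixel (u, v) of an equirectangular panorama encodes the
    # angles of a viewing ray; converting angles to a 3D unit vector
    # places the pixel on the spherical manifold.
    width, height = 8, 4                            # illustrative size
    u, v = np.meshgrid(np.arange(width), np.arange(height))

    theta = 2.0 * np.pi * (u / width) - np.pi       # longitude in [-pi, pi)
    phi = np.pi * (v / height) - np.pi / 2.0        # latitude in [-pi/2, pi/2)

    # One point on the unit sphere per pixel.
    xyz = np.stack([np.cos(phi) * np.cos(theta),
                    np.cos(phi) * np.sin(theta),
                    np.sin(phi)], axis=-1)
    print(np.allclose(np.linalg.norm(xyz, axis=-1), 1.0))   # True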

This example highlights the distinction between projecting onto the nonlinear Riemannian manifold and onto the linear vector space, both of which represent real-world panoramic images.

By understanding and comparing these two projection methods, we recognize the significance of nonlinear representations, particularly the Riemannian manifold, in applications such as virtual reality, image processing, and data visualization. Nonlinear representations like the Riemannian manifold play a crucial role in preserving an immersive and accurate visual experience in these fields.

Summary:

We began with the idea of space as a structured container for data and built up a hierarchy of spaces: linear (vector) spaces and their subspaces; Euclidean, normed, metric, complete, Banach, and Hilbert spaces, each adding structure such as length, distance, convergence, or angles; and kernel spaces and manifolds for capturing nonlinear relationships. On top of these spaces we defined the core operations (inner product, addition, multiplication, adjoint, and projection) and the approximation tools (least squares, orthogonal projection, the pseudoinverse, and the SVD) that let us represent high-dimensional data compactly in lower-dimensional subspaces.


Conclusion:

Treating data points as vectors in an appropriately structured space gives us a single, coherent toolkit: inner products measure similarity, projections and the SVD reduce dimensionality while preserving essential patterns, and kernels and manifolds extend these ideas to nonlinear structure. The practical examples of hand gesture recognition, Iris flower classification, and panoramic image projection show how this same small set of concepts recurs across computer vision, machine learning, and signal processing.