Neural networks are intrinsically tied to gradient-based learning: the backpropagation algorithm uses gradient descent to adjust the network's weights. It is theoretically possible to train neural networks without gradient descent, but such alternative methods generally don't scale well to networks with millions (or billions) of parameters, like the ones used in deep learning. Gradient-based methods remain the most efficient way to train large neural networks.
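To make the weight-update idea concrete, here is a minimal sketch of plain gradient descent fitting a tiny linear model in NumPy. The toy data, the mean-squared-error loss, the learning rate, and the variable names are illustrative assumptions, not anything prescribed by the text; a real network would apply the same update rule layer by layer via backpropagation.

```python
import numpy as np

# Toy setup (illustrative assumption): find weights w so that X @ w approximates y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)                      # initial weights
learning_rate = 0.1

for step in range(200):
    predictions = X @ w
    error = predictions - y
    # Gradient of the mean squared error with respect to w.
    grad = 2.0 / len(X) * X.T @ error
    # Gradient descent: move the weights a small step against the gradient.
    w -= learning_rate * grad

print(w)  # converges toward true_w
```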
Here, I lay out foundational mathematical methods and structures and their application to Deep Learning, previewing the theory of deep learning purely in terms of its mathematical foundations.
Deep learning operates on vectors in a geometric space, where each layer performs a simple transformation on the data that passes through it. The chain of layers forms a complex transformation that attempts to map the input space to the target space. This transformation is parameterized by the weights of the layers, which are iteratively updated based on how well the model is performing. Because these updates rely on gradients, each transformation must be differentiable, i.e. smooth and continuous.
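As a hedged sketch of the "chain of simple transformations" idea, the NumPy snippet below composes two dense layers into one parameterized, differentiable mapping from an input space to a target space. The layer sizes, the tanh activation, and the variable names are assumptions chosen purely for illustration.

```python
import numpy as np

def dense_layer(x, W, b):
    """One layer: an affine geometric transformation followed by a smooth nonlinearity."""
    return np.tanh(x @ W + b)  # tanh is differentiable everywhere

rng = np.random.default_rng(1)

# Two layers chained together form a more complex, still differentiable transformation.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # maps R^4 -> R^8
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # maps R^8 -> R^2

x = rng.normal(size=(5, 4))          # a batch of 5 input vectors in R^4
hidden = dense_layer(x, W1, b1)      # first transformation of the input space
output = dense_layer(hidden, W2, b2) # second transformation toward the target space

print(output.shape)  # (5, 2): each input vector mapped into the target space
```

Training would then adjust W1, b1, W2, b2 with gradient descent, exactly the update sketched earlier.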
The magic of deep learning lies in turning meaning into vectors, then into geometric spaces, and then incrementally learning complex geometric transformations that map one space to another. The process rests on the core idea that meaning is derived from the pairwise relationships between things, and that these relationships can be captured by a distance function. While neural networks use vector spaces to encode meaning, other data structures for intelligence, such as graphs, can also be envisioned. Neural networks originally grew out of the idea of using graphs to encode meaning, but their name is misleading, as they have little to do with the brain; a more appropriate name would be layered representations learning or hierarchical representations learning.
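As a toy illustration of a distance function capturing pairwise relationships, the sketch below compares hand-made 3-dimensional vectors with cosine distance. The vectors and their values are invented for the example; a real model would learn much higher-dimensional embeddings from data.

```python
import numpy as np

def cosine_distance(a, b):
    """Smaller distance = more closely related meanings (one possible distance function)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical tiny "embeddings" made up for illustration.
vectors = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

print(cosine_distance(vectors["cat"], vectors["dog"]))  # small: related concepts
print(cosine_distance(vectors["cat"], vectors["car"]))  # larger: unrelated concepts
```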
Multiplying Matrices and Vectors