Deep Learning Wikipedia

Leo Migdal

In machine learning, deep learning focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and revolves around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers (ranging from three to several hundred or thousands) in the network. Methods used can be supervised, semi-supervised or unsupervised.[2] Some common deep learning network architectures include fully connected networks, deep belief networks, recurrent neural networks, convolutional neural networks, generative adversarial networks, transformers, and neural radiance fields. These architectures have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to, and in some cases surpassing, human expert performance.

Early forms of neural networks were inspired by information processing and distributed communication nodes in biological systems, particularly the human brain. However, current neural networks do not intend to model the brain function of organisms, and are generally seen as low-quality models for that purpose.[6] Most modern deep learning models are based on multi-layered neural networks such as convolutional neural networks and transformers, although they can also include propositional formulas or latent variables organized layer-wise in deep generative models. Fundamentally, deep learning refers to a class of machine learning algorithms in which a hierarchy of layers is used to transform input data into a progressively more abstract and composite representation. For example, in an image recognition model, the raw input may be an image (represented as a tensor of pixels). The first representational layer may attempt to identify basic shapes such as lines and circles, the second layer may compose and encode arrangements of edges, the third layer may encode a nose and eyes, and a deeper layer may recognize that the image contains a face.
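A minimal sketch of this layered abstraction, assuming PyTorch; the 28×28 grayscale input and layer sizes are illustrative assumptions, with each stage mapping the previous representation to a more abstract one:

```python
# Sketch: a stack of layers producing progressively more abstract features.
# Input size, layer widths and class count are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),              # raw pixels: 1x28x28 -> 784 values
    nn.Linear(784, 256),       # early layer: low-level features (edges, lines)
    nn.ReLU(),
    nn.Linear(256, 64),        # middle layer: compositions of those features
    nn.ReLU(),
    nn.Linear(64, 10),         # final layer: abstract class scores
)

x = torch.randn(1, 1, 28, 28)  # a dummy "image" tensor
print(model(x).shape)          # torch.Size([1, 10])
```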

A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different types of data including text, images and audio.[1] CNNs are the de-facto standard in deep learning-based approaches to computer vision and image processing. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are mitigated by the regularization that comes from using shared weights over fewer connections.[3][4] For example, for each neuron in a fully-connected layer, 10,000 weights would be required to process an image of 100×100 pixels. However, applying cascaded convolution (or cross-correlation) kernels,[5][6] only 25 weights for each convolutional layer are required to process 5×5-sized tiles.[7][8] Higher-layer features are extracted from wider context windows, compared to lower-layer features. CNNs are also known as shift invariant or space invariant artificial neural networks, based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Feedforward neural networks are usually fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer.
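The weight-sharing arithmetic above can be checked directly; a small sketch assuming PyTorch, where the 100×100 image and single 5×5 kernel mirror the example:

```python
# Sketch: parameter counts for a fully-connected neuron over a 100x100
# image vs. a single 5x5 shared-weight convolution kernel.
import torch.nn as nn

fc = nn.Linear(100 * 100, 1)                        # one fully-connected neuron
conv = nn.Conv2d(1, 1, kernel_size=5, bias=False)   # one 5x5 kernel, shared everywhere

print(sum(p.numel() for p in fc.parameters()))      # 10001 (10,000 weights + 1 bias)
print(sum(p.numel() for p in conv.parameters()))    # 25 (reused at every position)
```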

The "full connectivity" of these networks makes them prone to overfitting data. Typical ways of regularization, or preventing overfitting, include: penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.) Robust datasets also increase the probability that CNNs will learn the... Convolutional networks were inspired by biological processes[18][19][20][21] in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field. Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks.[1] As with other kinds of machine learning, learning...

In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer) between the input layer and the output layer. Certain tasks, such as recognizing and understanding speech, images or handwriting, are easy for humans to do. However, for a computer, these tasks are very difficult. In a multi-layer neural network (having more than two layers), the information processed becomes more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems, but they differ from the structural and functional properties of biological brains (especially the human brain) in many ways. Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning.
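The value of a hidden layer can be seen in the classic XOR function, which no single-layer network can represent; here is a hand-weighted sketch in plain NumPy (the weights are chosen for illustration, not learned):

```python
# Sketch: XOR is not linearly separable, but one hidden layer of two
# ReLU units suffices: xor(x1, x2) = relu(x1 + x2) - 2 * relu(x1 + x2 - 1).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def xor_net(x1, x2):
    h = relu(np.array([x1 + x2, x1 + x2 - 1.0]))  # hidden layer
    return h[0] - 2.0 * h[1]                       # output layer

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints 0.0, 1.0, 1.0, 0.0
```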

RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g. every pixel rendered to the screen in a video game) and decide what actions to perform to optimize an objective (e.g. maximizing the game score). Deep reinforcement learning has been used for a diverse set of applications including but not limited to robotics, video games, natural language processing, computer vision,[1] education, transportation, finance and healthcare.[2]
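As a sketch of a network that maps raw pixels to action values, here is a small convolutional Q-network in the spirit of DQN, assuming PyTorch; the frame stack, layer sizes and action count are illustrative assumptions:

```python
# Sketch: a value network from raw screen pixels to per-action scores.
import torch
import torch.nn as nn

q_net = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=8, stride=4),  # stack of 4 grayscale frames
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 256),
    nn.ReLU(),
    nn.Linear(256, 6),                          # one value per available action
)

frames = torch.randn(1, 4, 84, 84)    # e.g. every pixel rendered to the screen
action = q_net(frames).argmax(dim=1)  # choose the action with the highest value
print(action)
```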

Deep learning is a form of machine learning that transforms a set of inputs into a set of outputs via an artificial neural network. Deep learning methods, often using supervised learning with labeled datasets, have been shown to solve tasks that involve handling complex, high-dimensional raw input data (such as images) with less manual feature engineering than prior methods. In the past decade, deep RL has achieved remarkable results on a range of problems, from single and multiplayer games such as Go, Atari games, and Dota 2 to robotics.[3] Reinforcement learning is a process in which an agent learns to make decisions through trial and error. This problem is often modeled mathematically as a Markov decision process (MDP), where an agent at every timestep is in a state s, takes action a, receives a scalar reward, and transitions to the next state according to environment dynamics. The agent attempts to learn a policy π(a|s), or map from observations to actions, in order to maximize its returns (expected sum of rewards).
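In symbols, this objective can be written as the expected discounted return (a standard textbook formulation; the discount factor γ is an assumption, not stated in the text above):

```latex
% Standard discounted-return objective for an MDP (textbook formulation;
% the discount factor \gamma is assumed, not taken from the source text).
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right],
\qquad 0 \le \gamma < 1
```

where r_t is the reward received at timestep t under policy π.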

In reinforcement learning (as opposed to optimal control) the algorithm only has access to the dynamics p(s′|s,a) through sampling. In many practical decision-making problems, the states s of the MDP are high-dimensional (e.g., images from a camera or the raw sensor stream from a robot) and cannot be handled by traditional RL algorithms. Deep reinforcement learning algorithms incorporate deep learning to solve such MDPs, often representing the policy π(a|s) or other learned functions as a neural network, and developing specialized algorithms that perform well in this setting. Along with rising interest in neural networks beginning in the mid-1980s, interest grew in deep reinforcement learning, where a neural network is used in reinforcement learning to represent policies or value functions. Because in such a system the entire decision-making process, from sensors to motors in a robot or agent, involves a single neural network, it is also sometimes called end-to-end reinforcement learning.[4] One of the first successful applications was TD-Gammon, a neural network trained to play backgammon. With zero knowledge built in, the network learned to play the game at an intermediate level by self-play and TD(λ).
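For the flavor of temporal-difference learning mentioned above, here is a minimal tabular TD(0) update, a simpler relative of the TD(λ) rule; the states, reward and step size are made up for illustration:

```python
# Sketch: tabular TD(0) value update. Move V[s] toward the bootstrapped
# target r + gamma * V[s_next]; TD(lambda) generalizes this with traces.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V

V = {"s0": 0.0, "s1": 0.0}          # a toy two-state value table
td0_update(V, "s0", r=1.0, s_next="s1")
print(V)                             # {'s0': 0.1, 's1': 0.0}
```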

In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from an embedding table. Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM).[2] Later variations have been widely adopted for training large language models on large datasets. The modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google.[1] The predecessors of transformers were developed as an improvement over previous architectures for machine translation. They are used in large-scale natural language processing, computer vision (vision transformers), reinforcement learning,[6][7] audio,[8] multimodal learning, robotics,[9] and even playing chess.[10] The architecture has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs). For many years, sequence modelling and generation was done by using plain recurrent neural networks (RNNs). A well-cited early example was the Elman network (1990).
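A minimal sketch of the scaled dot-product attention at the core of multi-head attention, in plain NumPy (single head, toy sizes; a real transformer additionally uses learned projections, multiple heads, and positional information):

```python
# Sketch: scaled dot-product attention over a whole sequence at once.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))          # 4 tokens, 8-dim embeddings (toy)
out = attention(tokens, tokens, tokens)   # self-attention: no recurrence needed
print(out.shape)                          # (4, 8)
```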

In theory, the information from one token can propagate arbitrarily far down the sequence, but in practice the vanishing-gradient problem leaves the model's state at the end of a long sentence without precise, extractable information about earlier tokens. A key breakthrough was LSTM (1995),[note 1] an RNN which used various innovations to overcome the vanishing gradient problem, allowing efficient learning of long-sequence modelling. One key innovation was the use of an attention mechanism which used neurons that multiply the outputs of other neurons, so-called multiplicative units.[13] Neural networks using multiplicative units were later called sigma-pi networks[14] or higher-order networks. However, LSTM still used sequential processing, like most other RNNs.[note 2] Specifically, RNNs operate one token at a time from first to last; they cannot operate in parallel over all tokens in a sequence. Deep learning is a branch of machine learning: a family of algorithms built on artificial neural networks that perform representation learning on data.[1][2][3][4][5] The adjective "deep" refers to the use of multiple layers in the network. Early work showed that a linear perceptron cannot be a universal classifier, but that a network with a non-polynomial activation function and one hidden layer of unbounded width can be. Deep learning is a class of machine-learning algorithms based on learning representations of data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges or regions of particular shapes. Certain representations make it easier to learn tasks from examples (for example, face recognition or facial-expression recognition[6]). A benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.[7]
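Returning to the sequential-processing limitation of RNNs noted above, a minimal sketch in plain NumPy shows why: each hidden state depends on the previous one, so tokens cannot be processed in parallel, unlike the attention example earlier. Sizes and weights are illustrative.

```python
# Sketch: the sequential dependency in a plain RNN. Each step reads the
# previous hidden state, so the loop cannot be parallelized over tokens.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_h = rng.normal(scale=0.1, size=(d, d))   # hidden-to-hidden weights
W_x = rng.normal(scale=0.1, size=(d, d))   # input-to-hidden weights

h = np.zeros(d)
tokens = rng.normal(size=(5, d))           # 5 token embeddings (toy)
for x in tokens:                           # strictly first to last
    h = np.tanh(W_h @ h + W_x @ x)
print(h.shape)                             # (8,)
```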

The goal of representation learning is to find better ways of representing data and to build better models that learn these representations from large-scale unlabeled data. Some of these representations are inspired by neuroscience and are loosely grounded in an understanding of information processing and communication patterns in nervous systems, such as neural coding, which attempts to characterize the relationship between stimuli and neuronal responses, and between the electrical activities of neurons in the brain.[8] Several deep learning architectures, such as deep neural networks, convolutional neural networks, deep belief networks and recurrent neural networks, have been applied to computer vision, speech recognition, natural language processing, audio recognition and bioinformatics, with excellent results. In addition, "deep learning" has become something of a buzzword, or a rebranding of artificial neural networks.[9][10] Deep learning is a subset of machine learning driven by multilayered neural networks whose design is inspired by the structure of the human brain. Deep learning models power most state-of-the-art artificial intelligence (AI) today, from computer vision and generative AI to self-driving cars and robotics. Unlike the explicitly defined mathematical logic of traditional machine learning algorithms, the artificial neural networks of deep learning models comprise many interconnected layers of "neurons" that each perform a mathematical operation.

By using machine learning to adjust the strength of the connections between individual neurons in adjacent layers (in other words, by varying the model's weights and biases), the network can be optimized to yield more accurate outputs. While neural networks and deep learning have become inextricably associated with one another, they are not strictly synonymous: "deep learning" refers to the training of models with at least 4 layers, though modern neural networks often have far more. It's this distributed, highly flexible and adjustable structure that explains deep learning's power and versatility. Imagine training data as data points scattered on a 2-dimensional graph, and the goal of model training as finding a line that runs through those data points. Essentially, traditional machine learning aims to accomplish this with a single mathematical function that yields a single line (or curve); deep learning, by contrast, can piece together an arbitrary number of smaller, simpler functions into a flexible, piecewise fit. Deep neural networks are universal approximators: it has been proven theoretically that for any continuous function, there exists a neural network arrangement that can approximate it to arbitrary accuracy.1
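A minimal sketch of that piecewise idea, in plain NumPy with hand-chosen weights (an illustration, not the cited theorem): two ReLU units combine into the piecewise-linear curve y = |x|, and adding more hidden units adds more segments for fitting curvier functions.

```python
# Sketch: a one-hidden-layer ReLU network as a piecewise-linear function.
# relu(x) + relu(-x) reproduces y = |x| exactly, using two linear pieces.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def net(x):
    return relu(x) + relu(-x)   # two hidden units, hand-chosen weights

xs = np.linspace(-2, 2, 5)       # [-2, -1, 0, 1, 2]
print(net(xs))                   # [2. 1. 0. 1. 2.] -- matches |x|
```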

Deep learning models are most commonly trained through supervised learning on labeled data to perform regression and classification tasks. But because large-scale neural networks usually require a massive amount of training data to reach optimal performance, the cost and labor of acquiring sufficiently large datasets of annotated training examples can be prohibitive. This has led to the development of techniques that replicate supervised learning tasks using unlabeled data. The term self-supervised learning was popularized by Yann LeCun in the late 2010s to distinguish such methods from traditional unsupervised learning. Self-supervised learning has since emerged as a prominent mode of training neural networks, particularly for the foundation models underpinning generative AI. Though neural networks (or analogous concepts) were introduced early in the history of machine learning, their breakthrough didn't begin in earnest until the late 2000s and early 2010s.
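A minimal sketch of the self-supervised idea, using next-token prediction as one common pretext task (a toy illustration; the token list is made up): the labels are derived from the data itself, so no human annotation is needed.

```python
# Sketch: self-supervision via next-token prediction. Each position's
# "label" is simply the following token in the raw, unlabeled text.
tokens = ["deep", "learning", "uses", "many", "layers"]

inputs = tokens[:-1]    # ["deep", "learning", "uses", "many"]
targets = tokens[1:]    # ["learning", "uses", "many", "layers"]
for x, y in zip(inputs, targets):
    print(f"predict {y!r} given context ending in {x!r}")
```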

The advent of deep learning networks across most subsets of machine learning was driven in part by advances in high-performance graphics processing units (GPUs) capable of parallel processing of massive numbers of computational steps. Because deep learning requires a tremendous amount of computing power for both training and inference, these hardware advancements greatly increased the speed and practicality of implementing deep learning models at scale. Deep learning is a subset of machine learning that has significantly advanced the field of artificial intelligence by enabling computers to learn from data in ways loosely analogous to the human brain.

By using neural networks that loosely mimic brain cells, it helps computers independently discover patterns in data, leading to breakthroughs in image recognition, speech understanding, and natural language processing. In this beginner's guide, we'll answer the question "What is deep learning?", explore how it works, look at its real-world applications, and discuss how you can learn this in-demand skill. Deep learning, an advanced artificial intelligence technique, has become increasingly popular in the past few years, thanks to abundant data and increased computing power. It's the main technology behind many of the applications we use every day, including online language translation, automated face-tagging in social media, smart replies in email, and the new wave of generative models.

While deep learning is not new, it has benefited greatly from the increased availability of data and advances in computing. ChatGPT, the AI-powered chatbot that became the fastest-growing app of all time, is powered by a deep-learning model trained on billions of words gathered from the internet. DALL-E, Midjourney, and Stable Diffusion, AI systems that generate images from text descriptions, are deep-learning systems that model the relation between images and text descriptions. Deep learning is a subset of machine learning, a branch of artificial intelligence that configures computers to perform tasks through experience. In contrast to classic, rule-based AI systems, machine learning algorithms develop their behavior by processing annotated examples, a process called "training." For instance, to create a fraud-detection program, you would train a machine-learning algorithm with a list of bank transactions and their eventual outcome (legitimate or fraudulent).

The machine-learning model examines the examples and develops a statistical representation of common characteristics between legitimate and fraudulent transactions. After that, when you provide the algorithm with the data of a new bank transaction, it will classify it as legitimate or fraudulent based on the patterns it has gleaned from the training examples. As a rule of thumb, the more high-quality data you provide, the more accurate a machine-learning algorithm becomes at performing its tasks.
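As a sketch of that fraud-detection workflow, here is a minimal classifier assuming scikit-learn; the features (amount, hour of day), data and labels are invented for illustration:

```python
# Sketch: train on annotated transactions, then classify a new one.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: [amount, hour_of_day]; labels: 0 = legitimate, 1 = fraudulent
X = np.array([[25.0, 14], [980.0, 3], [12.5, 10],
              [1500.0, 2], [60.0, 18], [2200.0, 4]])
y = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression().fit(X, y)   # "training" on labeled examples
new_tx = np.array([[1800.0, 3]])         # a new, unseen transaction
print(model.predict(new_tx))             # e.g. [1] -> flagged as fraudulent
```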
