2. Preliminaries - Dive into Deep Learning 0.1.0 Documentation

Leo Migdal

To get started with deep learning, we will need to develop a few basic skills. All machine learning is concerned with extracting information from data. So we will begin by learning the practical skills for storing, manipulating, and preprocessing data. Moreover, machine learning typically requires working with large datasets, which we can think of as tables, where the rows correspond to examples and the columns correspond to attributes. Linear algebra gives us a powerful set of techniques for working with tabular data. We will not go too far into the weeds but rather focus on the basics of matrix operations and their implementation.

Additionally, deep learning is all about optimization. We have a model with some parameters and we want to find those that fit our data the best. Determining which way to move each parameter at each step of an algorithm requires a little bit of calculus, which will be briefly introduced. Fortunately, the autograd package computes differentiation automatically for us, and we will cover it next. Machine learning is also concerned with making predictions: what is the likely value of some unknown attribute, given the information that we observe? To reason rigorously under uncertainty we will need to invoke the language of probability.
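As a quick illustration of automatic differentiation, the following minimal sketch (assuming the Python mxnet package with its autograd module is installed) computes the gradient of \(y = 2\mathbf{x}^\top\mathbf{x}\) with respect to \(\mathbf{x}\):

```python
from mxnet import autograd, np, npx

npx.set_np()

x = np.arange(4.0)        # x = [0., 1., 2., 3.]
x.attach_grad()           # allocate space to store the gradient
with autograd.record():   # record the computation graph
    y = 2 * np.dot(x, x)  # y = 2 * x^T x
y.backward()              # compute dy/dx
print(x.grad)             # [ 0.  4.  8. 12.], i.e. 4x
```

The gradient of \(2\mathbf{x}^\top\mathbf{x}\) is \(4\mathbf{x}\), which matches the printed result.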

Finally, the official documentation provides plenty of descriptions and examples that are beyond the scope of this book. To conclude the chapter, we will show you how to look up documentation for the information you need. Throughout, you can modify the code and tune hyperparameters to get instant feedback and accumulate practical experience in deep learning, and you can discuss what you learn with peers in the community through the link provided in each section.
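For example, one common way to look up documentation from inside a Python session is with the built-in dir and help functions; a minimal sketch, assuming the Python mxnet package:

```python
from mxnet import np

# list the functions and attributes available in a module
print(dir(np.random))

# show the docstring and signature of a specific function
help(np.ones)
```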

In order to get anything done, we need some way to store and manipulate data.

Generally, there are two important things we need to do with data: (i) acquire them; and (ii) process them once they are inside the computer. There is no point in acquiring data without some way to store it, so let us get our hands dirty first by playing with synthetic data. To start, we introduce the \(n\)-dimensional array (ndarray), MXNet’s primary tool for storing and transforming data. In MXNet, ndarray is a class and we call any instance “an ndarray”. If you have worked with NumPy, the most widely-used scientific computing package in Python, then you will find this section familiar. That’s by design.
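For instance, creating a simple ndarray and reshaping it looks like this (a minimal sketch, assuming the Python mxnet package is installed):

```python
from mxnet import np, npx

npx.set_np()  # use MXNet's NumPy-like interface

x = np.arange(12)    # a vector of 12 values: 0, 1, ..., 11
print(x.shape)       # (12,)
X = x.reshape(3, 4)  # reinterpret the same data as a 3x4 matrix
print(X)
```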

We designed MXNet’s ndarray to be an extension to NumPy’s ndarray with a few killer features. First, MXNet’s ndarray supports asynchronous computation on CPU, GPU, and distributed cloud architectures, whereas NumPy only supports CPU computation. Second, MXNet’s ndarray supports automatic differentiation. These properties make MXNet’s ndarray suitable for deep learning. Throughout the book, when we say ndarray, we are referring to MXNet’s ndarray unless otherwise stated. In this section, we aim to get you up and running, equipping you with the basic math and numerical computing tools that you will build on as you progress through the book.
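To illustrate the device placement mentioned above, the following sketch allocates an ndarray on a GPU when one is available and falls back to the CPU otherwise (again assuming the Python mxnet package):

```python
from mxnet import np, npx

npx.set_np()

# pick a GPU if one is available, otherwise fall back to the CPU
ctx = npx.gpu(0) if npx.num_gpus() > 0 else npx.cpu()
X = np.ones((2, 3), ctx=ctx)  # allocate the ndarray on the chosen device
print(X.ctx)
```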

Do not worry if you struggle to grok some of the mathematical concepts or library functions. The following sections will revisit this material in the context of practical examples and it will sink in. On the other hand, if you already have some background and want to go deeper into the mathematical content, just skip this section. To start, we import the api and mxnet-engine modules from the Deep Java Library (DJL) on Maven. Here, the api module includes all the high-level Java APIs that will be used for data processing, training, and inference. The mxnet-engine module includes the implementation of those high-level APIs using the Apache MXNet framework.

Using the DJL automatic engine mode, the MXNet native libraries, with basic operations and functions implemented in C++, will be downloaded automatically when DJL is first used. An ndarray represents a (possibly multi-dimensional) array of numerical values. With one axis, an ndarray corresponds (in math) to a vector. With two axes, an ndarray corresponds to a matrix. Arrays with more than two axes do not have special mathematical names; we simply call them tensors.
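A minimal sketch of these three cases, assuming the Python mxnet package:

```python
from mxnet import np, npx

npx.set_np()

vector = np.array([1.0, 2.0, 3.0])       # one axis: a vector
matrix = np.arange(6).reshape(2, 3)      # two axes: a matrix
tensor = np.arange(24).reshape(2, 3, 4)  # three axes: a tensor
print(vector.shape, matrix.shape, tensor.shape)  # (3,) (2, 3) (2, 3, 4)
```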

In some form or another, machine learning is all about making predictions. We might want to predict the probability of a patient suffering a heart attack in the next year, given their clinical history. In anomaly detection, we might want to assess how likely a set of readings from an airplane’s jet engine would be, were it operating normally. In reinforcement learning, we want an agent to act intelligently in an environment. This means we need to think about the probability of getting a high reward under each of the available actions. And when we build recommender systems we also need to think about probability. For example, say hypothetically that we worked for a large online bookseller.

We might want to estimate the probability that a particular user would buy a particular book. For this we need to use the language of probability. Entire courses, majors, theses, careers, and even departments are devoted to probability. So naturally, our goal in this section is not to teach the whole subject. Instead we hope to get you off the ground, to teach you just enough that you can start building your first deep learning models, and to give you enough of a flavor for the subject that you can explore it further on your own if you wish. We have already invoked probabilities in previous sections without articulating what precisely they are or giving a concrete example.

Let us get more serious now by considering the first case: distinguishing cats and dogs based on photographs. This might sound simple but it is actually a formidable challenge. To start with, the difficulty of the problem may depend on the resolution of the image.

[Figure fig_cat_dog: images of varying resolutions (\(10 \times 10\), \(20 \times 20\), \(40 \times 40\), \(80 \times 80\), and \(160 \times 160\) pixels).]

As shown in fig_cat_dog, while it is easy for humans to recognize cats and dogs at a resolution of \(160 \times 160\) pixels, it becomes challenging at \(40 \times 40\) pixels and next to impossible at the lowest resolutions.

In other words, our ability to tell cats and dogs apart at a large distance (and thus low resolution) might approach uninformed guessing. Probability gives us a formal way of reasoning about our level of certainty. If we are completely sure that the image depicts a cat, we say that the probability that the corresponding label \(y\) is "cat", denoted \(P(y=\text{"cat"})\), equals \(1\). If we had no evidence to suggest that \(y=\text{"cat"}\) or that \(y=\text{"dog"}\), then we might say that the two possibilities are equally likely, expressing this as \(P(y=\text{"cat"}) = P(y=\text{"dog"}) = 0.5\). If we were reasonably confident, but not sure, that the image depicted a cat, we might assign a probability \(0.5 < P(y=\text{"cat"}) < 1\). Now consider the second case: given some weather monitoring data, we want to predict the probability that it will rain in Taipei tomorrow.

If it is summertime, the rain might come with probability 0.5.

To prepare for your dive into deep learning, you will need a few survival skills: (i) techniques for storing and manipulating data; (ii) libraries for ingesting and preprocessing data from a variety of sources; (iii) knowledge of the basic linear algebra, calculus, and probability needed to reason about models; and (iv) the ability to look up documentation when you get stuck. In short, this chapter provides a rapid introduction to the basics that you will need to follow most of the technical content in this book.

Finding the area of a polygon had remained mysterious until at least 2,500 years ago, when ancient Greeks divided a polygon into triangles and summed their areas. To find the area of curved shapes, such as a circle, ancient Greeks inscribed polygons in such shapes. As shown in Section 2.4, an inscribed polygon with more sides of equal length better approximates the circle.

This process is also known as the method of exhaustion. In fact, the method of exhaustion is where integral calculus (described in sec_integral_calculus) originates. More than 2,000 years later, the other branch of calculus, differential calculus, was invented. Among the most critical applications of differential calculus, optimization problems consider how to do something the best way possible. As discussed in Section 2.3.10.1, such problems are ubiquitous in deep learning. In deep learning, we train models, updating them successively so that they get better and better as they see more and more data.

Usually, getting better means minimizing a loss function, a score that answers the question “how bad is our model?” This question is more subtle than it appears. Ultimately, what we really care about is producing a model that performs well on data that we have never seen before. But we can only fit the model to data that we can actually see. Thus we can decompose the task of fitting models into two key concerns: (i) optimization: the process of fitting our models to observed data; and (ii) generalization: the mathematical principles and practitioners’ wisdom that guide us toward producing models whose performance extends beyond the exact set of data examples used to train them. To help you understand optimization problems and methods in later chapters, here we give a very brief primer on differential calculus that is commonly used in deep learning. We begin by addressing the calculation of derivatives, a crucial step in nearly all deep learning optimization algorithms.
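As a toy illustration of the optimization side, the following sketch fits a single weight by gradient descent on a squared-error loss. The data, learning rate, and model here are hypothetical and only meant to show the idea of "updating successively to get better":

```python
# Hypothetical example: fit y = w * x to data generated with w = 2
# by gradient descent on the mean squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs
w, lr = 0.0, 0.1                             # initial weight and learning rate

for step in range(100):
    # derivative of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad                           # move w in the direction that lowers the loss

print(w)  # approximately 2.0
```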

In deep learning, we typically choose loss functions that are differentiable with respect to our model’s parameters. Put simply, this means that for each parameter, we can determine how rapidly the loss would increase or decrease, were we to increase or decrease that parameter by an infinitesimally small amount.
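For instance, we can approximate this rate of change numerically with a centered finite difference. This is a minimal sketch with a hypothetical one-parameter loss, not a function from the book:

```python
def loss(w):
    # a hypothetical loss with its minimum at w = 3
    return (w - 3.0) ** 2

def numerical_derivative(f, w, h=1e-5):
    # centered finite-difference approximation of df/dw
    return (f(w + h) - f(w - h)) / (2 * h)

# At w = 1 the derivative is about -4, so increasing w slightly decreases the loss.
print(numerical_derivative(loss, 1.0))
```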

Until recently, nearly every computer program that we interact with daily was coded by software developers from first principles. Say that we wanted to write an application to manage an e-commerce platform. After huddling around a whiteboard for a few hours to ponder the problem, we would come up with the broad strokes of a working solution that would probably look something like this: (i) users... To build the brains of our application, we would have to step through every possible corner case that we anticipate encountering, devising appropriate rules. Each time a customer clicks to add an item to their shopping cart, we add an entry to the shopping cart database table, associating that user’s ID with the requested product’s ID.
