Articles on Machine learning
Last updated: 2023/03/06
Top deep-dives on Machine learning
Machine learning libraries optimize differentiable functions using a process called backpropagation. Backpropagation computes the gradient of the loss, telling the model which direction to move its parameters. In machine learning, we want to know how changing x will change the loss. The chain rule lets us unravel the composed functions discussed earlier, giving us the ability to compute arbitrarily complex partial derivatives and gradients. Using the computation graph we constructed earlier, we can move backwards from the final result to find the derivative with respect to each variable. This backwards computation of the derivative using the chain rule is what gives backpropagation its name.
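The chain-rule computation described above can be sketched in a few lines. This is an illustrative example (not from the article): a one-parameter model y_hat = w * x with squared-error loss, where the backward pass multiplies the local derivatives along the computation graph and is checked against a finite-difference estimate.

```python
# Minimal sketch of backpropagation via the chain rule (illustrative):
# model y_hat = w * x, loss L = (y_hat - y)**2.

def forward(w, x, y):
    y_hat = w * x             # forward pass through the composed functions
    loss = (y_hat - y) ** 2
    return y_hat, loss

def backward(w, x, y):
    y_hat, _ = forward(w, x, y)
    dL_dyhat = 2 * (y_hat - y)    # local derivative dL/dy_hat
    dyhat_dw = x                  # local derivative dy_hat/dw
    return dL_dyhat * dyhat_dw    # chain rule: dL/dw

w, x, y = 0.5, 3.0, 6.0
grad = backward(w, x, y)

# Sanity check: compare against a finite-difference estimate of dL/dw
eps = 1e-6
num = (forward(w + eps, x, y)[1] - forward(w - eps, x, y)[1]) / (2 * eps)
print(grad, num)
```

The negative gradient points in the direction that decreases the loss, which is exactly the "which direction to move in" that backpropagation provides.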
This article argues that statistical methods alone cannot produce an understanding of intelligence that resembles human intelligence. It explores the limitations of statistical methods and machine learning, arguing that while machines may outperform humans at some tasks, they will never replace humans completely. It closes by considering the singularity, in which humans become near-obsolete, and argues that this outcome is unlikely.
A deep-dive into how DoorDash used machine learning to optimize conversions for restaurants through its app.
This is a brief explanation of the Reinforcement Learning concept of a Markov Decision Process (MDP), with a focus on the Markov Property and how it can cause problems when dealing with long-term dependencies between events. The article then describes a solution to this problem that allows for the learnability of a policy that can solve tasks with long-term dependencies.
This post describes fine-tuning GPT-2, a powerful natural language model, to generate fake rock climbing route descriptions. A climbing route description uses technical climbing jargon to describe a specific pathway up a rock face, and each one is unique. As seems to be common with GPT-2, the author found that the model accurately captures the tone and specific language of the domain, but it sometimes generates physically impossible climbing routes and is prone to repeating itself. A Colab notebook is provided where anyone can download the fine-tuned model and try it out.
Building real-time ML systems involves translating prediction requests into feature vectors, which are then passed to a model for inference. Both steps can be implemented in different ways depending on the use case, the type of model, and the desired latency of the system. This article covers these approaches.
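The request-to-feature-vector-to-prediction flow the article describes can be sketched roughly as follows. This is a hypothetical illustration, not the article's code; the names (feature_store, to_feature_vector, model) and the feature values are invented for the example.

```python
# Hypothetical sketch of a real-time inference path: a prediction request
# arrives, precomputed features are looked up, and the resulting vector
# is passed to a model.
feature_store = {  # stand-in for a real feature store, keyed by entity id
    "user_42": {"purchases_7d": 3, "avg_session_min": 12.5},
}

def to_feature_vector(request):
    # translate the raw request into the ordered vector the model expects
    feats = feature_store[request["user_id"]]
    return [feats["purchases_7d"], feats["avg_session_min"]]

def model(vec):
    # stand-in for a trained model served behind this endpoint
    return 0.1 * vec[0] + 0.01 * vec[1]

request = {"user_id": "user_42"}
score = model(to_feature_vector(request))
print(score)
```

In a production system the feature lookup might instead hit a low-latency key-value store, and the model call might go to a dedicated inference service; the article discusses how those choices trade off against latency.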
Ever since its introduction in the 2017 paper, Attention Is All You Need, the Transformer model architecture has taken the deep-learning world by storm. Initially introduced for machine translation, it has become the tool of choice for a wide range of domains, including text, audio, video, and others. Transformers have also driven most of the massive increases in model scale and capability in the last few years. OpenAI’s GPT-3 and Codex models are Transformers, as are DeepMind’s Gopher models and many others. This article explains how Transformers work.
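The core operation of the Transformer is scaled dot-product attention. As a hedged sketch (not the article's code), it can be written in a few lines of NumPy:

```python
# Scaled dot-product attention: each output row is a weighted mix of V's
# rows, with weights determined by how well each query matches each key.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 tokens, key dimension d_k = 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, attn = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Real Transformers run many of these attention heads in parallel and stack them with feed-forward layers, but the weighted-mixing idea above is the heart of it.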
In this paper, McIlroy-Young et al. explore whether AI models can learn to perform tasks in a human-like way. They use chess as a model system to test move-matching and blunder-prediction performance. They find that while AI models can learn to perform some tasks at human or even super-human levels, they do not always perform those tasks in a human-like way. This difference in approach matters for human-machine collaboration, where humans are still involved in task performance, supervision, or evaluation. The paper shows that we need to design AI models carefully, to ensure they are interpretable, easy to learn from, and safe for humans to follow.
In this edition of Napkin Math, the author establishes a mental model for how a neural network works by building one from scratch. A neural network is made up of an input layer, a hidden layer, and an output layer; the hidden layer is where the magic happens, since that is where most of the math is done. One iteration of gradient descent gives a simple algorithm: randomly initialize the parameters, then try to take a step toward the minimum by updating them along the gradient, scaled by a small multiplier called the learning rate so we don't take too big a step. Repeat until the loss stops going down by a significant amount, then return the model. The author intentionally leaves out many of the details, because the purpose of the article is to understand the simple neural net, not gradient descent as a whole; there are great resources for that, like the series by 3blue1brown.
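The loop described above can be sketched concretely. This is an illustrative example, not the author's code: gradient descent on a one-parameter squared-error loss L(w) = (w * x - y)**2.

```python
# Plain gradient descent on L(w) = (w * x - y)**2, whose minimum is
# at w = y / x.
import random

def grad(w, x, y):
    # derivative of L with respect to w
    return 2 * (w * x - y) * x

random.seed(0)
w = random.uniform(-1.0, 1.0)     # randomly initialize the parameter
lr = 0.01                         # learning rate: keeps each step small
for _ in range(1000):
    w -= lr * grad(w, 3.0, 6.0)   # step toward the minimum

loss = (w * 3.0 - 6.0) ** 2
print(w, loss)  # w approaches 2.0, loss approaches 0
```

A real training loop would also include the stopping check (return the model once the loss stops improving significantly) rather than a fixed iteration count.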
This article explores how the xorshift128 PRNG works, how to model xorshift128 with a neural network (including the design and results), and finally ends with creating a machine-learning-resistant version of xorshift128.
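For reference, xorshift128 itself is only a few lines. Here is a sketch of Marsaglia's original 32-bit algorithm, which the article's neural network learns to predict:

```python
# Marsaglia's xorshift128: four 32-bit state words, updated by a fixed
# pattern of XORs and shifts.
M32 = 0xFFFFFFFF  # mask to keep values within 32 bits

class Xorshift128:
    def __init__(self, x, y, z, w):
        self.state = [x & M32, y & M32, z & M32, w & M32]

    def next(self):
        x, y, z, w = self.state
        t = (x ^ (x << 11)) & M32
        t ^= t >> 8
        new_w = (w ^ (w >> 19) ^ t) & M32
        self.state = [y, z, w, new_w]  # shift the state window forward
        return new_w

rng = Xorshift128(1, 2, 3, 4)
outs = [rng.next() for _ in range(3)]
print(outs)
```

Because each output is a fixed linear (XOR/shift) function of the state, the sequence is learnable by a model, which is exactly the weakness the article exploits and then hardens against.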
The author morphs and creates LEGO faces with a variational autoencoder. The article gives background on variational autoencoders and walks through the implementation. Following that, the model is evaluated and used to generate LEGO face images.
The author describes the evolutionary process by which rovers learn to cooperate.
This code implements an image delta viewer which the author calls “FLOP” as an homage to the paper its implementation is based on (FLIP). The FLIP metric attempts to account for the following facts about how our eyes perceive color:
- Our eyes perceive color according to the opponent process, with red and green sharing one channel (a), blue and yellow sharing another (b), and an achromatic channel coding for lightness (L).
- Our eyes are more sensitive to chrominance for brighter colors (the Hunt effect).
- Our eyes pick out edges and pixel discontinuities.
- Our eyes perceive luminance non-linearly.
- Our eyes have limited spatial resolution, which is hue- and lightness-dependent.
The FLIP metric first transfers images into YyCxCz space, a space like CIELAB (also known as L*a*b*) but without the non-linear transform meant to mimic the human visual system's response function.
The author illuminates their recent exploration of teaching RL agents to cooperate, in the form of rovers on a bounty hunt.
The batch size refers to the number of training examples in each minibatch when training a neural network. It is common for the batch size to be a power of 2, such as 64, 128, 256, 512, or 1024. There are some valid theoretical justifications for this, including memory alignment and floating-point efficiency, but it is unclear whether these reasons hold in practice. To find out, a benchmark was run training a MobileNetV3 (large) for 10 epochs on CIFAR-10. The results showed no substantial difference in performance between batch sizes that are powers of 2 (or multiples of 8) and those that are not.
Transformer architectures are a type of deep learning architecture that has been shown to be effective for a variety of natural language processing, vision, audio, and multimodal tasks. Their key capability is capturing which elements in a long sequence are worthy of attention, which yields great summarization and generative skills. Transformers can be applied to reinforcement learning by reframing reinforcement learning as a sequence-modeling problem. This has the potential to improve the performance of reinforcement learning algorithms, though the approach has some limitations.
The authors demonstrate using evolutionary strategies to teach machines to make abstract drawings of things.
David and Yujin train a machine learning network to use different "senses" to become more resilient when the noise drowns out the signal.
A visual explanation of applying a transformer-based machine learning model to interpret spoken-language audio.
The authors present some of their recent discoveries in creating a general music transcription model using transformers.
The authors present a novel approach: using a machine learning model to select the optimal learning algorithm for a given dataset.