Articles on Machine learning
Last updated: 2022/09/08
Top deep-dives on Machine learning
Machine learning libraries can optimize any differentiable function by using a process called backpropagation. Backpropagation computes the gradient of the loss with respect to each parameter, which tells the optimizer which direction to move in. In machine learning, we want to observe how changing an input x will change the loss. The chain rule lets us unravel the composed functions discussed earlier, giving us the ability to compute partial derivatives and gradients of arbitrarily complex expressions. Using the computation graph constructed earlier, we can move backwards from the final result to find the derivative with respect to each variable. This backwards computation of derivatives using the chain rule is what gives backpropagation its name.
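To make the backward pass concrete, here is a minimal sketch of reverse-mode differentiation over a scalar computation graph. The `Value` class is a hypothetical illustration in the style of small teaching autograd engines, not the API of any particular library; real frameworks operate on tensors and support many more operations.

```python
class Value:
    """A scalar node in a computation graph (hypothetical minimal class, not a library API)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0                  # d(final output)/d(this node), filled in by backward()
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(this node)/d(parent), one entry per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically order the graph so every node comes after the nodes it was built from.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Walk backwards from the output, applying the chain rule at each node.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # accumulate: a node may feed several others


w, x, b = Value(0.5), Value(2.0), Value(0.5)
z = w * x + b   # 1.5
loss = z * z    # 2.25
loss.backward()
print(w.grad, x.grad, b.grad)  # 6.0 1.5 3.0
```

Because `z` feeds the final multiplication twice, its gradient is accumulated (1.5 + 1.5 = 3.0) before being pushed further back, which is the same bookkeeping a real backward pass performs.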
A deep-dive into how DoorDash used machine learning to try to optimize conversions for restaurants through its app.
This post describes fine-tuning GPT-2, a powerful natural language model, for the task of generating fake rock climbing route descriptions. A climbing route description uses technical climbing jargon to describe a specific pathway up a rock face, and each one is unique. As seems to be common with GPT-2, the author found that the model accurately captures the tone and specific language of the domain, but it sometimes generates impossible climbing routes given the physical world. It is also prone to repeating itself. A Colab is provided, where anyone can download the fine-tuned model and try it out.
Building real-time ML systems involves translating prediction requests into feature vectors, which can be done in different ways depending on the use case. The feature vectors are then passed to a model for inference, and the right serving approach depends on the type of model and the latency the system must meet. This article covers these approaches.
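As a rough illustration of the first step, the sketch below joins precomputed features (looked up by entity ID) with a feature that is only known at request time, then scores the resulting vector with a linear model. The feature names, the in-memory dictionaries standing in for an online feature store, and the linear model are all hypothetical assumptions, not taken from the article.

```python
from dataclasses import dataclass

# Hypothetical precomputed features keyed by entity ID, standing in for an online feature store.
USER_FEATURES = {"u1": {"orders_last_30d": 4.0, "avg_basket_usd": 22.5}}
ITEM_FEATURES = {"i9": {"price_usd": 12.0, "rating": 4.5}}

# Fixed feature layout the model was (hypothetically) trained on.
FEATURE_ORDER = ["orders_last_30d", "avg_basket_usd", "price_usd", "rating", "hour_of_day"]


@dataclass
class PredictionRequest:
    user_id: str
    item_id: str
    hour_of_day: int  # request-time feature, only known when the request arrives


def to_feature_vector(req: PredictionRequest) -> list[float]:
    features: dict[str, float] = {}
    features.update(USER_FEATURES.get(req.user_id, {}))
    features.update(ITEM_FEATURES.get(req.item_id, {}))
    features["hour_of_day"] = float(req.hour_of_day)
    # Missing features default to 0.0 so the vector always has the same layout.
    return [features.get(name, 0.0) for name in FEATURE_ORDER]


def predict(vector: list[float], weights: list[float], bias: float = 0.0) -> float:
    # A linear model as a stand-in for whatever model actually serves the prediction.
    return bias + sum(w * x for w, x in zip(weights, vector))


vec = to_feature_vector(PredictionRequest("u1", "i9", 18))
print(vec, predict(vec, [1.0, 0.0, 0.0, 0.0, 0.5], bias=1.0))
```

Keeping the lookup and the request-time computation behind one function means the same code path can build features for both training data and live requests, which helps avoid training/serving skew.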
Ever since its introduction in the 2017 paper Attention Is All You Need, the Transformer architecture has taken the deep-learning world by storm. Initially introduced for machine translation, it has become the tool of choice for a wide range of domains, including text, audio, video, and others. Transformers have also driven most of the massive increases in model scale and capability in the last few years. OpenAI’s GPT-3 and Codex models are Transformers, as are DeepMind’s Gopher models and many others. This article explains how Transformers work.
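The core operation of the Transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, as defined in that paper. The sketch below computes it in pure Python for a single head, with no masking, no multi-head split, and no learned projections, so it is a simplified illustration rather than a production implementation.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v, all as lists of rows. Returns n x d_v."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # How strongly this query matches each key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

When every key matches the query equally, the weights are uniform and the output is the plain average of the value rows; a key that matches much more strongly than the others pulls the output toward its own value row.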
In this paper, McIlroy-Young et al. explore the question of whether or not AI models can learn to perform tasks in a human-like way. They use chess as a model system to test move matching and blunder prediction performance. They find that while AI models can learn to perform some tasks to human or even super-human levels, they do not always perform those tasks in a human-like way. This difference in approaches can matter in human-machine collaboration, where humans are still involved in task performance, supervision, or evaluation. The paper is important in showing that we need to take care in how we design AI models, to ensure that they are interpretable, easy to learn from, and safe for humans to follow.
This article explains how the xorshift128 PRNG works, shows how to use a neural network to model it (including the network design and results), and ends by creating a machine-learning-resistant version of the xorshift128 PRNG.
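For reference, here is a sketch of Marsaglia's xorshift128 in Python, assuming the shift constants (11, 19, 8) and default seeds from Marsaglia's 2003 "Xorshift RNGs" paper; the article's own code may differ. Because every step uses only XORs and shifts, each output bit is an XOR of bits of the previous state, which is part of why its outputs are predictable from earlier ones.

```python
MASK32 = 0xFFFFFFFF


class XorShift128:
    """Marsaglia's xorshift128: four 32-bit words of state, one 32-bit output per call."""

    def __init__(self, x=123456789, y=362436069, z=521288629, w=88675123):
        if ((x | y | z | w) & MASK32) == 0:
            raise ValueError("state must not be all zeros")
        self.x, self.y, self.z, self.w = (v & MASK32 for v in (x, y, z, w))

    def next_u32(self):
        # Mix the oldest word, shift the state window, then fold the mix into the newest word.
        t = (self.x ^ (self.x << 11)) & MASK32
        self.x, self.y, self.z = self.y, self.z, self.w
        self.w = (self.w ^ (self.w >> 19) ^ t ^ (t >> 8)) & MASK32
        return self.w


gen = XorShift128()
print([gen.next_u32() for _ in range(3)])
```

The masks emulate C's unsigned 32-bit overflow, since Python integers are unbounded.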
The author morphs and creates LEGO faces with a variational autoencoder (VAE). The article gives background on variational autoencoders and walks through the implementation. The model is then evaluated and used to generate LEGO face images.
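Two pieces distinguish a VAE from a plain autoencoder: the reparameterization trick, and a KL term that pulls the latent distribution toward a standard normal prior. "Morphing" between faces typically amounts to decoding points along a path between two latent codes. The sketch below assumes a diagonal Gaussian encoder output (`mu`, `log_var`) and omits the encoder and decoder networks entirely; the function names are hypothetical and not the article's implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, 1); randomness is isolated in eps,
    # so z stays differentiable with respect to mu and log_var.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)) for a diagonal Gaussian, summed over latent dimensions.
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def interpolate(z_a, z_b, steps):
    # Evenly spaced points on the straight line from z_a to z_b (steps >= 2);
    # decoding each point would "morph" one face into the other.
    return [[a + (b - a) * t / (steps - 1) for a, b in zip(z_a, z_b)] for t in range(steps)]


print(kl_to_standard_normal([1.0], [0.0]), interpolate([0.0, 0.0], [1.0, 2.0], 3))
```

In training, the KL term is added to a reconstruction loss; interpolation is done afterwards on the latent codes of two encoded images.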
The author describes the evolutionary process by which rovers learn to cooperate.
This code implements an image delta viewer that the author calls “FLOP” as an homage to the paper its implementation is based on (FLIP). The FLIP metric attempts to account for the following facts about how our eyes perceive color: our eyes perceive color according to the opponent process, with red and green sharing one channel (a), blue and yellow sharing another (b), and an achromatic channel coding for lightness (L); our eyes are more sensitive to chrominance for brighter colors (the Hunt effect); our eyes pick out edges and pixel discontinuities; our eyes perceive luminance non-linearly; and our eyes have limited spatial resolution that depends on hue and lightness. The FLIP metric first transforms images into YyCxCz space, which is like CIELAB (also known as L*a*b*) but omits the non-linear transform meant to mimic the response function of the human visual system (HVS). The images were
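The color-space step described above can be sketched as follows. This assumes sRGB inputs in [0, 1], the standard sRGB-to-XYZ matrix, and a D65 white point, with YyCxCz computed as Yy = 116 Y/Yn - 16, Cx = 500 (X/Xn - Y/Yn), Cz = 200 (Y/Yn - Z/Zn), which is CIELAB with the cube-root response removed. The `naive_color_difference` helper is a hypothetical Euclidean distance for illustration only; full FLIP adds further steps for the other effects listed above (spatial filtering, the Hunt effect, edge detection), none of which is shown.

```python
def srgb_to_linear(c):
    # Undo the sRGB transfer curve for one channel value in [0, 1].
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(rgb):
    r, g, b = (srgb_to_linear(c) for c in rgb)
    # Standard linear-sRGB to CIE XYZ matrix (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    return x, y, z


WHITE_D65 = (0.95047, 1.0, 1.08883)


def xyz_to_yycxcz(xyz, white=WHITE_D65):
    # Like CIELAB, but applied to the linear ratios X/Xn, Y/Yn, Z/Zn (no cube root).
    x, y, z = (v / n for v, n in zip(xyz, white))
    return 116.0 * y - 16.0, 500.0 * (x - y), 200.0 * (y - z)


def naive_color_difference(rgb_a, rgb_b):
    # Hypothetical per-pixel Euclidean distance in YyCxCz; NOT the FLIP metric itself.
    a = xyz_to_yycxcz(srgb_to_xyz(rgb_a))
    b = xyz_to_yycxcz(srgb_to_xyz(rgb_b))
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


print(xyz_to_yycxcz(srgb_to_xyz((1.0, 1.0, 1.0))))
```

White maps to roughly (100, 0, 0) and black to (-16, 0, 0), so neutral grays carry no chrominance in the Cx and Cz channels.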
The author describes their recent work teaching RL agents to cooperate, in the form of rovers on a bounty hunt.
The batch size refers to the number of training examples in each minibatch when training a neural network. It is common for the batch size to be a power of 2, such as 64, 128, 256, 512, or 1024. There are some valid theoretical justifications for this, including memory alignment and floating-point efficiency, but it is unclear whether these reasons hold in practice. To see how different batch sizes affect training in practice, a benchmark trained a MobileNetV3 (large) for 10 epochs on CIFAR-10. The results showed no substantial difference in performance between batch sizes that are powers of 2 or multiples of 8 and those that are not.
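The definition is easy to see in code: a minimal sketch that splits a dataset into consecutive minibatches of a chosen size. The helper names are hypothetical, and real data loaders usually also shuffle the indices each epoch.

```python
import math


def minibatches(n_examples, batch_size):
    """Yield (start, end) index ranges covering n_examples in order; the last batch may be smaller."""
    for start in range(0, n_examples, batch_size):
        yield start, min(start + batch_size, n_examples)


def steps_per_epoch(n_examples, batch_size):
    # One optimizer step per minibatch, counting a final partial batch.
    return math.ceil(n_examples / batch_size)


print(steps_per_epoch(50_000, 128), list(minibatches(50_000, 128))[-1])
```

With CIFAR-10's 50,000 training images, a batch size of 128 gives 391 steps per epoch (the last batch holds 80 examples), while a batch size of 100 divides the set evenly into 500 steps; nothing in the loop itself requires a power of 2.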
Transformer architectures are a type of deep learning architecture that has been shown to be effective for a variety of natural language processing, vision, audio, and multimodal tasks. Their key capability is capturing which elements in a long sequence are worth attending to, resulting in strong summarisation and generative skills. Transformer architectures can be used for reinforcement learning tasks by reframing reinforcement learning as a sequence problem. This has the potential to improve the performance of reinforcement learning algorithms, but the approach has some limitations.
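One common way to frame RL as a sequence problem, popularized by the Decision Transformer, is to turn each trajectory into an interleaved sequence of (return-to-go, state, action) tokens, where the return-to-go at a step is the sum of rewards from that step to the end of the episode. The sketch below builds such a sequence; whether the article uses exactly this formulation is an assumption, and the token tuples are a hypothetical stand-in for learned embeddings.

```python
def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted), accumulated from the end."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def trajectory_to_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples into one flat sequence."""
    seq = []
    for rtg, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("return", rtg), ("state", s), ("action", a)])
    return seq


print(trajectory_to_sequence(["s0", "s1", "s2"], ["left", "right", "left"], [0.0, 0.0, 1.0]))
```

In that formulation, a model trained to predict each action token can be conditioned at inference time on the return it should aim for, by setting the first return-to-go token.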
The authors demonstrate using evolutionary strategies to teach machines to make abstract drawings of things.
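The basic evolution strategies loop can be sketched as follows: perturb the parameters with Gaussian noise, score each perturbation with a fitness function, and move the parameters toward the perturbations that scored well. This version uses mirrored (antithetic) sampling; the toy fitness, negative squared distance to a fixed target vector, is a hypothetical stand-in for whatever score a drawing would receive, and the hyperparameters are arbitrary.

```python
import random


def evolution_strategies(fitness, theta, sigma=0.1, lr=0.05, pop=50, iters=200, seed=0):
    """Maximize `fitness` with a basic evolution strategies loop using mirrored sampling."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iters):
        grad = [0.0] * dim
        for _ in range(pop):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            # Score a mirrored pair of perturbations around the current parameters.
            plus = [t + sigma * e for t, e in zip(theta, eps)]
            minus = [t - sigma * e for t, e in zip(theta, eps)]
            diff = fitness(plus) - fitness(minus)
            for j in range(dim):
                grad[j] += diff * eps[j]
        # Estimated gradient of the Gaussian-smoothed fitness; take an ascent step.
        for j in range(dim):
            theta[j] += lr * grad[j] / (2 * pop * sigma)
    return theta


# Hypothetical toy objective: get as close as possible to a fixed target vector.
target = [0.5, -0.3, 0.8]
best = evolution_strategies(lambda p: -sum((pi - ti) ** 2 for pi, ti in zip(p, target)), [0.0, 0.0, 0.0])
print([round(v, 3) for v in best])
```

ES only ever calls `fitness`, never its gradient, which is what makes it usable when the score comes from a non-differentiable renderer or judge.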
David and Yujin train a neural network to use different "senses" so that it becomes more resilient when noise drowns out the signal.
A visual explanation of applying a transformer-based machine learning model to interpret spoken language from audio.
JAX is a numerical computing library that makes calculating gradients of functions easy and efficient. It traces Python functions written with jax.numpy into primitive operations, which it can both differentiate automatically and compile with XLA; the same tracing lets it vectorize (jax.vmap) and parallelize (jax.pmap) those computations. Rather than converting a NumPy function by hand, we write it with jax.numpy and pass it to jax.grad, which returns a new function that computes the gradient. JAX's autodiff works on any function built from differentiable JAX operations, which is crucial for training machine learning models. Just-in-time compilation: the jax.jit function in JAX allows us to take a function and turn it into
The authors present some of their recent discoveries in creating a general music transcription model using transformers.
The authors present a novel approach to using a machine learning model to supply you with the optimal learning algorithm for a given dataset.