Articles on Machine learning

Last updated: 2022/09/08

Top deep-dives on Machine learning

A Visual Tour of Backpropagation

Machine learning libraries can optimize any differentiable function using a process called backpropagation. Backpropagation computes the gradient of the loss, which tells us which direction to move the parameters in. In machine learning, we want to observe how changing x will change the loss. The chain rule lets us unravel composed functions, giving us the ability to compute arbitrarily complex partial derivatives and gradients. Using a computation graph, we can move backwards from the final result to find the individual derivatives for each variable. This backwards computation of the derivative using the chain rule is what gives backpropagation its name.
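
As a rough illustration of the backwards pass described above, here is a minimal sketch for a one-parameter-pair model, loss = (w*x + b - y)^2. The function name, the toy graph, and the input values are assumptions made for this example and are not taken from the linked article.

```python
def forward_backward(w, b, x, y):
    """Return the loss and its gradients with respect to w and b."""
    # Forward pass: compute and keep each intermediate node of the graph.
    z = w * x + b      # linear node
    e = z - y          # error node
    loss = e * e       # squared-error node

    # Backward pass: start at the loss and multiply local derivatives
    # together, node by node, as the chain rule prescribes.
    dloss_de = 2.0 * e           # d(e^2)/de
    dloss_dz = dloss_de * 1.0    # de/dz = 1
    dloss_dw = dloss_dz * x      # dz/dw = x
    dloss_db = dloss_dz * 1.0    # dz/db = 1
    return loss, dloss_dw, dloss_db


def numerical_grad(f, v, eps=1e-6):
    """Central finite difference, used only to sanity-check the result."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


loss, dw, db = forward_backward(w=2.0, b=0.5, x=3.0, y=5.0)
check_dw = numerical_grad(lambda w: forward_backward(w, 0.5, 3.0, 5.0)[0], 2.0)
print(loss, dw, db, check_dw)
```

The finite-difference check is there only to confirm that the hand-derived chain-rule gradient agrees with a numerical estimate.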

Uncovering Online Delivery Menu Best Practices with Machine Learning

A deep-dive into how DoorDash used machine learning to try to optimize conversions for restaurants through their app.

Generating Rock Climbing Route Descriptions with ML (GPT-2)

This post describes fine-tuning GPT-2, a powerful natural language model, for the task of generating fake rock climbing route descriptions. A climbing route description uses technical climbing jargon to describe a specific pathway up a rock face, and each one is unique. As seems to be common with GPT-2, the author found that the model accurately captures the tone and specific language of the domain, but it sometimes generates impossible climbing routes given the physical world. It is also prone to repeating itself. A Colab is provided, where anyone can download the fine-tuned model and try it out.

Approaches for Building Real-Time ML Systems

Building real-time ML systems involves translating prediction requests into feature vectors, which can be done using different approaches depending on the use case. The feature vectors are then passed to a model for inference, which can be done using different approaches depending on the type of model and the desired latency of the system. This article covers these approaches.
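
The two stages described above (request to feature vector, then feature vector to prediction) might be sketched as follows. Everything here is hypothetical: the in-memory `FEATURE_STORE`, the feature names, and the hand-picked logistic-regression weights stand in for whatever storage and model a real system would use.

```python
import math

# Hypothetical precomputed features, as a batch job might write to a feature store.
FEATURE_STORE = {
    "user_42": {"orders_last_30d": 7.0, "avg_basket": 23.5},
}

# Hypothetical model parameters for a tiny logistic regression.
WEIGHTS = [0.1, 0.02, 0.3]
BIAS = -1.0


def build_feature_vector(request):
    """Combine stored (precomputed) features with features read from the request."""
    stored = FEATURE_STORE.get(request["user_id"], {})
    return [
        stored.get("orders_last_30d", 0.0),  # precomputed offline
        stored.get("avg_basket", 0.0),       # precomputed offline
        float(request["cart_size"]),         # available only at request time
    ]


def predict(features):
    """Run inference: a weighted sum followed by a sigmoid."""
    score = sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-score))


request = {"user_id": "user_42", "cart_size": 2}
features = build_feature_vector(request)
probability = predict(features)
print(features, probability)
```

Falling back to default values for unknown users is one possible design choice; a production system might instead compute the missing features on demand.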

Transformers for software engineers

Ever since its introduction in the 2017 paper Attention Is All You Need, the Transformer model architecture has taken the deep-learning world by storm. Initially introduced for machine translation, it has become the tool of choice for a wide range of domains, including text, audio, video, and others. Transformers have also driven most of the massive increases in model scale and capability in the last few years. OpenAI’s GPT-3 and Codex models are Transformers, as are DeepMind’s Gopher models and many others. This article explains how transformers work.

Aligning superhuman AI with human behaviour: chess as a model system

In this paper, McIlroy-Young et al. explore the question of whether or not AI models can learn to perform tasks in a human-like way. They use chess as a model system to test move matching and blunder prediction performance. They find that while AI models can learn to perform some tasks to human or even super-human levels, they do not always perform those tasks in a human-like way. This difference in approaches can matter in human-machine collaboration, where humans are still involved in task performance, supervision, or evaluation. The paper is important in showing that we need to take care in how we design AI models, to ensure that they are interpretable, easy to learn from, and safe for humans to follow.

Cracking Random Number Generators using Machine Learning – Part 1: xorshift128

This article explores how the xorshift128 PRNG works, how to use a neural network to model it (including design and results), and finally ends with creating a machine-learning-resistant version of the xorshift128 PRNG.
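
For reference, a minimal sketch of the generator itself, assuming the common Marsaglia variant with four 32-bit state words and shift constants 11, 19, and 8. The class name is made up for this example, and the linked article's exact variant may differ.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


class Xorshift128:
    """xorshift128 with shifts (11, 19, 8); the state must not be all zeros."""

    def __init__(self, x, y, z, w):
        self.state = [x & MASK32, y & MASK32, z & MASK32, w & MASK32]
        if not any(self.state):
            raise ValueError("the state must contain at least one non-zero word")

    def next(self):
        x, y, z, w = self.state
        t = (x ^ (x << 11)) & MASK32
        new_w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & MASK32
        # Shift the state window by one word and append the new output.
        self.state = [y, z, w, new_w]
        return new_w


rng = Xorshift128(1, 0, 0, 0)
first = rng.next()
print(first)
```

Each output is built only from XORs and shifts of the previous state, which is the kind of structure a neural network can be trained to imitate.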

Creating and morphing Lego faces with neural networks

The author morphs and creates Lego faces with a variational autoencoder. The article gives background on variational autoencoders and goes through the implementation. Following that, the model is evaluated and used to generate Lego face images.
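
One common way to morph between two images with an autoencoder is to interpolate between their latent vectors and decode each intermediate point. The sketch below shows only the interpolation step. It assumes linear interpolation, which the summary does not confirm the article uses, and the encoder and decoder are left out; `interpolate_latents` is a name invented for this example.

```python
def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced from z_start to z_end."""
    if steps < 2:
        raise ValueError("need at least two steps (the start and the end)")
    path = []
    for i in range(steps):
        alpha = i / (steps - 1)  # 0.0 at the start, 1.0 at the end
        path.append([(1 - alpha) * a + alpha * b for a, b in zip(z_start, z_end)])
    return path


# Two made-up 2-D latent codes; a trained decoder would map each point to an image.
path = interpolate_latents([0.0, 2.0], [1.0, 0.0], steps=3)
print(path)
```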

Game Emulation via Neural Network

The author describes the evolutionary process of rovers learning to cooperate together.

Implementing the FLIP algorithm

This code implements an image delta viewer which the author calls “FLOP”, as an homage to the paper its implementation is based on (FLIP). The FLIP metric attempts to account for the following facts about how our eyes perceive color: our eyes perceive color according to the opponent process, with red and green sharing one channel (a), blue and yellow sharing another (b), and an achromatic channel which codes for lightness (L); our eyes are more sensitive to chrominance for brighter colors (the Hunt effect); our eyes pick out edges and pixel discontinuities; our eyes perceive luminance non-linearly; and our eyes have limited spatial resolution which is hue and lightness dependent. The FLIP metric first transferred images into YyCxCz space, a space which is like CIELAB (also known as L*a*b*), but avoids the non-linear transform meant to mimic the HVS’s response function. The images were

Greedy AI Agents Learn to Cooperate

The author describes their recent exploration of teaching RL agents to cooperate, in the form of rovers on a bounty hunt.

No, We Don't Have to Choose Batch Sizes As Powers Of 2

The batch size refers to the number of training examples in each minibatch when training a neural network. It is common for the batch size to be a power of 2, such as 64, 128, 256, 512, 1024. There are some valid theoretical justifications for this, including memory alignment and floating-point efficiency. However, it is unclear if these reasons hold in practice. To see how different batch sizes affect training in practice, a benchmark was run training a MobileNetV3 (large) for 10 epochs on CIFAR-10. The results showed that there is no substantial difference in performance when the batch size is a power of 2 or multiple of 8.
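
To make the arithmetic concrete, the sketch below counts the update steps per epoch for a few batch sizes on the 50,000-image CIFAR-10 training set. The batch sizes chosen and the `drop_last` parameter are illustrative assumptions, not the benchmark's actual configuration.

```python
import math

CIFAR10_TRAIN_SIZE = 50_000  # number of training images in CIFAR-10


def steps_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of minibatches in one pass over the data."""
    if drop_last:
        # Discard the final, smaller batch.
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


steps = {bs: steps_per_epoch(CIFAR10_TRAIN_SIZE, bs) for bs in (100, 128, 156, 256)}
print(steps)
```

Nothing in this calculation requires a power of 2. Any positive batch size yields a valid schedule, so the choice comes down to measured throughput and memory use.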

On the potential of Transformers in Reinforcement Learning

Transformers are a type of deep learning architecture that has been shown to be effective for a variety of natural language processing, vision, audio, and multimodal tasks. Their key capability is to capture which elements in a long sequence are worthy of attention, resulting in great summarisation and generative skills. It is possible to use transformer architectures for reinforcement learning tasks by reframing reinforcement learning as a sequence problem. This has the potential to improve the performance of reinforcement learning algorithms, but there are some limitations to this approach.
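
One way to turn a reinforcement learning trajectory into a sequence is to interleave return-to-go, state, and action tokens, in the style of Decision Transformer. The summary does not say which formulation the article uses, so treat the sketch below, including its function names and toy trajectory, as an assumption.

```python
def returns_to_go(rewards):
    """For each timestep, the sum of rewards from that step to the end."""
    total = 0.0
    result = []
    for reward in reversed(rewards):
        total += reward
        result.append(total)
    result.reverse()
    return result


def to_sequence(states, actions, rewards):
    """Flatten a trajectory into (kind, value) tokens a sequence model could consume."""
    tokens = []
    for rtg, state, action in zip(returns_to_go(rewards), states, actions):
        tokens.append(("return_to_go", rtg))
        tokens.append(("state", state))
        tokens.append(("action", action))
    return tokens


# A made-up three-step trajectory.
tokens = to_sequence(
    states=["s0", "s1", "s2"],
    actions=["left", "right", "left"],
    rewards=[1.0, 0.0, 2.0],
)
print(tokens)
```

With this layout, a model trained to predict the next action token can be conditioned on a desired return by setting the first return-to-go token.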

Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts

The authors demonstrate using evolution strategies to teach machines to make abstract drawings of things.

Permutation-Invariant Neural Networks for Reinforcement Learning

David and Yujin train a machine learning network to use different "senses" to become more resilient when the noise drowns out the signal.

HuBERT: How to Apply BERT to Speech, Visually Explained

A visual explanation of applying a transformer-based machine learning model to interpret speech audio.

Why You Should (or Shouldn't) Be Using JAX in 2022

Since JAX is a numerical computing library with automatic differentiation built in, it enables easy and efficient calculation of gradients of functions. Operations in JAX are implemented in terms of the operations of XLA, which lets JAX differentiate functions written with those operations, and vectorize and parallelize the resulting computations using XLA. The jax.grad transformation takes a function written in NumPy style with jax.numpy and returns a new function that computes its gradient. This makes it easy and efficient to calculate gradients of functions, which is crucial for training machine learning models. Just-in-time compilation: the jax.jit function in JAX allows us to take a function and turn it into

Double Debiased Machine Learning (part 1)

Music Transcription with Transformers

The authors present some of their recent discoveries in creating a general music transcription model using transformers.

Semantic Bug Seeding: A Learning-Based Approach for Creating Realistic Bugs

AutoGRD: Model Recommendation Through Graphical Dataset Representation [pdf]

The authors present a novel approach to using a machine learning model to supply you with the optimal learning algorithm for a given dataset.


Other Articles