Articles on Python
Last updated: 2023/01/23
Top deep-dives on Python
After you learn the basic principles of programming, improvement is really just a matter of becoming better at solving problems. Of course, knowing more about the tools you can use to solve problems is beneficial, but your approach to actually figuring out how to solve the problem is equally, if not more, important. In this informative article, Denver Smith illuminates the approach to solving a problem using a graph and Python.
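To make "solving a problem using a graph" a bit more concrete, here is a minimal sketch that models a problem as an adjacency dictionary and uses breadth-first search to find a shortest path. The graph and the `shortest_path` helper are assumptions made up for illustration; they are not taken from the article.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path from start to goal as a list of nodes, or None."""
    queue = deque([[start]])  # each queue entry is a full path from start
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # goal is not reachable from start


# Hypothetical graph: keys are nodes, values are the directly reachable nodes.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

print(shortest_path(graph, "A", "F"))
```

Because breadth-first search explores nodes in order of distance from the start, the first path that reaches the goal is one with the fewest edges.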
I'd say the difference between a persistent and a normal data structure is that the persistent version always has one additional dimension: time. Does that make sense? If not, Arpit Bhayani's intro should put it into context, and if it still doesn't make sense, then I'm just a poor explainer heh. Although the article is pretty brief, it outlines some of the core concepts and approaches, while linking to other, more comprehensive resources.
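One way to picture the "time" dimension is a stack built from immutable nodes, where every push returns a new version and all older versions stay valid. This is only a sketch under that assumption; the names `Node`, `push`, and `to_list` are hypothetical and not from the linked intro.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    value: int
    rest: Optional["Node"]  # the previous version of the stack


def push(stack, value):
    """Return a new version of the stack; the old version is left untouched."""
    return Node(value, stack)


def to_list(stack):
    """Walk one version of the stack from top to bottom."""
    items = []
    while stack is not None:
        items.append(stack.value)
        stack = stack.rest
    return items


v0 = None          # empty stack
v1 = push(v0, 1)   # version 1
v2 = push(v1, 2)   # version 2, built on version 1
v3 = push(v1, 3)   # a different version that also branches from version 1

print(to_list(v1), to_list(v2), to_list(v3))
```

Versions `v2` and `v3` share the node from `v1` instead of copying it, which is what keeps old versions cheap to retain.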
I'll be honest; I very much dislike spellcheckers, and have had mine disabled on all devices since middle school. And intuitively, I thought there was nothing super interesting about them, since they're probably just a word lookup. Boy was I wrong. Victor Shepelev's series of articles dives into Hunspell and illuminates some of its intricacies.
Parsing is a fundamental part of any compiler. It's also commonly used in the study of languages, since it breaks down sentences or texts into their grammatical parts. Although we do it naturally and pretty much without thinking (when the writing is good), there are many different approaches for computers to achieve the same task. In this extensive article, Laurence Tratt covers the plethora of parsing techniques available, including recursive descent, generalized parsers, statically unambiguous parsers, LL parsing, and LR parsing.
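Of the techniques listed, recursive descent is the easiest to show in a few lines: each grammar rule becomes one function that calls the functions for the rules it refers to. The sketch below assumes a tiny arithmetic grammar (`expr := term (('+'|'-') term)*`, `term := factor (('*'|'/') factor)*`, `factor := NUMBER | '(' expr ')'`) chosen for illustration; it is not taken from the article.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+'|'-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term := factor (('*'|'/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Operator precedence falls out of the rule nesting: `expr` only ever sees whole `term` values, so multiplication binds tighter than addition without any explicit precedence table.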
The finance sector, not unlike the tech sector, has to deal with its own set of unique challenges. In this illuminating article, Cal Paterson presents "proprietary forks of the entire Python ecosystem which are in use at many (but not all) of the biggest investment banks", the collection of which he refers to as "Minerva".
We've had a number of articles on concurrent programming in Python. In this one, however, Maurits van Riezen explains, summarizes, and compares multiprocessing, threading (and the Global Interpreter Lock), and async.
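As a small taste of the async side of that comparison, the sketch below runs two coroutines on one thread and records the order in which they start and finish. The `worker` coroutine and its delays are assumptions made up for illustration, not code from the article.

```python
import asyncio


async def worker(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # hands control back to the event loop
    log.append(f"{name} done")
    return name


async def main():
    log = []
    # Both coroutines run concurrently on a single thread.
    results = await asyncio.gather(
        worker("a", 0.02, log),
        worker("b", 0.01, log),
    )
    return results, log


results, log = asyncio.run(main())
print(results)
print(log)
```

`asyncio.gather` returns results in the order the coroutines were passed in, regardless of which one finishes first. Both workers start before either finishes because each `await` is a point where the event loop may switch tasks.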
I think it's safe to argue that Python itself isn't a very good language for machine learning, but the libraries and community built around it make it the obvious choice. George Ho's article does a deep dive on the advantages and differences of PyTorch, JAX, and Theano.
Ahren Stevens-Taylor delves into the nuances of how most things in Python are PyObjects.
Brett Cannon dives into Python classes.
Trey Hunner illuminates the connection between Python variables, pointers, and the object data.
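A quick sketch of what that connection looks like in practice: assignment binds a name to an existing object rather than copying the object. The variable names here are just for illustration.

```python
a = [1, 2, 3]
b = a          # b now refers to the same list object; nothing is copied
b.append(4)    # mutating through b is visible through a

c = list(a)    # a shallow copy: a new list object with equal contents
c.append(5)    # does not affect a

x = 10
y = x
y += 1         # ints are immutable, so this rebinds y to a new object

print(a, b, c)
print(a is b, a is c)
print(x, y)
```

The `is` operator compares object identity, while `==` compares values, which is why `a is b` and `a is c` give different answers even though `c` started out with the same contents as `a`.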
Although this is technically a web book, I thought it was worth featuring. In this first chapter, Pavel Panchekha and Chris Harrelson go through the process of downloading a webpage using command line tools, and explain every intricacy along the way. Ultimately the book is on building a browser from scratch using Python.
Simon Hørup Eskildsen implements a neural network from scratch in Python.
A common issue when working with data is how accurate you want the data to be vs the performance. The more accurate the data, the more expensive it is to collect and keep. Fortunately, Dr. Martin Jones has written an extensive article on researching to what extent you can cut down on data precision without having a major impact on the accuracy of the end result (hot damn that's a great example of the difference between precision and accuracy). Ultimately, for real data sets, switching from 64-bit to 32-bit is safe.
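To get a feel for how little is lost, the sketch below rounds each value to 32-bit precision with the standard `struct` module and compares a simple statistic before and after. The `readings` data is made up for illustration, and the mean itself is still computed in 64-bit arithmetic, so this only models storing the data at lower precision.

```python
import statistics
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical measurements, stored as 64-bit floats by default.
readings = [20.0 + 0.137 * i for i in range(1000)]
readings32 = [to_float32(x) for x in readings]

mean64 = statistics.fmean(readings)
mean32 = statistics.fmean(readings32)
relative_error = abs(mean64 - mean32) / abs(mean64)

print(mean64, mean32, relative_error)
```

A 32-bit float keeps roughly seven significant decimal digits, so whether the loss matters depends on how noisy the measurements were to begin with.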
If you've used Spotify before, you're probably familiar with the little black and white wave patterns that are actually barcodes. Well, in this second part of the series, Peter Boone discusses how URIs are actually transformed into this format, including the cyclic redundancy check calculation and convolutional encoding/decoding.
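For readers who haven't met a cyclic redundancy check before, here is a generic bitwise CRC-8 sketch. The polynomial `0x07` and zero initial value are assumptions picked because they are a common textbook variant; they are not claimed to be the parameters Spotify uses.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, most significant bit first, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # Top bit set: shift it out and subtract (XOR) the polynomial.
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


message = b"123456789"
checksum = crc8(message)

# With this variant, a message followed by its own CRC checks out to zero.
print(hex(checksum), crc8(message + bytes([checksum])))
```

The receiver recomputes the CRC over what it decoded; a non-zero result over message plus checksum signals that some bits were corrupted.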
Adam Sawicki discusses the process of a "Hello World" program written in Python being executed on a Windows computer. Adam goes from the print statement all the way through to how the words "Hello World" are actually displayed on your screen.
- In practice, many popular scripting languages are compiled into their own variants of bytecode – a binary form that, although incompatible with the machine language of real processors, is much easier to quickly interpret and execute than pure source code
- Compilation can be reduced to three steps: lexical analysis -> parser analysis -> generating code
- All widely used vector font formats support hinting, i.e., programming certain hints in the font as to how a given character should be drawn in certain sizes
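The lexing, parsing, and code generation steps from the list above can be poked at directly with CPython's standard library. This is a sketch using the `tokenize`, `ast`, and `dis` modules on a one-line program; the exact bytecode printed by `dis` varies between Python versions.

```python
import ast
import dis
import io
import tokenize

source = "print('Hello World')\n"

# 1. Lexical analysis: split the source text into tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # drop the NEWLINE and ENDMARKER tokens
]

# 2. Parsing: build an abstract syntax tree from the source.
tree = ast.parse(source)

# 3. Code generation: compile the tree to a code object holding bytecode.
code = compile(tree, "<example>", "exec")

print(tokens)
print(type(tree.body[0].value).__name__)
print(type(code.co_code).__name__, len(code.co_code))
dis.dis(code)  # human-readable listing of the bytecode
```

The `co_code` attribute is the raw binary form that the interpreter loop executes, which is the "own variant of bytecode" the first bullet refers to.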
Packages are a convenient method for sharing code with the language-specific community. For Python, setuptools and distutils stood out for a long time as the most common options for building packages. In this informative article, Paul Ganssle discusses the history of building packages in Python and how the recent shift in focus for the setuptools team has changed the best practices for creating Python packages.
"Ensemble nets are a method of representing an ensemble of models as one single logical model". This basically means you can combine different models into one processing unit. Sounds complicated? It kind of is. Luckily, Mat Kelcey has an article that goes more into the details about that, which he cites in this one. The focus of this article, though, is how to replace a more "normal" convolution model with an ensemble net. Mat presents how he does it and the results from his experiment.
Dealing with threads and parallel processing is one of the more complicated aspects of writing code, especially when there are obscure issues that might cause your program to become deadlocked. In this informative article, Itamar Turner-Trauring dives deeply into the inner workings of Python's multiprocessing pool and why deadlock issues might arise due to process forking.
You've probably used RSA at some point to generate a key for ssh. Do you know how the underlying algo works though? In this first part, the author explains the math behind RSA and implements it in Python.
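The core math fits in a few lines. The sketch below is textbook RSA with tiny primes and no padding, so it is only an illustration of the arithmetic and nowhere near secure; the function names and the specific numbers are assumptions, not the author's code.

```python
from math import gcd


def make_keys(p, q, e):
    """Build a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)


def encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    d, n = private_key
    return pow(c, d, n)


public_key, private_key = make_keys(p=61, q=53, e=17)
message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)

print(public_key, private_key, ciphertext, recovered)
```

Decryption works because `e * d` leaves a remainder of 1 when divided by `(p - 1) * (q - 1)`, so raising to the power `e` and then `d` brings any message smaller than `n` back to itself.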
Patrick Mooney's series of articles goes in-depth on writing a language parser in Python for a text-based game.
Mark Veidemanis explains his work on Pathogen, a data analytics pipeline. Mark also mentions Sandstorm, which is a cool open source platform for self-hosting utility apps.
- Concurrency and threads are hard
- Finding the correct tool/library is hard
- "There's always another millisecond to shave off the execution time, but how long are you going to spend doing it?"
After porting over some deep learning code to PyTorch Lightning, Florian Ernst noticed an unexpected 4x increase in time for training the model. In this article, Florian describes the clues he got as to what was causing the issue, which was ultimately something being unnecessarily reset on each epoch.
Writing code that's meant to execute concurrently can be difficult, especially since most languages haven't been built to support it as a central design point. Jason Brownlee's article focuses on the ThreadPoolExecutor functionality in Python, specifically looking at handling exceptions in thread initialization, task execution, and task completion callbacks.
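Two of those cases, exceptions during task execution and completion callbacks, can be sketched with the standard `concurrent.futures` API. The `task` and `log_completion` functions are hypothetical examples, not code from the article.

```python
from concurrent.futures import ThreadPoolExecutor


def task(x):
    """Hypothetical task: doubles x, but fails on negative input."""
    if x < 0:
        raise ValueError(f"negative input: {x}")
    return x * 2


completed = []


def log_completion(future):
    # Runs once the task has finished, whether it returned or raised.
    completed.append("failed" if future.exception() else "ok")


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {}
    for x in (1, -1, 3):
        future = pool.submit(task, x)
        future.add_done_callback(log_completion)
        futures[future] = x

results, errors = {}, {}
for future, x in futures.items():
    error = future.exception()  # None if the task succeeded
    if error is None:
        results[x] = future.result()
    else:
        errors[x] = error

print(results)
print(errors)
print(sorted(completed))
```

An exception raised inside a task does not crash the pool; it is stored on the future and only surfaces when you call `result()` or inspect `exception()`, so a failure that nobody checks for passes silently.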
The blog post addresses the key points of the discussion and Pradyun Gedam's thoughts on where the Python packaging ecosystem is today.
- The Python packaging ecosystem unintentionally became competitive and the community needs to decide if it wants to continue operating under the same model
- "The reason there are so many packaging tools is because Python is not a monoculture and different folks need different things"
- Covers the disadvantages of having so many choices with regard to packages
Engineer and author Jeremy Kun takes a shot at implementing a solution in Python for the genre of mathematical problems where two players compete by each taking an action, but are unaware of the other's action. This specific article is the fourth and most recent in the series, but I'd recommend starting from the beginning (this one just has all of the previous articles at the top of the page for your convenience).
Samir Moon (I think this is the author, although I couldn't find out exactly, so if it's wrong, let me know) explains what homomorphic encryption is and demonstrates how a simple version can be implemented in Python.
- Homomorphic encryption (HE) requires that two encrypted items added together = the encryption of the two unencrypted items added
- It also requires that the encryption of one item multiplied by another item = the encryption of the two unencrypted items multiplied
- HE requires lattice-based cryptography
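The two properties in the first two bullets above can be seen in a toy scheme over plain integers, where a ciphertext is the message plus some small noise plus a random multiple of a secret number. To be loud about the assumptions: this is not the lattice-based construction the article covers, it is not secure, and the names `P`, `T`, `encrypt`, and `decrypt` are made up for illustration.

```python
import random

P = 1_000_000_007  # secret key: a large integer (toy size, not secure)
T = 256            # plaintext modulus: messages are integers mod T


def encrypt(m):
    noise = random.randrange(1, 10)    # small random noise
    mask = random.randrange(1, 1000)   # random multiple of the secret key
    return (m % T) + T * noise + P * mask


def decrypt(c):
    # Reducing mod P strips the mask; reducing mod T strips the noise.
    return (c % P) % T


a, b = 3, 4
ca, cb = encrypt(a), encrypt(b)

sum_plain = decrypt(ca + cb)      # adding ciphertexts adds the plaintexts
product_plain = decrypt(ca * cb)  # multiplying ciphertexts multiplies them

print(sum_plain, product_plain)
```

The catch is that the noise grows with every operation, especially multiplication; once it exceeds `P`, decryption returns garbage, which is why practical schemes need ways to keep the noise under control.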
Ryan O'Connor introduces JAX as "a numerical computing library which incorporates composable function transformations" and elaborates on what makes it tick and when you should/shouldn't use it.
Exaloop presents how the Codon compiler works and how various Python constructs are mapped to LLVM IR.
- The compiler works by first parsing source code into an abstract syntax tree (AST), then performing type checking on the AST using a modified Hindley-Milner-like algorithm
- The AST is converted to an intermediate representation called CIR and various analyses, transformations, and optimizations are performed on the CIR
- The CIR is converted to LLVM IR and the LL