The Good Stuff (10/8/20 - 16/8/20)

Here’s where I’ll be noting fun reads, interesting finds, cross-topic connections, useful sources and unresolved questions that I’ve had over the week. This weekly recap is in part inspired by the illustrious John Baez and his weekly finds in mathematical physics. This is the first post of a long series, so I wouldn’t be too surprised if I bring in some major overhauls.

I’ve decided to split it up by category, so you’ll have some long categories like Physics, supplemented by bite-sized ones like Linear Algebra. Have fun!

Chemistry

I decided to make this section after remembering how fun computational quantum chemistry is.

  • Back to molecular orbitals! $\rightarrow$ bonding and antibonding (add/subtract the wavefunctions, not their squares), orbital diagrams and dienes.
  • I read an awesome explanation of why triplets have lower energy than singlets (and I JUST found out that electrons in different shells need to be antisymmetrised - and that spin and spatial states need to be symmetrised oppositely).
  • Interesting perspective on the ‘violation’ of the Aufbau principle when losing electrons: the proton count isn’t changing, unlike when moving across the periodic table.
  • I like to imagine the inert pair effect as a ripple of relativistic contractions of s-orbitals.
  • Orbitals don’t exist because a linear combination of them preserves the Slater determinant (I have an unsettling feeling that I haven’t fully understood something regarding the superposition states of the electron - it’s a byproduct of the hideous terminology ‘this electron is in so-and-so orbital’ - how does this relate to delocalisation?).
  • I found out that (anti)bonding isn’t caused by (de/)constructive interference: it’s due to the presence of two split energy levels for the total Hamiltonian, with associated wavefunctions $c(\psi_1\pm\psi_2)$ (there’s a little numerical sketch of this after this list).
  • Orbital hybridisation is just a fancy name for LCAO of $s$ and $p$ - with suitable squared coefficient ratios. It’s honestly a shame that many phenomenological chemistry explanations involve a degree of hand-waving rather than going the quantum route.
  • I like the ‘cone’ visualisation for $\mathbf{J = L + S}$ – it’s better than the vectors in Griffiths’ QM since it shows they’re ‘smeared out’ too.
  • I’ve been scrambling for a good PDF on the Hartree-Fock SCF method (I revisited some old slides I found months ago - actually pretty good, describing the spin states well) - I came back to the excellent compphys.go.ro and practical-oriented NZNano, and even my old hydrogen basis-set generator which works amazingly well for a 10 minute SciPy optimisation job.
  • Finally, I found a set of lecture notes in Computational Quantum Chemistry from Finland. They look nice and detailed, and actually care to delve into the computational side of things right from the start (unlike a very famous book whose name I will withhold). I also chanced upon another PDF, which will complement the former, since it goes into detail on numerical problem solving (something I have little experience in).
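
Since that two-level picture keeps coming up, here’s a minimal numerical sketch of it (my own toy numbers: a symmetric diatomic with overlap neglected - not a real quantum chemistry calculation, just the $2\times2$ eigenvalue problem behind the bonding/antibonding split):

import numpy as np

alpha, beta = -13.6, -2.0                 # assumed on-site and coupling energies (eV)
H = np.array([[alpha, beta],
              [beta, alpha]])             # total Hamiltonian in the {psi_1, psi_2} basis
E, V = np.linalg.eigh(H)

print(E)     # [alpha + beta, alpha - beta]: bonding below, antibonding above
print(V.T)   # rows are (psi_1 + psi_2)/sqrt(2) and (psi_1 - psi_2)/sqrt(2), up to sign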

Machine Learning

  • I did try and find out a bit about neural ODEs, but the popular literature seems to be a bit abstruse.
  • Google’s AI blog looks promising - I found a post on Neural Architecture Search (I happened to be reading that on Li’l Log) as well as Reinforcement Learning
  • Lagrangian Duality in Convex Optimisation (what’s the benefit of the more complex methods over the relatively straightforward Lagrange multipliers?).
  • Upgrades of ResidualNets (e.g. FractalNet) look pretty interesting, but unfortunately it’s not practical for me to undertake anything involving distributed GPUs - that’s why I gave up on CNNs (besides, I find things like RL, Transformers and especially VAEs far cooler)
  • I was amazed to find an analytical solution to least-squares that doesn’t even require calculus - sketched right after this list! (I’ve noticed that on Stats SE, the best answers usually aren’t the most upvoted ones!)
  • I’ll take a look at some MCTS enhancements (my newfound RL knowledge will come in handy) - beyond transposition tables and primitive PV lines.
  • I have firmly decided to start with Lilian Weng’s evolutionary algorithms (equipped with a pen and paper of course!).
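
For the record (I’m paraphrasing the idea rather than the exact Stats SE answer), the calculus-free route is the projection argument: if $\hat\beta$ satisfies the orthogonality condition $X^\top(y - X\hat\beta) = 0$, the cross term in the expansion vanishes and

$$ \|y - X\beta\|^2 = \|X(\hat\beta - \beta)\|^2 + \|y - X\hat\beta\|^2 \ge \|y - X\hat\beta\|^2, $$

so $\hat\beta$ minimises the squared error, and the orthogonality condition is just the normal equations $X^\top X\hat\beta = X^\top y$, giving $\hat\beta = (X^\top X)^{-1}X^\top y$ whenever $X^\top X$ is invertible. No derivatives anywhere.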

Computer Science and App Dev

  • I have to write a Python script to convert my phone’s Chrome history JSON into an acceptable format for History Trends Unlimited - it’s so annoying how Chrome only stores history for three months - I loved seeing all the Wiki page links purple :(
  • I should also investigate Adobe XD, seems promising for design (completely in character, Adobe only permits one free project)
  • I found ‘Adjoint like Haskell currying’ in my search history (Google Search stores my fleeting epiphanies - in this case, I’d made the painfully obvious realisation that category theory concepts are used in Haskell) - but I was amazed to see some coherent results, talking about adjoint functors in Haskell. Finally, applied category theory! (this is the best excuse to pick up some Haskell).
  • It makes sense that you can’t curry out of order without flip (OCaml has some weird workaround) - so you order the parameters from most likely to be partially applied to least.
  • Random Forests seem vaguely interesting - maybe Andrew Gibiansky has a classification implementation?
  • I found out that greedy algorithms are just locally optimal solutions (Dijkstra’s too - why does it need a queue?) - Brilliant is an outstanding source on computer science.
  • The opposite of this, so to speak, is dynamic programming (honestly, I just consider it a fancy name for memoisation, but it’s also a good exercise in writing out recurrence relations - see the sketch after this list). Working out the dynamic programming solution reminded me of a chess tactic (not in the usual sense, but you know what I mean) that I use: invert the move order!
  • A company called ‘Cuberto’ makes nice app UI (it’s not as impractical and showy as some of the other stuff I see on Dribbbbbble).
  • What’s the Material Design widget that looks like circles separated by lines?
  • I’d like to see some improvements to vanilla Monte Carlo integration (apparently it comes in handy while evaluating the phase space integrals for Feynman diagrams - wait, does that just mean unconstrained momenta?), e.g. how do you avoid large ‘dead’ zones? I think ‘Vegas’ does something about this, but it’s ancient now, there’s got to be a more refined approach
  • My brother gave me a fun project: generate non-looping beats (this stemmed from his annoyance at the repetitive music in ‘Color Switch’) - I’d prefer a symbolic expression manipulator for Dirac gamma matrices (see FORM).
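
That memoisation view of dynamic programming, as a minimal sketch (coin change is my choice of example, not from any particular source): write down the recurrence, cache it, done.

from functools import lru_cache

def min_coins(coins, amount):
    @lru_cache(maxsize=None)
    def f(n):
        # recurrence: f(n) = 1 + min over coins c <= n of f(n - c), with f(0) = 0
        if n == 0:
            return 0
        return min((1 + f(n - c) for c in coins if c <= n), default=float('inf'))
    return f(amount)

print(min_coins((1, 5, 10, 25), 63))   # 6 coins: 25 + 25 + 10 + 1 + 1 + 1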

Logic and Set Theory

  • The diagonal lemma was a fun concrete example of applied set theory (although I mainly looked at how textbook proofs using it + Gödel’s First Incompleteness Theorem had logical inconsistencies).
  • Aha! So Theseus’ ship’s cousin is called the Sorites paradox (their relation reminds me of a category functor if you see what I mean)

Group Theory, Calculus and Fun Stuff

  • Brian Sittinger’s Quora feed is my go-to destination for integrals and other calculus tidbits (applied group theory too).
  • The uniqueness of a Fourier series was a bit anticlimactic (just the fact that the coefficients are given by a constant integral? - maybe I should check the uniqueness of power series then). Anyway, over the last week I’ve read most of MMPS (Fourier Series today) - up next, we’ve got residue theorem examples, conformal mapping (ooh - I’d love to see an E&M solution with that) and other complex analysis, followed possibly by Bessel functions and PDEs.
  • I confirmed a long-standing hunch - that the Hodge star operator is not involutive (with regards to div-curl and curl-grad identities in Michael Penn’s video). Strangely, everyone on Maths SE seems to use physics-style language while discussing it.
  • Pseudovectors and pseudoscalars now make sense, although I’m still hoping to see them in the broader context of Clifford algebras.
  • I felt like continuing with Galois theory today, so I found a nice, friendly blog post by science4all that cleared up field extensions and the tower rule with some nice concrete examples.
  • Wikipedia for maths is, on the whole, one of the best sources you’ll find - excellent examples - although I found more concise explanations of normal and separable extensions elsewhere (the 6 cross-ratio group was exotic).
  • math3ma had a cool analogy between algebraic elements and topological limit points and her suggestion to use category theory really got me thinking
  • I also read about some small groups on Brilliant (since a nice point is that Galois groups for $1 \le n\le4$ are ‘too small’ to not be solvable).
  • A determined search for explanatory Dynkin diagram questions - mostly for E8 - on Maths SE proved inconclusive (well, it led me back to the comprehensive E8 wiki page, which I surprisingly hadn’t visited). In the meantime, I’ll content myself with excellent PDFs that I’ve found on root systems, Lorentz group reps and SUSY.
  • I rediscovered the n-Category Cafe - a collective blog that includes legends like John Baez, Qiaochu Yuan and Emily Riehl, among others. It has a nice post on G2 and the split octonions (it, at the very least, consolidated my understanding of phase spaces).
  • The E8 sphere packing proof of Viazovska looks really accessible (apparently, the proof is also very elegant and intuitive). [After reading a bit more: it is amazing - without doubt the most beautiful proof I have ever seen, and not just the argument, the presentation too, involving the coalescence of two of my favourite topics: Fourier transforms and modular forms.]
  • Also, I’m no longer confused about the position of the E8 Dynkin diagram ‘leg’ (even preparing to ask a question on Maths SE gets your brain working!).
  • Glasser’s Master Theorem - that’s what the $\int f(u)\,\mathrm dx = \int f(x)\,\mathrm dx$ trick was! - a very cool substitution that’ll no doubt make $\frac{x^n}{1+x^m}$ integrals easier (although Dr. Sittinger confirmed that there is a beta function approach).
  • I went back to nrich.maths.org for their Galois theory intro for high school students - nicely written. Simple groups, solvable groups and subnormal series make sense - and I correctly intuited that the Jordan-Holder theorem could be used to prove the Fundamental Theorem of Arithmetic (in fact, that’s the one given on AoPS - which I should really revisit :P)
  • The n-Category Cafe has a promising series on the use of octonions in the Standard Model (watered down, yes, but I’ve always wanted to see an alternative to SU(3)).
  • John Baez had a nice post on countable ordinals (I only managed to reach epsilon-nought)
  • Now that I know a bit more about character tables, I should check out molecular symmetry point groups (this was one of my first ever exposures to group theory!) - speaking of which, I loved the fact that cyclic groups have a character ‘matrix’ that is a multiple of the DFT matrix (written out after this list)!
  • For some reason, I’m not quite satisfied with the proof that finite fields have to be of prime power order (the routine ‘contradicts the minimality’ argument).
  • I want to see an accessible derivation of the analytic continuation of the zeta function (I found a good analogy to the infinite geometric series).
  • It was super cool that the outer automorphisms are the quotient $\text{Aut}/\text{Inn}$, with $\text{Inn} \cong \text{G}/\text{Z(G)}$. It was here that I discovered that viewing Wikipedia in mobile version on a desktop is more relaxing.
  • Turns out Lie algebras are real vector spaces when their groups are real manifolds. Yep.
  • PSL(n) is just stereographic projection! And ‘Identification’ (of opposite points) is ridiculous terminology honestly. Previously I’d said that I couldn’t find applications of Nimbers. Now I’m relating them AND surreal numbers to Game Theory.
  • John Baez’s website is the final source on exceptional Lie groups now (and for groups in general), not least because he relates them to Spin(8).
  • Why do I see the gamma reflection formula popping up everywhere? Since I saw it in the derivation, I think I’ll just perform complex Fourier series from now on (why bother with an extra integral?)
  • I don’t know why, but the phrase ‘generates freely’ really touches a nerve with matrices.
  • Proofs involving conjugacy classes reminded me of exponentials of diagonalised matrices.
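
To spell out the cyclic-group remark from a few bullets up: every irrep of $\mathbb{Z}_n$ is one-dimensional, so the character table is just

$$ \chi_j(g^k) = \omega^{jk}, \qquad \omega = e^{2\pi i/n}, $$

i.e. the $n\times n$ matrix $[\omega^{jk}]$ - the DFT matrix, up to normalisation and conjugation conventions.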

Linear Algebra

  • I was surprised to find that irreducible group representations of the same dimension are isomorphic (in hindsight it’s obvious, but $\mathbf3$ and $\mathbf{\overline3}$ irreps for quarks must have confused me)
  • On the other hand, I felt that it was incredibly obvious that two vector spaces of equal dimension are isomorphic
  • Continuous functions on an interval form a vector space? The proof didn’t feel substantial, but I guess that’s because of my non-rigorous idea of function arithmetic.

Physics

  • IB Chemistry reminded me to take a quick look at real orbitals and energy levels - so I dug up some stuff on instability of higher energy levels
  • Helmholtz free energy (HyperPhysics has a good grid diagram - all those “messy” thermodynamics terms just involve $\text{PV}$ and $\text{TS}$), atomic orbitals (molecular orbitals and quantum and computational chemistry as a whole deserve a revisit), electronic configuration and an observable calculation that reminded me to take a look at Griffiths (or even better, Sakurai).
  • I saw radiation pressure (again, I should re-read Griffiths’ E&M). I also revisited black bodies and the Stefan-Boltzmann law (first time since Feb, I think!).
  • I was trying to find the Feynman diagrams for the Lamb Shift, but it didn’t yield anything except some nice PDFs on (strangely) supersymmetry. The SUSY treatise turned out to be quite accessible; I’d like to read it properly
  • I wasn’t satisfied with some explanations on why spins have to be antiparallel in an orbital, or on the SO(4) symmetry in Hydrogen - and I need to read up on band structure (semiconductors too). The Laplace-Runge-Lenz vector was WAY too complicated (and not sufficiently physical); I think I’ll come back to it later. Honestly, there’s so much to read up on in physics (I need to tone it down a bit - more CS is in order).
  • Hund’s rules seem simple enough. ‘Feynman Diagrams for Beginners’ looks like a good review of QED concepts and Feynman diagrams - speaking of which, I REALLY want to see a calculation for the lifetime of a particle (should cement theoretical knowledge with a practical example - in fact, I might get back to Schwartz now)
  • Now that I’ve read Chapters 13 and 14 of QFTGA, I finally understood the Dirac equation, chirality (projection operators are a good mnemonic for remembering Binet’s formula), helicity, polarization (whose meaning is, in my opinion, hidden behind the opaque Jones vector). I can’t stress just how good the book is - and I’m looking forward to attacking fermion-photon coupling (hopefully the path integral is simple enough - sadly this book doesn’t detail the derivation of the fermionic Feynman rules).
  • I know Weinberg’s book is highly respected, but I just can’t get past the evil 1980’s typesetting. Also, I have no idea why Fermi’s golden rule looks so weird in Griffiths’ QM (I didn’t find a good account of decay widths either).
  • I’ve returned to Peskin and Schroeder, and I am astounded how long it took for me to appreciate it. P&S on the practical front (it is extremely practical-oriented), combined with QFTGA for the intuitive side, is a killer combination in QFT. I managed to finish Chapter 4 and a bit of 5 (honestly, Wick contractions probably discouraged me from going further before). I FINALLY used Feynman diagrams in a real computation, and it seems straightforward (although I’m sure loops are going to be a tough cookie). [Nihar from the future: I’ve been reading this book for 4 days and it’s like I’ve spent the last 4 months devouring it, it’s that good. Nihar out.]
  • How about a program to i) Draw Feynman diagrams ii) Do Dirac matrix computations and simplifications?
  • I found a couple of papers on Lorentz group representations that are friendly enough.
  • Revised Hund’s rules $\rightarrow$ what is j-j coupling, and how is it different from L-S coupling? I found the term symbol wiki (the term multiplicity also makes sense in reference to triplet and singlet).
  • I’m still eager to find out the motivation (in fact, also why it’s even allowed at all) behind promoting global to local symmetries (I have a hunch I’ll find that in chapter 14 of QFTGA).
  • For what I hope is the final time, I read about representation theory of the Lorentz group, Casimir elements, Clebsch-Gordan coefficients (not their recursion relations though). I managed to revise Chapters 4 through 7 of Griffiths in under twenty minutes (I have a much greater appreciation for spin).
  • I finally found out what the bold numbers are in representation theory: it’s the dimension of the irrep - which got me thinking: spin-2 transforms as a 5 under Lorentz transforms; does this have something to do with SU(5) theory (probably nothing, but it’s rare that you see 5’s in particle physics).
  • I actually am very interested in quantum chemistry (maybe I should revisit the ABC of DFT) - it’s a shame that even members of the chemistry SE are unable to allude to QM when needed.
  • Why most metals are silver-gray (their transitions require UV, so visible light just gets reflected) is precisely the kind of thing I want to use quantum chemistry for - gold apparently has relativistic considerations (raising the 5d orbital reduces the absorption frequency for 5d $\rightarrow$ 6s from UV to blue) – could the ‘diffuse’ orbital stand for not being contracted relativistically?
  • Ironically the stop squark is the lightest, so it’d be discovered first (is the mass order inverted for all? How do I derive this?). Now I’m on the lookout for the one-loop corrections to the electron g-factor.
  • The Lie algebra adjoint representation reminded me of Haskell currying (see below)
  • It’s unfortunate that the pure geometric algebra formulation of electrodynamics isn’t as elegant as the differential forms version.
  • The one-loop correction was right under my nose - it was, incidentally, the next section of P&S that I was going to attack (they introduce Lie groups so late, it’s mind-boggling!).
  • Why don’t they just call ‘faithful’ representations injective?
  • A book on intensely mathematical QM turned out to be a good refresher on … er, topological sets, amazingly! (if delirium ever besets me, I’ll read the rest of the book)
  • I can see how the terminology ‘lower energy’ can be misleading for electron bound states (it’s greater in magnitude, just negative, so it requires more energy to lift it).
  • Of course, as soon as I read about Schwinger’s trick, it appeared in P&S, along with the one-loop correction to the g-factor (◔_◔).
  • Funnily enough, Murphy’s Law reminded me of Dirac’s principle in quantum mechanics.
  • Searching ‘amputated Feynman diagrams’ brought me to an Imperial PDF that I had viewed two months ago! (that’s a whole month before I properly started QFT!).
  • I found a PDF on John Baez’s GUT theories (same content as the UCR website, but in latex, yay!)
  • I guessed right - multi-power Feynman parameters do involve an inverse generalised beta function (the parametrisation is written out after this list).
  • Interesting how, while constructing a spin basis vector with different $l$, you just make it orthogonal (what if there are more possibilities?). Besides, the $\text{SU}(3)$ construction is far more elegant for constructing the eigenbasis.
  • Is there a deeper reason why internal lines can be described by virtual particles? (other than that the propagators are equivalent).
  • I found another one of John’s PDFs, this time on the octonions (it’s the preface to the exceptional groups - I found the relations amazing). It’s going to be a very constructive summary of the Cayley-Dickson construction, spinors - even Clifford algebras. So that’s where the Fano plane is used!
  • I came across a visual paper on QCD reps which, unlike John Baez’s GUT paper, starts with $\text{SU}(3)$ right off the bat - but it doesn’t compare at all to John’s paper, which is far more detailed, far more lucid, and focuses on the qualitative, intuitive aspects of charges and particles. Since my foundation of exterior algebra, invariants and irreps has been set, I can’t wait to explore beyond the SU(5) GUT (I’ll first need to check the octonions book though).
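
For the record, the multi-power Feynman parametrisation I was referring to (the standard two-denominator textbook form):

$$ \frac{1}{A^{\alpha}B^{\beta}} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\left[xA + (1-x)B\right]^{\alpha+\beta}}\,\mathrm dx, $$

and the prefactor is exactly $1/\mathrm{B}(\alpha,\beta)$ - the inverse beta function.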

The Good Stuff (4/8/20 - 10/8/20)

Aaaand I forgot this week too.

Maths

  • I can’t wait for the Hodge star-cross product duality from Michael Penn - in the meanwhile, I contented myself with a newfound blog series on exterior algebra by Alex Kritchevsky (quite nice actually) $\rightarrow$ A proper book on Geometric Algebra is a must.
  • I think I’m now in a position to tackle the relation between the gamma matrices and Clifford algebra (Geometric Algebra apparently doesn’t have the Hodge star, which I’ve been dying to see used in a practical problem besides cleaning up Maxwell’s equations).
  • Cool notation for matrix norms - $1/p \rightarrow q/p \rightarrow 1/q$, rows first (e.g., the Frobenius norm is [2, 2]). The L-infinity norm is hilariously just the supremum (which I found out means the limit of the terms, but might not be contained in the set itself, unlike the maximum).
  • Socratic is actually a great surfing alternative to Quora maths. I should really find out about the more exotic finite simple groups (monstrous moonshine! what a name - and a very deep, difficult connection to the j-invariant which I hardly understand).
  • I found a cool page on the E8 group, detailing some kind of calculation, but more importantly, its properties and significance (maybe String Theory will prove to be useful for maths!). I think I understand root systems too (talking about them from the lattice perspective is way more understandable), and the E8 Dynkin diagram makes sense (thank you John Baez!) - but I’m not sure why the leg has to stick out in the middle - the relation of the icosahedron to E8 was honestly mind-boggling.
  • I like the visualisation of the real projective plane with the infinite circle surrounding it (although I’m still puzzled by the complex projective space).
  • I was trying to find the proof of the $\mathbb{R}^n \rightarrow \mathbb{R}^{2n}$ embedding theorem that I read in Carroll, but I couldn’t remember the name.
  • What’s the Fano plane? - its symmetries form a finite simple group, according to Alon.
  • Cool integral formula from Michael Penn (I find that x1.5 - x1.75 is a little more my speed)

Logic and Set Theory

  • Reading about logical fallacies is a fun pastime of mine.
  • The By::Analogy Quora space has a collection of nice proof perspectives (mostly populated by the great Roman Andronov - he ranks fourth on my list after Brian, Senia and Alon)
  • I was completely wrong in thinking that necessity was a subset of sufficiency (being careless with the arrows in this kind of reasoning is a strict no-no, which is why math3ma’s post on this confused me).
  • The Wug test (and unrelated Lex/Yacc and parser research that I did when doing SMILES parsing for Molecular VAEs) has convinced me to maybe (?) take a look at Chomsky’s linguistics work (some claim that it’s just airy and back-fitted - if that’s the case, I’m sure there’s something on universal grammars in some other context).

Physics

  • The Kerr effect seems interesting (I should read about negative refractive indices!) $\rightarrow$ Dispersion relation and group velocity (I had visited these in April, I believe, but the first signs of Chrome history decay are showing - the links weren’t purple).
  • Don’t anyons have (anti)commutation relations? I think I should find the answer in its representation theory - speaking of which, I really enjoyed the Harvard RT book by Noah Miller, but I think a break from QFT and QM is long overdue.
  • Over the last week, I’ve been reading QFT for the Gifted Amateur. It’s EXACTLY what I needed at this stage. I picked up from Chapter 18 and just like that I cruised through Feynman Diagrams (and I’m obsessing over their elegance) and a bit of Spinors. It’s laid out in the perfect manner for me (Wick’s theorem, contractions, interaction picture) - all of it made perfect sense and brought intuitive insights. One thing though - Feynman’s propagator is kind of pulled out of a hat (but IIRC, Schwartz has a complete derivation without resorting to the something-Plemelj theorem)

Computer Science

  • ‘Dynamic Programming’ is a strangely fancy way of saying that you need memoization and data structures.
  • I read about the closest pair of points problem (personally, I like brute forcing all problems because I’m lazy) $\rightarrow$ the element distinctness problem is best left to randomness (thank you, birthday problem) - even smashing everything together in a hash table works.
  • The difference between Quantum Computers and NDTMs reminds me of the common misconception that the measurement/decoherence process (which I should touch up on) is not ‘random’, as they say.
  • SAT makes sense (exponential time in disjunctive/random $\rightarrow$ conjunctive). Incidentally, I found a Haskell SAT solver on Andrew Gibiansky’s blog. It’s really well designed and although I couldn’t finish it, it got me thinking: could I do this in Rust? (that might be stretching its Enum capabilities but it should be worthwhile)
  • Transposition tables are worth researching even if I don’t have plans for a chess engine - it’s a good exercise in data management (hey, caches!). In fact, I have to find out more improvements to the MCTS method in general (not just RAVE and other technical ones - I mean bridging alpha-beta and MCTS).
  • Fortune’s Algorithm looked super difficult (a bit more research into Bezier curves MIGHT be useful to this end) - again, I’d just brute force each pixel.
  • I found Nicky Case’s website! The Game Theory connection to social relationships was seriously cool - I admire their exposition and art style.
  • I understood how Voronoi diagrams are the dual of a Delaunay triangulation (how do you generate dual diagram screen coverings?). I’ll try and use SFML to have another crack at the physics engine without having to deal with OpenGL (SFML actually looks a lot like Piston in Rust - but it has audio and networking too).

Stats and Machine Learning

  • Back to Li’l log! Reinforcement learning and Transformers are on the menu today $\rightarrow$ from Brian Keng’s excellent (and much easier to read) blog, I sampled log-likelihood for generative models, the one neuron approximator (seeing the proof should be a good section on Lebesgue integration) and VAEs with Normalising Flows (sorry Eric Jang).
  • Hyperparameter optimisation is going to link nicely with operations research (I’ll try and throw genetic algorithms in there too). Unfortunately I’d forgotten exactly what priors and posteriors were, so a quick trip to Stats SE sorted that out
  • I picked up Bayesian Data Analysis by Gelman (looks comprehensive, but a bit too long in the current climate)
  • spinningtop’s docs are a fantastic intro to RL (they don’t throw a lot at once unlike Lilian’s blog - although hers too is great - and RL seems to be her main focus). In fact, the state transitions (or Markov Decision Processes in general) reminded me a lot of Turing Machines (obviously). Q-Learning apparently converges faster than SARSA.
  • Following Andrew Gibiansky, I should try and use k-means (k-means++ probably, unless I find a simple but more powerful alternative) to do MNIST classification, just to see how high I can get it (rough sketch after this list).
  • I also looked at symmetric matrix decomposition, just for fun
  • Lilian has a gripping tale on Evolutionary Algorithms (goodbye, GA!). Finally, I should try and make a project using OpenAI GPT-3
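
A rough sketch of that k-means-as-classifier idea (my own assumptions: scikit-learn’s small 8x8 digits set as a stand-in for MNIST, and a majority-vote mapping from clusters to labels):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)   # k-means++ init is the default

# give each cluster the most common true label among its members, then score
cluster_to_label = np.array([np.bincount(y[km.labels_ == k]).argmax() for k in range(10)])
print((cluster_to_label[km.labels_] == y).mean())   # typically lands somewhere around 0.7-0.8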

The Good Stuff (20/7/20 - 27/7/20)

I’d forgotten to upload this week’s finds, so it’s been rebranded ‘Week 0’. Unfortunately, there are at least 90 days of backlog on this project - each week’s worth possibly as large as this page - so I will have to git commit to memory the absolute mountain of information I’ve learnt until now. After some consideration, I’ve decided to employ the $\rightarrow$ symbol as shorthand for ‘this led me to discover/learn’. You’ll notice that the format is a little different from weeks >1; I hadn’t organised the first week very well.

Mathematics

  • If you really want an intuitive explanation of something, I would recommend seeing Quora answers - They tend to have fantastic resources for building up intuition, contributed by what seems like a tightly-bound community of mathematicians who really know how to write. Not as rigorous as on the stackexchange.com sites, but far less rigid and formal (once I found someone using category theory to explain a trivial group property - talk about overkill).
  • Category theory is like an upgrade in elegance to the already superb group theory that I’m learning. Math3ma and John Baez’s blogs really got me hooked (especially John Baez’s elucidation on their connection to spin networks) - so that in turn led me to find more about topological spaces (one of the categories where I haven’t ventured at all really). I enjoyed the Kolmogorov classification and pathological counter-examples to Hausdorff spaces. This apparently is used quite a bit in algebraic geometry (Grothendieck!), which leads me back to the modularity theorem and FLT! I actually understand a bit of Wiles’ first page. This goes on to elliptic curves (FLT, yet again) and Diffie-Hellman key exchange, and the general discrete-logarithm problem (along with man in the middle attacks).
  • Applied group theory, finally! I found two really nice PDFs from Harvard about Rubik’s cube and sudoku groups. Sadly they didn’t contain the derivation of their solution. Speaking of Rubik’s cubes, I thought they might be useful for my IB Maths IA (proving successive lower and upper bounds, Superflip, group presentations, etc.). I have to find out what else GAP can do (besides finding the group order). And on the topic of sudoku (surprisingly a relatively recent invention), I found out about Knuth’s Algorithm X (need to find out more about Dancing Links though).
  • …and back to number theory - I rediscovered the baby-step giant-step algorithm (and recalled my poor Python implementation of it); Lagrange’s group theorem is super cool, because it proves Euler’s (totient) Theorem. Carmichael numbers seem interesting - I wonder how Erdos derived the bound on their density
  • Schläfli symbols are fun for the dual polyhedron (honeycomb lattices and Coxeter groups probably need some work on my part though).
  • Just touched up on some group theory - normal subgroups, commutator groups, direct sum/product/tensor product - nothing too fancy, but terms which have the unfortunate habit of slipping from my mind. Today also marks the end of an apparently intensive abstract maths unit - I’ll aim to shift towards theoretical physics and quant a bit more now. Still don’t know what a Dynkin diagram is though (probably need to consult a uni PDF).
  • Coxeter and exceptional groups could be the concluding subtopic in this unit - E8 and F4 in particular (I need to get finished with the Standard Model so I can savour all of these elegant symmetry theories). I think operator theory and elliptic equations could come in handy, since I’m defecting to the physics side now.
  • John Conway seems to pop up in all sorts of esoteric number theory (Nimbers were cool - didn’t see any applications though; reminded me of surreal numbers).
  • The Abel-Ruffini Theorem and Galois and automorphism groups were fun while they lasted, but I think my interest in them is slowly fizzling out $\rightarrow$ Review quartic formula derivation.
  • Russell’s paradox looks nice when formalised on Brilliant (reminds me of when I tried to use first order set theory rigorously in routine school maths exercises - I didn’t know the ‘equivalent’ symbol though), and is vaguely reminiscent of my approach to programming, “concretise then generalise” (I’ll explain that some other time) $\rightarrow$ Gödel’s incompleteness theorems should be fun to see too. (Whenever I’m bored, I can just consult this list!)
  • I did grasp root systems a little, but I’m still uncertain as to how they relate to Dynkin diagrams. On the other hand, the source that piqued my interest in them in the first place, John Baez’s excellent (but unfortunately, developed before CSS) blog is wholeheartedly bookmark-worthy.
  • I only just realised the link between gradient line integrals and complex integrals (both evaluate to zero for a closed loop (no poles, of course)).
  • Hopefully, now that I know a bit about Sp and Spin groups, I can advance into the modular group unimpeded.
  • I surprisingly hadn’t seen the method for finding the quadratic residues mod a prime (how do you generalise to composite?).
  • Pareto efficiency seems to be a useless metric, but I disagree that Nash equilibria are similarly impractical (the max-row max-column shortcut will prove to be useful if I ever do behavioral microeconomics).
  • The semidirect product as the ‘opposite’ of quotient groups is a good perspective (plus this applies to exact sequences) - [Nihar from the future: “You fool! You fell victim to one of the most common blunders!”]
  • I chanced upon an old friend, the volume of an n-ball (courtesy of Zach Star).
  • I did end up seeing the quartic formula (it’s the first PDF in my library!), and it wasn’t nearly as hard as I remember it to be (respect for Ferrari and Cardano for working all of that out without having variable names).
  • I hopped back to Galois theory (Rational root theorem, Alon’s great answer on solvable groups; extensions like differential Galois groups - looks promising if I can find a source).
  • To get a better idea of algebraic geometry (hail Grothendieck!), I found an MIT PDF on Bezout’s theorem $\rightarrow$ projective representations of vector and Hilbert spaces. I learnt about the two alternative forms of the fundamental theorem of algebra (besides the ‘degree n means n roots’ one), along with the observation that it’s neither fundamental nor really about algebra!
  • Looked at Helmholtz decomposition again (some minus signs looked suspect).
  • Short exact sequences are completely clear now.

Set Theory, Logic and Linear Algebra (don’t ask)

  • The hierarchy of set grammars is somewhat reminiscent of Turing machine hierarchies (nothing beyond superficial, I doubt) - still need to see what ‘TT’ is.
  • I’d consider Brilliant to be the de facto source on proof and technique-related pages - Fermat’s method of infinite descent and the well-ordering principle were well laid out. I had to go back to Vieta root jumping (the Legend of Six!) - seems like a potential blog post to me.
  • I couldn’t quite grasp Gödel’s first incompleteness theorem beyond Gödel numbers (mostly due to laziness).
  • How do you define numbers in set theory (is it cardinality? In that case, surreal numbers almost feel like the natural step upwards).
  • Also, I should find out some corollaries to the Axiom of Choice and Brouwer’s fixed point theorem (beyond the trivial and mildly amusing).
  • Obligatory xkcd: The principle of explosion (seemed a bit ‘suspicious’ at first, but on closer inspection, it turned out to be painfully obvious).
  • I looked at the construction of natural numbers in set theory (it’s what I expected) - Richard’s paradox was better than Russell’s, if you ask me $\rightarrow$ the existence of non-definable real numbers really surprised me.
  • Induction on real numbers was unexpected, even if impractical.
  • The question of identifying whether a vector is a linear combination of others $\rightarrow$ 3b1b’s rank/nullity video (which I think is superseded by Zach Star’s more modern video - I should watch more of his videos).
  • The fangcheng method is essentially Gaussian elimination - nifty, especially considering it predates Gauss by a good 1000 years.
  • Canonical vs standard isomorphisms seem clear to me now, after seeing the isomorphisms between a vector space, its dual and its double dual.

Category Theory

  • Math3ma’s posts on the Yoneda Lemma were great (Category Theory just keeps on getting better and better!) $\rightarrow$ Topology, whose basics I finally understand (and now continuity and convergence are so much cooler).
  • A Berkeley paper managed to link cobordisms (I think) to Hilbert spaces, so that’s another reason to push on with Category Theory.
  • I found a master’s thesis linking Category Theory with QFT - but the sprawling cheat sheet I unearthed on a GitHub repo is going to be even more useful.
  • Once again Math3ma doesn’t disappoint - just got started on the limits and colimits series $\rightarrow$ Once over-abstract notions like ‘dual’ and product are ironically intuitive in Category Theory (the arrow reversal for dual may well be the coolest thing I’ve seen in a long time) - it might be worth taking some university maths courses just for this.
  • I enjoyed reading about categorical limits on math3ma (although the colimit part hasn’t been released yet, I can invoke the sacred arrow-reversal and try and derive what they actually are - by pure thought!).
  • I didn’t find any sources on CT for physics to be all that captivating (hint: scroll to the end to see what punchline they’re building up towards - maths is said to be anti-comedy, after all)
  • I’m really curious about the smash product - in fact, come to that, seeing the semidirect/wedge/smash products defined categorically should remove the final barriers in my understanding of them.
  • The categorical construction of functional languages was a serendipitous find.
  • While “application-based category theory” may seem like an oxymoron, Tatsuyo’s thesis on categorical functional languages does just that

[Nihar from the future: I ended up not doing any category theory (at least explicitly) for the next month :( ]

Physics

  • Finally got back round to the tennis racket theorem, Euler’s rotation theorem and, continuing on, Euler angles, gimbal lock and 4-d rotations. Oh, and the method for finding the continued fraction of a number was hiding in plain sight (at least, the greedy algorithm).
  • The spin-statistics theorem is much easier than Srednicki’s book makes it out to be; Weinberg’s book is sadly not well typeset, and nLab is unreasonably convoluted, but a (still incomplete) source that I found is physicstravelguide.com - incidentally, one of the contributors is Jakob Schwichtenberg - his chapter on representation theory for physics is effectively what started this unit.
  • I still admire Lubos Motl’s SE answers, but man, if that guy doesn’t have an appetite for stoking controversy, I don’t know who does (his blog posts also happen to be a string of cynical but entertaining rants on matters from Jordan algebras to the EU).
  • I managed to get a copy of Landau’s CM (which weirdly has the same font as H.C. Verma)
  • I asked a fairly mundane question about the physicists’ Lie algebra on SE, but it still garnered a couple of upvotes, with a handful scattered between two (again, straightforward) answers.
  • I’m diving back into the Standard Model - from the deep end. I’m decidedly lukewarm to the approach in Schwartz and Srednicki (make unmotivated propositions first, introduce representation theory MUCH later) - Peskin and Schroeder (except for the mathematical dubiousness in the first chapter) seems to really pick up from chapter 3, supplemented of course by the Columbia QM book (may have to skip to a practical-oriented section though).
  • So I just touched up on helicity, chirality (Physics from Symmetry really helped here), conservation laws, boost-vs-translation $\rightarrow$ Euclidean group.
  • Clifford algebra should be really useful too, but I haven’t looked up anything on that yet.
  • “Nature favours simplicity” - works for Lagrangians (just make it gauge-invariant and a Lorentz scalar!) - and alternative Lagrangians (e.g. with the Levi-Civita tensor) should uncover a web of links.
  • I couldn’t resist looking up renormalisability $\rightarrow$ IR and UV cutoffs and the renormalisation group will be coming up soon $\rightarrow$ Wien’s displacement law is far more accurate than people give it credit for (I mean, seriously, Rayleigh-Jeans fails at like 5 Hz).
  • I’ve come to the conclusion that prior to QCD, I should definitely get a solid grounding in QED and renormalisation (I’ve thrown in supersymmetry too, just for fun).
  • I should explore the Ising model further (it’s also a potential Python project) $\rightarrow$ I had a thermodynamics intro (with beta and micro-states) - still on the search for a quality cheatsheet though.
  • Renormalisation group explains renormalisation better than the wiki page proper, funnily enough.
  • Noether has a second theorem? $\rightarrow$ if gauge symmetries correspond to redundancies in the description of a physical system, why do they have a corresponding charge?
  • I’d forgotten that the Dirac spinor conjugate had a gamma matrix attached to it! $\rightarrow \gamma_0^2=1$
  • I did end up skipping Columbia QM chapter 8 (but now, the only thing worth reading is Clebsch-Gordan coefficients and the symplectic group - there just aren’t enough concrete examples in it).
  • A treatise that I found on Clifford algebra might prove to be, more than anything, a fantastic refresher on group, representation and character theory (I say ‘refresher’, but I’ve only barely pored over the last of those).
  • The Dirac adjoint is used, of course, for Lorentz invariant transformation. The Clifford algebra pdf was truly awesome and super dense.
  • A hunt for a QED document led me back to one of the first I had ever seen - David Tong’s lecture notes! I had no idea that all 6 sections were there on the Cambridge website, but I am very thankful $\rightarrow$ I also found a colourful Comic Sans PPT too - informative and succinct (and Comic Sans!)

Chemistry

  • I couldn’t find a good source on molecule symmetry character tables - all the LibreTexts thus far appear to be garbled and badly formatted
  • In relation to molecular symmetry groups, crystal symmetry groups look even more intimidating (I should get my Bravais lattice up to speed)

Computer Science and App Stuff

  • Lambda calculus to Turing machine is interesting, but the Oracle computability hierarchy is even more so.
  • A music-to-score application would be really nice. I was inspired by a Two Minute Papers video where some guys from IIIT effectively used ML to perform lip reading.
  • Not going to restart Statistics and Machine Learning though, perhaps my previous burst was a tad too intensive (feels a bit like burnout).
  • I did learn what a mel spectrogram is, so that took me to DSP: filters, window functions and oscillators at a medium-to-high level.
  • Finally finished the news aggregator app, after just a week of learning Flutter. Ridiculously intuitive and with far less mess than fiddling around with XML in Android Studio. Found out about this app with 10m+ downloads - they just started using AI to summarise news articles to no more than 60 words.
  • I am going to get back into ML - NLP, Transformers and Normalising Flows kick off right where I left off (Two Minute Papers will continue to be a riveting source) - sadly I may have to be confined to the theoretical subspace only, given the obscene training times.
  • I did revise Hamiltonian MC sampling (because of NUTS)
  • I thought of making a Pomodoro timer app $\rightarrow$ sprite animation in Flutter could reignite my interest in app dev.
  • Game Jam expositions are also entertaining and thought-provoking, but I feel Unity is a bit clunky for my taste. Anyway, I think I’ve reached the point where my stream of app ideas has run dry, so I may have to take a break here.
  • Surprisingly, Spotify uses P2P to stream music without overloading its servers (!). Shazam uses a bit of DSP $\rightarrow$ 3b1b’s generalised Fourier Transform video (the Doppler effect was a great example).
  • I was thinking about voting systems (it spontaneously occurred to me while browsing an otherwise mundane list of revolutionary app ideas), so naturally I trundled off to find out why the (notoriously overhyped, imo) blockchain doesn’t fulfill the voting criteria $\rightarrow$ I should review Arrow’s paradox.
  • It’s a close call between Cassandra and MongoDB, but Cassandra takes the win here with CQL (very similar to SQL) $\rightarrow$ Microservices seem like a much more natural approach than a hulking monolith (of course, I can consult a resident Software Architect!), popularised by Netflix’s blog on Medium $\rightarrow$ neural diff tools (how do regular hash-based diff tools work?).
  • Dad taught me about =VLOOKUP, uncovering the power of spreadsheet editors (and saving me from an agonising 40 minutes). Pivot tables seem to tie in nicely with my previous topic of databases, essential for my new app idea, a content-sharing platform, similar to Medium (with features from Reddit and Quora), but just for Science! Incremental, of course, and using Firebase $\rightarrow$ How can I use BaaS?
  • I was taken aback by C++ “concepts” - they’re nothing revolutionary, but they highlighted to me the limitations of templates - and the fact that they’re structural (unlike Rust’s nominal traits) makes for a much more relaxed third-party experience.
  • It’s curious that ‘synchronous’, applied to tasks, means the opposite of what it does in colloquial usage - asynchronous is obviously more efficient, but optimising a program at compile-time into its ‘maximally-asynchronous’ flow is equivalent to the Halting Problem, and so is undecidable.
  • Software architecture is quite interesting (as it turns out, ‘agile’ and ‘efficient’ aren’t just meaningless buzzwords). I stand corrected on the DB-War though; Mongo looks very appealing now. I looked at a couple of examples of chat app and streaming architectures, but they weren’t very informative.
  • Most sources on n-SAT don’t seem to explain what ‘n’ actually is (turns out it’s at most n literals per clause, with the clauses in conjunctive normal form) $\rightarrow$ Soon going to verify the Cook-Levin theorem. Angular looks interesting (evidently I’m endorsing Google), but having to do JS, CSS and HTML seems like a backache compared to Flutter’s elegant integrated system.
  • The UI of my chat app looks polished (no CustomPaint) $\rightarrow$ Firestore vs Realtime Database? - Google’s docs are kind of misleading - seemed to imply that Firestore had a delay of up to 3 seconds (in reality, it’s almost seamless, but in hindsight, 3 seconds isn’t really a problem either).
  • Found out how the various definitions of ND Turing Machines are equivalent (I like the self-replicating one the most).
  • I touched up on some factorisation algorithms (Wiki has them nicely categorised at the bottom) $\rightarrow$ a refresher on the baby-step giant-step algorithm (quick sketch below) $\rightarrow$ writing a group element’s inverse as a power of it was SO obvious all along (hint: $g^{|G|} = e$ - reminds me of Lagrange’s theorem).
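
Since baby-step giant-step has come up twice now, here’s the compact sketch I keep coming back to (assuming $p$ is prime and a solution to $g^x \equiv h \pmod p$ exists - this is the textbook algorithm, not anyone’s particular implementation):

from math import isqrt

def bsgs(g, h, p):
    m = isqrt(p) + 1
    baby = {pow(g, j, p): j for j in range(m)}     # baby steps: g^j for j < m
    c = pow(g, (p - 2) * m, p)                     # g^(-m), via Fermat's little theorem
    gamma = h
    for i in range(m):                             # giant steps: h * g^(-i*m)
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * c % p
    return None

print(bsgs(3, 13, 17))   # 4, since 3^4 = 81 = 13 (mod 17)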

Convolutions in 4D

I wanted to see a 4-D convolutional layer. I was tired of sifting through the Tensorflow source code on Github only to find layers upon layers (pun intended) of what seemed like circular references. After hours of arduous searching (okay, maybe not) I thought I’d hit the jackpot, but through the mist, all I saw was an arcane cuDNN call rearing its ugly head. :(

Of course, the Internet is absolutely littered with 1-D and 2-D convolution algorithms which no-one would use in a serious setting (I’m looking at you, Medium) – not least because using seven nested for loops leaves a bad taste in the mouth.

But enough chit-chat. Let’s hop into what I’m gonna be showing you: Three algorithms –

  1. Convolutional forward pass
  2. Convolutional backward pass (transfer the gradient to the previous layer)
  3. Calculation of the convolutional filter gradient, so we can update it via an optimiser

Convolutions

The layer is entirely specified by its filter and an optional bias – which I will ignore for now. The filter is a tensor (a scalar is a 0-dimensional tensor, a vector is a 1-d tensor, a matrix is 2-d – I don’t think there are any special names for higher dimensional tensors, but generally when people say ‘tensor’, they’re referring to >2-dimensional ones) – a 4-tensor to be precise, having dimensions: (height, width, input_channels, output_channels), for example, (5, 5, 3, 4).

The input (I’ll call it the ‘image’ for simplicity, though in general it may not be a coherent image, or even an image at all) is another 4-tensor, but it has different dimensions: (batch_size, image_height, image_width, input_channels), like (64, 28, 28, 3). You’ll notice that input_channels is there in both of them - it could represent RGB channels in a colour image, for example.

Here’s a 6 step process on how to grasp what this means exactly:

  1. Forget about the image’s batch_size for now. In the end, you can just imagine applying the process to each image in the batch. So we have 64 of these (28, 28, 3) images
  2. Instead of one 4-tensor, imagine the filter as output_channels many 3-tensors (of course these are equivalent representations). In our example, this means that there are 4 such (5, 5, 3) tensors
  3. Now you see that we have two 3-tensors, whose last dimension matches. You can visualise this as 2 cuboids, whose length and breadth might be different, but whose depths are the same. Now we just do the simple element-wise multiplication that every ML aficionado loves like their first-born child.
  4. Normally, you slide around (say) a (5, 5) filter around a (28, 28) image, multiplying the elements at each step. Well, here it’s the same thing, except with another dimension!
    • First you put your little (5, 5, 3) cuboid at the top left of the (28, 28, 3) cuboid. It’ll fit in snugly depth-wise.
    • Multiply each element of the filter with the corresponding element of the image subsection. You’ll end up doing $5\times5\times3=75$ multiplications in all.
    • Sum up all the results that you got (all 75 numbers in this case), to get a single scalar value
    • Put this value as element (0, 0) in your result matrix.
    • Shift the filter cuboid one unit to the right. If it’s already at the right edge, move it down one unit and slide it all the way to the left. Repeat the steps until you’ve slid it over every part of the image cuboid.
    • In this case, you’ll end up with a (24, 24) result matrix. In general, it’ll be (image_height - filter_height + 1, image_width - filter_width + 1)
  5. If you got the previous step then it’s all smooth from here. Notice how we ended up with a matrix at the end of the slidy thing for two cuboids. Now all we gotta do is factor in the multiple filters as well as the batch size. Let’s start with the first: If we have 4 filters, we apply the previous process for each filter. Obviously, we’ll end up with 4 of these (24, 24) result matrices (note that they will have different values since each filter has different values). We can just concatenate these along a new axis to form a (24, 24, 4) tensor (a bit like how you can concatenate row vectors to form a matrix)
  6. Finally, the batch_size. We apply steps 4 and 5 for each image in the batch (64 of them in this example) and accordingly obtain 64 such (24, 24, 4) tensors. As we did in the previous step, we can stack these up on another axis (let’s do the first axis this time) to get our final result: a newborn (64, 24, 24, 4) tensor. The general result is a tensor of shape (batch_size, image_height - filter_height + 1, image_width - filter_width + 1, output_channels)
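
Before the fast version, here’s the same recipe as a deliberately naive, loop-heavy sketch (stride 1, no padding, no bias) - it’s only here to make the indexing concrete, and it’s exactly the style I was making fun of a minute ago:

import numpy as np

def conv_forward_naive(x, f):
    B, H, W, C = x.shape                  # (batch_size, image_height, image_width, input_channels)
    Fh, Fw, _, D = f.shape                # (height, width, input_channels, output_channels)
    Oh, Ow = H - Fh + 1, W - Fw + 1
    y = np.zeros((B, Oh, Ow, D))
    for b in range(B):                    # step 6: each image in the batch
        for d in range(D):                # step 5: each filter
            for i in range(Oh):           # step 4: slide the filter cuboid around
                for j in range(Ow):
                    patch = x[b, i:i+Fh, j:j+Fw, :]
                    y[b, i, j, d] = np.sum(patch * f[:, :, :, d])
    return y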

That’s it! No, it wasn’t that hard, and yes, this is how it’s done in Tensorflow and PyTorch too (although, from what I’ve seen, they delegate the work to super-fast NVIDIA GPU kernels). The observant reader might notice that I’ve left out a little detail: strides. If you’re lazy, you can just skip this part without missing out on much.

Striding involves a bit of a change in step 4: instead of sliding our filter one unit to the right and one unit down, we slide it $S_w$ units to the right and $S_h$ units down - these values are specified while constructing the network. This changes the step 4 result matrix size to $(\frac{I_h - F_h}{S_h} + 1, \frac{I_w - F_w}{S_w} + 1)$, i.e. it’s effectively scaled down by the stride size, so the final output has the shape $(B, \frac{I_h - F_h}{S_h} + 1, \frac{I_w - F_w}{S_w} + 1, D)$, where $D$ is the number of output channels. Note that there’s no striding along the image channels axis, unless you’re the sort of weirdo who enjoys doing that kind of thing.

For the mathsy-types among you who are super restless right now because I haven’t been entirely rigorous (or haven’t used enough $\LaTeX$): here’s the formula that I derived for the forward pass (warning: cover your eyes if you have a weak stomach):

$$ y_{b, i, j, d} = \sum_{m=0}^{F_h - 1}\sum_{n=0}^{F_w - 1}\sum_{c=0}^{C - 1} x_{b, \,S_hi+m, \,S_wj+n,\,c} \,f_{m,n,c,d} $$

Behold, summation hell. I didn’t say it would be pretty.

After all that, we just have one of the three algorithms I promised to share, and I haven’t even shown you how to implement it! That’s all right, as long as I’ve managed to build up a little intuition regarding the matter, you might even be able to do it yourself – besides, the backpropagation voodoo stuff involves heavy code reuse of the forward pass that I just described.

You may be asking, “How can we possibly implement something that complicated without using the 7 for loops that you laughed at a minute ago?” Well here’s the amazing thing: my code contains a grand total of 0 for loops – “What, even with strides?” – yep, even with strides, but that’s not all: it takes up just 6 lines in its entirety. That’s right. 6 lines. As you stutter in disbelief, I’ll give you the final kicker: 3 of those lines are tuple unpackings!

Here it is in all its glory:

Code

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv4d_forward(x, f, s):
    B, H, W, C = x.shape
    Fh, Fw, C, D = f.shape
    Sh, Sw = s

    strided_shape = B, 1 + (H - Fh) // Sh, 1 + (W - Fw) // Sw, Fh, Fw, C
    xs = as_strided(x, strided_shape, strides=(x.strides[0], Sh * x.strides[1], Sw * x.strides[2], *x.strides[1:4]))

    return np.einsum('wxyijk,ijkd->wxyd', xs, f)

Okay, okay, the 5th line may violate PEP 8’s 79-character convention, but who’s counting? (sorry mobile users). Let’s break it down, line by line:

B, H, W, C = x.shape $\leftarrow$ That’s just what I called (batch_size, image_height, image_width, input_channels) previously, if you remember.

Fh, Fw, C, D = f.shape $\leftarrow$ Same spiel here, but with (height, width, input_channels, output_channels) for the filter.

Sh, Sw = s $\leftarrow$ This is left as an exercise for the reader

This is where the fun begins. strided_shape = B, 1 + (H - Fh) // Sh, 1 + (W - Fw) // Sw, Fh, Fw, C looks familiar - it’s kind of what the output shape is, but instead of D, we’ve got Fh, Fw, C at the tail end. This becomes significant in view of our next line:

xs = as_strided(x, strided_shape, strides=(x.strides[0], Sh * x.strides[1],
    Sw * x.strides[2], *x.strides[1:4]))

What? What even is that? If you’re an experienced NumPy dev, the name ‘as_strided’ may send shivers down your spine as you recall painful memories of tangling up dimensions and strides. It may well be the most misunderstood, feared function in the entire library: so much so that even the library makers advise you to avoid it as far as possible. It is also superbly elegant and powerful - enough to kickstart in anyone a lifelong interest in low-level matrix operations. To understand what it does, you have to take a detour down to the bones of matrix implementations.

Strides

How does a matrix (or indeed, a tensor) store its elements? To preserve locality of reference during matrix operations, all elements are stored sequentially in contiguous memory locations. The way NumPy converts an indexing operation on an array, like a[i, j], is by calculating the offset = a.strides[0] * i + a.strides[1] * j from the location of the first element (note: I am assuming C-style element ordering). Don’t confuse these strides with the ones in the conv4d algorithm. These ‘intrinsic’ array strides, a.strides[n], tell us how many bytes we need to move across to get the next element along the nth axis. This is incredibly powerful. For example, transpositions are super fast because they don’t change the element ordering at all, they just reverse the strides tuple; broadcasting sets the stride along the new axis to be 0; even diagonals can be extracted via this method. This SciPy lecture is an excellent source on strides and indexing.
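
A tiny demonstration, just to make the byte-counting concrete (assuming a C-ordered array of 8-byte integers):

import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6, dtype=np.int64).reshape(2, 3)
print(a.strides)                                  # (24, 8): 24 bytes to the next row, 8 to the next column
print(a.T.strides)                                # (8, 24): the transpose just swaps the strides - no copy
print(np.broadcast_to(a[0], (4, 3)).strides)      # (0, 8): broadcasting uses a zero stride for the new axis
print(as_strided(a, shape=(2,), strides=(32,)))   # [0 4]: the diagonal, by stepping one row and one column at once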

According to the NumPy documentation, “as_strided creates a view into the array given the exact strides and shape”. So it is usually very cheap, manipulating the strides tuple rather than the actual data. I think showing rather than telling here is more effective, so:

'''
This converts the image to a 6-dimensional tensor, where the two extra
dimensions represent strided, windowed 'snapshots' of each image (along the
H and W dimensions) for every batch and channel

E.g. if each image is Bx5x5xC, the filter is 3x3xCxD and the strides are
(1, 1), let's take:

x[0, :, :, 0] = [[ 1.  2.  3.  4.  5.]
		 [ 6.  7.  8.  9. 10.]
		 [11. 12. 13. 14. 15.]
		 [16. 17. 18. 19. 20.]
		 [21. 22. 23. 24. 25.]]

as our guinea pig. After the as_strided() call, this is converted to:

xs[0, :, :, :, :, 0] = 
[ 
  [[[ 1.  2.  3.]
    [ 6.  7.  8.]
    [11. 12. 13.]]
   
   [[ 2.  3.  4.]
    [ 7.  8.  9.]
    [12. 13. 14.]]

   [[ 3.  4.  5.]
    [ 8.  9. 10.]
    [13. 14. 15.]]]

  [[[ 6.  7.  8.]
    [11. 12. 13.]
    [16. 17. 18.]]
	
   [[ 7.  8.  9.]
    [12. 13. 14.]
    [17. 18. 19.]]
	
   [[ 8.  9. 10.]
    [13. 14. 15.]
    [18. 19. 20.]]]

  [[[11. 12. 13.]
    [16. 17. 18.]
    [21. 22. 23.]]

   [[12. 13. 14.]
    [17. 18. 19.]
    [22. 23. 24.]]

   [[13. 14. 15.]
    [18. 19. 20.]
    [23. 24. 25.]]]
]

i.e. mimicking a 3x3 filter sliding over the image. This is done for each
batch and channel, thus obtaining a 6-dimensional strided image.
'''

Yeah, I just copied and pasted my own docs, sue me. So the previous two lines were there to get the original image to a strided form, which is a lot easier to work with for the next step. You might notice that I’m throwing the word ‘strided’ around a lot, so even if you can’t remember what each one means, just nod your head as if in deep thought - it’ll impress any Javascript developers who happen to be looking over your shoulder. Onto the last line!

Einsum

return np.einsum('wxyijk,ijkd->wxyd', xs, f). Ah yes, einsum. Another one of those supposedly fiddly functions (not true) that can greatly increase your stature in the workplace, just by mentioning it in casual conversations (true - along with other cryptic terms like “kernel”, “RAII” and “The Rust Programming Language”). I found this to be a great intro. It’s short for ‘Einstein Summation’, named of course after everyone’s favourite physicist, Mr Summation. Here’s a (very) quick primer:

  • See that arrow? On the right-hand side of it is the output; on the left side is a comma-separated list of inputs. By ‘input’ and ‘output’, I specifically mean their dimensions.
  • What’s with the fancy letters? Each one labels an axis size - so 'ij' means a matrix with shape (i, j); 'ij,jk' means two matrices with shapes (i, j) and (j, k) respectively.
  • If an index appears on the left-hand side but not the right, all axes labelled with that index get summed over (called ‘contraction’). Here, take a look at these examples:

C = np.einsum('ij,jk->ik', A, B) is a shorthand for

$$ C_{ik} = \sum_j A_{ij} B_{jk} $$

If you have an - aha! - moment here, you’re right: that’s just matrix multiplication! There doesn’t necessarily have to be anything on the right side though: C = np.einsum('ii->', A) means

$$ C = \sum_i A_{ii}, $$

which is, of course, the trace.

If you think it’s convoluted (see what I did there?) and frankly, a bit useless, you’ll be surprised to know that a lot of computational quantum physicists and deep learning researchers think almost exclusively in einsum. In fact, once you use it a bit more, you’ll start to realise how feeble dot, * and for loops are in front of the mighty einsum.

Now I can introduce one of my favourite techniques of solving tensor problems like these in general: dimensional analysis (similar to a technique of the same name in physics involving units rather than tensor dimensions). If you know what output shape you want, as well as the input shape(s), you can often work out the exact operation that you want to perform. Let’s say you suddenly forget how matrix multiplication works (don’t laugh, it’s more common than you think). You pause for a sec, remembering this gorgeous table:

$\mathbf A$ $\mathbf B$ $\mathbf C$
$ij$ $jk$ $ik$

“Hold on just a moment now! There’s an $i$ and a $k$ on both sides, but the $j$ is only on the left! Which means all I need to do is contract over $j$ - so $\mathbf{C = AB} = \sum_j\mathbf A_{ij} \mathbf B_{jk}$!” Admittedly, this isn’t the best example, because you have to remember the dimensions of the matrices. But I for one can recount several instances of where, despite knowing the dimensions, my mind freezes when I’m trying to decide whose rows and whose columns to multiply together. The best part is when you realise that you can just plug the dimensions into einsum and it does all the work for you! With that in mind, let’s tackle the final line.

What are we trying to combine here?

Name Shape
xs strided_shape = B, 1 + (H - Fh) // Sh, 1 + (W - Fw) // Sw, Fh, Fw, C
f Fh, Fw, C, D
output B, 1 + (H - Fh) // Sh, 1 + (W - Fw) // Sw, D

We’re on the home stretch now. Now it’s just a matter of stuffing everything into einsum, making sure to label matching indices correctly. This is our template: return np.einsum('______,____->____')

The first three dimensions of xs appear in output, so no contraction there. We’ll call the axes w, x and y. return np.einsum('wxy___,____->wxy_')

The last three dimensions need to be contracted with the first three dimensions of f. We’ll call the contracted indices i, j and k. return np.einsum('wxyijk,ijk_->wxy_')

Finally, the last dimension of f remains as it is. We’ll call this one d. return np.einsum('wxyijk,ijkd->wxyd')

So we’ve recovered the last line in great style. If you head back up to the docs example, you can see that each filter gets multiplied element-wise with each 3-d block in xs - exactly the behaviour we want.

Seemed like a lot of explanation for a couple of lines of code, right? However, once you practice enough, it almost becomes second nature. Oops, forgot to do one thing. Let’s check if all this is correct (notice how Tensorflow’s one is called ‘conv2d’ - in hindsight that was probably a better name, since the actual convolution [read: sliding] is only happening over 2 dimensions):

import numpy as np
import tensorflow as tf
from tensorflow.nn import conv2d

tf.enable_eager_execution()   # TF 1.x; on TF 2.x eager mode is already the default

x = np.random.randn(64, 28, 28, 3)
f = np.random.randn(6, 6, 3, 2)
strides = 2, 2

y = conv4d_forward(x, f, strides)

tf_x, tf_f = tf.Variable(x), tf.Variable(f)
tf_y = conv2d(tf_x, tf_f, strides, padding='VALID')

print(np.allclose(y, tf_y))

> True

The code for my entire neural network project (codename: Twist) will be up on Github shortly. EDIT: Here it is!