Advanced Research Computing 2024/25

This year’s course will broadly focus on aspects of machine learning and its interactions with AGQ.

Topics to be covered include: 
* A review of Python, git, and coding environments 
* Machine learning, from basic architecture to modern design (using PyTorch tools)
* ML in particle physics 
* ML and geometric/topological structures in biology 
* Topological data analysis and applications to ML 
Our lecturing team consists of Rob Currie, Christos Leonidopoulos, Davide Michieletto, and Sjoerd Beentjes.
Our TAs are Djordje Mihajlovic, Tuan Pham, and Siddharth Setlur.
The course organiser is Tudor Dimofte.

Logistics

Please email Tudor Dimofte to register for the course, either for credit or to audit.

The course meets for three hours a week, in person near the Bayes Centre (University of Edinburgh central campus) and online. Our meeting rooms/times are:

24 Jan 10am-1pm Lec 1 (Rob) G.02, 16-20 George Square (access at 19GSq) 
31 Jan 10am-1pm Lec 2 (Rob) LG.10 (40 George Square Lower Teaching Hub)
7 Feb 10am-1pm Lec 3 (Rob) LG.10 (40 George Square Lower Teaching Hub)
14 Feb 10am-1pm Lec 4 (Rob) LG.10 (40 George Square Lower Teaching Hub)
24 Feb 2pm-5pm Lec 5 (Christos) M2 – Teaching Studio (Appleton Tower)
3 Mar 2pm-5pm Lec 6 (Christos) M2 – Teaching Studio (Appleton Tower)
10 Mar 2pm-5pm Lec 7 (Davide) M2 – Teaching Studio (Appleton Tower)
17 Mar 2pm-5pm Lec 8 (Davide) M2 – Teaching Studio (Appleton Tower)
24 Mar 4pm-5pm guest lecture: James Sully (Anthropic) online / view together in TBA
28 Mar 10am-1pm Lec 9 (Sjoerd) LG.10 (40 George Square Lower Teaching Hub)
4 Apr 10am-1pm Lec 10 (Sjoerd) G.02, 16-20 George Square (access at 19GSq) 
11 Apr    TBA – guest?  

Zoom links for virtual participation are sent out via email. For Glasgow-based students: 414 East Quad (Geography – opposite the cloisters in the main building) has been booked to join remotely.

Assignments: A homework coding exercise will be given after every two lectures (five in total). Students taking the course for credit must score at least 60% on at least four of the five homeworks to pass the course.

Course Material

This course will begin with four lectures, each followed by a workshop, introducing modern AI/ML technologies.

To participate in the workshops, students will need to install Anaconda and set up an environment for running the AGQ Jupyter notebooks.

Installing Anaconda

First you need to install Anaconda on your laptop:

https://docs.anaconda.com/anaconda/install/

Import an Anaconda environment

The environment file for the start of this course is found here:

https://github.com/rob-c/agq-uploads/blob/master/AGQenv.yaml

Importing an environment into Anaconda:

https://docs.anaconda.com/navigator/tutorials/manage-environments/#importing-an-environment

Anaconda cheat sheet:

https://github.com/jumdc/cheat-sheets/blob/main/cs/conda.md

Testing your Setup

After installing Anaconda and setting up the AGQenv, I would recommend testing your install with this Jupyter notebook:

https://github.com/rob-c/agq-uploads/blob/master/TestPlayBookAGQ.ipynb

When run successfully, this notebook should display the line `AGQ Tests Passed!` at the very bottom.
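As a quick extra sanity check from a Python prompt, you can verify that the key packages resolve without launching Jupyter. The package list below is illustrative; adjust it to match the actual contents of AGQenv.yaml:

```python
import importlib.util

def check_packages(names):
    """Return the subset of `names` that cannot be found by the import system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Illustrative package list; adjust to match the contents of AGQenv.yaml.
missing = check_packages(["numpy", "matplotlib", "torch", "jupyter"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment looks OK!")
```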

If you see a bunch of red text then something has gone wrong and you may need to reach out to an expert for help. (If you don’t have access to an expert before the first workshop, don’t worry — we’ll help you out then.)

 

Familiarity with Python3

The first lecture of this course will require some familiarity with the Python3 programming language. Some useful online resources for becoming familiar with Python are below. If you’ve never used Python before, please take a look. We’ll help during the first few workshops as well.

Scientific programming with Python: https://git.ecdf.ed.ac.uk/pclark3/sciprog2024

Python for Beginners: https://python.land/python-tutorial

Intro to Python: https://python-course.eu/python-tutorial/

Online Python tutorials from Microsoft: https://learn.microsoft.com/en-us/shows/intro-to-python-development/

 

Familiarity with Git

The first workshop will cover the basics of using git to fork, clone, and submit changes to a repo, as well as installing and editing some Python3 code.

Some other online resources for mastering git are:

Online Git and GitHub training materials from Microsoft: https://learn.microsoft.com/en-us/training/paths/github-foundations/

Learning how to work with Git branches: https://learngitbranching.js.org/

W3Schools online Git training: https://www.w3schools.com/git/

Atlassian Git training: https://www.atlassian.com/git

 

Familiarity with Jupyter-Notebooks

During this course, JupyterLab and Jupyter notebooks will be used to manage, run, and store the results of machine learning experiments.

For anyone wanting to learn more about Jupyter notebooks, there are these online resources:

Dataquest’s intro to Jupyter-Notebooks: https://www.dataquest.io/blog/jupyter-notebook-tutorial/

Geeks4Geeks: https://www.geeksforgeeks.org/how-to-use-jupyter-notebook-an-ultimate-guide/

A good tutorial found on GitHub: https://gist.github.com/rob-c/b0541f3a51f0cfb518dd5ddc648a79f7 

Anaconda’s introduction to Jupyter-Notebooks: https://learning.anaconda.cloud/jupyter-notebook-basics-course 

Lecture 1

This lecture will begin by describing some of the more advanced features in Python programming as well as giving an introduction to the git version control system.

For the Workshop:

Python & Git

The example playbook for this workshop is: https://github.com/rob-c/farm

This repo provides a demonstration of using classes, inheritance, callbacks and basic exception handling.
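As a rough sketch (hypothetical code, not taken from the repo itself), the patterns being demonstrated look something like this:

```python
class EquipmentError(Exception):
    """Raised when broken equipment is used."""

class Equipment:
    def __init__(self, name, on_use=None):
        self.name = name
        self.broken = False
        self.on_use = on_use              # optional callback

    def use(self):
        if self.broken:
            raise EquipmentError(f"{self.name} is broken")  # exception
        if self.on_use is not None:
            self.on_use(self)             # invoke the callback

class Tractor(Equipment):                 # inheritance
    def use(self):
        super().use()
        return "ploughing"

tractor = Tractor("tractor", on_use=lambda eq: print(f"using {eq.name}"))
print(tractor.use())                      # prints "using tractor", then "ploughing"
tractor.broken = True
try:
    tractor.use()
except EquipmentError as err:
    print("caught:", err)
```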

The goals for working with this would be:
  0) Fork the repo on GitHub and clone locally
  1) Install and run the package in a Python virtualenv
  2) Switch to a development branch
  3) Add a new crop, animal and equipment class to the farm
  4) Fix and break different pieces of equipment
  5) After adding new ‘features’ commit this to a new branch 
  6) Push features back to GitHub
  7) Make a PR detailing changes

Numerical Minimization

The workshop will also include some examples of using numerical minimization to tune the parameters of a PDF to match some simulated data.

This should familiarize you with the kinds of machinery designed to extract information from finite datasets, as well as some of their potential shortcomings.
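As an illustrative sketch of the idea (assuming a Gaussian PDF and SciPy's optimizer; the workshop examples may use different tools):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # simulated data

def nll(params):
    """Negative log-likelihood of a Gaussian PDF (constants dropped)."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + data.size * np.log(sigma)

result = minimize(nll, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The fitted values should land close to the true parameters (mu = 2.0, sigma = 0.5), and the finite sample size is exactly why they will not match them perfectly.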

 

Slides:

https://github.com/rob-c/agq-uploads/blob/master/Lecture1/Lecture1.pdf

Workshop-Notebooks:

https://github.com/rob-c/agq-uploads/blob/master/Lecture1/Minimization_Problems.ipynb

https://github.com/rob-c/agq-uploads/blob/master/Lecture1/data-science-tools.ipynb

 

Lecture 2

This lecture and workshop will go through the fundamentals of how Deep Neural Networks are constructed.

We will work through an example of building a DNN in numpy and training it on some input data.
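As a minimal preview (a one-hidden-layer network and an illustrative regression target; the workshop notebook will differ), the forward and backward passes in numpy look something like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative task: fit y = sin(x) with one tanh hidden layer.
X = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(X)

W1 = rng.normal(0.0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)
lr, losses = 0.05, []

for step in range(3000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    y_hat = h @ W2 + b2                      # linear output
    losses.append(np.mean((y_hat - y) ** 2)) # MSE loss

    # Backward pass (chain rule, written out by hand)
    g_out = 2.0 * (y_hat - y) / len(X)
    g_W2 = h.T @ g_out; g_b2 = g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```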

After doing this, we will introduce the PyTorch framework and show how it can be used to create models that perform the same task much more quickly.

The workshop will finish with describing, building and training a DNN classifier using the PyTorch framework.
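A minimal PyTorch sketch of such a classifier (toy data and architecture chosen purely for illustration; the workshop's model will differ):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy 2-class data: two well-separated Gaussian blobs in 2D.
n = 200
x0 = torch.randn(n, 2) + torch.tensor([2.0, 2.0])
x1 = torch.randn(n, 2) - torch.tensor([2.0, 2.0])
X = torch.cat([x0, x1])
y = torch.cat([torch.zeros(n, dtype=torch.long), torch.ones(n, dtype=torch.long)])

# A small MLP classifier, trained full-batch with Adam.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy: {acc:.2f}")
```

Note how the by-hand backward pass from the numpy version is replaced entirely by `loss.backward()` and the optimizer step.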

 

Slides:

https://github.com/rob-c/agq-uploads/blob/master/Lecture2/AGQ2.pdf

Workshop-Notebook:

https://github.com/rob-c/agq-uploads/blob/master/Lecture2/DNN-Simplified-Questions.ipynb

 

Lecture 3

This lecture and workshop will dive deeper into different Neural Network designs.

First we will recap what is involved in building classifiers using DNNs. Then we will move on to discuss different neuron types and how they can be used to achieve the same results.

I will introduce (variational) autoencoders, U-Net, and other network designs that can perform image analysis.

In the workshop I will go through autoencoders: their uses, advantages, and pitfalls, as well as how to use such models to perform anomaly detection.
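As a toy sketch of the anomaly-detection idea (a linear autoencoder on made-up 2D data, not the workshop's actual model):

```python
import torch
from torch import nn

torch.manual_seed(0)

# "Normal" data lies near the line y = x; the anomaly does not.
t = torch.randn(500, 1)
normal = torch.cat([t, t + 0.05 * torch.randn(500, 1)], dim=1)
anomaly = torch.tensor([[2.0, -2.0]])

# Linear autoencoder with a 1-dimensional bottleneck.
model = nn.Sequential(nn.Linear(2, 1), nn.Linear(1, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(1000):
    opt.zero_grad()
    loss = ((model(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()

def recon_error(x):
    """Per-sample reconstruction error, used as the anomaly score."""
    return ((model(x) - x) ** 2).mean(dim=1)

print("normal error:", recon_error(normal).mean().item())
print("anomaly error:", recon_error(anomaly).item())
```

Points that resemble the training data reconstruct well; the off-manifold point does not, and that gap in reconstruction error is the anomaly score.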

Slides:

https://github.com/rob-c/agq-uploads/blob/master/Lecture3/AGQ3.pdf

Workshop material:

https://github.com/rob-c/agq-uploads/blob/master/Lecture3/img.png

https://github.com/rob-c/agq-uploads/blob/master/Lecture3/AGQ3_Questions.ipynb

https://github.com/rob-c/agq-uploads/blob/master/Lecture3/trained_vae_model.pth

 

Lecture 4

This lecture and workshop will touch on some modern Neural Network designs.

I will introduce the concept of attention in ML, its uses, and its advantages and disadvantages.

I will also discuss the impact of numerical precision on ML model design, and how its importance differs between model evaluation and training.

In the workshop at the end of this lecture I will go through constructing a multi-headed attention network to perform classification, and training an attention-based network on a waveform.
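The core building block, scaled dot-product attention, can be sketched in a few lines of numpy (shapes chosen purely for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = softmax(scores)          # rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))  # 6 key/value positions
V = rng.normal(size=(6, 8))
out, w = attention(Q, K, V)
print(out.shape, w.shape)    # (4, 8) (4, 6)
```

A multi-headed version simply runs several of these in parallel on learned projections of the inputs and concatenates the results.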

I will also discuss and demonstrate some of the technical issues around training and evaluating a 1-bit neural network compared to a network using full floating-point precision.
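To preview the precision question, here is a toy numpy comparison of a full-precision weight matrix against a 1-bit version (sign plus a single scale); the actual 1-bit training schemes discussed in the lecture are more involved:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))   # full-precision weights
x = rng.normal(size=64)

# 1-bit quantization: keep only the sign, plus one scale per matrix.
scale = np.abs(W).mean()
W_1bit = np.sign(W) * scale

y_full = W @ x
y_quant = W_1bit @ x
rel_err = np.linalg.norm(y_full - y_quant) / np.linalg.norm(y_full)
print(f"relative output error: {rel_err:.2f}")
```

Naively quantizing a trained matrix like this distorts the layer's output considerably, which is why 1-bit networks are trained with quantization in the loop rather than quantized after the fact.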

More advanced (but related) things that have been recommended:

Machine learning for the working mathematician – seminar series at SMRI

LLMs and optimizing performance, from the point of view of a processor – lectures by Rafi Witten