Advanced Data Science
CBB510
Course Directors: Giorgio Quer and Chunlei Wu
Term: Spring
Credits: 3.0
COURSE DESCRIPTION
This course will focus on recent advances in data science and their application in biomedical research, where bioinformatics and data science approaches have played an essential role. The course is designed as a series of weekly lectures by either the course directors or invited speakers, each focusing on a particular topic and followed by a journal-club-style discussion or a hands-on coding exercise on the same topic.
The Data Science topics covered are generally organized along two dimensions: 1) Data Science techniques, such as machine-learning algorithms, NLP/LLMs, knowledge graphs, and AlphaFold, as well as general Python programming skills; 2) Biomedical Data Science scenarios, such as scRNA-seq and proteomics data. The primary goal of this course is to strengthen students' knowledge foundation and practical skills by exposing them to a broad range of state-of-the-art Data Science topics.
The target students are those who have completed introductory bioinformatics training (e.g., Applied Bioinformatics, Introduction to Data Science, or Introduction to Biostatistics) and who seek to further advance their computational skills and knowledge. This course also prepares students to apply data science approaches in their own research areas and to write and present research projects in the computational field. Coding exercises will be done in Python, with a few possible exceptions (e.g., some invited lectures may use R), so basic Python programming skills are required; these skills will be reviewed at the beginning of the course.