Education

A computational lab course covering the research data processing workflow, from just after acquisition through computation and visualization. Topics include Stanford-specific best practices for data storage, code management, file formats, data curation, toolchain creation, interactive and batch computing, dynamic visualization, and distributed computing. Students work through each topic using datasets of their choosing. Course information: http://bioe301p.stanford.edu

  • Dates: 01/05/2026 - 03/13/2026
  • Time: Mon, Wed 3:00 PM - 4:20 PM
  • Location: Hewlett Teaching Center 103
  • Units: 2-3
  • Link: ExploreCourses

This one-week course introduces core tools for research computing through short lectures and guided, hands-on coding. Students will learn to use Bash on the command line and work effectively in the Unix shell; track, version, and share code with Git; transfer research data between Stanford data storage systems; access Stanford HPC resources (FarmShare, Sherlock) via the browser-based Open OnDemand portal; and use Python for scientific computing and reproducible analysis. Prerequisite: knowledge of basic Python syntax and data types.

  • Dates: 05/18/2026, 05/20/2026, 05/22/2026
  • Time: Mon, Wed, Fri 1:30 PM - 4:20 PM
  • Units: 1
  • Link: ExploreCourses

  • Dates: 03/03/2026, 03/05/2026, 03/06/2026
  • Time: Tu, Thu, Fri 1:30 PM - 4:20 PM
  • Units: 1
  • Link: ExploreCourses

Building on Data Best Practices: Basics, this one-week course covers advanced topics in reproducible, portable, and scalable scientific computing. Students will containerize software and environments with Podman/Docker; parallelize analyses locally with GNU Parallel and orchestrate cloud-based, asynchronous parallelization with Google Cloud Pub/Sub; and implement CI/CD in Stanford GitLab, using runners to automate scientific data preprocessing and analysis pipelines. It is recommended that students take Data Best Practices: Basics prior to enrolling in this course. Students with equivalent experience may also enroll.

  • Dates: 05/27/2026, 05/29/2026, 06/01/2026
  • Time: Wed, Fri, Mon 1:30 PM - 4:20 PM
  • Units: 1
  • Link: ExploreCourses

  • Dates: 03/10/2026, 03/12/2026, 03/13/2026
  • Time: Tu, Thu, Fri 1:30 PM - 4:20 PM
  • Units: 1
  • Link: ExploreCourses

Please log in to view the schedule and apply for workshops.

  • Introduction to Unix shells
  • Essential Bash commands
  • Developing and executing a Bash script

  • Introduction to version control concepts
  • Common Git commands and workflows
  • Getting started with Stanford GitLab

  • Basic syntax, data types, and core libraries
  • Jupyter
  • NumPy
  • Pandas

  • Stanford-specific and external resources for data management
  • Command line data transfer tools:
    • rsync (needed for FarmShare/Sherlock/Oak)
    • rclone (used for cloud services)

  • History, development, and current state of large language models (LLMs)
  • Building a retrieval-augmented model that allows for management and querying of scientific knowledge

  • Promoting reproducibility through containerized analysis
  • Interactive, containerized, browser-based analysis using HPC resources:
    • FarmShare/Sherlock OnDemand
    • Hosting Python kernels on FarmShare/Sherlock

  • Modifying existing containers
  • Building custom containers from scratch
  • Making custom containers available to your lab or to the public

  • Introduction to embarrassingly parallel problems
  • Methods of parallelization at scale:
    • GNU Parallel (powerful, shell-based tool that allows for easy parallelization of tasks on a local machine)
    • Pub/Sub (cloud-based service that allows for scalable, asynchronous analysis pipelines)
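
To illustrate what "embarrassingly parallel" means in the list above, here is a minimal Python sketch. It is not workshop material and does not use GNU Parallel or Pub/Sub. The names `word_count`, `map_independent`, and `samples` are hypothetical, and the standard-library thread pool stands in for the tools named above. The key property is that each input is processed without reference to any other, so the work can be split across workers in any order.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    """Count whitespace-separated words in one input.

    The result depends only on this input, which is what makes
    the overall problem embarrassingly parallel.
    """
    return len(text.split())


def map_independent(func, inputs, max_workers=4):
    """Apply func to every input concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


# Hypothetical in-memory stand-ins for data files.
samples = ["a b c", "d e", "f"]
counts = map_independent(word_count, samples)
print(counts)  # [3, 2, 1]
```

For CPU-bound work, a process pool or a shell-level tool such as GNU Parallel would typically replace the thread pool, but the structure of independent tasks mapped over inputs stays the same.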

  • Fundamentals of continuous integration and pipelines
  • GitLab CI