Software
I am actively involved in open-source (scientific) software development. Most projects can be found on my GitHub page.
Software in Production
The applications below are each used by at least hundreds of users in their research.
xESMF: Universal Regridder for Geospatial Data (link)
Regridding is one of the most important needs in atmospheric and climate data analysis. xESMF combines the battle-tested ESMF with the modern scientific Python stack (Xarray and Dask) to provide fast, easy-to-use regridding. I am also interested in how it can be integrated with the Pangeo big-data stack.
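To make the idea of regridding concrete, here is a minimal 1D sketch of first-order conservative remapping, one of the methods ESMF (and therefore xESMF) offers. It assumes sorted cell edges on a line rather than cells on a sphere, and the functions `conservative_weights` and `regrid` are hypothetical helpers, not part of the xESMF API. Each weight is the fraction of a destination cell covered by a source cell.

```python
def conservative_weights(src_edges, dst_edges):
    """Return {(j_dst, i_src): weight}, where weight is the fraction of
    destination cell j_dst that overlaps source cell i_src (1D, sorted edges)."""
    weights = {}
    for j in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[j], dst_edges[j + 1]
        for i in range(len(src_edges) - 1):
            s_lo, s_hi = src_edges[i], src_edges[i + 1]
            overlap = min(d_hi, s_hi) - max(d_lo, s_lo)
            if overlap > 0:
                weights[(j, i)] = overlap / (d_hi - d_lo)
    return weights


def regrid(values, weights, n_dst):
    """Apply precomputed weights: a sparse matrix-vector product."""
    out = [0.0] * n_dst
    for (j, i), w in weights.items():
        out[j] += w * values[i]
    return out


# Four unit-width source cells averaged onto two cells of width 2.
w = conservative_weights([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0])
print(regrid([1.0, 2.0, 3.0, 4.0], w, 2))  # [1.5, 3.5]
```

The weights depend only on the two grids, so they can be computed once and applied cheaply to every variable and time step on those grids. The double loop here is for clarity; a real implementation on 2D spherical grids needs a spatial search and polygon-overlap geometry.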
GEOS-Chem: Global Model of Atmospheric Chemistry and Composition (link)
GEOS-Chem is a numerical model of atmospheric composition, supported by NASA and managed at Harvard. It is used by hundreds of groups worldwide and has been under development for more than 20 years. My major contributions include: (1) Porting the original OpenMP-only code to MPI for multi-node parallelization. Refactoring an existing giant code base (~1 million lines of source code) poses software engineering challenges that writing MPI code from scratch does not. The new code layers the ESMF software framework on top of the original code to minimize changes to the original code base. (2) Bringing the model to the Amazon Web Services (AWS) cloud, including a cloud-HPC environment with ~1000 CPU cores.
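Multi-node parallelization generally starts by giving each MPI process (rank) its own piece of the horizontal grid. The sketch below shows one simple way to split an `nx` by `ny` grid into rectangular blocks, assuming whole vertical columns stay on a single rank. The helpers `split_1d` and `decompose_2d` are hypothetical; in a framework-based port like the one described above, this bookkeeping would typically be handled by ESMF rather than hand-written, and halo exchange for transport is not shown.

```python
def split_1d(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # first `extra` chunks get one more cell
        bounds.append((start, start + size))
        start += size
    return bounds


def decompose_2d(nx, ny, px, py):
    """Map each rank 0 .. px*py-1 to its ((x_start, x_stop), (y_start, y_stop)) block."""
    xs = split_1d(nx, px)
    ys = split_1d(ny, py)
    return {jy * px + ix: (xs[ix], ys[jy]) for jy in range(py) for ix in range(px)}


for rank, block in decompose_2d(72, 46, 4, 2).items():
    print(rank, block)
```

Here `decompose_2d(72, 46, 4, 2)` gives eight blocks of 18 by 23 cells. Keeping columns whole means column-local computations need no communication; only operations that couple neighboring columns, such as horizontal transport, require data exchange between ranks.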
Software in Prototype
These were typically written for class projects or just for fun, but they can serve as useful prototypes for building more serious software applications.
The Cubed-Sphere grid is increasingly popular in atmospheric modeling, in part because of its excellent scalability on massively parallel architectures. For example, it is used by NOAA's next-generation global weather prediction system. Interestingly, the same grid has also been used to extend convolutional neural networks to spherical geometries. Unlike the well-known latitude-longitude grid, the Cubed-Sphere grid can seem a bit unintuitive at first glance (although it is actually super intuitive!). A good visualization lets people understand the grid geometry quickly without diving into the math. The code uses the Plotly Python API to generate JavaScript that runs in the user's browser.
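The geometry itself is compact to express: each cube face carries an n-by-n grid that is equally spaced in angle, and each point on a face is projected radially (a gnomonic projection) onto the sphere. The sketch below computes cell-center longitudes and latitudes for such an equiangular grid. The face ordering and orientation in `FACES` are an arbitrary convention chosen here and may not match the numbering used by NOAA's or any other model; the sketch only produces coordinates, which could then be passed to a 3D plotting library such as Plotly.

```python
import math

# Each entry maps local face coordinates (x, y) in [-1, 1]^2 to a point on the
# surface of the cube [-1, 1]^3. Face order/orientation is an arbitrary choice.
FACES = [
    lambda x, y: (1.0, x, y),    # +X
    lambda x, y: (-x, 1.0, y),   # +Y
    lambda x, y: (-1.0, -x, y),  # -X
    lambda x, y: (x, -1.0, y),   # -Y
    lambda x, y: (-y, x, 1.0),   # +Z (north pole face)
    lambda x, y: (y, x, -1.0),   # -Z (south pole face)
]


def cell_centers(n):
    """Return (face, lon_deg, lat_deg) for every cell center of an equiangular C<n> grid."""
    step = (math.pi / 2) / n
    points = []
    for f, to_cube in enumerate(FACES):
        for i in range(n):
            for j in range(n):
                # Equal steps in angle on [-pi/4, pi/4], then tan() onto the cube face.
                alpha = -math.pi / 4 + (i + 0.5) * step
                beta = -math.pi / 4 + (j + 0.5) * step
                cx, cy, cz = to_cube(math.tan(alpha), math.tan(beta))
                r = math.sqrt(cx * cx + cy * cy + cz * cz)  # radial projection to the unit sphere
                lon = math.degrees(math.atan2(cy, cx))
                lat = math.degrees(math.asin(cz / r))
                points.append((f, lon, lat))
    return points


print(len(cell_centers(48)))  # 6 * 48 * 48 cells
```

For instance, `cell_centers(1)` returns one point per face: four on the equator and one at each pole.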
Cubed-Sphere Data Processing (link)
A tiny package to facilitate the processing of cubed-sphere data with Xarray. It implements a fast weighted-binning algorithm for computing zonal means over curvilinear grids, 10x faster than the reference implementation. It is used in one of my papers.
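The general idea of a weighted-binning zonal mean can be sketched in a single pass: assign each cell to a latitude band and accumulate area-weighted sums per band, at a cost of O(N log B) for N cells and B bands. This is a minimal sketch under stated assumptions, not the package's algorithm or API: the function `zonal_mean` is hypothetical, each cell is assigned by its center latitude (cells straddling a band edge are not split), and cell areas are assumed to be given.

```python
import bisect


def zonal_mean(values, lats, areas, lat_edges):
    """Area-weighted zonal mean of a field on any grid (e.g. cubed-sphere).

    values, lats, areas: flat sequences with one entry per grid cell.
    lat_edges: sorted band edges in degrees.
    Returns one mean per band, or None for bands that contain no cells."""
    nbins = len(lat_edges) - 1
    weighted_sum = [0.0] * nbins
    weight_total = [0.0] * nbins
    for v, lat, a in zip(values, lats, areas):
        # Binary search for the band that contains this cell's center latitude.
        k = bisect.bisect_right(lat_edges, lat) - 1
        if k == nbins and lat == lat_edges[-1]:
            k = nbins - 1  # put a cell exactly on the top edge in the last band
        if 0 <= k < nbins:
            weighted_sum[k] += v * a
            weight_total[k] += a
    return [s / w if w > 0 else None for s, w in zip(weighted_sum, weight_total)]


print(zonal_mean([1.0, 3.0, 5.0, 7.0], [10.0, 20.0, -30.0, -40.0],
                 [1.0, 3.0, 1.0, 1.0], [-90.0, 0.0, 90.0]))  # [6.0, 2.5]
```

Because the band index depends only on the grid, it could be computed once and reused for every field and time step on the same grid.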
2D High-order Advection Solver in TensorFlow (link)
Nothing prevents you from using neural network libraries like TensorFlow & PyTorch for general-purpose scientific computing :) This can be a quick way to get code running on GPUs/TPUs. The code implements a second-order flux-limited scheme using TensorFlow Eager Execution.
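Below is a minimal pure-Python sketch of a second-order flux-limited advection scheme, written without TensorFlow so it runs anywhere. It assumes a constant positive velocity, a periodic domain, a Courant number between 0 and 1, and the minmod limiter. The original code's limiter and its exact 2D formulation are not stated here, so the dimensional splitting in the hypothetical `advect_2d_step` is an assumption. Each list comprehension is a whole-array operation with periodic shifts, which is the form that maps naturally onto tensor operations (for example `tf.roll`) on a GPU.

```python
def minmod(a, b):
    """Return the smaller-magnitude slope if a and b share a sign, else 0."""
    if a * b <= 0:
        return 0.0
    return a if abs(a) < abs(b) else b


def advect_step(q, courant):
    """One step of 1D advection (velocity > 0, 0 <= courant <= 1, periodic domain)."""
    n = len(q)
    # Limited slope in each cell; q[i - 1] wraps around at i = 0.
    slope = [minmod(q[i] - q[i - 1], q[(i + 1) % n] - q[i]) for i in range(n)]
    # Upwind value at the right face of cell i, reconstructed from cell i.
    face = [q[i] + 0.5 * (1.0 - courant) * slope[i] for i in range(n)]
    # Flux form: what leaves cell i through its right face enters cell i + 1.
    return [q[i] - courant * (face[i] - face[i - 1]) for i in range(n)]


def advect_2d_step(field, cx, cy):
    """Dimensionally split 2D step: sweep along every row, then along every column."""
    rows = [advect_step(row, cx) for row in field]
    cols = [advect_step(list(col), cy) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A square pulse on an 8 x 8 periodic grid.
field = [[1.0 if 2 <= i < 5 and 2 <= j < 5 else 0.0 for j in range(8)] for i in range(8)]
for _ in range(20):
    field = advect_2d_step(field, 0.5, 0.3)
print(round(sum(map(sum, field)), 12))  # total mass stays 9.0
```

With minmod and a Courant number of at most 1, each 1D update is a convex combination of neighboring values, so the scheme conserves total mass and creates no new maxima or minima; at a Courant number of exactly 1 it reduces to an exact shift by one cell.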
Neural Network ODE Solver (link)
The code implements a very old and simple idea that the solution to an Ordinary Differential Equation (ODE) can be parameterized by a neural network and "solved" by minimizing a loss function. It uses HIPS-autograd to perform automatic differentiation on NumPy functions. (It is NOT about the fancy new paper on Neural ODEs!)
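The idea can be sketched for dy/dx = -y with y(0) = 1 on [0, 1], a test problem chosen here for illustration. A common construction writes the trial solution as y(x) = 1 + x N(x), so the initial condition holds for any network N, and the loss is the squared ODE residual at collocation points. To keep the sketch dependency-free, it swaps HIPS-autograd for hand-written derivatives and, as a further simplification, fixes the hidden-layer weights on a small grid and trains only the output weights; the loss is then quadratic and is minimized exactly by (ridge-regularized) least squares instead of gradient descent. All function names and parameter values are hypothetical.

```python
import math


def solve_linear(m, rhs):
    """Solve m @ w = rhs by Gaussian elimination with partial pivoting (small dense m)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (a[r][n] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return w


def train_ode_net(slopes=(0.5, 1.0, 2.0), offsets=(-1.5, -0.5, 0.5, 1.5),
                  n_points=21, ridge=1e-10):
    """Fit y(x) = 1 + x * N(x) to dy/dx = -y, y(0) = 1 on [0, 1].

    N(x) = sum_k w_k * tanh(a_k * x + b_k) is a one-hidden-layer tanh network whose
    hidden weights (a_k, b_k) are fixed; only the output weights w_k are trained."""
    hidden = [(a, b) for a in slopes for b in offsets]
    xs = [i / (n_points - 1) for i in range(n_points)]
    # Residual dy/dx + y = 1 + sum_k w_k * phi_k(x) is linear in w, where
    # phi_k(x) = t + x * a * (1 - t**2) + x * t and t = tanh(a * x + b).
    rows = []
    for x in xs:
        row = []
        for a, b in hidden:
            t = math.tanh(a * x + b)
            row.append(t + x * a * (1.0 - t * t) + x * t)
        rows.append(row)
    k = len(hidden)
    # Minimize sum_x (1 + row . w)**2 + ridge * |w|**2 via the normal equations.
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    atb = [-sum(r[i] for r in rows) for i in range(k)]
    w = solve_linear(ata, atb)

    def y(x):
        return 1.0 + x * sum(wk * math.tanh(a * x + b) for wk, (a, b) in zip(w, hidden))

    return y


y = train_ode_net()
print(y(1.0), math.exp(-1.0))  # network solution vs. exact solution at x = 1
```

Training all weights, as a full implementation would, makes the loss non-convex and calls for gradient-based optimization, which is where automatic differentiation becomes essential.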
Parallel K-means (link)
K-means is a simple and highly parallelizable unsupervised clustering algorithm. The code parallelizes K-means with OpenMP, MPI, hybrid OpenMP-MPI, and CUDA C.
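A common way to structure parallel K-means (Lloyd's algorithm) is: each worker assigns its own share of the points and accumulates per-cluster coordinate sums and counts, and a global reduction combines these partial results into new centroids. The Python sketch below shows that decomposition with a thread pool; it is an illustration of the structure, not a port of the C/CUDA code, and because of CPython's global interpreter lock the threads will not actually speed up this pure-Python arithmetic. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def partial_stats(chunk, centroids):
    """Local work for one worker: assign its points, then sum coordinates per cluster."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans(points, centroids, n_workers=4, n_iter=20):
    """Lloyd's K-means with the assignment step split across worker threads."""
    chunks = [points[i::n_workers] for i in range(n_workers)]
    k, dim = len(centroids), len(centroids[0])
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            results = list(pool.map(partial_stats, chunks, [centroids] * n_workers))
            # Global reduction: the step MPI_Allreduce would perform across processes.
            new_centroids = []
            for j in range(k):
                n = sum(counts[j] for _, counts in results)
                if n == 0:
                    new_centroids.append(centroids[j])  # keep an empty cluster's centroid
                else:
                    new_centroids.append([sum(s[j][d] for s, _ in results) / n
                                          for d in range(dim)])
            centroids = new_centroids
    return centroids


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(kmeans(pts, [[0.0, 0.0], [10.0, 10.0]], n_workers=3))  # [[0.5, 0.5], [10.5, 10.5]]
```

Only the small per-cluster sums and counts cross worker boundaries, never the points themselves, which is what makes the algorithm easy to scale across processes or GPU threads.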