I am actively involved in open-source (scientific) software development. Most projects can be found on my GitHub page.
Software in Production
Here are software applications used by at least hundreds of users for their research.
xESMF: Universal Regridder for Geospatial Data
Regridding is one of the most important needs in atmospheric and climate data analysis. xESMF combines the battle-tested ESMF with modern scientific Python stacks like Xarray to provide fast and easy-to-use regridding functionality. I am also interested in how it can be integrated with Pangeo.
GEOS-Chem: Global Model of Atmospheric Chemistry and Composition
GEOS-Chem is a numerical model for atmospheric composition, supported by NASA and managed at Harvard. It is used by hundreds of groups worldwide and has been developed over 20+ years. My major contributions include: (1) Porting the original OpenMP-only code to MPI for multi-node parallelization. Unlike writing MPI code from scratch, refactoring an existing giant code base (~1 million lines of source code) poses unique software engineering challenges. The new code layers the ESMF software framework on top of the original code, to minimize changes to the original code base. (2) Bringing the model to the Amazon Web Services (AWS) cloud, including a cloud-HPC environment with ~1000 CPU cores.
Software in Prototype
These are typically class projects or written just for fun. They can be useful prototypes for building more serious software applications.
Cubed-Sphere Grid Visualization
Cubed-Sphere Data Processing
A tiny package to facilitate the processing of cubed-sphere data with Xarray. It implements a fast weighted-binning algorithm for taking zonal mean over curvilinear grids, 10x faster than the reference implementation. It is used in one of my papers.
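The weighted-binning idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the package's actual implementation: the function name `zonal_mean_binned` and its arguments are hypothetical, the weights are assumed to be grid-cell areas, and all latitudes are assumed to fall within the given bin edges.

```python
import numpy as np

def zonal_mean_binned(lat, data, weights, lat_edges):
    """Area-weighted zonal mean on a curvilinear grid via latitude binning.

    lat, data, weights: arrays of the same shape (any dimensionality).
    lat_edges: increasing 1D array of latitude bin edges.
    Hypothetical helper for illustration only.
    """
    lat = np.ravel(lat)
    data = np.ravel(data)
    weights = np.ravel(weights)
    n_bins = len(lat_edges) - 1

    # Map every grid cell to a latitude bin. Cells sitting exactly on the
    # last edge are folded into the last bin; cells outside the edges
    # (assumed not to occur) would be clipped into the end bins.
    idx = np.clip(np.digitize(lat, lat_edges) - 1, 0, n_bins - 1)

    # Two weighted histograms: sum of weights and sum of weights * data.
    w_sum = np.bincount(idx, weights=weights, minlength=n_bins)
    wd_sum = np.bincount(idx, weights=weights * data, minlength=n_bins)

    # Empty bins have no defined mean; mark them as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(w_sum > 0, wd_sum / w_sum, np.nan)
```

Because the grid is flattened before binning, the same function works for the six-tile cubed-sphere layout or any other curvilinear grid, as long as each cell carries a latitude and a weight.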
2D High-order Advection Solver in TensorFlow
Neural Network ODE Solver
The code implements a very old and simple idea that the solution to an Ordinary Differential Equation (ODE) can be parameterized by a neural network and "solved" by minimizing a loss function. It uses HIPS-autograd to perform automatic differentiation on NumPy functions. (It is NOT about the fancy new paper on Neural ODEs.)
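A minimal sketch of the idea, for the test problem dy/dt = -y with y(0) = 1. To stay dependency-free it deliberately differs from the code described above: instead of training all weights with autograd, the hidden-layer weights are fixed at random and only the output weights are fitted, which turns the loss minimization into a linear least-squares problem. The function name `solve_decay_ode` and its parameters are hypothetical.

```python
import numpy as np

def solve_decay_ode(n_hidden=20, n_points=50, t_max=2.0, seed=0):
    """Fit y(t) = 1 + t * N(t) to dy/dt = -y, y(0) = 1.

    N is a one-hidden-layer tanh network. The trial form satisfies the
    initial condition exactly for any network weights.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_hidden)  # hidden weights (fixed, not trained)
    b = rng.normal(size=n_hidden)  # hidden biases (fixed, not trained)
    t = np.linspace(0.0, t_max, n_points)[:, None]  # collocation points

    phi = np.tanh(a * t + b)    # hidden activations, (n_points, n_hidden)
    dphi = a * (1.0 - phi**2)   # their derivatives with respect to t

    # With N(t) = phi @ w, the ODE residual dy/dt + y equals
    #   (phi + t * dphi + t * phi) @ w + 1,
    # so the mean squared residual is minimized by least squares in w.
    A = phi + t * dphi + t * phi
    rhs = -np.ones(n_points)
    w, *_ = np.linalg.lstsq(A, rhs, rcond=None)

    def y(t_new):
        t_new = np.asarray(t_new, dtype=float)[:, None]
        return 1.0 + (t_new * np.tanh(a * t_new + b)) @ w

    return y

# Usage: compare the fitted solution with the exact one, exp(-t).
y = solve_decay_ode()
t_check = np.linspace(0.0, 2.0, 5)
max_error = np.max(np.abs(y(t_check) - np.exp(-t_check)))
```

For a nonlinear ODE, or when the hidden weights are trained as well, the loss is no longer quadratic in the parameters, and gradient-based minimization with automatic differentiation becomes the natural choice.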
K-means is a simple and highly parallelizable unsupervised clustering algorithm. The code parallelizes K-means with OpenMP, MPI, hybrid OpenMP-MPI, and CUDA C.
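A serial NumPy sketch of the standard K-means iteration (Lloyd's algorithm) shows where the parallelism comes from. It is not the OpenMP, MPI, or CUDA code described above; the function name `kmeans` and its parameters are hypothetical, and the input is assumed to be a float array of shape (n_points, n_features).

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()

    for _ in range(n_iter):
        # Assignment step: each point independently finds its nearest
        # center. This is the embarrassingly parallel part (one thread or
        # one MPI rank per chunk of points).
        dist2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)

        # Update step: each center becomes the mean of its members. In a
        # parallel version this is a reduction of per-cluster sums and
        # counts across threads or ranks.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)

    return centers, labels
```

The assignment step dominates the cost at O(n_points × k × n_features) per iteration, which is why it is the part worth distributing; the update step only needs k partial sums and counts to be combined.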