Timothy G. Mattson, Ph.D.
Honorary Professor | LinkedIn Page
Brief, Stuffy Bio
Tim Mattson is a parallel programmer obsessed with every variety of science (Ph.D. Chemistry, UCSC, 1985). In 2023 he retired after a 45-year career in HPC (30 of which were with Intel). During his long career he has had the privilege of working with smart people on great projects, including: (1) the first TFLOP computer (ASCI Red), (2) parallel programming languages such as Linda, Strand, MPI, OpenMP, OpenCL, OCR, and PyOMP, (3) two different research processors (Intel’s TFLOP chip and the 48-core SCC), (4) data management systems (polystore systems and the TileDB array-based storage engine), and (5) the GraphBLAS API for expressing graph algorithms as sparse linear algebra. Tim has over 150 publications, including six books on different aspects of parallel computing.
He is also a recently retired kayak coach and instructor trainer (ACA certified). His obsession with sea kayaking, including “self-wetting” moments in the ocean, is pretty bad.
Long, Informal Bio
In graduate school, I couldn’t decide whether to be a chemist, physicist, mathematician, or computer scientist. Enjoying chaos, I chose all four by getting a Ph.D. in chemistry (U.C. Santa Cruz, 1985) for solving a physics problem (quantum scattering) with new numerical methods (approximate potential methods for coupled systems of radial Schrödinger equations) on the primitive computers available in those days (a VAX 750).
My confusion deepened during a Caltech post-doc in Geoffrey Fox’s Concurrent Computation project where I took my differential equation solvers and ported them to the Caltech/JPL hypercubes. These machines were painful to use, but being a true masochist, I fell in love with parallel computers and have been using them ever since.
The details of my career are boring, but basically involve industrial experience in radar signal processing, seismic signal processing, numerical analysis, computational chemistry, and, of course, the use of parallel computers. I emphasize “the use of” parallel computers: I have always measured the value of a computer by how useful it is. Eventually, I ended up at Yale (1991), where my research took me into the depths of many different parallel programming environments (p4, PVM, TCGMSG, Linda, Parlog, CPS, and many others) on many different parallel computers, including clusters of workstations.
In 1993, I left Yale and joined Intel’s Scalable Systems Division (SSD). That was an exciting time at Intel SSD. The Paragon supercomputer was new, and a huge amount of work was needed to understand how to use this big machine (with the best front-panel lights in the industry). My job was to get inside users’ heads and make sure our products really worked for their problems. This resulted in a collection of performance models to help guide the design of future parallel computers.
The pinnacle of my work at Intel SSD was the ASCI Option Red supercomputer, the world’s first computer to run the MPLINPACK benchmark in excess of one teraFLOPS (1.34 TFLOPS to be exact). As a senior scientist on this project, I was in the middle of everything: I helped write the proposals, verify the design, and debug the system. I was responsible for communicating technical issues to the customers and had to make sure the initial applications scaled effectively on the machine. When we delivered the system to the customer, I left Intel SSD and moved to Intel’s long-range research laboratory, the Microcomputer Research Lab (MRL).
At MRL, my job was to solve, once and for all, the software crisis in parallel computing. Even with almost 20 years of research, the parallel computing community hadn’t attracted more than a minuscule fraction of programmers. Clearly, we were doing something wrong. My hypothesis was that we could solve this problem, but only if we worked from the algorithm down to the hardware, not with the traditional hardware-first mentality. This work contributed to the creation of OpenMP (but to be clear, the good people at SGI and KAI played a larger role in getting OpenMP started).
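To see what that simplicity looks like in practice, here is a minimal sketch of my own (illustrative only, not taken from any of the projects above): in C, a single pragma turns a serial loop into a parallel one, and the compiler and runtime handle the threads.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* Fill the input arrays. */
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            b[i] = 2.0 * (double)i;
        }

        /* One pragma is all it takes: OpenMP splits the loop
           iterations across a team of threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }

Build with an OpenMP-aware compiler (e.g., gcc -fopenmp) and the loop runs in parallel; build without the flag and the same code runs serially. That graceful degradation was part of the appeal.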
Once OpenMP got off the ground, I moved to Intel’s software products group (the Microcomputer Software Laboratory, or MSL) to support the transition of OpenMP from research to product. With OpenMP moving forward, I took the time to contribute to several efforts to support parallel programming for clusters, distributed computing, and peer-to-peer computing.
I then took a few years away from parallel computing to work on life sciences. The idea was that Intel was a powerful technology company: shouldn’t we be able to drive technology deep into healthcare and make the healthcare system better? This was a fascinating period of my career, covering the early 2000s. I collaborated with people working on the Human Genome Project, established a research center with the Swiss Institute of Bioinformatics to explore the exciting field of proteomics, and worked closely with academic cancer researchers.
Since I was traveling to Switzerland regularly, I was pulled into our efforts to support CERN and the Large Hadron Collider. I formed deep connections with people in the high-energy physics (HEP) community. These connections continue today as I work with the HEP community on a number of educational programs in high performance computing.
My career centered on parallel computing, but the field was still not doing well. Even with simple APIs such as OpenMP and ubiquitous multi-core chips, most programmers were still unwilling to write parallel code. Where had we gone wrong? Talking to people across the academic community, I discovered the effort by psychologists to understand the cognitive processes involved in programming. Every known programmer (at least before AI LLM systems came online) is a human, so maybe when we design programming models, we should base our designs on how human brains work. It’s a long story, but this led me to the idea of design patterns and how they could be used to organize our understanding of parallel computing. This resulted in a book (Patterns for Parallel Programming) that has become my top-cited research work. Several years after the book came out, I worked closely with a team of researchers at UC Berkeley, where we expanded the patterns concept to address the full software design problem. Intellectually, this is some of my most important work, since it is not tied to any particular language (and hence will continue to be relevant even as new languages replace the older ones).
The world of parallel computing changed around 2006 when Nvidia put general-purpose GPU programming on the map. They did this through a new programming language based on the idea of streaming work onto a throughput-oriented device (i.e., a GPU). They were convinced that the market would not accept a proprietary programming language for what they knew would grow into a major class of parallel systems, so they joined together with people from AMD, Apple, and Intel to create an open standard for GPU programming. The result was called OpenCL. I am extremely proud of my work on OpenCL and believe that, with the right support, it could have become the ubiquitous platform for GPU programming. It did not get the support it needed (mostly due to technically ignorant business people), and with its lukewarm support of OpenCL, Intel helped clear the path for CUDA to totally dominate the market. I’m not happy with where this ended up, but I am honored that I got to be right smack in the middle of the emergence of the GPU for HPC.
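For readers who never saw OpenCL, here is a sketch of its flavor (my own illustration, not code from the specification). Device code is written as kernels in a dialect of C; the host program, omitted here, uses API calls such as clBuildProgram and clEnqueueNDRangeKernel to compile this source at runtime and launch one work-item per array element.

    /* OpenCL kernel: each work-item computes one element of c.
       The host launches a 1D range of work-items over the arrays. */
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float       *c,
                       const unsigned int    n)
    {
        int i = get_global_id(0);   /* this work-item's global index */
        if (i < n)                  /* guard against a padded global size */
            c[i] = a[i] + b[i];
    }

The streaming model is right there in the code: you describe the work for one data element, and the runtime throws thousands of such work-items at the throughput-oriented device.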
Shortly after the work on OpenCL was underway, I was presented with the opportunity to launch a research program on graph algorithms. My motivation for getting involved was the fact that I knew almost nothing about graph algorithms; as a traditional HPC person specializing in scientific computing, I just didn’t run into graphs that often. Thanks to this research project, I became a graph algorithms “expert” and started working with the research community exploring “graphs in the language of linear algebra”: a graph can be represented as a sparse matrix, and graph algorithms can then be constructed from operations over matrices and vectors. It was a small step from graphs as arrays to the idea that if you are going to treat graphs as linear algebra, then you need a set of Basic Linear Algebra Subprograms (BLAS) designed around the needs of graph algorithms. Hence was born the GraphBLAS, which has been a very successful research effort (i.e., lots of good papers and multiple companies selling software based on the GraphBLAS).
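To make “graphs in the language of linear algebra” concrete, here is a tiny sketch of my own (plain C rather than the actual GraphBLAS API): one step of breadth-first search is just a matrix-vector product over the Boolean semiring, where multiply is AND and add is OR.

    #include <stdio.h>
    #include <stdbool.h>

    #define N 4   /* number of vertices */

    int main(void) {
        /* Adjacency matrix: A[i][j] is true if there is an edge j -> i.
           Graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3. */
        bool A[N][N] = {
            {0, 0, 0, 0},
            {1, 0, 0, 0},
            {1, 0, 0, 0},
            {0, 1, 1, 0},
        };

        bool frontier[N] = { true, false, false, false };  /* BFS starts at vertex 0 */
        bool next[N]     = { false, false, false, false };

        /* One BFS step: next = A "times" frontier over the Boolean semiring,
           i.e., next[i] = OR over j of (A[i][j] AND frontier[j]). */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                next[i] = next[i] || (A[i][j] && frontier[j]);

        for (int i = 0; i < N; i++)
            if (next[i]) printf("vertex %d is one step from the frontier\n", i);
        return 0;
    }

In the real GraphBLAS the matrix is sparse and the semiring is a parameter you choose, so the same matrix-vector pattern, with different “add” and “multiply” operators, yields BFS, shortest paths, PageRank, and more.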
When the folks in Intel Labs discovered that I knew about graphs, they decided that made me a big data person (this was around 2012, when big data was all the rage), and that meant I could be the Intel principal investigator (PI) for our new Big Data Research Center at MIT. This didn’t make any sense to me, but it was what Intel needed me to do, and therefore I did it. I spent the rest of my career at Intel managing research projects at MIT. I won’t go into the details, but by the time I was done, I had become an official database person. After a career of ignoring data (as any good parallel computing person does, since most parallel computers don’t do very well with I/O), I was publishing papers on polystore databases and array-based storage engines (the amazing TileDB system). It was a surprising twist in my career.
Along the way, I worked with some very smart people at Intel and MIT to launch a research agenda on machine programming. We wrote a massive paper on the three pillars of machine programming, which established a framework for making sense of programming systems that automate some of the steps in creating software. Along those same lines, I met a great team of researchers in Israel, and we worked together to explore ways to use large language models to generate code. Our angle was to create large language models trained from scratch on HPC languages (C, Fortran, and C++). The idea was that by restricting the scope of the training data, we could make smaller models that actually worked better than the larger LLMs (such as GPT). The idea worked and led to an excellent series of papers.
In August of 2023, I retired from Intel. It was time. I was working between 60 and 80 hours a week, and it was taking a toll on my health. In retirement, I am focusing on my love of teaching, and I travel the world to teach topics in high performance computing. I am continuing my research on AI for programming systems, quantum computing, and the GraphBLAS, but much of my time is focused on taking the complex ideas I helped develop during my career and turning them into course materials I can use in teaching at universities all over the world. It’s a great way to go through life after 40+ years of the corporate grind.
Key Papers and Technical Reports
I have a pretty long list of publications. Here’s an excerpt from the full list, organized by research topic.
Parallel Programming Models
- Programming Your GPU with OpenMP, MIT Press, Tom Deakin and Tim Mattson. Nov. 2023.
- The OpenMP Common Core: Making OpenMP Simple Again, MIT Press, Tim Mattson, Helen He, Alice Koniges. Nov. 2019.
- Quantifying OpenMP: Statistical insights into usage and adoption, Tal Kadosh, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, Gal Oren. HPEC, 2023.
- Multithreaded parallel Python through OpenMP support in Numba, Todd Anderson, Timothy G. Mattson. SciPy 2021.
- The Open Community Runtime: A Runtime System for Extreme Scale Computing, T. G. Mattson, R. Cledat, V. Cave, V. Sarkar, Z. Budimlic, S. Chatterjee, J. Fryman, I. Ganev, R. Knauerhase, M. Lee, B. Meister, B. Nickerson, N. Pepperling, B. Seshasayee, S. Tasirlar, J. Teller, and N. Vrvilo. IEEE High Performance Extreme Computing, 2016.
- OpenCL Programming Guide, Addison Wesley, A. Munshi, B. Gaster, T. Mattson, J. Fung, D. Ginsburg. July 2011.
- Introduction to Concurrency in Programming Languages, CRC Press, Matthew J. Sottile, Timothy G. Mattson, and Craig E. Rasmussen. 2009.
- The Efficiency of Linda for General Purpose Scientific Computing, T. Mattson. Scientific Programming, vol 3, p. 61, 1994.
Design Patterns for Parallel Programming
- Patterns for Parallel Programming, Addison Wesley, Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill. 2004.
- Our Pattern Language: A Design Pattern Language for Engineering (Parallel) Software, Kurt Keutzer and Tim Mattson. ParaPLOP, 2009.
Hardware/Software Co-design
- The 48-core SCC processor: the programmer’s view, T. G. Mattson, R. F. Van der Wijngaart, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy, J. Howard, S. Vangal, N. Borkar, G. Ruhl, S. Dighe. SC10, Nov. 2010.
- Programming Intel’s 80-core terascale processor, T. G. Mattson, R. van der Wijngaart, M. Frumkin. SC’08, Nov. 2008.
- Comparing runtime systems with exascale ambitions using the Parallel Research Kernels, R. F. Van der Wijngaart, A. Kaya, J.R. Hammond, G. Jost, T. St. John, S. Sridharan, T.G. Mattson, J. Abercrombie, and J. Nelson. SC’16, Nov. 2016.
- The Case for Message Passing on Many-core Chips, R. Kumar, T. G. Mattson, G. Pokam, R. van der Wijngaart. In Multiprocessor System-on-Chip: Hardware Design and Tool Integration, edited by M. Hubner and J. Becker, Springer Verlag, pp. 115-123, 2011.
- A TeraFLOP in 1996: The ASCI TeraFLOP Supercomputer, T.G. Mattson, D. Scott and S. Wheat. IPPS, 1996.
Database Management Systems
- The BigDAWG Polystore System, J. Duggan, A. J. Elmore, M. Stonebraker, M. Balazinska, B. Howe, J. Kepner, S. Madden, D. Maier, T. Mattson, and S. Zdonik. ACM SIGMOD Record, 44(3), 2015.
- Associative Array Model of SQL, NoSQL, and NewSQL Databases, Jeremy Kepner, Vijay Gadepally, Dylan Hutchison, Hayden Jensen, Timothy Mattson, Siddhartha Samsi, Albert Reuther. HPEC, 2016.
- The TileDB Array Data Storage Manager, Stavros Papadopoulos, Kushal Datta, Samuel Madden, Timothy Mattson. VLDB Volume 10, No. 4, Dec. 2016.
- Enabling Query Processing across Heterogeneous Data Models: A Survey, R. Tan, R. Chirkova, V. Gadepally, and T. Mattson. IEEE Big Data workshop: Methods to Manage Heterogeneous Big Data, Boston, MA, 2017.
Graph Algorithms … Primarily with Graphs as Sparse Linear Algebra
- Standards for Graph Algorithm Primitives, T. G. Mattson, D. Bader, J. Berry, A. Buluc, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, S. Reinhardt, M. Stonebraker, S. Wallach, and A. Yoo. Proceedings of the IEEE High Performance Extreme Computing Conference, 2013.
- Mathematical Foundations of the GraphBLAS, Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Buluc, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Jose Moreira, John D. Owens, Carl Yang, Marcin Zalewski, Timothy Mattson. IEEE High Performance Extreme Computing, 2016.
- Design of the GraphBLAS API for C, Aydin Buluc, Tim Mattson, Scott McMillan, Jose Moreira, and Carl Yang. GABB workshop at IPDPS, 2017.
- The GraphBLAS in Julia and Python: the PageRank and Triangle Centralities, Michel Pelletier, Will Kimmerer, Timothy A. Davis, Tim Mattson. HPEC, 2021.
- C++ and Interoperability Between Libraries: The GraphBLAS C++ Specification, Benjamin Brock, Scott McMillan, Aydın Buluç, Timothy G. Mattson, and José E. Moreira. GrAPL workshop at IPDPS, 2023.
- Parallel GraphBLAS with OpenMP, Mohsen Aznaveh, Jinhao Chen, Timothy A. Davis, Balint Hegyi, Scott P. Kolodziej, Timothy G. Mattson, and Gabor Szarnyas. SIAM Workshop on Combinatorial Scientific Computing, 2020.
- LAGraph: A Community Effort to Collect Graph Algorithms Built on Top of the GraphBLAS, T. Mattson, T. A. Davis, M. Kumar, A. Buluç, S. McMillan, J. Moreira, and C. Yang. GrAPL workshop at IPDPS, 2019.
- Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite, Ariful Azad, Mohsen Mahmoudi Aznaveh, Scott Beamer, Mark Blanco, Jinhao Chen, Luke D’Alessandro, Roshan Dathathri, Tim Davis, Kevin Deweese, Jesun Firoz, Henry A Gabb, Gurbinder Gill, Balint Hegyi, Scott Kolodziej, Tze Meng Low, Andrew Lumsdaine, Tugsbayasgalan Manlaibaatar, Timothy G Mattson, Scott McMillan, Ramesh Peri, Keshav Pingali, Upasana Sridhar, Gabor Szarnyas, Yunming Zhang, Yongzhe Zhang, IEEE International Symposium on Workload Characterization, 2020.
AI for Programming Systems
- The Three Pillars of Machine Programming, Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman Amarasinghe, Joshua B. Tenenbaum, and Tim Mattson. Second ACM SIGPLAN Workshop on Machine Learning and Programming Languages (MAPL), PLDI, 2018.
- A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions, Mejbah Alam, Justin Gottschlich, Nesime Tatbul, Javier S. Turek, Tim Mattson, and Abdullah Muzahid. NeurIPS, 2019.
- Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks, Tal Kadosh, Niranjan Hasabnis, Vy A. Vo, Nadav Schneider, Neva Krien, Mihai Capota, Abdul Wasay, Guy Tamir, Ted Willke, Nesreen Ahmed, Yuval Pinter, Timothy Mattson, and Gal Oren. arXiv:2312.13322v1, 2023.
- Advising OpenMP Parallelization via a Graph-Based Approach with Transformers, Tal Kadosh, Nadav Schneider, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, and Gal Oren. IEEE HPEC, 2023.