Last summer, I moved into a cramped Airbnb in Pasadena with two roommates to work at Caltech’s amazing Jet Propulsion Laboratory (JPL). I tinkered with an airborne radar system‘s visualization system, taking it from slow and static to streamlined and dynamic with a custom built Python data pipeline and GUI. I thought the whole project was pretty interesting, so here’s what went into it.
Some Background
The JPL research site, a joint venture between NASA and Caltech, is mostly known for its combination of cutting-edge space exploration and robotics technology. The intern experience at this site is famous for being interactive, memorable, and not like any other internship.
The team I worked with, the folks responsible for the Airborne Third Generation Precipitation Radar (APR3), loads big and complicated radar systems onto research aircraft and sends them to far away places. They do this to measure cloud precipitation and study things like extreme weather, the effects of slash-and-burn agriculture and other polluting land uses on precipitation, and the general impact of aerosols on climate.
In previous missions, the APR3 radar was simply “along for the ride.” In other words, the actual direction of the aircraft was decided by a different research team working with a different tool. Essentially, the APR3 team just requested when the radar be turned on/off and took a look at the data using in-house code after each trip.
However, APR3’s next trip would be at the end of summer 2019 over the Philippines shortly after my internship as the principal instrument; that meant it could direct the plane as it flew.
Furthermore, for this experiment, they wouldn’t have a plan beforehand of where to fly until the radar went live and was in the air. They only knew what precipitation patterns they were looking for. Basically, they had to go cloud chasing.
The problem was that, for almost all the relevant decision=making data, their custom-built visualization program (originally written in MATLAB) was only capable of reading in files after each flight, not during. And it was slow, meaning you couldn’t really just tediously reopen files as it flew to approximate a live feed.
My summer was spent designing something better.
The Cloud Chaser
When I was at JPL, I called my homegrown visualization package “APR-3 Meta”, which sounds really legit and let me submit weekly official-looking reports to government employees. Now that we’re in my blog, I can call it whatever I want. I’m going with Cloud Chaser, because the idea of going through so much trouble to chase down clouds is pretty funny.
Cloud Chaser was intended to be a comprehensive Python-based upgrade to their current viz system, which was done entirely in MATLAB. The downsides to MATLAB (and the onboard viz program generally) were numerous:
- It needs a license! The license is crazy expensive and we couldn’t rely on the JPL site-wide license since that needed to be consistently verified online. There’s no guarantee of internet halfway across the world and 80,000 feet in the air.
- The scripts had loads of confusing cross-referencing and redundancies. It was built over countless iterations to satisfy all sorts of functions that it didn’t need anymore for the new mission. Lots of scripts were separated for no clear reason (little reusability, tons of redundant lines). It was also slow, and had lots of bottlenecks in nested for loops that weren’t needed, especially after being rewritten in Python.
- Critically, the program lacked any live-update features for many relevant flight parameters.
To solve these problems, we decided that it would be best if I just rewrote the entire thing in Python, which is open-source and has advantages in its wide range of free visualization packages and optimizations with vectorized code.
The main difficulty in this project was the time and manpower limit. Some interns got to work in groups, but the radar team only requested one intern, so it was just me! I only had about two months to learn MATLAB, rewrite the code that translated the binary feed from APR3 to usable data, and figure out how to make a pretty GUI in Python that also gave critical info to airplane operators in potentially extreme weather situations.
Luckily, I also had access to a great team of mentors in Dr. Raquel Rodriguez-Monje, Dr. Simone Tanelli, Dr. Ousmane Sy, and Dr. Steve Durden from the APR3 team, who offered advice and direction along the way. Ultimately, I was able to hack together a pretty competent solution before the end of summer that ended up being used in actual flight missions!
Interpreting Radar Data
I looked at several Python modules that could make data visualization GUIs, but none provided the robustness and depth of customization of PyQt, a popular Python binding for the cross-platform widget toolkit Qt. PyQt also has support for embedding Matplotlib plots, which I planned to use for graphing the radar data since it has a robust animation/rapid update class.
When designing, I quickly realized that the original method for reading binary data in the MATLAB code took too long and, for especially large input files, actually overloaded the company laptop I was given.
Since the data was arranged into packets, each varying in format along a single packet, it initially seemed necessary to iterate line-by-line individually through each packet to correctly apply the read procedure. This was essentially the method that APR3’s original code implemented.
However, I devised a workaround that leveraged the massive (3-4 orders of magnitude) speedup associated with vectorizing code in Numpy while losing no precision in the analysis step.
My idea was to read an entire file at once using a Numpy “memmap”, a method of mapping a binary file as a NumPy array, and set it to read everything as the smallest byte format that the radar’s files used, an 8-bit unsigned integer.
For the rest of the formats, I simply used vectorized operations in NumPy to convert multiple 8-bit columns to higher orders, e.g. two 8-bits could become a 16, and four 8-bits could be converted to 32. Since I knew the format ahead of time, I knew exactly which groups of columns corresponded to the higher-order formats. And if you didn’t already know, vectorizing Python code makes it much faster.
I knew it was important our code worked fast so that the system could actually be used for live visualization. This method took parsing times for even the largest files we worked with from several minutes (at least on my dinky laptop) to consistently less than a second. That’s step one done.
Visual Design
APR3 is made up of three constituent frequency bands and we wanted to track two important metrics for each, meaning six plots would essentially capture everything we needed. In PyQt, you just have write the correct horizontal and vertical containers into the window and populate them with Matplotlib widgets. Easier said than done, especially if you’ve only used PyQt a few times before (like me), but the support for Matplotlib is basically already built into PyQt.
One interesting design requirement was that the plots needed to be “file agnostic”. In other words, the team wanted to specify what gets plotted by time interval and not by file. Some files don’t totally overlap, meaning it had to be able to handle “empty” time on some intervals when plotted.
Luckily, if you populate a Matplotlib mesh chart with multiple data arrays spaced apart, the space between will just be filled with the background color. I changed it to gray to symbolize that there was no data available for that time.
Live Update
The final and main challenge of this project was to make the interface update live as the radar picked up new data. I’ve never made something like this, so at first it felt impossible to do in the time I had.
But the nature of the project also meant I had an easy place to start. APR3 automatically updates its target directory periodically as it finds new data. This meant the challenge could be reduced to simply detecting when there were changes to the data inside the directory and updating the plots accordingly.
I found an amazing package called fsmonitor to watch a filesystem after the program was initialized. The program now had an option to open a directory instead of a single file, read all of the data inside it (including some metadata files included alongside each), and then continue to watch for changes. Every new change would redraw the graph with new data.
There were some extra considerations, like logical operations specific to continuously graphing data. For instance, I had to keep track of the most “extreme” x-values so that I could continuously update the bounds of the graph. Also, each time new data was added to the Matplotlib graph, it added another color legend bar to represent only that new data – I didn’t have enough time to come up with a perfect solution, so I settled on a system that just ignores new color bar updates after the first.
Final Notes
There are a number of features that future implementations of this program may want to consider updating. First, the time-based paradigm of the plots has some limitations. The y-axis is not bounded like the x-axis, since the y-axes for the w-band were different from the ku- and ka-bands. This could potentially be resolved by linking only the ku- and ka- bands or by scaling the changes in those types to changes in the w-band dynamically.
Second, the color bars for the power and doppler velocity plots are not properly scaled to the entirety of the plot. Rather, it simply creates a color bar that is representative of the first file and refuses to add any more. When implementing the color bar originally, I found that asking it to update the color bar simply adds another color bar to the left of what is already stored. However, there is probably a workaround that I was not able to find given the time constraints.
Lastly, it would be nice to have a way to change the color scheme and plot style inside the program to prepare plots for publication and see changes immediately. Currently, it is necessary to change the internal code, restart the program, reselect the directory, and then wait for the plots to generate if you want to change the style. This implementation greatly restricts lead times for plot-making.