TLDR: If any vis (or other) researcher is in need of a large unstructured mesh data set (tets and/or other linear elements): I’m hereby sharing a properly wrangled and pre-processed version of the “NASA Mars Lander” data set (links at bottom).
For those that haven’t yet heard of this data set: It’s one of the most amazing data sets I’ve ever gotten my hands on – partly because of the amazing back-story (simulating a landing on Mars, how much geekier could it get?), but also because
- it’s gigantic (over 6 billion tets – yes, billion, not million)
- it’s not – as most ‘big’ data sets are – some artificial test case, but a “real world” data set (yes, they did simulate at that accuracy)
- it even contains multiple time steps, so you can make cool animations (see here: https://developer.nvidia.com/techdemos/video/d2s20)
- it’s a “raw” data drop in the sense that this is really what the sim code (Fun3D) wrote out (i.e., it’s useful for “in situ” and “data-parallel rendering” research, too); and
- it actually looks awesome when rendered:
(image credits: Nate Morrical, UofU)
The full, unadulterated data for that “Mars Lander Retropulsion Study” is available from the “Fun3D Retropulsion Data Portal” at https://data.nas.nasa.gov/fun3d/. It has been made available to the wider vis/rendering community by the scientists that ran this simulation (for full attribution, see that same portal page), with a lot of help from Pat Moran.
Unfortunately, if you start working on that data you’ll quickly realize that the main reasons for its awesomeness – its “raw data dump” nature, and its sheer size – have a flip side: getting this data into any form useful for rendering comes with a “non trivial” amount of data wrangling. You need to get the thousands of different files downloaded, parse them, strip ghost cells, extract variables and time steps, re-merge from hundreds of per-rank results to a single mesh, etc. Doing so has been a lot of fun, but it was also a lot(!) of work (even for somebody that has a lot of hardware, and a lot of experience dealing with those things)... so to make it easier for others to use this data I decided to make both the result of my wrangling, and the code used for doing it, available to anybody that might want to work with it.
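To give a rough idea of what the "strip ghost cells" and "re-merge per-rank results" steps involve, here is a minimal Python sketch. It is not the code I actually used (that is the C++ library linked below), and it does not reflect the real Fun3D file layout: the `RankPiece` class, its fields, and the per-tet `is_ghost` flag are all made up for illustration. The only assumption that matters is that each rank knows the global ID of each of its local vertices.

```python
from dataclasses import dataclass


@dataclass
class RankPiece:
    """One rank's share of the mesh (hypothetical layout, NOT the Fun3D format)."""
    global_ids: list   # global vertex ID of each local vertex
    positions: list    # (x, y, z) of each local vertex
    scalars: list      # per-vertex scalar value, e.g. rho
    tets: list         # 4-tuples of LOCAL vertex indices
    is_ghost: list     # one bool per tet: True if another rank owns it


def merge_pieces(pieces):
    """Merge per-rank pieces into one mesh, dropping ghost tets
    and de-duplicating vertices shared between ranks."""
    index_of = {}  # global vertex ID -> index in the merged vertex arrays
    positions, scalars, tets = [], [], []
    for piece in pieces:
        for tet, ghost in zip(piece.tets, piece.is_ghost):
            if ghost:
                continue  # the owning rank will contribute this tet
            merged_tet = []
            for local in tet:
                gid = piece.global_ids[local]
                if gid not in index_of:
                    # first time we see this vertex: append it
                    index_of[gid] = len(positions)
                    positions.append(piece.positions[local])
                    scalars.append(piece.scalars[local])
                merged_tet.append(index_of[gid])
            tets.append(tuple(merged_tet))
    return positions, scalars, tets


# Toy example: two ranks sharing one face (global vertices 1, 2, 3).
# Rank 0 also holds a ghost copy of the tet that rank 1 owns.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
rho = [0.1, 0.2, 0.3, 0.4, 0.5]
rank0 = RankPiece([0, 1, 2, 3, 4], pts, rho,
                  [(0, 1, 2, 3), (1, 2, 3, 4)], [False, True])
rank1 = RankPiece([1, 2, 3, 4], pts[1:], rho[1:],
                  [(0, 1, 2, 3)], [False])

positions, scalars, tets = merge_pieces([rank0, rank1])
print(len(positions), tets)  # 5 [(0, 1, 2, 3), (1, 2, 3, 4)]
```

A Python dict like this is obviously not what you want for billions of elements; the point is only to show the bookkeeping (global IDs in, one compact index space out).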
The result of my wrangling is – for both the “small” lander (order three-quarters of a billion tets) and the large lander (about six billion – with a ‘b’) – a single unstructured mesh with a single per-vertex scalar field (in my case ‘rho’, for one of the later time steps). For really high-quality rendering you probably want more than one variable, and/or multiple time steps... but since my google drive space is limited I’ll provide only these two dumps, which should be more than enough to get you started. I’ll also provide the library I wrote to deal with this data set, so anybody serious enough to deal with data of that size should be able to follow my steps and extract other variables, time steps, etc.
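To put those tet counts into perspective, here is a quick back-of-the-envelope estimate of the memory needed for the tet connectivity alone. The counts are the rounded ones from above, and the index width is my assumption for illustration (the actual files may store things differently). Vertex positions, the scalar field, and any non-tet elements come on top, so treat these as lower bounds.

```python
def tet_index_bytes(num_tets, bytes_per_index=4):
    """Bytes for connectivity only: four vertex indices per tet."""
    return num_tets * 4 * bytes_per_index


GB = 10**9
small_gb = tet_index_bytes(750_000_000) / GB      # "order three-quarters of a billion"
large_gb = tet_index_bytes(6_000_000_000) / GB    # "about six billion"
large_gb_64 = tet_index_bytes(6_000_000_000, bytes_per_index=8) / GB

print(small_gb, large_gb, large_gb_64)  # 12.0 96.0 192.0
```

So even with 32-bit indices, the large lander's connectivity alone is on the order of 96 GB, before a single vertex is loaded.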
With that:
- The data I made available on my google drive: https://drive.google.com/drive/folders/1NwteqBEd4WDAWlWNEfgjRX_FWeEwq1Di (if anybody has a better/more permanent link to host this data, let me know!)
- The library I used for it (‘umesh’, for ‘unstructured mesh’ library) is available under the Apache 2 license on my GitLab page: https://gitlab.com/ingowald/umesh/
And as usual: any comments/praise/feedback/criticism, let me know. I’d be particularly interested in hearing from you if you actually use this data!
Cheers
Ingo
PS: If you end up using this data, please do not forget to properly attribute the researchers that made this data available; please check the corresponding info on the original data portal!