Written on 29 Jan 2023
Taking stock of my "data tool belt"
Twelve software projects and companies I use all the time as a working data journalist.
I’ve been doing data work for nearly a decade and, at this point, I’ve got a set of tools that I really enjoy and that make me really productive. At least I feel that they do.
I wrote a recent post where I gushed about Datasette, a tool I use all the time. I likened it to my "data hammer" because it is always within reach of my work.
Well, now I want to talk about the other tools I use. In part to strain the metaphor and in part because I’ve never talked about the software I use day in and day out. There’s only one criterion for something to show up on this list: I have to use it all the time.
These tools are always close at hand because they’re that useful. But I also use these tools because they feel like magic. I can cast data alchemy spells with non-trivial geospatial analysis - the sort of feeling that hooks you, that makes you go “I guess computers aren’t all terrible.”
It’s a bit hyperbolic, but I love each and every software project on this list. And I’m super grateful for the hundreds of thousands of hours that have been put into them.
Books have been written about how useful, and complicated, the D3 library is. And even though I don't use it so much for visualization anymore, it has some features that I just can't live without. Namely:
- CSV parsing and serializing
- Date and time parsing
- Scales and interpolation
- Basic math like sums and means
Web scraping tools
The thing for when I need promise-based control flow. It makes it easy to run tasks in parallel and lets me adjust how many tasks run concurrently. Why is that cool? Because it means I can dramatically reduce the run time of some of my web scrapers.
A jQuery-inspired API for parsing and manipulating HTML outside of a browser.
Scraping websites sometimes means automating a browser. There are a number of options for controlling browsers, but I like Puppeteer's API the most, so it's the one I use.
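To make the concurrency idea concrete, here is a rough sketch in plain TypeScript. It is not the library's API; `mapWithConcurrency` and `fakeFetch` are made-up names for illustration, and the timer stands in for a real network request. The point is only that a fixed number of "runners" pull work from a shared list, so no more than `limit` tasks are ever in flight.

```typescript
// Hypothetical helper: run `worker` over `items`, at most `limit` at a time.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next item, awaits it, and repeats until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start no more runners than there are items.
  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}

// Stand-in for a network request (assumption: a real scraper would fetch here).
const fakeFetch = (url: string): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(`fetched ${url}`), 10));

// Usage: five placeholder URLs, two requests in flight at a time.
const urls = ["a", "b", "c", "d", "e"].map((p) => `https://example.com/${p}`);
mapWithConcurrency(urls, 2, fakeFetch).then((pages) => {
  console.log(`${pages.length} pages scraped`);
});
```

Raising `limit` shortens the total run time until the target site (or your own machine) becomes the bottleneck, which is why being able to tune that one number matters so much for scrapers.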
Need to convert geographic data between file formats? Need to simplify it in the process? Or filter out particular features that you don't want to include in a dataset?
Mapshaper does it all, and because it's a command-line tool, it's easy to integrate into data pipelines.
Sounds simple but it turns out to be really powerful and is the crux of how I built a tool for California voters to see how redistricting affected their home address.
And sometimes you just need a GUI to do... well, something. I'm just a novice when it comes to this open-source geospatial powerhouse, but I use it a lot to explore new data by putting it on a map.
If I'm making a website, SvelteKit is the first thing I'm going to reach for. Not only does it use the very popular component model for building interfaces, but it also makes it very easy to keep API/data fetching code and UI code in the same repository.
Tools that are also companies
GitHub Actions is where all of my scrapers and recurring tasks run. It's easy to schedule scrapers and read the logs of the times they've succeeded and failed, and a workflow can easily commit data to a repository.
I don't have the skill to run web servers, let alone the interest. I want my websites to be deployed whenever a change happens on GitHub, and I want to be able to see when a deploy succeeds and when it fails. Netlify does it all.
Do you use any of these tools in your day-to-day work? Love them? Hate 'em? I'm super curious to hear from you.