How To Data Journalist
Making sense of numbers is a valuable skill.
Working as a data journalist is a cool job, or at least, it used to be. Some fun things you can do include analysing every stock in China to get to a story nobody else could, showing how every person in Hong Kong gets ripped off by pension companies, or simply taking repetitive and boring tasks and automating them so your time is freed up to do other stuff.
Here's how you can do it.1
Prerequisites
You will need to know mathematics, including how to add, subtract, multiply, divide, calculate a percentage and handle exponents. If you cannot, go to Khan Academy and refresh your memory.2
You will need to know Excel, including many keyboard shortcuts. The best way is Wall Street Prep, who offer a short and extremely robust online training course for $39. You will need to memorise many commands to be able to wrestle with a spreadsheet on deadline and this is a great way to do it.
If the previous two paragraphs are provoking anxiety or panic, I recommend reading A Mind For Numbers. You can absolutely do this.
Also, I know you want all your stuff for free, but spending money on programming stuff is usually well worth the entry fee in terms of time saved and potential career advancement.3
Now the bad news. Excel is a good way to present raw data to people in a newsroom who don't code. But it is increasingly becoming obsolete as a skill for the job. And enough people know how to use it that it won't really distinguish you in any sense. It also breaks all the time.
You Need To Learn Python (or R)
Yep. Go big or go home.
There are plenty of resources to learn to use Python. DataCamp's Data Science modules and Udacity's Python Bootcamp are decent options.
For those that know Python already, if I had to pick one course that has the most useful practices for simply getting your head around data, it'd be Exploratory Data Analysis in Python by Allen Downey.
The best skill you can learn for data manipulation once you are up to speed with Python is pandas. The best resource to learn this library is the book Python for Data Analysis by Wes McKinney, who created the pandas project.
One of the other most useful skills you can learn is webscraping. This is a means of peeling information out of websites which has reached the profession through the digital advertising world. Paul Bradshaw's Scraping for Journalists is the best guide.
However, it's increasingly well-known within newsrooms and doesn't give you much of an edge these days as a reporter. Additionally, some editors do not know what it entails and will require some hand-holding. There are legal risks attached.
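As a rough illustration of what scraping involves, here is a minimal sketch using only Python's standard library. It parses a made-up HTML snippet rather than a live site; the table contents and the TableScraper class name are hypothetical, and a real project would more likely fetch pages over the network with a third-party library.

```python
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page you have already downloaded.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Price</th></tr>
  <tr><td>Acme Holdings</td><td>12.50</td></tr>
  <tr><td>Example Corp</td><td>7.25</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new table row starts: open a fresh list of cells.
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a cell; ignore layout whitespace.
        if self._in_cell:
            self.rows[-1][-1] += data


scraper = TableScraper()
scraper.feed(SAMPLE_HTML)

# The first row is the header; the rest are (company, price) records.
header, *records = scraper.rows
prices = {name: float(price) for name, price in records}
print(prices)
```

The point is simply that a web page is structured text, and the scraper's job is to walk that structure and keep the parts you care about.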
I am sceptical about investing in R instead of Python.4 Consider this graphic from Stack Overflow Trends:
Maybe this just means that Python users get confused more. But the fact that there are almost as many queries for pandas as R should tell you a lot about the size of the community. Here is a breakdown of pandas vs dplyr, for those interested.
My Single Best Bit of Advice for News Graphics
Usually there are only two things you can put in a chart: A change over time, or how one thing is different from all the others. Don't include any more information than that or you will completely overwhelm your reader.
A Word of Caution
A former editor of mine used to say that the most dangerous thing in the world is a journalist with a calculator.5
🎶 Musical Interlude 🎶
Now Do It On Deadline
Alright! Who's ready to move fast and break things?
Most of the time you don't need to code anything hugely fancy to achieve what you need to, since as a data reporter you will always be trying to bash something out fairly quickly.
An experienced programmer will smoke a good journalist when it comes to writing code in a hurry.
There are a few ways the lazy hack can speed up. The first is to use a pencil and paper. Write out the steps in a longhand explanation that makes sense to you, then erase the pseudocode you have created and substitute the actual code you need. Only then approach your computer.
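To illustrate the idea, here is a small sketch in which the longhand steps survive as comments above the code that replaced them. The task, the figures and the names are all invented for the example.

```python
from collections import defaultdict

# Longhand plan, written on paper first:
#   1. Group the fines by borough.
#   2. Add up each group.
#   3. Sort the boroughs from largest total to smallest.

# Hypothetical records: (borough, fine in pounds).
fines = [
    ("North", 120),
    ("South", 80),
    ("North", 60),
    ("East", 200),
    ("South", 40),
]

# Steps 1 and 2: group the fines by borough and add up each group.
total_by_borough = defaultdict(int)
for borough, amount in fines:
    total_by_borough[borough] += amount

# Step 3: sort from largest total to smallest.
ranking = sorted(total_by_borough.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Each comment maps onto a line or two of code, which makes it easier to spot a skipped step when you are rushing.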
Any useful functions or snippets of code you use regularly should be saved either in a text file that you can pull up quickly as a reference, or as macros. The less you can rely on your memory while on deadline, the better.
Make your functions and variables easy to understand to a reader of your code, because writing readable code is a bit like producing a news article, in that you should try to keep it as simple as possible. This will make life easier later for others reading the code, and yourself.6
Journalists often use the word "TK", meaning "to come", which is a bit of placeholder text for when you can't quite TK a word but don't want to break the flow of your writing to look it up. It is just as handy when typing code as it is for drafting a story. Just remember to go back and fill in your TKs afterwards. Ctrl-F (or Command-F) is your friend here.
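If searching by hand feels too manual, a few lines of Python can list any leftover placeholders. This is only a sketch: the find_placeholders helper and the sample draft are made up for illustration, and it assumes TK is written in capitals as a standalone word.

```python
import re

# Match "TK" as a whole word so longer words containing the letters are not flagged.
TK_PATTERN = re.compile(r"\bTK\b")


def find_placeholders(text):
    """Return (line number, line) pairs for every line containing a TK."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if TK_PATTERN.search(line)
    ]


# Hypothetical draft script with two placeholders left in.
draft = """\
tax_rate = 0.2  # TK confirm rate with the council
total = income * tax_rate
headline = "Council spending rises TK percent"
"""

for number, line in find_placeholders(draft):
    print(f"line {number}: {line}")
```

Run something like this over a script or a story draft before filing and any forgotten TK shows up with its line number.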
Do It On Deadline, Seriously
Don't sweat too much about optimising production code, big O notation or any of that. You're not running a search engine handling billions of queries a day. Most of the time, you're simply trying to calculate changes in stock prices or scrape a few dozen webpages. Just get it to work.
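For instance, the day-to-day change in a share price takes a couple of lines in pandas. The sketch below uses invented closing prices for a fictional stock; pct_change is the pandas method that computes the fractional change from the previous row.

```python
import pandas as pd

# Invented closing prices for a fictional stock.
closes = pd.Series(
    [100.0, 102.0, 99.96, 104.958],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
    name="close",
)

# Change from the previous trading day, as a percentage.
# The first value is NaN because there is no earlier day to compare with.
daily_change = closes.pct_change() * 100

# Change over the whole period, first close to last close.
total_change = (closes.iloc[-1] / closes.iloc[0] - 1) * 100

print(daily_change.round(2))
print(round(total_change, 2))
```

Nothing here is optimised, and for a few hundred rows it does not need to be.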
But there are a few critical ways that coding in a newsroom differs from other types of programming work because of time constraints.
The first is that your workflow is going to be totally messed up by breaking news. Programming is time-intensive and requires your full attention to do well, and it's much harder to do if you have a TweetDeck window open or you need to jump into a news conference every half hour. Paul Graham wrote over a decade ago about how coders' productivity gets thrown around by meetings - it's good advice, and worth considering within your own newsroom. You may wish to carve out dedicated time for this purpose.
Also, this happens a lot:
It is crucially important to build in sanity checks to your own data, because unlike in most software development you do not have the luxury of being able to release a patch to fix bugs in your news stories. The goal is to make sure your addled reporter brain is not leading you astray without you realising.
Make sure you backtest any calculations using an alternative method. For example, if you are calculating how much each line item of a council's spending contributes to their overall budget, you want to make sure that the sum of those numbers totals 100%. If it doesn't, either you have made a mistake, or there is something funky going on. Both require investigation.
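A minimal sketch of that kind of check, assuming a made-up council budget: the line items, the amounts and the published total are all hypothetical.

```python
import math

# Hypothetical council budget, in millions.
budget = {
    "Social care": 42.0,
    "Roads": 18.5,
    "Waste": 9.5,
    "Libraries": 5.0,
}

total = sum(budget.values())

# Each line item as a percentage of the overall budget.
shares = {item: amount / total * 100 for item, amount in budget.items()}

# Backtest: the shares should add back up to 100%, allowing for
# floating-point rounding. If they don't, stop and investigate.
shares_total = sum(shares.values())
assert math.isclose(shares_total, 100.0, abs_tol=1e-9), shares_total

# Second check: compare against the total the council itself published
# (an assumed figure here) to catch missing or duplicated rows.
published_total = 75.0
assert math.isclose(total, published_total), (total, published_total)

for item, share in shares.items():
    print(f"{item}: {share:.1f}%")
```

The first assertion mainly catches slips such as dividing by the wrong total. The second is the more independent test, because it compares your working against a figure you did not compute yourself.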
Editors absolutely should take an interest in any and all code their reporters have written, just as they should double-check any calculations in their story. Failing to do so is professional malpractice!
For reporters, I cannot overstate the importance of having a good editor. When you write the story, find an editor who knows their way around a spreadsheet and can kick the tyres on your mathematics. Make life easier for them by explaining the logic in as clear a manner as possible. The more lucid you are in your own thinking about this, the better.
The skillset for data journalism is different from editing copy, so you should also aim to collaborate as widely as possible. If you're working on a big project, be sure to identify someone who can check your code early in the process. The best way to find these people is often to ask the graphics team.
You can maintain relations with the desk by cheerfully mentioning that your code mysteriously got a lot better after you connected it to your homebrew artificial intelligence algorithm following a long night of Red Bull and programming. I wouldn't necessarily recommend it.
Use the Right Tool for the Job
It's best to use the simplest method you can. Take market data as an example.
There are lots of good financial information providers on the market. I like the Bloomberg Terminal because it has tab autocomplete rather than space autocomplete like Refinitiv's Eikon. Both are extremely expensive, and if you are not trading credit default swaps, or whatever, you may wish to explore alternatives.
Once you have learned Excel, you should immediately figure out how to use Google Finance to call up share prices. This can be accessed through Sheets via the GOOGLEFINANCE function and gives you access to live or slightly delayed data from a bunch of different markets. You can also do some rudimentary webscraping this way.
These days I mostly call up stock quotes and index values from Yahoo! Finance or bond yields and economic data from FRED using pandas' datareader library.
This can be done on your command line without faffing around on a web browser. It is also free. I told you learning pandas was worth it.
Copy and paste this into a terminal window and away you go:
pip install pandas-datareader
I leave this as an exercise for the motivated reader.
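For readers who would rather have a starting point, here is a minimal sketch of pulling a FRED series with pandas-datareader. The helper names are my own, the choice of series is only an example, and the download itself needs network access, so the sketch defines main() without calling it. The Yahoo! Finance side is left out.

```python
import datetime as dt


def one_year_window(today):
    """Return (start, end) dates covering the 365 days up to `today`."""
    return today - dt.timedelta(days=365), today


def fetch_fred_series(series_id, start, end):
    """Download one FRED series as a pandas DataFrame (needs network access)."""
    # Imported here so the helper above still works without the package installed.
    from pandas_datareader import data as pdr

    return pdr.DataReader(series_id, "fred", start, end)


def main():
    start, end = one_year_window(dt.date.today())
    # "DGS10" is FRED's code for the 10-year US Treasury yield.
    yields = fetch_fred_series("DGS10", start, end)
    print(yields.tail())


# Call main() from a script or an interactive session once
# pandas-datareader is installed and you are online.
```

Swapping in a different FRED series code should be all it takes to pull other economic data the same way.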
I Hope This Has Changed
Here is another note of caution for aspiring data journalists.
Over the years, a substantial and growing skills divide in newsrooms has emerged between those who are literate in data, and those who are not.7
Data is not always the best tool for the job, and it is certainly not a silver bullet. It is also well worth asking whether years spent learning code might be better spent honing your social skills, finessing your writing, or mastering Freedom of Information Act requests.
The result is that many journalists have not bothered to keep up with recent advances in technology. I'm not fully convinced they are wrong, either!
To be fair, there are a lot of demands on newsroom staff battling against the 24-hour news cycle, not limited to partners, children, and needy colleagues with a record of making their lives difficult through poorly conceived and executed data projects.
Still, this places additional burdens of expectation upon data reporters.
Start pulling rabbits out of hats, and those editors who like you may view you as capable of doing the impossible. It becomes extremely hard to live up to this expectation. That might be one reason to focus on simpler projects. However, these can be harder to boast about.
You will also be viewed as a threat by people in the newsroom whose skills are increasingly worthless. You will lose a lot of credibility when you overpromise and slip up. Be prepared to get stabbed in the back as well.
This whole newsletter is a warning to others. There is no contest: Humans don't compete with algorithms. But they do compete with other humans, and they fight dirty.
I Would Like To End on a Nicer Note Than This
Data journalists, by and large, are some of the loveliest people around and are easy to reach on social media. They tend to be into open-source software, reproducibility, and showing their working. They often put their code up on GitHub. Be nice to them and they will be kind to you, and do try to give back to the community.8
1. The obvious answer is "just ask them out." This joke gets old really fast if this is your job. Also, it might require British pronunciation rather than American. I can't remember which is which anymore. Anyway, this whole post should carry a health warning that it's going to have really heavy brogrammer energy, but you knew that already, right?
2. I regret to inform you that in almost every single newsroom I have worked in, there has been some reporter who did not know how to do this. Bloomberg News is the one notable exception, but then people there may just have been more ashamed of the fact.
3. Obviously, that's not a guarantee. However, I experienced this consistently from about 2012 until Covid-19.
4. The Financial Times is big on R, and their data team does great work.
5. I have never asked him about whether or not he thought reporters should get into machine learning.
6. Yes, yourself. Chances are, you will have a hard time understanding your beautiful code when you have to revisit it six months down the line unless you have written comments explaining what it does. And you will want to reuse code.
7. If it's not clear, that's my ego talking.
8. The other joke you can use is: "Sure you can data journalist, but remember they're not marriage material!"



