The Importance of Prose in Communicating Data

Tim Brock / Thursday, June 2, 2016

If you're a data communicator, having a good understanding of chart and table design is important. Thankfully, the art and science of creating effective charts and tables is the subject of a great number of books. (One of my favorites is Stephen Few's Show Me the Numbers.) This doesn't, however, mean that how we use ordinary prose - spoken or written - should be ignored.

About a year ago I wrote an article here titled "7 Do's and Don'ts of Dataviz". The first of those seven things was "Don’t use a chart when a sentence will do". Ryan Sleeper takes the opposing view in this article: "Even when the numbers are in units, you can likely tell that the first number is smaller than the second number, but it is challenging to consider the scale of the difference."

I'm not convinced by this argument for written text if the numbers are formatted consistently, with commas (or points) as thousand separators. The varying number of digits and separators makes it fairly obvious when numbers differ by a couple of orders of magnitude. (Aside: this is also why you should right-align numbers in tables, or align on the decimal point where possible and applicable.)
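As a minimal sketch of that formatting advice, the snippet below applies comma thousand separators and right-alignment to the two example values used later in this article. The variable names are illustrative only.

```python
# The two example values discussed in this article.
values = [4_500_000, 30_000]

# Format each value with commas as thousand separators.
formatted = [f"{v:,}" for v in values]

# Right-align to the widest entry so digits line up by place value.
width = max(len(s) for s in formatted)
aligned = [s.rjust(width) for s in formatted]

for line in aligned:
    print(line)
```

Printed in a monospaced font, the shorter number sits visibly to the right of the longer one, which is what makes the difference in magnitude easy to spot.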

Better still, with written prose we can be explicit about large differences: "Value 1 (4,500,000) is 150 times larger than value 2 (30,000)." In a single sentence we have expressed a pair of values precisely and provided a simple comparison that is really easy to understand: the first entity is 150 times the size of the second.
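If you generate such sentences from data rather than typing them by hand, the comparison can be computed and formatted in one step. This is only a sketch; `describe_ratio` is a hypothetical helper name, not part of any library.

```python
def describe_ratio(name_a, a, name_b, b):
    """Build a sentence stating both values and how many times larger a is than b."""
    ratio = a / b
    return f"{name_a} ({a:,}) is {ratio:,.0f} times larger than {name_b} ({b:,})."


sentence = describe_ratio("Value 1", 4_500_000, "value 2", 30_000)
print(sentence)
```

Rounding the ratio to a whole number suits this example, where the ratio is exactly 150; for less tidy data you may prefer a phrase such as "about 150 times".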

Even if you don't like the use of parentheses or you don't want to type out all the 0's in 4,500,000, there are plenty of other ways to create a sentence that clearly conveys the difference between two very different numbers. In most cases you'll probably be looking at real entities or concepts rather than completely abstract numbers, so we'll imagine a scenario: "With a salary of $30,000, Bob would have to work for 150 years to earn the $4.5 million that Aaron earns in just one twelve-month period." While you can see there's a massive difference between the wages of Aaron and Bob in the bar chart below, it's pretty tough to get the factor of 150 from comparing the lengths or end-points of the bars alone. You might think "Wow, Aaron earns a lot more than Bob" but you're unlikely to get the bit where it'd take Bob a century and a half to earn as much.

Of course, neither the descriptive sentence nor the bar chart tells us anything about why Aaron earns so much or why Bob's salary is more modest. Neither really tells us why we should care either. Two numbers are different. So what?

Well, there might be a good story in the difference. They could be twins or friends who made slightly different life choices with huge consequences. Or we could "just" be comparing a company CEO and someone further down the company's pyramid. As I keep saying, context is key when it comes to conveying data. Even if you really insist that your two data points require a chart, chances are that to convey the proper context and make a real impact you'll need to provide some descriptive information that doesn't have a natural chart form. So it makes sense to take some time to think about prose whether you listen to my earlier advice or not.

I'm not saying we can't enhance charts by adding more visual cues that help with context, if we have relevant information available. The chart below is the same as above except with a reference line for the median salary for the made-up company I've decided they work for. Now we can see that Bob's salary is much closer to the median than Aaron's (as you might expect).

Let's try to put the salient details from the chart above in a sentence or two: "Bob's salary of $30,000 is 80% of the company median of $37,500; he would have to work for 150 years to earn the $4.5 million that Aaron earns in just one twelve-month period." From this we learn precisely what Bob's salary is, how Bob's salary compares to the median, precisely what the median is, how Bob's salary compares to Aaron's salary and what Aaron's salary is. The only information we're not given directly is how Aaron's salary compares to the median. Since we know Bob's salary is similar to the median and two orders of magnitude less than Aaron's, it should be fairly evident that Aaron's salary is two orders of magnitude greater than the median.
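The figures in that sentence are simple ratios, and a short sketch shows how they could be derived from the three salaries and dropped into the prose. The variable names are illustrative; the salaries are the made-up ones from this article.

```python
bob = 30_000
aaron = 4_500_000
median = 37_500

share_of_median = bob / median   # Bob's salary as a fraction of the median
years_to_match = aaron / bob     # years Bob would need to earn Aaron's annual pay
aaron_vs_median = aaron / median # the comparison the sentence leaves implicit

sentence = (
    f"Bob's salary of ${bob:,} is {share_of_median:.0%} of the company median "
    f"of ${median:,}; he would have to work for {years_to_match:.0f} years to "
    f"earn the ${aaron / 1e6:.1f} million that Aaron earns in just one "
    f"twelve-month period."
)
print(sentence)
print(f"Aaron's salary is {aaron_vs_median:.0f} times the median.")
```

The last line makes the implied comparison explicit: Aaron's salary works out to 120 times the median, consistent with the "two orders of magnitude" reading.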

Using prose to communicate data effectively isn't just about picking the right number formatting and sentence structure. You also need to use the right units for your audience: I could tell you that the Andromeda galaxy is about 780 kiloparsecs away, but unless you've taken an astronomy course you're unlikely to feel better informed. If I told you it was two and a half million light years from Earth you might at least be inclined to think "oh, that's a really long way". More people will understand that light travels a long way in a year than will know that a parsec is the distance at which the mean radius of the Earth's orbit subtends an angle of one arcsecond and that a kiloparsec is a thousand times that distance. Don't be afraid to use unusual units of measurement, perhaps alongside conventional ones, to provide context when you expect your audience will lack domain expertise.
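Converting between the two units is a single multiplication. The sketch below assumes the commonly quoted approximation of about 3.2616 light years per parsec; the constant and function names are illustrative.

```python
# Approximate conversion factor (assumption: 1 parsec is about 3.2616 light years).
LIGHT_YEARS_PER_PARSEC = 3.2616


def kiloparsecs_to_light_years(kpc):
    """Convert a distance in kiloparsecs to light years."""
    return kpc * 1_000 * LIGHT_YEARS_PER_PARSEC


distance_ly = kiloparsecs_to_light_years(780)
print(f"about {distance_ly / 1e6:.1f} million light years")
```

Rounding to one decimal place of millions matches the "two and a half million" phrasing, which is about as much precision as a general audience is likely to want.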

Take the time to think about how best to express your key values, which set of units to express them in and how to make comparisons easier. Turn to charts when you have something important to convey that can't be said in a few words and when you want to highlight patterns, trends and differences rather than provide precise values (where a table should be used instead). Whichever option you go for, remember to provide context.
