Get your point across by flattening it

As an example of the power of effective data visualisation, it’s hard to beat. Here’s a little background on the diagram that’s all over the internet.

The story behind the coronavirus ‘flatten the curve’ chart (Fast Company)
The first instance of Flatten the Curve can be found in a paper called Interim pre-pandemic planning guidance: community strategy for pandemic influenza mitigation in the United States: early, targeted, layered use of nonpharmaceutical interventions, and no, it doesn’t exactly roll off the tongue. Published in 2007 by the CDC, the paper was a preview to a pandemic like COVID-19, and it suggested simple interventions like social distancing and keeping kids home from school in order to slow the spread of a disease so that the healthcare system could keep up. […]


Pearce breathed new life into the CDC graphic. Then Harris added an anchor, a single line, that articulated its significance. But it was Dr. Siouxsie Wiles who took the final step: She demonstrated the possibility that everyday people really could make a meaningful difference in slowing the spread of COVID-19. To do this, she transformed the graphic into two futures, each caused by a mentality: ignore it or take precautions. Wiles transformed the graphic into the perfect response to the polarized nature of COVID-19 across social media, in which people were either in full prep mode or far too skeptical that the pandemic was even real.


It’s not the first of its kind, though.

This chart of the 1918 Spanish flu shows why social distancing works (Quartz)
The extreme measures—now known as social distancing, which is being called for by global health agencies to mitigate the spread of the novel coronavirus—kept per capita flu-related deaths in St. Louis to less than half of those in Philadelphia, according to a 2007 paper in the Proceedings of the National Academy of Sciences.


Putting Covid-19 into perspective

Here’s another way of visualising the numbers connected with the coronavirus.

Just how contagious is COVID-19? This chart puts it in perspective (Popular Science)
One quantity scientists use to measure how a disease spreads through a population is the “basic reproduction number,” otherwise known as R0 (pronounced “R naught,” or, if you hate pirates, “arr not”). This number tells us how many people, on average, each infected person will in turn infect. While it doesn’t tell us how deadly an epidemic is, R0 is a measure of how infectious a new disease is, and helps guide epidemic control strategies implemented by governments and health organizations.

If R0 is less than 1, the disease will typically die out: Each infected person has a low chance of passing the infection along to even one additional individual. An R0 larger than 1 means each sick person infects at least one other person on average, who then could infect others, until the disease spreads through the population. For instance, a typical seasonal flu strain has an R0 of around 1.2, which means for every five infected people, the disease will spread to six new people on average, who pass it along to others.
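The flu arithmetic is easy to check. Here's a minimal Python sketch of the crudest possible model. It assumes every case infects exactly R0 others on average in each generation and ignores immunity, interventions and everything else that real epidemiological models include. The function name is made up for illustration.

```python
def cases_by_generation(initial_cases, r0, generations):
    """Expected new cases in each generation under a naive R0-only model.

    Assumes each case infects r0 others on average and that nothing
    (immunity, distancing, running out of people) slows the spread.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r0)
    return cases

# Five flu cases with R0 = 1.2 lead to six new cases in the next generation.
print([round(c, 2) for c in cases_by_generation(5, 1.2, 4)])

# With R0 below 1 each generation is smaller than the last, so the outbreak fades.
print([round(c, 2) for c in cases_by_generation(5, 0.8, 4)])
```

The gap between 1.2 and 0.8 looks small, but it compounds with every generation. That is why pushing R0 below 1 matters so much.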


Here’s more on that.

What is the coronavirus’s R0 and why does it matter? (Life Hacker)
R0 is one of the numbers epidemiologists use to describe how an infectious agent spreads through a population. But it’s important to remember that it’s simply a statistic that describes some of the numbers we see. It’s not a rating of how scary a virus is, nor does it dictate how deadly a disease is or how difficult it might be to contain. We need more information for that.

And another way of comparing such things, from 2014.

Visualised: how Ebola compares to other infectious diseases (The Guardian)
Every disease has a basic reproduction number but the numbers are scattered across the literature. We’ve web-crawled and gathered them all here in one graphic, plotting them against the average case fatality rate – the % of infectees who die. This hopefully gives us a data-centric way to understand the most infectious and deadly diseases and contextualise current events.

Visualising our plastic problem

I’m sure I’m not the only one who has difficulty visualising large numbers. It can make the significance of some news stories hard to grasp, especially environmental ones.

By comparing the number of plastic bottles sold around the world to such things as a rubbish truck, the Eiffel Tower, and even Manhattan, Reuters have published a very effective way of getting across ridiculous statistics like 54,900,000 bottles sold every hour, 1,300,000,000 sold every day, and 481,600,000,000 sold every year. (via Cool Infographics)

Drowning in plastic: Visualising the world’s addiction to plastic bottles (Reuters)
Around the world, almost 1 million plastic bottles are purchased every minute. As the environmental impact of that tide of plastic becomes a growing political issue, major packaged goods sellers and retailers are under pressure to cut the flow of the single-use bottles and containers that are clogging the world’s waterways.
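The headline figures hang together, as a quick sanity check shows. This sketch assumes a 365-day year. Dividing the annual total back down gives roughly 1.32 billion a day, 55 million an hour and 916,000 a minute. That squares with both the rounded figures above and the ‘almost 1 million every minute’ claim.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: ignores leap years

def per_unit_rates(per_year):
    """Break an annual total into average per-day, per-hour and per-minute rates."""
    per_day = per_year / DAYS_PER_YEAR
    per_hour = per_day / HOURS_PER_DAY
    per_minute = per_hour / 60
    return per_day, per_hour, per_minute

# Reuters' annual figure for plastic bottles sold worldwide.
per_day, per_hour, per_minute = per_unit_rates(481_600_000_000)
print(f"{per_day:,.0f} a day, {per_hour:,.0f} an hour, {per_minute:,.0f} a minute")
```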


Leave us alone

Hot on the heels of Robot Day is Data Protection Day, initiated by the Council of Europe in 2007.

Data Protection Day (Council of Europe)
The Council of Europe is celebrating this year the 14th edition of Data Protection Day. This initiative aims to raise the individuals awareness about good practices in this field, informing them about their rights and how to exercise them.

Joint statement by Vice-President Jourová and Commissioner Reynders ahead of Data Protection Day (European Commission)
Data is becoming increasingly important for our economy and for our daily lives. With the roll-out of 5G and uptake of the Artificial Intelligence and Internet of Things technologies, personal data will be in abundance and with potential uses we probably can’t imagine. While this offers amazing opportunities, some cases show that robust rules are needed to address clear risks for individuals and for our democracies. In Europe we know that strong data protection rules are not a luxury, but a necessity. […]

20 months after the entry into application of the landmark General Data Protection Regulation, we see that the GDPR has acted as a catalyst to put data protection at the centre of many of the on-going policy debates. It is a cornerstone of the European approach underpinning several political priorities of the new Commission promoting a human centric approach to Artificial Intelligence and other digital technologies. European Data Protection rules will therefore be a foundation and inspiration for the success of key initiatives in artificial intelligence, health or mobility to name just a few.

Part of me wants to find out how our leaving the EU on Friday will affect this, but a larger part of me is too fed up with the whole stupid act of national self-harm to bother.

Happy “Data Privacy Day” – Now read The New York Times privacy project about total surveillance (Forbes)
The shocking thing about the obvious and growing loss of privacy is how unconcerned everyone is. Technologists started “snooping” around servers, desktops and data bases years ago to understand the status of hardware and software and how they should be managed. Enterprise snooping is still a best practice. But snooping is now central to entire national and global business models, and has emerged with a scary name: surveillance capitalism. No one predicted how pervasive snooping would become. No one predicted just how much profit snooping would generate, and no one predicted how entire populations would essentially shrug their shoulders about how they’re stalked each and every day – to make someone else money!

I’ve shared a number of articles about surveillance before, including one from The New York Times Privacy Project mentioned above, but there are many more to worry over.

Surprisingly (not really), Google doesn’t seem to be celebrating the day with a Google Doodle, although there is a prompt to complete a privacy check-up.


On the other hand, I quite like Mozilla’s internet health initiative, ‘Protect Internet health and privacy’.

Data detox: Five ways to reset your relationship with your phone (The Firefox Frontier)
We use our phones for everything from hailing rides to ordering in, and even to track our literal steps. All that convenience at our fingertips comes at a cost: our personal data and our mental health. It’s hard to be present in the moment when push notifications and texts are enticing us to look down. Meanwhile, the amount of personal data we share, many times without even realizing, can be alarming.

But not all hope is lost! Here are five simple steps you can take to protect your data and sanity.

Data disasters

Check out this interactive ‘balloon race’ data visualisation from Information Is Beautiful of all the major data breaches from the last ten years: billions of records.

You can choose to highlight the items by year or data sensitivity, and filter for different sectors like academic, governmental or the media.

World’s biggest data breaches & hacks

Our data problems could get a whole lot worse, and not because of hackers this time, but because of politicians.

A no-deal Brexit may trigger a data disaster, and UK companies don’t have a clue
In the event of a no-deal Brexit, the Data Protection Act will ensure that personal information processed in the UK will keep enjoying the same level of protection they do now. Still, under EU law, the UK will be automatically considered a third country not bound by GDPR rules, and able to diverge from the current strong standards if parliament so decides. Consequently, data from EU countries would not be able to flow freely to the UK.

“Things will remain the same for organisations residing in the UK, and who need to transfer data to the EU,” says Cillian Kieran, CEO of privacy start-up Ethyca. “But you won’t be able to gather data from the EU into the UK. This is an issue for any company that processes information at any level.”

Poor performance

For such a small number, a school’s Progress 8 score can be quite a big deal. So the last thing we need is an exam board messing up the performance tables process by not sending complete data to the DfE.

Progress 8 error in performance table checking after BTEC gaffe
Peter Atherton, data manager at a school in Wakefield, told Schools Week some schools had received a “nasty surprise” when they went to check the website.

“It could be the case that, if all of these qualifications were missing for your school, that could affect your progress 8 score by quite a lot. Some schools are saying they’re -0.20 below what they were expecting.”

Gaffe is such an odd word, if you think about it. French, I guess. Would that make Pearson a gaffeur?

It is the second gaffe relating to BTECs to hit the exam board this year.

In August, Pearson was forced to apologise after it hiked grade boundaries for its BTEC Tech Awards just days before pupils were due to collect their results, meaning youngsters were handed lower grades than they were expecting.

Requires improvement, I’d say.

Excel errors are everywhere

I know that Excel is only trying to be helpful when it ‘corrects’ what it sees as formatting errors, but it really needs to pack it in.

An alarming number of scientific papers contain Excel errors
A team of Australian researchers analyzed nearly 3,600 genetics papers published in a number of leading scientific journals [and] found that roughly 1 in 5 of these papers included errors in their gene lists that were due to Excel automatically converting gene names to things like calendar dates or random numbers.

You see, genes are often referred to in scientific literature by symbols — essentially shortened versions of full gene names. The gene “Septin 2” is typically shortened as SEPT2. “Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase” gets mercifully shortened to MARCH1.

But when you type these shortened gene names into Excel, the program automatically assumes they refer to dates — Sept. 2 and March 1, respectively. If you type SEPT2 into a default Excel cell, it magically becomes “2-Sep.” It’s stored by the program as the date 9/2/2016.
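One defensive habit is to check a gene list for symbols that a spreadsheet is likely to read as dates before opening it in Excel. Here's a rough Python sketch. The pattern only covers month-name-plus-number symbols like the SEPT2 and MARCH1 examples above. It is my own heuristic, not a description of Excel's actual date-parsing rules.

```python
import re

# Heuristic: a month abbreviation or name followed by a day-like number,
# e.g. SEPT2, MARCH1, DEC1. This is not Excel's real parsing logic.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)(\d{1,2})$",
    re.IGNORECASE,
)

def date_like_symbols(symbols):
    """Return the gene symbols that look like month-plus-day dates."""
    return [s for s in symbols if DATE_LIKE.match(s)]

genes = ["SEPT2", "MARCH1", "TP53", "BRCA1", "DEC1", "OCT4"]
print(date_like_symbols(genes))
```

Anything it flags is a candidate for importing as text rather than letting Excel guess the type.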

A life in print

Last year, Facebook gave us the option to download all our data. Katie Day Good, an avid Facebook user since the early days, took them up on the offer and, perhaps because of her former interest in scrapbooking, decided to print it all out…

Why I printed my Facebook
Other files were less amusing. “Advertisers Who Uploaded a Contact List With Your Information” was a 116-page roster of companies, most of which I had never heard of, that have used my data to try to sell me things. The document called “Facial Recognition Code” was disturbingly brief and indecipherable, translating my face into a solid block of jumbled text—a code that only Facebook’s proprietary technology can unlock—about 15 rows deep. Some documents held secrets, too. “Search History” revealed an embarrassingly detailed record of my personal obsessions and preoccupations over the years. Crushes, phobias, people I have argued with and envied―this was the information I never wanted to post on Facebook, but instead had asked Facebook to help me find. This information, along with the facial recognition codes of my children (which were not included in the .zip file, but which I assume Facebook owns), is the data I most wish I could scrub from the servers of the world.

All told, my Facebook archive was 10,057 pages long.

The tree rings of US immigration

Here’s an unusual way of representing population growth. Pedro M Cruz, from Northeastern University in Boston, takes two centuries of US census data and shows the increasing population as rings of a tree, one for each decade.

For a radical new perspective on immigration, picture the US as an ancient tree
According to Cruz, the tree metaphor ‘carries the idea that these marks in the past are immutable’ and it ‘embodies the concept that all cells contributed to the organism’s growth’. As with so many renderings of US history, indigenous populations are conspicuously absent from the tableau. Still, Cruz’s skilfully deployed data doubles as a resonant work of cultural commentary, offering a rich and often surprising look at the ever-evolving makeup of the country.

There’s more information on the video’s Vimeo page.

Simulated dendrochronology of U.S. immigration (1830-2015)
Trees in their natural setting have annual growth rings that reflect varying environmental conditions; the rings’ forms are neither perfect circles nor ellipses. The algorithm is inspired by this variation and accordingly deposits immigrant cells in specific directions depending on the geographic origin of the immigrant. Rings that are more skewed toward the country’s East, for example, show more immigration from Europe, while rings skewed South show more immigration from Latin America. With this, it is possible to observe the quantity of immigration through the thickness of the rings. The color of the cells corresponds to specific cultural-geographical regions.
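The Vimeo description doesn't spell out the algorithm. Still, the idea of a ring skewed towards where its immigrants came from can be sketched as a weighted average of compass directions. In this hypothetical Python version, each region gets an assumed bearing: Europe to the east, Latin America to the south, and so on. Each decade's ring is then offset in the direction of the arrivals-weighted average. The bearings and counts are made up for illustration and are not Cruz's data or code.

```python
import math

# Hypothetical bearings in degrees (0 = north, 90 = east) for each region of origin.
REGION_BEARING = {"Europe": 90, "Latin America": 180, "Asia": 270, "Canada": 0}

def ring_offset(arrivals):
    """Direction (degrees) and relative strength of a ring's skew.

    arrivals maps region -> number of immigrants in the decade. Each
    immigrant pulls the ring towards its region's bearing. The result is
    the normalised vector sum, so 1.0 means everyone came from one side.
    """
    total = sum(arrivals.values())
    x = y = 0.0
    for region, count in arrivals.items():
        angle = math.radians(REGION_BEARING[region])
        x += count * math.sin(angle)  # east component
        y += count * math.cos(angle)  # north component
    bearing = math.degrees(math.atan2(x, y)) % 360
    strength = math.hypot(x, y) / total if total else 0.0
    return bearing, strength

# A made-up decade dominated by European arrivals skews the ring east.
bearing, strength = ring_offset({"Europe": 800, "Latin America": 100, "Asia": 50, "Canada": 50})
print(round(bearing), round(strength, 2))
```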

Re-thinking supposedly anonymous data

This is a little alarming.

Anonymised data isn’t nearly anonymous enough – here’s how we fix it
We developed a machine learning model to assess the likelihood of reidentifying the right person. We took datasets and we showed that in the US fifteen characteristics, including age, gender, marital status and others, are sufficient to reidentify 99.98 per cent of Americans in virtually any anonymised data set.
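The researchers' machine learning model is well beyond a blog post. The underlying problem is easy to show, though: the more attributes an ‘anonymised’ record keeps, the more likely that combination is to be unique to one person. Here's a minimal Python sketch with a handful of made-up records. It counts how many records are unique on a chosen set of attributes, which is the idea behind k-anonymity, where k = 1 means a record stands alone.

```python
from collections import Counter

# Made-up "anonymised" records: no names, just a few demographic attributes.
records = [
    {"age": 34, "gender": "F", "marital": "married", "postcode": "LA1"},
    {"age": 34, "gender": "F", "marital": "married", "postcode": "LA2"},
    {"age": 34, "gender": "M", "marital": "single", "postcode": "LA1"},
    {"age": 51, "gender": "F", "marital": "single", "postcode": "LA1"},
    {"age": 51, "gender": "F", "marital": "single", "postcode": "LA1"},
]

def unique_fraction(rows, attributes):
    """Fraction of rows whose combination of the given attributes occurs only once."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)

# Each extra attribute makes more records unique, and so re-identifiable.
for attrs in (["age"], ["age", "gender"], ["age", "gender", "marital", "postcode"]):
    print(attrs, unique_fraction(records, attrs))
```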

Some more examples.

The simple process of re-identifying patients in public health records
In late 2016, doctors’ identities were decrypted in an open dataset of Australian medical billing records. Now patients’ records have also been re-identified – and we should be talking about it.

‘Anonymous’ browsing data can be easily exposed, researchers reveal
A journalist and a data scientist secured data from three million users easily by creating a fake marketing company, and were able to de-anonymise many users …

“What would you think,” asked Svea Eckert, “if somebody showed up at your door saying: ‘Hey, I have your complete browsing history – every day, every hour, every minute, every click you did on the web for the last month’? How would you think we got it: some shady hacker? No. It was much easier: you can just buy it.”

Lancaster University’s student data stolen

University application processes are in full swing, but here is some reputationally damaging news from Lancaster University.

Lancaster University hit by cyber attack, hundreds of students’ personal data stolen
The full scale of the cyber attack was revealed yesterday (July 22), when university chiefs confirmed that hackers had breached IT systems and accessed student records … It said it regretted that the breach has led to fraudulent invoices being sent to some undergraduate applicants demanding large sums of money.

Two days later, and the police have arrested someone for it.

Man arrested over UK’s Lancaster University data breach hack allegations
Names, addresses, email addresses and phone numbers were among the categories of data visible to the hackers. Fraudulent invoices were sent to some, the university admitted. With overseas applicants (of which Lancaster had 575 last year from non-EU countries and 375 from other EU countries) paying fees measured in the tens of thousands of pounds per year, the potential for high returns is great.

Our sources added that around half a dozen students had paid these fraudulent invoices. The highest undergraduate fees for overseas (non-EU) students is Lancaster’s Bachelor of Medicine, Bachelor of Surgery (MBChB) course at £31,540.

It’s more than a little embarrassing, as Lancaster University is one of a number of universities offering degrees in cyber security.

Cyber Security MSc – Lancaster University
In addition to the taught modules, you will also work on an individual research project, supervised by two academics from two of the four departments. Through this project, you will obtain an in-depth understanding of the theoretical and practical aspects of cyber security and technology. You will put the skills and knowledge you have developed throughout the year into practice and gain experience of tackling real-world cyber security issues.

Well, there’s a ‘real-world cyber security issue’ for you.

Known unknowns

An introduction to what promises to be a fascinating new blog from Anna Powell-Smith, “about the data that the government should collect and measure in the UK, but doesn’t.”

Missing numbers
Across lots of different policy areas, it was impossible for governments to make good decisions because of a basic lack of data. There was always critical data that the state either didn’t collect at all, or collected so badly that it made change impossible.

Eventually, I decided that the power to not collect data is one of the most important and little-understood sources of power that governments have. This is why I’m writing Missing Numbers: to encourage others to ask “is this lack of data a deliberate ploy to get away with something”?

By refusing to amass knowledge in the first place, decision-makers exert power over the rest of us. It’s time that this power was revealed, so we can have better conversations about what we need to know to run this country successfully.

Excel timesavers

I sit and stare at Excel for a significant proportion of my day. I can’t believe I’ve not been aware of this simple trick with copying formulas without messing up cell references. It’s saving me an immense amount of time.

Copy Excel formula without changing cell references (or without file references)
It’s quite simple actually!

  1. Highlight the area you’d like to copy
  2. Go to Home / Find & Select / Replace (or press Ctrl + H)
  3. Search for = and replace with a text that’s not in your file – in this example I chose “notinfile” (note: as mentioned in the YouTube comments, you can also replace with “ =”, i.e. a space before the equal sign)
  4. Go back to Home / Find & Select / Replace (or press Ctrl + H) – search for your text – in my example “notinfile” and replace with =.
  5. That’s it!
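Why does the trick work? When Excel pastes a formula, it shifts every relative reference by the distance the formula moved, while absolute ($) references stay put. Text that merely looks like a formula is pasted verbatim. Here's a rough Python imitation of that shifting for simple A1-style references. It is a sketch that assumes no sheet names, named ranges or R1C1 notation, and it is not how Excel itself is implemented.

```python
import re

# Simplified A1-style reference: optional "$" before the column letters and
# before the row number. Sheet names, named ranges, whole-column references
# and R1C1 notation are deliberately ignored in this sketch.
REF = re.compile(r"(?<![A-Za-z$])(\$?)([A-Z]{1,3})(\$?)(\d+)(?![\d(A-Za-z])")

def col_to_num(col):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to letters: 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_formula(formula, drow, dcol):
    """Imitate pasting a formula drow rows down and dcol columns across.

    Relative parts of each reference move; parts prefixed with "$" stay put.
    There is no #REF! handling if a reference is pushed off the sheet.
    """
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        new_col = col if col_abs else num_to_col(col_to_num(col) + dcol)
        new_row = row if row_abs else str(int(row) + drow)
        return f"{col_abs}{new_col}{row_abs}{new_row}"
    return REF.sub(shift, formula)

# Pasting three rows down: the relative range moves, the absolute $C$1 doesn't.
print(shift_formula("=SUM(A1:B2)*$C$1", 3, 0))

# The find-and-replace trick turns the formula into plain text first,
# so pasting leaves it exactly as it was.
as_text = "=SUM(A1:B2)*$C$1".replace("=", "notinfile")
print(as_text.replace("notinfile", "="))
```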

Here are a few more tips and tricks.

10 easy Excel timesavers you might have forgotten
Microsoft has packed Excel with all kinds of different ways to get things done quicker. However, you can’t take advantage of these features if you don’t know about them. These ten techniques may only save you a few seconds every time you use them. That might not sound like much, but if you can integrate them into your workflow, you’re sure to reap the benefits over time.

A typical day, comically speaking

Via FlowingData, here’s a witty visualisation of how Americans spend their days, on average. It’s just a stacked bar chart, but turning it into a comic “can allow the audience to identify with the story, sparking self-reflection: ‘Is this how I live my life? How am I different?’”

A day in the life of Americans: a data comic
There are three settings in this comic (a bedroom, an office, and a bar), each serving as a metonym for an activity (sleep, work, and leisure). I have also included colors and positions as redundant, but clarifying, codes of classification. Such scenes allow for a novel method of highlighting data; a setting inside a panel is “lit up” by a light source if the activity for which it stands occupied those two hours of Americans the most.
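The rule for which setting gets ‘lit up’ amounts to taking the maximum in each two-hour block. Here's a minimal Python sketch. The shares are made up for illustration and are not the comic's actual data.

```python
# Made-up shares of people doing each activity in three two-hour blocks.
blocks = {
    "04:00-06:00": {"sleep": 0.85, "work": 0.10, "leisure": 0.05},
    "10:00-12:00": {"sleep": 0.05, "work": 0.55, "leisure": 0.40},
    "20:00-22:00": {"sleep": 0.15, "work": 0.10, "leisure": 0.75},
}

# Each setting stands in for one activity, as in the comic.
SETTING = {"sleep": "bedroom", "work": "office", "leisure": "bar"}

def lit_settings(blocks):
    """For each block, the setting whose activity occupied the most people."""
    return {
        block: SETTING[max(shares, key=shares.get)]
        for block, shares in blocks.items()
    }

for block, setting in lit_settings(blocks).items():
    print(block, setting)
```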


Self-improvement

The Economist’s charts are usually very clear and helpful, but that’s not to say they can’t be improved – as they themselves show.

Mistakes, we’ve drawn a few
At The Economist, we take data visualisation seriously. Every week we publish around 40 charts across print, the website and our apps. With every single one, we try our best to visualise the numbers accurately and in a way that best supports the story. But sometimes we get it wrong. We can do better in future if we learn from our mistakes — and other people may be able to learn from them, too. …

Misleading charts
Let’s start with the worst of crimes in data visualisation: presenting data in a misleading way. We never do this on purpose! But it does happen every now and then. Let’s look at the three examples from our archive.

Mistake: Truncating the scale
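Why is truncation so misleading? We read bar length as proportional to value, so starting the axis above zero inflates the visible difference. Here's a small Python illustration. The numbers are made up, not taken from The Economist's chart. Two values only 4% apart look three times different once the axis starts at 49.

```python
def apparent_ratio(a, b, baseline=0.0):
    """How many times longer bar b looks than bar a when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Two made-up values only 4% apart.
print(round(apparent_ratio(50, 52), 2))               # axis starts at zero
print(round(apparent_ratio(50, 52, baseline=49), 2))  # axis truncated at 49
```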


Another data protection failure

Hot on the heels of Facebook’s latest password problem, TechCrunch has news of another online service with a very shoddy approach to data protection – i.e. there wasn’t any.

The app, Family Locator, allows families to track each other’s movements, similar to the location sharing option in Google Maps. But it seems the backend database behind the app, holding data on nearly a quarter of a million users, wasn’t protected at all.

A family tracking app was leaking real-time location data
Based on a review of the database, each account record contained a user’s name, email address, profile photo and their plaintext passwords. Each account also kept a record of their own and other family members’ real-time locations precise to just a few feet. Any user who had a geofence set up also had those coordinates stored in the database, along with what the user called them — such as “home” or “work.”
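Storing plaintext passwords is the hardest part of this to excuse. For contrast, here's a minimal sketch of the usual alternative, using only Python's standard library. It stores a random salt and a slow, salted hash (PBKDF2 here; bcrypt, scrypt and Argon2 are common alternatives), then compares hashes at login. The iteration count is an assumption for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your own hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived key) so the plaintext never has to be stored."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Recompute the key for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
print(verify_password("password123", salt, key))
```

Even if a database like this one leaked, attackers would get salts and hashes rather than passwords they could try on other sites.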

They tried to get in touch with the developer, React Apps, but to no avail.

The company’s website had no contact information — nor did its bare-bones privacy policy. The website had a privacy-enabled hidden WHOIS record, masking the owner’s email address. We even bought the company’s business records from the Australian Securities & Investments Commission, only to learn the company owner’s name — Sandip Mann Singh — but no contact information. We sent several messages through the company’s feedback form, but received no acknowledgement.

On Friday, we asked Microsoft, which hosted the database on its Azure cloud, to contact the developer. Hours later, the database was finally pulled offline.

What makes good governance?

In an attempt to get rid of the sour taste left in our mouths from yesterday’s post about the rise of populist politics, here are some more award-winning data visualisations via David McCandless and the Information is Beautiful people.

The winners of the World Data Visualization Prize
Conducted in partnership with the World Government Summit, the prize focuses on how governments are improving citizens’ lives. We asked entrants to use the power of data-visualization to illuminate data on the innovations and decisions – seen and unseen – that drive progress.

Here’s my favourite, an interactive overview of the different factors that contribute to happy countries (or not).

GOV|DNA — Discover the DNA of a good government
This interactive visualization enables the exploration of the DNA of a good government. You can analyze and compare multiple indicators to investigate their influence on countries and the related behaviour and performance of governments.


Where is everybody?

Every six months, Andy Kirk of Visualising Data highlights some of the significant developments in data visualisation. It’s a great collection, but this one in particular caught my eye.

10 significant visualisation developments: July to December 2018
2. ‘Human Terrain’: A genuinely captivating project from Matt Daniels of ThePudding, ‘Human Terrain’ is a staggeringly detailed, explorable prism map of the world’s population that can trap you into browsing for far longer than you can realistically afford. It evokes memories of a classic graphic from 2006, created by Joe Lertola for Time magazine. There is also a wonderful companion piece, ‘Population Mountains‘, where Matt walks through ‘a story about how to perceive the population of cities’.

When you fly from one part of the world to another, it becomes very quickly apparent just how crowded some places must be, compared to others.


Human Terrain: visualizing the world’s population, in 3D
Kinshasa is now bigger than Paris. Guangzhou, Hong Kong, and Shenzhen are forming an epic, 40 million-person super city. Over the past 30 years, the scale of population change is hard to grasp. How do you even visualize 10 million people?


It puts those incredibly dense housing schemes in Hong Kong I mentioned earlier into context, doesn’t it?

Population growth, like charity, starts in the home, so here’s an animated chart on family sizes in the US.

How many kids we have and when we have them
The chart above shows 1,000 timelines, based on data from the National Survey of Family Growth. Each moving dot is a mother. Age is on the horizontal, and with each live birth, the dot moves down a notch. The green bubbles represent the total counts for a given age.
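The bubbles are a running tally: at each age, count how many mothers have had zero, one, two or more children so far. Here's a minimal Python sketch using a few made-up timelines, not the National Survey of Family Growth data.

```python
from collections import Counter

# Made-up timelines: for each mother, the ages at which she had a live birth.
birth_ages = [
    [24, 27],
    [31],
    [22, 25, 29],
    [],
    [28, 33],
]

def births_by_age(timelines, age):
    """Number of children each mother has had by a given age."""
    return [sum(1 for a in ages if a <= age) for ages in timelines]

def bubble_counts(timelines, age):
    """Count mothers at each number of children: one bubble per notch."""
    return Counter(births_by_age(timelines, age))

print(sorted(bubble_counts(birth_ages, 26).items()))
print(sorted(bubble_counts(birth_ages, 40).items()))
```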


It’s interesting to watch the chart populate. You’ve got to wonder about the stories behind those outliers though.

Statistically insignificant?

One of the dangers of just looking at the numbers.

Progress 8 scores for most schools aren’t that different
There were over 300 schools with P8 scores between -0.05 and +0.05 – a difference of over 300 rank places (10% of schools) between the highest and lowest scoring of them. But what do these numbers mean?

Let’s say the score for School A was +0.05 and School B was -0.05. Taking the numbers at face value, one interpretation is that if you picked two pupils with the same KS2 attainment, the two pupils would have the same grades in seven of the subjects included in Attainment 8 but the pupil from School A would have one grade higher in one and only one subject than the pupil in School B.
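Where does ‘one grade in one subject’ come from? As I understand the DfE's method, a pupil's Progress 8 score is their Attainment 8 score minus the average for pupils with similar KS2 results, divided by 10. Attainment 8 has ten slots, with English and maths double-weighted. So a gap of 0.1 between two schools is worth one grade per pupil. Here's a quick Python check (the function name is just for illustration):

```python
ATTAINMENT8_SLOTS = 10  # eight subjects, with English and maths counted twice

def grade_gap(p8_a, p8_b):
    """Difference in Attainment 8 points (grades) implied by two Progress 8 scores."""
    return (p8_a - p8_b) * ATTAINMENT8_SLOTS

print(round(grade_gap(0.05, -0.05), 2))
```

On the same basis, the unexpected -0.20 mentioned earlier would be worth two grades per pupil.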

Is this an educationally important difference?

It depends?

And talking of Progress 8 confidence intervals…


xkcd: Error bars
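The performance tables publish a 95% confidence interval alongside each school's Progress 8 score. Roughly speaking, it is the score plus or minus 1.96 standard errors, and the standard error shrinks as the number of pupils grows. Here's a minimal Python sketch of that idea. The pupil-level standard deviation of 1.2 and the cohort of 150 are assumed, illustrative figures, not the DfE's published values. With them, two schools scoring +0.05 and -0.05 have heavily overlapping intervals. Overlap is only a rough rule of thumb, but it suggests the difference between them isn't clear-cut.

```python
import math

def p8_interval(score, pupils, pupil_sd=1.2, z=1.96):
    """Approximate 95% confidence interval for a school's Progress 8 score.

    pupil_sd is an assumed spread of individual pupils' P8 scores.
    """
    half_width = z * pupil_sd / math.sqrt(pupils)
    return score - half_width, score + half_width

def overlaps(a, b):
    """True if two intervals overlap (a rough sign the difference isn't clear-cut)."""
    return a[0] <= b[1] and b[0] <= a[1]

school_a = p8_interval(0.05, pupils=150)
school_b = p8_interval(-0.05, pupils=150)
print([round(x, 2) for x in school_a], [round(x, 2) for x in school_b])
print(overlaps(school_a, school_b))
```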

Bringing back postcards

Postcards are such simple things, really – just small rectangular pieces of thin card. No technology required. Perhaps that’s why they don’t seem as popular these days? But thanks to the Postcrossing project, I’ve been sending and receiving more and more, and from all over the world.

Postcrossing history
The Postcrossing project was created in 2005 by Paulo Magalhães as a side project when he was a student in Portugal. Paulo loves to receive mail and postcards in particular; from friends, family — or from anyone in the world. Finding a postcard in the mailbox always makes his day!

He knew more people shared the same interest, but there was no good way yet to connect everyone. And that’s how he got the initial idea of creating the online platform for this which he called Postcrossing. Its goal: to connect people across the world through postcards, independently of their country, age, gender, race or beliefs.

Here are some more postcard-related links.

Wish you were here? Postcards from the art world
“It’s possible to form a significant collection of extremely good and important works of art without being wealthy,” he says. “Anyone could decide to form a collection very close to mine with most of the same things – and I like that. It’s anti-exclusive.”


A postcard writing Rube Goldberg machine in a suitcase
As the sun regrettably sets on the art of letter writing, the inventive folks at design studio HEYHEYHEY have pieced together a clever contraption that promises to keep the art of travel postcards a thing of the present. Kind of. Melvin the Traveling Mini Machine is an elaborate Rube Goldberg machine that fits in a pair of suitcases that executes the simple task of “writing” and stamping a postcard of your choice, that is, if the absurdly elaborate sequence of steps goes off without a hitch.


Two women leading parallel lives are getting to know each other through data
Giorgia Lupi, who lives in New York, and Stefanie Posavec, who lives in London, are engaged in a long-distance, postcard-based data exchange in order to get to know each other better: “Dear Data.” They’ve only met in person twice, and they’re both interested in data, so they’re sending each other postcard drawings of data about their day-to-day lives.

Four Corners Books announces its next publication
Four Corners Books has just announced the next book in its Irregulars series titled, Leeds Postcards, a celebration of the independent postcard press. For the past four decades, independent postcard press Leeds Postcards has been making oppositional, inspiring images; activism by design. The cards are not of Leeds; the name represents a defiant rejection of the hegemony of London. The images cover a fascinating range of domestic and international politics, causes and campaigns, creating, in their own unique and graphically inventive way a record of the struggles as well as the progressive political triumphs from 1979 to the present day.