
Data Skeptics: How Big Data Explains Global Conflict

Image courtesy of rosefirerising via Flickr

When it comes to mapping data to form conclusions, the world has a problem. It’s not that we don’t have enough data; we’re rolling in it, actually.

The issue is that people aren’t asking enough questions about whether datasets are being used soundly. We assume that because a visualization is pretty, bedazzled with colors and interactive buttons more often than not, it must be accurate too.

This is especially true when it comes to mapping and forecasting world conflict, says Kalev Leetaru. Leetaru is the founder of GDELT: a global-scale project, supported by Google and other partnerships, that captures and maps real-time data of over 30,000 global events daily.

Using data to report and forecast global events and the reactions to them isn’t new. But it is often done with limited data that isn’t representative enough to tell an accurate story.

At the most recent NYC Data Skeptics meetup, Leetaru explained why that is, and how GDELT (Global Data on Events, Location and Tone) hopes to do better.

Speaking the language

There’s a ton of information that data scientists can scoop up, analyze, and map to paint a picture of global events and their impact. Much of this vital data is carried and contextualized by language.

The problem? The world is large, and one language alone is but a fragment of the whole. Even so, some researchers rely on English news articles, papers, and social media postings alone to derive meaning. This is only a slice of the bigger picture.

To avoid this common error, GDELT ingests as much information as possible in over 100 languages and applies algorithms to extract information from it. This way, the results aren’t biased toward Western sources.

Probing the news

Speaking of language, where does it come from? Many datasets, including GDELT’s, draw their data from global news sources. The approach becomes more effective as more sources are included, and as algorithms get better at sensing and extracting facts from raw articles.

Unfortunately, as Leetaru notes, the news does not always accurately report on events. And even more unfortunately, sometimes it fails to do so at all. The news is therefore an imperfect source, albeit a valuable one.

For example, relying solely on Western news sources can completely erase conflicts in locations that aren’t being touched by the mass media. When data is inclusive of local reporting and foreign language reporting, however, the accuracy is better and the value higher.

Using hyper-local, geolocated data gives researchers a look beneath surface rhetoric. For example, Western outlets may say there’s been a ceasefire in Crimea, but local hotspots may show that beneath talks of peace, clashes remain.

The news can also, as we know, be filled with fluff and bias. Which is why GDELT applies its algorithms to extract only facts and events from reports.

Feeling the feels

Another factor that we can now quantify as data, strangely enough, is emotion.

This may seem difficult to extract algorithmically, but there are ways. GDELT in particular applies 24 different sentiment dictionaries to define 4,500 emotions and themes.
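To make the idea concrete, here is a minimal sketch of dictionary-based scoring in Python. The three tiny word lists, the `DICTIONARIES` name, and the `score_text` function are illustrative assumptions, not GDELT’s actual dictionaries or code; the sketch only shows the general technique of counting how many words in a text fall into each emotion’s word list.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries for illustration only;
# these are NOT the dictionaries GDELT uses.
DICTIONARIES = {
    "anxiety": {"fear", "threat", "panic", "worried"},
    "grief": {"death", "mourning", "loss", "killed"},
    "triumph": {"victory", "liberated", "won", "celebrate"},
}


def score_text(text):
    """Return, for each dictionary, the share of words in `text` that match it."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in DICTIONARIES}
    counts = Counter(words)
    return {
        name: sum(counts[word] for word in vocab) / len(words)
        for name, vocab in DICTIONARIES.items()
    }


report = "Crowds celebrate victory as the city is liberated, but fear of reprisal remains."
scores = score_text(report)
for emotion, share in scores.items():
    print(f"{emotion}: {share:.3f}")
```

A word-count approach like this has no notion of who is speaking, so the same report would score as “triumph” whichever side of the conflict it describes.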

Even this expansive approach, which is not utilized by all researchers, has its imperfections. For example, when it comes to conflict, sentiment depends a lot on what side you’re on — so death and destruction would evoke positive emotions on the winning side, and negative for the losers.

Another issue is that emotions and themes can be easily lost in translation between languages.

Western spin and context

Leetaru notes that the Western media, and America’s in particular, paints a misleading narrative when reporting global conflict. It often goes a little something like this: a global issue emerges when it affects Americans, a solution is proposed, then the issue fizzles out.

He uses the example of Ebola to demonstrate this tendency. Both before and after Western media reported on the epidemic, the disease was killing thousands of people.

But reporting was driven by Ebola’s intersection with Americans, and eventually cycled out of the news when Western treatment “prevailed.” This framework isn’t exactly accurate, but it serves what TV viewers want and expect.

Such narratives tend to be myopic, especially in the long run. That’s because reporters, and people in general, tend to think only in the moment: we forget about the past and thus miss overall patterns of events.

So while to some, it may seem like there are more riots than ever, data can frame today’s events within historical context to portray the larger pattern of highs and lows — and provide an early warning system that flags emerging issues in real time.
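One simple way to frame a current count against its own history is to compare it with a trailing baseline. The sketch below is an assumption for illustration, not GDELT’s method: the `flag_anomalies` function, the window and threshold values, and the weekly counts are all made up. It flags a week only when its count sits well above the mean of the weeks before it.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=4, threshold=2.0):
    """Return indices whose count exceeds the mean of the previous `window`
    values by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the deviation is zero.
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly event counts for a single location.
weekly_protest_counts = [10, 12, 11, 13, 12, 11, 30, 12]
flagged_weeks = flag_anomalies(weekly_protest_counts)
print(flagged_weeks)
```

Here only the week with 30 events stands out from its recent history; the ordinary ups and downs around it do not.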

To avoid the spins of narrative, GDELT extracts cultural framework beyond just the news, including past events dating back to 1979. It scoops up academic literature, Amnesty International reports, photo imagery and social media postings to extract deeper meaning.

The resulting platform is open to all for research and analysis of global society; datasets can be used for commercial, academic, or governmental purposes without fee. It has huge implications for the military, especially, which can use it to glean intelligence from across the globe.

The importance of skepticism

As data gets denser and its visualization gets prettier, it’s easy to believe conflict forecasts and maps of feelings and events. Leetaru has called out a bunch of well-regarded researchers and computer scientists, however, for being selective and misleading in their projects.

It all comes down to asking questions to cut through the hype and evaluate conclusions. The more diverse and broad the dataset, the more accurate results tend to be. The more open the data, the better basis you have for trust.

When it comes to understanding conflict, GDELT is a frontrunner in just such breadth and detail. With updates every 15 minutes and 300 million events covered across decades’ worth of open-source data, it’s essentially a real-time catalog of global events. It quantifies society in a way previously thought impossible.

Is it enough to predict the next world war? Maybe not. But if there are any clues, odds are they will be picked up and processed, along with everything else.

Jennifer Markert