How Big is Big Data? Try 500,000,000 Records, and Counting.
It starts with a single consumer who’s drinking your wine.
They can be anywhere in the world, and they can be at home or in a restaurant or standing in the aisle of a retail outlet. The point is that this is where the data starts: with one person.
That one person has bought a wine, and they’re engaging with a digital platform to document it for themselves or to share it with their friends. Those moments are when the data scale begins to tip, because that one person joins a chain of wine consumer behavior that very quickly grows to hundreds of links.
Soon, because wine is dynamic and interactive, and so is the digital world around it, those links multiply: to thousands, to hundreds of thousands, to millions, to hundreds of millions…
To billions.
That scale is where we find ourselves this week.
In other words, neck deep.
This week we’ve been working on a project in Europe. It involves several data sources and over 10 million records, just for starters.
Here’s one of the things we want to do with that data: analyze consumer sentiment, which means breaking user reviews down into terms, each one a link in the wine consumer chain, that can be analyzed algorithmically.
They’re words that an everyday consumer, your end consumer, uses to describe your wine. Each word in a consumer review is a data point, and each data point is a new link in the chain we’re analyzing.
It’s about turning unstructured data (free text) into structured data. Some people call it natural-language processing (NLP): a branch of artificial intelligence (AI) that looks for the meaning in what consumers are saying and converts it into structured, mine-able data.
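To make that idea concrete, here’s a minimal sketch in Python of what the transformation can look like. The review text, the tiny sentiment lexicon, and the field names are all invented for illustration; this is not our production pipeline, just the shape of the idea.

```python
# A minimal sketch of turning one free-text review into structured,
# mine-able records. The review text, the tiny sentiment lexicon, and
# the field names are illustrative only.
import re

SENTIMENT_LEXICON = {
    "crisp": "positive",
    "balanced": "positive",
    "flabby": "negative",
    "thin": "negative",
}

def review_to_records(review_id: str, wine_id: str, text: str) -> list[dict]:
    """Break a review into word-level records: one row per term."""
    terms = re.findall(r"[a-zA-Zäöüßàèéì']+", text.lower())  # simple word tokenizer
    return [
        {
            "review_id": review_id,
            "wine_id": wine_id,
            "term": term,
            "sentiment": SENTIMENT_LEXICON.get(term, "neutral"),
        }
        for term in terms
    ]

records = review_to_records("r-001", "wine-123", "Crisp and balanced, not thin at all.")
print(len(records), "data points from one short review")
```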
Let’s say each of the 10 million records we’re starting with contains a modest 10-word review. That’s 100 million data points.
When those words are in three different languages – English, German and Italian, in this case – the number of data points expands by another factor.
As of this writing, we’re processing more than half a billion (500,000,000) data records, and that’s just one project.
The result is that we can confidently link the consumer sentiment, in their own words, to specific wines, brands, regions, varietals, and competitors. The number of data points continues to expand, and expand some more.
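How do word-level records become statements about wines, brands, and regions? Here’s a small, hypothetical sketch of the roll-up step, assuming each record already carries those fields; the example records and field names are invented for illustration.

```python
# A sketch of rolling term-level sentiment up to brands and regions.
# The records below are invented; real ones come from the
# review-to-record step sketched above.
from collections import Counter, defaultdict

records = [
    {"wine_id": "wine-123", "brand": "BrandA", "region": "Veneto", "sentiment": "positive"},
    {"wine_id": "wine-123", "brand": "BrandA", "region": "Veneto", "sentiment": "negative"},
    {"wine_id": "wine-456", "brand": "BrandB", "region": "Mosel", "sentiment": "positive"},
]

def roll_up(records: list[dict], key: str) -> dict[str, Counter]:
    """Count sentiment labels per value of `key` (e.g. brand or region)."""
    totals: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        totals[rec[key]][rec["sentiment"]] += 1
    return totals

print(roll_up(records, "brand"))
print(roll_up(records, "region"))
```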
As I said, we’re neck deep.
We’ve had to make some adjustments, structurally speaking.
We had to enhance our infrastructure. We had to switch over to what’s called a data lake, which is a storage repository that holds a vast amount of raw data in its native format. And we’re harnessing the power of machine learning to do what we need to do to fulfill our promise to our clients.
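If the term “data lake” is new to you, the idea is simply that raw data lands first and gets interpreted later. Here’s a hedged sketch of that pattern; the directory layout, source names, and file format are assumptions made up for this example, not a description of our actual storage.

```python
# A sketch of the data-lake idea: raw review payloads land in their native
# format (JSON here), partitioned by source and date, and are only parsed
# downstream. Paths and names are illustrative.
import json
from datetime import date
from pathlib import Path

def land_raw_review(lake_root: str, source: str, payload: dict) -> Path:
    """Write one raw record into the lake without transforming it."""
    partition = Path(lake_root) / source / date.today().isoformat()
    partition.mkdir(parents=True, exist_ok=True)
    out_path = partition / f"{payload['review_id']}.json"
    out_path.write_text(json.dumps(payload, ensure_ascii=False))
    return out_path

path = land_raw_review("./lake", "retailer_reviews",
                       {"review_id": "r-001", "text": "Crisp and balanced.", "lang": "en"})
print("stored raw record at", path)
```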
Big data is big, right? But what we all need to remember is that it begins and ends with that one single consumer who’s buying and drinking your wine.
How can we apply the power of big data to help you reach that person?
We’ve got some ideas. Please be in touch, and let’s talk about it.
Thank you as always for reading.