Big Thoughts About Big Data

Before we go into Big Data, let’s talk about machine learning. Machine learning is a branch of Artificial Intelligence (AI) that builds systems that automatically improve with more experience, meaning more data. Once designed, they don’t need to rely on explicit programming for each task. By identifying patterns in the data, they are able to improve the accuracy of their predictions on their own.

Machine learning algorithms have been around for a long time; however, the ability to automatically apply complex mathematical calculations to big data, over and over, faster and faster, is a recent development. We are facing a data revolution in which the volume, variety, and velocity of available data have grown dramatically. This is what people mean by Big Data. That growth has brought Big Data and machine learning together: because machine learning needs large amounts of data to “learn”, Big Data has played a big role.

Together, Big Data and machine learning have made it possible to produce models that can analyze bigger, more complex data and deliver faster, more accurate results at a very large scale. Systems are able to learn users’ interaction patterns, their interests, their likes and dislikes, and more. As a result, users of such systems can have optimized interactions and individualized experiences.

Big Data and machine learning are changing the world around us.

Just a few years ago, if I wanted to discover new music, I would read through pages of Pitchfork and create playlists for myself based on what I read. Now, all I have to do is open Spotify. There, fresh playlists are created for me every week, full of songs that I’ve never listened to but that are chosen based on the type of music I have been listening to.

I appreciate Big Data and machine learning for moments like these. They allow for more efficient, human-centric designs that truly understand users’ wants and needs.

However, something that I’ve been thinking about lately is the impact of our society on machine learning. Since the data that these systems collect and analyze are records of human behaviors, the algorithms built on them will inevitably be an extension of those behaviors. Not just the innocent ones, but the violent ones too.

For example, in the aftermath of Michael Brown’s death, St. Louis police turned to HunchLab, predictive policing software that identifies “hot spots” where crimes are most likely to occur. They turned to the software because they thought it would help them be more objective in their policing, but they didn’t consider that HunchLab was learning from previous crime reports that carry histories of bias. The software learned to identify majority black and brown neighborhoods as high-risk “hot spots” to police. And because it kept sending police to those areas, they reported more crimes there, which created a feedback loop of over-policing in those same neighborhoods.
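
To make that feedback loop concrete for myself, I wrote a tiny toy simulation. It is only a sketch under assumptions I made up: it is not how HunchLab actually works, the function name `simulate_patrols` is my own, and the numbers are hypothetical. Two areas have exactly the same underlying crime rate, but area "A" starts with more recorded crime because it was policed more heavily in the past. Each day the single patrol goes wherever the records say crime is highest, and crime only gets recorded where the patrol is.

```python
def simulate_patrols(initial_reports, true_rate, days):
    """Send one daily patrol to the area with the most recorded crime.

    Crime is only recorded where the patrol is present, so the records
    reflect where police looked, not where crime actually happened.
    """
    reports = dict(initial_reports)           # copy so the input is not mutated
    patrol_days = {area: 0 for area in reports}
    for _ in range(days):
        # The "prediction": the area with the most recorded crime so far.
        hot_spot = max(reports, key=reports.get)
        patrol_days[hot_spot] += 1
        # Only the patrolled area adds to the records (expected daily count).
        reports[hot_spot] += true_rate[hot_spot]
    return reports, patrol_days


# Hypothetical numbers: identical true crime rates, biased starting records.
true_rate = {"A": 2, "B": 2}
initial_reports = {"A": 60, "B": 40}

reports, patrol_days = simulate_patrols(initial_reports, true_rate, days=100)
print("recorded crime:", reports)
print("patrol days:   ", patrol_days)
```

In this toy version, area "A" receives every single patrol and its recorded crime keeps climbing, while area "B" never gets another report, even though the two areas are identical by construction. The data ends up confirming the bias it started with.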

It’s examples like this that make me wonder: how can systems be built to protect against learning discrimination? Can they be built at all?