Recently, Mark Zuckerberg released a lengthy, carefully organized statement titled “Building Global Community”. In the statement, Zuckerberg outlines his vision of a prosperous, modern society and emphasizes the role of Facebook in achieving that dream.
It’s an ambitious document, one that indirectly acknowledges many of the concerns people have about huge companies such as Facebook. For whose benefit is your experience being curated? How do we think about privacy in the age of so much passive data collection? For better or worse, it’s also a document that positions social media as a necessary and powerful global utility.
It is compelling to read such broad, future-focused commentary from a public figure like Zuckerberg. It’s easy to get caught up in the mythology of the whole thing, though, so I want to highlight one specific aspect of the statement: the role of artificial intelligence in understanding a world’s worth of noise.
In a section titled “Safe Community”, Zuckerberg sketches different ways AI can prevent acts of terrorism, identify bullying online, and otherwise foster a safer world: “Looking ahead, one of our greatest opportunities to keep people safe is building artificial intelligence to understand more quickly and accurately what is happening across our community.”
What’s interesting about this proposition is how it builds on work that companies (not just Facebook) are tackling right now. Teaching machines to “understand” social data (i.e., to automatically classify content by subject matter, sentiment, syntax, and other features) is a vital mission for large companies. Whenever you read something about, say, Twitter policing harassment or Facebook facilitating “fake news”, you can be sure that a legion of engineers is working behind the scenes to improve the scale, speed, and accuracy of their classification systems.
Where we are now
There are a lot of different ways that machines can be trained to classify content. At eContext, we start with curation, translating the subtleties of human interpretation into a massive set of formalized rules. While an English-speaking human would automatically understand that “my love life is a roller coaster” is not a literal statement, a poorly trained machine classifier probably wouldn’t know the difference. Therefore, we sculpt our technology with millions of tiny language rules until it is able to sort data with the common sense of a human.
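To make the idea concrete, here is a minimal, hypothetical sketch of rule-based classification in Python. The rules, labels, and example sentences are all invented for illustration; a production taxonomy would contain millions of far subtler rules. The key idea is that a more specific rule (the figurative “love life” sense) is checked before a generic one (the literal amusement-ride sense).

```python
import re

# Toy rules: each pairs a pattern with a topic label. Order matters;
# more specific rules are listed first so they override generic ones.
RULES = [
    (re.compile(r"love life.*roller coaster|roller coaster.*love life"),
     "relationships"),
    (re.compile(r"roller coaster"), "amusement parks"),
]

def classify(text: str) -> str:
    """Return the label of the first (most specific) matching rule."""
    lowered = text.lower()
    for pattern, label in RULES:
        if pattern.search(lowered):
            return label
    return "unknown"

print(classify("My love life is a roller coaster"))    # relationships
print(classify("The new roller coaster at the park"))  # amusement parks
```

A real system would, of course, weigh thousands of overlapping rules rather than taking the first match, but the principle is the same: human judgment encoded as explicit, inspectable rules.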
Of course, the internet is far too broad and chaotic to organize by hand. That’s why we take that human curation and scale it with machine learning models. Say our classifier encounters a piece of content that can’t be understood with any of our existing human-created rules. Maybe it’s a brand-new product name or an unfamiliar viral hashtag. Machine learning allows us to compare the unknown content against all the data that’s been classified already (weighing features like the source of the data or the surrounding language) and arrive at the best possible answer. With minimal supervision to flag exceptions and make sure everything stays on course, we are able to scale our classifier to the speed and volume of social media while maintaining that human-like capacity for interpretation.
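One simple way to picture this fallback is nearest-neighbor matching over already-classified examples: when no hand-written rule fires, compare the unknown content’s surrounding words against labeled data and borrow the label of the closest match. The sketch below is a deliberately simplified illustration (bag-of-words cosine similarity, invented example data), not eContext’s actual model.

```python
from collections import Counter
import math

# Toy labeled corpus; a real system would draw on billions of
# previously classified items and far richer features.
LABELED = [
    ("new rifle scope for deer season opening weekend", "hunting"),
    ("best loadout for the new shooter game dlc", "video games"),
]

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def fallback_classify(text: str) -> str:
    """Label unknown content by its most similar labeled example."""
    vec = vectorize(text)
    best_label, _ = max(
        ((label, cosine(vec, vectorize(example))) for example, label in LABELED),
        key=lambda pair: pair[1],
    )
    return best_label

print(fallback_classify("picked up a scope for deer season"))  # hunting
```

The “minimal supervision” mentioned above maps naturally onto this setup: a human reviews only the low-similarity cases, and each confirmed label becomes new training data.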
Where we’re headed
When different classification styles are used in tandem, automated systems can become much more sophisticated. For example, a system seeking to preempt real-life violence might look for syntax related to guns or other weapons, but incorporate a topic classifier to understand if the context is related to hunting, video games, or something else. The more features we can accurately identify, the more intelligent the overall system can become.
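The weapons example above can be sketched as two small classifiers working in tandem. Everything here (the term lists, the stub topic classifier, the function names) is hypothetical and only meant to show the shape of the idea: a keyword detector supplies recall, and a topic classifier supplies the context that keeps hunting or gaming chatter from being flagged as a threat.

```python
WEAPON_TERMS = {"gun", "rifle", "shoot", "ammo"}
BENIGN_TOPICS = {"hunting", "video games", "sport shooting"}

def topic_of(text: str) -> str:
    """Stand-in for a real topic classifier; returns a coarse topic label."""
    lowered = text.lower()
    if "deer" in lowered or "season" in lowered:
        return "hunting"
    if "game" in lowered or "dlc" in lowered:
        return "video games"
    return "unknown"

def risk_flag(text: str) -> bool:
    """Flag text only when weapon terms appear outside a benign topic."""
    words = set(text.lower().split())
    if not words & WEAPON_TERMS:
        return False  # no weapon-related language at all
    return topic_of(text) not in BENIGN_TOPICS

print(risk_flag("buying ammo for deer season"))   # False: hunting context
print(risk_flag("i want to shoot up the place"))  # True: no benign context
```

Each added feature (sentiment, urgency, the author’s history) would slot in the same way, refining the final decision rather than replacing it.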
It’s worth noting that, while Zuckerberg’s letter touts AI as a solution for large-scale safety, tech companies like Facebook have plenty of other applications in mind. A safe community is a noble pursuit, but the technology we’re talking about here is also meant to help these companies target ads more efficiently, keep users on-site (or on-app) longer and more frequently, and obtain more lucrative data. By learning more about how data is obtained, labeled, and synthesized, businesses and users alike can take a more thoughtful look at the future of the internet (whether that ends up being Mark’s future or not).