Sunday, November 10, 2013

Big is not necessarily better: Issues when using big data analysis on social media platforms


An MIT Sloan Management Review article caught my eye this week as I was doing some research for my group project. The article, “The Pitfalls of using online and social data in Big Data Analysis,” discusses a provocative finding made by two professors from Princeton and North Carolina (Chapel Hill). They say that “inferences based on how people use social media platforms like Twitter and Facebook should be reconsidered” because they are “skewed samples.” In other words, data analysis on these platforms has real problems of veracity and applicability when it is used to assess and predict social behavior. This has important ramifications for the digital marketing landscape, because it calls into question whether we can really rely on large-scale social data analysis to target and segment audiences in advertising and marketing.

Why does the research call Twitter and Facebook “skewed samples”? Twitter is used by 10% of the US population, and by a fairly specific segment of it rather than a representative sample across all demographic groups. Facebook users, while a wider sample, are likewise not representative because of race, gender, and class bias. In addition, the research questions today's common practice of measuring and analyzing only the actions that users take (such as how many people “liked” a Facebook status update or retweeted a message), with no corresponding analysis of how many people did not take that action. According to the research, this kind of isolated, single-method research, which is “partial, filtered, distorted,” may lead not only to misinterpretation but also to fundamental misunderstanding.

The bottom line? Big is not necessarily better. Or as Princeton Professor Tufekci put it, “more data does not necessarily mean more insight.” As would-be digital marketers, we need to remember this when looking at large-scale data analytics conducted on social media, and make sure we know the limitations of the data set being analyzed and the assumptions being made to produce the results we seek.
