How and why (NOT) should we embrace big data? A reflection from the aspect of epistemology
Prof. Chengshan (Frank) Liu, Institute of Political Science, NSYSU
2018.5.3 @Dept. of Political Science, NCKU
What is a PhD for? Fact, truth, reality, knowledge, or… ?
What is “big data”? 5Vs: Big volume, velocity, variety, veracity, and value. Honestly, this term has gone out of fashion.
What do scholars mean by “big data”? In our field, “data-driven” and “method-driven” research is labelled as “big data” studies. Methods associated with “big data”: text mining, data mining, automatic content analysis, computer-assisted text analysis, automatic annotation, sentiment analysis, geographic information systems (GIS), network analysis, and so on.
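Image source: http://ppt.cc/Aqutw. Internationally, publications on applying big data to political science research are led by Gary King. Most other work has also been influenced and inspired, to a greater or lesser degree, by the research group King leads, so that it has effectively become a “Gary King school”. At Harvard University's Institute for Quantitative Social Science (IQSS), King studies how different research methods and quantitative tools can advance social science research. Check out his upcoming talks May 29-30 @NTU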
King’s purposes in embracing big data: evaluate public policy, understand what social media posts say, estimate the causes of death, ensure fair legislative redistricting, reverse-engineer the Chinese government’s censorship program, and forecast elections and international conflict.
Topic 1: Overview of information tools applied in the social sciences (political science)
2010. “A Method of Automated Nonparametric Content Analysis for Social Science.”
2012. “Social Science Research Methods in Internet Time.”
2014. “Restructuring the Social Sciences: Reflections from Harvard’s Institute for Quantitative Social Science.”
2015. “Computer-Assisted Text Analysis for Comparative Politics.”
2015. “No! Formal Theory, Causal Inference, and Big Data Are Not Contradictory Trends in Political Science.”
2015. “We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together.”
2015. “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites.”
2016. “Machine Translation: Mining Text for Social Theory.”
Topic 2: Identifying or tracking trends in public discourse
2008. “Recognizing Citations in Public Comments.”
2008. “Parsing, Semantic Networks, and Political Authority: Using Syntactic Analysis to Extract Semantic Relations from Dutch Newspaper Articles.”
2008. “Good News or Bad News? Conducting Sentiment Analysis on Dutch Text to Distinguish Between Positive and Negative Relations.”
2008. “Media Monitoring by Means of Speech and Language Indexing for Political Analysis.”
2012. “Media Coverage in Times of Political Crisis: A Text Mining Approach.”
2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.”
2014. “Echo Chamber or Public Sphere? Predicting Political Orientation and Measuring Political Homophily in Twitter Using Big Data.”
2017. “Critical News Reading with Twitter? Exploring Data-mining Practices and their Impact on Societal Discourse.”
Other topics (3 to 5)
Topic 3: Identifying and tracking political positions
2003. “Extracting Policy Positions from Political Texts Using Words as Data.”
2008. “A Scaling Model for Estimating Time-series Party Positions from Texts.”
2014. “Scaling Politically Meaningful Dimensions Using Texts and Votes.”
2015. “Quantifying Social Media’s Political Space: Estimating Ideology from Publicly Revealed Preferences on Facebook.”
Topic 4: Strategies for regulating political speech
2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.”
2013. Media Commercialization & Authoritarian Rule in China.
2017. “How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument.”
Topic 5: Exploring the formation of public policy
2005. “Using Geographic Information Systems to Study Interstate Competition.”
2014. “’Big Data’ in Research on Social Policy.”
2015. “Analyzing Big Data: Social Choice and Measurement.”
Other topics (6 to 8)
Topic 6: Semantic analysis of political speech
2008. “Automatic Annotation of Semantic Fields for Political Science Research.”
2015. “Uncovering Social Semantics from Textual Traces: A Theory Driven Approach and Evidence from Public Statements of US Members of Congress.”
Topic 7: Applications in elections
2014. “Political Campaigns and Big Data.”
2017. “The Pulse of the People: Can internet data outdo costly and unreliable polls in predicting election outcomes?”
Topic 8: International relations research
2012. “Richardson in the Information Age: Geographic Information Systems and Spatial Data in International Studies.”
Why (not) big data? Your epistemological and methodological stances, and your attitudes toward methods, determine how you evaluate (if not disdain) “big data”.
From Big data to Data science “Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.” ~ Wikipedia
How do positivists look at “big data”? Evans & Aceves (2016) “Machine Translation: Mining Text for Social Theory.”
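Let’s look at the whole thing from the right angle: data-assisted meaning netting (DAMN). The practice of big data tells us that, since the purpose of knowledge is exploration, we should focus on discovery and not (necessarily) on verification. Data can be used to discover associations, and even more to mine meaning. We might first identify the concepts or dimensions we care about (which values, which behaviors, which attitudes?) and then explore them through data: on the one hand identifying possible relationships among different values, attitudes, and behaviors, and on the other hand holding a dialogue with the relationships we expected. Only then do we interpret the meaning. Let’s make our exploration DAMN right.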
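The exploratory workflow described above (pick the concepts of interest, look for relationships in the data, compare them with what you expected, then interpret) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the variable names, the response values, and the expected signs are all invented for the example and do not come from the talk or from any real survey, and a plain Pearson correlation stands in for whatever association measure a researcher would actually choose.

```python
from itertools import combinations
from math import sqrt

# Hypothetical survey items on 1-5 scales; every value here is invented.
responses = {
    "trust_in_government": [4, 2, 5, 1, 3, 4, 2, 5],
    "turnout_intention": [5, 2, 4, 1, 3, 5, 1, 4],
    "online_news_use": [1, 4, 2, 5, 3, 2, 5, 1],
}

# Hypothetical prior expectations: +1 means "I expect a positive relationship".
expected_sign = {
    ("trust_in_government", "turnout_intention"): +1,
    ("trust_in_government", "online_news_use"): +1,
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def explore(data, expectations):
    """Correlate every pair of items and compare each result with the prior."""
    findings = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        prior = expectations.get((a, b))
        if prior is None:
            note = "no prior expectation: a pattern to interpret"
        elif prior * r > 0:
            note = "consistent with expectation"
        else:
            note = "contradicts expectation: revisit the interpretation"
        findings.append((a, b, round(r, 2), note))
    return findings


findings = explore(responses, expected_sign)
for a, b, r, note in findings:
    print(f"{a} ~ {b}: r = {r:+.2f} ({note})")
```

The point of the sketch is the last column, not the coefficient: each pattern is read against the researcher's own expectation, and the interpretation of meaning comes after that comparison.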
Data science for extracting facts and discovering meaning: fact vs. truth vs. reality vs. knowledge
March 2016. Google watched how people use a phone, in a van, for over an hour at a time. Goal: complete interviews with 500 people.
Reflections from the Humanities
Holmes, J. (2015). Nonsense: The Power of Not Knowing (First Edition). New York: Crown Publishers. Chinese edition: 《無知的力量》.
Lindstrom, M. (2016). Small Data: The Tiny Clues That Uncover Huge Trends. New York City: St. Martin’s Press. Chinese edition: 《小數據獵人》.
Madsbjerg, C. (2017). Sensemaking: The Power of the Humanities in the Age of the Algorithm. New York, NY: Hachette Books.
Meaning netting
Blackburn, S. (2012). What Do We Really Know? The Big Questions in Philosophy. London: Quercus.
Cohen, L. H. (2013). I Don’t Know: In Praise of Admitting Ignorance. New York: Riverhead Books.
Holmes, J. (2015). Nonsense: The Power of Not Knowing (First Edition). New York: Crown Publishers.
Madsbjerg, C. (2017). Sensemaking: The Power of the Humanities in the Age of the Algorithm. New York, NY: Hachette Books.
Sesno, F., & Blitzer, W. (2017). Ask More: The Power of Questions to Open Doors, Uncover Solutions, and Spark Change. New York: AMACOM.
Zarkadakis, G. (2016). In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence (1st edition). Pegasus Books.
DamN Methods
Data: Taiwan Election and Democracy Studies 2016. Data collection period: 2017.1.17 ~ 4.28. N=1,690. Cost: > NTD 1,000,000
A profile of respondents with no party leaning
A profile of pan-Blue and pan-Green supporters
Conclusion: It is not a matter of quantitative vs. qualitative as means, nor of big data vs. thick data, but of the dialogue between data and meaning in the researcher's mind.
How do I re-evaluate the “survey”?
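Have you ever considered that people in Taiwan define “independence” in many different ways, and quite possibly share no consensus on it?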
Smilepoll.tw: a quali-quantitative platform for collecting preferences, patterns, and values, for netting data and meaning.
Conclusion: How and why (NOT) should we embrace big data?
Exploring new patterns via big data is the spirit of data science. (So think again about what political science means.)
Different epistemological camps see different uses of big data. (Which side will you take?)
“Meaning mining with data” is the consequence of this way of thinking.
Data size matters much less than the purposes for which the data are used.
Learning new data-analytical tools will help you connect to the world of exploring patterns and facts via data. But be fully aware that we should settle our purposes first.