{"id":1699,"date":"2014-03-22T16:14:02","date_gmt":"2014-03-22T05:14:02","guid":{"rendered":"http:\/\/blog.panicola.com\/?p=1699"},"modified":"2014-04-01T07:35:42","modified_gmt":"2014-03-31T20:35:42","slug":"flu-trends-fails","status":"publish","type":"post","link":"https:\/\/blog.panicola.com\/?p=1699","title":{"rendered":"Flu Trends fails&#8230;"},"content":{"rendered":"<ul>\n<li>\u201cautomated arrogance\u201d<\/li>\n<li>big data hubris<\/li>\n<li>At its best, science is an open, cooperative and cumulative effort. If companies like Google keep their big data to themselves, they\u2019ll miss out on the chance to improve their models, and make big data worthy of the hype. \u201cTo harness the research community, they need to be more transparent,\u201d says Lazer.<strong> \u201cThe models for collaboration around big data haven\u2019t been built.\u201d<\/strong> It\u2019s scary enough to think that private companies are gathering endless amounts of data on us. It\u2019d be even worse if the conclusions they reach from that data aren\u2019t even right.<\/li>\n<\/ul>\n<p>But then this:<br \/>\nhttp:\/\/www.theatlantic.com\/technology\/archive\/2014\/03\/in-defense-of-google-flu-trends\/359688\/<\/p>\n<p>&nbsp;<\/p>\n<p>http:\/\/time.com\/23782\/google-flu-trends-big-data-problems\/<\/p>\n<header>\n<h1 itemprop=\"headline\">Google\u2019s Flu Project Shows the Failings of Big Data<\/h1>\n<div>\n<div>\n<ul>\n<li><a itemprop=\"author\" href=\"http:\/\/time.com\/author\/bryan-walsh\/\">Bryan Walsh<\/a>\u00a0<a href=\"https:\/\/twitter.com\/bryanrwalsh\" target=\"_blank\">@bryanrwalsh<\/a><\/li>\n<\/ul>\n<p><time itemprop=\"datePublished\" datetime=\"2014-03-13 17:00:19\">March 13, 2014<\/time><\/p>\n<\/div>\n<\/div>\n<\/header>\n<section itemprop=\"articleBody\">\n<figure><img decoding=\"async\" itemprop=\"image\" alt=\"Google flu trends\" src=\"http:\/\/timedotcom.files.wordpress.com\/2014\/03\/140313-google-flu.jpg?w=1100&amp;h=734&amp;crop=1\" 
srcset=\"http:\/\/timedotcom.files.wordpress.com\/2014\/03\/140313-google-flu.jpg?w=560&amp;h=374&amp;crop=1 320w 2x, http:\/\/timedotcom.files.wordpress.com\/2014\/03\/140313-google-flu.jpg?w=1100&amp;h=734&amp;crop=1 800w, http:\/\/timedotcom.files.wordpress.com\/2014\/03\/140313-google-flu.jpg?w=1100&amp;h=734&amp;crop=1 800w 2x\" \/><figcaption>GEORGES GOBET\/AFP\/Getty Images<\/figcaption><\/figure>\n<h2 itemprop=\"alternativeHeadline\">A new study shows that using big data to predict the future isn&#8217;t as easy as it looks\u2014and that raises questions about how Internet companies gather and use information<\/h2>\n<p>Big data: as buzzwords go, <a title=\"Data\" href=\"http:\/\/www.nytimes.com\/2013\/08\/18\/sunday-review\/is-big-data-an-economic-big-dud.html\" target=\"_blank\">it\u2019s inescapable<\/a>. 
Gigantic corporations like\u00a0<a title=\"Big Data\" href=\"http:\/\/www.sas.com\/en_us\/insights\/big-data\/what-is-big-data.html\" target=\"_blank\">SAS<\/a>\u00a0and <a title=\"IBM\" href=\"http:\/\/www.ibm.com\/big-data\/us\/en\/\" target=\"_blank\">IBM<\/a>\u00a0tout their big data analytics, while experts promise that big data\u2014our exponentially growing ability to collect and analyze information about anything at all\u2014will transform everything from\u00a0<a title=\"EMC\" href=\"http:\/\/www.emc.com\/campaign\/bigdata\/index.htm\" target=\"_blank\">business<\/a>\u00a0to\u00a0<a title=\"football\" href=\"http:\/\/www.slate.com\/articles\/sports\/sports_nut\/2014\/01\/aaron_clauset_how_big_data_reveals_that_basketball_football_and_hockey_are.html\" target=\"_blank\">sports<\/a>\u00a0to\u00a0<a title=\"Wired\" href=\"http:\/\/www.wired.com\/wiredscience\/2013\/11\/a-new-kind-of-food-science\/\" target=\"_blank\">cooking<\/a>. Big data was\u2014no surprise\u2014one of the major themes\u00a0<a title=\"INC\" href=\"http:\/\/www.inc.com\/rebekah-iliff\/top-trends-from-sxsw-interactive-2014.html\" target=\"_blank\">coming out<\/a> of this month\u2019s SXSW Interactive conference. 
It\u2019s inescapable.<\/p>\n<p>One of the most conspicuous examples of big data in action is Google\u2019s data-aggregating tool\u00a0<a title=\"Google\" href=\"https:\/\/www.google.org\/flutrends\/us\/#US\" target=\"_blank\">Google Flu Trends (GFT)<\/a>. The program is designed to provide real-time monitoring of flu cases around the world based on Google searches that match terms for flu-related activity. 
Here\u2019s how Google\u00a0<a title=\"Trends\" href=\"https:\/\/www.google.org\/flutrends\/about\/how.html\" target=\"_blank\">explains it<\/a>:<\/p>\n<blockquote>\n<p>We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms. Of course, not every person who searches for \u201cflu\u201d is actually sick, but a pattern emerges when all the flu-related search queries are added together. We compared our query counts with traditional flu surveillance systems and found that many search queries tend to be popular exactly when flu season is happening. By counting how often we see these search queries, we can estimate how much flu is circulating in different countries and regions around the world.<\/p><\/blockquote>\n<p>Seems like a perfect use of the\u00a0<a title=\"Brain\" href=\"http:\/\/news.cnet.com\/8301-1023_3-57584305-93\/google-search-scratches-its-brain-500-million-times-a-day\/\" target=\"_blank\">500 million plus Google searches<\/a> made each day. There\u2019s a reason GFT became the symbol of big data in action, in books like Kenneth Cukier and Viktor Mayer-Schonberger\u2019s\u00a0<a title=\"Big data\" href=\"http:\/\/www.amazon.com\/Big-Data-Revolution-Transform-Think\/dp\/0544002695\" target=\"_blank\"><em>Big Data: A Revolution That Will Transform How We Live, Work and Think<\/em><\/a>. 
But there\u2019s just one problem: as a\u00a0<a title=\"Science\" href=\"http:\/\/www.sciencemag.org\/content\/343\/6176\/1203\" target=\"_blank\">new article<\/a>\u00a0in\u00a0<i>Science\u00a0<\/i>shows, when you compare its results to the real world, GFT doesn\u2019t really work.<\/p>\n<p>GFT overestimated the prevalence of flu in the 2012-2013 and 2011-2012 seasons by more than 50%. From August 2011 to September 2013, GFT over-predicted the prevalence of the flu in 100 out of 108 weeks. During the peak flu season last winter, GFT <a title=\"aat\" href=\"http:\/\/www.fastcoexist.com\/3027585\/the-failures-of-google-flu-trends-show-whats-wrong-with-big-data\" target=\"_blank\">would have had us believe<\/a>\u00a0that 11% of the U.S. had influenza, nearly double the CDC numbers of 6%. If you wanted to project current flu prevalence, you would have done much better basing your models on 3-week-old CDC case data than using GFT\u2019s sophisticated big data methods. \u201cIt\u2019s a\u00a0<a title=\"Truman\" href=\"http:\/\/en.wikipedia.org\/wiki\/Dewey_Defeats_Truman\" target=\"_blank\">Dewey beats Truman<\/a>\u00a0moment for big data,\u201d says David Lazer, a professor of computer science and politics at Northeastern University and one of the authors of the\u00a0<i>Science<\/i> article.<\/p>\n<p>Just as the editors of the\u00a0<em>Chicago\u00a0<\/em><em>Tribune<\/em>\u00a0believed they could predict the winner of the close 1948 Presidential election\u2014they were wrong\u2014Google believed that its big data methods alone were capable of producing a more accurate picture of real-time flu trends than old methods of prediction from past data. That\u2019s a form of \u201cautomated arrogance,\u201d or big data hubris, and it can be seen in a lot of the hype around big data today. 
Just because companies like Google can amass an astounding amount of information about the world doesn\u2019t mean they\u2019re always capable of processing that information to produce an accurate picture of what\u2019s going on\u2014especially if it turns out they\u2019re gathering the wrong information. Not only did the search terms picked by GFT often fail to reflect incidences of actual illness\u2014leading it to repeatedly overestimate just how sick the American public was\u2014GFT also completely missed unexpected events like the nonseasonal 2009 H1N1-A flu pandemic. \u201cA number of associations in the model were really problematic,\u201d says Lazer. \u201cIt was doomed to fail.\u201d<\/p>\n<p>Nor did it help that GFT was dependent on Google\u2019s top-secret and always changing search algorithm. Google modifies its search algorithm to provide more accurate results, but also to increase advertising revenue. Recommended searches, based on what other users have searched, can throw off the results for flu trends. While GFT assumes that the relative search volume for different flu terms is based in reality\u2014the more of us are sick, the more of us will search for info about flu as we sniffle above our keyboards\u2014in fact Google itself alters search behavior through that ever-shifting algorithm. If the data isn\u2019t reflecting the world, how can it predict what will happen?<\/p>\n<p>GFT and other big data methods can be useful, but only if they\u2019re paired with what the\u00a0<em>Science<\/em>\u00a0researchers call \u201csmall data\u201d\u2014traditional forms of information collection. Put the two together, and you can get an excellent model of the world as it actually is. Of course, if big data is really just one tool of many, not an all-purpose path to omniscience, that would puncture the hype just a bit. 
You won\u2019t get a SXSW panel with that kind of modesty.<\/p>\n<p>A bigger concern, though, is that much of the data being gathered in \u201cbig data\u201d\u2014and the formulas used to analyze it\u2014is controlled by private companies that can be positively opaque. Google has never made the search terms used in GFT public, and there\u2019s no way for researchers to replicate how GFT works. There\u2019s\u00a0<a title=\"Google\" href=\"https:\/\/www.google.com\/trends\/correlate\" target=\"_blank\">Google Correlate<\/a>, which allows anyone to find search patterns that purport to map real-life trends, but as the\u00a0<em>Science<\/em> researchers wryly note: \u201cClicking the link titled \u2018match the pattern of actual flu activity (this is how we built Google Flu Trends!)\u2019 will not, ironically, produce a replication of the GFT search terms.\u201d Even in the academic papers on GFT written by Google researchers, there\u2019s no clear contact information, other than a generic Google email address. (Academic papers almost always contain direct contact information for lead authors.)<\/p>\n<p>At its best, science is an open, cooperative and cumulative effort. If companies like Google keep their big data to themselves, they\u2019ll miss out on the chance to improve their models, and make big data worthy of the hype. \u201cTo harness the research community, they need to be more transparent,\u201d says Lazer. \u201cThe models for collaboration around big data haven\u2019t been built.\u201d It\u2019s scary enough to think that private companies are gathering endless amounts of data on us. It\u2019d be even worse if the conclusions they reach from that data aren\u2019t even right.<\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>\u201cautomated arrogance\u201d big data hubris At its best, science is an open, cooperative and cumulative effort. 
If companies like Google keep their big data to themselves, they\u2019ll miss out on the chance to improve their models, and make big data worthy of the hype. \u201cTo harness the research community, they need to be more transparent,\u201d &hellip; <a href=\"https:\/\/blog.panicola.com\/?p=1699\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Flu Trends fails&#8230;<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28,14,5,9,3],"tags":[],"class_list":["post-1699","post","type-post","status-publish","format-standard","hentry","category-business","category-complex-adaptive-systems","category-data-saving-lives","category-healthcare","category-rapid-learning-health-systems"],"_links":{"self":[{"href":"https:\/\/blog.panicola.com\/index.php?rest_route=\/wp\/v2\/posts\/1699","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.panicola.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.panicola.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.panicola.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.panicola.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1699"}],"version-history":[{"count":2,"href":"https:\/\/blog.panicola.com\/index.php?rest_route=\/wp\/v2\/posts\/1699\/revisions"}],"predecessor-version":[{"id":1721,"href":"https:\/\/blog.panicola.com\/index.php?rest_route=\/wp\/v2\/posts\/1699\/revisions\/1721"}],"wp:attachment":[{"href":"https:\/\/blog.panicola.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1699"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.panicola.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1699"},{"taxonomy":"post_tag","embeddable
":true,"href":"https:\/\/blog.panicola.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}