Channel / Source:
TEDx Talks
Published: 2011-05-24
Source: https://www.youtube.com/watch?v=HSVQ5RDBEJs
So the image on the screen is the digital footprint of a relationship. Specifically, it's my relationship with my girlfriend and fellow Wolverine over the past three years that I've known her. I made this image by scraping my hard drive and the web to recover about a thousand digital messages that we had sent each other, and for each one of these messages I plotted a
bar on the screen. And I had who sent the message (the blue is me, the red is her); I had the medium it was sent through, whether it was email, Facebook, or Twitter (the purple is instant messaging conversations); and I have the date, the time, and the text, the content of the message itself. So I wanted to try to use this data to see if I
could understand the evolution of my relationship, or maybe relationships in general. Now, the first thing I did was start parsing the text in these messages to see when words would show up and how frequently, and very quickly a story started to emerge. So, like any warm-blooded sophomore, when I met a girl at a party, I said hi and immediately
found her on Facebook. So, you know, they say that the way to a girl's heart is through humor, so apparently we worked pretty hard at this, because "YouTube" pops up as one of the most frequently used phrases as we sent videos back and forth. But, you know, it worked, because our vocabularies started to shift, and phrases such as "I like you" would pop
up in the conversation. And so far so good: I've been pretty lucky, and "like" has since become "love", and you can almost see the day, the very day, that the language started to shift as the relationship moved from friendship to romance. So aside from just looking at the text, the content of these messages, I also learned a bit from the structure of this timeline. As I mentioned,
the purple bars are instant messaging conversations, and so their disappearance about three quarters of the way through this timeline really represents a technology shift. And so wrapped up in all this information about how relationships move from friendship to romance are really a lot of insights into how we use communication technologies to maintain these relationships. And so that sort of makes you think: okay,
how can I use this to make a relationship better? The width of each one of these bars represents how long that person had to wait for a response. And so now I get to stand before you and make this grand gesture and say: Erin, I'm sorry for making you wait so long for me to return your phone calls. But, you know, the amazing thing about all
of this is I didn't have to actively archive any of this data. It was just sitting there in my email client, sitting there on Facebook's servers, and all I had to do was download it. And this only represents a sample size of N equals one. You can think: there are five hundred million users of Facebook, each with an average of one hundred and thirty friends, and
so very quickly you can imagine the numbers start getting up into the billions, with nine zeros there. And Facebook offers you a button to press so that you can see every message sent back and forth with someone, every party you've attended together. And so, you know, you start thinking about the questions: well, what can we learn about cultures or generations if we had access to a
billion of these timelines? This brings me to what I really want to talk about today, and that is this growing field of big data. So what is big data? Well, think about everything that you do in a day, or maybe today is a good example, and then think about how many of those things involve the internet or a computer. So just by virtue of using
these technologies, you know, because of the way they work, all these little bits of information are getting stored on hard drives and on servers. And these little digital breadcrumbs are encoding a lot about what we're doing, when we're doing it, and where we're doing it from. And so for the first time, really, researchers are being given access to the breadcrumbs, and we
can start piecing together an incredible amount about the people and the behaviors that generated them in the first place. So for the remainder of this talk I want to share with you a few examples of how I use this big data in my research in the Human Mobility and Networks Lab at MIT, and maybe inspire you to frame some questions that you might have
in terms of this data. But before I get there I need to address the elephant in the room, and that is this issue of privacy. So the reason that all this data is so incredibly powerful is because there's so much of it and it's so personal. And so whenever we are thinking about, you know, looking at it, we need to preserve anonymity and keep it
secure. So I want to just say that we work with all the companies that provide us this data to make sure it's anonymous. But, you know, for anyone wanting to look into this field, it's a big issue, and I hope we can keep an open mind about it, because I truly believe that the power of this data to
teach us something outweighs the cost of dealing with these problems. So, I believe the relationship I showed in the beginning really got at this question of how we communicate with each other, and services like Facebook and Twitter have really opened up enormous windows for researchers to look at how we share and consume information with each other. And the natural extension of that is to ask: what can
we see about how that consumption of information drives actions and choices? And so specifically I'll focus on the choice to adopt a technology. So this could be buying a book that someone recommended to you, or signing up for a service, and I'll look at Twitter because, you know, I'm sure all of you are tweeting right now. But Twitter is really an example
of a piece of information that can go viral: it's social, it's free, and it's really unlimited. So this is the United States, and we have the first three and a half million people to adopt Twitter. So if you adopted Twitter before the end of two thousand nine, look for yourself. And I will play this movie, and these dots are the cities,
and they're going to grow as the number of users grows, and they're going to change color when Twitter reaches this critical mass. So if I start this movie, then this line represents the number of new users signing up every day. So we're off to kind of a slow start, but all of a sudden you see Silicon Valley start lighting up, then Cambridge, where MIT is, Ann Arbor, Austin,
Texas. Now all of the country reaches critical mass, and then you see this explosion of Twitter users at the end of two thousand nine. And so the idea here is that you want to start understanding the process behind this, because this is what is going on, you know, when that music video is going viral on YouTube, or whenever Charlie Sheen says something ridiculous.
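The counting behind a movie like this one can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the research: the input format, the function name `critical_mass_dates`, the toy city labels and populations, and the threshold that defines "critical mass" are all assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date


def critical_mass_dates(signups, populations, threshold):
    """Return {city: first date its cumulative sign-ups reach the threshold}.

    signups     -- iterable of (signup_date, city) pairs (hypothetical format)
    populations -- {city: number of residents}
    threshold   -- assumed fraction of residents that counts as critical mass
    """
    # New users per city per day: the quantity the line in the movie tracks.
    daily = defaultdict(lambda: defaultdict(int))
    for day, city in signups:
        daily[city][day] += 1

    reached = {}
    for city, per_day in daily.items():
        total = 0
        for day in sorted(per_day):
            total += per_day[day]  # cumulative adopters so far
            if total / populations[city] >= threshold:
                reached[city] = day  # the day the dot would change color
                break
    return reached


# Toy data, invented purely for illustration.
signups = [
    (date(2007, 3, 1), "city_a"),
    (date(2007, 3, 2), "city_a"),
    (date(2008, 6, 1), "city_b"),
    (date(2009, 4, 17), "city_c"),
    (date(2009, 4, 18), "city_c"),
]
populations = {"city_a": 1000, "city_b": 1000, "city_c": 1000}
result = critical_mass_dates(signups, populations, threshold=0.0015)
```

In this toy run, `city_a` and `city_c` cross the assumed threshold on their second sign-up day, while `city_b` never does and is left out of the result. Knowing only when and where each person signed up is enough to produce that ordering.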
So, you know, it's not so surprising that what we're seeing is Silicon Valley, and then other places that have young, tech-savvy demographics, adopting technologies first, but we're seeing it on a massive, massive scale. So just by knowing, very simply, when a person signs up and where they're signing up from, you can really get a clear picture of demographics. But
then we can ask what's happening at the end of this timeline, when Twitter's just exploding. What are the forces behind that? And anyone who publishes books will be familiar with the Oprah bump. So that giant spike there was actually due to a single tweet, when Oprah adopted Twitter and then, you know, told everyone else to. And so this is an example of
hyper-influential people that are driving massive adoption of technologies. But that second bump, which is about half the size, is also media influence, but a very different one: instead of one person driving it, you have tons of news outlets writing articles about the Iranian revolution and social media's role in that. And so, you know, the idea here is that
we can start combining these things to really get a good picture of what's going on at very large scales. Now you might be saying, well, I'm in this data set, and I didn't adopt Twitter because of Oprah. And if you're sitting in this room, then you're probably right, because if we look at Ann Arbor, they were totally unimpressed by Oprah's endorsement. But Denver,
Colorado, on the other hand, with its different demographics, was really into it. So the idea here is that if we understand how we communicate with each other, then if we have behaviors or information that we want to spread, and we want to spread them efficiently and effectively, we can use what we learn about this process to drive actions and choices in the
future. But the other question that we might want to answer, instead of how we're communicating with each other, is how we're interacting with our environments. And so to do this we're going to need a sensor, and I'm willing to bet that almost everyone in this room has one sitting in their pocket, and that is a mobile phone. So there's been a lot of buzz recently about, you know,
how your mobile phone provider is really stalking you, and who's getting this data, and what they're doing with it. So I sort of wanted to come here and introduce myself, because I'm getting this data, and I want to show you what I'm doing with it. But it's really an amazing thing. So how does this work? Well, let's say you're walking
on the street and you take your phone out to make a call or to get directions to a restaurant. Your phone, if it's a smartphone, might connect to a GPS satellite; you could be connected to WiFi in a cafe, and we know you can't be very far from that cafe; or you can make a call on any old phone, and it will go through
a tower, and we know the location of that tower. And so we can combine all these things to pinpoint the location of a phone, and these predictions are getting incredibly accurate. So for any given phone you can tell where it is to within about twenty-five meters. So this is, you know, where things are getting creepy, but they also get really cool, because we can say, okay, this is downtown
Boston on the bottom right here, and then MIT, or my lab, is across the river in Cambridge. We can start asking: what if we knew where three or four million phones were across Boston, and we knew how many phones were being activated at every intersection of the city at every hour of the day? What would that look like?
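The bookkeeping behind that question can be sketched in a short Python example. It is a minimal illustration under stated assumptions, not the lab's pipeline: the function name `phones_per_cell_hour`, the anonymous phone labels, the planar coordinates in meters, and the toy records are all invented here. Only the twenty-five meter box size comes from the talk.

```python
from collections import Counter
from datetime import datetime

CELL_SIZE_M = 25  # box size mentioned in the talk


def phones_per_cell_hour(pings, cell_size=CELL_SIZE_M):
    """Count distinct phones seen in each grid box during each hour.

    pings -- iterable of (phone_id, timestamp, x_m, y_m), where x_m and y_m
             are assumed to be meters in some local flat map projection.
    Returns a Counter keyed by ((cell_x, cell_y), date, hour).
    """
    seen = set()
    counts = Counter()
    for phone_id, when, x_m, y_m in pings:
        # Snap the position to the box that contains it.
        cell = (int(x_m // cell_size), int(y_m // cell_size))
        slot = (cell, when.date(), when.hour)
        # Count each phone at most once per box per hour.
        if (slot, phone_id) not in seen:
            seen.add((slot, phone_id))
            counts[slot] += 1
    return counts


# Toy records, invented purely for illustration.
toy_pings = [
    ("a", datetime(2010, 1, 1, 21, 5), 10.0, 10.0),
    ("a", datetime(2010, 1, 1, 21, 40), 12.0, 8.0),  # same phone, box, hour
    ("b", datetime(2010, 1, 1, 21, 15), 20.0, 24.0),
    ("b", datetime(2010, 1, 1, 22, 0), 30.0, 10.0),  # next hour, next box
]
counts = phones_per_cell_hour(toy_pings)
```

Each entry of `counts` corresponds to the height of one bar at one hour; here the first box holds two phones during the nine PM hour and the neighboring box holds one during the ten PM hour.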
Well, it would probably look something like this. So this is that map, but now each one of these bars represents one of these twenty-five meter boxes, and we know how many phones are there at every hour of the day for about four months. And so if we just play this in time, what we start seeing is, you know, this is the biology of the city;
it's a city breathing, sleeping; you know, the heartbeat, the pulse of the city. And right now you'll see the city sort of skip a beat, and I'll pause it here, and then we can sort out what's going on. So things are blue; this blue means that everyone's using this location at night, red would be in the morning, and we want to know what's
going on here: there's tons of people, way higher activity. Well, if you look at the date, it's Friday, February twelfth, at nine PM. So this is Valentine's Day weekend, and the whole city now is getting done with their dinners and, you know, dispersing out through the city. So the idea behind this is that, you know, if we can start using this type of data
to understand how people are using spaces in real time, then we can design better transportation systems; we can make them cleaner, more efficient, and reduce traffic and pollution. Or if you are an epidemiologist, you might ask: how can this help me make a better model of disease spread? And more abstractly, you know, we think about ourselves as having all
this autonomy and free will, and yet our behavior aggregates into these incredibly periodic, regular patterns, and it really sort of questions, you know, how much we're actually cognitively doing. So the idea behind all of this is that if we understand how we're interacting with our environments, then in the future we can design them
better. And now you may be saying, well, this is just looking at rich people with smartphones in a rich city in a rich country. But as Hans Rosling, famous TEDster and statistician, would say: "Let my data set change your mindset." So if we look here at a map of mobile phone distribution for the nearly five billion mobile phones in the world, we
see that this idea of developing versus developed world really doesn't hold water when you're looking at this particular technological infrastructure. The United States, you know, has as many mobile phones per capita as a place like South Africa, and only a few more than Libya. And in China, while its mobile phone penetration isn't quite as high, the massive number of people means that the most
mobile phones in the world are there. So the idea is that if we can design a better city in a place like Boston, where we can validate and calibrate all our models with really expensive survey and census data, then we can bring all of those insights to almost anywhere in the world, because the world has all of this, you know, cell phone and technological infrastructure mostly in
place, and it doesn't change from culture to culture. So, you know, that's really where the power of all this data comes from: it's not just the extreme level of detail that we can start looking into, but the fact that we can do it almost anywhere. And so this sort of brings me to my crazy idea, and the first part of that is
a challenge to you guys, because you are really the generators of this information. You know, your tweets are the ones that are being archived. But also, you know, you will be the curators of this information as you go out and create the next generation of technologies and run the companies that keep this data. And so we really need to be thinking not
only about how we can use this data to learn something, but how we can package it and provide it back to the people who are creating it, to influence actions or behaviors or teach them something. And when we start doing this, I think that this idea will look a little bit less crazy, and that's that we can and should use this massive amount of data to
