Paving the Path to a Big Data Commons
In 2012, a decade of double-digit growth rates resulted in global mobile phone
subscriptions surging past the 6 billion mark, with nearly 1.2 billion mobile
broadband subscriptions [i]. By 2015, mobile phone subscriptions are expected to
exceed total world population [ii]. This heightened level of connectivity not
only offers increased communication, connectivity and game-changing mobile
applications; it also provides the world with data on individuals about whom
very little was previously known.

The growth of ‘Big Data’ and its potential applications is a hot topic; the
Forum’s 2012 paper Big Data, Big Impact highlighted the positive impact that
innovative use of this data could have, especially in developing countries. Yet
the discussion of its promise needs to be accompanied by the recognition that
there is a personal side to this flood of data: it is created by unique
individuals, and as such there are issues related to privacy and data ownership
that must be addressed if the potential for good is to be realized.

This briefing explores personal data and how these issues could be addressed in
the context of a new ‘data commons,’ a space in which actors across sectors
develop ways of identifying the ground rules for the safe and beneficial
management and use of anonymized personal data. This commons requires not only
adequate technical protections and infrastructure for managing massive data sets
and using them to predict crises and deliver services; it also needs mechanisms
for enforcement, accountability and monitoring that will let governments, firms,
development organizations and individuals realize the promise of Big Data while
mitigating the risks that hold it

Zoom In, Zoom Out: The Levels and Origins of Personal Data

Not all data use is the same - some types of data are more granular than others.
One useful framework is to segment the data into categories using the degree to
which it relates to an individual and the origin of its creation.

Scale

Individual level: Data generated by an individual and used in relation to the
needs and preferences of that individual

Community level: Data generated by individuals but analyzed and used to
understand the needs, preferences and action by a community, understood here as
cities, towns or districts

National/Global level: Data generated by individuals and analyzed and used at a
global or national level by governments, development organizations and
multinational corporations

Declared: Data voluntarily provided by an individual, either explicitly for data
collection or in venues such as social networking sites

Observed: Data created as data exhaust by individual use of technology such as
mobile phones (for example, information on the number and destination of a
mobile subscriber’s text messages)

Derived: Data obtained on an individual through the use of analytical models,
with inputs such as demographic characteristics often mixed with observed data
to predict and understand patterns

Relating to the Data

As the box on the previous page demonstrates, ‘big data’ is generally about
relationships. More specifically, it is about how individuals relate to other
individuals and institutions having access to information about them. Electronic
health records provide a useful example. For individuals, data collected from
these records provides benefits such as improved continuity of health care by
empowering them to demand the care they need and arming health workers with
relevant medical history that can be used in treatment decisions. At the
community or global level, though, this data, if aggregated and analyzed
properly, can provide powerful perspective on the effectiveness of medications
or treatment regimes, or on the health needs of different demographic segments,
which is invaluable information when governments are funding medical research or
allocating resources.

The level at which data is used or analyzed is only part of the story, however.
Equally important is the origin of the data in question. These origins can be
roughly divided into three major categories. The first category is declared
data, which is data volunteered by individuals, either explicitly in return for
compensation or through participatory means such as social media. Many
individuals may be willing to allow such data to be collected about them in
return for some sort of financial compensation. Some mobile users in the US, for
example, agree to allow their information to be shared with telemarketers in
exchange for a lower monthly subscription price. In the developing world, Jana,
a company allowing companies to survey consumers via mobile phone, pays its
respondents in free airtime, a major incentive for people with little disposable
income.

The second category of data is observed data, which is created by an
individual’s interactions with technology, even when data provision and
collection is not the primary purpose of the interaction. Such data is often
referred to as ‘data exhaust’ because it is produced as a by-product of other
technology functions. Security cameras provide a good illustration of this
category. Security cameras’ primary function is to prevent and respond to
criminal actions, but in doing so they capture data about the movement of
individuals or vehicles, which can be used to understand human or vehicular
traffic patterns. Because so many of the forms of technology with which humans
interact contain digital functions, the amount of data exhaust is increasing at
a far faster rate than existing data management systems are capable of
processing.

Finally, derived data is data that is created when an individual’s interactions
with technology, such as browsing history, are co-analyzed with declared or
observed data to gain an understanding of the individual’s preferences and
tendencies. The use of derived data in targeted web-based email services
provides an example with which many people are familiar. A person who has
accessed a social networking site’s mobile application and discussed their
interest in jogging with friends through its chat functions might notice an ad
next to an email message promoting a running-oriented app for their handset.

                  Data Source
Data Scale        Declared                  Observed                          Derived

National/Global   -Consumer preferences     -Socioeconomic trends             -Crisis prevention
                  -Democratic action        -Outbreak response                -Economic efficiency
                  -Political retaliation    -Reduced responsiveness           -Political/economic
                                             to stated needs/desires           repression

Community         -Service needs            -Local service needs              -Demand trends
                  -Civic action             -Resource allocation efficiency
                  -Unintended use           -Unauthorized consumer            -Potential exploitation
                   (e.g., commercial)        identification

Individual        -Financial rewards        -Improved health care             -Improved disease prevention
                  -Targeted offers          -Targeted offers                  -Marketing efficiency
                  -Loss of personal data    -Identity theft                   -Loss of privacy
                   ownership                                                  -Loss of control

The potential benefits of analyzing personal data from these sources, especially
observed and derived data, are mind-boggling, in part because analysis of
people’s behavior often provides a clearer picture of their needs and desires
than their declared statements. But the potential for harm is also considerable,
because of the possibility that people will lose control of their personal data
and because this data could be used to harm them politically, economically, or
socially. Identifying these benefits and risks is a necessary first step. The
graphic above analyzes the potential benefits and risks by both data scale and
origin.

The graphic provides examples of ways in which observed and derived data,
analyzed and used at the community and global or national level, could be a
powerful force in efficiently allocating resources, predicting and combating
outbreaks, and analyzing important social and economic trends. Yet as the daily
drumbeat of data breach stories reminds us, there are huge potential downsides
as well. Sensitive personal data could be used to blackmail individuals, while
repressive governments could analyze data to identify political opponents. And
these potential negatives are magnified because these types of data are those
over which the individual has the least ownership. These developments highlight
the central question surrounding personal data: is there a level of
anonymization that offers protection to individuals while allowing the use of
aggregated personal data for

Analyzing the Actors: Finding the Right Levers

A data sharing regime also requires identifying the forms of value that
different ecosystem actors get from personal data and the levers and incentives
they respond to. The graphic below identifies these factors.

Consumers, the ultimate source of most of the data concerned, are also the group
with the least power, in part because they are dispersed and often do not speak
with a single voice. They can be incentivized to provide data to companies or
government, but they may also require the ability to opt out in order to protect
the most sensitive data. Government, which owns large amounts of data related to
populations, such as census data or population surveys, has greater power than
individuals to protect data, but it does not own as much real-time data on
consumer behavior and tastes as the private sector.

Private sector firms, and in particular telecommunications and IT companies, own
the greatest amount of data, and thus creating incentive structures that will
keep companies constructively engaged is one of the central issues in building a
data commons that works for the benefit of everyone.

              Consumers                         Government                            Private Sector

Value         Personalized nature of data       Aggregated data from individuals      Real-time observed and derived data
                                                (e.g., census)

Disadvantage  Dispersed nature/collective       Lack of real-time observed data       Lack of trust by individuals
              action

Incentive     Personalized services,            Improved analysis ability for         Improved declared data and
              financial incentives              resource allocation and crisis        demand prediction
                                                management

Disincentive  Loss of privacy and data          Loss of trust by citizens             Loss of proprietary data
              ownership                                                               ownership, loss of consumer trust

Power Level   Low                               Medium                                High

Key Lever     Ability to opt out                Legal/regulatory authority            Improved regulatory framework

                                                Data Commons

The Rules of the Road: Incentivizing Mobile Providers

In the developing world, mobile operators own perhaps the greatest amount of
usable data from individual activity, largely because of the fact that mobile
phones represent many developing-world consumers’ only use of interactive
technology. As such, they provide an interesting test case for efforts to induce
greater productive data sharing and use. What, then, do operators need to become
more involved in efforts to build a data commons that works for everyone’s
benefit?

Incentives can be broadly segmented into three different categories: market
environment, digital infrastructure, and regulation. Market environment
incentives include understanding the benefits that greater data sharing can
bring to the market environment in which operators work. Tangible evidence that
innovative use of data can prevent crises that harm operators’ business or open
doors to insightful new business models could increase operators’ incentives to
contribute anonymized data to a centralized data commons. Such understanding and
awareness could be created through better monitoring and evaluation of data use
initiatives, and by more intensive efforts to publicize the results of this
monitoring, in addition to highlighting the role of operators that proactively
share data.

In the digital infrastructure area, operators might react positively to efforts
to build a shared data commons with mechanisms for contributing anonymized data
to which their competitors are also contributing. This open data commons, with
agreed privacy standards, a code of conduct, and legal protections, would reduce
the barriers to more active participation (any single carrier is
unlikely to finance a commons that would benefit competitors too), as well as
providing an opportunity for government and development players to make specific
data requests to which firms could respond. Such a ‘curated’ and structured
space might reduce operators’ fear of theft when data is being shared for
development purposes, and could be jointly financed by industry associations and
actors from all three sectors.

Finally, consistent regulation and legislation that creates a more transparent
set of rules around data sharing could open the door to greater voluntary
sharing of data. The uncertainty around privacy policy and legislation in many
countries currently reduces operators’ incentives to engage in this type of
activity, and the heavy tax and regulatory burden on operators in many nations
reduces operators’ willingness to proactively share information. “Operators
might be more willing to take more risks in data sharing if governments made
some positive steps in the regulatory environment,” noted one mobile industry
executive.

The building of a flexible but regulated data commons will allow something that
has not yet happened: the comprehensive mapping of data use to the needs of a
highly networked economy, leapfrogging inefficiencies that result from
insufficient data and creating new possibilities for raising individual and
business productivity. This commons can be compared to the construction of the
U.S. interstate highway system or China’s high-speed rail network: with better
infrastructure, information flows will improve and new efficiencies will arise.
Individuals and businesses will seize these new opportunities, but cooperation
across sectors enables the various players to understand the needs and
interdependencies of consumers, governments, and enterprises.

Call to Action

In a highly networked world, decision makers in every sector need better
decision-making tools, a framework for grasping how the digital economy works,
and an understanding of how the policy and business decisions they make affect
that economy. To support this, there is a need for a richer, more focused
discussion that will allow players in each sector to move past old debates on
regulation and markets. The creation of a data commons would give key actors the
systemic view of interdependencies and relationships they need, allowing them to
understand the impact of their actions on other ecosystem actors. The
possibilities that flow from such a systemic view, in addition to the
efficiencies noted above, include smarter policy, and an ability on the part of
government to implement proportional regulation that mitigates risks while not
stifling innovative uses of data.

In order to realize this promise and build this data commons, there needs to be
a fundamental recognition of its critical application as a tool for social and
economic development and a willingness to take steps to advance it. A robust
data commons needs the technology that can support new and complex uses for
development, something that private-sector firms may be well-positioned to
provide, but it also needs recognized norms and a code of conduct that prevents
issues of privacy and data ownership from standing in the way of productive use
of anonymized data.

i   SUM-PDF-E.pdf
ii

Special Thanks

The World Economic Forum acknowledges the work of Vital
Wave Consulting in creating this briefing as well as the ongoing
work of the UN Global Pulse and World Economic Forum’s ICT
Global Agenda Council in advancing the transformative
potential of big data. 

  About Vital Wave Consulting

  Vital Wave engages multinational corporations and development
  organizations to provide actionable solutions for scaling business
  and programs in emerging markets. Working across diverse,
  global markets and sectors, the firm is a recognized leader at the
  intersection of development and technology.

  Vital Wave draws on decades of field experience and research in
  low and middle-income countries, proven analytical methods and
  world-class professional services to achieve sustained impact and
  growth for its clients at the global, regional and local levels.

  Call 650-964-1316, email or visit to learn more.

