Automating Privacy Testing of Smartphone Applications

Peter Gilbert†, Byung-Gon Chun⋆, Landon P. Cox†, Jaeyeon Jung‡
†Duke University, ⋆Intel Labs Berkeley, ‡Intel Labs Seattle

Abstract

Smartphones have revolutionized mobile computing, but have created concerns that many third-party mobile applications do not properly handle users’ privacy-sensitive data. In this paper, we propose AppInspector, an automated privacy validation system that analyzes apps and generates reports of potential privacy risks. A key insight is that distinguishing acceptable disclosures from privacy violations often requires analyzing the context in which data is transmitted. Just knowing that sensitive data has left a device is insufficient. We describe our vision for making smartphone apps more secure through automated testing and outline key challenges such as detecting and analyzing privacy violations, ensuring thorough test coverage, and scaling to large numbers of apps.

1 Introduction

The success of Apple’s App Store and Google’s Android Market has transformed mobile phones into a first-class development platform. The number of third-party smartphone applications, or apps, that the average smartphone user installs has grown rapidly [6], and browsing app stores has become a form of inexpensive entertainment for millions of people. Apps are small programs that often provide their functionality by accessing sensitive data from on-device sensors (e.g., GPS, camera, and microphone) and services located in the cloud (e.g., Google, Facebook, and Twitter). Ensuring that the flow of sensitive data through apps to remote servers does not violate a user’s privacy is an important and difficult problem.
   Our prior work on TaintDroid [16] allows a user to track the propagation of her sensitive data through and between her apps and can raise an alert when sensitive data leaves her device. However, a simple, network-level event such as an app sending sensitive data to a remote server is a necessary but not sufficient condition for a privacy violation. For instance, a user likely expects her weather app to send her GPS coordinates to a server after pressing an "Update Weather" button. However, the user may consider it improper for the same app to periodically collect GPS coordinates and send them to a third-party server in the background. Similarly, disclosures of un-hashed unique identifiers such as a device’s IMEI or unencrypted transmission of passwords and OAuth tokens may be considered unsafe, whereas transmission of hashed identifiers and encrypted credentials would be allowed. Furthermore, it may not be practical for every user to run a real-time tool like TaintDroid at all times, and once leaks are detected with such a tool and analyzed, it may be too late to protect the sensitive data.
   Ideally, a user could read a simple report describing whether an app presents any potential dangers before installing it, rather than trying to comprehend the subtleties of an app’s behavior on her own. Experts employed by an app store or a third-party security firm could generate such reports by manually exercising an app’s functionality and observing its behavior. Unfortunately, experience with Apple’s App Store approval process has demonstrated that this approach is less than ideal. Apple’s approval process can introduce costly delays and uncertainty into the development cycle, while banned behavior such as WiFi-3G bridging [7] and alleged violators of Apple’s privacy policies [3, 4] have still slipped into the App Store. Research on automated testing promises to reduce the time and effort needed to validate an app’s behavior, but current approaches based on static analysis [15] exhibit high false-positive rates and have more fundamental shortcomings due to aliasing and context sensitivity [21, 24].
   In this paper we outline a different approach to automatically validating the privacy properties of mobile apps, based on dynamic analysis. At the heart of our approach is the observation that diagnosing privacy violations requires more than identifying when sensitive data exits the device. In particular, disambiguating proper and improper sharing of sensitive data often requires an understanding of the execution path that led to a disclosure, including the triggering source event (e.g., a UI event or an expiring timer) and the sequence of method calls linking the trigger to the disclosure (e.g., calls into well-known hashing and crypto primitives).
   We are developing an automated privacy-testing system called AppInspector that embodies this approach. We have identified several important challenges in designing AppInspector, such as generating inputs that sufficiently explore an app’s functionality, logging relevant events at multiple levels of abstraction as the app executes, and using these logs to accurately characterize an app’s behavior. The rest of this paper discusses each of these challenges in greater detail and proposes several promising techniques for addressing them.

2   System Overview
We envision a privacy validation service that analyzes apps available through popular app stores and facilitates producing easy-to-understand reports informing users of potential privacy risks. Due to the shortcomings of the static analysis techniques applied by previous solutions, we propose an approach which dynamically explores an app’s possible execution paths while monitoring its use of privacy-sensitive information as it runs. In order to scale to hundreds of thousands of apps in a cost-effective manner, this process must be automated to the greatest extent possible.
   Along the same lines, it would be cost-prohibitive to test such a large number of apps on actual mobile devices. Instead, we propose using commodity PCs to emulate smartphones. This will enable a large-scale privacy validation service to be built at low cost by utilizing the cloud for computation. A single host may be capable of running multiple “virtual” device instances at once, and a cloud-hosted validation service could test many apps in parallel.
   Building such a privacy validation system presents three key challenges:

• C1. How do we track and log privacy-sensitive information flows to enable root cause analysis and application behavior profiling?
• C2. How do we identify privacy violations from collected logs and pinpoint the root cause and execution path that led to the violation?
• C3. How do we traverse diverse code paths of apps in order to ensure that the analysis is thorough?

   In the following paragraphs, we give an overview of AppInspector, our proposed system to address these challenges, and outline the basic steps involved in analyzing an app. The major components of the system mentioned in the overview are illustrated in Figure 1.

Figure 1: AppInspector architecture

   AppInspector first installs and loads the app on a virtual smartphone. An input generator running on the host PC then begins injecting user interface and sensor input. The smartphone application runtime is augmented with an execution explorer that aids in traversing possible execution paths of the app. These two components address C3. While the app runs, an information-flow tracking component monitors privacy-sensitive information flows and generates logs, addressing C1. Finally, to address C2, AppInspector provides privacy analysis tools which can be used after execution completes to interpret the logs and generate a report.
   In the next two sections we further describe these four components.

3 Information-Flow Tracking & Analysis

Our ultimate goal is to help smartphone users better understand how apps handle their privacy-sensitive information, to allow them to make informed decisions about which apps to install and use. To this end, we must first answer an important question: How do we define a privacy violation?
   Because many apps collect sensitive information such as location or user identifiers in order to provide useful functionality, simply detecting a transmission of sensitive data is not sufficient to declare a privacy violation. At a high level, a privacy violation occurs when an app releases sensitive data to a remote party in a way neither expected nor desired by the user. However, encoding user preferences and expectations inside automated analysis is difficult. As a result, for the purposes of automated detection, we define a privacy violation as follows: we consider a violation to occur when an app discloses sensitive information without first notifying the user through a prompt or license agreement.
   Whether or not a disclosure is considered a privacy violation by a user will often depend on its purpose or intent as perceived by the user: for example, a user may tolerate her location being sent to a content server in order to deliver content tailored to her location, but she might object to her location being sent to a third-party analytics service. In general, multiple components may be involved in causing a violation, including the app itself, as well as libraries provided by third-party analytics and advertising services and plugged in by app developers for monetization. However, we note that the involvement of third-party code is not necessary for a violation to occur.
   In order to detect disclosures and then identify the specific functionality or code component(s) involved in a disclosure, we need to be able to pinpoint the root cause and execution path that eventually led to an outgoing network transmission containing sensitive data. To support this kind of analysis, it is necessary to track both explicit flows, in which privacy-sensitive information is propagated through the app and potentially external libraries and system components through direct data dependencies, as well as implicit flows, e.g., when information is leaked due to a sensitive value influencing the control flow of the app.

Tracking Explicit Flows. To track explicit flows of sensitive data, we propose applying system-wide dynamic taint analysis, or taint tracking [14, 22].

Taint tracking involves attaching a “label” to data at a sensitive source, such as an API call which returns location data, and propagating this label through program variables, IPC messages, and to persistent storage, in order to detect when it reaches a sink such as an outgoing network transmission. We take advantage of the fact that apps are often written primarily in interpreted code and executed by a virtual machine, in order to simplify the implementation and reduce the runtime overhead of taint propagation [16].
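To make the mechanism concrete, the sketch below shows label propagation in miniature. It is a hypothetical illustration, not AppInspector’s implementation: the `Tainted` wrapper and the `get_location`/`network_send` source and sink are our own stand-ins.

```python
# Minimal dynamic taint-tracking sketch (hypothetical; for illustration only).
# A label attached at a source survives derived computations and is
# checked when the data reaches a sink.

class Tainted:
    """Wraps a value together with the set of taint labels it carries."""
    def __init__(self, value, labels=frozenset()):
        self.value = value
        self.labels = frozenset(labels)

    def __add__(self, other):
        # Propagation rule: the result of an operation carries the union
        # of its operands' labels (a direct data dependency).
        ov = other.value if isinstance(other, Tainted) else other
        ol = other.labels if isinstance(other, Tainted) else frozenset()
        return Tainted(self.value + ov, self.labels | ol)

def get_location():
    # Taint source: label the returned data as LOCATION.
    return Tainted("37.77,-122.42", {"LOCATION"})

def network_send(data, host):
    # Taint sink: report any labels that reach an outgoing transmission.
    if isinstance(data, Tainted) and data.labels:
        return ("LEAK", sorted(data.labels), host)
    return ("OK", [], host)

loc = get_location()
msg = Tainted("loc=") + loc          # the label propagates through the concat
print(network_send(msg, "ads.example.com"))
```

A real tracker does the same bookkeeping at the interpreter level, so labels follow values through variables, IPC, and storage without app cooperation.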
Tracking Implicit Flows. Implicit flows leak sensitive information through program control flow. For example, consider the following if-else statement: if (w == 0) x = y; else z = y; where the value of w is privacy-sensitive. By watching the values of x and z, which are affected by the control flow, one can learn whether w is 0 or not. To detect such leaks via implicit flows, we can track control dependencies by creating control-dependency edges; e.g., in the above example, edges between w and x and between w and z. This can potentially result in overtainting, or labeling and propagating false dependencies. A possible approach for addressing this drawback is to selectively propagate tainted control dependencies as in DTA++ [18].
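The leak in the example above can be made concrete: the sketch below recovers whether w is zero without any assignment ever copying w, which is exactly what data-dependency-only tracking misses. The function names are ours; only the variable names follow the example.

```python
# Implicit-flow sketch: no statement copies w, yet its value leaks
# through the branch structure (a control dependency).

def branch(w, y):
    x = z = None
    if w == 0:
        x = y
    else:
        z = y
    return x, z

def infer_w_is_zero(x, z):
    # An observer who sees only x and z learns whether w == 0.
    return x is not None

x, z = branch(w=0, y=42)
print(infer_w_is_zero(x, z))

# A control-dependency-aware tracker would add edges w->x and w->z,
# tainting both variables assigned under the branch on w.
```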
Privacy Analysis. The next challenge we consider is how to generate useful privacy reports using our information-flow tracking runtime. An abstraction that we believe will prove useful is dependency graphs, which illustrate the path from the event determined to be the root cause of a disclosure, through the data and control flow of the app and potentially other system components, to the eventual network transmission flagged as containing sensitive data. Once a dependency graph is available, analysis techniques including backward slicing, filtering, and aggregation can be applied. Backward slicing traverses vertices that are causally dependent, from sinks to sources. Filtering produces a filtered log of an execution by excluding instructions which are unrelated to and unaffected by sensitive information. Finally, aggregation produces a summarized log of an execution that affects sensitive information.
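As a sketch of how backward slicing might operate over such a graph (a simplified model, not AppInspector’s actual log or graph format), the code below walks causal edges in reverse from a flagged sink; events that did not influence the transmission fall out of the slice, which is the effect filtering aims for.

```python
# Backward slicing over a toy dependency graph: starting from a sink,
# follow causal edges in reverse to collect every event the disclosure
# depends on. Event names are illustrative.
from collections import deque

# edges[v] = set of vertices that v causally depends on
edges = {
    "net_send":  {"concat"},
    "concat":    {"gps_read", "const_str"},
    "gps_read":  set(),
    "const_str": set(),
    "ui_render": {"const_str"},   # unrelated to the disclosure
}

def backward_slice(sink):
    slice_, work = set(), deque([sink])
    while work:
        v = work.popleft()
        if v not in slice_:
            slice_.add(v)
            work.extend(edges.get(v, ()))
    return slice_

print(sorted(backward_slice("net_send")))
# "ui_render" is excluded: it did not influence the transmission.
```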
   Using those primitives, we can identify disclosures and then attempt to pinpoint responsible APIs (e.g., third-party library vs. application code), how data passed among apps (or between the app and the system), and the original action or API that triggered the exposure (e.g., platform API vs. a button click). In addition, we report the types, recipients, and timings of released information.
   Dependency graphs are constructed once testing completes, using information collected during execution. Choosing which information to log and the logging granularity is an important decision which affects both the depth and quality of analysis that can be performed later as well as the runtime performance of the app under testing. While it is not critical for a system driven by automated input to achieve real-time performance, huge runtime performance overheads could affect the number of execution paths that can feasibly be explored as well as the computational cost (which equates to monetary cost in the cloud-hosted scenario).
   With these issues in mind, we propose logging the following information: taint source and sink invocations, bytecode instructions which touch sensitive data along with code origin, call graph information including interpreted methods as well as native code (JNI) invocations, and IPC and file system accesses involving sensitive data, along with timestamps for all logged events. We believe that we can log these categories of information by instrumenting the application runtime and system libraries in a way that will not impose prohibitive performance or log volume overheads.

Explicit Notifications and License Agreements. As previously mentioned, end-user license agreements (EULAs) and explicit notifications of data collection can provide useful hints for determining whether a disclosure of sensitive data amounts to a privacy violation. When a disclosure is detected, we can check if a notification is displayed to the user in the causal taint tracking path from the taint source to the taint sink. Then, we can try to determine if the notification contains messages informing the user of data collection or requesting permission to transmit the data in question. Similarly, we can check if the EULA mentions private data collection.
   We plan to explore two promising approaches for interpreting the text of user notifications and EULAs: 1) applying natural language processing and 2) crowdsourcing via services like Amazon Mechanical Turk [1]. In the future, this analysis could be made easier if developers utilize P3P [5] to express privacy policies in a machine-readable format.
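Even before full NLP or crowdsourced judgments, a crude keyword heuristic over the prompt text can triage obvious cases. The sketch below is a hypothetical first-pass filter of our own devising, not a proposed final design; the patterns are illustrative.

```python
# First-pass heuristic: does a prompt shown on the source-to-sink path
# plausibly disclose data collection? (Hypothetical sketch; a real
# analysis would fall back to NLP or human judgment.)
import re

DISCLOSURE_PATTERNS = [
    r"\b(collect|share|send|transmit)\b.*\b(location|contacts|identifier)s?\b",
    r"\ballow\b.*\baccess\b",
]

def mentions_data_collection(prompt_text):
    text = prompt_text.lower()
    return any(re.search(p, text) for p in DISCLOSURE_PATTERNS)

print(mentions_data_collection("We may share your location with partners."))
print(mentions_data_collection("Tap to update the weather."))
```

Prompts the heuristic cannot classify would be the natural candidates to route to crowdsourced workers.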

Figure 2: Code coverage for random testing (30 minutes)

Figure 3: Branch counts of 1100 apps

4 Input Generation & Execution Exploration

So far, we have discussed how to track flows of privacy-sensitive information and analyze execution logs in order to identify privacy violations. A fundamental limitation of the proposed dynamic tracking techniques is that any analysis is limited to execution paths which are actually traversed during testing. In practice, an app may offer many diverse execution paths, and to ensure that analysis is thorough, we must determine if there is a feasible path from any source of private information to a corresponding sink among all feasible execution paths. This poses another challenge: Can we explore diverse execution paths of an app accurately and scalably in an automated manner?
   First, we must keep in mind that smartphone apps are primarily event-based programs. Execution is typically driven by two broad types of events: 1) UI input events, normally taps or drags on the touchscreen or text inputs, and 2) callbacks triggered by other devices or sensors, such as GPS location updates and camera photos.
   As a result, our system must be capable of generating and delivering arbitrary sequences of UI and sensor input events. Fortunately, many apps use standard UI input elements such as text input fields and buttons provided by platform SDKs. To aid in delivering meaningful inputs to these apps, it may be useful to instrument standard UI libraries to treat common types of input elements such as username/password prompts as special cases.
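A random tester of the kind used in the measurements below can be sketched in a few lines. The event vocabulary and screen dimensions here are illustrative assumptions, not the actual tool.

```python
# Sketch of a "monkey"-style random input generator: emit a stream of
# touch, key, and location events. Event names and screen size are
# illustrative assumptions.
import random

SCREEN_W, SCREEN_H = 480, 800
KEYS = ["BACK", "MENU", "HOME"]

def random_event(rng):
    kind = rng.choice(["tap", "drag", "key", "location"])
    if kind == "tap":
        return ("tap", rng.randrange(SCREEN_W), rng.randrange(SCREEN_H))
    if kind == "drag":
        return ("drag",
                (rng.randrange(SCREEN_W), rng.randrange(SCREEN_H)),
                (rng.randrange(SCREEN_W), rng.randrange(SCREEN_H)))
    if kind == "key":
        return ("key", rng.choice(KEYS))
    return ("location", rng.uniform(-90, 90), rng.uniform(-180, 180))

rng = random.Random(0)   # seeded so a test run is reproducible
events = [random_event(rng) for _ in range(5)]
print(events)
```

Seeding the generator makes a failing run replayable, which matters when a random event sequence happens to trigger a disclosure.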
   When considering which execution paths to explore, ideally we would like to avoid both false negatives and false positives. A false negative occurs when testing fails to cover a feasible execution path that leads to a disclosure of sensitive information. On the other hand, a false positive occurs when testing reports that an execution path leaks private data, when in reality the path is globally infeasible. Exhaustively exploring all feasible execution paths, which yields no false negatives or false positives, is not scalable due to the well-known path explosion problem, i.e., the number of execution paths grows exponentially as the number of branches increases. This fundamental tradeoff between accuracy and scalability presents interesting research opportunities.
   To begin, we consider the simple strategy of random testing, which explores concrete execution paths by injecting randomly-generated inputs, thus avoiding false positives and scalability problems. While attractive in its simplicity, random testing has been found to achieve poor code coverage for other types of applications [8, 9]. To get an idea of whether random testing is a viable strategy for testing smartphone apps for privacy violations, we chose nine apps from a late 2010 survey of the most popular free apps in each category of the Android Market [2] and supplied each with a continuous stream of touchscreen taps and drags, hardware button presses, and location updates for 30 minutes. To measure coverage of execution paths, we modified Android’s Dalvik VM to collect basic block code coverage. The results presented in Figure 2 show that random testing achieves 40% or lower coverage in all cases. While the experiments did yield at least two disclosures of location data, we observed that the tests commonly got “stuck” in terminal parts of the apps’ UI. One example of the shortcomings of random testing is the test of a social networking app, which achieved less than 1% coverage because it could not progress past the initial login prompt. To enable more thorough exploration of execution paths, we advocate a more systematic approach.
   Symbolic execution [19], which is typically used for finding bugs [9], systematically explores all possible execution paths. Applying symbolic execution to the entire software stack, including apps, libraries, and the OS, would eliminate all false negatives and false positives. However, as we mentioned earlier, complete system-wide symbolic execution is not scalable.
   Instead, we propose applying a mixed-execution approach, also known as concolic execution, which combines symbolic and concrete execution (e.g., as in DART [17], CUTE [25], EXE [10], and S2E [12]). Since our goal is to explore diverse paths of a specific third-party app, it is only necessary to apply symbolic execution to the app itself, while the rest of the environment (e.g., Android libraries and system) can be executed concretely. To test the basic feasibility of this approach, we analyzed branch counts of 1100 apps, which are shown in Figure 3. 90% of the apps have 4187 or fewer branches. The results suggest that symbolic execution performed selectively may be feasible for these apps: a recent study [8] demonstrated path exploration for programs with similar complexity. Parallel symbolic execution [13] can further speed up symbolic execution.
   In symbolic execution, we maintain state associated with the current execution path, including a path constraint (a boolean formula over symbolic inputs) and symbolic values of program variables. When we switch from symbolic execution to concrete execution, we use a constraint solver to produce concrete values for the execution. To switch from concrete execution to symbolic execution, we can add the concrete return value and related side effects as part of a constraint. This can cause an overconstraining problem, which can in turn lead to false positives. Another possibility is to make the return value and related side effects symbolic; however, this could cause the system to explore infeasible paths since we may not consider calling contexts properly. Exploring this tradeoff is an important research question. Finally, we must decide which program variables should be symbolic and which should be concrete. In general, we choose to make private information such as location symbolic; in contrast, for some variables such as remote IP addresses, we want to use the concrete value in order to determine where private data is being sent.
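The alternation between concrete runs and constraint solving can be illustrated with a toy concolic loop over a single integer input. This is a didactic sketch, not our engine: the instrumented `program` is invented, and a tiny brute-force search stands in for a real constraint solver of the kind used by DART or S2E.

```python
# Toy concolic loop: run concretely, record the branch conditions taken,
# then negate the last condition and "solve" for an input that flips it.

def program(x, trace):
    # Instrumented program under test: record each branch outcome.
    trace.append(("x > 10", x > 10))
    if x > 10:
        trace.append(("x < 20", x < 20))
        if x < 20:
            return "leak"       # the path we want testing to reach
    return "ok"

def solve(constraints):
    # Brute-force stand-in for a constraint solver: find an int
    # satisfying every (predicate, expected outcome) pair.
    for cand in range(-100, 101):
        outcomes = {"x > 10": cand > 10, "x < 20": cand < 20}
        if all(outcomes[p] == want for p, want in constraints):
            return cand
    return None

# Concrete run with x = 0 takes the "ok" path.
trace = []
program(0, trace)

# Negate the last branch condition and solve: we need x > 10 to hold.
pred, taken = trace[-1]
new_input = solve([(pred, not taken)])
print(new_input, program(new_input, []))
```

Repeating this negate-and-solve step over the accumulated path constraint is what drives exploration toward paths random testing never reaches, such as the guarded "leak" branch here.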

   We are developing this mixed execution engine on top of the Android platform to evaluate the accuracy and scalability of our proposed approach. At a high level, we plan to modify Android’s Dalvik VM to add extra state, including a path constraint or symbolic expression, to variables (local variables, operands, fields), and to interpret bytecodes in a way that properly manages the extra state by updating the symbolic expression or by forking and updating state for a branch using a constraint solver.

5 Related Work

We briefly describe key related work on software security analysis. PiOS [15] shares the goal of investigating smartphone apps for potential privacy violations. Unlike our work, PiOS employs static data flow analysis techniques and is implemented for the Apple iOS system. The use of static analysis enables exploring broad execution paths, including infeasible ones. However, it is prone to false positives because of well-known problems in static analysis such as alias and context sensitivity problems. Furthermore, it provides incomplete analysis due to the difficulty of resolving messaging destinations and handling system calls. We hope to overcome these challenges by directly instrumenting the smartphone platform and tracking information flow at runtime.
   Dynamic information flow analysis techniques have proven useful for intrusion detection and malware analysis. BackTracker keeps track of the causality of process-level events for backtracking intrusions [20]. Panorama uses instruction-level dynamic taint analysis for detecting information exfiltration by malware in Windows OS [26]. Unlike these systems, we explore diverse execution paths systematically by mixing symbolic and concrete execution, focusing on third-party smartphone apps.
   Recently, TaaS proposed a service for automated software testing [11]. CloudAV proposed antivirus as an in-cloud network service [23]. Similarly, we envision AppInspector being used as a service for privacy validation of smartphone apps in the cloud.

6 Conclusion

This paper proposes AppInspector, an automated privacy validation service that analyzes smartphone apps submitted to app stores and generates reports informing users of potential privacy risks. We presented the high-level design of AppInspector and the challenges of building it. We believe that large-scale automated testing is an important step towards enabling more secure mobile computing.

References

[1] Amazon Mechanical Turk.
[2] Android Market.
[3] Apple sued over apps privacy issues; Google may be next. USTRE6BR1Y820101228.
[4] iPhone and Android apps breach privacy. www.foxnews.
[5] P3P 1.1 Specification. P3P11/.
[6] Average number of apps downloaded to iPhone: 40, Android: 25. September 2010.
[7] Flashlight app sneaks tethering into App Store (for now) [pulled]. July 2010.
[8] J. Burnim and K. Sen. Heuristics for scalable dynamic test generation. Technical Report UCB/EECS-2008-123, 2008.
[9] C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, 2008.
[10] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically generating inputs of death. In ACM CCS, 2006.
[11] G. Candea, S. Bucur, and C. Zamfir. Automated software testing as a service. In ACM SOCC, 2010.
[12] V. Chipounov, V. Kuznetsov, and G. Candea. S2E: A platform for in-vivo multi-path analysis of software systems. In ASPLOS, 2011.
[13] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea. Cloud9: A software testing service. In LADIS, 2009.
[14] J. Clause, W. Li, and A. Orso. Dytan: A generic dynamic taint analysis framework. In ISSTA, 2007.
[15] M. Egele, C. Kruegel, E. Kirda, and G. Vigna. PiOS: Detecting privacy leaks in iOS applications. In NDSS, 2011.
[16] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones. In OSDI, 2010.
[17] P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In PLDI, 2005.
[18] M. G. Kang, S. McCamant, P. Poosankam, and D. Song. DTA++: Dynamic taint analysis with targeted control-flow propagation. In NDSS, 2011.
[19] J. C. King. Symbolic execution and program testing. Communications of the ACM, 1976.
[20] S. T. King and P. M. Chen. Backtracking intrusions. In SOSP, 2003.
[21] W. Landi. Undecidability of static analysis. ACM Letters on Programming Languages and Systems, 1992.
[22] J. Newsome and D. Song. Dynamic taint analysis: Automatic detection, analysis, and signature generation of exploit attacks on commodity software. In NDSS, 2005.
[23] J. Oberheide, E. Cooke, and F. Jahanian. CloudAV: N-version antivirus in the network cloud. In USENIX Security, 2008.
[24] G. Ramalingam. The undecidability of aliasing. ACM TOPLAS, 1994.
[25] K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In FSE, 2005.
[26] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda. Panorama: Capturing system-wide information flow for malware detection and analysis. In ACM CCS, 2007.

