Addressing Privacy Concerns with Web-Based Data Collection
Nick Graham and Peter Dacin
Draft of 29 November 2006
Studies are increasingly using the World Wide Web to support the collection of data.
Web-based applications can have great advantages, allowing people to report data from
their own homes or offices without the inconvenience of phone calls or paper submission.
If a study guarantees confidentiality of the data being collected, however, care must be
taken to ensure that the use of the web does not unduly increase the risk of this data being
made public.
While there are many threats to the security of web-based systems [2], these can be
summarized as four fundamental ways in which privacy can be compromised:
Insecure transmission of confidential data over a network
Poor technical security of the server used to collect the data
Poor security in the web application itself
Poor privacy policies for the server’s users
We briefly discuss each of these problems and provide guidelines for addressing
them.
Transmission of Data over a Network
Data transmitted over a network does not flow directly from point to point; instead, it
passes through a series of intermediate computers. At any of these intermediate points,
a malicious party may “snoop” on (intercept and read) the data. Snooping is widely used
to collect information such as credit card numbers and passwords, and it is not uncommon
for people to use snooping techniques to spy on their neighbours.
Encrypting data transmitted over the Internet makes it very difficult to extract usable
information by snooping. For web-based applications, this means using the secure HTTP
protocol (HTTPS). HTTPS is widely used for secure applications such as Internet banking
and, when implemented properly, provides reasonable protection from snooping.
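One part of implementing HTTPS properly is making sure respondents never fill in a survey form served over plain HTTP. As a minimal sketch, assuming a Python web application that runs under WSGI, the hypothetical middleware `redirect_to_https` below sends any plain-HTTP request to the HTTPS version of the same URL. It also adds a `Strict-Transport-Security` header to HTTPS responses, which asks browsers to use HTTPS for the site on future visits. It assumes the application sees the client’s scheme in `wsgi.url_scheme` (i.e., no proxy in front of it terminates HTTPS) and that HTTPS runs on the default port 443.

```python
from urllib.parse import quote

# Hypothetical policy value: ask browsers to remember "HTTPS only" for one year.
HSTS_VALUE = "max-age=31536000"


def redirect_to_https(app):
    """Wrap a WSGI application so that it is only ever served over HTTPS."""

    def wrapped(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            # Drop any ":80" port; assumes HTTPS listens on the default port 443
            # and that the host is not an IPv6 literal.
            host = (environ.get("HTTP_HOST") or environ["SERVER_NAME"]).split(":")[0]
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
            location = "https://" + host + path
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # Add the HSTS header to every response sent over HTTPS.
            headers = list(headers) + [("Strict-Transport-Security", HSTS_VALUE)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return wrapped
```

A redirect cannot protect data that a browser has already sent over plain HTTP. It only ensures that the form itself, and therefore every later submission from it, is served over HTTPS. The server’s certificate is still configured in the web server, not in the application.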
Technical Security of the Server
Web-based data collection systems reside on a server computer. The collected data may
be stored on the server directly or may be stored on another computer connected to the
server by a network. Server computers attached to the Internet are vulnerable to many
kinds of security attacks. Such attacks range from vandalism to attempts to take over the
server’s resources for malicious purposes (such as sending “spam” email).
To reduce the risk of such attacks, the administrators of these servers should ensure
the following:
The server computer is up to date with the latest security patches for the operating
system and web server software. Policies should be in place to ensure that this
occurs.
The server computer has a firewall that blocks network access to all
unnecessary services. Ideally, the firewall would permit incoming connections
only on port 443, the standard port for encrypted (HTTPS) traffic. The firewall
may run on the server itself or on a separate hardware device; both are
acceptable, but a separate device is preferred, provided it is configured to
filter both incoming and outgoing traffic.
Administrator passwords for any service accessible via the Internet (including
the data collection system itself) should be hard to guess. Telephone numbers,
names, and words found in the dictionary should not be used. A strong password
should appear to be a random string of letters and digits and be at least 8
characters long. Administrators should also consider changing their passwords
regularly.
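The password guidelines above can be expressed as a small automated check. The Python sketch below flags passwords that are shorter than 8 characters, look like telephone numbers, contain a listed word, or do not mix letters and digits. It also generates compliant random passwords with the standard `secrets` module. The function names `password_problems` and `generate_password` and the tiny word list are hypothetical; a real check would load a full dictionary and the names of people associated with the server.

```python
import re
import secrets
import string

# Hypothetical, deliberately tiny word list; a real check would load a full
# dictionary file plus the names of people associated with the server.
COMMON_WORDS = {"password", "admin", "survey", "welcome", "letmein"}


def password_problems(pw: str, words: set[str] = COMMON_WORDS) -> list[str]:
    """Return the ways in which pw breaks the guidelines (empty list if none)."""
    problems = []
    if len(pw) < 8:
        problems.append("shorter than 8 characters")
    # Only digits, spaces and phone punctuation: likely a telephone number.
    if re.fullmatch(r"[\d\s().+-]+", pw):
        problems.append("looks like a telephone number")
    lowered = pw.lower()
    if any(word in lowered for word in words):
        problems.append("contains a dictionary word or name")
    if not (re.search(r"[A-Za-z]", pw) and re.search(r"\d", pw)):
        problems.append("does not mix letters and digits")
    return problems


def generate_password(length: int = 12) -> str:
    """Draw random letters and digits until the result passes every check."""
    if length < 8:
        raise ValueError("length must be at least 8")
    alphabet = string.ascii_letters + string.digits
    while True:
        pw = "".join(secrets.choice(alphabet) for _ in range(length))
        if not password_problems(pw):
            return pw
```

For example, `password_problems("Password123")` reports only that it contains a dictionary word, while `generate_password()` returns a 12-character random string of letters and digits that passes every check.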
Jurisdiction should also be considered when deciding where the data will be hosted.
The USA’s Patriot Act permits the US government to obtain data from servers located
within the USA in cases where national security is deemed to be at risk. Canada’s
Anti-Terrorism Act provides similar powers. This issue has recently received attention
in the national press [3].
Security of Data Collection Software
Programming errors in the data collection software may reveal confidential data about
subjects to users of the system. Such errors range from logic errors that let users
inadvertently see data they are not meant to see, to more subtle security vulnerabilities
that malicious users can exploit to gain access to data. Lebanidze provides a good survey
of such issues [5, pp. 15-19].
These problems can largely be avoided by using a well-constructed survey tool. If the
survey uses custom software or software without a proven track record, it is a good idea
to have the software audited by a security professional who is familiar with best
practices for web application security.
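To make the two kinds of programming error concrete, here is a small Python sketch using the standard `sqlite3` module and a hypothetical `responses` table. The explicit ownership check in `responses_for` guards against a logic error that would let one respondent read another’s answers. The `?` placeholder guards against SQL injection, a common web application vulnerability, by passing user input to the database as data rather than as part of the SQL statement. All table, column and function names are illustrative assumptions, not part of any particular survey tool.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two respondents' answers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (respondent_id TEXT, answer TEXT)")
    conn.executemany(
        "INSERT INTO responses VALUES (?, ?)",
        [("alice", "yes"), ("bob", "no")],
    )
    return conn


def responses_for(conn: sqlite3.Connection, logged_in_user: str, requested_id: str) -> list[str]:
    """Return the requested respondent's answers, if the caller may see them."""
    # Guard against a logic error: respondents may only read their own answers.
    if logged_in_user != requested_id:
        raise PermissionError("respondents may only view their own responses")
    # The ? placeholder binds requested_id as a value, so text such as
    # "x' OR '1'='1" is compared literally and cannot change the query.
    rows = conn.execute(
        "SELECT answer FROM responses WHERE respondent_id = ?", (requested_id,)
    )
    return [answer for (answer,) in rows]
```

Had the query been built by pasting `requested_id` into the SQL text, the value `x' OR '1'='1` would have produced the condition `respondent_id = 'x' OR '1'='1'`, which matches every row. With the placeholder, the same value simply matches no respondent.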
Privacy Policies for the Server’s Users
The server used for data collection may have users who legitimately access it for
purposes other than the study for which data is being collected: an administrator may
access the server to maintain it, and others may use it for their day-to-day work.
These users, particularly those with administrator access, may be able to read
confidential data collected by the application.
If back-ups of confidential data are made, they too may be easy to access (e.g., if
written to a CD).
Some ways to address these problems are:
Grant administrator access only to those who legitimately require it to maintain
the server computer and have these administrators sign an agreement indicating
that they will respect the private nature of the data on the computer.
Put procedures in place to ensure that back-ups are stored securely.
Require a login and secure password to access the server computer.
Ensure that the server computer is not in a public location.
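One small, technical part of storing back-ups securely is making sure that a back-up file written on the server cannot be read by the server’s other users. The Python sketch below assumes a POSIX system such as Linux and uses the hypothetical function name `write_private_backup`. It creates a new file with owner-only permissions (mode `0o600`) and returns a SHA-256 checksum that can later confirm the copy is intact. It does not address physical security: a back-up copied to a CD still needs to be stored somewhere secure.

```python
import hashlib
import os


def write_private_backup(path: str, data: bytes) -> str:
    """Write data to a new file readable and writable only by its owner.

    Returns the SHA-256 hex digest of the data so a later restore can be checked.
    """
    # O_EXCL makes the call fail rather than reuse an existing file, which might
    # already have looser permissions. The 0o600 mode is applied at creation time
    # (further restricted, never widened, by the process umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return hashlib.sha256(data).hexdigest()
```

Creating the file with restrictive permissions from the start, rather than tightening them after writing, avoids a window in which other users could open it.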
Commercial Survey Tools
There are numerous commercially available survey tools [1, 4, 6, 7, 8, 9]. Some are
hosted by the vendor; others require the researcher to host the software on his or her
own web server. In most cases, using one of these commercial packages is the simplest
way to address the security risks described above. Specifically:
If the survey tool is hosted by the vendor, the researcher need only be concerned
with where the data is hosted (e.g., for some data, hosting in the USA may raise
ethical issues).
If the survey tool is hosted by the researcher, the technical security of the
server and the privacy policies for the server’s users must still be addressed.
References
[1] Apian Survey Pro. http://apian.com/
[2] CACI, Computer Security Threats. Available at
http://www.caci.com/business/ia/threats.html
[3] Caroline Alphonso, Universities move to hide work from U.S. eyes, Globe and
Mail, Nov. 11, 2006. Available at
http://www.theglobeandmail.com/servlet/story/RTGAM.20061111.wxuniversities11/BNStory/National/?page=rss&id=RTGAM.20061111.wxuniversities11
[4] Ennect Survey. http://www.ennect.com/Survey/index.asp
[5] Eugene Lebanidze, Securing Enterprise Web Applications at the Source: An
Application Security Perspective, Open Web Application Security Project.
Available at
http://www.owasp.org/images/8/83/Securing_Enterprise_Web_Applications_at_the_Source.pdf
[6] Snap Surveys. http://www.snapsurveys.com/
[7] Ultimate Survey Software. http://www.prezzatech.com/land/
[8] Vista Online Surveys. http://www.vanguardsw.com/vista/
[9] WISCO Survey Power. http://www.wiscosurvey.com/
Appendix A: ITS-Hosting of Survey Software
I discussed the issue of commercial survey tools with David Hallett, Manager of ITS’
University Information Services.
David believes that ITS should host a survey tool available to the community.
Researchers would use a web-based tool to design their survey and access its
results. This would involve no programming or HTML coding.
ITS would be responsible for security, back-ups and general maintenance of the
tool.
Researchers would pay a nominal charge to use the tool (e.g., $100 per
survey).
This would provide an easy path for researchers to administer web-based surveys without
having to worry about security issues.
David suggests that we strike a committee to select the survey tool. We feel that this
committee should include representatives of the tool’s users, as well as a GREB
representative.
Is GREB interested in participating in such an initiative?
Appendix B: Draft Security Checklist
Note that some of these questions will not apply if the application is hosted by a
trusted third party.
Is application data communicated via a secure protocol such as HTTPS?
Are the latest security patches installed on the web server? Are procedures in
place to ensure that security patches are installed in a timely manner?
Is the server protected by a firewall, restricting access to all but necessary
ports?
Is the server password protected, and are procedures in place to ensure that
the passwords are not easily guessable?
Have jurisdictional issues been considered in the location of the server?
Is administrator access restricted to those who need it? Have all
administrators agreed in writing to respect the confidentiality of the data
they can access?
Are back-ups stored securely? Are procedures in place to ensure that
back-ups will always be stored securely?
Is the server computer located in a secure place?
For custom-built data collection tools, has a security audit been performed on
the application’s source code?
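If the checklist is tracked electronically, a minimal Python sketch might record each question together with whether it still applies when a trusted third party hosts the application, and then list the applicable questions not yet answered “yes”. The question wording is abbreviated, the names `ChecklistItem`, `CHECKLIST` and `open_items` are hypothetical, and the choice of which questions still apply to third-party hosting (here, HTTPS and jurisdiction) is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    question: str
    # Assumption: whether the question still applies when a trusted third
    # party hosts the survey application.
    applies_to_third_party_hosting: bool


CHECKLIST = [
    ChecklistItem("Is application data sent via HTTPS?", True),
    ChecklistItem("Are security patches current, with an update procedure?", False),
    ChecklistItem("Does a firewall block all but necessary ports?", False),
    ChecklistItem("Are server passwords hard to guess?", False),
    ChecklistItem("Has the server's jurisdiction been considered?", True),
    ChecklistItem("Is administrator access restricted and agreed to?", False),
    ChecklistItem("Are back-ups stored securely?", False),
    ChecklistItem("Is the server in a secure location?", False),
    ChecklistItem("Has custom-built software been security audited?", False),
]


def open_items(answers: dict[str, bool], third_party_hosted: bool) -> list[str]:
    """Return the applicable questions that have not been answered 'yes'."""
    return [
        item.question
        for item in CHECKLIST
        if (item.applies_to_third_party_hosting or not third_party_hosted)
        and not answers.get(item.question, False)
    ]
```

For a vendor-hosted survey with no answers recorded yet, `open_items({}, third_party_hosted=True)` returns only the HTTPS and jurisdiction questions; for a self-hosted survey, all nine remain open until answered.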